Lecture Notes in Networks and Systems 716
Ajith Abraham · Sabri Pllana · Gabriella Casalino · Kun Ma · Anu Bajaj Editors
Intelligent Systems Design and Applications 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022) Held December 12–14, 2022 - Volume 3
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Editors Ajith Abraham Faculty of Computing and Data Science FLAME University Pune, Maharashtra, India Machine Intelligence Research Labs Scientific Network for Innovation and Research Excellence Auburn, WA, USA
Sabri Pllana Center for Smart Computing Continuum Burgenland, Austria Kun Ma University of Jinan Jinan, Shandong, China
Gabriella Casalino University of Bari Bari, Italy Anu Bajaj Department of Computer Science and Engineering Thapar Institute of Engineering and Technology Patiala, Punjab, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-031-35500-4 ISBN 978-3-031-35501-1 (eBook) https://doi.org/10.1007/978-3-031-35501-1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Welcome to the 22nd International Conference on Intelligent Systems Design and Applications (ISDA'22), held on the World Wide Web. ISDA'22 is hosted and sponsored by the Machine Intelligence Research Labs (MIR Labs), USA. ISDA'22 brings together researchers, engineers, developers and practitioners from academia and industry working in all interdisciplinary areas of computational intelligence and system engineering, to share their experience and to exchange and cross-fertilize their ideas. The aim of ISDA'22 is to serve as a forum for the dissemination of state-of-the-art research, development and implementations of intelligent systems, intelligent technologies and useful applications in these two fields. ISDA'22 received submissions from 65 countries. Each paper was reviewed by at least five reviewers and, based on the outcome of the review process, 223 papers were accepted for inclusion in the conference proceedings (38% acceptance rate). First, we would like to thank all the authors for submitting their papers to the conference and for their presentations and discussions during the conference. Our thanks go to the program committee members and reviewers, who carried out the most difficult work of carefully evaluating the submitted papers. Our special thanks to the following plenary speakers for their exciting talks:
• Kaisa Miettinen, University of Jyvaskyla, Finland
• Joanna Kolodziej, NASK National Research Institute, Poland
• Katherine Malan, University of South Africa, South Africa
• Maki Sakamoto, The University of Electro-Communications, Japan
• Catarina Silva, University of Coimbra, Portugal
• Kaspar Riesen, University of Bern, Switzerland
• Mário Antunes, Polytechnic Institute of Leiria, Portugal
• Yifei Pu, College of Computer Science, Sichuan University, China
• Patrik Christen, FHNW, Institute for Information Systems, Olten, Switzerland
• Patricia Melin, Tijuana Institute of Technology, Mexico
We express our sincere thanks to the organizing committee chairs for helping us to formulate a rich technical program. Enjoy reading the articles!
ISDA 2022—Organization
General Chairs
Ajith Abraham, Machine Intelligence Research Labs, USA
Andries Engelbrecht, Stellenbosch University, South Africa
Program Chairs
Yukio Ohsawa, The University of Tokyo, Japan
Sabri Pllana, Center for Smart Computing Continuum, Forschung Burgenland, Austria
Antonio J. Tallón-Ballesteros, University of Huelva, Spain
Publication Chairs
Niketa Gandhi, Machine Intelligence Research Labs, USA
Kun Ma, University of Jinan, China
Special Session Chair
Gabriella Casalino, University of Bari, Italy
Publicity Chairs
Pooja Manghirmalani Mishra, University of Mumbai, India
Anu Bajaj, Machine Intelligence Research Labs, USA
Publicity Team Members
Peeyush Singhal, SIT Pune, India
Aswathy SU, Jyothi Engineering College, India
Shreya Biswas, Jadavpur University, India
International Program Committee
Abdelkrim Haqiq, FST, Hassan 1st University, Settat, Morocco
Alexey Kornaev, Innopolis University, Russia
Alfonso Guarino, University of Foggia, Italy
Alpana Srk, Jawaharlal Nehru University, India
Alzira Mota, Polytechnic of Porto, School of Engineering, Portugal
Amit Kumar Mishra, DIT University, India
Andre Santos, Institute of Engineering, Polytechnic Institute of Porto, Portugal
Andrei Novikov, Sobolev Institute of Mathematics, Russia
Anitha N., Kongu Engineering College, India
Anu Bajaj, Thapar Institute of Engineering and Technology, India
Arjun R., Vellore Institute of Technology, India
Arun B Mathews, MTHSS Pathanamthitta, India
Aswathy S U, Marian Engineering College, India
Ayalew Habtie, Addis Ababa University, Ethiopia
Celia Khelfa, USTHB, Algeria
Christian Veenhuis, Technische Universität Berlin, Germany
Devi Priya Rangasamy, Kongu Engineering College, Tamil Nadu, India
Dhakshayani J., National Institute of Technology Puducherry, India
Dipanwita Thakur, Banasthali University, Rajasthan, India
Domenico Santoro, University of Bari, Italy
Elena Kornaeva, Orel State University, Russia
Elif Cesur, Istanbul Medeniyet University, Turkey
Elizabeth Goldbarg, Federal University of Rio Grande do Norte, Brazil
Emiliano del Gobbo, University of Foggia, Italy
Fabio Scotti, Università degli Studi di Milano, Italy
Fariba Goodarzian, University of Seville, Spain
Gabriella Casalino, University of Bari, Italy
Geno Peter, University of Technology Sarawak, Malaysia
Gianluca Zaza, University of Bari, Italy
Giuseppe Coviello, Polytechnic of Bari, Italy
Habib Dhahri, King Saud University, Saudi Arabia
Habiba Drias, USTHB, Algeria
Hiteshwar Kumar Azad, Vellore Institute of Technology, India
Horst Treiblmaier, Modul University, Austria
Houcemeddine Turki, University of Sfax, Tunisia
Hudson Geovane de Medeiros, Federal University of Rio Grande do Norte, Brazil
Isabel S. Jesus, Institute of Engineering of Porto, Portugal
Islame Felipe da Costa Fernandes, Federal University of Bahia (UFBA), Brazil
Ivo Pereira, University Fernando Pessoa, Portugal
Joêmia Leilane Gomes de Medeiros, Universidade Federal e Rural do Semi-Árido, Brazil
José Everardo Bessa Maia, State University of Ceará, Brazil
Justin Gopinath A., Vellore Institute of Technology, India
Kavita Gautam, University of Mumbai, India
Kingsley Okoye, Tecnologico de Monterrey, Mexico
Lijo V. P., Vellore Institute of Technology, India
Mahendra Kanojia, Sheth L.U.J. and Sir M.V. College, India
Maheswar R., KPR Institute of Engineering and Technology, India
María Loranca, UNAM, BUAP, Mexico
Maria Nicoletti, UNAM, BUAP, Mexico
Mariella Farella, University of Palermo, Italy
Matheus Menezes, Universidade Federal e Rural do Semi-Árido, Brazil
Meera Ramadas, University College of Bahrain, Bahrain
Mohan Kumar, Sri Krishna College of Engineering and Technology, India
Mrutyunjaya Panda, Utkal University, India
Muhammet Raşit Cesur, Istanbul Medeniyet University, Turkey
Naila Aziza Houacine, USTHB-LRIA, Algeria
Niha Kamal Basha, Vellore Institute of Technology, India
Oscar Castillo, Tijuana Institute of Technology, México
Paulo Henrique Asconavieta da Silva, Instituto Federal de Educação, Ciência e Tecnologia Sul-rio-grandense, Brazil
Pooja Manghirmalani Mishra, Machine Intelligence Research Labs, India
Pradeep Das, National Institute of Technology Rourkela, India
Ramesh K., Hindustan Institute of Technology and Science, India
Rasi D., Sri Krishna College of Engineering and Technology, India
Reeta Devi, Kurukshetra University, India
Riya Sil, Adamas University, India
Rohit Anand, DSEU, G.B. Pant Okhla-1 Campus, New Delhi, India
Rutuparna Panda, VSS University of Technology, India
S. Amutha, Vellore Institute of Technology, India
Sabri Pllana, Center for Smart Computing Continuum, Forschung Burgenland, Austria
Sachin Bhosale, University of Mumbai, India
Saira Varghese, Toc H Institute of Science & Technology, India
Sam Goundar, RMIT University, Vietnam
Sasikala R, Vinayaka Mission's Kirupananda Variyar Engineering College, India
Sebastian Basterrech, VSB-Technical University of Ostrava, Czech Republic
Senthilkumar Mohan, Vellore Institute of Technology, India
Shweta Paliwal, DIT University, India
Sidemar Fideles Cezario, Federal University of Rio Grande do Norte, Brazil
Sílvia M. D. M. Maia, Federal University of Rio Grande do Norte, Brazil
Sindhu P. M., Nagindas Khandwala College, India
Sreeja M U, Cochin University of Science and Technology, India
Sreela Sreedhar, APJ Abdul Kalam Technological University, India
Surendiran B., NIT Puducherry, India
Suresh S., KPR Institute of Engineering and Technology, India
Sweeti Sah, National Institute of Technology Puducherry, India
Thatiana C. N. Souza, Federal Rural University of the Semi-Arid, Brazil
Thiago Soares Marques, Federal University of Rio Grande do Norte, Brazil
Thomas Hanne, University of Applied Sciences and Arts Northwestern Switzerland, Switzerland
Thurai Pandian M., Vellore Institute of Technology, India
Tzung-Pei Hong, National University of Kaohsiung, Taiwan
Vigneshkumar Chellappa, Indian Institute of Technology Guwahati, India
Vijaya G., Sri Krishna College of Engineering and Technology, India
Wen-Yang Lin, National University of Kaohsiung, Taiwan
Widad Belkadi, Laboratory of Research in Artificial Intelligence, Algeria
Yilun Shang, Northumbria University, UK
Zuzana Strukova, Technical University of Košice, Slovakia
Contents
Simulation, Perception, and Prediction of the Spread of COVID-19 on Cellular Automata Models: A Survey . . . 1
B. S. Rakshana, R. Anahitaa, Ummity Srinivasa Rao, and Ramesh Ragala

Value Iteration Residual Network with Self-attention . . . 16
Jinyu Cai, Jialong Li, Zhenyu Mao, and Kenji Tei

Dental Treatment Type Detection in Panoramic X-Rays Using Deep Learning . . . 25
Nripendra Kumar Singh, Mohammad Faisal, Shamimul Hasan, Gaurav Goshwami, and Khalid Raza

From a Monolith to a Microservices Architecture Based Dependencies . . . 34
Malak Saidi, Anis Tissaoui, and Sami Faiz

Face Padding as a Domain Generalization for Face Anti-spoofing . . . 45
Ramil Zainulin, Daniil Solovyev, Aleksandr Shnyrev, Maksim Isaev, and Timur Shipunov

Age-Related Macular Degeneration Using Deep Neural Network Technique and PSO: A Methodology Approach . . . 55
F. Ajesh and Ajith Abraham

Inclusive Review on Extractive and Abstractive Text Summarization: Taxonomy, Datasets, Techniques and Challenges . . . 65
Gitanjali Mishra, Nilambar Sethi, and L. Agilandeeswari

COVID-ViT: COVID-19 Detection Method Based on Vision Transformers . . . 81
Luis Balderas, Miguel Lastra, Antonio J. Láinez-Ramos-Bossini, and José M. Benítez

Assessment of Epileptic Gamma Oscillations' Networks Connectivity . . . 91
Amal Necibi, Abir Hadriche, and Nawel Jmail
Clustering of High Frequency Oscillations HFO in Epilepsy Using Pretrained Neural Networks . . . 100
Zayneb Sadek, Abir Hadriche, and Nawel Jmail
Real Time Detection and Tracking in Multi Speakers Video Conferencing . . . 108
Nesrine Affes, Jalel Ktari, Nader Ben Amor, Tarek Frikha, and Habib Hamam

Towards Business Process Model Extension with Quality Perspective . . . 119
Dhafer Thabet, Sonia Ayachi Ghannouchi, and Henda Hajjami Ben Ghézala

Image Compression-Encryption Scheme Based on SPIHT Coding and 2D Beta Chaotic Map . . . 129
Najet Elkhalil, Youssouf Cheikh Weddy, and Ridha Ejbali

A Meta-analytical Comparison of Naive Bayes and Random Forest for Software Defect Prediction . . . 139
Ch Muhammad Awais, Wei Gu, Gcinizwe Dlamini, Zamira Kholmatova, and Giancarlo Succi

Skeleton-Based Human Activity Recognition Using Bidirectional LSTM . . . 150
Monika, Pardeep Singh, and Satish Chand

A Data Warehouse for Spatial Soil Data Analysis and Mining: Application to the Maghreb Region . . . 160
Widad Hassina Belkadi, Yassine Drias, and Habiba Drias

A New Approach for the Design of Medical Image ETL Using CNN . . . 171
Mohamed Hedi Elhajjej, Nouha Arfaoui, Salwa Said, and Ridha Ejbali

An Improved Model for Semantic Segmentation of Brain Lesions Using CNN 3D . . . 181
Ala Guennich, Mohamed Othmani, and Hela Ltifi

Experimental Analysis on Dissimilarity Metrics and Sudden Concept Drift Detection . . . 190
Sebastián Basterrech, Jan Platoš, Gerardo Rubino, and Michał Woźniak

Can Post-vaccination Sentiment Affect the Acceptance of Booster Jab? . . . 200
Blessing Ogbuokiri, Ali Ahmadi, Bruce Mellado, Jiahong Wu, James Orbinski, Ali Asgary, and Jude Kong

Emotion Detection Based on Facial Expression Using YOLOv5 . . . 212
Awais Shaikh, Mahendra Kanojia, and Keshav Mishra

LSTM-Based Model for Sanskrit to English Translation . . . 219
Keshav Mishra, Mahendra Kanojia, and Awais Shaikh
Alzheimer Disease Investigation in Resting-State fMRI Images Using Local Coherence Measure . . . 227
Sali Issa, Qinmu Peng, and Haiham Issa

Enhanced Network Anomaly Detection Using Deep Learning Based on U-Net Model . . . 237
P. Ramya, S. G. Balakrishnan, and A. Vidhiyapriya

An Improved Multi-image Steganography Model Based on Deep Convolutional Neural Networks . . . 250
Mounir Telli, Mohamed Othmani, and Hela Ltifi

A Voting Classifier for Mortality Prediction Post-Thoracic Surgery . . . 263
George Obaido, Blessing Ogbuokiri, Ibomoiye Domor Mienye, and Sydney Mambwe Kasongo

Hybrid Adaptive Method for Intrusion Detection with Enhanced Feature Elimination in Ensemble Learning . . . 273
S. G. Balakrishnan, P. Ramya, and P. Divyapriya

Malware Analysis Using Machine Learning . . . 281
Bahirithi Karampudi, D. Meher Phanideep, V. Mani Kumar Reddy, N. Subhashini, and S. Muthulakshmi

SiameseHAR: Siamese-Based Model for Human Activity Classification with FMCW Radars . . . 291
Mert Ege and Ömer Morgül

Automatic Bidirectional Conversion of Audio and Text: A Review from Past Research . . . 303
Pooja Panapana, Eswara Rao Pothala, Sai Sri Lakshman Nagireddy, Hemendra Praneeth Mattaparthi, and Niranjani Meesala

Content-Based Long Text Documents Classification Using Bayesian Approach for a Resource-Poor Language Urdu . . . 313
Muhammad Pervez Akhter, Muhammad Atif Bilal, and Saleem Riaz

Data-driven Real-time Short-term Prediction of Air Quality: Comparison of ES, ARIMA, and LSTM . . . 322
Iryna Talamanova and Sabri Pllana

A Flexible Implementation Model for Neural Networks on FPGAs . . . 332
Jesper Jakobsen, Mikkel Jensen, Iman Sharifirad, and Jalil Boudjadar
SearchOL: An Information Gathering Tool . . . 343
Farhan Ahmed, Pallavi Khatri, Geetanjali Surange, and Animesh Agrawal

Blockchain for Smart Healthcare: A SWOT Analysis from the Patient Perspective . . . 350
Kamal Bouhassoune, Sam Goundar, and Abdelkrim Haqiq

Enhancing the Credit Card Fraud Detection Using Decision Tree and Adaptive Boosting Techniques . . . 358
K. R. Prasanna Kumar, S. Aravind, K. Gopinath, P. Navienkumar, K. Logeswaran, and M. Gunasekar

A Manual Approach for Multimedia File Carving . . . 366
Pallavi Khatri, Animesh Agrawal, Sumit Sah, and Aishwarya Sahai

NadERA: A Novel Framework Achieving Reduced Distress Response Time by Leveraging Emotion Recognition from Audio . . . 375
Harshil Sanghvi, Sachi Chaudhary, and Sapan H. Mankad

AI-Based Extraction of Radiologists Gaze Patterns Corresponding to Lung Regions . . . 386
Ilya Pershin, Bulat Maksudov, Tamerlan Mustafaev, and Bulat Ibragimov

Explainable Fuzzy Models for Learning Analytics . . . 394
Gabriella Casalino, Giovanna Castellano, and Gianluca Zaza

DL vs. Traditional ML Algorithms to Recognize Arabic Handwriting Script: A Review . . . 404
Anis Mezghani, Mohamed Elleuch, and Monji Kherallah

Data Virtualization Layer Key Role in Recent Analytical Data Architectures . . . 415
Montasser Akermi, Mohamed Ali Hadj Taieb, and Mohamed Ben Aouicha

Extracting Knowledge from Pharmaceutical Package Inserts . . . 427
Cristiano da Silveira Colombo, Claudine Badue, and Elias Oliveira

Assessing the Importance of Global Relationships for Source Code Analysis Using Graph Neural Networks . . . 437
Vitaly Romanov and Vladimir Ivanov

A Multi-objective Evolution Strategy for Real-Time Task Placement on Heterogeneous Processors . . . 448
Rahma Lassoued and Rania Mzid
Comprehensive Analysis of Rice Leaf Disease Detection and Classification Models . . . 458
L. Agilandeeswari and M. Kiruthik Suriyah

Multimodal Analysis of Parkinson's Disease Using Machine Learning Algorithms . . . 470
C. Saravanan, Anish Samantaray, and John Sahaya Rani Alex

Stock Market Price Trend Prediction – A Comprehensive Review . . . 479
L. Agilandeeswari, R. Srikanth, R. Elamaran, and K. Muralibabu

Visceral Leishmaniasis Detection Using Deep Learning Techniques and Multiple Color Space Bands . . . 492
Armando Luz Borges, Clésio de Araújo Gonçalves, Viviane Barbosa Leal Dias, Emille Andrade Sousa, Carlos Henrique Nery Costa, and Romuere Rodrigues Veloso e Silva

On the Use of Reinforcement Learning for Real-Time System Design and Refactoring . . . 503
Bakhta Haouari, Rania Mzid, and Olfa Mosbahi

Fully Automatic LPR Method Using Haar Cascade for Real Mercosur License Plates . . . 513
Cyro M. G. Sabóia, Adriell G. Marques, Luís Fabrício de Freitas Souza, Solon Alves Peixoto, Matheus A. dos Santos, Antônio Carlos da Silva Barros, Paulo A. L. Rego, and Pedro Pedrosa Rebouças Filho

Dynamic Job Shop Scheduling in an Industrial Assembly Environment Using Various Reinforcement Learning Techniques . . . 523
David Heik, Fouad Bahrpeyma, and Dirk Reichelt

Gated Recurrent Unit and Long Short-Term Memory Based Hybrid Intrusion Detection System . . . 534
M. OmaMageswari, Vijayakumar Peroumal, Ritama Ghosh, and Diyali Goswami

Trace Clustering Based on Activity Profile for Process Discovery in Education . . . 545
Wiem Hachicha, Leila Ghorbel, Ronan Champagnat, and Corinne Amel Zayani

IoT Based Early Flood Detection and Avoidance System . . . 555
Banu Priya Prathaban, Suresh Kumar R, and Jenath M
Hybrid Model to Detect Pneumothorax Using Double U-Net with Segmental Approach . . . 564
P. Akshaya and Sangeetha Jamal

An Approach to Identify DeepFakes Using Deep Learning . . . 574
Sai Siddhu Gedela, Nagamani Yanda, Hymavathi Kusumanchi, Suvarna Daki, Keerthika Challa, and Pavan Gurrala

Author Index . . . 585
Simulation, Perception, and Prediction of the Spread of COVID - 19 on Cellular Automata Models: A Survey B. S. Rakshana, R. Anahitaa, Ummity Srinivasa Rao, and Ramesh Ragala(B) School of Computer Science and Engineering, Vellore Institute of Technology, Vandalur – Kelambakkam Road, Chennai, Tamil Nadu, India {bs.rakshana2018,anahitaa.radhakrishnan2018}@vitstudent.ac.in, {umitty.srinivasarao,ramesh.ragala}@vit.ac.in
Abstract. Some of the socio-economic issues encountered today are boosted by the prevalence of a gruesome pandemic. The spread of a rather complex disease—COVID-19—has resulted in a collapse of social life, health, economy and general well-being of man. The adverse effects of the pandemic have devastating consequences on the world and the only hope apart from a hypothetical cure for the disease would be measures to understand its propagation and bring in effective measures to control it. This paper surveys the role of Cellular Automata in modeling the spread of COVID-19. Possible solutions and perceptions regarding dynamics, trends, dependent factors, immunity, etc. have been addressed and elucidated for better understanding. Keywords: COVID-19 · Cellular automaton · Simulation · Perception · Prediction
1 Introduction
The spread of the SARS-CoV-2 virus [1], which is believed to have originated from Wuhan in China, has wreaked havoc in our world. This is because of its communicability and the consequential deterioration in the overall health of infected and recovered individuals, with no actual cure available yet to eradicate the disease. This virus causes a disease called 'COVID-19' in human beings and affects certain animals too [2]. COVID-19 causes a plethora of symptoms very similar to those of a common cold accompanied by fever, fatigue, reduced senses of smell and taste, nausea, vomiting, etc. [3], but much more severe. Another worrisome fact to consider is the incubation period of the virus [4], where the infected individual exhibits symptoms of the disease long after the invasion and multiplication of the virus within the host (approximately a fortnight). This increases the probability of unintentional transmission of the virus to other healthy individuals as the host carries on with their mundane activities. This means that the transmission of this disease cannot be tracked efficiently, and the situation can become grave very quickly. Figure 1 given below depicts the number of infected individuals in each country on a world map. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 1–15, 2023. https://doi.org/10.1007/978-3-031-35501-1_1
Exhaustion of human resources and saturation of medical resources, due to a surfeit of patients caused by the exponential increase in infected individuals, is the main concern of healthcare authorities in all the nations affected by the pandemic. Because of this, untreated patients may face severe health-related repercussions, which cause unnecessary and avoidable deaths in the worst-case scenario. Several nations impose restrictions such as quarantine and lockdowns across different regions to curb the rate of transmission, in hopes of reducing the number of COVID-19 cases per day. Transmittable diseases [5] are a major cause of human deaths today. The study of trends pertaining to the development and spread of a disease provides a great basis for determining ways to prevent and regulate the spread of infectious diseases. The presently raging COVID-19 pandemic has prompted researchers to study the spread of epidemics using computational techniques. Existing approaches to studying such trends include SVM, K-means, decision trees and other relevant machine learning techniques [6, 7].
Fig. 1. Global COVID-19 pandemic at a crossroads [8].
A Cellular Automaton (CA) is an abstract, discrete model of a machine that computes in parallel. It can be defined as an assembly of adjoining objects called 'cells' arranged as grids over 'n' dimensions, where each cell is in one of a finite number of states. The state of a cell on the grid is determined by a mathematical function that takes into consideration the previous states of its adjacent cells. The mathematical functions may differ and can be altered according to need to simulate and analyze patterns. Each cell interacts locally and operates in parallel on the grid to execute computational operations. Of the different types of CAs, several researchers have worked explicitly with 1D and 2D CAs in recent years. With the incorporation of complex mathematics, 3D cellular automata can also be designed and worked with. This paper aims to provide a survey of existing cellular automata models and their applications relevant to COVID-19. The degree of usefulness of these automata in studying, simulating and predicting COVID-19 wave patterns gives us a better perspective regarding the major factors which influence its spread. This helps gain sufficient insight
to answer questions regarding how the ongoing pandemic can be controlled, how improving immunity can be a curbing factor, and how else cellular automata can be used in the field of epidemiology to come up with plausible solutions. A cellular automaton enables the simulation of both the spatial and temporal growth of infectious diseases. This provides policy makers useful insights to identify critical aspects of the disease spread. Understanding the different human interactions in a community can help simulate models for the spread of COVID-19. This will be useful for governmental organizations to anticipate transmission patterns of an epidemic. Simulations of various control measures in different social settings, built on a probabilistic network, can help identify the most influential factors in pandemic prevention and control. Dynamic compartmental models like SIR (susceptible-infected-recovered), SEIR (susceptible-exposed-infected-recovered), SEIS (same as SEIR but without the attainment of immunity), etc., generally used to mathematically trace the propagation of infectious diseases, can be configured to better consider randomness in the transmission of the coronavirus behind the COVID-19 outbreaks. The SIR model takes into account individuals who are susceptible to an infection (S), infected by the virus while potentially transmitting the disease (I), and recovered/removed (R). The SEIR model, on the other hand, considers the same variables as SIR, with the additional inclusion of those who were exposed but are yet to become infectious (E). These numbers are treated as functions of time for computational purposes. The transition rates from one state to another (S to E, E to I, I to R) are determined, and the dynamics of the chosen epidemiology model are described with the help of ordinary differential equations.
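The compartmental dynamics described above can be sketched with a simple forward-Euler integration of the SEIR ordinary differential equations. This is an illustrative sketch only, not a model from the surveyed papers; the parameter values (beta, sigma, gamma) are hypothetical assumptions, not rates fitted to COVID-19 data.

```python
# Forward-Euler sketch of the SEIR model. beta, sigma, gamma are the
# S->E (contact), E->I (incubation) and I->R (recovery) rates; all
# values below are illustrative assumptions, not fitted parameters.
def seir_step(s, e, i, r, beta, sigma, gamma, dt):
    """Advance the SEIR state by one Euler step of size dt."""
    n = s + e + i + r
    ds = -beta * s * i / n              # susceptibles become exposed on contact
    de = beta * s * i / n - sigma * e   # exposed progress to infectious
    di = sigma * e - gamma * i          # infectious recover / are removed
    dr = gamma * i
    return s + ds * dt, e + de * dt, i + di * dt, r + dr * dt

def simulate(days=160, dt=0.1, beta=0.5, sigma=0.2, gamma=0.1):
    # State variables are fractions of the population; 1% infectious at t=0.
    s, e, i, r = 0.99, 0.0, 0.01, 0.0
    for _ in range(int(days / dt)):
        s, e, i, r = seir_step(s, e, i, r, beta, sigma, gamma, dt)
    return s, e, i, r

if __name__ == "__main__":
    print(simulate())
```

Because every outflow from one compartment is an inflow to the next, the total population is conserved at each step, which is a quick sanity check for any implementation of these equations.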
All the factors considered above are used to determine propagation mathematically and to analyze the spread of diseases. The SIR model is generally used to analyze infections that produce obvious symptoms and have a short incubation period. One drawback of the SIR model is that it cannot effectively simulate the transmission of asymptomatic diseases, which are difficult to detect without the exhibition of symptoms; in such cases, the SEIR model can be used. The fight against COVID-19 has relied on two strategies. The first comprises control measures such as enforcing lockdowns and social distancing to contain the spread. The second is developing herd immunity, i.e., allowing unconstrained transmission of the virus so that a large section of the population develops immunity [9]. Herd immunity can be achieved by vaccination or by the immunity acquired after recovering from infection [9]. CA simulations can therefore include both disease-specific and region-specific parameters, which can be used effectively to study the dynamics of disease spread. The survey is organized as follows: the second section provides information about cellular automata used for modeling biological and epidemiological systems. The third section describes the different types of cellular automata structures proposed for disease simulation over the years, such as one-dimensional hybrid CA, two-dimensional CA, Kinetic Monte Carlo CA, etc. A historical perspective on the efforts undertaken to understand global dynamics from local rules is also provided, along with the results, conclusions drawn and future work emphasized in each of the papers.
B. S. Rakshana et al.
2 CA for Modeling Biological and Epidemiological Systems

The spread of epidemics is a complex subject to understand and quite difficult to simulate accurately. Several CA models that differ mathematically can therefore be compared to study regular and patch-movement-based spread of diseases [10]. Epidemiological models such as SEIS, with influential parameters such as the contact rate, activation rate and curing rate, can be analyzed with the help of suitable automata [11] using the von Neumann neighborhood. Multilevel simulators [12] built upon hierarchical CAs are also of notable use in epidemiology. High-speed, low-cost pattern classifiers built around CAs have gained considerable attention, as such classifiers can be incorporated into almost any field of research, such as medicine or economics. A particular category of CA, termed Multiple Attractor Cellular Automata (MACA) [13], has been developed through a Genetic Algorithm (GA) formulation to carry out pattern classification. Competitive growth [14] can be modeled and explained using CAs too; for instance, the growth and succession of underwater species such as Chara aspera and Potamogeton pectinatus may be studied in a water body that has been completely or partially eutrophicated. Biofilm growth [15] and bioreactors with biofilms (core research subjects in microbiology) can be modeled efficiently with the help of a 2D CA. The mathematical model takes into account substrate diffusion and utilization as well as the growth, displacement and death of microorganisms [15], and the resulting 1D density and porosity distributions were used for quantitative comparisons of the biofilm structure [15]. Transmissible diseases [5] remain a major cause of human deaths today, and the study of trends in the development and spread of a disease provides a strong basis for determining ways to prevent and regulate the spread of infectious diseases. Bin S. et al.
examine an epidemic spread model, Susceptible-Latent-Infected-Recovered-Dead-Susceptible (SLIRDS), based on cellular automata. Factors such as literacy rates, economic development and the characteristics of the population in an area are also taken into account to study the transmission of infectious disease. The aspect of noise, or randomness [16], in the study of complex population patterns is often overlooked. Sun et al. [16] present an epidemic model utilizing a cellular automaton with noise and explore the dual role of noise in disease spread, i.e., causing both the emergence and the disappearance of the disease. Significant levels of noise can cause the extinction of the disease in a resonant manner. The outcomes indicate that noise plays an immense role in the evolution of the disease state, which can suggest ways of preventing, and in due course eliminating, the disease. Mosquito-transmitted diseases [17] like chikungunya, yellow fever, dengue and malaria are a major problem in tropical countries. Pereira et al. study dengue in a population modeled by cellular automata, presenting a flexible model for dengue spread with one, two or three different serotypes coexisting in the same population. The problem of precise grouping of medical images from various modalities for effective content-based retrieval [18] has also been studied: Kleyko et al. propose a solution based on the principles of hyperdimensional computing combined with cellular automata (CA).
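As a minimal illustration of the kind of automaton discussed in this section, the sketch below implements a toy SEIS grid update with the von Neumann neighborhood. The contact, activation and curing rates are placeholder values, not those used in [11]:

```python
import random

S, E, I = 0, 1, 2  # susceptible, exposed (latent), infectious

def von_neumann(grid, x, y):
    """The four orthogonally adjacent cells (periodic boundary)."""
    n = len(grid)
    return [grid[(x - 1) % n][y], grid[(x + 1) % n][y],
            grid[x][(y - 1) % n], grid[x][(y + 1) % n]]

def step(grid, contact_rate=0.3, activation_rate=0.25, curing_rate=0.1,
         rng=random):
    """One synchronous SEIS update: S -> E via infectious neighbours,
    E -> I at the activation rate, I -> S at the curing rate (no immunity)."""
    n = len(grid)
    new = [row[:] for row in grid]
    for x in range(n):
        for y in range(n):
            cell = grid[x][y]
            if cell == S:
                infectious = sum(1 for c in von_neumann(grid, x, y) if c == I)
                if rng.random() < 1 - (1 - contact_rate) ** infectious:
                    new[x][y] = E
            elif cell == E and rng.random() < activation_rate:
                new[x][y] = I
            elif cell == I and rng.random() < curing_rate:
                new[x][y] = S
    return new
```

Seeding a single infectious cell and iterating `step` produces the spatial spread patterns that the models above analyze at much larger scale.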
3 CA for Modeling COVID-19

3.1 One-Dimensional Cellular Automata

Elementary cellular automata [19] are also known as one-dimensional cellular automata. They consist of cells arranged linearly, each of which typically possesses one of two states (on or off). The next state of a cell is determined by local rules which consider the current state of the cell and those of its neighboring cells to the left and right (if any). Hybrid cellular automata (HYCA), on the other hand, combine the local rules that govern different cellular automaton paradigms to obtain a non-conventional computational model. In a hybrid CA the rules are not the same for all cells; they vary based on certain boundary conditions defined along with the combination of rules. The primary objectives of Pokkuluri Kiran Sree et al. [20] include developing a new 1D hybrid cellular automata classifier to predict trends of several COVID hotspots in India and handling various parameters (such as infection control, virus reproduction and spread rate). The accuracy reported in this paper is approximately 91.5%, higher than that of other ML-based approaches such as SVM, k-means and decision trees, and the classifier proved to be unique, robust, flexible and efficient. This approach can be utilized by administrative authorities to follow and monitor rising hotspots in a nation, and the predictions were congruent with the real data observed.

3.2 Two-Dimensional Cellular Automata

A two-dimensional cellular automaton (2D CA) [21] consists of a regular grid of cells in two-dimensional space, each holding a 'state'. The 2D CA grid consists of an unbounded number of square cells; at every discrete time step each cell is in one state of a finite state set, and each cell interacts with its nearest eight neighbors (horizontally, vertically and diagonally adjacent), called the Moore neighborhood.
In a 2D CA, the next state of a particular cell is determined by rules that take into account the current state of that cell and of its eight adjacent cells. The rules that govern the automaton are usually static, unless the automaton is stochastic. Typically the cells possess two states (on and off), sometimes more, but never fewer. A well-known example of a cellular automaton is Wolfram's Rule 30. Cellular automata may be classified into four different classes: class 1 (homogeneous), class 2 (homogeneous with some randomness), class 3 (pseudorandom) and class 4 (complex). The next state of any cell in a linear CA can be determined by an XOR operation over the states of the adjacent cells selected by the rule. If different rules are applied to different cells of a particular 2D cellular automaton configuration it is called hybrid; otherwise, the CA is known as uniform. Kermack and McKendrick developed an epidemic model known as the SIR model to simulate the transmission of diseases in 1927. This model captures the changes in the numbers of individuals susceptible (S), infected (I), and recovered (R) within the selected population, and serves as a valuable mechanism for medical researchers and government personnel to gain insights into the transmission of diseases, control measures and the interactions of people.
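Wolfram's Rule 30, mentioned above as a classic example, can be reproduced in a few lines. The encoding of the rule number into a neighborhood lookup table follows the standard Wolfram convention:

```python
def rule_to_table(rule_number):
    """Expand a Wolfram rule number (0-255) into a lookup table keyed by
    the (left, centre, right) neighbourhood states."""
    return {(l, c, r): (rule_number >> (l * 4 + c * 2 + r)) & 1
            for l in (0, 1) for c in (0, 1) for r in (0, 1)}

def step(cells, table):
    """One synchronous update of a 1D CA with periodic boundaries."""
    n = len(cells)
    return [table[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

rule30 = rule_to_table(30)
row = [0] * 15
row[7] = 1               # single live cell in the middle
for _ in range(3):
    row = step(row, rule30)
```

Iterating from a single live cell yields the well-known chaotic (class 3) triangle of Rule 30; substituting a different rule number reproduces any of the 256 elementary automata.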
A 2D cellular automaton built on the SI model of epidemiology is utilized to select the most desirable testing frequency [22] for diagnosing COVID, so that individuals can be isolated effectively in an attempt to curb the spread of the pandemic. The chosen combination of model parameters represents a cost-efficient intervention compared to others, and the study showed that small-scale interventions are the best option, as local communities can self-organize testing and effectively monitor infected cases. The SEIR model is implemented using a 2D cellular automaton in [23] to simulate the spread of a disease after public health interventions. Several limitations were encountered in this study, but the CA efficiently predicted simulated results despite these impediments. Zhou et al. designed a dynamic CA-SEIR model [8] with different isolation rates, from which it is inferred that the greater the isolation ratio, the lower the number of individuals affected. As evidence, the city of Wuhan closed down all means of public transport on 23 January 2020, which significantly raised the isolation ratio. The number of infected individuals became stable after a 10-day buffer period and then slowly diminished, showing that the strict implementation of lockdowns and traffic restrictions had a positive effect on curbing the transmission of the virus. The proposed model was also used to simulate conditions with different contact numbers, since a large number of susceptible people can be infected by asymptomatic patients when they gather for large festivals. To illustrate, a major outbreak among church worshippers began in the city of Daegu, South Korea on February 18 after one worshipper was identified as infected with COVID-19, resulting in more than 6,000 confirmed cases over the following two weeks. A better perspective for the parameter setting of the CA-SEIR model is also discussed with numerical data.
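The qualitative effect of the isolation ratio reported by Zhou et al. can be illustrated with a toy per-step infection probability. The functional form and the parameter values here are illustrative assumptions, not taken from [8]:

```python
def infection_probability(p_contact, contacts, isolation_ratio):
    """Per-step probability that a susceptible cell is infected, when a
    fraction `isolation_ratio` of its potential contacts is removed."""
    effective = contacts * (1.0 - isolation_ratio)
    return 1.0 - (1.0 - p_contact) ** effective

# Raising the isolation ratio monotonically lowers the infection risk.
risks = [infection_probability(0.1, 8, q / 10) for q in range(11)]
```

Under this assumption the risk falls strictly as the isolation ratio rises and vanishes entirely at full isolation, matching the inference that a greater isolation ratio leaves fewer individuals affected.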
At the beginning of the outbreak, South Korea did not regulate domestic mobility and people gathered in large numbers for various events. In such cases, effective isolation and quarantine measures need to be implemented without any further delay to curb transmission. The number of infected people before preventive measures are carried out can also be used to ascertain the initial transmission rate of the pandemic. Medical resources such as isolation beds and the size of the medical workforce play a direct role in whether confirmed cases are diagnosed and treated effectively. At the beginning of the COVID-19 outbreaks, Hubei province faced a severe shortage of medical resources and could not handle the outbreak; however, the Chinese government rapidly built hospitals and restocked medical resources to accommodate more patients, which succeeded in slowing down the infection rate. The impact of control measures on pandemic regulation has been studied in [24], where a non-linear cellular automaton with hybrid rules is employed. This model proved better than deep learning models at predicting infections, recoveries, deaths, etc., with an accuracy of about 78.8% on its own. Such predictions enable the respective governing bodies of healthcare sectors to plan accordingly. Medrek et al. [25] use 2D cellular automata to simulate the spread of the COVID-19 disease; their CA-based numerical framework helps create comprehensive data. The SEIR (Susceptible, Exposed, Infectious, and Recovered) model has been utilized to judge the epidemic trends in countries such as Spain, France and Poland with the help of a new parameter. The observations recorded the impact of the structure and
actions of the population on vital parameters such as mortality and infection rates. Empirical data revealed a correlation between the age of infected people and the level of mortality, so the system's probability of death was made dependent on the age demographics of the population under analysis. Contact with infectious people once every two days apparently leads to the infection of over three other individuals. The ideal response time during the initial period of the pandemic was also analyzed and taken as a basis for introducing appropriate actions. To simulate and analyze epidemic evolution along with the propagation of COVID-19, [26] uses a 2D cellular automaton model, validated on a simple community with parameters such as sex ratio, population movement, age demographics, immunity and treatment period. Dai et al. proposed a model based on the SIR model in which the individual states include susceptible (S), self-isolated (Si), infected (I), recovered (R), confirmed (C), hospitalized (H), and dead (D). Asymptomatic patients play a crucial role in SARS-CoV-2 infections, so a parameter is established to include the possibility of becoming an asymptomatic patient. COVID-19 data from New York City (NYC) and Iowa were used to validate the proposed model. Comparisons of New York City's real-time data and the simulated numbers showed that the curves of hospitalized and dead individuals were similar to the trend of the daily confirmed cases; however, the peak values of the hospitalized and dead groups were far lower than that of the confirmed, indicating that most infected individuals recover without assistance. A constantly decreasing downward trend was recorded in the simulated number of daily confirmed cases in NYC. The second simulation case, concerning the epidemic outbreak in Iowa, recorded an increase in the daily confirmed cases, unlike NYC.
This is because COVID-19 restrictions differ across states, which results in different outcomes; NYC adopted prevention measures more promptly than Iowa. The results supported the claim that the synergistic action of self-isolation and diagnostic testing can be effective in preventing the transmission of diseases with pandemic potential. Tests performed at longer intervals and on a larger scale proved to yield better results than those carried out at shorter intervals and on a small scale. Yaroslav et al. recommend ways [27] to enhance the interaction rules of agents in the SIR model of epidemiology with a 2D CA. The spatial distribution of COVID-19 is modeled in a single location with individual regions that interact with one another through transport, considering components such as shops, educational institutions, gyms and places of worship. The impact of restrictions, quarantine, the mask regime, the incubation period, etc. is taken into consideration to assess the situation. Calculations provide insight into how accurately the forecast model matches real data. The responsiveness of the model to different protocols for containing infection was analyzed, and both the absolute and the combined effectiveness of these protocols were determined. Restricting the operation of shops and places of worship, adhering to strict quarantine and sensibly obeying social distancing protocols in places such as parks proved to be effective. As asymptomatic infections are transmissible and hard to notice, the authors of [28] established an SEIR model for those diagnosed with asymptomatic infections. The distribution of asymptomatic infections is simulated using data from the Beijing Municipal Health Commission and integrated with the four states of the cellular automaton model
which are susceptible, infected, asymptomatic, and recovered. China implemented rigid measures such as the "home order" 10 days after the outbreak of the epidemic, and the simulations exhibited a notable decline in the prevalence of asymptomatic infections after the "home order".

3.3 Probabilistic Cellular Automata/Stochastic Cellular Automata

A stochastic or probabilistic cellular automaton, also known as a random cellular automaton, is a locally interacting Markov chain. It is a dynamic system where each state is discrete; the state of each cell is updated according to a simple homogeneous rule, with the new states chosen according to probability distributions. Some examples of probabilistic cellular automata (PCA) include the majority cellular automaton based on Toom's rule, the cellular Potts model, non-Markovian generalizations, etc. In a PCA, randomness is involved to a certain degree in the transition rules, and the outcome of interest is the corresponding stationary distribution; the transition actually applied to a given neighborhood can thus vary from step to step. Supplying a stochastic model with demographic elements, such as the age ratio and any prior health conditions, can help in building a robust model to predict similar outbreaks in the future. Joydeep et al. present a predictive model [29] to illustrate the effect of the dependent population, in terms of demographic dynamics, on the spread of the disease. The proposed model was validated on the case of coronavirus spread in New York State and found to agree with the real data. The predictions also showed that an extension of the lockdown, likely up to 180 days, could notably reduce the threat of a possible second wave. Saumyak et al. use two simulations [30] to model the pandemic, a refined SIR model and a stochastic CA, to comprehend the ongoing crisis. One of the key outcomes inferred from this study is the importance of the time dependence of the rate of infection.
For instance, the rate of transmission is always high in the beginning, as the cause of the spread is often unknown, but it decreases as awareness increases among the people. The infection rate is clearly visualized and discussed with contour maps. Measures such as quarantine and social distancing to limit the spread of the disease proved effective in this study. Moreover, the longer the incubation period of the virus, the higher the chance that an individual is asymptomatic, and an asymptomatic individual can pose a great threat to the susceptible population. To study the spread of SARS-CoV-2, focusing on the symptomatic and asymptomatic cases during the infection, Monteiro et al. devised a model [31] using probabilistic cellular automata to study the long-term behavior of this transmissible infection; knowledge about possible disease transfer and the identification of vulnerable population groups is essential. Gwizdałła [32] studies two problems: creating a graph for simulating communities, and modeling illness transmission. This paper uses the Barabasi-Albert (BA) model to obtain graphs illustrating the growth of networks in a community. Epidemic curves are presented for the numbers of recovered and new cases, along with an examination of reducing contact among the people in a community. It is imperative to capture the interactions between the infected and the susceptible to model the
transmission of illness. Three types of graphs are used: (i) the BA graph; (ii) a graph constructed by allotment to one of four groups with assumed sizes; (iii) a graph constructed by allotment to one of 16 groups with sizes based on the number of inhabitants of Lodz, a city in Poland. The probability of meeting (since one cannot fall sick without a transfer of pathogens) and the probability of infection are presented as the main basis for the construction of the model in this paper. The paper specializes the SEIR model by incorporating realistic times during which an individual stays exposed to the virus or infectious to others in a community. In the author's opinion, the most significant property of the proposed model is its ability to include the stochastic character of disease transmission, as this allows the features of every possible contact in a network to be individualized. This is useful in deciding the factors involved in disease spread between two individuals. The proposed model also permits various types of interventions to be represented, including new potential diseases; the results of procedures such as vaccination and quarantine can be easily simulated, which can aid the government in decision-making to curb the spread. The duration of the epidemic can also be forecast using the suggested model. The authors of [9] focus on the effect that the pace of herd immunity evolution has on the population, taking into consideration the numbers of vulnerable and resilient individuals. Studies have found that slower attainment of herd immunity is relatively less fatal compared to rapid immunization. Slow progress towards herd immunity is hindered by several intervening factors whose impact can be studied in much more detail with the help of PCA. The propagation dynamics of the disease exhibit essential similarities with those of a complex chemical reaction, both of which rely upon time dependence.
The SIR model is used to examine the repercussions of achieving herd immunity without vaccines or drugs in the pandemic context. It can be concluded that delayed attainment of immunity saturation is relatively less fatal, based on the observation of non-linear trends in the dependence of the cured and dead populations on the initial population of the vulnerable. A spatial Markov-chain CA model is employed in [33] to track the spread of the COVID-19 virus, together with two methods for parameter estimation. Networking topologies are considered for mapping the progression of epidemics with this model, by placing each individual on a grid and using stochastic principles to find the transitions between different states. Maximum likelihood estimation and Bayesian estimation methods were applied to simulated data to estimate parameters, and the estimates were found to be relatively accurate. The Metropolis-Hastings algorithm and the Metropolis-Adjusted Langevin Algorithm (MALA) were used as well. An important finding was that the recovery rate parameter μ is crucial for determining the numbers of susceptible, recovered and dead people, as long as the infection probability rate δg was larger than 0.3 (the relative error of δg was generally in the range of 1–3%). Designing a data-driven spatial framework that can be utilized to estimate parameters pertinent to the spread of the pandemic is the main goal of [34], using PCA to model the infection dynamics and the driving factors of the spread of pandemics. To determine the propagation parameters, a sequential genetic algorithm is used to optimize the parameters of the cellular automaton. This approach is extremely flexible and robust, enabling the estimation of the time trajectories of epidemics. The final observations recorded
predictions extremely congruent with the observed trends, and the framework can be further enhanced with the aid of demographic and socio-economic features. Schimit [35] examined the impact of social isolation on the population of Brazil by simulating the number of casualties due to COVID-19 and the saturation of hospital resources, and analyzed the healthcare system's response to the crisis. The SEIR (Susceptible-Exposed-Infected-Removed) model is formulated in terms of PCA with the help of ordinary differential equations to trace the transmission of COVID-19, and this approach can be used to estimate the impact on society for a myriad of scenarios. According to the analysis, if social distancing and isolation procedures increased from 40 to 50% in Brazil, it would suffice to purchase 150 ICU units and mechanical ventilators per day to keep up with the requirements of the healthcare system. Furthermore, premature relaxations of curfew proved detrimental, as they would inevitably lead to subsequent waves of contamination. Schimit et al. investigate further with a simulation model [36] of the pandemic that takes into due consideration the mutation rate of the SARS-CoV-2 virus and the vaccination rate of as-yet-unaffected individuals, assuming a vaccine with an efficiency of 50%. High mutation rates can result in more deaths and an elongation of the pandemic period. Nevertheless, two positive outcomes were observed: vaccination reduces both the number of mutations and the number of deaths due to COVID-19. A direct relationship between the number of successful mutations and the number of death cases was also established. Similarly, Ghosh et al. use PCA [37] to study the impact of strict measures to curb the spread of the pandemic, using a data-driven approach on an epidemiological model. This provides spatial and temporal insight, as the transitions are based on several factors such as chronology, symptoms, pathogenesis, transmissivity, etc.
The dynamics of the epidemic are studied for various countries using this model. Tunable parameters are linked with practical factors such as population density, testing efficiency, general immunity, health-care facilities, awareness among the public, etc. The study establishes firm grounds for the changes in the distribution (peak position, sharpness of rise, lifetime of the epidemic, asymmetric long-tailed fall, etc.) driven by its parameters.
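The defining feature of the PCA models surveyed in this section, namely that each cell's next state is drawn from a probability distribution conditioned on its neighborhood, can be sketched generically as follows. The SIR-style transition kernel below is a hypothetical example, not the kernel of any specific paper above:

```python
import random

def pca_step(grid, transition, rng=random):
    """One synchronous update of a probabilistic CA: each cell's next state
    is *drawn* from a distribution depending on its current state and its
    Moore neighbourhood, instead of being fixed by a deterministic rule."""
    n = len(grid)
    new = [row[:] for row in grid]
    for x in range(n):
        for y in range(n):
            neigh = [grid[(x + dx) % n][(y + dy) % n]
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0)]
            dist = transition(grid[x][y], neigh)   # {state: probability}
            u, acc = rng.random(), 0.0
            for state, prob in dist.items():
                acc += prob
                if u < acc:
                    new[x][y] = state
                    break
    return new

def sir_transition(state, neigh, beta=0.2, gamma=0.1):
    """A hypothetical SIR-style transition kernel: 0=S, 1=I, 2=R."""
    if state == 0:
        p = 1 - (1 - beta) ** neigh.count(1)
        return {1: p, 0: 1 - p}
    if state == 1:
        return {2: gamma, 1: 1 - gamma}
    return {2: 1.0}
```

Swapping in a different `transition` function reproduces other kernels (e.g. SEIR states, or rates tuned by age demographics) without touching the update loop, which is what makes PCA a flexible basis for the data-driven frameworks discussed above.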
4 Lattice Gas CA

The lattice gas cellular automaton, the precursor of the lattice Boltzmann methods, is a specific form of CA widely used to simulate fluid flows. This cellular automaton consists of a lattice with a certain number of states or particles which travel with certain velocities. The simulation progresses in discrete time steps, with the state of each particle determined by the relevant mathematical formulae. Each update consists of a 'propagation' phase and a 'collision' phase. In the propagation phase, movements are determined by the velocities of the particles. An 'exclusion principle' prevents two particles from travelling along the same link with the same velocity at the same time, although two particles can travel along the same link in opposite directions. Collision handling techniques, in turn, determine what happens to particles after they collide with one another, i.e., reach the
same site at the same time, while maintaining conservation of mass and total momentum. A pseudo-random process can choose a random outcome when several possibilities exist for a given collision configuration. The hexagonal model and the HPP square model (developed by Hardy, Pomeau and de Pazzis) may be used to implement the lattice gas CA. In the HPP model, the lattice takes the form of a two-dimensional square grid, with particles capable of moving to any of the four adjacent grid points which share a common edge; particles cannot move diagonally [38]. The hexagonal model, also known as the FHP model (designed by Uriel Frisch, Brosl Hasslacher and Yves Pomeau), contains hexagonal grids with particles moving along any of the six possible directions, providing more movement options [39]. These models have their own advantages and disadvantages, requiring experts to choose the best-suited model accordingly. For instance, the HPP model does not allow diagonal movement of particles, making it anisotropic and lacking rotational invariance; the hexagonal model, on the other hand, lacks Galilean invariance and suffers from statistical noise. It is also rather difficult to extend these models to 3D problems, which may require more dimensions and a sufficiently symmetrical grid. In [40], a lattice model is designed using cellular automata to observe the spread of the epidemic, with the incubation period also taken into account. The driving algorithm of the cellular automaton is framed in such a way that it can capture the possibility of getting infected despite having recovered from the illness. The Kermack-McKendrick SIR model is combined with cellular automata to produce the desired results and glean insightful observations. The rate of growth of the infection over space is provided, and the growth of correlations as a function of length can be studied.
The observations include a giant growing cluster of infected people, congruent with observations made in real life, whose circular front moves linearly in time. The effects of medication have also been incorporated into the model and agree quite well with previously proposed variants of the Kermack-McKendrick model. Léon [41] uses the lattice-gas cellular automaton model to simulate the spread of COVID-19 in Chilean cities. This model includes a mobility indicator to reflect the actual situation of the Chilean cities with respect to the quarantine system and the normal mobility of the population. The study indicated that the strategy of partial quarantines was inadequate to control the development of the pandemic. It is likely that the sanitary barriers could in fact be crossed by infected individuals, so that an outbreak occurs in a virus-free zone when an individual leaves the outbreak zone. Salcido [42] proposed a 2D lattice gas model to study the COVID-19 spread in the Mexico City Metropolitan Area (MCMA). The influence of people's mobility on the growth of the epidemic, the number of infected individuals, the death rate and the duration of the epidemic was analyzed and predicted using the model. All particles are assumed to represent the susceptible population and were spread randomly over the lattice at 4.5 particles per site, with a small number of infectious people (0.001 particles per site) dispersed at random. Salcido's approach can help formulate policies to control the pandemic.
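The HPP propagation and collision phases described above can be sketched compactly. The grid size, particle density and direction encoding below are arbitrary choices made for illustration:

```python
import numpy as np

# HPP lattice gas: four boolean channels per site, one per direction.
# Channel k and channel (k + 2) % 4 point in opposite directions.
SHIFTS = {0: (0, 1), 1: (-1, 0), 2: (0, -1), 3: (1, 0)}  # E, N, W, S

def collide(cells):
    """Head-on collision rule: a site holding exactly an east-west pair
    scatters it into a north-south pair (and vice versa), conserving both
    particle number and net momentum (which is zero for a head-on pair)."""
    ew = cells[0] & cells[2] & ~cells[1] & ~cells[3]
    ns = cells[1] & cells[3] & ~cells[0] & ~cells[2]
    out = cells.copy()
    out[0] = (cells[0] | ns) & ~ew
    out[2] = (cells[2] | ns) & ~ew
    out[1] = (cells[1] | ew) & ~ns
    out[3] = (cells[3] | ew) & ~ns
    return out

def propagate(cells):
    """Move every particle one site along its channel direction
    (periodic boundaries), respecting the exclusion principle: at most
    one particle per channel per site."""
    out = np.empty_like(cells)
    for k, (dr, dc) in SHIFTS.items():
        out[k] = np.roll(np.roll(cells[k], dr, axis=0), dc, axis=1)
    return out

def step(cells):
    return propagate(collide(cells))
```

Because one boolean channel exists per velocity per site, the exclusion principle is enforced by construction, and both update phases conserve the total particle count.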
5 Fuzzy CA

Generally, a cellular automaton is a discrete model of computation; a fuzzy cellular automaton (fuzzy CA), however, is a continuous cellular automaton. The rules that govern a fuzzy CA are fuzzified versions of those of a normal Boolean CA. A random, smoothed or homogeneous background may be used for the initial fuzziness. Fuzzy behavior usually destroys the Boolean values, as all values become homogeneously or heterogeneously fuzzy. Different fuzzified rules produce a variety of space-time patterns, which can generally be classified as static, periodic, complex or chaotic. A fuzzy CA is a sharper tool specifically suited to detecting the properties associated with the 'chaotic behavior' of patterns, and it can also be used to analyze the complex dynamics of shifting rules. The most important uses of fuzzy CA are to determine spatial distributions and temporal change and to estimate changes in the environment. Sumita et al. [43] employ measurable variables and parameters by means of a crisp cellular automata model. Uncertainty arising from imprecision is called fuzzy uncertainty, and it can be handled with the use of a fuzzy CA. Discrete dynamical systems in which the measurements of parameters are imprecisely defined are modeled by fuzzy difference equations or fuzzy CA. A fuzzy CA is used for the dynamic system representing the spread of the MERS and COVID-19 viruses. In this experiment, a moderately large time period is divided into short, equal intervals of time, and the growth of the number of infected people is modeled by the fuzzy CA in each interval. Alternating between a fuzzy CA and a temporally hybrid fuzzy CA seemed to explain the model better.
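One common way to fuzzify a Boolean rule, assumed here purely for illustration, replaces the XOR in a rule such as Rule 90 by its continuous extension x + y - 2xy, so that the update agrees with the Boolean CA on crisp states and interpolates in between:

```python
def fuzzy_xor(a, b):
    """Continuous extension of XOR: agrees with Boolean XOR on {0, 1}
    and interpolates smoothly for real values in between."""
    return a + b - 2 * a * b

def fuzzy_rule90_step(cells):
    """Fuzzy CA step for fuzzified Rule 90 (next state = left XOR right),
    with periodic boundaries; cell states are reals in [0, 1]."""
    n = len(cells)
    return [fuzzy_xor(cells[(i - 1) % n], cells[(i + 1) % n])
            for i in range(n)]
```

Starting from crisp 0/1 states the fuzzy automaton reproduces the Boolean Rule 90 exactly, while fuzzy initial backgrounds (e.g. all cells at 0.5, which is a fixed point here) reveal the homogenization behavior described above.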
6 Kinetic Monte Carlo CA (KMC)

Monte Carlo integration is a non-deterministic approach: random points are chosen at which the integrand is evaluated. The method is particularly useful for multidimensional definite integrals. This type of cellular automaton is used to simulate the time evolution of naturally occurring processes, with transition rates usually acting as rules. KMC comes in two flavors: rejection-free KMC, where the rates enter a more involved mathematical treatment, and rejection KMC, which is the standard form. The Monte Carlo algorithm, which forms the basis of this kind of CA, solves problems through random input; it is widely used in probability-dependent problems and relies solely on randomness. These CA work best when used to simulate problems pertaining to kinetics, i.e., processes involving motion or movement, such as surface adsorption, diffusion, and material growth. The distribution of inherent susceptibility [30, 44, 45], which largely depends on an individual's immunity, and of external infectivity is often inadequately addressed in models of the development of an infectious disease. Mukherjee, S. et al. have addressed this issue by performing Kinetic Monte Carlo Cellular Automata (KMC-CA) simulations, a significant improvement over the naive SIR model toward a more realistic simulation of the pandemic.
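As an illustration of the rejection-free flavor, a Gillespie-style kinetic Monte Carlo run of a well-mixed SIR model can be sketched as follows (a generic sketch with assumed rates `beta` and `gamma`, not the lattice model of [30, 44, 45]): at each step the waiting time is drawn from the total event rate, and one event (infection or recovery) is chosen with probability proportional to its rate.

```python
import math
import random

def kmc_sir(s, i, r, beta=0.3, gamma=0.1, seed=0):
    """Rejection-free kinetic Monte Carlo (Gillespie) simulation of a
    well-mixed SIR model. Runs until no infectious individuals remain
    and returns the final time and compartment sizes."""
    rng = random.Random(seed)
    n = s + i + r
    t = 0.0
    while i > 0:
        rate_inf = beta * s * i / n   # total rate of S -> I events
        rate_rec = gamma * i          # total rate of I -> R events
        total = rate_inf + rate_rec
        # Exponentially distributed waiting time until the next event.
        t += -math.log(1.0 - rng.random()) / total
        # Pick one event with probability proportional to its rate.
        if rng.random() * total < rate_inf:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
    return t, s, i, r

t_end, s_end, i_end, r_end = kmc_sir(990, 10, 0)
```

Because every step executes exactly one event, no proposed moves are rejected; a lattice KMC-CA replaces the well-mixed infection rate with per-site rates that depend on each cell's neighborhood.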
Simulation, Perception, and Prediction of the Spread of COVID - 19
7 Future Work

Based on the discussions in this paper, more strategies can be explored to assist better decision-making. It is necessary to study the spread of infection through data analysis in order to frame feasible policies. Including new parameters will allow various scenarios to be analyzed for different diseases, which will aid decision-makers in determining the critical factors for disease control. It has become apparent over the last few months that the spread of the COVID-19 virus depends on several factors that are difficult to contain. To increase a model's accuracy, we should consider not only mitigation measures but also medical resources such as vaccines and the number of available beds to accommodate infected patients. Further factors such as ventilation and indoor environments could allow better assessment of virus transmission.
8 Conclusion

Some of the papers discussed above address the measures used to regulate the spread of COVID-19. The impact of various social restrictions, herd immunity, the driving factors of the pandemic, and related questions have been analyzed thoroughly to build a better picture of how the pandemic affects human beings across different parts of the world, thereby enabling worthwhile solutions to this prevalent issue. The pandemic caused by the SARS-CoV-2 virus and its variants has been modeled and simulated, and its dynamics elucidated through further studies. Asymptomatic individuals have also been considered in the modeling. Trends across various regions and suggestions for effective control measures have been provided as a result of these studies. Several factors leading to uncertainty have either been eliminated or made explicit, and several future possibilities have been discussed. Although the papers covered most aspects, certain limitations remain that could be addressed in future work and optimizations: the limited availability of data, the non-homogeneous nature of the pandemic, and factors that vary with geographical constraints, among others. Once these challenges are overcome, many more options and tactics will open up for solving issues pertaining to the pandemic.
References

1. Andersen, K.G., et al.: The proximal origin of SARS-CoV-2. Nat. Med. 26(4), 450–452 (2020)
2. Wu, Y.-C., Chen, C.-S., Chan, Y.-J.: The outbreak of COVID-19: an overview. J. Chin. Med. Assoc. 83(3), 217 (2020)
3. https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html
4. Lauer, S.A., et al.: The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann. Intern. Med. 172(9), 577–582 (2020)
5. Bin, S., Sun, G., Chen, C.-C.: Spread of infectious disease modeling and analysis of different factors on spread of infectious disease based on cellular automata. Int. J. Environ. Res. Public Health 16(23), 4683 (2019)
6. Ragala, R., Guntur, B.K.: Recursive block LU decomposition based ELM in apache spark. J. Intell. Fuzzy Syst. 39, 8205–8215 (2020)
7. Ragala, R., et al.: Rank based pseudoinverse computation in extreme learning machine for large datasets. arXiv preprint arXiv:2011.02436 (2020)
8. Zhou, Y., et al.: The global COVID-19 pandemic at a crossroads: relevant countermeasures and ways ahead. J. Thorac. Dis. 12(10), 5739 (2020)
9. Mondal, S., et al.: Mathematical modeling and cellular automata simulation of infectious disease dynamics: applications to the understanding of herd immunity. J. Chem. Phys. 153(11), 114119 (2020)
10. Athithan, S., Shukla, V.P., Biradar, S.R.: Dynamic cellular automata based epidemic spread model for population in patches with movement. J. Comput. Environ. Sci. 2014, 8 (2014). Article ID 518053
11. Ilnytskyi, J., Pikuta, P., Ilnytskyi, H.: Stationary states and spatial patterning in the cellular automaton SEIS epidemiology model. Phys. A 509, 241–255 (2018)
12. Dascalu, M., Stefan, G., Zafiu, A., Plavitu, A.: Applications of multilevel cellular automata in epidemiology. Stevens Point, Wisconsin, USA, pp. 439–444 (2011)
13. Maji, P., Shaw, C., Ganguly, N., Sikdar, B.K., Chaudhuri, P.P.: Theory and application of cellular automata for pattern classification. Fundam. Inf. 58(3–4), 321–354 (2003)
14. Chen, Q., Mynett, A., Minns, A.: Application of cellular automata to modelling competitive growths of two underwater species Chara aspera and Potamogeton pectinatus in Lake Veluwe. Ecol. Model. 147, 253–265 (2002)
15. Skoneczny, S.: Cellular-automata based modeling of heterogeneous biofilm growth for microbiological processes with various kinetic models. Chem. Process. Eng. 40(2), 145–155 (2019)
16. Sun, G.-Q., Jin, Z., Song, L.-P., Chakraborty, A., Li, B.-L.: Phase transition in spatial epidemics using cellular automata with noise. Ecol. Res. 26(2), 333–340 (2010). https://doi.org/10.1007/s11284-010-0789-9
17. Pereira, F.M., Schimit, P.H.: Dengue fever spreading based on probabilistic cellular automata with two lattices. Physica A: Stat. Mech. Appl. 499, 75–87 (2018). https://doi.org/10.1016/j.physa.2018.01.029
18. Kleyko, D., Khan, S., Osipov, E., Yong, S.: Modality classification of medical images with distributed representations based on cellular automata reservoir computing. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 1053–1056 (2017). https://doi.org/10.1109/ISBI.2017.7950697
19. Wolfram, S.: Statistical mechanics of cellular automata. Rev. Mod. Phys. 55(3), 601 (1983)
20. Sree, P.K., Smt, S.S.S.N., Usha Devi, N.: COVID-19 hotspot trend prediction using hybrid cellular automata in India. Eng. Sci. Technol. 2(1), 54–60 (2020)
21. Neumann, J.V.: The Theory of Self-Reproducing Automata. In: Burks, A.W. (ed.) University of Illinois Press, Urbana and London (1966)
22. Lugo, I., Alatriste Contreras, M.: Intervention strategies with 2D cellular automata for testing SARS-CoV-2 and reopening the economy. Sci. Rep. 12(1), 13481 (2020). https://doi.org/10.21203/rs.3.rs-40739/v1
23. Wang, S., Fang, H., Ma, Z., Wang, X.: Forecasting the 2019-ncov epidemic in Wuhan by SEIR and cellular automata model. In: Journal of Physics: Conference Series (2020)
24. Pokkuluri, K.S., Devi Nedunuri, S.U.: A novel cellular automata classifier for COVID-19 prediction. J. Health Sci. 10(1), 34–38 (2020)
25. Medrek, M., Pastuszak, Z.: Numerical simulation of the novel coronavirus spreading. Expert Syst. Appl. 166, 114109 (2021)
26. Dai, J., Zhai, C., Ai, J., Ma, J., Wang, J., Sun, W.: Modeling the spread of epidemics based on cellular automata. Processes 9(1), 55 (2021). https://doi.org/10.3390/pr9010055
27. Vyklyuk, Y., et al.: Modeling and analysis of different scenarios for the spread of COVID-19 by using the modified multi-agent systems - evidence from the selected countries. Results Phys. 20, 103662 (2021). https://doi.org/10.1016/j.rinp.2020.103662
28. Xiao, M., Zhan, Q., Li, Y.: Research on combating epidemics based on differential equations and cellular automata. In: Journal of Physics: Conference Series, vol. 1865, no. 4. IOP Publishing (2021)
29. Munshi, J., et al.: Spatiotemporal dynamics in demography-sensitive disease transmission: COVID-19 spread in NY as a case study. arXiv: Populations and Evolution (2020)
30. Mukherjee, S., Mondal, S., Bagchi, B.: Dynamical theory and cellular automata simulations of pandemic spread: understanding different temporal patterns of infections (2020)
31. Monteiro, L.H.A., et al.: On the spread of SARS-CoV-2 under quarantine: a study based on probabilistic cellular automaton. Ecol. Complex. 44, 100879 (2020)
32. Gwizdałła, T.: Viral disease spreading in grouped population. Comput. Methods Programs Biomed. 197, 105715 (2020). https://doi.org/10.1016/j.cmpb.2020.105715
33. Lu, J.: A Spatial Markov Chain Cellular Automata Model for the Spread of the COVID-19 virus: Including parameter estimation (2020)
34. Ghosh, S., Bhattacharya, S.: A data-driven understanding of COVID-19 dynamics using sequential genetic algorithm based probabilistic cellular automata. Appl. Soft Comput. 96, 106692 (2020)
35. Schimit, P.H.T.: A model based on cellular automata to estimate the social isolation impact on COVID-19 spreading in Brazil. Comput. Methods Programs Biomed. 200, 105832 (2021)
36. Schimit, P.: An Epidemiological Model to Discuss the Mutation of the Virus SARS-CoV-2 and the Vaccination Rate (10 March 2021)
37. Ghosh, S., Bhattacharya, S.: Computational model on COVID-19 pandemic using probabilistic cellular automata. SN Comput. Sci. 2, 230 (2021)
38. Wikipedia contributors: HPP model. In: Wikipedia, The Free Encyclopedia (22 January 2021). Accessed 14 Sep 2021
39. Wikipedia contributors: Lattice gas automaton. In: Wikipedia, The Free Encyclopedia (22 July 2021). Accessed 14 Sep 2021
40. Datta, A., Acharyya, M.: Modelling the Spread of an Epidemic in Presence of Vaccination using Cellular Automata (2021)
41. León, A.: Study of the effectiveness of partial quarantines applied to control the spread of the Covid-19 virus. medRxiv (2021)
42. Salcido, A.: A lattice gas model for infection spreading: application to the COVID-19 pandemic in the Mexico City metropolitan area. Results Phys. 20, 103758 (2021)
43. Basu, S., Ghosh, S.: Fuzzy cellular automata model for discrete dynamical system representing spread of MERS and COVID-19 virus (2020)
44. Mukherjee, S., Mondal, S., Bagchi, B.: Origin of multiple infection waves in a pandemic: effects of inherent susceptibility and external infectivity distributions (2020)
45. Mukherjee, S., et al.: Persistence of a pandemic in the presence of susceptibility and infectivity distributions in a population: mathematical model. medRxiv (2021)
Value Iteration Residual Network with Self-attention

Jinyu Cai(B), Jialong Li, Zhenyu Mao, and Kenji Tei

Waseda University, Tokyo, Japan
[email protected], [email protected]
Abstract. The Value Iteration Network (VIN) is a neural network widely used in path-finding reinforcement learning problems. The planning module in VIN enables the network to understand the nature of a problem, giving the network an impressive generalization ability. However, reinforcement learning (RL) with VIN cannot guarantee efficient training because of the network depth and the max-pooling operation. A great network depth makes it harder for the network to learn from samples when using gradient descent algorithms, and the max-pooling operation may make negative rewards more difficult to learn due to overestimation. This paper proposes a new neural network, the Value Iteration Residual Network (VIRN) with Self-Attention, which uses a unique spatial self-attention module and aggressive iteration to solve the above-mentioned problems. A preliminary evaluation using Mr. Pac-Man demonstrated that VIRN effectively improved training efficiency compared with VIN.
Keywords: Deep Reinforcement Learning · Value Iteration · Self-Attention · Shortcut Connection

1 Introduction
RL is an increasingly popular group of machine learning algorithms [1], in which an RL agent autonomously learns an optimal policy by searching the state-action space and interacting with the environment [2]. Most RL algorithms with neural networks (i.e., deep reinforcement learning) use reactive networks, such as the Multilayer Perceptron (MLP), Convolutional Neural Networks (CNN) [3], and Long Short-Term Memory (LSTM) [4]. For these networks, it is difficult to say whether they understand a behavior's goal-directed nature (e.g., avoiding obstacles and reaching the destination in path-finding problems) [5]. As a result, they may not guarantee good performance when the environment changes (e.g., a different map is given in a path-finding problem). The Value Iteration Network (VIN) [5] is a neural network first proposed to help solve path-finding problems. The key component in VIN is the Value Iteration (VI) module, which helps the network understand the nature of a problem. The VI module first calculates how each action affects the environment
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 16–24, 2023. https://doi.org/10.1007/978-3-031-35501-1_2
using convolutional iterative computation and a special max-pooling operation, and it then selects the optimal action as output. The VI module enables the network to understand the problem and learn how to plan [6]; thus, the network performs well even in dynamic environments. However, training efficiency in RL with VIN is not guaranteed, for two reasons:
– Current studies on VIN often deal with a large input size but use a small kernel for convolution [5, 7, 8], which leads to a great network depth for the VI module. Such depth makes it more difficult for the network to learn from samples when applying gradient descent algorithms [9].
– The max-pooling operation in the VI module may cause the impact of negative rewards to spread less easily, as it is easily overwritten by that of far-range positive rewards. The resulting overestimation prevents negative reward information from being learned effectively.
To solve the above-mentioned problems, this paper proposes the Value Iteration Residual Network (VIRN) to increase RL training efficiency. The key ideas of VIRN are to reduce the network depth while limiting the negative impact of depth on training efficiency, and to save the result of each iteration to the final output before it is overwritten. The main contributions of this paper are as follows:
– This paper proposes aggressive iteration (i.e., using larger convolution kernels with fewer iterations) to reduce the network depth.
– This paper proposes a self-attention module that relates the output of each iteration directly to the final output, which effectively alleviates the overwriting problem and reduces the negative impact of network depth.
The rest of this paper is organized as follows. Section 2 introduces background techniques. Section 3 introduces related work, and Sect. 4 explains VIRN. Section 5 presents a preliminary evaluation. Section 6 concludes this paper and introduces future work.
2 Background

2.1 Reinforcement Learning
RL is a branch of machine learning that copes with learning policies for sequential decision-making problems. An RL problem is typically formulated in terms of computing the optimal policies for a Markov Decision Process (MDP) [10]. Formally, RL can be represented as an MDP consisting of a tuple $[S, A, R, P]$, with $S$ being the set of states $s$, $A$ the set of actions $a$, $R$ a reward function $R(s, a)$, and $P$ the transition probabilities $P(s' \mid s, a)$. In RL, the agent's objective is to learn a policy $\pi$ that maximizes the cumulative expected reward. A standard method for finding $\pi$ is to iteratively compute the value function:

$$V_{k+1}(s) = \max_a Q_k(s, a) \;\; \forall s, \quad \text{where} \quad Q_k(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) V_k(s') \tag{1}$$
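Equation (1) can be implemented directly for a small tabular MDP; the following sketch (generic, not tied to any particular environment) iterates the update until convergence:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration implementing Eq. (1).

    P: transition tensor, P[a, s, t] = P(s' = t | s, a)
    R: reward matrix, R[s, a]
    Returns the converged value function V and the greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_t P(t | s, a) * V(t)
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)          # V_{k+1}(s) = max_a Q_k(s, a)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Two-state example: action 1 moves toward the rewarding state 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # action 0: stay
              [[0.0, 1.0], [0.0, 1.0]]])   # action 1: go to state 1
R = np.array([[0.0, 0.0],                  # R[s, a]: reward 1 in state 1
              [1.0, 1.0]])
V, policy = value_iteration(P, R)
```

With γ = 0.9 the fixed point is V = (9, 10) and the greedy policy in state 0 is action 1, matching the closed-form solution of Eq. (1) for this toy MDP.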
Fig. 1. Overview of Value Iteration Network
where $Q_k(s, a)$ is the value function of action $a$ in state $s$ at iteration $k$, and $\gamma \in (0, 1]$ is the discount rate; the update is repeated until convergence.

2.2 Value Iteration Network
VIN is a neural network with a planning module that helps solve the path-finding problem. The core idea of VIN, as shown in Fig. 1, is to use a neural network to express the value iteration algorithm [1]. Taking a path-finding problem as an example, the input of the network is the observed map state. VIN first passes it through two layers of CNN to generate a reward map. Then the reward map, together with an initialized value map of the same size, is input into the VI module. During value iteration, the reward map and the value map are input into the convolutional layer, which calculates an action value map (Q map) as output. The Q map is then maximized over the action dimension, and the value map and the reward map are updated. This procedure repeats until the value map converges. Finally, the fully connected layer uses the values at the agent's position on the Q map to calculate action values as the final outputs.
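A single iteration of the VI module can be sketched in a few lines of numpy (the shapes and random kernels here are illustrative placeholders; in VIN the kernel weights are learned):

```python
import numpy as np

def vi_iteration(reward, value, kernels):
    """One VI-module iteration: convolve [reward; value] with one
    kernel per candidate action to get a Q map, then take the max
    over the action channel to update the value map.

    reward, value: (H, W) maps; kernels: (A, 2, k, k) weights.
    """
    A, _, k, _ = kernels.shape
    pad = k // 2
    stacked = np.stack([reward, value])                       # (2, H, W)
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = reward.shape
    q = np.zeros((A, H, W))
    for a in range(A):                 # one convolution per action
        for i in range(H):
            for j in range(W):
                patch = padded[:, i:i + k, j:j + k]
                q[a, i, j] = np.sum(kernels[a] * patch)
    return q, q.max(axis=0)            # Q map and updated value map

rng = np.random.default_rng(0)
reward = rng.normal(size=(8, 8))
kernels = rng.normal(size=(4, 2, 3, 3))   # 4 actions, 3x3 kernels
q, value = vi_iteration(reward, np.zeros((8, 8)), kernels)
```

In the actual network the explicit loops are a single batched convolution; the sketch only makes the convolve-then-max structure of the module visible.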
3 Related Work
VIN on Multiple Levels of Abstraction [11] deals with large-size and high-dimensional domains by sampling the large domains into multiple levels of features. The information loss caused by the decreased resolution is compensated for by increasing the number of feature representations; thus, the capability of VIN is improved. VIN on Multiple Levels of Abstraction manages to deal with
large-size domains through a different approach, but the overestimation problem remains. The Generalized Value Iteration Network (GVIN) was proposed to deal with irregular spatial graph inputs [12]. GVIN replaces the traditional convolution operator with a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. Furthermore, GVIN uses a novel episodic Q-learning, which is more stable than traditional n-step Q-learning. GVIN aims to increase training stability, while the proposal in this paper focuses on increasing training efficiency. Value Iteration Networks with Double Estimator for Planetary Rover Path Planning (dVIN) [8] was also proposed to solve the overestimation problem, using a double estimator method. dVIN uses a double convolutional layer instead of a single one in the VI module and takes the weighted mean of the two values to approximate the maximum expected value of the next iteration. This idea is similar to [13], which has been demonstrated to reduce overestimation and yield more stability. dVIN alleviates the overestimation problem to some extent, but the error accumulation over iterations remains.
4 Value Iteration Residual Network

4.1 Overview
As shown in Fig. 2, compared with VIN, there is an additional self-attention module in VIRN. The overall workflow of VIRN is as follows: pixel information is first input into the CNN. The reward map returned by the CNN and an initialized value map are conveyed to the VI module. The VI module returns the value map after each iteration to the self-attention module, which generates an integrated value map. Finally, the final convolutional layer outputs the action values on the basis of the agent's coordinates.

4.2 VI Module
The core idea of the VI module is to use large convolution kernels to reduce the number of iterations and the network depth. The convolution kernels used in the VI module of the original VIN are 3 × 3 in size. Using small convolution kernels means that more iterations and deeper networks are needed to make the value function converge. VIRN actively uses larger convolution kernels in accordance with the input map size. A larger convolution kernel also makes the computation more parallel, further improving training efficiency. In addition, the number of convolution kernels used in VIRN differs from VIN. In the original VI module, the number of kernels q is larger than the size of the action space; thus, decoding is required to convert q-dimensional vectors into action values. VIRN directly uses the size of the action space as the number of convolutional kernels in the VI module, skipping the decoding procedure and thus reducing encoding-decoding loss.
In VIRN, the value map after each iteration is also returned to the self-attention module. Since each iteration may cause an overwriting problem, more information might be lost as the number of iterations increases. Returning the result of each iteration to the self-attention module provides the network with more useful information, enabling the self-attention module to generate an integrated value map.

4.3 Self-attention Module
In VIRN, the future is not always taken into account when planning. For example, when we are playing a game and face a trap that will lead to death, we always back off and look for another way out, no matter how attractive the reward behind the trap might be.

$$V = \sum_{i \in k} \frac{\exp(\alpha_i)}{\sum_{j \in k} \exp(\alpha_j)} \cdot v_i \tag{2}$$

$$\alpha_i = r w_r \cdot v_i w_v$$

The reward map returned from the CNN goes through a feature extraction process and is input into the self-attention module together with the output of the VI module. The self-attention module first multiplies the result of each iteration of the VI module and the reward-map feature by the weight matrices $w_v$ and $w_r$, respectively. The dot product of the two projections is the correlation $\alpha_i$. A softmax function then turns the $k$ correlations into a probability distribution. Finally, the integrated value map is obtained as the weighted average of the $k$ value maps (Eq. 2).
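The combination in Eq. (2) is a softmax over per-iteration correlation scores. A simplified sketch follows, in which the maps are flattened to vectors and the weight matrices are random placeholders for the learned parameters $w_v$ and $w_r$:

```python
import numpy as np

def integrate_value_maps(value_maps, reward_feat, w_v, w_r):
    """Combine the k per-iteration value maps into one integrated
    map via softmax-weighted averaging, as in Eq. (2).

    value_maps: (k, d) flattened value maps v_i
    reward_feat: (d,) flattened reward feature r
    w_v, w_r: (d, d) projection matrices (learned in VIRN).
    """
    # alpha_i = (r w_r) . (v_i w_v): correlation of each iteration
    # result with the reward feature.
    scores = (value_maps @ w_v) @ (reward_feat @ w_r)
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over iterations
    return weights @ value_maps                       # (d,) integrated map

rng = np.random.default_rng(1)
k, d = 15, 64
v = rng.normal(size=(k, d))
r = rng.normal(size=d)
out = integrate_value_maps(v, r, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

Because the softmax weights are positive and sum to one, the integrated map is a convex combination of the per-iteration value maps, which is what lets early iterations survive later overwriting.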
Fig. 2. Overview of Value Iteration Residual Network with Self-attention
By training the weights $w_v$ and $w_r$, the self-attention module enables the network to choose the result of each iteration autonomously. This procedure is similar to the residual connection [9]; it likewise alleviates exploding and vanishing gradients, making it easier for the network to learn from samples.
5 Preliminary Evaluation
In this section, the experiment settings are first explained, and then the results are shown with a brief discussion. A preliminary evaluation was conducted using Mr. Pac-Man under the research question of how well VIRN increases training efficiency compared with VIN. For systematic evaluation, the learning speed is considered, shown as the change in average scores in accordance with the number of training episodes and the training time (Fig. 4).

5.1 Experiment Settings
The Mr. Pac-Man game is very similar to the 2D path-finding problem but with a more complex environment. For example, the enemies' moving paths and the scoring criteria change after specific events in the game. The performance difference in handling complex environments between different networks is therefore more apparent. To reduce training costs, the map used for the experiment had a size of 84*80*1 instead of the original 210*180*3 of the game. The enemies' movement information is recorded every four frames, so the input data had a size of 84*80*4.
Fig. 3. Mr. Pac-Man game image
For comparison, the evaluation also featured IVIN, which has the same structure as VIN except for the iteration in the VI module. VIRN
and IVIN used an 11 × 11 convolution kernel for 15 iterations, while VIN used a 3 × 3 convolution kernel for 100 iterations.

$$\text{target}Q = r + \gamma \max_{a'} Q(s', a'), \qquad \text{loss} = \mathbb{E}\big[(\text{target}Q - Q(s_t, a_t))^2\big] \tag{3}$$
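Equation (3) can be computed for a mini-batch as follows (a generic numpy sketch with placeholder Q outputs; the batch size and γ here are illustrative):

```python
import numpy as np

def dqn_loss(q_values, target_q_values, actions, rewards, dones, gamma=0.99):
    """Mean squared TD error of Eq. (3) over a mini-batch.

    q_values:        (B, A) Q(s_t, .) from the online network
    target_q_values: (B, A) Q(s_{t+1}, .) from the target network
    actions, rewards, dones: (B,) batch transitions
    """
    max_next = target_q_values.max(axis=1)
    # No bootstrapping past the end of an episode (done = 1).
    target = rewards + gamma * max_next * (1.0 - dones)
    chosen = q_values[np.arange(len(actions)), actions]
    return np.mean((target - chosen) ** 2)

q = np.array([[1.0, 2.0], [0.5, 0.0]])
tq = np.array([[1.0, 3.0], [2.0, 1.0]])
loss = dqn_loss(q, tq, actions=np.array([0, 1]),
                rewards=np.array([1.0, 0.0]),
                dones=np.array([0.0, 1.0]))
```

In training, the gradient of this loss with respect to the online network's parameters drives the update, while the target network is held fixed between synchronizations.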
Deep Q-learning was used as the baseline training method, with a learning rate of 0.0001, a reward decay of 0.99, an epsilon-greedy rate of 0.05, and the target network updated every episode. Mini-batch optimization was also used, with a batch size of 32. The loss function was defined as in Eq. 3.

5.2 Experiment Result
Figure 4a shows that VIRN had obvious advantages compared with VIN and IVIN. The average score of VIN peaked at about 680 points around 5,000 episodes, after which the scoring curve converged. IVIN outperformed VIN from the early stage of training and saved about 80% of the training episodes needed to reach the peak of the VIN curve. VIRN had the highest training efficiency from the early stage of training, and its performance surpassed the highest score of IVIN at about 2,000 episodes, which again saved about 80% of the training episodes. The highest average scores of VIRN, IVIN, and VIN within 15,000 episodes were around 1700, 1100, and 680 points, respectively. While the curve of VIRN still showed an increasing trend at the end of the experiment (15,000 episodes), the curves of VIN and IVIN had already converged. In the time-score comparison shown in Fig. 4b, VIRN and IVIN also had advantages over VIN. In particular, IVIN had an extremely fast computing speed: it took IVIN only 100,000 s to complete 15,000 episodes of training. Although the average scores of both VIRN and IVIN increased rapidly in the first 50,000 s, the total training time of VIRN was much longer than that of IVIN. VIRN is not the fastest method, but its performance within the same training time is always the best.

5.3 Discussion
There are two main improvements in VIRN compared with VIN: aggressive iteration and the self-attention module. As for the 84*80*4 input, VIN was too deep due to the higher number of iterations, leading to a lower training efficiency. Also, because of the depth, some weights in the network could not be trained effectively; thus, the final performance of the network (i.e., the highest score) was worse. Comparing VIN with IVIN, it can be concluded that aggressive iteration can effectively reduce the depth of the network and improve the training efficiency. In addition, aggressive iteration leads to more parallel computing and significantly reduces computation time. During the early stage of training, the performance of VIRN was the best (note that the initial performances of the three methods were all around 300). This is attributed to the residual connection-like structure of the self-attention
Fig. 4. Comparison of VIRN, IVIN, and VIN in Mr. Pac-Man
module in VIRN enabling the network to be trained more efficiently. On the other hand, the self-attention module improved the network's ability to handle complex environments, which led to better performance in this experiment. Although the self-attention module increased the computation time, VIRN still performed better than VIN. Though in this experiment VIRN achieved a significant improvement in training efficiency compared with VIN, we can reasonably infer that if the input size exceeds a certain level, some of the weights in VIRN cannot be trained effectively, and the performance of the network will degrade. Last but not least, the experiment settings used in the evaluation were fixed, so more experiments with different settings and environments are required to further evaluate the performance of VIRN.
6 Conclusion
This paper proposed a new neural network, VIRN, to increase training efficiency in RL. Compared with VIN, VIRN has a self-attention module and more aggressive iteration, alleviating the overwriting problem and effectively reducing the network depth. A preliminary evaluation showed that VIRN significantly improved training efficiency and outperformed VIN and IVIN in Mr. Pac-Man. However, VIRN needs to be improved in the following aspects. The first is to shorten the computation time so that it grows linearly rather than exponentially with the input size. Second, to better evaluate VIRN, more experiments with different settings and environments are required.

Acknowledgement. This work was partially supported by JSPS KAKENHI, JSPS Research Fellowships for Young Scientists.
References

1. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34, 26–38 (2017)
2. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. A Bradford Book. The MIT Press, Cambridge, MA, USA (2018)
3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
4. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
5. Tamar, A., Wu, Y., Thomas, G., Levine, S., Abbeel, P.: Value iteration networks. In: Lee, D., Sugiyama, M., et al. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016)
6. Bertsekas, D.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific (2012)
7. Shen, J., Zhuo, H.H., Xu, J., Zhong, B., Pan, S.J.: Transfer value iteration networks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 5676–5683. AAAI Press (2020)
8. Jin, X., Lan, W., Wang, T., Yu, P.: Value iteration networks with double estimator for planetary rover path planning. Sensors (Basel, Switzerland) 21 (2021)
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
10. Bellman, R.: A Markovian decision process. J. Math. Mech. 6(5), 679–684 (1957)
11. Schleich, D., Klamt, T., Behnke, S.: Value iteration networks on multiple levels of abstraction. Robotics: Science and Systems XV, abs/1905.11068 (2019)
12. Niu, S., Chen, S., Guo, H., et al.: Generalized value iteration networks: life beyond lattices. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
13. Hasselt, H.: Double Q-learning. In: Advances in Neural Information Processing Systems, vol. 23 (2010)
Dental Treatment Type Detection in Panoramic X-Rays Using Deep Learning

Nripendra Kumar Singh1, Mohammad Faisal2, Shamimul Hasan2, Gaurav Goshwami3, and Khalid Raza1(B)

1 Department of Computer Science, Jamia Millia Islamia, New Delhi, India
[email protected], [email protected]
2 Faculty of Dentistry, Jamia Millia Islamia, New Delhi, India
3 Deep Learning Research Team, Synergy Labs, Gurugram, India
Abstract. Detection of treatment types in dental panoramic radiographs is still an open problem, as teeth are arbitrarily oriented and usually closely packed. Most current oriented object detection techniques use two-stage anchor-based detectors. However, in anchor-based detectors the positive and negative anchor boxes tend to be severely imbalanced. In this work, we optimized a single-stage anchor-free deep learning model to detect and classify teeth with or without treatment. We aim to accurately detect dental restorations, root canal treatment (RCT), and teeth without treatment in full scans of dental panoramic radiographs. We trained our model on 500 images and tested it on 93 images from a dataset of 593 dental panoramic x-rays. The proposed method detects dental treatment overall with an average precision (AP) of 85%. The results of this study suggest that RCT was recognized and predicted with the highest accuracy, at a 91% AP score.

Keywords: Deep Learning · Dental Treatment Detection · Panoramic X-Rays
1 Introduction

Over the past ten years, deep learning-based approaches have dominated practically all computational intelligence tasks [1, 2]. With the gradual rise in dental publications over the past several years, dentistry has gained a lot of attention, from the classification of basic oral objects to more complex tasks such as predicting the course of oral cancer, and we can observe how AI is changing the overall working environment of dentists [3, 4]. Dental professionals take panoramic radiographs for multiple diagnostic reasons before, during, and after procedures as part of routine practice. Recently, deep learning and AI have achieved phenomenal success in different domains, including medical and dental radiography. All this advancement enriches the original computer vision tasks of image segmentation, object detection, and classification. In the current scenario, fully convolutional network (FCN) based deep learning architectures have become the most popular approach to achieving better accuracy. In particular, variants of R-CNN [5], YOLO [6], and SSD [7] for the object detection task,
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 25–33, 2023. https://doi.org/10.1007/978-3-031-35501-1_3
N. K. Singh et al.
U-Net [8] and Mask R-CNN [9] for the segmentation task are advanced versions of the FCN architecture; they work well even with small training datasets such as medical imaging data, including dental radiographs. However, these one- and two-stage anchor-based object detectors suffer from serious imbalance problems. In dental panoramic x-rays, tooth objects are densely packed and arbitrarily positioned, which makes correct detection with anchor-based detectors a challenging task. To address this problem, we test an anchor-free deep learning architecture that follows the oriented object detection approach [10]. Much of the literature on analyzing dental images reports that deep learning has been successfully applied across various modalities of the dental imaging domain for dental treatment and diagnosis [11–13]. Yeshua et al. (2019) [14] reported an initial study on the detection and classification of dental restorations on panoramic radiographs. They classified nine classes of dental restorations and treatments plus three negative classes to handle false positive cases. To achieve the final classification, they first segmented the tooth RoI with a local adaptive threshold and characterized each restoration using 20 features, which were then classified with a weighted k-NN model. The method was evaluated on 63 dental radiographs, where 316 objects related to dental fillings were detected with 92% accuracy. In follow-up work [15] they increased the dataset to 83 dental radiographs with 738 dental restorations. In that study, 11 categories of dental restoration were classified using a support vector machine (SVM), achieving 93.6% accuracy. Both studies [14, 15] were tested on small dental datasets and employed basic machine learning algorithms for the classification of dental restorations.
However, the first deep learning-based approach for the detection of dental restorations and treatments in panoramic images was presented in [16]; the authors evaluated DenseNet, GoogleNet, and ResNet with 3013 tooth images extracted from dental panoramic images. Among the three CNN models, ResNet performed best in most categories and achieved an overall accuracy of 94%. Yüksel et al. (2021) put forward a deep learning framework, DENTECT [17], to identify five different types of dental restoration and treatment. This work applied a true object detection approach to the dental panoramic image in stages: first, the panoramic image is segmented into four quadrants, and then each segmented image is input to the detection model. The method employed 1005 full-scan panoramic images and scored 59% average precision on treatment detection. The downside of this approach is that training two deep learning models simultaneously increases computation cost and time. In this study, we employ a single-stage anchor-free object detection approach to recognize teeth with treatment (restoration and RCT) and without treatment in dental panoramic radiographs. We use a single-stage CNN architecture which can detect dental restoration work, root canal treatment, and sound teeth (no treatment). The main contributions of this proposed work are as follows: • To the best of our knowledge, we are the first to introduce a single-stage anchor-free approach for the detection of dental treatment in panoramic images. • For training the model, we annotate each tooth in the panoramic image with a polygon bounding box that follows the tooth anatomy; all previous approaches used rectangular boxes.
Dental Treatment Type Detection in Panoramic X-Rays
• We can detect a tooth with even a small restoration on the surface against a sound tooth, without any pre-segmentation approach. The proposed work is organized as follows: Section 2 covers the dataset preparation, the deep learning architecture, implementation, and evaluation criteria of the proposed work. Section 3 presents the results of the experiment together with a comparison to state-of-the-art results. Section 4 concludes the work and outlines directions for future related projects.
2 Material and Methods 2.1 Dataset and Annotation To solve the issues highlighted in the introduction, multiple studies have put forward methods to recognize tooth objects and treatments on panoramic radiographs [14, 16, 18]. However, the detection of treatment in dental radiographs is still an open problem in the field of dentistry. A total of 1500 dental panoramic x-ray images were obtained from the open-source repository created by [19]. All 1500 images were categorized into 10 groups based on the presence of 32 teeth or fewer, restorations, and prosthetics in the x-ray image. The original dataset contained mask annotations of the corresponding teeth, but such information is not suitable for the detection of dental treatment on individual teeth. In this work, we consider only those images where at least one tooth restoration or root canal treatment is present. Furthermore, we excluded images with distortions, superimpositions, and crowded dentition, which may belong to individuals younger than 12 years. Consequently, this experiment included 593 images from the original dataset. Using the open-source annotation software VIA [20], three generic tooth classes with associated therapies were manually marked by specialists on the panoramic radiographs. The restoration class refers to dental treatment with a dental filling or single-crown procedure, the RCT class refers to root canal treatment that may be followed by a restoration and crown procedure, and the third class covers teeth with no treatment. Two dental professionals with more than 10 years of clinical experience completed all the annotations. 2.2 Deep Learning Architecture We optimized the box boundary-aware vectors (BBAVectors) [21] architecture, based on oriented bounding box (OBB) detection, to detect teeth with or without treatment in dental panoramic radiographs.
It is also known as a single-stage anchor-free object detector. BBAVectors learns rotational bounding boxes distributed arbitrarily in any quadrant of the Cartesian plane without directly regressing the box width w, height h, and orientation angle θ. Other state-of-the-art anchor-free solutions, such as keypoint-based detectors [22, 23], predict the box information using corner points and the center point of the objects. This approach became a pioneer in horizontal bounding box detection and is extensively utilized for
facial landmark detection and pose estimation. Even though keypoint-based object detectors outperform anchor-based approaches (R-CNN, SSD, and YOLO) in terms of speed and accuracy, they are rarely employed for oriented object detection tasks. The proposed architecture (presented in Fig. 1) is based on a U-shaped network mechanism [8]. BBAVectors uses ResNet101 [24] as the backbone to down-sample the input image and learn specific features. Skip connections are utilized during the up-sampling process to merge high-level information with low-level specific features. The output feature map is refined with a 3 × 3 convolutional layer following the shallow layer's concatenation with the revised feature map. At the latent layers, batch normalization and ReLU activation are employed. The final output feature map X ∈ R^(C × H/s × W/s) is generated from the input image I ∈ R^(3 × H × W), where s is the scale factor, C = 256, H
Fig. 1. Illustration of the BBAVectors-based oriented bounding box (OBB) detection architecture. The input image is resized to 900 × 900 before being sent to the network. Skip connections are used in the up-sampling process to merge feature maps. The heatmap P, offset map O, box parameter map B, and rotation map α are the four maps that make up the architecture's output. Bounding boxes in red, blue, and green indicate the decoded OBBs.
and W are the image height and width, respectively. The output of the network is transformed into four branches, each followed by convolution layers with 3 × 3 kernels. 2.3 Implementation Details In the training and evaluation stages, we resize the input images from 1127 × 1991 to 900 × 900 pixels through bilinear interpolation to maintain the image properties, and produce an output resolution matching the original image size. The backbone used in the experiment was pre-trained on the ImageNet dataset. The default PyTorch parameters are used to initialize the additional weights. In the training phase, we apply the usual data augmentations to the images, including random flipping and arbitrary slicing within the scale range. To optimize the overall loss L = l_h + l_o + l_b + l_α, we employ Adam with an initial learning rate of 1.25 × 10^-5. We trained the network for about 150 epochs with a batch size of 4 on a single NVIDIA Quadro P5000 GPU. Our entire network is implemented in PyTorch with Python 3.4. Both training and evaluation were performed on a 32-core Dell workstation with an Intel Xeon Silver 4110 × 2.10 GHz CPU (64 GB RAM) and an NVIDIA Quadro P5000 GPU (16 GB VRAM). 2.4 Assessment and Evaluation To evaluate model performance, standard accepted metrics are required during the assessment. Earlier reported work on dental treatment detection directly in panoramic radiographs used the average precision (AP) metric to evaluate the proposed deep learning framework [17]. In this work, we evaluate the performance of our network with AP, true positives (TP), false positives (FP), and the true positive ratio (TP ratio) of each individual class. Only predicted boxes with an intersection-over-union (IoU) greater than 0.5 were counted in the evaluation. IoU is defined as the ratio of the overlap area between the ground-truth and estimated bounding boxes to their total combined area; it ranges from 0 to 1.
The numbers of correctly identified and incorrectly identified objects were used to calculate TP and FP, respectively.
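Since the predicted boxes are oriented rather than axis-aligned, computing this IoU requires a convex polygon intersection. The following is a minimal stdlib-only sketch using Sutherland–Hodgman clipping; the function names are ours for illustration, not taken from the paper's repository:

```python
def polygon_area(pts):
    # shoelace formula; vertices given in order
    n = len(pts)
    return abs(sum(pts[i][0] * pts[(i + 1) % n][1]
                   - pts[(i + 1) % n][0] * pts[i][1] for i in range(n))) / 2.0

def clip(subject, clipper):
    # Sutherland-Hodgman: clip convex polygon `subject` by convex polygon
    # `clipper`; both given as counter-clockwise (x, y) vertex lists.
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p1, p2, a, b):
        x1, y1, x2, y2 = *p1, *p2
        x3, y3, x4, y4 = *a, *b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inputs, output = output, []
        for j in range(len(inputs)):
            cur, prev = inputs[j], inputs[j - 1]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))
    return output

def obb_iou(box_a, box_b):
    # IoU of two oriented boxes given as 4-corner polygons
    inter_poly = clip(box_a, box_b)
    inter = polygon_area(inter_poly) if len(inter_poly) >= 3 else 0.0
    union = polygon_area(box_a) + polygon_area(box_b) - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2 × 2 squares overlapping in a 1 × 2 strip yield an IoU of 2/6 ≈ 0.33, below the 0.5 threshold used above.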
3 Results and Discussion The performance metrics used to evaluate our study for the detection of dental treatment are presented in Fig. 2. The number of objects successfully detected after all steps of improvement and optimization was assessed visually for the three types of teeth with or without treatment present in the images. The detection success rate for identifying the various kinds of dental treatment was assessed by counting the desirable (true positive) and undesirable (false positive) samples in Fig. 2(b). Root canal treatment was detected with the highest percentage of desirable results, at 91% accuracy. Restoration detection had the lowest accuracy, 72%; this is because multiple techniques are applied for restoration, e.g., composite fillings, amalgam fillings, inlay restorations, and crowns following fillings, all of which vary in brightness and appearance. However, our approach remained more balanced in the
Fig. 2. Statistical presentation of training data and evaluation metrics as follows: (a) the overall number of items in all the images obtained throughout the experiment using the training and test samples; (b) TP and FP for the total number of dental treatments (restoration, RCT, and tooth) detected correctly and confused with another treatment class, respectively; (c) TP ratio, representing a correctly detected treatment class with respect to all detected treatments; (d) average precision (AP), an object detection metric, for each detected treatment in all the images.
Table 1. Comparison of results with previous state-of-the-art work.

Method             | Backbone  | Task      | Dataset | mAP
DENTECT [17]       | YOLOv4    | Detection | 1005    | 0.59
BBAVectors (ours)  | ResNet101 | Detection | 593     | 0.85
presence of apparent overlap, tooth enamel, and other anatomical structures, where the previous approach largely failed (Table 1). When assessing the effectiveness of deep learning-based object detection, two metrics, IoU and AP, are typically used instead of the confusion matrix. An IoU higher than 0.7 is considered good performance [25]. Our study achieved an IoU of 0.79, which is greater than 0.7; as a result, this learning system performs well. The accuracy of an object detection model is evaluated using AP; although no clear standard exists for accepting AP as the only metric, the most relevant study [17] reported an AP of 0.59 for detecting multiple dental treatments in dental panoramic radiographs. The overall AP of this experiment was 0.85, so from the perspective of AP, the performance of this learning system is regarded as much better than previous results (Fig. 3).
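The AP values above summarize the precision–recall trade-off of the detector. A stdlib sketch of AP under the all-point interpolation convention follows; whether the evaluations here and in [17] used 11-point or all-point interpolation is not stated, so treat this as illustrative:

```python
def average_precision(recalls, precisions):
    # recalls must be sorted ascending; precisions are the matching values
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # make the precision envelope monotonically non-increasing
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # area under the interpolated precision-recall curve
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))
```

For instance, a detector reaching precision 1.0 at recall 0.5 and precision 0.5 at recall 1.0 scores an AP of 0.75 under this convention.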
Fig. 3. Illustration of the output of oriented bounding box detection in the dental panoramic radiograph. The red color bounding box represents the restoration treatment, the blue color box represents root canal treatment (RCT), and the green color box represents the tooth without treatment.
4 Conclusion In this work, we detect three generalized classes: one for teeth without treatment, and two classes of dental treatment, restoration and RCT. We optimize a novel oriented object detection approach for the recognition of oriented tooth objects in panoramic radiographs. The suggested technique is a single-stage, anchor-free approach. BBAVectors achieves a better result in oriented bounding box detection than the baseline technique, which directly regresses the oriented bounding box's width, height, and rotation angle. In future work, the tooth class can be separated into specialized classes according to anatomical tooth positions so that the detection model can differentiate them more accurately. Similarly, dental treatment detection can be extended by incorporating various dental prostheses and implant treatments to maximize automatic treatment analysis using dental panoramic radiographs. The proposed technique can also be tested on similar tasks in other dental modalities, for example cone beam computed tomography and intraoral images. Availability of Code. GitHub repository link with the entire Python code: https://github.com/Nripendrakr123/Detection_of_tooth_treatment_type.
References 1. Raza, K., Singh, N.K.: A tour of unsupervised deep learning for medical image analysis. Curr. Med. Imaging 17(9), 1059–1077 (2021). https://doi.org/10.2174/1573405617666210127154257
2. Singh, N.K., Raza, K.: Medical image generation using generative adversarial networks: a review. In: Patgiri, R., Biswas, A., Roy, P. (eds.) Health Informatics: A Computational Perspective in Healthcare. SCI, vol. 932, pp. 77–96. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-9735-0_5 3. Wu, H., Wu, Z.: A few-shot dental object detection method based on a priori knowledge transfer. Symmetry (Basel) 14(6), 1129 (2022). https://doi.org/10.3390/sym14061129 4. Chu, C.S., Lee, N.P., Adeoye, J., Thomson, P., Choi, S.W.: Machine learning and treatment outcome prediction for oral cancer. J. Oral Pathol. Med. 49(10), 977–985 (2020). https://doi.org/10.1111/jop.13089 5. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169 6. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2016). https://doi.org/10.1109/CVPR.2016.91 7. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2 8. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28 9. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 42(2), 386–397 (2020). https://doi.org/10.1109/TPAMI.2018.2844175 10. Lin, Y., Feng, P., Guan, J.: IENet: interacting embranchment one stage anchor free detector for orientation aerial object detection. arXiv:1912.00969 (2019) 11. Babu, A., Andrew Onesimu, J., Martin Sagayam, K.: Artificial intelligence in dentistry: concepts, applications and research challenges. In: E3S Web of Conferences, vol. 297 (2021). https://doi.org/10.1051/e3sconf/202129701074 12. Kumar, A., Bhadauria, H.S., Singh, A.: Descriptive analysis of dental X-ray images using various practical methods: a review. PeerJ Comput. Sci. 7, e620 (2021). https://doi.org/10.7717/peerj-cs.620 13. Singh, N.K., Raza, K.: Progress in deep learning-based dental and maxillofacial image analysis: a systematic review. Expert Syst. Appl. 199, 116968 (2022). https://doi.org/10.1016/j.eswa.2022.116968 14. Yeshua, T., et al.: Automatic detection and classification of dental restorations in panoramic radiographs. Issues Inform. Sci. Inf. Technol. 16 (2019). https://doi.org/10.28945/4306 15. Abdalla-Aslan, R., Yeshua, T., Kabla, D., Leichter, I., Nadler, C.: An artificial intelligence system using machine-learning for automatic detection and classification of dental restorations in panoramic radiography. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 130(5), 593–602 (2020). https://doi.org/10.1016/j.oooo.2020.05.012 16. Gurses, A., Oktay, A.B.: Tooth restoration and dental work detection on panoramic dental images via CNN. In: TIPTEKNO 2020 - Medical Technologies Congress (2020). https://doi.org/10.1109/TIPTEKNO50054.2020.9299272 17. Yüksel, A.E., et al.: Dental enumeration and multiple treatment detection on panoramic X-rays using deep learning. Sci. Rep. 11(1), 1–10 (2021). https://doi.org/10.1038/s41598-021-90386-1 18. Park, J., Lee, J., Moon, S., Lee, K.: Deep learning based detection of missing tooth regions for dental implant planning in panoramic radiographic images. Appl. Sci. 12(3), 1595 (2022). https://doi.org/10.3390/app12031595
19. Jader, G., Fontineli, J., Ruiz, M., Abdalla, K., Pithon, M., Oliveira, L.: Deep instance segmentation of teeth in panoramic x-ray images. In: Proceedings - 31st Conference on Graphics, Patterns and Images, SIBGRAPI 2018, pp. 400–407 (2019). https://doi.org/10.1109/SIBGRAPI.2018.00058 20. Dutta, A., Zisserman, A.: The VIA annotation software for images, audio and video. In: MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia (2019). https://doi.org/10.1145/3343031.3350535 21. Yi, J., Wu, P., Liu, B., Huang, Q., Qu, H., Metaxas, D.: Oriented object detection in aerial images with box boundary-aware vectors. In: Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021 (2021). https://doi.org/10.1109/WACV48630.2021.00220 22. Merget, D., Rock, M., Rigoll, G.: Robust facial landmark detection via a fully-convolutional local-global context network. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00088 23. Law, H., Deng, J.: CornerNet: detecting objects as paired keypoints. Int. J. Comput. Vision 128(3), 642–656 (2019). https://doi.org/10.1007/s11263-019-01204-1 24. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2016). https://doi.org/10.1109/CVPR.2016.90 25. Rahman, M.A., Wang, Y.: Optimizing intersection-over-union in deep neural networks for image segmentation. In: Bebis, G., et al. (eds.) ISVC 2016. LNCS, vol. 10072, pp. 234–244. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-50835-1_22
From a Monolith to a Microservices Architecture Based Dependencies Malak Saidi1(B) , Anis Tissaoui2 , and Sami Faiz3 1
National School for Computer Science, Manouba, Tunisia [email protected] 2 VPNC Lab, University of Jendouba, Jendouba, Tunisia [email protected] 3 Higher Institute of Multimedia Arts, Manouba, Tunisia [email protected]
Abstract. Modern business information systems are continuously subject to technical, functional and architectural changes aimed at meeting the needs of end users. However, these systems are monolithic, which makes updating and maintaining them a big problem. For these reasons, and to cope with this monolithic architecture, micro-services allow us to migrate systems with strongly coupled components to systems with weakly coupled, highly cohesive and fine-grained components. These micro-services allow the organization to react more quickly to new customer demands and requirements and to avoid an interminable development process spanning several years. Indeed, the main challenge is to determine an appropriate partition of the monolithic system, since the process of identifying micro-services is generally done intuitively, based on the experience of software designers and developers and on the judgment of domain experts. To meet this challenge, this paper proposes a multi-model approach based on a set of business processes. The approach combines two different independent dimensions: control dependency and data dependency. We rely on three clustering algorithms to automatically identify candidate micro-services. Keywords: monolithic architecture · microservices · business processes · control dependency · data dependency
1 Introduction
The life cycle of a company is characterized today by increasingly frequent phases of change induced by a continuous search for competitiveness [8, 9]. As a result, controlling the evolution of each organization becomes a crucial issue and requires rapid adjustment of the information system to increase its agility and the ability of teams to react easily. Indeed, despite the willingness of these organizations to remain proactive, the monolithic [10] nature of their information systems confronts them with challenges related to the performance of their services and the cost of the technical infrastructure, development, and maintenance. A monolithic application is typically an application system in which © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 34–44, 2023. https://doi.org/10.1007/978-3-031-35501-1_4
all of the relevant modules are packaged together as a single deployable unit of execution. At some point, as new features are added, a monolithic application can begin to suffer from the following problems: first, the individual parts of the system cannot be scaled independently; second, it is hard to maintain. The micro-services architecture [4, 8] was invented to solve these problems, such as tight coupling and limited scalability, because over time IT projects tend to grow; little by little the existing functionalities are extended, with many additions and few deletions, and we end up with a thousand complicated and difficult-to-manage functional tasks. The term micro-service [14, 15] describes a set of small, independent, fine-grained, loosely coupled and highly cohesive services. Each micro-service performs its business process autonomously and communicates synchronously via an API or asynchronously via a lightweight messaging bus [7]. So far, the micro-services discovery exercise has been done intuitively, based on the experience of system designers and domain experts, mainly due to missing formal approaches and the lack of automated tool support. In this context, research work has been proposed recently [13, 18, 19]. Although business process models are a rich reservoir of details such as who does what, when, where, and why, BPs seem almost neglected in the exercise of identifying micro-services. To our knowledge, Amiri [1], Daoud et al. [3] and Saidi et al. [14, 15] are the only ones to have adopted BPs in this discovery exercise. A BP is defined as an orchestration of activities [6] that includes an interaction between several actors in order to achieve a well-defined organizational goal.
In this paper, we propose a multi-model approach that handles several variants of business processes: we first analyze the control and association rules dependencies, then calculate the final dependency matrix using our proposed formulas, and finally generate the micro-services. The rest of this paper is organized as follows. Section 2 presents the related work. Section 3 presents a case study, gives an overview of our approach to automatically identify micro-services from a set of BPs, and formalizes the control and data dependency models. Section 4 presents the implementation of our proposed approach. Finally, we conclude with some future work.
2 Related Work
Currently, a very large number of applications are migrating to a micro-services architecture, since enterprise developers face maintenance and scalability challenges in increasingly complex projects. In [5], Escobar et al. proposed a model-based approach to analyze the current application structure and the dependencies between business capability and data capability. This approach breaks down an application developed in J2EE into micro-services through diagrams resulting from the analysis of data belonging to each EJB (Enterprise Java Beans) using the clustering technique. Although the business process is a central and crucial element for the evolution and success of the
company, only four works that took the business process as input in the exercise of discovering the appropriate micro-services. In [3], Daoud et al. was proposed to remove and deal with the limits of the approach of Amiri already mentioned in their work [1]. The essential goal of the approach is to automatically identify micro-services based on two dependency models (control and data) and using collaborative clustering. To do this Daoud et al. proposed formulas for calculating direct and indirect control dependencies as well as proposed two strategies for calculating data dependency. In [15], Saidi and al. proposed an extension of the control model of Daoud and al. [3] They proposed four calculation formulas to calculate the dependence taking into account the case of loopsequence , loopAnd , loopXor , loopOr in order to calculate the dependence matrix of control to subsequently generate the appropriate micro-services. In [14], Saidi et al. presented an approach based on association rules to calculate the correlation between the attributes of the set of activities and to determine a dependency matrix based on the strong and weak associations. The only paper that addressed the problem of identifying micro-servives from a set of business processes is the approach of Amiri and al. [1]. For this reason, our main objective in this paper is to take several independent business processes as system input and identify the candidate micro-services using three different clustering algorithms. The authors in [17] described an approach which is based on global K-means algorithm. This incremental approach makes it possible to add one center cluster through a deterministic search technique made up of N execution of the algorithm starting from the appropriate initial position.
3 Our Approach for Identifying Microservices
3.1 Our Case Study
In the film industry, image post-production is the process that begins after filming is completed and deals with the creative editing of the film. Figure 1 shows several independent image post-production processes. A process model is a graph composed of nodes of type activity and gateway, and of arcs connecting these elements. Activities capture the tasks performed in the process. Gateways are used to model alternate and parallel branches and merges. They can be of type OR or XOR (inclusive or exclusive execution) and AND. Our example in Fig. 1 is represented in BPMN.
3.2 Foundations
It is well known that the business process represents an organized set of activities and of software, material and human resources [6]; it is considered the central element and the backbone of each company. In this paper, we treat the business process as a monolithic system in order to break it down into appropriate micro-services using the dependency linked to a given couple of
Fig. 1. An example of 3 independent BP of the picture post-production process
activities. These micro-services are fine-grained, loosely coupled and highly cohesive. We were able to identify four types of dependencies, described below. – Control Dependency: if two activities are directly connected through a control dependency, they will form a fine-grained and loosely coupled micro-service; otherwise, they form separate micro-services [3, 15]. – Data Dependency: this dimension is based on association rules to determine the low and high correlation between the different attributes of a given pair of activities [14]. – Functional Dependency: this dimension is based on the DDD (Domain-Driven Design) approach to determine the different sub-domains of our system and their corresponding activities. – Semantic Dependency: this dimension is based on measuring the degree of semantic similarity between a given pair of activities. In this work we are only interested in control and data dependency to identify appropriate micro-services from a set of business process models.
3.3 The Main Steps of Our Approach
Through Fig. 2 we identify three essential steps in our proposed architecture. – Dependencies examination: in this step we analyze the specifications of each process taken as system input and determine the dependencies along two dimensions: control and association rules.
Fig. 2. Our architecture
For each business process, we determine its own control dependency matrix by analyzing the different types of connectors in the model and then applying the appropriate formulas. In the same way, we extract the matrices of each business process according to the second type of dependency, by analyzing the model in terms of correlation between the shared attributes of each activity. Indeed, for n processes, we will have 2n dependency matrices. – Dependency matrices generation: in this step, we use the matrices generated in the previous step to calculate the final dependency matrix. To do so, formulas have been proposed for the aggregation of these matrices; this part is described in detail below. – Micro-services candidate generation: in this last step, we use three clustering algorithms which take the generated dependency matrix as input to identify the candidate micro-services. Each cluster will contain activities that form a candidate micro-service.
3.4 Micro-services Identification
A. Matrix Aggregation Formulas – Control dependency formula: for the control dependency calculation, we calculate the dependency matrix for each model separately using the formulas given in [3, 15] and then sum them to generate a single output matrix. Note: if there is no arc linking a given couple of activities, we assign the value "0".
From a Monolith to a Microservices Architecture Based Dependencies
39
Dep_c(a_i, a_j) = Σ_{k=1}^{n} M_k(a_i, a_j),

where M_1, ..., M_n are the control dependency matrices of the n process models. Taking the example already represented in Fig. 1 and analyzing the dependencies in terms of control, the first generated matrix for the first BP is as follows (Table 1):
Table 1. Control dependency matrix 1

      a1    a2    a3
a1    –     1/2   1/4
a2    1/2   –     1/2
a3    1/4   1/2   –
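To make the aggregation step concrete, the summation of per-process control matrices can be sketched as below. The bp1 values reproduce Table 1; the second process bp2 and its values are hypothetical, and activity pairs without an arc simply contribute 0.

```python
import numpy as np

def aggregate_control_matrices(matrices, activities):
    """Sum per-process control dependency matrices over a shared
    activity index; pairs without an arc in a model contribute 0."""
    idx = {a: i for i, a in enumerate(activities)}
    total = np.zeros((len(activities), len(activities)))
    for m in matrices:                 # m maps (ai, aj) -> dependency value
        for (ai, aj), dep in m.items():
            total[idx[ai], idx[aj]] += dep
    return total

# Control matrix of the first BP (the values of Table 1)
bp1 = {("a1", "a2"): 0.5, ("a2", "a1"): 0.5,
       ("a1", "a3"): 0.25, ("a3", "a1"): 0.25,
       ("a2", "a3"): 0.5, ("a3", "a2"): 0.5}
# Hypothetical second BP that only links a1 and a2
bp2 = {("a1", "a2"): 1.0, ("a2", "a1"): 1.0}

dep_c = aggregate_control_matrices([bp1, bp2], ["a1", "a2", "a3"])
print(dep_c[0, 1])  # a1-a2 summed over both models: 1.5
```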
– Association rules formula: For each pair of activities a_i and a_j, the value of Dep(a_i, a_j) is the same in all the processes (if the process contains both activities), because an activity, even in different processes, uses the same set of attributes. We therefore use a single binary representation containing all the activities of our process models and apply the algorithm for generating the final dependency matrix proposed in [14]. By analyzing the 3 BPs in terms of data, the dependency matrix is obtained as described in Fig. 3. To generate it, we used the Apriori algorithm implemented in [14], setting the minimum support value to 0.5 and the minimum confidence value to 0.7 in order to generate the set of association rules that is later used by the dependency calculation algorithm [14].
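The support/confidence filtering behind this step can be sketched as follows. This is not the algorithm of [14] itself, only a minimal Apriori-style illustration using the same thresholds (0.5 and 0.7); the attribute names and transactions are hypothetical.

```python
from itertools import combinations

# Each transaction: the set of attributes used together in one record
transactions = [
    {"order_id", "amount"}, {"order_id", "amount", "client"},
    {"order_id", "amount"}, {"amount", "client"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.7
items = {i for t in transactions for i in t}
rules = [(a, b, support({a, b}))
         for a, b in combinations(sorted(items), 2)
         if support({a, b}) >= MIN_SUPPORT
         and confidence({a}, {b}) >= MIN_CONFIDENCE]
print(rules)  # [('amount', 'order_id', 0.75)]
```

Pairs that pass both thresholds count as highly correlated; the remaining pairs count as weakly correlated in the data dependency matrix.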
Fig. 3. Data dependency matrix
B. Micro-services Identification: After calculating the two matrices of each dimension (control and data), we use an aggregation formula to calculate the final dependency matrix:

Dep_F(a_i, a_j) = M_c(a_i, a_j) + M_d(a_i, a_j),

where M_c and M_d are the aggregated control and data dependency matrices. Our final dependency matrix is shown in Table 2.
After generating our final matrix, we use three different clustering algorithms (partitional clustering, hierarchical clustering and distribution-based clustering) to generate our candidate micro-services. The candidate micro-services obtained for each algorithm are represented in Table 3. Each cluster describes a candidate micro-service.
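As an illustration of this last step, a simple agglomerative grouping driven directly by the final dependency matrix of Table 2 (decimal commas converted to points) can be sketched as follows; the concrete linkage used by the off-the-shelf algorithms may differ, so the resulting partition is not necessarily the one reported in Table 3.

```python
import numpy as np

# Final dependency matrix (activities a0..a7, values from Table 2);
# a higher value means a stronger dependency between two activities.
dep = np.array([
    [0,      4.94,   6.695,  38.53,  20.3,   37.305, 44.555, 38.061],
    [4.94,   0,      7.07,   38.03,  44.43,  37.18,  44.43,  38.155],
    [6.695,  7.07,   0,      38.03,  22.18,  44.43,  44.43,  38.28],
    [38.53,  38.03,  38.03,  0,      44.93,  44.68,  44.93,  44.492],
    [20.3,   44.43,  22.18,  44.93,  0,      44.93,  44.93,  44.68],
    [37.305, 37.18,  44.43,  44.68,  44.93,  0,      44.43,  44.43],
    [44.555, 44.43,  44.43,  44.93,  44.93,  44.43,  0,      44.555],
    [38.061, 38.155, 38.28,  44.492, 44.68,  44.43,  44.555, 0],
])

def agglomerate(dep, k):
    """Average-linkage agglomerative clustering driven by dependency
    values: repeatedly merge the two clusters with the highest average
    inter-cluster dependency until k clusters remain."""
    clusters = [{i} for i in range(len(dep))]
    while len(clusters) > k:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = np.mean([dep[a, b]
                                for a in clusters[i] for b in clusters[j]])
                if link > best:
                    best, pair = link, (i, j)
        i, j = pair
        merged = clusters.pop(j)
        clusters[i] |= merged
    return clusters

parts = agglomerate(dep, 2)  # two candidate micro-services
```

Merging by highest average inter-cluster dependency keeps strongly dependent activities together, which matches the low-coupling, high-cohesion goal of the decomposition.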
4 Experimentation

4.1 Implementation
Our proposed tool is based on five essential modules, as modeled in our proposed architecture (Fig. 4).

– Camunda modeler: This is the tool that allows us to model our system input and to generate two types of output: either a graphic model of the BP, or an XML file of the created BP, which is later used as input to the second module.

Table 2. Final dependency matrix

       a0      a1      a2      a3      a4      a5      a6      a7
a0     0       4,94    6,695   38,53   20,3    37,305  44,555  38,061
a1     4,94    0       7,07    38,03   44,43   37,18   44,43   38,155
a2     6,695   7,07    0       38,03   22,18   44,43   44,43   38,28
a3     38,53   38,03   38,03   0       44,93   44,68   44,93   44,492
a4     20,3    44,43   22,18   44,93   0       44,93   44,93   44,68
a5     37,305  37,18   44,43   44,68   44,93   0       44,43   44,43
a6     44,555  44,43   44,43   44,93   44,93   44,43   0       44,555
a7     38,061  38,155  38,28   44,492  44,68   44,43   44,555  0

Table 3. Micro-services identification

Clustering algorithm           Clusters
Partitional clustering         cluster1(a0, a6), cluster2(a4, a5, a7), cluster3(a1), cluster4(a2, a3)
Hierarchical clustering        cluster1(a0, a4, a6, a7), cluster2(a1, a2, a3, a5)
Distribution-based clustering  cluster1(a4, a7, a5), cluster2(a2, a3), cluster3(a1), cluster4(a0, a6)
Fig. 4. Micro-services identification architecture
– Control dependency module: Based on the XML file generated by the first tool, the control dependency module calculates the dependencies between each pair of activities (a_i, a_j). These dependencies are represented in the form of a control matrix using the different formulas proposed in [3] and [15].
– Association rules matrices generator: Each BP created is essentially based on a set of artifacts and attributes. Since we are working on different BP models, this module calculates the correlation defined between the attributes of the three models in order to determine which activities will be classified in the same cluster and which will be classified in different clusters. This module reuses the method already proposed in [14].
– Global matrix generator: For n BP models, we will have n matrices for the first dimension (control dependency) and for the second dimension, which is based on the identification of strong and weak associations between activities. For this reason, we aggregate the n matrices generated for the first dimension, so that we have as output a single matrix instead of several. For the second dimension, if a given pair of activities (a_i, a_j) is the same in the other variants of the BP, then it keeps the same dependency value; if not, this value is recalculated by applying the method already proposed by [14]. As output, we will have two matrices. In order to generate our final dependency matrix, we take the "Sum" of the two dependencies calculated for each couple of activities (a_i, a_j).
– Micro-services identification module: This module is based on the final matrix calculated in the previous module and uses three different clustering algorithms in order to identify our micro-services, which are fine-grained, with low coupling and high cohesion.

4.2 Experiments
First, we chose to calculate the appropriate number of clusters using the "Elbow" method: the elbow is the point where the rate of decrease in the average distance, i.e. the SSE, no longer changes significantly as the number of clusters increases. According to the Elbow method, the appropriate number of clusters in our case is equal to 2.

– Partitional clustering: With the result of the implementation of the K-means algorithm, we identified four different clusters.
– Hierarchical clustering: Determines cluster assignments by creating a hierarchy. With the result of the implementation of the agglomerative algorithm, we identified two clusters.
– Distribution-based clustering: Gaussian Mixture Models (GMMs) assume that there are a number of Gaussian distributions, each of which represents a cluster.

According to the comparison we made (Fig. 5), we find that the agglomerative algorithm gives better results compared to the others.
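A minimal sketch of the Elbow computation on synthetic data, using a plain k-means rather than the library implementation we actually ran; the blob parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated blobs, so the elbow of the SSE curve sits at k = 2
points = np.vstack([rng.normal(0, 0.5, (50, 2)),
                    rng.normal(5, 0.5, (50, 2))])

def kmeans_sse(points, k, iters=50):
    """Plain k-means; returns the sum of squared distances (SSE)."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.array([points[labels == c].mean(0)
                            if (labels == c).any() else centers[c]
                            for c in range(k)])
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.min(1).sum()

sse = {k: kmeans_sse(points, k) for k in range(1, 6)}
```

The SSE drops sharply between k = 1 and k = 2 and flattens afterwards, so the elbow is at k = 2 for these blobs.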
Fig. 5. Qualitative comparison
5 Conclusion
Micro-services-based architectures have become the software architecture of choice for business applications. Indeed, unlike monoliths, micro-services are generally decentralized and loosely coupled execution units.
For these reasons, we have adopted this type of architecture in our business process context in order to divide our monolithic system into small autonomous services that can be deployed individually. We have therefore proposed an approach which takes as input a set of business processes and aims to calculate the dependency between a given pair of activities, taking into account the structural and data aspects of the BPs. Subsequently, based on clustering algorithms, we were able to determine our candidate micro-services. As future work, we aim to treat the dependency between a couple of activities in a configurable process model.
References

1. Amiri, M.J.: Object-aware identification of microservices. In: 2018 IEEE International Conference on Services Computing (SCC), pp. 253–256. IEEE (2018)
2. Chen, R., Li, S., Li, Z.: From monolith to microservices: a dataflow-driven approach. In: 2017 24th Asia-Pacific Software Engineering Conference (APSEC), pp. 466–475 (2017)
3. Daoud, M., Mezouari, A.E., Faci, N., Benslimane, D., Maamar, Z., Fazziki, A.E.: Automatic microservices identification from a set of business processes. In: Hamlich, M., Bellatreche, L., Mondal, A., Ordonez, C. (eds.) SADASC 2020. CCIS, vol. 1207, pp. 299–315. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45183-7_23
4. Djogic, E., Ribic, S., Donko, D.: Monolithic to microservices redesign of event driven integration platform. In: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1411–1414 (2018)
5. Escobar, D., et al.: Towards the understanding and evolution of monolithic applications as microservices. In: 2016 XLII Latin American Computing Conference (CLEI), pp. 1–11 (2016)
6. Ferchichi, A., Bourey, J.P., Bigand, M.: Contribution à l'intégration des processus métier : application à la mise en place d'un référentiel qualité multi-vues. Ph.D. thesis, École Centrale de Lille; École Centrale Paris (2008)
7. Indrasiri, K., Siriwardena, P.: Microservices for the Enterprise. Apress, Berkeley (2018)
8. Baresi, L., Garriga, M., De Renzis, A.: Microservices identification through interface analysis. In: De Paoli, F., Schulte, S., Broch Johnsen, E. (eds.) ESOCC 2017. LNCS, vol. 10465, pp. 19–33. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67262-5_2
9. Kherbouche, M.O.: Contribution à la gestion de l'évolution des processus métiers. Doctoral dissertation, Université du Littoral Côte d'Opale (2013)
10. Ponce, F., Márquez, G., Astudillo, H.: Migrating from monolithic architecture to microservices: a rapid review. In: 38th International Conference of the Chilean Computer Science Society (SCCC), pp. 1–7. IEEE (2019)
11. Richardson, C.: Pattern: monolithic architecture (2018). https://microservices.io/patterns/monolithic.html
12. Estanol, M.: Artifact-centric business process models in UML: specification and reasoning (2016)
13. Gysel, M., Kölbener, L., Giersche, W., Zimmermann, O.: Service Cutter: a systematic approach to service decomposition. In: Aiello, M., Johnsen, E.B., Dustdar, S., Georgievski, I. (eds.) ESOCC 2016. LNCS, vol. 9846, pp. 185–200. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44482-6_12
14. Saidi, M., Daoud, M., Tissaoui, A., Sabri, A., Benslimane, D., Faiz, S.: Automatic microservices identification from association rules of business process. In: Abraham, A., Gandhi, N., Hanne, T., Hong, T.-P., Nogueira Rios, T., Ding, W. (eds.) ISDA 2021. LNNS, vol. 418, pp. 476–487. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96308-8_44
15. Saidi, M., Tissaoui, A., Benslimane, D., Faiz, S.: Automatic microservices identification across structural dependency. In: Abraham, A., et al. (eds.) HIS 2021. LNNS, vol. 420, pp. 386–395. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96305-7_36
16. Cheung, Y.-M.: k-Means: a new generalized k-means clustering algorithm. Pattern Recogn. Lett. 24(15), 2883–2893 (2003)
17. Likas, A., Vlassis, N., Verbeek, J.J.: The global k-means clustering algorithm. Pattern Recogn. 36(2), 451–461 (2003)
18. Levcovitz, A., Terra, R., Valente, M.T.: Towards a technique for extracting microservices from monolithic enterprise systems. arXiv preprint arXiv:1605.03175 (2016)
19. Mazlami, G., Cito, J., Leitner, P.: Extraction of microservices from monolithic software architectures. In: 2017 IEEE International Conference on Web Services (ICWS), pp. 524–531. IEEE (2017)
Face Padding as a Domain Generalization for Face Anti-spoofing

Ramil Zainulin, Daniil Solovyev, Aleksandr Shnyrev, Maksim Isaev, and Timur Shipunov

Face2 Inc, Meridiannaya str. 4, Kazan 420124, Russian Federation
[email protected]
https://face2.ru/

Abstract. A modern facial recognition system cannot exist without anti-spoofing protection, i.e. protection from fake biometric samples. The most common way to get a face image is an optical camera. Due to the vast variability of the conditions for obtaining the picture, the problem is non-trivial. Currently, there is no out-of-the-box solution. In this article, we aim to combine different approaches and provide an effective domain generalization method for convolutional neural networks based on face padding. We also suggest a few methods of passing partial information about faces during training.

Keywords: face anti-spoofing · domain generalization · face recognition · artificial neural networks · computer vision
1 Introduction
Due to the widespread use of facial recognition technology, the issue of protecting such systems from unauthorized access is becoming increasingly important [3,9,12]. A standard facial recognition system consists of a face reader, most often a camera, and a model that implements the recognition logic. The usual way of "cheating" the system is to provide a copy of a living person's face, for example by presenting a printed face of a bank customer in front of an ATM. Creating a copy of a face and attacking the system by bringing the copy to a camera connected to a face recognition system is called a spoof attack, and the task of protecting the system is face anti-spoofing (FAS). The easiest ways to create a copy of a face are displaying it on a smartphone screen or printing a photo on a piece of paper; such attacks are the most widespread, but 3-D face masks, faces cut out along the contour, etc. can also be used [7]. Figure 1 shows a facial recognition system: various image sources, live faces and copies of live faces are fed to the input of the system, and the task of the face anti-spoofing algorithm is to reject attacks and pass real faces on for recognition. When solving the problem of face anti-spoofing, the first thing to pay attention to is the data that we will have to work with; it depends on the conditions of the task. Most often researchers have optical and

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 45–54, 2023. https://doi.org/10.1007/978-3-031-35501-1_5
Fig. 1. Schematic representation of the face anti-spoofing task in the facial recognition system.
infrared cameras at their disposal; sometimes there is a depth-map camera, a microphone or a LiDAR. This article discusses working with an image obtained from an optical camera; for simplicity we will call it an RGB image. Classical computer vision approaches give a good result [1,5,6], but they cannot guarantee an acceptable level of accuracy when transferred to another data set. Currently, machine learning algorithms based on convolutional neural networks are used to solve this problem; their effectiveness also suffers when used on new data sets, but to a lesser degree. In our work, we want to offer an effective method of training neural networks to solve the problem of face anti-spoofing on RGB images. It is also worth noting that we will use lightweight architectures, in particular MobileNetV2 [10], since it is important for us to be able to use the model on devices with low performance. At first glance, this problem is easy to solve using machine learning methods: it is enough to consider it as a classification problem, binary or multiclass, where living faces are opposed to copies of living faces. Indeed, this approach shows itself well when training on a specific data set, even when using classical machine learning methods, but when transferring to other sets, the prediction quality drops significantly [13]. A further development of this approach was the use of neural network architectures and deep learning, trying to improve the quality of the model by changing the architecture of the networks [8], extracting special features from the image [4,14], or using approaches that usually improve the results of deep learning tasks [15]. Such approaches give a better result, but do not solve the main problem: when transferring the results to other datasets, the quality decreases, which prevents the use of the obtained models in practice.
Such results suggest that the main problem in the task of face anti-spoofing is the too-large set of conditions under which an image can be obtained: the quality of the camera lens, image processing algorithms, weather conditions, etc. Until 2020, specialized data sets for the FAS task were relatively small and contained a small number of objects (an object means a unique living face), the most widely distributed being CASIA-MFSD [17]: 50 objects, 12 videos for each object (3 with a live face and 9 with a fake) under different lighting and image quality, with attacks using printouts, printouts cropped along the contour of the face, and mobile phone screens; Replay-Attack: 50 objects under different lighting, a total of 1300 videos, with attacks using a photo and a mobile phone screen; and OULU-NPU [2]: 55 objects, a total of 4950 videos, with attacks using a printed face and screens of various phones. It is logical to assume that the problem is an insufficient number of objects. In 2020, a data set containing 625,537 RGB images of 10,177 objects was released, called CelebA-Spoof [16]; it contains public photos of famous people and copies of them as objects. The copies of the photos were recorded using different angles, different display devices and all kinds of lighting conditions. Although this data set contains many more objects than the previous ones and has a wide variety of conditions for recording spoof attacks, the problem of transferring the trained model to other data sets has not been solved; nevertheless, this data set has made an impressive contribution to solving the problem of face anti-spoofing and remains the largest in open access. In our work, we will use CelebA-Spoof to train models, but we want to note that, in our opinion, this data set has a disadvantage related to the quality of the images of living people: they were collected from photos of stars taken from open sources, many photos are retouched, have unnatural lighting, etc., and this disadvantage may affect the model results.
2 Methods

3 Face Anti-spoofing Problem
Let us describe the problem statement more formally. We solve the face anti-spoofing problem as a classification problem on a labeled dataset D = {(x, y) : x ∈ R^{H×W×3}, y ∈ {0, 1}}; that is, we train a model f such that f(image) = prediction, where image ∈ R^{H×W×3}. In the following, for simplicity, we identify the image with a matrix of size W × H, assuming the origin of coordinates is the upper-left corner of the image, with the x axis directed to the right and the y axis downwards. In our work, we take the MobileNetV2 architecture as f. Training is standard: we use focal loss with gamma equal to 2, a learning rate starting from 0.001 that is reduced when a metric has stopped improving, and AdamW for optimization. A few words about the detector: in cases where the data is provided without face coordinates, we use MTCNN to find the coordinates of the face. Our method of domain generalization consists in using face augmentation during training; that is, if the face detector returns the coordinates x0, y0, x1, y1 on an image, then we use the wider area x0 − Δx, y0 − Δy, x1 + Δx, y1 + Δy for training, where x0 − Δx ≥ 0, y0 − Δy ≥ 0, x1 + Δx ≤ W, y1 + Δy ≤ H. This approach is motivated by two considerations: first, the analysis of related work and the assumption that it is necessary to rely on information that stays stable when moving from one data set to another, a condition which the transition area between the face and its environment satisfies; and second, the assumption that a person determines a spoof attack using not the face itself but the face and its complement: simply by
getting a face cut out by the detector, it is very difficult for a person to distinguish a spoof attack if there are no obvious signs inherent to spoof attacks, such as paper glare or irregular facial geometry. Also note that for training we use the imgaug library for image augmentation, but for rotation we use the method of rotating and cutting out the maximum area within the original image. A square image is fed to the network input, so when the padding goes beyond the boundaries of the image, we fill in the missing part with black pixels so as not to change the aspect ratio of the face, and we thus obtain a correct complement. Additionally, we propose to modify the images for training by passing only limited information contained in them. To do this, we use two methods. The first is the quartering method, in which only a quarter of the image is passed: the image is divided into four equal parts relative to the center and one of the quarters is passed to training with equal probability. The second method also divides the image into equal parts, but unlike the previous one, it applies a blur effect to one of the four areas with equal probability and passes the full image to training. With these approaches, we want to "focus" the attention of the model on information that is located on the border of the face and its complement, without overfitting on only one of the areas of the face.
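The quartering and quarter-blur methods can be sketched as below; a naive box blur stands in for the blur effect actually applied during training, and the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_quarter(img):
    """Quartering: keep one of the four quadrants, chosen uniformly."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    i, j = rng.integers(0, 2), rng.integers(0, 2)
    return img[i * h:(i + 1) * h, j * w:(j + 1) * w]

def blur_quarter(img, k=7):
    """Quarter blur: box-blur one random quadrant, keep the full image."""
    out = img.astype(float)
    h, w = img.shape[0] // 2, img.shape[1] // 2
    i, j = rng.integers(0, 2), rng.integers(0, 2)
    region = out[i * h:(i + 1) * h, j * w:(j + 1) * w]
    pad = k // 2
    padded = np.pad(region, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    blurred = np.zeros_like(region)
    for dy in range(k):               # sum the k x k window, then average
        for dx in range(k):
            blurred += padded[dy:dy + region.shape[0], dx:dx + region.shape[1]]
    out[i * h:(i + 1) * h, j * w:(j + 1) * w] = blurred / (k * k)
    return out.astype(img.dtype)

img = rng.integers(0, 256, (128, 128, 3), dtype=np.uint8)
q = random_quarter(img)   # (64, 64, 3): one quadrant only
b = blur_quarter(img)     # (128, 128, 3): full image, one quadrant smoothed
```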
3.1 Methodology for Result Analysis
Our Dataset Description. Our dataset contains 14453 photos, of which 10685 are photos of printed faces and 3768 are photos of living people. There are 10 unique living faces in total and 47 spoof attack objects. Each spoof attack is presented in the form of a printed sheet, a printout cropped along the contour of the face, and a face with the eyes cut out. Also, when collecting the data set, we tried to simulate the approach of the object to the face reader, so each object corresponds to a set of images at different distances from the reading device. Each such set of images was collected under bright and dark lighting of the room. The data are not publicly available because they contain information that could compromise the privacy of research participants.

About Metrics Analysis. The main difficulty is the correct selection and interpretation of metrics. Given that it is trivial enough to get good metrics on a single dataset for the face anti-spoofing task, we decided to measure the quality of the model on sets that did not participate in the training in any way. To do this, we use our dataset and CASIA-MFSD. For a complete view, we also provide f-score metrics on the test part of CelebA-Spoof. For CASIA-MFSD, we also provide the extended metrics APCER, BPCER and ACER to compare our results with the AENet model [16], which was trained by the authors of CelebA-Spoof. We would like to note that choosing an epoch is also a non-trivial task, because we have not yet been able to determine which set of metrics gives the optimal result; to choose the best model, we use summary metrics over the data sets and live testing on our equipment.
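The presentation attack detection metrics used above can be computed as in the sketch below (APCER: fraction of attacks accepted as live; BPCER: fraction of bona fide faces rejected; ACER: their average, which coincides with HTER in this binary setting). The scores are hypothetical; the fixed threshold of 0.5 matches the one we use.

```python
def fas_metrics(scores, labels, threshold=0.5):
    """scores: model liveness scores; labels: 1 = live (bona fide),
    0 = spoof (attack). A sample is predicted live if score >= threshold."""
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    bonafide = [s for s, y in zip(scores, labels) if y == 1]
    apcer = sum(s >= threshold for s in attacks) / len(attacks)
    bpcer = sum(s < threshold for s in bonafide) / len(bonafide)
    acer = (apcer + bpcer) / 2   # equals HTER for binary live/spoof labels
    return apcer, bpcer, acer

scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
labels = [1,   1,   1,   0,   0,   0]
print(fas_metrics(scores, labels))  # APCER, BPCER and ACER are all 1/3 here
```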
4 Results

4.1 Only Face Model
As a basic model with which to compare the results, we also use MobileNetV2; the output of the detector is fed to the input of the model. The augmentation and the data set used during training are the same for all our models; they differ only in the method of extracting the face and its complement, as well as in the subsequent processing by quartering or blurring a quarter of the image. This setup makes it possible to show the advantages of our methods in comparison with training only on the face itself. The model learns quite quickly on the CelebA-Spoof dataset.
Fig. 2. Results for a model trained only on the face. From left to right: training without additional methods, with the quarter-blur method, and with the image quartering method. The first row shows the results on our dataset, the second on CASIA-MFSD. For our dataset, the f-score metric is presented for three threshold values; no tr means a threshold of 0.5.
As can be seen from Fig. 2, on our dataset the model performs quite poorly, which is most likely due to the complexity of our dataset. For CASIA-MFSD, the model behaves stably and the quarter-face blur approach even gives a small positive result.
4.2 Default Padding Model
Since a square image is fed to the network input, we adjust the coordinates from the face detector so that the geometry of the face is not distorted: if x1 − x0 ≠ y1 − y0, we enlarge the smaller of these values to the larger one by shifting the coordinates so that x1 − x0 = y1 − y0; usually, since the face is elongated along the y axis, the x coordinates change. Consider the approach of ordinary padding. For this we introduce the parameter pad_scale, which reflects how much the area found by the face detector is enlarged. For an image of size W × H and face coordinates x0, y0, x1, y1 that satisfy the condition x1 − x0 = y1 − y0, the ordinary padding will be the area x0_pad, y0_pad, x1_pad, y1_pad. Let w_face = x1 − x0 and h_face = y1 − y0; then:

x0_pad = x0 − (w_face · pad_scale / 2),
y0_pad = y0 − (h_face · pad_scale / 2),
x1_pad = x1 + (w_face · pad_scale / 2),
y1_pad = y1 + (h_face · pad_scale / 2).
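The squaring and ordinary padding steps translate directly into code; clamping to the image boundary (with the out-of-image area later filled with black pixels) is left to the caller, and the example coordinates are hypothetical.

```python
def square_box(x0, y0, x1, y1):
    """Extend the shorter side so the box becomes square (the face is
    usually taller than wide, so the x coordinates are shifted)."""
    w, h = x1 - x0, y1 - y0
    if w < h:
        d = (h - w) / 2
        x0, x1 = x0 - d, x1 + d
    elif h < w:
        d = (w - h) / 2
        y0, y1 = y0 - d, y1 + d
    return x0, y0, x1, y1

def default_padding(x0, y0, x1, y1, pad_scale):
    """Ordinary padding: each coordinate of the square face box moves
    outwards by side * pad_scale / 2."""
    w_face, h_face = x1 - x0, y1 - y0
    return (x0 - w_face * pad_scale / 2,
            y0 - h_face * pad_scale / 2,
            x1 + w_face * pad_scale / 2,
            y1 + h_face * pad_scale / 2)

box = square_box(40, 30, 80, 90)             # (30, 30, 90, 90)
pad = default_padding(*box, pad_scale=1.0)   # (0, 0, 120, 120)
```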
Fig. 3. Results for a model trained on the face image with ordinary padding. From left to right: training without additional methods, with the quarter-blur method, and with the image quartering method. The first row shows the results on our dataset, the second on CASIA-MFSD. For our dataset, the f-score metric is presented for three threshold values; no tr means a threshold of 0.5.
As can be seen from Fig. 3, the model trained with face padding has better results on our dataset; on CASIA-MFSD it is clearly seen that the f-score lies higher than for the model that used only the face. The quartering and blurring approaches do not give an increase in all scores.

4.3 Adaptive Padding Model
The difference of adaptive padding is that first a parameter face_percent ∈ [0, 1] is set, which reflects what fraction of the augmented image the face will occupy, i.e.

face_percent = ((x1 − x0) · (y1 − y0)) / ((x1_pad − x0_pad) · (y1_pad − y0_pad)).

If the face initially occupies a fraction of the area higher than face_percent, then we assume that the image satisfies our condition; if not, then we calculate x0_pad, y0_pad, x1_pad, y1_pad so that they satisfy the formula above. Thus, we get an image in which the face is guaranteed to occupy a fraction not lower than the specified face_percent.
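A sketch of the adaptive variant; the expansion strategy below (growing the square box to side/√face_percent before clamping) is one possible way to satisfy the guarantee, not necessarily the exact implementation used in our experiments.

```python
import math

def face_percent(box, padded):
    """Fraction of the padded area occupied by the face box."""
    x0, y0, x1, y1 = box
    px0, py0, px1, py1 = padded
    return ((x1 - x0) * (y1 - y0)) / ((px1 - px0) * (py1 - py0))

def adaptive_padding(box, W, H, target=0.25):
    """Grow a square face box so that the face occupies no less than
    `target` of the padded area; clamping to the image can only shrink
    the padded area, so the guarantee still holds."""
    x0, y0, x1, y1 = box
    side = x1 - x0                        # a square face box is assumed
    d = (side / math.sqrt(target) - side) / 2
    return (max(0, x0 - d), max(0, y0 - d), min(W, x1 + d), min(H, y1 + d))

padded = adaptive_padding((40, 40, 80, 80), 200, 200, target=0.25)
print(face_percent((40, 40, 80, 80), padded))  # 0.25 (no clamping here)
```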
Fig. 4. Results for a model trained on the face image with adaptive padding. From left to right: training without additional methods, with the quarter-blur method, and with the image quartering method. The first row shows the results on our dataset, the second on CASIA-MFSD. For our dataset, the f-score metric is presented for three threshold values; no tr means a threshold of 0.5.
As can be seen from Fig. 4, the adaptive face padding approach shows the best results, and here the quartering and blurring approaches give a good increase in the scores. On our dataset, the results have not improved much, but on CASIA-MFSD a significant improvement in all metrics is visible; we especially want to draw attention to the quartering method, which makes it possible to achieve a stable HTER of less than 0.2 starting from the 22nd epoch.
4.4 Compare Results with AENet
To compare the results of our models, we chose the AENet model [16] from the authors of CelebA-Spoof: it was trained on the same dataset, is publicly available, and is based on the ResNet18 architecture, which has three times more parameters than our MobileNetV2. It is worth noting an important difference: AENet also uses the semantic information about the attack type and the light environment of the image that CelebA-Spoof contains for training, and it additionally uses two convolutional layers after the network output plus upsampling to obtain geometric information [16]. The results on CASIA-MFSD are provided by the authors in their article. We also calculated the HTER metric for AENet on CASIA-MFSD ourselves and got a result of 37.8, much worse than the one provided in the article. We contacted the authors of the article in order to analyze this result; during the discussion and a more detailed review of the verification methodology, we found out that such a striking difference may be due to the fact that the authors of AENet calculate the threshold of the model by finding the EER balance on the CASIA-MFSD test part, while we used a threshold equal to 0.5. We believe that such a strong influence of the threshold selection on the results of the network introduces significant limitations and affects stability when moving to new data sets. Our training approach, with a network that has 3 times fewer parameters, gives a more stable result when used on different datasets and does not require additional configuration on the test part (Table 1).

Table 1. Comparison of MobileNetV2 and AENet results.

Model         Training       Testing      HTER (%)↓
AENet         CelebA-Spoof   CASIA-MFSD   11.9
MobileNetV2   CelebA-Spoof   CASIA-MFSD   19.5
Fig. 5. Heat map GradCAM visualization of a model trained on three different data sets with the quartering method, for real and spoof source images.
5 Conclusions
Our unspoken goal was to obtain a model that pays attention to the same areas a person pays attention to. We managed to achieve this effect for most of the instances we tested, using GradCAM [11] visualization. An example of the visualization can be seen in Fig. 5. In this figure, we can observe that the areas lying on the border of the human face and its environment make the greatest contribution to decision-making. Thus, we can say that our approach allows the model to use additional information to make a decision, and the adaptive padding method does this better than ordinary padding. Our article presents an effective method of domain generalization based on face padding; the use of such a method together with other ways of solving the face anti-spoofing problem seems promising. In the future, we plan to continue exploring the possibility of using face padding for the face anti-spoofing task by expanding the experiments. Also interesting is the task of choosing the optimal threshold to keep a high degree of security without a large number of false rejections of living persons.
References 1. Boulkenafet, Z., Komulainen, J., Hadid, A.: Face anti-spoofing based on color texture analysis. IEEE International Conference on Image Processing (ICIP), pp. 2636–2640 (2015) 2. Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A.: OULU-NPU: a mobile face presentation attack database with real-world variations. In: 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), pp. 612– 618 (2017). https://doi.org/10.1109/FG.2017.77 3. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2019) 4. Feng, H., et al.: Learning generalized spoof cues for face anti-spoofing (2020). https://doi.org/10.48550/ARXIV.2005.03922. https://arxiv.org/abs/2005.03922 5. de Freitas Pereira, T., Anjos, A., De Martino, J.M., Marcel, S.: LBP – TOP based countermeasure against face spoofing attacks. In: Park, J.-I., Kim, J. (eds.) ACCV 2012. LNCS, vol. 7728, pp. 121–132. Springer, Heidelberg (2013). https://doi.org/ 10.1007/978-3-642-37410-4 11 6. Komulainen, J., Hadid, A.: Context based face anti-spoofing. IEEE Sixth International Conference on Biometrics: theory, Applications and Systems (BTAS), pp. 1– 8 (2013) 7. Kumar, S., Singh, S., Kumar, J.: A comparative study on face spoofing attacks. In: 2017 International Conference on Computing, Communication and Automation (ICCCA), pp. 1104–1108 (2017). https://doi.org/10.1109/CCAA.2017.8229961 8. Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A.: An original face antispoofing approach using partial convolutional neural network. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 1–6 (2016). https://doi.org/10.1109/IPTA.2016.7821013
54
R. Zainulin et al.
9. Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., Song, L.: SphereFace: deep hypersphere embedding for face recognition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6738–6746 (2017). https://doi.org/10.1109/CVPR.2017.713
10. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018). https://doi.org/10.1109/CVPR.2018.00474
11. Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization. CoRR abs/1610.02391 (2016). http://arxiv.org/abs/1610.02391
12. Shi, Y., Jain, A.: Probabilistic face embeddings. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6901–6910 (2019). https://doi.org/10.1109/ICCV.2019.00700
13. Wen, D., Han, H., Jain, A.K.: Face spoof detection with image distortion analysis. IEEE Trans. Inf. Forensics Secur. 10(4), 746–761 (2015). https://doi.org/10.1109/TIFS.2015.2400395
14. Yu, Z., Li, X., Shi, J., Xia, Z., Zhao, G.: Revisiting pixel-wise supervision for face anti-spoofing. IEEE Trans. Biometrics Behavior Identity Sci. 3(3), 285–295 (2021). https://doi.org/10.1109/TBIOM.2021.3065526
15. Zhang, K.Y., et al.: Structure destruction and content combination for face anti-spoofing. In: 2021 IEEE International Joint Conference on Biometrics (IJCB), pp. 1–6 (2021). https://doi.org/10.1109/IJCB52358.2021.9484395
16. Zhang, Y., et al.: CelebA-Spoof: large-scale face anti-spoofing dataset with rich annotations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 70–85. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_5
17. Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z.: A face antispoofing database with diverse attacks. In: 2012 5th IAPR International Conference on Biometrics (ICB), pp. 26–31 (2012). https://doi.org/10.1109/ICB.2012.6199754
Age-Related Macular Degeneration Using Deep Neural Network Technique and PSO: A Methodology Approach

F. Ajesh1,2(B) and Ajith Abraham3,4(B)

1 Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, 11, 3rd Street NW, P.O. Box 2259, Auburn, Washington 98071, USA
[email protected]
2 Department of Computer Science and Engineering, Sree Buddha College of Engineering, Alappuzha, Kerala, India
3 Machine Intelligence Research Labs (MIR Labs), Auburn, WA 98071, USA
[email protected]
4 Center for Artificial Intelligence, Innopolis University, Innopolis, Russia
Abstract. Age-Related Macular Degeneration (AMD) is an eye condition that can cause central vision to blur. It happens when the macula, the part of the eye that controls accurate, straight-ahead vision, sustains degradation with ageing. The macula is a component of the retina (the light-sensitive tissue at the back of the eye). It is also the main reason why older people lose their vision. Detecting this condition as early as possible is one of the most difficult tasks for medical experts. In this research, AMD is effectively detected using a deep learning model, and the phases are as follows: a) participants between the ages of 55 and 80 are included in the data collected from the Age-Related Eye Disease Study (AREDS) repository, b) preprocessing employing a median filter to increase the contrast, c) extracting Biologically Inspired Features (BIF) using focal region extraction, d) feature selection for dimensionality reduction, and finally e) detection of AMD using the AlexNet network. Experimental evaluation is conducted against various state-of-the-art models under various measures, in which the proposed network outperforms them (accuracy: 0.96, detection rate: 0.94, sensitivity: 0.97, specificity: 0.97). Keywords: Age-related macular degeneration · The Age-Related Eye Disease Study · Biologically inspired features · Deep learning · Detection · Feature extraction · Feature selection · Machine learning
1 Introduction
An accurate pre-diagnosis can be obtained by looking at the structural components of the retina and the variables that affect how it is formed; therefore, the automatic recognition of retinal vessels is an area of research that has drawn increasing attention. Retinal scanning can be used to identify problems associated with negative outcomes like age-related macular degeneration, diabetes mellitus, hypertension, and glaucoma [1]. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 55–64, 2023. https://doi.org/10.1007/978-3-031-35501-1_6
56
F. Ajesh and A. Abraham
Nowadays, Age-Related Macular Degeneration (AMD), a chronic, degenerative condition, is responsible for the bulk of blindness cases in elderly people. Late AMD damages the central region of the retina and results in blindness when new blood capillaries form; early AMD does not result in visual loss. AMD is tightly linked to drusen, focal deposits of extracellular material that accumulate in the region between the retinal pigment epithelium and the inner collagenous layer of Bruch's membrane. The position and segmentation of drusen are used to pre-diagnose AMD [7–10]. Gradient dynamic allocation methods are one technique for identifying and segmenting drusen. Once the borders of the drusen are identified, suspicious objects are removed using a connected-component labelling procedure. The border is then clearly defined by joining all of the recognized pixels together with an edge-linking approach [7]. Burlina et al. [10] suggested convolutional neural networks for the creation of an algorithm that could evaluate the severity of AMD. Employing deep neural networks, Lee et al. [9] correctly recognized a dataset of 101,002 optical coherence tomography (OCT) images with an accuracy of 87.63%. Before segmenting the retinal pigment epithelium and removing the retinal nerve fibre layer [7], a filter is employed to reduce noise. The goals of this article, in which a deep learning model for identifying AMD with high accuracy is considered, are as follows:
• Several steps that current state-of-the-art models omit are included, which subsequently enhance model identification.
• The suggested deep learning model improves on steps such as preprocessing, feature extraction, and feature selection.
• The extraction of biologically inspired features from the images gives another dimension of information to the network as well.
• The test findings demonstrated that the suggested model substantially outperformed other state-of-the-art models.
Organization of Paper: AMD and deep learning having been introduced in Part 1, the balance of the study is organized as follows: Part 2 presents the overall approach, Part 3 presents the performance evaluation, and Part 4 concludes with recommendations.
2 Methodology
The suggested methodology's core architecture, represented in Fig. 1, comprises the following parts. Data was gathered from the well-known database of The Age-Related Eye Disease Study (AREDS), in which people between the ages of 55 and 80 were enrolled and more than 5000 fundus photographs were ready to be processed. Once acquired, the data was subjected to b) preprocessing to remove noise and enhance the input images for higher-quality evaluation. These preprocessed pictures proceed through c) feature extraction, which makes use of focal region retrieval. BIF extraction is also done, and the results are provided for d) feature selection using
Particle Swarm Optimization (PSO) for dimensionality reduction, and finally passed to the e) detection stage, where with the help of the AlexNet network AMD is detected most effectively and accurately. Each stage is described briefly in the following sections.
[Figure 1 shows the pipeline: AREDS dataset (fundus images) → preprocessing (median filter, contrast improvement) → feature extraction (BIF) → feature selection (PSO) → neural network for detection → evaluation measures on train/test data.]
Fig. 1. The overall architecture of the proposed framework
2.1 Data Collection
We used colour fundus images from the AREDS database to assess the performance of the deep learning technique. AREDS was a long-term study funded by the NIH that followed a large number of patients for up to 12 years (median follow-up from enrollment: 6.5 years) [11, 12]. The participants in the study underwent routine ophthalmological examinations, during which photos of the left and right fundus were obtained, and the severity of AMD was then graded by specialists at US grading facilities. Each image received a score between 1 and 4, with 1 signifying no evidence of AMD, 2 suggesting early signs of AMD, 3 indicating intermediate symptoms of AMD, and 4 representing one of the advanced forms of AMD (Fig. 2). Participants in the AREDS study had to be between the ages of 55 and 80 at the time of recruitment, and they had to be free of any diseases or conditions that might make long-term follow-up or commitment to the investigated treatments unlikely or difficult. Based on fundus photographs graded by a centralized reading centre, best visual acuity, and ophthalmologic tests, 4,757 people were enrolled in one of several AMD categories, including those who had no AMD. The number of fundus images is shown in Table 1.
2.2 Preprocessing
After collection, the datasets undergo preprocessing to boost image quality. Image normalization is necessary to address the brightness drift caused by the fundus camera's powerful burst of light and the geometry of the eye. Contrast augmentation is also required to improve the fundus photos' capability to convey characteristics.
Fig. 2. Overall fundus image from AREDS dataset a) Cat-1, b) Cat-2, c) Cat-3, d) Cat-4
Table 1. Number of fundus images in each class

Classes          Number of images
No AMD           36135
Early stage      28495
Intermediate     36392
Advanced AMD     15853
To give a fair comparison with their results, we applied the same preprocessing methods as suggested in [12]. First, a square-shaped Region Of Interest (ROI) covering the retina was located. The green channel was then extracted, and the background luminance was estimated using a median filter with a kernel size equal to one-fourth of the ROI size. This estimated background was subtracted from the green channel. To boost contrast, the green values were centred on their mean intensity level and multiplied by two (Fig. 3).
Fig. 3. ROI that corresponds to the square outlined in the retina’s concentric circle and is the outcome of brightness normalization and contrast stretching applied to the image channel during preprocessing
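The preprocessing chain above (green-channel extraction, median-filter background estimation, subtraction, and contrast stretching) can be sketched as follows. This is an illustrative pure-Python sketch under the stated assumptions (nested [row][col] lists of (r, g, b) pixels; kernel size = ROI size / 4, clamped to an odd value); a real pipeline would use NumPy/OpenCV:

```python
from statistics import median

def preprocess_green_channel(rgb_image, roi_size):
    """Background-corrected, contrast-stretched green channel of a fundus image."""
    h, w = len(rgb_image), len(rgb_image[0])
    green = [[px[1] for px in row] for row in rgb_image]

    # Median-filter kernel: one-fourth of the ROI size, forced odd and >= 3.
    k = max(3, (roi_size // 4) | 1)
    r = k // 2

    # Estimate the background luminance with a sliding median window.
    background = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [green[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            background[y][x] = median(window)

    # Subtract the estimated background from the green channel.
    corrected = [[green[y][x] - background[y][x] for x in range(w)]
                 for y in range(h)]

    # Contrast stretch: centre on the mean intensity and double the spread.
    mean_val = sum(map(sum, corrected)) / (h * w)
    return [[2 * (v - mean_val) + mean_val for v in row] for row in corrected]
```

On a uniform image the background estimate equals the image, so the output is flat; an isolated bright lesion-like pixel stands out strongly after the subtraction and stretch.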
2.3 Feature Extraction
Extracting the BIF at random from the entire image would be wasteful and computationally expensive. In clinical practice, ophthalmologists evaluate a 2DD × 2DD zone centred on the macula to diagnose AMD. We therefore extract this 2DD × 2DD area first. The macula is a physiological component of the retina. It can be found on either the right or left side of the Optic Disc (OD) in the left and right eyes, so after establishing this information we locate the macula via the OD. The OD is located and its boundaries are defined using the OD detection and segmentation technique; determining the eye side is the next step. In a typical retinal fundus image of the left eye, the left half of the OD exhibits more blood vessels than the right half despite having lower intensity values. Additionally, the OD can commonly be seen in the left half of the left eye's retinal scans, allowing field II pictures to capture both the macula and the OD for drusen examination. In the right eye, these characteristics are inverted. Assuming the image has M × N pixels and the OD centre is at (xc, yc), the following computation is done:
1. Il and Ir denote the median illumination of the left and right portions of the OD, respectively.
2. Vl and Vr denote the blood vessel areas of the left and right halves of the OD, respectively.
3. Ld = N/2 − yc is the horizontal distance between the image centre and the OD centre.
The macula centre is then estimated as (xc + xd, yc + yd) or (xc − xd, yc − yd), where (xd, yd) is the typical relative deviation between the OD and the macula for a left or right eye, as determined from a database of 5000 images. As a result, the following entropies were extracted in this work: Yager (y), Renyi (r), Kapoor (kp), Shannon (sh), and Fuzzy (f) [13].
2.4 Feature Selection
The selection of characteristics is a difficult and intricate procedure. In the presence of other traits, a relatively significant feature may lose all of its significance.
As a consequence, the features chosen to feed into a classifier must complement one another. Good classification performance is also largely determined by feature extraction. As a result, we used the proposed method to choose an ideal set of characteristics rather than feature-reduction techniques. In comparison to other strategies, PSO has proven to be a successful feature selection method.
2.5 Detection of AMD
We employed the AlexNet (Fig. 5) DCNN model to tackle the referable AMD classification task. This model uses training to optimize the weights of all network layers. During the training process, the weights of more than 61 million convolutional filters were tuned. These networks also contained dropout, rectified linear unit activation, and contrast normalization phases in addition to the previously mentioned layers. The dropout
stage involved deliberately setting some of the output units (selected at random) to 0 to encourage functional redundancy in the network and serve as regularization. Our implementation made use of the Keras and TensorFlow DL libraries. Stochastic gradient descent with Nesterov momentum was used, with a learning rate of 0.001 (Fig. 4).
[Figure 4 shows the PSO loop: initialize the groups for each particle; find the best threshold for each particle's groups using the Otsu function; remove the small components and recombine the groups; move the particles by exploiting the flexibility of the PSO algorithm until the solution converges; then sort the particles, keep only the best particles for each group, and select the best particle.]
Fig. 4. PSO flowchart for AMD detection
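The PSO-based feature selection can be illustrated with a minimal binary PSO. This is an illustrative sketch, not the authors' exact algorithm: the fitness function, swarm size, and coefficients below are assumptions. Each particle is a 0/1 mask over the features, and a sigmoid of the velocity gives the probability of selecting each feature:

```python
import math
import random

def binary_pso(n_features, fitness, n_particles=8, n_iters=30, seed=0):
    """Binary PSO: maximize `fitness` over 0/1 feature masks."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (assumed)
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Sigmoid of the velocity = probability that the bit is 1.
                p_one = 1 / (1 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < p_one else 0
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

For example, with a toy fitness that rewards matching a known "useful feature" mask, the swarm quickly converges to that mask; in the paper's setting the fitness would instead score a classifier trained on the selected BIF subset.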
Fig. 5. Alexnet architecture
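The optimizer named in Sect. 2.5, stochastic gradient descent with Nesterov momentum, evaluates the gradient at a look-ahead point before updating the weights. A minimal scalar sketch of one update step, assuming a simple quadratic loss for illustration (in Keras this corresponds to `SGD(learning_rate=0.001, momentum=..., nesterov=True)`):

```python
def nesterov_sgd_step(w, v, grad_fn, lr=0.001, momentum=0.9):
    """One Nesterov-momentum SGD step: gradient at the look-ahead point
    w + momentum * v, then velocity and weight updates."""
    g = grad_fn(w + momentum * v)
    v = momentum * v - lr * g
    return w + v, v

# Minimize f(w) = (w - 3)^2 from w = 0; grad f(w) = 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(5000):
    w, v = nesterov_sgd_step(w, v, lambda x: 2 * (x - 3), lr=0.001)
# w converges toward 3.0
```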
3 Performance Analysis
The proposed model is run on a Ryzen 5/6 series CPU with an NVIDIA GTX GPU, a 1 TB HDD, and Windows 10 OS, together with Google Colaboratory, a free Google platform for creating deep learning models. PyTorch, a free Python library for building deep learning models, is used. The model is developed with a learning rate of 0.01, optimized with stochastic gradients, and evaluated with 3-fold cross-validation. The following metrics are assessed when comparing against contemporary models in the experimental investigation: F1-score, detection rate, TPR, FPR, and AUC. The effectiveness metrics used to evaluate the suggested model are shown in Table 2.

Table 2. Overall performance measures

Measures        Description
Accuracy        Accuracy = (TP + TN) / (TP + FP + TN + FN)
Sensitivity     Sensitivity = TP / (TP + FN)
Specificity     Specificity = TN / (TN + FP)
Precision       Precision = TP / (TP + FP)
Recall          Recall = TP / (TP + FN)
F1-score        F1 = 2 × (Precision × Recall) / (Precision + Recall)
Detection rate  DetectionRate = TP / (TP + FN)
TPR             TrueNegativeRate = TN / (TN + FP) × 100
FPR             FalseNegativeRate = FN / (TP + FN) × 100
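The count-based measures in Table 2 can be computed directly from confusion-matrix counts. In the sketch below the TPR and FPR rows use the standard definitions (TPR = TP/(TP+FN), FPR = FP/(FP+TN)); note that the corresponding table rows as printed appear to give the true-negative-rate and false-negative-rate formulas instead:

```python
def classification_metrics(tp, fp, tn, fn):
    """Measures from Table 2, computed from raw confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # equals sensitivity and detection rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "detection_rate": recall,
        "tpr": 100 * recall,            # true positive rate, in percent
        "fpr": 100 * fp / (fp + tn),    # false positive rate, in percent
    }
```

For example, with tp=90, fp=10, tn=85, fn=15 this yields accuracy 0.875 and precision 0.9.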
Table 3 depicts the overall analysis of the accuracy, sensitivity, and specificity of the proposed model. Figure 6 gives the graphical comparison, in which the proposed model (accuracy: 0.96, sensitivity: 0.97, specificity: 0.97) outperforms state-of-the-art models (accuracy: 0.89, specificity: 0.9, sensitivity: 0.92).

Table 3. Overall analysis under accuracy, sensitivity, specificity

Models                          Accuracy  Specificity  Sensitivity
Other state-of-the-art models   0.89      0.9          0.92
Proposed model                  0.96      0.97         0.97
Table 4 presents the full evaluation in terms of precision, recall, and F1-score. Figure 7 illustrates the graphical comparison of the models, where the suggested model (precision: 0.9, recall: 0.83, F1-score: 0.85) performs better than state-of-the-art approaches (precision: 0.82, recall: 0.8, F1-score: 0.79).
Fig. 6. Models vs a) Accuracy, b) Sensitivity, c) Specificity
Table 4. Overall analysis under precision, recall, F1-score

Models                          Precision  Recall  F1-score
Other state-of-the-art models   0.82       0.8     0.79
Proposed model                  0.9        0.83    0.85
Fig. 7. Models vs a) Precision, b) Recall and c) F1-score
Table 5 shows the overall evaluation under the detection rate, TPR, and FPR columns. In the graphical display of models under detection rate, TPR, and FPR in Fig. 8, the proposed model (detection rate: 0.94, TPR: 0.9, FPR: 0.1) outperforms state-of-the-art models (detection rate: 0.87, TPR: 0.85, FPR: 0.15). The AUC curve is graphically represented in Fig. 9, which demonstrates that the suggested model outperforms competing approaches. Figure 10 depicts the training and testing over the epoch range.

Table 5. Overall analysis under detection rate, TPR, FPR

Models                          Detection rate  TPR   FPR
Other state-of-the-art models   0.87            0.85  0.15
Proposed model                  0.94            0.9   0.1
Fig. 8. Models vs a) Detection rate, b) TPR, c) FPR
Fig. 9. Models vs AUC score
Fig. 10. Models vs epoch range for testing and training evaluation.
4 Conclusion
In this research, combining features selected by the PSO algorithm with an AlexNet classifier, we proposed a unique technique for diagnosing AMD with a 96% overall accuracy. Likewise, this is the first attempt to separate the fundus pictures into normal, dry ARMD, and wet ARMD classes. The promising performance of the proposed algorithm can help ophthalmologists in their diagnosis of ARMD. Our framework distinguishes the class of ARMD precisely, and thus proper treatment can be given to slow the progression of the illness. The proposed framework is assembled using advanced tools which help to improve the overall algorithm. Such an automated ARMD screening framework can be utilized in annual mass eye screenings organized for elderly people. Future work will integrate other advanced networks to create a hybrid form along with deep learning stages, to improve the overall performance. Acknowledgement. This research has been financially supported by The Analytical Center for the Government of the Russian Federation (Agreement No. 70-2021-00143 dd. 01.11.2021, IGK 000000D730321P5Q0002). The authors acknowledge the technical support and review feedback from the AILSIA symposium held in conjunction with the 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022).
References

1. Schlegl, T., et al.: Fully automated detection and quantification of macular fluid in OCT using deep learning. Ophthalmology 125(4), 549–558 (2018)
2. Perepelkina, T., Fulton, A.B.: Artificial intelligence (AI) applications for age-related macular degeneration (AMD) and other retinal dystrophies. In: Seminars in Ophthalmology, vol. 36, no. 4, pp. 304–309. Taylor & Francis (May 2021)
3. Müller, P.L., et al.: Reliability of retinal pathology quantification in age-related macular degeneration: implications for clinical trials and machine learning applications. Transl. Vis. Sci. Technol. 10(3), 4 (2021)
4. Quellec, G., et al.: Feasibility of support vector machine learning in age-related macular degeneration using small sample yielding sparse optical coherence tomography data. Acta Ophthalmol. 97(5), e719–e728 (2019)
5. Mookiah, M.R.K., et al.: Automated diagnosis of age-related macular degeneration using greyscale features from digital fundus images. Comput. Biol. Med. 53, 55–64 (2014)
6. Schranz, M., et al.: Correlation of vascular and fluid-related parameters in neovascular age-related macular degeneration using deep learning. Acta Ophthalmol. 101, e95–e105 (2022)
7. Fang, V., Gomez-Caraballo, M., Lad, E.M.: Biomarkers for nonexudative age-related macular degeneration and relevance for clinical trials: a systematic review. Mol. Diagn. Ther. 25(6), 691–713 (2021). https://doi.org/10.1007/s40291-021-00551-5
8. Thomas, A., Harikrishnan, P.M., Gopi, V.P.: FunNet: a deep learning network for the detection of age-related macular degeneration. In: Edge-of-Things in Personalized Healthcare Support Systems, pp. 157–172. Academic Press (2022)
9. Lee, C.S., Baughman, D.M., Lee, A.Y.: Deep learning is effective for classifying normal versus age-related macular degeneration OCT images. Ophthalmol. Retina 1(4), 322–327 (2017)
10. Burlina, P., Freund, D.E., Joshi, N., Wolfson, Y., Bressler, N.M.: Detection of age-related macular degeneration via deep learning. In: 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), pp. 184–188. IEEE (April 2016)
11. Peng, Y., et al.: DeepSeeNet: a deep learning model for automated classification of patient-based age-related macular degeneration severity from colour fundus photographs. Ophthalmology 126(4), 565–575 (2019)
12. González-Gonzalo, C., et al.: Evaluation of a deep learning system for the joint automated detection of diabetic retinopathy and age-related macular degeneration. Acta Ophthalmol. 98(4), 368–377 (2020)
13. Kosko, B.: Fuzzy entropy and conditioning. Inf. Sci. 40, 165–174 (1986)
Inclusive Review on Extractive and Abstractive Text Summarization: Taxonomy, Datasets, Techniques and Challenges

Gitanjali Mishra1(B), Nilambar Sethi1, and L. Agilandeeswari2(B)

1 GIET University, Gunupur 765022, Odisha, India
[email protected], [email protected]
2 School of Information Technology and Engineering, VIT Vellore, Vellore 632014, TN, India
[email protected]
Abstract. Condensing a lengthy text into a manageable length while maintaining the essential informational components and the meaning of the content is known as summarization. Manual text summarizing is a time-consuming and generally arduous activity, which is a major driving force behind academic research in this area. Automatic Text Summarization (ATS) has significant uses in a variety of Natural Language Processing (NLP) related activities, including text classification, question answering, summarizing legal texts and news, and creating headlines. This is an emerging research field in which many researchers from popular companies, namely Google, Microsoft, Facebook, etc., are involved. This motivates us to present an inclusive review of extractive and abstractive summarization techniques for various inputs. In this paper, we present a comparative study of different models, classified based on the techniques used. We have also classified them based on the dataset used in some places for better understanding, and the parametric evaluation of these techniques and their challenges are also presented. Thus, the study presents a clear-cut view of the state of text summarization techniques and provides a roadmap for new researchers in this field. Keywords: Automatic Text Summarization (ATS) · Natural Language Processing (NLP) · Extractive summarization · Abstractive summarization · Input documents
1 Introduction
To solve the problem of ever-increasing data stored in an unstructured way, we need to be able to extract information from these data at any time to acquire knowledge. But dealing with this vast amount of data is not easy: we first need to navigate the data and reduce its size. Data can come in different forms such as text, image, and video, but most data is in the form of text; therefore, text summarization can work on it. Extracting the most essential and salient part from a given large text content © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 65–80, 2023. https://doi.org/10.1007/978-3-031-35501-1_7
66
G. Mishra et al.
is known as text summarization. Although the efforts to create automatic summaries started 50 years back, automatic text summarization has recently experienced exponential growth because of new technological advances. Manual text summarization is impossible for this huge amount of data, and thus Automatic Text Summarization (ATS) plays a major role. ATS is classified into several types on the basis of its inputs. The major classification of ATS is based on summary generation, i.e., whether the summary is generated by extracting sentences from the input text document verbatim. Such extractive summarization does not involve condensing the input in any format; it mainly deals with picking up sentences that convey meaningful information to generate a summary. On the contrary, abstractive summarization focuses on creating a summary based on an understanding of the key ideas and details found in the original text. In order to analyze and comprehend the text and to look for new terms, linguistic tools are also utilized in abstractive summarization. The taxonomy of ATS is presented in Fig. 1 on the basis of the following: (i) genre, (ii) input document frequency, (iii) summary types, (iv) languages used, (v) purpose of summary, and (vi) methods used to summarize.
Fig. 1. Taxonomy of text summarization
The rest of the article elaborates the taxonomy of text summarization as follows: Sect. 2 details the various genres and their datasets, Sect. 3 delivers the evaluation metrics, Sect. 4 covers summarization based on the frequency of input documents, Sect. 5 explains the types of summary, Sect. 6 clarifies the languages used, Sects. 7 and 8 explicate the purpose of summary and the various methods used for summarization. Finally, conclusions and future directions are presented in Sect. 9.
2 Various Genres and Its Dataset Text summarization is required for various genres in daily life, namely, (i) short documents [1], including news articles, digests (summaries of stories on the same topic),
highlights (summaries of some events), and feedback of any product, book, music, and movie. (ii) Long documents [2] such as abstracts of research papers (arXiv), biomedical texts (PubMed), educational texts, novels (books, or magazines) [3], online blogs, and debate forums [4] as shown in Fig. 2.
Fig. 2. Various genres
2.1 News Articles
CNN/Daily Mail: This English-language dataset, the combined form of CNN and Daily Mail, contains just over 300,000 unique news stories written by journalists for CNN and the Daily Mail. The CNN and Daily Mail datasets are united to obtain more data instances for superior training. Although the initial version was developed for automated reading comprehension and abstractive question answering, the current version supports both extractive and abstractive summarization. As per https://huggingface.co/datasets/viewer/?dataset=cnn_dailymail&config=3.0.0, the average token counts for the articles and the highlights are 781 and 56, respectively.
BBC News: This dataset was made using a dataset for data categorization from the 2004–2005 work by D. Greene and P. Cunningham [5], which consists of 2225 documents from the BBC news website relating to news from five topical divisions (http://mlg.ucd.ie/datasets/bbc.html).
DUC: The Document Understanding Conference (DUC) datasets are generated by the National Institute of Standards and Technology (NIST). The DUC corpora from 2002, 2003, 2004, 2006, and 2007 are used for evaluation; the URL is http://wwwnlpir.nist.gov/projects/duc/data.
2.2 Research Papers arXiv: There are more than 2 million academic publications in eight subject areas in the arXiv preprint database’s complete corpus as of the arXiv annual report 2021.
2.3 Biomedical Texts
The popular biomedical literature datasets, which are long documents, are S2ORC [6], PubMed [7], and CORD-19 [8].
S2ORC: S2ORC is a publicly available dataset that contains articles from scientific journals in fields including biology, medicine, and computer science.
PubMed: PubMed is a frequently utilized dataset that includes scientific publications from the biomedical field. The study [7] that introduced the original PubMed-Long dataset used each document's abstract as its gold summary and the full text as input. Recent studies have modified this dataset, which we refer to as PubMed-Short, by using only the document's introduction as input [9].
CORD-19: Scientific works on COVID-19 are included in the free CORD-19 dataset. The COVID-19 Open Research Dataset (CORD-19) was created in response to the COVID-19 epidemic by the White House and a coalition of top research organizations. The CORD-19 database contains more than a million research publications about COVID-19, SARS-CoV-2, and related coronaviruses, including more than 400,000 full-text articles. Some of the existing research related to these long documents is summarized in Table 1.

Table 1. Existing works for long documents

Ref | Methodology | Advantage | Limitation | Dataset | Metrics
[10] | A novel neural single-document extractive summarization model | In a parameter-lean and modular architecture, the proposed method gives the best findings among neural networks | Traditional and neural methods need to be combined to get effective results | arXiv and PubMed | arXiv: R1 43.58, R2 17.37, RL 29.30, METEOR 21.71; PubMed: R1 44.81, R2 19.74, RL 31.48, METEOR 20.83
[11] | MemSum (Multi-step Episodic Markov decision process extractive SUMmarizer) | MemSum gives high quality and low redundancy in the generated summaries | The fluency of the extracted summaries should be improved | PubMed, PubMed-trunc, arXiv, and GovReport | PubMed: R1 49.25, R2 22.94, RL 44.42; PubMed-trunc: R1 43.08, R2 16.71, RL 38.30; arXiv: R1 48.42, R2 20.30, RL 42.54
3 Metrics Used for Evaluation
The common quantitative metric for measuring a summarization model is Recall-Oriented Understudy for Gisting Evaluation (ROUGE) [12]. It is a co-selection-based evaluation tool that counts overlapping units, such as n-grams (ROUGE-N), the Longest Common Subsequence (ROUGE-L or RL), and the Weighted Longest Common Subsequence (ROUGE-W), between the candidate summary and human-generated summaries; the quality of the generated summary is assessed using this metric. ROUGE-N is a formal n-gram recall measure between the generated summary and the ground-truth summaries (N = 2 in our studies). If N = 1, ROUGE-1 (R1) over unigrams is obtained, and bigrams with N = 2 give ROUGE-2 (R2).

ROUGE-N = [ Σ_{S ∈ {Ground truth Summaries}} Σ_{gram_n ∈ S} Count_match(gram_n) ] / [ Σ_{S ∈ {Ground truth Summaries}} Σ_{gram_n ∈ S} Count(gram_n) ]   (1)

where n represents the n-gram's length, Count_match(gram_n) is the maximum number of n-grams co-occurring in the generated summary and the set of reference summaries, and Count(gram_n) is the total count of n-grams in the reference summaries.
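Equation (1) can be sketched as a short function. This is an illustrative implementation of ROUGE-N recall under the standard "clipped count" interpretation of Count_match (co-occurring n-grams, clipped by the candidate's counts); inputs are assumed to be token lists:

```python
from collections import Counter

def rouge_n(candidate, references, n=1):
    """ROUGE-N recall: clipped n-gram overlap between the candidate summary
    and the reference summaries, divided by the total n-gram count in the
    references (Eq. 1)."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate)
    match = total = 0
    for ref in references:
        ref_counts = ngrams(ref)
        total += sum(ref_counts.values())
        # Count_match: co-occurring n-grams, clipped by the candidate counts.
        match += sum(min(c, cand[g]) for g, c in ref_counts.items())
    return match / total if total else 0.0
```

For example, for the candidate "the cat sat" against the reference "the cat sat on the mat", three of the six reference unigrams are matched, so ROUGE-1 is 0.5.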
4 Input Document Frequency
Summarization can be categorized as single-document or multi-document depending on the number of input sources used. Single-document summarization refers to the process of generating a text summary from a single document, as opposed to multi-document summarization, which uses multiple documents as input. Certain problems can lower the quality of text summaries, particularly in multi-document summarization, where pertinent data is dispersed over numerous inputs [12]. For instance, searching the MEDLINE biomedical literature database for articles about a certain ailment may return hundreds of documents. In this situation, all of the retrieved articles may contain crucial information, so it is necessary to give the user a summary of the key points contained in the papers. As a result, the summary should, to the greatest extent feasible, cover the several crucial subjects involved [13, 14] (Table 2).
G. Mishra et al.

Table 2. Existing systems based on input document frequency
[15] Archive-based Micro Genetic-2 Algorithm (AMGA2). Advantage: achieves superiority in terms of convergence rate and ROUGE scores. Limitation: –. Datasets: DUC-2001 and DUC-2002. Metrics: DUC-2001 R1: 0.50, R2: 0.29; DUC-2002 R1: 0.52, R2: 0.31.
[16] Neural networks and genetic algorithms. Advantage: neural networks can be trained for different document sizes and different user needs, and the genetic algorithm obtains the most precise summary. Limitation: the best fitness function is still to be found using different machine learning techniques. Dataset: DUC2002. Metrics: simulation results: 0.6067; fivefold simulation: 0.5593.
[17] Optimization strategy. Advantage: the proposed optimization shows the best performance on the objective functions and the similarity/dissimilarity measure between sentences. Limitation: optimization methods need to be used to solve the multi-document summarization problem. Datasets: DUC2001 and DUC2002. Metrics: R1: 0.72, R2: 0.67.
[18] Score-based and supervised machine learning techniques. Advantage: shows the best results among all state-of-the-art methods for summarizing Arabic text. Limitation: the weights of the extracted features should be reflected through genetic algorithms. Dataset: Essex Arabic Summaries Corpus (EASC). Metrics: R1: 0.67, R2: 0.63.
[19] Term Frequency-Inverse Document Frequency (TF-IDF). Advantage: TF-IDF summarizes text automatically, comparing favourably with existing methods. Limitation: –. Dataset: online text summaries. Metrics: R1: 0.66, R2: 0.672.
[20] A modified greedy algorithm. Advantage: the proposed method gives superior results. Limitation: –. Dataset: DUC'04. Metrics: R1: 0.38.
[21] Support vector regression (SVR). Advantage: regression models give the best results for estimating the importance of sentences. Limitation: regression models can be used in different ways to obtain a greater number of features for sentence relations or document structure. Datasets: DUC 2005–2007. Metrics: DUC 2005 R1: 0.0641, R-SU4: 0.1208; DUC 2006 R1: 0.0834, R-SU4: 0.1387.
[22] Evaluation framework. Advantage: gives the best results for multi-document summarization. Limitation: INTSUMM systems should be evaluated according to their agreement with subjective and personalized use. Dataset: DUC 2006. Metrics: R1 for S1 overall: 74.2; R2 for S1 overall: 77.1.
[23] Modified quantum-inspired genetic algorithm (QIGA). Advantage: QIGA performs extractive text summarization. Limitation: the proposed method can be extended with sentiment analysis and co-reference resolution to improve summarization. Datasets: DUC 2005 and 2007. Metrics: R1: 0.47, R2: 0.12, R-SU4: 0.1885.
[24] Fuzzy logic. Advantage: the proposed fuzzy logic removes redundant information in multi-document summarization. Limitation: sentence ordering is a difficult task in document summarization and question answering systems. Dataset: DUC 2004. Metrics: R2: 0.15, R4: 0.06.
[25] Fuzzy inference systems. Advantage: the fuzzy inference system gives better results than the neural networks utilized. Limitation: with deep learning algorithms, feature vectors should be learned from raw data rather than hand-crafted. Dataset: DUC 2002. Metrics: R1: 0.65, R2: 0.58, RL: 0.66.
[26] Graph independent sets. Advantage: gives the best results for multi-document extraction. Limitation: –. Datasets: DUC-2002 and DUC-2004. Metrics: DUC-2002 R1: 0.38, R2: 0.51; DUC-2004 R4: 0.59.
[27] Novel deep learning based method. Advantage: the ensemble of statistical models gives the best results, achieving significant accuracy. Limitation: the influence of different similarity measures on the deep learning models should be studied. Datasets: DUC 2002, DUC 2001 and Movie datasets. Metrics: R1: 0.46, R2: 0.08.
5 Summary Types
The popular summary types include (i) generic, (ii) domain-specific, (iii) query-oriented, (iv) user-focused, and (v) topic-focused. Generic: domain-independent or generic summarization uses general features to extract the key passages from the text. Domain-Specific: this summarization uses domain-specific knowledge to generate the summary; today's researchers have shifted their focus to domain-specific techniques. Query-Oriented: it depends on the query raised by the audience for whom the summary is intended; different queries produce different summaries, as it reacts to user-initiated queries. User-Focused: user-focused techniques develop summaries that respond to the interests of specific users. Topic-Focused: this summary places extra emphasis on a specific topic of the document (Fig. 3 and Table 3).
Fig. 3. Types of summary
Table 3. Related works based on summary types Generic oriented Article Methodology
Advantages
Limitations
Dataset
Results
CNN/Daily Mail dataset
R1: 39.53, R2: 17.28 RL: 36.38
[28]
Hybrid Reduces Low levels of pointer-generator inaccuracies and abstraction network repetition
[29]
Intra-attention neural network that combines standard supervised word prediction and reinforcement learning (RL)
Better performance for CNN/Daily mail dataset
Improvement is CNN/Daily needed for Mail and New short output York Times sequences
CNN/Daily Mail R1: 41.16, R2: 15.75 RL: 39.08 NYT - R1: 47.22 R2: 30.51, RL: 43.27
[30]
Neural attention Neural Machine transformation
A fully data-driven approach to abstractive sentence summarization that scales to a large amount of training data
An efficient alignment and consistency in generation is needed
Baseline: MOSES + DUC 2004 R1: 26.50 R2: 8.13 RL: 22.85 Gigaword R1: 28.77, R2: 12.10 RL: 26.4
DUC- 2004 s Gigaword
(continued)
74
G. Mishra et al. Table 3. (continued)
Generic oriented Article Methodology
Advantages
Limitations
Dataset
Results
DUC 2005, 2006, 2007
DUC2005 R1: 45.4, R2 = 13.0, RL = 18.3 DUC2007 R1: 46.0, R2: 12.9, RL: 18.5
DUC QFS (DUC 2005, 2006 and 2007)
DUC 2005: R1: 12.65, R2: 1.61, RL: 11.79 DUC 2006 R1–16.34, R2–2.56, RL-14.69 DUC 2007 R1: 17.80, R2: 03.45, RL: 16.38
Query – Oriented [31]
Weakly Supervised Learning model using a transformer and pre-trained BERTSUM
For Single and Multi documents Domain adaptation techniques for solving lack of training data
[32]
Seq2seq with attention model
Incorporates query relevance to pretrained abstractive model-neural seq2seq models with attention mechanism Multi document based query oriented summariztion
[33]
EncoderDecoder with query attention and diversity attention
Introduced a – new query-based summarization dataset building on Debatepedia. The problem of Repetition of phrases is solved
A dataset from R1: 41.26, Debatepedia R2: 18.75, an RL: 40.43 encyclopedia of pro and con arguments and quotes on critical debate topic
Model uses Oracle relevance model, so there is gap between the practice
Topic focused Ref
Methodology
Advantages
Limitations
Dataset
Results
[34]
Oracle score-based Probability distribution of unigrams
An automatic method of summarization is performing good
The oracle score based is working good for task-based summarization
DUC 2005 and R1: 0.35, 2006 R2: 0.09
6 Languages Used
Input documents can be monolingual or multilingual in terms of the language used (Table 4).
Table 4. Existing systems based on languages used

Monolingual:
[35] Transformer encoder-decoder. Advantage: the proposed methods give quality monolingual models. Limitation: –. Dataset: Catalan and Spanish newspaper articles (DACSA). Metrics: Catalan Test1 BERTScore: 72.03, Test2 BERTScore: 70.56; Spanish Test1 BERTScore: 73.11, Test2 BERTScore: 71.26.

Multilingual:
[36] Restricted Boltzmann machine. Advantage: features are explored to improve the relevance of sentences. Limitation: more features could be added to obtain more relevant sentences. Dataset: Hindi and English documents. Metrics: Recall: 0.83, Precision: 0.87, F-Measure: 0.85.
[37] Genetic programming and linear programming. Advantage: the proposed system achieved good results on benchmark datasets. Limitation: –. Datasets: DUC 2002, DUC 2004, and DUC 2007. Metrics: –.
[38] MUSE (Multilingual Sentence Extractor). Advantage: once trained, the proposed method can be used for two languages. Limitation: the similarity-based metrics need to be improved in the multilingual domain. Dataset: English and Hebrew. Metrics: ROUGE-1 English: 0.4461, Hebrew: 0.5921, English + Hebrew: 0.4633.
[39] Enhanced feature vector. Advantage: with the proposed method the text summarization results are encouraging. Limitation: the flow of the summarized text is not very smooth. Dataset: DUC dataset. Metrics: –.
[40] BERT models. Advantage: the monolingual models give the best results compared with the multilingual models. Limitation: for keyword extraction, a combination with other methods should be explored. Datasets: CNN/Daily Mail, NYT and XSum. Metrics: ROUGE-1: 77.44, ROUGE-2: 52.01.
7 Purpose of Summary
The summary can be indicative or informative. Indicative: the original document's metadata is included in the indicative summary. Without incorporating their contents, it gives a general sense of the documents' main topics. The book's title pages must be provided in indicative reports. In the medical field, an indicative summary can be used to give information on the medical history and state of the patient. It is advantageous since it provides readers with a general idea of a book's contents and enables them to consume the content more quickly. It is best suited for essays, editorials, books, etc. Informative: informative text summarization creates a summary to communicate the information (primary ideas, points, highlights, etc.) present in the original document, as in research articles where authors summarize their findings. This summary can be utilized in the clinical setting in place of the original patient record; it assists researchers and medical experts in obtaining more complete information from the appropriate and required paper. It is best suited for investigations, surveys and experiments.
8 Methods Used
Whether or not training data is necessary determines whether a summarization system is supervised or unsupervised. A supervised system is trained using labelled data to extract key concepts from documents. Predominantly unsupervised systems produce summaries directly from documents; they do not rely on any training examples labelled by people (Table 5). From this study, it is clear that for multilingual summarization, BERT models [40] produced better results, and among the methods used, graph-based models [43] under the unsupervised setting produced a better ROUGE score. Thus, researchers have to analyze the purpose of the summarization and then choose the appropriate type.
Table 5. Related systems based on supervised and unsupervised methods

Supervised:
[41] BERT models: BERT-base, DistilBERT, SqueezeBERT. Advantage: the BERT models give the best results for extractive summarization. Limitation: hyperparameter tuning is needed to generate better summarization performance, and a domain-specific dataset is needed to produce a medical or academic extractive summarizer. Dataset: CNN/DM. Metrics: BERT-base R1: 43.23, R2: 20.24, RL: 39.63; DistilBERT R1: 42.54, R2: 19.53, RL: 38.86; SqueezeBERT R1: 42.54, R2: 19.53, RL: 38.86.

Unsupervised:
[42] RankSum. Advantage: captures multi-dimensional information from the document using keywords, sentence position, sentence embeddings and signature topics. Limitation: the RankSum method is yet to be used for abstractive text summarization. Datasets: CNN/DailyMail and DUC 2002. Metrics: R1: 44.5, R2: 24.5, RL: 41.0.
[43] Graph-based Bengali text document summarization model. Advantage: –. Limitation: cannot generate new words. Dataset: 139 samples of human-written abstractive document-summary pairs. Metrics: R1: 61.62, R2: 55.97, RL: 61.09.

Supervised and unsupervised:
[44] Graph models. Advantage: gives superior results for extractive text summarization using supervised and unsupervised graph models. Limitation: obtaining the prior values for nodes in the graph needs improvement. Datasets: DUC2001 and DUC2002. Metrics: DUC2001 R1: 0.44, R2: 0.19; DUC2002 R1: 0.47, R2: 0.21.
9 Conclusion and Future Directions
This article presents an inclusive survey of extractive and abstractive text summarization mechanisms, with their taxonomy, datasets, methodologies and the challenges in each approach.
From this study, we infer that the genres most in need of summarization are research articles (arXiv) and biomedical texts (PubMed), in addition to news articles. Regarding input frequency, multi-document summarization attracts more research nowadays, particularly for summarizing patient reports. Among the summary types, domain-specific and query-oriented summarization are more needed areas of research compared to generic summarization. The languages used can be monolingual or multilingual. The purpose of the summary depends mostly on the target field; informative summaries are used particularly in the biomedical domain. The methods used can be a hybrid of supervised and unsupervised learning, since labelled data is scarce nowadays, and semi-supervised approaches also have scope in the future. Thus, future research can target the biomedical domain, dealing with multi-documents in query-oriented, informative, semi-supervised hybrid extractive and abstractive summarization techniques.
References
1. Mishra, G., Sethi, N., Agilandeeswari, L.: Two phase ensemble learning based extractive summarization for short documents. In: Proceedings of the 14th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2022), pp. 129–142. Springer, Cham (2023)
2. Mishra, G., Sethi, N., Agilandeeswari, L.: Fuzzy Bi-GRU based hybrid extractive and abstractive text summarization for long multi-documents. In: Proceedings of the 14th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2022), pp. 153–166. Springer, Cham
3. Wu, Z., Lei, L., Li, G., Huang, H., Zheng, C., Chen, E., et al.: A topic modeling based approach to novel document automatic summarization. Expert Syst. Appl. 84, 12–23 (2017). https://doi.org/10.1016/j.eswa.2017.04.054
4. Cai, X., Li, W.: A spectral analysis approach to document summarization: clustering and ranking sentences simultaneously. Inf. Sci. 181(18), 3816–3827 (2011). https://doi.org/10.1016/j.ins.2011.04.052
5. Greene, D., Cunningham, P.: Practical solutions to the problem of diagonal dominance in kernel document clustering. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 377–384 (June 2006)
6. Lo, K., Wang, L.L., Neumann, M., Kinney, R., Weld, D.S.: S2ORC: the semantic scholar open research corpus. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4969–4983 (2020)
7. Cohan, A., et al.: A discourse-aware attention model for abstractive summarization of long documents. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 2 (Short Papers), pp. 615–621 (2018)
8. Wang, L.L., et al.: CORD-19: the COVID-19 open research dataset. In: Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020 (2020)
9. Zhong, M., Liu, P., Chen, Y., Wang, D., Qiu, X., Huang, X.-J.: Extractive summarization as text matching. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6197–6208 (2020)
10. Xiao, W., Carenini, G.: Extractive summarization of long documents by combining global and local context. arXiv preprint arXiv:1909 (2019)
11. Gu, N., Ash, E., Hahnloser, R.H.: MemSum: extractive summarization of long documents using multi-step episodic Markov decision processes. arXiv preprint arXiv:2107 (2021)
12. Lin, C.-Y., Hovy, E.: Automatic evaluation of summaries using n-gram co-occurrence statistics. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1, pp. 71–78. Association for Computational Linguistics (2003)
13. Ferreira, R., de Souza Cabral, L., Freitas, F., Lins, R.D., de Franca Silva, G., Simske, S.J., et al.: A multi-document summarization system based on statistics and linguistic treatment. Expert Syst. Appl. 41, 5780–5787 (2014)
14. Sankarasubramaniam, Y., Ramanathan, K., Ghosh, S.: Text summarization using Wikipedia. Inf. Process. Manag. 50, 443–461 (2014)
15. Chatterjee, N., Mittal, A., Goyal, S.: Single document extractive text summarization using genetic algorithms. In: Third International Conference on Emerging Applications of Information Technology (2012)
16. Chatterjee, N., Jain, G., Bajwa, G.S.: Single document extractive text summarization using neural networks and genetic algorithm. In: Science and Information Conference (2018)
17. Saini, N., Saha, S., Chakraborty, D., Bhattacharyya, P.: Extractive single document summarization using binary differential evolution: optimization of different sentence quality measures. PLoS ONE 14, e0223477 (2019)
18. Qaroush, A., Farha, I.A., Ghanem, W., Washaha, M., Maali, E.: An efficient single document Arabic text summarization using a combination of statistical and semantic features. J. King Saud Univ.-Comput. Inf. Sci. 33, 677–692 (2021)
19. Christian, H., Agus, M.P., Suhartono, D.: Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF). ComTech: Comput. Math. Eng. Appl. 7, 285–294 (2016)
20. Lin, H., Bilmes, J.: Multi-document summarization via budgeted maximization of submodular functions. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (2010)
21. Ouyang, Y., Li, W., Li, S., Lu, Q.: Applying regression models to query-focused multi-document summarization. Inf. Process. Manag. 47, 227–237 (2011)
22. Shapira, O., Pasunuru, R., Ronen, H., Bansal, M., Amsterdamer, Y., Dagan, I.: Extending multi-document summarization evaluation to the interactive setting. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 657–677 (2021)
23. Mojrian, M., Mirroshandel, S.A.: A novel extractive multi-document text summarization system using quantum-inspired genetic algorithm: MTSQIGA. Expert Syst. Appl. 171, 114555 (2021)
24. Patel, D., Shah, S., Chhinkaniwala, H.: Fuzzy logic based multi document summarization with improved sentence scoring and redundancy removal technique. Expert Syst. Appl. 134, 167–177 (2019)
25. Mutlu, B., Sezer, E.A., Akcayol, M.A.: Multi-document extractive text summarization: a comparative assessment on features. Knowl.-Based Syst. 183, 104848 (2019)
26. Uçkan, T., Karcı, A.: Extractive multi-document text summarization based on graph independent sets. Egypt. Inform. J. 21, 145–157 (2020)
27. Abdi, A., Hasan, S., Shamsuddin, S.M., Idris, N., Piran, J.: A hybrid deep learning architecture for opinion-oriented multi-document summarization based on multi-feature fusion. Knowl.-Based Syst. 213, 106658 (2021)
28. See, A., Liu, P.J., Manning, C.D.: Get to the point: summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368 (2017)
29. Paulus, R., Xiong, C., Socher, R.: A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304 (2017)
30. Rush, A.M., Chopra, S., Weston, J.: A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685 (2015)
31. Laskar, M.T., Hoque, E., Huang, J.X.: Domain adaptation with pre-trained transformers for query-focused abstractive text summarization. Comput. Linguist. 48, 279–320 (2022)
32. Baumel, T., Eyal, M., Elhadad, M.: Query focused abstractive summarization: incorporating query relevance, multi-document coverage, and summary length constraints into seq2seq models. arXiv preprint arXiv:1801.07704 (2018)
33. Nema, P., Khapra, M., Laha, A., Ravindran, B.: Diversity driven attention model for query-based abstractive summarization. arXiv preprint arXiv:1704.08300 (2017)
34. Conroy, J., Schlesinger, J.D., O'Leary, D.P.: Topic-focused multi-document summarization using an approximate oracle score. In: Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions (2006)
35. Ahuir, V., Hurtado, L.-F., González, J.Á., Segarra, E.: NASca and NASes: two monolingual pre-trained models for abstractive summarization in Catalan and Spanish. Appl. Sci. 11, 9872 (2021)
36. Singh, S.P., Kumar, A., Mangal, A., Singhal, S.: Bilingual automatic text summarization using unsupervised deep learning. In: International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT) (2016)
37. Litvak, M., Vanetik, N., Last, M., Churkin, E.: Museec: a multilingual text summarization tool. In: Proceedings of ACL-2016 System Demonstrations (2016)
38. Litvak, M., Last, M., Friedman, M.: A new approach to improving multilingual summarization using a genetic algorithm. In: 48th Annual Meeting of the Association for Computational Linguistics (2010)
39. Patel, A., Siddiqui, T., Tiwary, U.S.: A language independent approach to multilingual text summarization. Large scale semantic access to content (text, image, video, and sound) (2007)
40. To, H.Q., Nguyen, K.V., Nguyen, N.L.-T., Nguyen, A.G.-T.: Monolingual versus multilingual bertology for Vietnamese extractive multi-document summarization. arXiv preprint arXiv:2108 (2021)
41. Abdel-Salam, S., Rafea, A.: Performance study on extractive text summarization using BERT models. Information 13, 67 (2022)
42. Joshi, A., Fidalgo, E., Alegre, E., Alaiz-Rodriguez, R.: RankSum—an unsupervised extractive text summarization based on rank fusion. Expert Syst. Appl. 200, 116846 (2022)
43. Rayan, C.R., Nayeem, M.T., Mim, T.T., Chowdhury, M., Rahman, S., Jannat, T.: Unsupervised abstractive summarization of Bengali text documents. arXiv preprint arXiv:2102.04490 (2021)
44. Mao, X., Yang, H., Huang, S., Liu, Y., Li, R.: Extractive summarization using supervised and unsupervised learning. Expert Syst. Appl. 133, 173–181 (2019)
COVID-ViT: COVID-19 Detection Method Based on Vision Transformers

Luis Balderas1(B), Miguel Lastra2, Antonio J. Láinez-Ramos-Bossini3, and José M. Benítez1

1 Department of Computer Science and Artificial Intelligence, DiCITS, DaSCI, University of Granada, 18071 Granada, Spain [email protected]
2 Department of Software Engineering, DiCITS, DaSCI, University of Granada, 18071 Granada, Spain
3 Department of Radiology, Hospital Universitario Virgen de las Nieves, 18014 Granada, Spain

Abstract. The Coronavirus Disease 2019 (COVID-19) has had a devastating impact on healthcare systems, requiring improvements to the screening process of infected patients in Emergency Departments, with chest radiography as a fundamental approach. This work provides a thoroughly documented overview of how cutting-edge technology such as Vision Transformers can be used for diagnosing COVID-19 by analyzing chest X-rays (CXR), including an explanation of how the network is fine-tuned and how it was validated. Through the COVID-Net Open Source Initiative, which provides a dataset of 30000 CXR images, our proposed Vision Transformer model obtains an accuracy of 94.75% and a sensitivity for COVID-19 cases of 99%, outperforming other widely used models in the literature such as ResNet-50, VGG-19 or COVID-Net.
Keywords: COVID-19 · Vision Transformers · Chest X-ray

1 Introduction
The COVID-19 pandemic caused by severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) has had an enormous impact on healthcare systems and the global economy, with businesses and countries struggling to cope with the outbreak. As far as we know, nearly 6.5 million deaths due to COVID-19 and 616.5 million cases have been reported [1]. The lack of personal protective equipment and effective treatments generated a critical situation in hospitals. Real-time polymerase chain reaction (RT-PCR) is the most reliable technique for COVID-19 detection. Nonetheless, the shortage of RT-PCR test kits, along with the fact that an accurate diagnosis takes between 24 and 72 h to obtain, have led doctors to consider patients' computed tomography (CT) and chest X-ray (CXR) imaging as diagnostic tools. In fact, these kinds of images exhibit characteristics which might facilitate the rapid diagnosis of COVID-19, such as ground-glass opacities, bilateral air-space opacities or interstitial patterns [2, 3]. In consequence, expert radiologists can confirm the presence of COVID-19 by examining the chest area in the scans. However, this process is time-consuming and inefficient given the vertiginous rise in cases. Deep learning solutions applied to medicine are showing extraordinary results in the healthcare domain and would be helpful in identifying COVID-19 patients with minimum time consumption and maximum effectiveness. A high number of automatic systems for COVID-19 detection have been published, but most of them do not provide enough information to allow for their reproducibility. In this paper, we present Covid-ViT, an AI system for COVID-19 detection based on Vision Transformers, the state-of-the-art deep learning architecture. A two-step fine-tuning process was employed: first for pneumonia detection, and finally for COVID-19 detection. As a result, Covid-ViT outperforms CNN-based models on the benchmark COVID-Net dataset [4]. Besides, we provide all the technical details needed to reproduce the whole process. Specifically, our main contributions are:

1. We build a ViT-based image classification model, Covid-ViT, for diagnosing COVID-19 from X-ray images.
2. We propose a two-stage fine-tuning process, transferring knowledge from the original ViT prediction task (ImageNet) to pneumonia and, finally, to COVID-19.
3. We show the efficacy of our method through experiments on the largest open-access benchmark dataset in terms of COVID-19-positive patient cases, outperforming general-purpose CNN models such as VGG-19 or ResNet-50 and ad-hoc designed models such as COVID-Net.
4. We provide all technical details to reproduce the process, including the hyperparameters for both fine-tuning stages.

The rest of this paper is structured as follows: in Sect. 2, we introduce the state of the art of different approaches for diagnosing COVID-19; in Sect. 3, we describe our methods; in Sect. 4, the experimental results are presented; finally, Sect. 5 highlights the conclusions.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 81–90, 2023. https://doi.org/10.1007/978-3-031-35501-1_8
2 Related Work
Due to the enormous increase in the number of COVID-19 cases, researchers are working to develop diagnostic and prediction measures that may assist healthcare systems. From a theoretical point of view, [5] analyzes the types of data utilized, the different technologies, and the approaches used in the diagnosis and prediction of COVID-19. On the other hand, inspired by several approaches to different biomedical problems, such as those found in [6–9] or [10], many practical techniques have been proposed. In [11], a non-entropic threshold selection method for COVID-19 X-ray image analysis using Fast Cuckoo Search is presented. Besides, a mixed-integer linear programming model designed to enhance the resilience of the healthcare network is proposed in [12].
Many initiatives based on deep learning solutions for COVID-19 detection have been presented. The majority of them use Convolutional Neural Networks (CNN) to tackle the problem. For example, [13] employed deep transfer learning techniques and examined 15 pre-trained CNN models to find the best model, using the COVID-19 Image Data Collection (CIDC) [14] and RSNA [15] as datasets; they found that VGG-19 was the most accurate model. In [16], a deep learning system based on MobileNet is presented using the NIH dataset [17]. In [18], a CNN model named DarkCovidNet is developed, trained with the CIDC and ChestX-ray8 [19] datasets. In [20], a hybrid approach based on CNNs is presented. First, feature vectors were extracted using CNNs. Then, a binary differential metaheuristic algorithm selected the most valuable features. Finally, an SVM classifier was used to predict COVID-19, using the CIDC and the "Labeled optical coherence tomography (OCT) and chest X-ray images for classification" [21] datasets. In [22], U-Net, for lung segmentation, and VGG-19, for classification, are combined to build a COVID-19 detector; the algorithm was developed using the BIMCV-COVID19+, the BIMCV-COVID19- and the Spain pre-COVID-19 era datasets [23]. In [4], a dataset is created by collecting up to 30894 CXR images from different data repositories; we take this dataset as the benchmark for evaluating our proposal. Besides, a CNN architecture tailored for COVID-19 is designed. Nonetheless, CNN-based models have shown limitations when it comes to detecting moderate COVID-19 cases. On the other hand, Vision Transformer (ViT) models have been used for COVID-19 with great success. In [24], a ViT model is presented for COVID-19 detection using CXR and CT images, employing multi-stage transfer learning to deal with data scarcity; the Kaggle repository [4] was used as the dataset. In [25], a ViT model using low-level CXR features is proposed, trained on the CNUH, YNU and KNUH [26] datasets. In [27], a ViT for the multiclass classification problem of detecting COVID-19, pneumonia and normal cases is presented, using the SIIM-FISABIO-RSNA COVID-19 [28] and RSNA datasets.
3 Vision Transformers
Transformer [29] is a transduction model that, without using sequence-aligned RNNs or convolutions, computes representations of its input and output using stacked multi-head attention and point-wise fully connected layers in both the encoder and the decoder. Self-attention is a key component of the Transformer architecture, and it learns the syntactic and semantic structure of sentences. In terms of computational complexity, self-attention layers are faster than recurrent layers: a self-attention layer connects all positions with a constant number of sequential operations, whereas a recurrent layer requires O(n) sequential operations. Besides, a great amount of the computation can be parallelized. Finally, self-attention can produce more interpretable models [29]. It is applied in many NLP disciplines, such as machine translation, named-entity recognition, or part-of-speech tagging, outperforming RNN approaches and becoming the NLP state-of-the-art model.
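To illustrate the constant number of sequential operations, a minimal single-head scaled dot-product self-attention layer can be written as a few dense matrix products. This is an illustrative NumPy sketch under our own simplifying assumptions (no multiple heads, masking, or bias terms), not the paper's code:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (n, d) sequence of n token (or patch) embeddings.
    Every position attends to every other position in one batched
    matrix product, i.e. a constant number of sequential steps,
    unlike an RNN's O(n) sequential updates.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                             # (n, d_v) attended values

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

The (n, n) score matrix is also what makes the model inspectable: its rows are the attention distributions referred to above.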
Vision Transformer (ViT), introduced in [30], is a direct application of Transformers to image recognition: an image is interpreted as a sequence of 16×16 patches, and the sequence of linear embeddings of these patches is provided as input to a standard Transformer encoder as used in NLP. In other words, image patches are treated the same way as tokens in NLP approaches. The overall architecture of ViT is depicted in Fig. 1. Again, self-attention is a crucial part of ViT, since it allows the integration of information across the entire image even in the lowest layers. Therefore, the model can recognize image regions that are semantically relevant for the learning task and suppress the irrelevant background ones [31]. Although CNN models have been used extensively over the last decade in object recognition tasks, ViT models have recently achieved even better results than CNNs for several reasons: ViT retains more spatial information than CNNs (ResNet and other CNN image classification models propagate representations with decreasing resolution), and it can learn high-quality intermediate representations from large amounts of data [33]. For this reason, in this paper we address the challenge of building a ViT-based image classification model for COVID-19 detection from chest X-ray images.
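The patch-as-token idea can be sketched in a few lines. Assuming a square input whose sides are multiples of the patch size, splitting an image into flattened 16×16 patch vectors is a pure reshape (an illustrative NumPy sketch, not the ViT implementation itself):

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an (H, W, C) image into flattened patch tokens.

    Returns an (N, patch*patch*C) array: the sequence that a ViT
    embeds by linear projection before adding position embeddings.
    """
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile into patches"
    # (h/p, p, w/p, p, C) -> (h/p, w/p, p, p, C) -> (N, p*p*C)
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)

img = np.zeros((224, 224, 3))   # e.g. a 224x224 RGB chest X-ray
tokens = image_to_patches(img)
print(tokens.shape)  # (196, 768): 14x14 patches, each of dimension 16*16*3
```

A 224×224 input thus yields a sequence of 196 tokens, the same order of magnitude as a sentence in NLP, which is what makes the standard Transformer encoder applicable unchanged.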
4 Methods
In this section, we present both the dataset and the model trained to tackle the COVID-19 detection task successfully.
4.1 Dataset
The dataset used to train COVID-Net, called COVIDx, is the largest open-access benchmark dataset in terms of the number of COVID-19-positive patient cases [4]. Concretely, it combines data from the following image repositories:

– COVID-19 Image Data Collection
– COVID-19 Chest X-ray Dataset Initiative
– RSNA Pneumonia Detection Challenge
– COVID-19 Radiography Database
Finally, after different updates and releases from the COVID-Net Open Source Initiative, Covid-ViT was trained with 30,488 CXR images (16,490 COVID-19-positive and 13,995 negative). The test set consisted of 400 CXR images equally distributed by class.
4.2 Covid-ViT Model: Fine-Tuning Process
Covid-ViT is a deep learning model based on a Vision Transformer, concretely ViT Base-Patch-16, a ViT model pretrained on ImageNet-21k (14 million images, 21,843 classes).
Fig. 1. Illustration of the original Vision Transformer (ViT) architecture [32]. The input image is split into patches, which are embedded by linear projection; position embeddings are added and the resulting sequence is fed to a Transformer encoder
Covid-ViT’s fine-tuning process is divided into two stages. First, ViT Base-Patch-16 is fine-tuned on the dataset of validated OCT and CXR images described in [34], which is available online [35]. This training stage resulted in a model with 95.51% accuracy, using the following hyperparameters:

– Learning rate: 2e−5
– Training batch size: 16
– Evaluation batch size: 8
– Number of epochs: 10
The second fine-tuning stage trains the model, now prepared to detect pneumonia, for COVID-19 detection; in other words, this stage produces Covid-ViT. To enhance the learning process, after normalizing and resizing the images we introduce data augmentation through several transformations: random horizontal flip (25%), random vertical flip (25%) and random rotation (15 degrees). The training hyperparameters applied are the following:

– Learning rate: 2e−5
– Training batch size: 50
– Evaluation batch size: 25
– Number of epochs: 50
– Weight decay: 0.01
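The augmentation step can be mimicked with a small numpy sketch (an illustrative stand-in for the actual training pipeline; `rotate_nn` is a crude nearest-neighbour rotation that clamps out-of-range samples to the image border, which a production pipeline would handle differently):

```python
import numpy as np

def rotate_nn(img, deg):
    """Nearest-neighbour rotation about the image centre (pure numpy)."""
    h, w = img.shape[:2]
    t = np.deg2rad(deg)
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # inverse mapping: for each output pixel, find the source pixel
    xs = np.cos(t) * (xx - cx) + np.sin(t) * (yy - cy) + cx
    ys = -np.sin(t) * (xx - cx) + np.cos(t) * (yy - cy) + cy
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)
    ys = np.clip(np.round(ys).astype(int), 0, h - 1)
    return img[ys, xs]

def augment(img, rng):
    """Random horizontal/vertical flip (25% each) and rotation within +-15 degrees."""
    if rng.random() < 0.25:
        img = img[:, ::-1]
    if rng.random() < 0.25:
        img = img[::-1, :]
    return rotate_nn(img, rng.uniform(-15, 15))

rng = np.random.default_rng(0)
x = np.arange(32 * 32, dtype=float).reshape(32, 32)
aug = augment(x, rng)
```

In practice these operations correspond to standard library transforms applied on the fly at each epoch, so the model rarely sees the exact same image twice.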
As we can see in Fig. 2, there is no overfitting during the training process, which makes Covid-ViT a suitable model for this learning task.
Fig. 2. Training and validation loss across epochs. The problem is learned successfully, with no overfitting.
5 Results
To measure the quality of Covid-ViT in an objective manner, we computed sensitivity (Table 1) and positive predictive value (Table 2) for each class, as well as test accuracy, on the aforementioned COVIDx dataset. We compared the results with other state-of-the-art architectures, VGG-19, ResNet-50 and COVID-Net, which have also been evaluated on the COVIDx dataset. In Fig. 3 we show the confusion matrix for Covid-ViT on the COVIDx test dataset.

Table 1. Sensitivity for each class, comparing Covid-ViT to the CNN models proposed in [4]. Best results highlighted in bold.

Architecture    No COVID-19   COVID-19
VGG-19 [4]      94.36         58.7
ResNet-50 [4]   95.3          83
COVID-Net [4]   94.5          94
Covid-ViT       90.5          99
Observing Table 1, Covid-ViT achieved noticeably higher sensitivity for the COVID-19 class than the other state-of-the-art approaches. Moreover, examining the entries of Table 2, we notice that Covid-ViT outperformed the other architectures in terms of positive predictive value for the No COVID-19 class. Other metrics, such as precision (91.24%), specificity (90.5%), recall (90.5%) and F1-score (0.9495), show that our method is an effective predictor for screening the COVID-19 disease. As we can see, Covid-ViT is a more reliable approach to COVID-19 diagnosis than other state-of-the-art predictors, based on several factors. On the one hand, the experiments show that Vision Transformer models are more accurate than traditional CNN models, taking into account that Covid-ViT outperforms
Table 2. Positive predictive value for each class, comparing Covid-ViT to the CNN models proposed in [4]. Best results highlighted in bold.

Architecture    No COVID-19   COVID-19
VGG-19 [4]      77.2          98.4
ResNet-50 [4]   87.4          98.8
COVID-Net [4]   90.86         98.9
Covid-ViT       98.91         91.3
Table 3. Accuracy of each architecture, comparing Covid-ViT to the CNN models proposed in [4]. Best result highlighted in bold.

Architecture    Accuracy (%)
VGG-19 [4]      83
ResNet-50 [4]   90.6
COVID-Net [4]   93.3
Covid-ViT       94.75
general models such as VGG-19 or ResNet-50 as well as CNNs specifically designed for this problem, such as COVID-Net, obtaining the highest accuracy (94.75%, Table 3). Focusing on the other state-of-the-art Transformer-based models, there is no benchmark dataset on which to compare results faithfully. Many of them are trained on reduced datasets, as in [24], where a dataset of only 7,000 images is used. In contrast, Covid-ViT is trained on the COVID-Net Open Source Initiative dataset, which is, to the best of our knowledge, the largest publicly available COVID-19 dataset, with more than 30,000 images. On the other hand, Covid-ViT incorporates a two-stage fine-tuning process, first for pneumonia detection and finally for COVID-19 diagnosis. This process
Fig. 3. Confusion matrix for Covid-ViT on the COVIDx test dataset
guarantees that the model acquires the competence to predict COVID-19 successfully, bridging the gap between its original predictive purpose (the ImageNet dataset) and the final biomedical domain through a knowledge transfer process. Finally, we have specified the steps and technical details needed to reproduce the experiments and build the model.
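For reference, the per-class metrics used throughout this section follow directly from a two-class confusion matrix; the sketch below uses illustrative counts, not the paper's actual confusion matrix:

```python
def per_class_metrics(tp, fp, fn, tn):
    """Sensitivity (recall), positive predictive value (precision), accuracy and F1
    for the positive class of a two-class confusion matrix."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return sensitivity, ppv, accuracy, f1

# illustrative counts only
sens, ppv, acc, f1 = per_class_metrics(tp=90, fp=10, fn=5, tn=95)
```

Swapping which class is treated as "positive" exchanges the roles of sensitivity and specificity, and of the two per-class PPVs, which is why both columns are reported in Tables 1 and 2.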
6 Conclusion
In this study, we introduced Covid-ViT, a Vision Transformer model based on a two-stage fine-tuning process, designed for the detection of COVID-19 cases from CXR images. We used COVIDx as an open-access benchmark dataset for training and for evaluating Covid-ViT's learning capacity. All technical details have been provided to reproduce the fine-tuning process. Compared to other state-of-the-art CNN architectures, such as VGG-19, ResNet-50 and COVID-Net, Covid-ViT is a more accurate (94.75% test accuracy) and robust (99% COVID-19 sensitivity) COVID-19 predictor. The results confirm that Transformer-based models outperform CNN models and can be an extraordinary tool for professional radiologists.
Acknowledgements. This work was supported by the ‘Artificial Intelligence for the diagnosis and prognosis of COVID-19’ project (CV20-29480), funded by the Consejería de Transformación Económica, Industria, Conocimiento y Universidades, Junta de Andalucía, and the FEDER funds, and by the ‘Deep processing of time series: Central Nervous System Brain diagnosis from perfusion of MRI Images’ project (PID2020-118224RB-I00), funded by the Spanish Ministry of Science and Innovation.
References
1. Johns Hopkins University: COVID-19 Map Dashboard. https://coronavirus.jhu.edu/map.html (2022). Accessed 25 Sept 2022
2. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with Transformers. arXiv (2020). https://arxiv.org/abs/2005.12872
3. Chen, X., Xie, S., He, K.: An empirical study of training self-supervised vision transformers. arXiv (2021). https://arxiv.org/abs/2104.02057
4. Wang, L., Lin, Z., Wong, A.: COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci. Rep. 10, 19549 (2020). https://doi.org/10.1038/s41598-020-76550-z
5. Ding, W., Nayak, J., Swapnarekha, H., Abraham, A., Naik, B., Pelusi, D.: Fusion of intelligent learning for COVID-19: a state-of-the-art review and analysis on real medical data. Neurocomputing 457, 40–66 (2021). https://doi.org/10.1016/j.neucom.2021.06.024
6. Das, P., Nayak, B., Meher, S.: A lightweight deep learning system for automatic detection of blood cancer. Measurement 191, 110762 (2022). https://www.sciencedirect.com/science/article/pii/S026322412200063X
7. Das, P., Meher, S., Panda, R., Abraham, A.: A review of automated methods for the detection of sickle cell disease. IEEE Rev. Biomed. Eng. 13, 309–324 (2020)
8. Das, P.K., Diya, V.A., Meher, S., Panda, R., Abraham, A.: A systematic review on recent advancements in deep and machine learning based detection and classification of acute lymphoblastic leukemia. IEEE Access 10, 81741–81763 (2022)
9. Das, P., Meher, S.: Transfer learning-based automatic detection of acute lymphocytic leukemia. In: 2021 National Conference on Communications (NCC), pp. 1–6 (2021)
10. Das, P., Meher, S., Panda, R., Abraham, A.: An efficient blood-cell segmentation for the detection of hematological disorders. IEEE Trans. Cybern. 52, 10615–10626 (2022)
11. Naik, M., Swain, M., Panda, R., Abraham, A.: Novel square error minimization-based multilevel thresholding method for COVID-19 X-ray image analysis using fast cuckoo search. Int. J. Image Graph. (2022)
12. Goodarzian, F., Ghasemi, P., Gunasekaran, A., Taleizadeh, A., Abraham, A.: A sustainable-resilience healthcare network for handling COVID-19 pandemic. Ann. Oper. Res. 312, 1–65 (2022)
13. Rahaman, M., et al.: Identification of COVID-19 samples from chest X-ray images using deep learning: a comparison of transfer learning approaches. J. X-ray Sci. Technol. 28, 821–839 (2020)
14. Cohen, J., Morrison, P., Dao, L., Roth, K., Duong, T., Ghassemi, M.: COVID-19 image data collection: prospective predictions are the future. arXiv (2020). https://arxiv.org/abs/2006.11988
15. Pan, I., Cadrin-Chênevert, A., Cheng, P.: Tackling the Radiological Society of North America pneumonia detection challenge. Am. J. Roentgenol. 213, 568–574 (2019)
16. Apostolopoulos, I., Aznaouridis, S., Tzani, B.: Extracting possibly representative COVID-19 biomarkers from X-ray images with deep learning approach and image data related to pulmonary diseases (2020)
17. Malhotra, A., et al.: Multi-task driven explainable diagnosis of COVID-19 using chest X-ray images. Pattern Recogn. 122, 108243 (2022). https://www.sciencedirect.com/science/article/pii/S0031320321004246
18.
Ozturk, T., et al.: Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput. Biol. Med. 121, 103792 (2020). https://www.sciencedirect.com/science/article/pii/S0010482520301621
19. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.: ChestX-Ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3462–3471 (2017)
20. Iraji, M., Feizi-Derakhshi, M., Tanha, J.: COVID-19 detection using deep convolutional neural networks and binary-differential-algorithm-based feature selection on X-ray images. arXiv (2021). https://arxiv.org/abs/2104.07279
21. Kermany, D., Zhang, K., Goldbaum, M.: Labeled optical coherence tomography (OCT) and chest X-ray images for classification (2018)
22. Arias-Garzón, D., et al.: COVID-19 detection in X-ray images using convolutional neural networks. Mach. Learn. Appl. 6, 100138 (2021). https://www.sciencedirect.com/science/article/pii/S2666827021000694
23. Vayá, M., et al.: BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients. arXiv (2020). https://arxiv.org/abs/2006.01174
24. Mondal, A., Bhattacharjee, A., Singla, P., Prathosh, A.: xViTCOS: explainable vision transformer based COVID-19 screening using radiography. IEEE J. Transl. Eng. Health Med. 10, 1–10 (2022)
25. Park, S., et al.: Vision transformer for COVID-19 CXR diagnosis using chest X-ray feature corpus. arXiv (2021). https://arxiv.org/abs/2103.07055
26. Choi, Y., Song, E., Kim, Y., Song, T.: Analysis of high-risk infant births and their mortality: ten years' data from Chonnam National University Hospital. Chonnam Med. J. 47, 31–38 (2011)
27. Chetoui, M., Akhloufi, M.: Explainable vision transformers and radiomics for COVID-19 detection in chest X-rays. J. Clin. Med. 11, 3013 (2022). https://www.mdpi.com/2077-0383/11/11/3013
28. Society for Imaging Informatics in Medicine (SIIM): SIIM-FISABIO-RSNA COVID-19 Detection. https://www.kaggle.com/c/siim-covid19-detection
29. Vaswani, A., et al.: Attention is all you need. arXiv (2017). https://arxiv.org/abs/1706.03762
30. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv (2021). https://arxiv.org/abs/2010.11929
31. Jiang, X., Zhu, Y., Cai, G., et al.: MXT: a new variant of pyramid vision transformer for multi-label chest X-ray image classification. Cogn. Comput. 14, 1362–1377 (2022). https://doi.org/10.1007/s12559-022-10032-4
32. Vision Transformers. Google Research. https://github.com/google-research/vision_transformer
33. Raghu, M., Unterthiner, T., Kornblith, S., Zhang, C., Dosovitskiy, A.: Do vision transformers see like convolutional neural networks? arXiv (2021). https://arxiv.org/abs/2108.08810
34. Kermany, D., et al.: Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172, 1122–1131.e9 (2018). https://www.sciencedirect.com/science/article/pii/S0092867418301545
35. Labeled optical coherence tomography (OCT) and chest X-ray images for classification. Mendeley Data. https://data.mendeley.com/datasets/rscbjbr9sj/2
Assessment of Epileptic Gamma Oscillations’ Networks Connectivity Amal Necibi1(B) , Abir Hadriche2 , and Nawel Jmail1 1 Miracl Lab, Sfax University, Sfax, Tunisia
[email protected] 2 Regim Laboratory, ENIS, Sfax University, Sfax, Tunisia
Abstract. Source localization consists in defining the exact position of the brain generators of a time course obtained from surface electrophysiological signals (EEG, MEG), in order to determine the epileptogenic zones with high precision. We applied diverse inverse problem techniques to obtain this resolution; these techniques rely on different hypotheses and yield specific epileptic network connectivity. We propose here to rate the performance of these inverse problem techniques in identifying the epileptic zone. We used four inverse problem methods to explain the cortical areas and neural generators of excessive discharges, and computed the network connectivity obtained with each technique. We applied a pre-processing chain to assess the rate of epileptic gamma oscillation connectivity in MEG for each technique. Wavelet Maximum Entropy on the Mean (wMEM) showed the highest matching between MEG network connectivity based on Correlation, Coherence, Granger Causality (GC) and Phase Locking Value (PLV) between active sources, followed by Dynamical Statistical Parametric Mapping (dSPM), standardized low-resolution brain electromagnetic tomography (sLORETA), and Minimum Norm Estimation (MNE). The inverse problem techniques studied are all able, at least theoretically, to find part of the seizure onset zone. wMEM and dSPM yield the strongest connections of all the techniques. Keywords: Gamma oscillations · MEG events · inverse problem · MNE · sLORETA · dSPM · wMEM · network connectivity · Correlation · Coherence · Granger Causality · Phase Locking Value
1 Introduction
Electrophysiological techniques such as magnetoencephalography (MEG) and electroencephalography (EEG) are applied to characterize brain function or its pathologies in a non-invasive manner and with high temporal accuracy. Brain activity has been characterized in several protocols, using oscillation rhythms under different experimental conditions and with different numbers of sensors. Recent progress now makes it possible to reconstruct the temporal dynamics of cortical zones of interest [1]. These reconstructed signals are used to investigate the dynamics of network activations [2]. Through connectivity measurements, interactions between different areas of the brain can be evaluated [3,
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 91–99, 2023. https://doi.org/10.1007/978-3-031-35501-1_9
4], using measures such as coherence [5], correlation [6] and the directed transfer function (DTF) [7]. In general, epilepsy involves a network of regions with complex dynamics, which can act either as zones of propagation or as generators during paroxysmal discharges. Thus, source localization is a solution to determine the areas responsible for excessive epileptic discharges. We propose here to study and assess four inverse problem methods: MNE (Minimum Norm Estimation), dSPM (Dynamical Statistical Parametric Mapping), sLORETA (standardized low-resolution brain electromagnetic tomography) and wMEM (wavelet Maximum Entropy on the Mean). To delimit these areas so that they can be removed surgically, a fine determination of the epileptogenic network is necessary as a preoperative step. Hence, we need to improve and evaluate the network connectivity metrics of gamma oscillations. These oscillations, and those at higher frequencies, are an important marker of hyperexcitable tissue [8]. Moreover, oscillatory processes involve overlapping but different networks and are complementary for presurgical mapping [9]. MNE and sLORETA have shown excellent precision in source localization. dSPM has shown good timing and spatial extent of language processes in epilepsy [10]. wMEM displays high precision for spatially extended epileptic sources [11]. We used coherence, correlation, Granger Causality and Phase Locking Value measures to theoretically assess the network connectivity of gamma-oscillation MEG events. Our findings could serve as a prediction tool to help a neurologist diagnose epilepsy and identify the epileptic zone.
2 Materials and Methods
2.1 Materials
The pre-processing steps carried out in this project use the Matlab software (a structured programming environment that allows accessing, storing, manipulating, measuring and viewing EEG and MEG data [12]) and Brainstorm (an open-source collaborative application for the analysis of MEG and EEG brain recordings, whose main advantage is the simplicity and accessibility of its graphical interface [13]).
2.2 Database
The real data used in this section is a magnetoencephalography (MEG) sample. The selected patient is drug-resistant; his MEG recording was carried out at the clinical neurophysiology department of the Timone hospital in Marseille. The recording is characterized by stable and frequent inter-ictal activity and includes simultaneous spikes and oscillations (the two biomarkers necessary to define epileptogenic areas). Clear permission was obtained from this patient for MEG recording, and the study of these signals was authorized by the institutional review committee of INSERM, the French Institute of Health (IRB0000388, FWA00005831).
2.3 Methods
MNE. Minimum Norm Estimation (MNE) is a method widely used to solve the inverse problem for source localization [14]. It is based on a regularization stage (Tikhonov regularization) [15]. Hincapié and colleagues [14] addressed an interesting question in the localization of MEG sources using the MNE method: the adjusted regularization threshold differs between cortico-cortical connectivity estimation and cortical activity estimation. They generated 21,600 source simulations with different sizes, coupling strengths and signal-to-noise ratios (SNR), and then sought to adjust the Tikhonov regularization to improve detection performance in coherence and power. The optimal regularization coefficient for connectivity was about twice smaller than the best coefficient for power. Thus, Hincapié et al. showed that spatial extent and SNR have a stronger effect on regularization than coupling strength.
sLORETA. According to Pascual-Marqui [16], extracranial MEG magnetic fields and EEG scalp electrical potentials are used to recover the primary current density distribution implied by postsynaptic neural processes. There are several solutions (linear, distributed, instantaneous, discrete) able to locate sources. LORETA (low-resolution brain electromagnetic tomography) has excellent accuracy in locating sources, even deep ones, with minimum localization errors, whereas other linear solutions produce images with systematic non-zero localization errors. For electrical neural activity, Pascual-Marqui proposed a new tomography in which localization inference is based on standardized current density images, called sLORETA, with zero localization error.
dSPM. Using magnetoencephalography (MEG), McDonald et al. [17] employed a noise-normalized distributed source solution called dSPM in healthy subjects and in epilepsy.
They verified, in healthy controls, that bilateral visual cortex activity from 80 to 120 ms in response to sensory control stimuli and new words is recovered by dSPM, while in patients with epilepsy, dSPM can recover the timing and spatial extent of language processes.
wMEM. MEM (Maximum Entropy on the Mean) is a unified methodological paradigm for managing temporal and spatio-temporal MEG/EEG series [18] using wavelet representations. Source localization of oscillatory patterns in the time-frequency domain is performed via wavelet-based MEM. A discrete wavelet expansion is used to express both the data and the solution. Each time-frequency component of the data is modelled with a spatial prior, namely a parcellation of the cortex [19].
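A one-level orthonormal Haar transform illustrates the kind of discrete wavelet expansion wMEM relies on (a didactic sketch only; wMEM combines a full wavelet decomposition with an entropic regularization scheme):

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar wavelet transform (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass (scaling) coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (wavelet) coefficients
    return approx, detail

def haar_idwt(approx, detail):
    """Exact inverse of one Haar level."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

x = np.arange(8.0)
a, d = haar_dwt(x)
x_rec = haar_idwt(a, d)
```

Because the transform is orthonormal, signal energy is preserved across the coefficients, which is what allows the time-frequency components to be treated independently.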
2.4 Metrics of Network Connectivity
Brain connectivity measurements are intended to study how brain regions (or nodes) interact as a network. We use four connectivity metrics for non-directed and directed functional connectivity analyses (Table 1).

Table 1. Metrics of brain network connectivity.

– Correlation (non-directed, time domain): a relatively simple, non-directed connectivity metric of association between time series.
– Coherence (non-directed, frequency domain): a complex-valued metric that measures the linear relationship of two signals in the frequency domain.
– Granger causality (directed, time domain): a measure of directed functional connectivity based on the Wiener-Granger causality framework; GC measures linear dependencies between time series and tests the predictability of a signal's future.
– Phase locking value (non-directed, phase domain): a popular metric defined as the length of the average of many unit vectors whose phase angles correspond to the phase difference between two time series [20].
3 Results
In Fig. 1, we show the active regions (local peaks of the source activation film) of pure oscillatory activities obtained with the MNE, dSPM, sLORETA and wMEM inverse problem methods. sLORETA yields the highest number of active regions of interest (ROIs); the lowest number of active regions is obtained with wMEM.
3.1 Correlation
In Fig. 2, dSPM indicates a strong correlation between all selected regions, while sLORETA and wMEM show an average correlation. For the MNE method, all selected active regions exhibit a weak correlation.
Fig. 1. Active regions: a (MNE), b (sLORETA), c (dSPM), d (wMEM)
Fig. 2. Network connectivity of gamma oscillations using correlation, obtained by: a (MNE), b (sLORETA), c (dSPM), d (wMEM)
3.2 Coherence
In Fig. 3, we note that the dSPM, sLORETA and wMEM methods have almost equivalent degrees of coherence, whereas MNE shows stronger connectivity.
Fig. 3. Network connectivity of gamma oscillations using coherence, obtained by: a (MNE), b (sLORETA), c (dSPM), d (wMEM)
3.3 Granger Causality
In Fig. 4, connectivity is depicted by links with an "in" or "out" direction; thanks to this property of the metric, we obtain a measure of directed functional connectivity. wMEM shows strong connections between ROIs, whereas dSPM shows weak connections.
Fig. 4. Network connectivity of gamma oscillations using Granger causality, obtained by: a (MNE), b (sLORETA), c (dSPM), d (wMEM)
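A minimal time-domain Granger causality estimate compares the residual variance of an autoregressive model of one signal with and without lagged values of the other (an illustrative sketch with hypothetical toy data, not the implementation used in this study):

```python
import numpy as np

def ar_residual_var(y, predictors, p=5):
    """OLS residual variance of y_t regressed on p lags of each predictor series."""
    n = len(y)
    cols = [np.ones(n - p)]
    for s in predictors:
        cols += [s[p - k:n - k] for k in range(1, p + 1)]  # s[t-k] aligned with y[t]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y[p:], rcond=None)
    return np.var(y[p:] - A @ coef)

def granger(x, y, p=5):
    """GC(x -> y): log ratio of restricted vs. full AR model residual variances."""
    restricted = ar_residual_var(y, [y], p)
    full = ar_residual_var(y, [y, x], p)
    return np.log(restricted / full)

# toy system where x drives y with one sample of delay
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

gxy, gyx = granger(x, y), granger(y, x)
```

The asymmetry `gxy >> gyx` is what makes the measure directed, in contrast with correlation, coherence and PLV.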
3.4 PLV (Phase-Locking Value)
In Fig. 5, MNE and sLORETA show few links but with strong connections, whereas dSPM and wMEM have more links but weaker connections (Table 2).
Fig. 5. Network connectivity of gamma oscillations using PLV (phase-locking value), obtained by: a (MNE), b (sLORETA), c (dSPM), d (wMEM)
Table 2. Number of links between active regions.

Metric                 MNE   sLORETA   dSPM   wMEM
Correlation            14    21        47     47
Coherence              21    30        33     35
Granger causality      33    26        28     36
Phase locking value    8     26        29     21
Total number of links  76    103       137    139
Wavelet Maximum Entropy on the Mean (wMEM) had the highest number of links (139), which shows the highest matching between MEG network connectivity based on Correlation, Coherence, Granger Causality (GC) and Phase Locking Value (PLV) between active sources, followed by Dynamical Statistical Parametric Mapping (dSPM) with 137 links; standardized low-resolution brain electromagnetic tomography (sLORETA) and Minimum Norm Estimation (MNE) showed lower numbers of links.
4 Discussion and Conclusions
We used source localization of surface measurements in order to localize brain generators and determine the areas responsible for epileptic discharges. To solve the inverse problem, we investigated four inverse methods, MNE, dSPM, sLORETA and wMEM, by identifying active areas during excessive discharges. The results obtained with these methods are generally consistent with the neurologist's diagnosis. In fact, all of the techniques used were able to locate with good precision the active sources of the gamma-oscillation MEG epileptic events. We completed our work by using connectivity measures (correlation, coherence, Granger Causality and Phase Locking Value) to evaluate the connectivity networks between these active areas. dSPM has the closest connection strength of networks across all metrics. MNE and sLORETA highlight the propagation of inter-ictal epileptic discharges. wMEM demonstrated high connectivity with the Granger Causality metric.
Acknowledgment. This work was supported by 20PJEC0613 "Hatem Ben Taher Tunisian Project".
References
1. Mamelak, A.N., Lopez, N., Akhtari, M., Sutherling, W.W.: Magnetoencephalography-directed surgery in patients with neocortical epilepsy. J. Neurosurg. 97, 865–873 (2002)
2. David, O., Garnero, L., Cosmelli, D., Varela, F.J.: Estimation of neural dynamics from MEG/EEG cortical current density maps: application to the reconstruction of large-scale cortical synchrony. IEEE Trans. Biomed. Eng. 49(9), 975–987 (2002). https://doi.org/10.1109/TBME.2002.802013
3. Horwitz, B.: The elusive concept of brain connectivity. Neuroimage 19(2), 466–470 (2003). https://doi.org/10.1016/S1053-8119(03)00112-5
4. Darvas, F., Pantazis, D., Kucukaltun-Yildirim, E., Leahy, R.M.: Mapping human brain function with MEG and EEG: methods and validation. Neuroimage 23(SUPPL. 1), 289–299 (2004). https://doi.org/10.1016/j.neuroimage.2004.07.014
5. Gross, J., Kujala, J., Hämäläinen, M., Timmermann, L., Schnitzler, A., Salmelin, R.: Dynamic imaging of coherent sources: studying neural interactions in the human brain. Proc. Natl. Acad. Sci. U. S. A. 98(2), 694–699 (2001). https://doi.org/10.1073/pnas.98.2.694
6. Peled, A., Geva, A.B., Kremen, W.S., Blankfeld, H.M., Esfandiarfard, R., Nordahl, T.E.: Functional connectivity and working memory in schizophrenia: an EEG study. Int. J. Neurosci. 106(1–2), 47–61 (2001). https://doi.org/10.3109/00207450109149737
7. Kaminski, M.J., Blinowska, K.J.: A new method of the description of the information flow in the brain structures. Biol. Cybern. 65(3), 203–210 (1991). https://doi.org/10.1007/BF00198091
8. Nawel, J., Abir, H., Ichrak, B., Amal, N., Chokri, B.A.: A comparison of inverse problem methods for source localization of epileptic MEG spikes. In: Proceedings - 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering, BIBE 2019, pp. 867–870 (2019). https://doi.org/10.1109/BIBE.2019.00161
9.
Jmail, N., Gavaret, M., Bartolomei, F., Chauvel, P., Badier, J.-M., Bénar, C.-G.: Comparison of brain networks during interictal oscillations and spikes on magnetoencephalography and intracerebral EEG. Brain Topogr. 29(5), 752–765 (2016). https://doi.org/10.1007/s10548-016-0501-7
10. Hadriche, A., ElBehy, I., Hajjej, A., Jmail, N.: Evaluation of techniques for predicting a build up of a seizure. In: Abraham, A., Gandhi, N., Hanne, T., Hong, T.-P., Nogueira Rios, T., Ding, W. (eds.) ISDA 2021. LNNS, vol. 418, pp. 816–827. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96308-8_76
11. Hadriche, A., Jmail, N.: A build up of seizure prediction and detection software: a review. J. Clin. Images Med. Case Reports 2(2), 1–2 (2021). https://doi.org/10.52768/2766-7820/1087
12. Delorme, A., Makeig, S.: EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134(1), 9–21 (2004). https://doi.org/10.1016/j.jneumeth.2003.10.009
13. Tadel, F., Baillet, S., Mosher, J.C., Pantazis, D., Leahy, R.M.: Brainstorm: a user-friendly application for MEG/EEG analysis. Comput. Intell. Neurosci. 2011, 1–13 (2011). https://doi.org/10.1155/2011/879716
14. Hincapié, A.S., et al.: MEG connectivity and power detections with minimum norm estimates require different regularization parameters. Comput. Intell. Neurosci. 2016, 12–18 (2016). https://doi.org/10.1155/2016/3979547
15. Borchers, B., Uram, T., Hendrickx, J.M.H.: Tikhonov regularization of electrical conductivity depth profiles in field soils. Soil Sci. Soc. Am. J. 61(4), 1004–1009 (1997). https://doi.org/10.2136/sssaj1997.03615995006100040002x
16. Pascual-Marqui, R.D., Esslen, M., Kochi, K., Lehmann, D.: Functional imaging with low-resolution brain electromagnetic tomography (LORETA): a review. Methods Find. Exp. Clin. Pharmacol. 24(SUPPL. C), 91–95 (2002)
17. McDonald, C.R., et al.: Distributed source modeling of language with magnetoencephalography: application to patients with intractable epilepsy. Epilepsia 50(10), 2256–2266 (2009). https://doi.org/10.1111/j.1528-1167.2009.02172.x
18.
Chowdhury, R.A., Lina, J.M., Kobayashi, E., Grova, C.: MEG source localization of spatially extended generators of epileptic activity: comparing entropic and hierarchical Bayesian approaches. PLoS ONE 8(2), e55969 (2013). https://doi.org/10.1371/journal.pone.0055969
19. Lina, J.M., Chowdhury, R., Lemay, E., Kobayashi, E., Grova, C.: Wavelet-based localization of oscillatory sources from magnetoencephalography data. IEEE Trans. Biomed. Eng. 61(8), 2350–2364 (2014). https://doi.org/10.1109/TBME.2012.2189883
20. Tass, P., et al.: Detection of n:m phase locking from noisy data: application to magnetoencephalography. Phys. Rev. Lett. 81, 3291–3294 (1998). https://doi.org/10.1103/PhysRevLett.81.3291
Clustering of High Frequency Oscillations HFO in Epilepsy Using Pretrained Neural Networks Zayneb Sadek1(B) , Abir Hadriche2 , and Nawel Jmail1,3 1 Digital Research Center of Sfax, Sfax, Tunisia
[email protected]
2 Regim Laboratory, ENIS, Sfax University, Sfax, Tunisia
[email protected] 3 Miracl Laboratory, Sfax University, Sfax, Tunisia
Abstract. High frequency oscillations (HFOs) have been presented as a promising clinical biomarker of the regions responsible for the epileptic seizure onset zone (SOZ), and thus as a potential aid to guide epilepsy surgery. Visual identification of HFOs in long-term continuous intracranial EEG (iEEG) is cumbersome, due to their low amplitude and short duration. The objective of our study is to improve and automate HFO detection by developing analysis tools based on an unsupervised clustering method. First, we used a temporal basis set from Jmail et al. 2017 while exploiting the time-frequency content of iEEG data. Subsequently, we used a CNN (ResNet-18) feature extractor. Then, we applied a clustering method based on reducing the dimension of the events per frame while preserving the distances between points when mapping from the high-dimensional space to a low-dimensional one. The clustering method (DeepCluster) is based on the standard k-means clustering algorithm. This algorithm successfully isolated HFOs from artifacts, spikes and spikes with ripples. Using this algorithm, we were able to locate the seizure onset area. Keywords: epileptic seizure · intracranial EEG (iEEG) · high frequency oscillations (HFOs) · CNN · K-means
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 100–107, 2023. https://doi.org/10.1007/978-3-031-35501-1_10
1 Introduction
Epilepsy is characterized by increased electrical activity in the brain, resulting in temporary disruption of communication between neurons. Intracranial EEG (iEEG) is an invasive recording technique in which electrodes are placed directly on the brain to visualize interactions among cortical sub-zones. iEEG has been widely used by neurologists to detect the seizure onset area and accurately identify seizure onset [1–4]. In epilepsy, iEEG signals depict spikes, oscillations at different frequencies, and superimposed spikes and oscillations [5]. Pathological high-frequency oscillations (HFOs) between 80 and 500 Hz have recently been proposed as a potential biomarker of the epileptic seizure onset zone and have shown superior accuracy to interictal epileptiform discharges. HFOs are field
potentials that reflect short-term synchronization of neuronal activity [6]. Further studies subdivided HFOs into ripples (80–250 Hz) and fast ripples (FR, 250–500 Hz) [7, 8]. In fact, HFOs appear as low-amplitude transients in intracranial EEG, so their visual identification remains a difficult task. This challenge calls for a robust, high-precision automatic detector. However, most HFO detection studies have relied on simple thresholding, defining context energy by computing the root mean square (RMS) [9] or the signal line length [10, 11]. In semi-automatic detectors, visual inspection is performed after initial detection [8, 12], while fully automatic detectors require supervised classifiers or advanced signal processing steps [13, 14]. Nevertheless, most of these techniques explored high-frequency iEEG data above 80 Hz and investigated the signal amplitude in the high-frequency band. The purpose of this study is to improve and automate HFO detection by exploring the spectrum of iEEG in the high-frequency band. Therefore, we propose an automatic unsupervised clustering detection of HFOs to map the detected events and investigate their spatial diffusion. First, we built a database of simulated iEEG data (in the HFO frequency range), which we evaluated under different constraints (SNR, overlap rate, relative amplitude and frequency range). Second, we explored the spectrum of the iEEG signal in the high-frequency band. These results would help us to identify the channel with a maximum of HFOs. Thus, in the first section we describe our simulated data and the clustering method. In the second section we expose our results, and finally we conclude and discuss them.
2 Materials and Methods
2.1 Materials
All signal processing steps of our study were performed in Matlab (MathWorks, Natick, MA).
Simulated Data: obtained by combining spike and HFO shapes, as in real iEEG signals, sampled at 512 Hz with a duration of 2 s (1024 samples). Through different tests, we prepared five classes of signals composed of spikes, HFOs (ripples, fast ripples) and superimposed ripples and spikes. By varying different parameters (relative amplitudes, oscillation frequency, signal-to-noise ratio (SNR) and overlap rate), we obtained 120 sets of simulated data composed of spikes and HFO events at [85, 105, 200, 350, 450] Hz (ripples and fast ripples).
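The simulation itself was done in Matlab; the sketch below is a minimal Python equivalent, not the authors' code. It assumes a narrow Gaussian transient for the spike and a Gaussian-windowed sinusoid for the HFO, with white noise scaled to a requested SNR; all parameter values are illustrative.

```python
import numpy as np

FS = 512          # sampling rate (Hz), as in the paper
DUR = 2.0         # duration (s) -> 1024 samples

def simulate_event(ripple_hz=105.0, spike_amp=1.0, ripple_amp=0.4,
                   snr_db=10.0, overlap=0.5, seed=0):
    """Spike + HFO mixture on a noisy background (illustrative only)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(FS * DUR)) / FS            # 1024 samples
    # transient spike: narrow Gaussian bump at mid-signal
    spike = spike_amp * np.exp(-((t - 1.0) ** 2) / (2 * 0.01 ** 2))
    # HFO: sinusoid under a wider Gaussian envelope, shifted by `overlap`
    center = 1.0 + (1.0 - overlap) * 0.1
    env = np.exp(-((t - center) ** 2) / (2 * 0.05 ** 2))
    hfo = ripple_amp * env * np.sin(2 * np.pi * ripple_hz * t)
    clean = spike + hfo
    # additive white noise scaled to the requested SNR
    p_sig = np.mean(clean ** 2)
    p_noise = p_sig / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(p_noise), t.size)
    return clean + noise

sig = simulate_event()
print(sig.shape)  # (1024,)
```

Varying ripple_hz over [85, 105, 200, 350, 450] Hz together with the amplitude, SNR and overlap parameters would yield datasets analogous to the 120 simulated sets described above.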
2.2 Methods
In this paper we propose a solution for automatic detection using unsupervised clustering. The schematic diagram in Fig. 1 summarizes our clustering steps.
In Fig. 1 we depict the three steps of the clustering pipeline. In the first step, we recover a simulated iEEG time signal presenting a mixture of pure HFOs, artifacts, spikes, and spiky events with ripples. These events produce a complex shape in which it is difficult to distinguish the basic elements. In the second step, we present the frequency plane of the studied events. In the last step, we apply the clustering algorithm, based on a CNN for feature extraction together with PCA and t-SNE for dimension reduction. Then, we apply a standard K-means clustering algorithm.
Fig. 1. Clustering steps of HFO.
The approaches described above solve the clustering problem of pathological HFOs. In fact, clustering is a quite difficult task, since raw signals are not directly suitable for clustering algorithms and require a feature extraction step, i.e. a feature description vector.
CNN Feature Extraction
A convolutional neural network [15, 16] is a neural network that uses a mathematical operation called convolution (a linear operation). Each convolutional neural network contains at least one convolution layer. Let f and g be two functions defined on R; the convolution product of f and g, denoted f ∗ g, is defined by:

s(x) = (f ∗ g)(x) = ∫_{−∞}^{+∞} f(t) g(x − t) dt   (1)
In machine learning, the input is always a multidimensional array of data, and the kernel is a multidimensional array of parameters adapted by the learning algorithm. Convolution is generally used with dimension greater than 1; the most common case is 2D convolution. For an image input I and a kernel K, the discrete convolution is written:

s(i, j) = (K ∗ I)(i, j) = Σ_m Σ_n I(i − m, j − n) K(m, n)   (2)
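The discrete 2D convolution of Eq. (2) can be implemented directly, as a small numpy sketch (for illustration only; real CNN frameworks use optimized routines). Note that true convolution flips the kernel, unlike the cross-correlation many libraries compute.

```python
import numpy as np

def conv2d(image, kernel):
    """Discrete 2D convolution s(i, j) = sum_m sum_n I(i-m, j-n) K(m, n),
    restricted to the 'valid' region (no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    flipped = kernel[::-1, ::-1]          # true convolution, not correlation
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)
K = np.array([[0.0, 1.0], [1.0, 0.0]])
print(conv2d(I, K).shape)  # (3, 3)
```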
The convolutional neural network (CNN) is widely used for automatic image feature extraction. The feature dimension extracted by this method is very high, and several features have strong correlations. A CNN is composed of convolution layers, pooling layers, fully connected layers and a classifier; the features extracted by the convolution layers are called feature maps. Convolutional network architectures vary, but in general terms they can be divided into several functional blocks (a block of convolution layers, a block of pooling layers, and a block of fully connected layers).
The most common output of a neural network in classification tasks is the membership probability of the input image, obtained using a softmax function, which generalizes the logistic function to the multidimensional case (Fig. 2).
Fig. 2. Schematic of image feature extraction from a CNN.
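The softmax output layer mentioned above can be sketched in a few lines of numpy; this is a generic illustration, not code from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: generalizes the logistic function to a
    vector of class scores, yielding class membership probabilities."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())        # shift by the max for stability
    return e / e.sum()

p = softmax([2.0, 1.0, 0.1])
print(p.round(3))   # three probabilities summing to 1
```

For two classes, softmax([x, 0]) reduces exactly to the logistic function 1 / (1 + exp(−x)), which is the sense in which it generalizes it.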
Dimensionality Reduction Methods
Dimensionality reduction methods preserve the distance between points when going from a high-dimensional space to a low-dimensional one. Typical representatives are principal component analysis (PCA) and multidimensional scaling (MDS). The feature-space dimensionality reduction problem can be solved using linear methods such as PCA.
Assessment by t-SNE
The methods used are t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) [17]. In general, these methods produce similar results, but UMAP is more computationally efficient than t-SNE. Nearest vectors in high-dimensional spaces are then grouped by K-means; the most popular algorithms for this kind of problem are approximate nearest neighbor methods. In [18], a set of algorithms based on a modified K-means method allows computation on GPU.
Evaluation by Sensitivity, Specificity and Accuracy
The detection sensitivity (SE), specificity (SP), accuracy (Acc) and F-score were evaluated by [15]:

SE = TP / (TP + FN)   (3)
SP = 1 − FP / (FP + TN)   (4)
Acc = (TP + TN) / (P + N)   (5)
F score = 2 TP / (2 TP + FP + FN)   (6)
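Equations (3)–(6) translate directly into code; the sketch below is a generic helper (the counts passed in are made-up examples, not the paper's results).

```python
def detection_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and F-score, Eqs. (3)-(6)."""
    se = tp / (tp + fn)                        # Eq. (3)
    sp = 1 - fp / (fp + tn)                    # Eq. (4)
    acc = (tp + tn) / (tp + fp + tn + fn)      # Eq. (5): P + N = all events
    f = 2 * tp / (2 * tp + fp + fn)            # Eq. (6)
    return se, sp, acc, f

se, sp, acc, f = detection_metrics(tp=95, fp=5, tn=95, fn=5)
print(se, sp, acc, f)  # 0.95 0.95 0.95 0.95
```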
Silhouette Scoring
The silhouette score is a metric for evaluating a clustering algorithm, computed from two quantities, a and b: a is the average distance between a sample and all other points in the same cluster, and b is the average distance between the sample and all points in the nearest other cluster. The silhouette score of a sample is given by:

s = (b − a) / max(a, b)   (7)
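Equation (7) can be computed for a single sample as follows; this toy sketch uses Euclidean distances and hypothetical points, purely for illustration.

```python
import numpy as np

def silhouette_sample(x, same_cluster, other_clusters):
    """Silhouette of one sample x (Eq. 7): a = mean distance to the other
    points of its own cluster, b = mean distance to the nearest other
    cluster. Result lies in [-1, +1]."""
    a = np.mean([np.linalg.norm(x - p) for p in same_cluster])
    b = min(np.mean([np.linalg.norm(x - p) for p in cluster])
            for cluster in other_clusters)
    return (b - a) / max(a, b)

x = np.array([0.0, 0.0])
own = [np.array([0.1, 0.0]), np.array([0.0, 0.1])]       # tight own cluster
far = [[np.array([5.0, 5.0]), np.array([6.0, 5.0])]]     # one distant cluster
print(silhouette_sample(x, own, far))  # close to +1: dense, well separated
```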
3 Results
We explored a clustering approach using the spectral representation of iEEG in order to isolate ripples from other events such as spikes and contaminating noise. In Fig. 3 we depict the simulated data set in the frequency plane. Each line shows events in the frequency plane: pure HFOs (ripples in the [80–250] Hz range, fast ripples in [250–450] Hz), a spike, artifacts (high noise) and a spike event with a ripple.
Fig. 3. Time-frequency distribution of simulated IEEG data.
Figure 4 shows the spatial pattern of the HFO distribution. These color maps represent the class separation (spike, ripple, fast ripple and artifact) in 2D t-SNE visualizations after different feature extractors.
In line with [19], our results show that features extracted with ResNet18 separate the classes better than those from VGG16, VGG19, ResNet50, ResNet101 and ResNet34. As expected, the feature representation becomes more compact and better separated as we progress through the pipeline (Fig. 4).
Fig. 4. 2D t-SNE visualization of extracted features using resnet18.
In Fig. 5 we represent the spatial distribution of simulated iEEG data after initial detection and final clustering by k-means of the extracted CNN (ResNet18) features. The clustering result presents 5 classes: the upper line is the frequency plane of artifacts; the second line depicts a class of spikes and ripples; the third line shows a well-ranked fast ripple class; the following line contains a class of spikes and a mixture of spikes and ripples; and the bottom line presents the ripple cluster perfectly.
Fig. 5. Spatial distribution of HFOs after final grouping.
The clustering algorithm presents an overall accuracy of 94%. It gives (SE 98%, SP 95%, F-score 97%) for artifact clustering, (SE 95%, SP 100%, F-score 97%) for fast ripples,
(SE 100%, SP 100%, F-score 100%) for ripples, (SE 82%, SP 96%, F-score 84%) for spikes with ripples, and (SE 85%, SP 95%, F-score 83%) for spikes, using features calculated from the iEEG spectrum that isolate HFOs from other events such as spikes and sharp waves. However, the spike and spike-ripple classes were further split into two groups each, one of which was partially located in another, distant region.
Silhouette Score
Note that the silhouette score ranges from −1 to +1: a score of −1 means incorrect clustering, +1 means correct and very dense clustering, and 0 means the clusters overlap (Fig. 6).
Fig. 6. Silhouette score of HFOs clustering algorithm.
4 Conclusions
In this study, we applied an HFO clustering approach to detect iEEG events such as epileptic spikes, ripples and fast ripples in intracranial EEG datasets (simulated iEEG). The described HFO clustering network operates on the time-frequency representation of simulated iEEG. It is created through a novel automated approach based on sequential feature extraction with pretrained neural networks and the t-SNE method for dimension reduction. First, we applied CNN feature extraction (ResNet18); then we used t-SNE for dimensionality reduction. The algorithm succeeded in isolating events from five classes, namely spikes, ripples, fast ripples, contaminating noise and spikes with superimposed ripples. The clustering result can be recommended as a guide to identify the channel with the maximum number of HFOs, which can subsequently serve as a strong indicator of seizure onset. As a perspective, we propose to use our results to localize HFOs and define their network connectivity for an accurate delineation of epileptogenic zones, which would have a significant impact on surgical intervention for drug-resistant patients by delineating epileptogenic tissue. Acknowledgment. This work was supported by 20PJEC0613 “Hatem Ben Taher Tunisian Project”.
References
1. Zentner, J.: Surgical treatment of epilepsies. In: Sutter, B., Schröttner, O. (eds.) Advances in Epilepsy Surgery and Radiosurgery, pp. 27–35. Springer Vienna, Vienna (2002). https://doi.org/10.1007/978-3-7091-6117-3_3
2. Hadriche, A., ElBehy, I., Hajjej, A., Jmail, N.: Evaluation of techniques for predicting a build up of a seizure. In: Abraham, A., Gandhi, N., Hanne, T., Hong, T.-P., Nogueira Rios, T., Ding, W. (eds.) ISDA 2021. LNNS, vol. 418, pp. 816–827. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96308-8_76
3. Hadriche, A., Behy, I., Necibi, A., Kachouri, A., Ben Amar, C., Jmail, N.: Assessment of effective network connectivity among MEG none contaminated epileptic transitory events. Computational and Mathematical Methods in Medicine (2021). https://doi.org/10.1155/2021/6406362
4. Nawel, J., Abir, H., Ichrak, B., Amal, N., Chokri, B.A.: A comparison of inverse problem methods for source localization of epileptic MEG spikes. In: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 867–870 (2019). https://doi.org/10.1109/BIBE.2019.00161
5. Jmail, N., Gavaret, M., Wendling, F., Kachouri, A., Hamadi, G., Badier, J.M.: A comparison of methods for separation of transient and oscillatory signals in EEG. J. Neurosci. Methods 193, 273–289 (2011)
6. Zelmann, R., Mari, F., Jacobs, J., Zijlmans, M., Dubeau, F., Gotman, J.: A comparison between detectors of high frequency oscillations. Clin. Neurophysiol. 123(1), 106–116 (2012). https://doi.org/10.1016/j.clinph.2011.06.006
7. Engel, J., Bragin, A., Staba, R., Mody, I.: High-frequency oscillations: what is normal and what is not? Epilepsia 50(4), 598–604 (2009)
8. Schevon, C.A., Trevelyan, A.J., Schroeder, C.E., Goodman, R.R., McKhann, G., Emerson, R.G.: Spatial characterization of interictal high frequency oscillations in epileptic neocortex. Brain 132(Pt 11), 3047–3059 (2009)
9. Worrell, G.A., et al.: High-frequency oscillations in human temporal lobe: simultaneous microwire and clinical macroelectrode recordings. Brain 131(Pt 4), 928–937 (2008)
10. Blanco, J.A., et al.: Unsupervised classification of high-frequency oscillations in human neocortical epilepsy and control patients. J. Neurophysiol. 104(5), 2900–2912 (2010)
11. Andrew, B.G., Greg, W., Eric, A.M., Dennis, D., Brian, L.: Human and automated detection of high-frequency oscillations in clinical intracranial EEG recordings. Clin. Neurophysiol. 118(5), 1134–1143 (2007)
12. Akiyama, T., et al.: Focal cortical high-frequency oscillations trigger epileptic spasms: confirmation by digital video subdural EEG. Clin. Neurophysiol.
13. Crépon, B., et al.: Mapping interictal oscillations greater than 200 Hz recorded with intracranial macroelectrodes in human epilepsy. Brain 133(Pt 1), 33–45 (2010)
14. Burnos, S., et al.: Human intracranial high frequency oscillations (HFOs) detected by automatic time-frequency analysis. PLoS ONE 9(4), e94381 (2014)
15. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
16. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
17. McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018)
18. Chan, D.M., et al.: t-SNE-CUDA: GPU-accelerated t-SNE and its applications to modern data. arXiv preprint arXiv:1807.11824 (2018)
19. Benchmarks for popular CNN models. https://github.com/jcjohnson/cnn-benchmarks
Real Time Detection and Tracking in Multi Speakers Video Conferencing
Nesrine Affes1, Jalel Ktari1(B), Nader Ben Amor1, Tarek Frikha1, and Habib Hamam2,3,4,5
1 CES Lab, ENIS, University of Sfax, Sfax, Tunisia
{nesrine.affes,jalel.ktari,nader.benamor,tarek.frikha}@enis.tn 2 Faculty of Engineering, University de Moncton, Moncton, NB 1A3E9, Canada [email protected] 3 Spectrum of Knowledge Production and Skills Development, 3027 Sfax, Tunisia 4 International Institute of Technology and Management, 1989, Libreville, Gabon 5 Department of Electrical and Electronic Engineering Science, School of Electrical Engineering, University of Johannesburg, Johannesburg 2006, South Africa
Abstract. Currently, the videoconferencing market is growing worldwide, with annual growth (CAGR) of up to 10%. Several companies appreciated this technique during the Coronavirus lockdown, as it allowed them to maintain continuous activity and frequent remote meetings. Despite the diversity of videoconferencing platforms, several improvements are still needed, especially when a conference participant moves out of the video capture window or when more than one person is in the window. In this paper, a new videoconferencing system capable of detecting, choosing and tracking one participant is proposed. The suggested framework uses deep learning algorithms and offers the detection and tracking of a chosen person in a video stream. Person detection uses a Convolutional Neural Network model trained on a selected dataset. The tracking uses the SiamFC algorithm. In the model test phase, our system achieved an accuracy of 98%. Keywords: Videoconferencing · Tracking · Multi speakers · YOLOv3 · SiamFC · deep learning · Darknet-53 · Siamese Network
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 108–118, 2023. https://doi.org/10.1007/978-3-031-35501-1_11
1 Introduction
Videoconferencing technology has the potential to increase opportunities in many domains such as education, healthcare and industry. One of the major features of a videoconferencing platform is face detection, which has received significant attention, especially with the rapid progress of artificial intelligence. The wide range of police and commercial applications, as well as the availability of research technologies, contributes to this trend. In 2020, many companies and universities expressed their appreciation of this technique during the health crisis caused by the COVID-19 pandemic [1, 2]. In order to increase the performance and efficiency of videoconferencing, automatic detection/tracking of the speaker is
becoming a necessity. The tracking task can be considered a special case that guides the camera movement using a controlled servo motor. In the literature, human and object detection is a basic problem in computer vision. It aims to extract localizations of object instances from predetermined categories in real images. In this context, Brahati et al. proposed a Regional Convolutional Neural Network (R-CNN) composed of four modules for object detection. In the first module, region proposals are generated independently of the category. In the second module, a fixed-length feature vector is extracted from each proposed region. In the third module, a set of linear SVMs classifies the objects, and in the fourth module a bounding-box regressor refines the predicted bounding box [3]. Then, Ross Girshick [4] proposed a fast version of R-CNN: the original R-CNN performs a ConvNet forward pass for each region proposal without sharing computation and spends too much time on SVM classification [5, 6]. After the introduction of Fast R-CNN came the Faster R-CNN detector. To overcome the drawback of Fast R-CNN, Faster R-CNN introduces a region proposal network (RPN): both the detection task and region proposal generation are performed in a single network, and apart from the RPN, Faster R-CNN behaves like Fast R-CNN. With the continuous improvement of computing power and the development of deep learning, the single-shot detector SSD was invented: box offsets and category scores are predicted for a fixed set of bounding boxes at each location in several feature maps [7]. Redmon et al. proposed YOLO for real-time detection; the network has been improved through successive versions YOLO v1 to v8 [8]. The mean average precision (mAP) of Faster R-CNN reached 73.20% on Pascal-VOC 2007 [9], but YOLO-v3 is a high-speed detector whose frame rate reaches five times that of Faster R-CNN, at 35 FPS. This makes YOLO-v3 a real-time object detector with a high mAP of 65.86%.
Conversely, SSD did not reach comparable scores. Recently, several studies have addressed automatic visual tracking, which seeks to predict, in a video sequence, the position of a target chosen in the initial frame. Roughly speaking, visual target trackers fall into two categories: without or with deep learning. The first category includes traditional trackers grounded in motion models and classical appearance models, whose pros and cons have been examined experimentally and systematically. They use hand-designed features to model the target, providing efficient computational complexity and alleviating appearance variations. The second category includes deep learning-based trackers that employ not only deep off-the-shelf features but also end-to-end networks. A straightforward method is to reuse deep features trained for other tasks; such trackers can suffer inconsistency problems due to task differences. However, specific studies have addressed fully trained visual trackers with respect to existing tracking challenges. For instance, some traditional and deep methods are categorized into correlation-filter trackers and non-correlation-filter ones; further classification uses the architectures and tracking mechanisms. Some DL-based trackers can be cited [10]: ATOM, DiMP, PrDiMP, SiamDW, SiamRPN++, SiamMask, etc. ATOM, DiMP50 and PrDiMP employ ResNet blocks as the backbone network, while DiMP and PrDiMP train these blocks on tracking datasets. When exploiting ResNet models, new residual modules and architectures were proposed in SiamDW to improve the accuracy of feature localization. The best tracker for tackling similar objects is SiamDW on the VisDrone2019-test-dev dataset [10].
2 Proposed Approach
2.1 Proposed Solution
When presenting a training session or consultation, several people may be present in the videoconference scene, located close to each other. Who is the presenter, and how do we allow the recipient to follow him automatically via the camera? The presenter can move in any direction, i.e. he can present standing up and move around. If the camera stays still, the presenter may not appear on the recipient's screen. The system has to follow the movement of the target person via the camera. It is therefore necessary to make videoconferencing more intelligent by automating some of its functions.
• The application must run in real time: the system must be reactive enough to follow the presenter without losing his movement or appearance.
• Learning by building a dataset: the videoconferencing system must be able to detect only people (one class) and their locations in the scene.
2.2 System Design
Any automatic person tracking system is mainly composed of four steps: detection, tracking, coordinate extraction and webcam adjustment. The tracking and detection methods adopted to achieve this system are detailed below. The architecture of the system is shown in Fig. 1.
Fig. 1. System architecture
2.3 Person Detection
In this step, person detection is achieved with the YOLO-v3 [11] detection system, chosen for its high speed and real-time operation. Object detection is framed as a regression problem of precise bounding-box prediction and exact class probabilities. A single neural network directly estimates class probabilities and bounding boxes from complete images in one stage, as shown in Fig. 2.
Fig. 2. Yolo system detection
2.4 Tracking
In computer vision, single-object tracking is a basic problem: the target object is indicated in the first video frame and must be tracked in the following frames. Siamese trackers [12] achieve good results in single-object tracking (SOT). Siamese approaches operate on pairs of images; their objective is to track (by matching) the target object from the first image within a search region of the second image. The user receives a video and indicates the object of interest with a single rectangular bounding box. As the object to be tracked is not known in advance, online stochastic gradient descent would be needed to adapt the network weights, severely compromising the speed of the system. The standard Siamese architecture takes a pair of images as input: an exemplar image z and a candidate search image x. The image z represents the object of interest and x represents the search area in the following video frames. Siamese networks apply an identical transformation φ to both inputs and then combine the two representations with another function. Thus, each subwindow of the search area has a direct correspondence in a final response map, whose maximum indicates the new position of the target, as shown in Fig. 3.
Fig. 3. Fully-convolutional Siamese architecture [12]
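The matching step can be illustrated with a toy numpy sketch (not the actual SiamFC implementation): once φ maps the exemplar z and search image x to embeddings, the exemplar embedding is slid over the search embedding as a cross-correlation, and the argmax of the response map indicates the target position. The embedding maps below are made-up 2D arrays.

```python
import numpy as np

def response_map(exemplar, search):
    """Cross-correlate the exemplar embedding phi(z) with every subwindow
    of the search embedding phi(x); returns the score map whose argmax
    is the estimated target position."""
    eh, ew = exemplar.shape
    out = np.zeros((search.shape[0] - eh + 1, search.shape[1] - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[i:i + eh, j:j + ew] * exemplar)
    return out

# toy embeddings: plant the exemplar pattern inside a larger search map
z = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.zeros((6, 6))
x[3:5, 2:4] = z
r = response_map(z, x)
i, j = np.unravel_index(r.argmax(), r.shape)
print(int(i), int(j))  # 3 2 -> the planted target is found
```

In the real network the same idea is applied to deep feature maps rather than raw pixels, which is what makes the matching robust to appearance changes.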
112
N. Affes et al.
2.5 Machine Learning/Deep Learning
Intelligent systems often rely on machine learning, which allows machines to learn from training data to solve a problem and automates the generation of analytical models. Deep learning is a machine-learning approach using artificial neural networks; it outperforms traditional learning models and traditional data analysis. Many works have exploited deep learning technology. Among recent research, we cite [13], in which Yiwen et al. studied an application of 3D spatial features for multi-channel multi-speaker overlapped speech recognition, showing notable improvement over the previous 1D features. In the medical field, Pradeep et al. [14] proposed a systematic review of recent advances in the detection and classification of Acute Lymphoblastic Leukemia based on deep and machine learning [15]. In the same context, a computationally efficient blood cancer detection system was proposed by Pradeep et al. [16], who suggested lightweight deep learning-based feature extraction to detect leukemia. In another work [17], the authors give an overview of existing methods for enhancing, segmenting and extracting features and classes of red blood cell images to detect sickle cell disease. The reasons behind choosing deep learning thus become clear, as it is a young, emerging field in image-based prediction.
3 Implementation and Experimentation
3.1 Detection Using a Pre-trained Model
As already explained, YOLO-v3 is trained on the COCO dataset by default. To test the performance of this model for our project, the detector is run on a set of images related to the videoconference scenario (Table 1).
Table 1. Results of person detection using a pre-trained model (image rows 1–2 omitted; columns: N°, faces trained with the COCO dataset, results).
Detection is successful if a bounding box surrounds the object to be detected. We then notice that detection failed in image N°1, with detection of the person located far from
the webcam and no detection of the person located near the webcam. Also, in image N°2 the system detects the chair, the wine and the person on the right, but does not detect the person on the left. Following a test on a set of 8943 images related to the videoconference scenario + VOC + COCO, it is clear that this model was only 21% accurate for images taken a few tens of cm from the webcam. A solution therefore needs to be found, since detection and recognition are required regardless of the distance. This is the objective of the next part.
3.2 Discussion
Detection accuracy is the key point of this project. It is important that the application detects:
• All people present in the webcam frame:
– remote or close persons;
– persons facing the camera or not.
• All people in images with slight blur or with variable contrast.
Contrast is an important factor in this project, because detection must remain accurate at different brightness levels. To achieve these objectives, we decided to build our own database for a single class, the person class, including extremely special cases, in order to reach optimal detection accuracy.
3.3 Dataset Construction
To solve the detection problem, image collection is oriented towards scenarios extracted from videoconferences: sitting person, standing person, distant person, close person, etc. To create the annotations for each collected image, several tools can be used to train YOLO in Python. This means manually indicating the bounding box containing each person in the image and its class, "person". The annotation tool stores the top-left and bottom-right points in a corresponding ".txt" file, as shown in Fig. 4. After labelling, the locations of the people in an image are saved in a ".txt" file with the same name.
All the information needed for training and locating people is present in this file. The next step is data augmentation, which is used to enrich the data while maintaining precise responses; it comprises many techniques for increasing the number of training samples. Color-space augmentations and geometric transformations are widely exploited for visual tracking. Within this framework, 8943 images were collected, with a focus on blurred images and images at different brightness levels. The dataset is divided into 90% for training and 10% for testing.
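The annotation format YOLO trains on stores one line per box, with the class index and the box as normalized center coordinates plus width and height. The helper below is an illustrative sketch (function name and example coordinates are ours) converting the tool's top-left/bottom-right points into that format.

```python
def to_yolo_line(x1, y1, x2, y2, img_w, img_h, class_id=0):
    """Convert a top-left/bottom-right box (pixels) to a YOLO annotation
    line: 'class x_center y_center width height', all normalized to [0, 1].
    class_id 0 is the single 'person' class used here."""
    xc = (x1 + x2) / 2.0 / img_w
    yc = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

line = to_yolo_line(100, 50, 300, 450, img_w=640, img_h=480)
print(line)  # 0 0.312500 0.520833 0.312500 0.833333
```

One such line per person is written into the ".txt" file sharing the image's name.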
Fig. 4. Yolo tool annotation
Table 2. Results of person detection using our dataset (image rows 1–2 omitted; columns: N°, trained faces, results).
3.4 Results After Training Our Dataset
The detection is run again on the same set of images; the results are shown in Table 2. To choose the best model for the detection phase, the performance of the COCO dataset and our dataset was evaluated, as shown in Table 3.
Real Time Detection and Tracking in Multi Speakers
115
Table 3. System autonomy in the best case

Characteristics    COCO    Our dataset
Classes number     80      1
Images number      6000    17886
Accuracy (%)       21      98
It is obvious that this model is faster and more accurate: it was able to detect all the people present in the images with the different positioning variabilities.
3.5 Implementation of the Target Person Selection Algorithm
Since several people can be present in the window, the target speaker must be selected in real time. This step therefore displays a message asking the user to select the identity of the target person, an index between 0 and 6, as shown in Fig. 5.
Fig. 5. Manual choice of the target from the terminal
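The selection step amounts to indexing the detector's output by the identity typed at the terminal. A minimal sketch (function name and example boxes are ours, not from the paper's code):

```python
def select_target(detections, identity):
    """Pick the target person's bounding box by its displayed index.
    `detections` is the list of (x, y, w, h) boxes from the detector,
    shown to the user as indices 0..len-1."""
    if not 0 <= identity < len(detections):
        raise ValueError(f"identity must be in [0, {len(detections) - 1}]")
    return detections[identity]

boxes = [(10, 20, 80, 200), (150, 25, 75, 190), (300, 30, 70, 185)]
print(select_target(boxes, 1))  # (150, 25, 75, 190)
```

The selected box then initializes the tracker, as described next.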
3.6 Implementation of the Target Person Tracking Algorithm
SiamFC is the algorithm used in this part, implemented in a Python file, as shown in Fig. 6.
Fig. 6. Implementation of the network architecture
It is worth noting that for Siamese networks, the user normally receives a video and indicates the object of interest with a simple rectangular frame. In our case, the rectangular frame is provided by YOLO, so no user intervention is needed at this level: the coordinates of the detected target person's bounding box are fed to the tracking input. The bounding-box coordinates generated by the tracker are then used to generate the signals needed for servo motor control.
3.7 Camera Movement Control
Servo motors are devices that can rotate to a specified position; the Hitec HS-430BH has an arm that can turn 180 degrees. To control the servo, the center position of the target person's bounding box is calculated, as explained in Fig. 7. The program then generates movement commands to the servo motors via an Arduino to move the webcam.
Fig. 7. Explanation of the code operation
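A minimal sketch of this control step (the gain, dead-band, and rotation sign are illustrative assumptions, not values from the paper): the horizontal center of the tracked box is compared with the frame center, and the servo angle is adjusted proportionally, clamped to the HS-430BH's 180-degree range.

```python
def servo_angle(bbox, frame_width, current_angle, gain=0.05, deadband=20):
    """Compute the next servo angle (0-180 degrees) so the webcam
    turns toward the center of the tracked bounding box.

    bbox: (x, y, w, h) from the tracker; frame_width: capture width in pixels.
    """
    x, y, w, h = bbox
    box_center = x + w / 2
    error = box_center - frame_width / 2      # > 0: target is right of center
    if abs(error) <= deadband:                # close enough: do not move
        return current_angle
    new_angle = current_angle - gain * error  # sign depends on servo mounting
    return max(0, min(180, new_angle))        # HS-430BH arm limit: 180 degrees
```

In a real setup the new angle would be written each frame to the Arduino over a serial link (e.g. with pySerial), which in turn drives the servo.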
The bounding box is centered in the middle of the webcam capture. Table 4 shows the performance evaluation of the proposed method. The evaluation was done using real-time video. This model was 98% accurate on the videoconference dataset. The detection and tracking time for the first image is 1.3 s. After the models are loaded, each iteration takes 23 ms, the time needed to send coordinates to the servo (i.e. 43 FPS). All experiments were performed on a LAPTOP-J283508F with an Intel® Core™ i5-9300H CPU @ 2.40 GHz × 8, 8.00 GB of RAM, and an NVIDIA GeForce GTX 1650 with CUDA compute capability 7.5. The implementation was done with Python 3.6 using suitable libraries and frameworks, CUDA Toolkit 11.0, NVIDIA cuDNN 7.0, PyTorch 1.6.0, PyCharm and OpenCV.

Table 4. Performance evaluation

Model loading and acquisition of the first frame | Sending coordinates to the servo | Accuracy | FPS
1.3 seconds                                      | 0.023 seconds (23 ms)            | 98%      | 43
4 Discussion The videoconference system designed in this research is built as a dedicated system. After the implementation, the training and the test with YOLO, the videoconference reaches 43 frames/s, which is acceptable. Moreover, one of the advantages of this system is its ability to distinguish and follow the speaker in real time, even when other people are on the screen. This novelty can be implemented in videoconference applications such as Zoom, Skype, Google Meet, etc. To make this work popular and flexible, it could be packaged as an executable application: with a single click, the application is launched using a camera placed on a rotating motor, both connected to the PC with a USB cable, and the user is given access to choose the target person. This makes the application easier to install. Beyond the COVID context, deep learning in videoconference services also raises challenges in other fields such as electrical resources, smart cities [18], networking, e-health [19, 20], software, low-power hardware systems [21, 22], traffic counting systems [23] and agriculture [24].
5 Conclusion and Perspective Detecting persons in the camera frame and accurately tracking one of them in a videoconference is one of the most important topics in vision research because of its great variety of applications. It is challenging to process the image obtained from a webcam. A review of detection and tracking techniques was presented. Our implementation uses Python with the OpenCV library for image processing and vision applications, together with the deep learning library PyTorch. The integration phase of the pre-trained detection module was followed by a test phase in order to evaluate its performance. This training yielded 98% accuracy. The detected target person is then tracked using a servo motor. As perspectives, it is suggested to make this application executable and to add hand-gesture recognition: by raising the hand, the speaker would be tracked automatically.
References 1. Zhou, T., Huang, S., Cheng, J., Xiao, Y.: The distance teaching practice of combined mode of massive open online course micro-video for interns in emergency department during the COVID-19 epidemic period. Telemed. e-Health 26(5), 584–588 (2020) 2. Lischer, S., Safi, N., Dickson, C.: Remote learning and students' mental health during the Covid-19 pandemic: a mixed-method enquiry. Prospects 1–11 (2021) 3. Bharati, P., Pramanik, P.: Deep learning techniques—R-CNN to mask R-CNN: a survey. Computational Intelligence in Pattern Recognition, pp. 657–668 (2020) 4. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015) 5. Jiao, L., Zhang, F., Liu, F., Yang, S., Li, S.: A survey of deep learning-based object detection. IEEE Access 7, 128837–128868 (2019) 6. Maity, M., Banerjee, S., Chaudhuri, S.: Faster R-CNN and yolo based vehicle detection: a survey. In: 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), pp. 1442–1447. IEEE (2021)
7. Tan, L., Huangfu, T., Wu, L., Chen, W.: Comparison of YOLO v3, faster R-CNN, and SSD for real-time pill identification. BMC Med Inform. Decis. Mak. 21, 324 (2021) 8. Jiang, P., Ergu, D., Liu, F., Cai, Y., Ma, B.: A review of Yolo algorithm developments. Procedia Comput. Sci. 199, 1066–1073 (2022) 9. Murthy, C.B., Hashmi, M.F., Bokde, N.D., Geem, Z.W.: Investigations of object detection in images/videos using various deep learning techniques and embedded platforms—a comprehensive review. Appl. Sci. 10, 3280 (2020) 10. Marvasti-Zadeh, S.M., Cheng, L., Ghanei-Yakhdan, H., Kasaei, S.: Deep learning for visual tracking: a comprehensive survey. In: IEEE Transactions on Intelligent Transportation Systems (2021) 11. Fiaz, M., Mahmood, A., Javed, S., Jung, S.K.: Handcrafted and deep trackers: recent visual object tracking approaches and trends. ACM Comput. Surv. 52(2), 43:1–43:44 (2019) 12. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56 13. Shao, Y., Zhang, S.X., Yu, D.: Multi-channel multi-speaker ASR using 3D spatial feature. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6067–6071 (2022) 14. Das, P.K., Meher, D.V.A., Panda, R.S., Abraham, A.: A systematic review on recent advancements in deep and machine learning based detection and classification of acute lymphoblastic Leukemia. IEEE Access 10, 81741–81763 (2022) 15. Das, P.K., Meher, S.: An efficient deep Convolutional Neural Network based detection and classification of Acute Lymphoblastic Leukemia. Expert Syst. Appl. (2021) 16. Das, P.K., Nayak, B., Meher, S.: A lightweight deep learning system for automatic detection of blood cancer. Measurement, vol. 191 (2022) 17. Das, P.K., Meher, S., Panda, R., Abraham, A.: A review of automated methods for the detection of sickle cell disease. IEEE Rev Biomed Eng. 13, 309–324 (2020) 18. Meli, W., Lacy, F., Ismail, Y.: Video-based automated pedestrians counting algorithms for smart cities. Int. J. Comput. Digital Syst. 9, 1065–1079 (2022) 19. Ktari, J., Frikha, T., Ben Amor, N., Louraidh, L., Elmannai, H., Hamdi, M.: IoMT-based platform for E-health monitoring based on the blockchain. Electronics 11(15) (2022). https://doi.org/10.3390/electronics11152314 20. Frikha, T., Chaari, A., Chaabane, F., Cheikhrouhou, O., Zaguia, A.: Healthcare and fitness data management using the IoT-based blockchain platform. J. Healthcare Eng. (2021). https://doi.org/10.1155/2021/9978863 21. Ktari, J., Frikha, T., Yousfi, M.A., Belghith, M.K., Sanei, M.K.: Embedded Keccak implementation on FPGA. In: 2022 IEEE International Conference on Design & Test of Integrated Micro & Nano-Systems (DTS), pp. 01–05. https://doi.org/10.1109/DTS55284.2022.980984 22. Ktari, J., Abid, M.: A low power design space exploration methodology based on high level models and confidence intervals. J. Low Power Electron. 5(1), 17–30. https://doi.org/10.1166/jolpe.2009.1003 23. Lin, J.P., Sun, M.T.: A YOLO-based traffic counting system. In: 2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI), pp. 82–85. IEEE 24. Ktari, J., Frikha, T., Chaabane, F., Hamdi, M., Hamam, H.: Agricultural lightweight embedded blockchain system: a case study in olive oil. Electronics 11(20), 3394 (2022). https://doi.org/10.3390/electronics11203394
Towards Business Process Model Extension with Quality Perspective Dhafer Thabet1,2(B) , Sonia Ayachi Ghannouchi2 , and Henda Hajjami Ben Ghézala2 1 University of Ha’il, Ha’il, Kingdom of Saudi Arabia
[email protected]
2 RIADI Laboratory, National School of Computer Sciences, Mannouba University, Mannouba,
Tunisia [email protected], [email protected]
Abstract. Organizations are always looking to enhance the quality of their products/services in order to improve their competitiveness. The quality of products/services is closely related to the quality of the business process of the organization. Therefore, it is crucial for decision makers to gain insight into the quality perspective of their business process at the model level. Several studies aimed to measure and/or model the quality of a business process. Some of them provide these measures and/or models separately from the business process model. Others integrate quality information into the business process model for a specific modeling notation, which makes the integration of quality information dependent on that notation. However, different notations are used by organizations to model their business processes. Therefore, in this paper, we propose a generic approach for business process model extension with quality perspective, at the activity level as a first stage. The main contributions of this paper are: (1) the adaptation of the business process quality meta-model and (2) the proposal of a generic business process meta-model extended with quality perspective at the activity level. Keywords: Business Process Model Extension · Business Process Quality · High-level Process Structure · Event Logs
1 Introduction and Research Question Organizations are always looking to improve the quality of their products or services [8]. The quality of a product/service is tightly dependent on the quality of the corresponding business process (BP). Thus, one of the main concerns of organizations is the continuous control and improvement of their BPs' quality. Therefore, it is important for decision makers to have an overview of the quality perspective within the BP model itself. Several studies proposed different approaches and tools to measure and/or model the quality of BPs. Some of these studies provide quality measures and/or models separately from the BP model. Other studies focused on a specific modeling notation to extend the corresponding BP model with quality characteristics. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 119–128, 2023. https://doi.org/10.1007/978-3-031-35501-1_12 However,
120
D. Thabet et al.
different modeling notations are used by organizations to model their BPs. Thus, the lack of genericity motivated us to propose an approach for BP model extension with quality perspective, independently of the modeling notation. The proposed approach is based on BP model extension using event logs, as in the main process mining techniques [9, 13]. The remainder of this paper is organized as follows: Sect. 2 summarizes and discusses the related work. Section 3 describes the proposed approach and the proposed quality-extended BP meta-model. Section 4 presents and discusses an illustration of the proposed approach using a BP example. Section 5 presents a summary of the paper's contributions as well as an overview of the future work.
2 Related Work In [3], authors proposed a meta-model of BP quality characteristics with the corresponding attributes and measures which were identified and adapted based on software product quality standards. The proposed approach deals with addressing the quality of BPs during the modeling phase. In [1, 2], authors proposed an approach and the extension of CASE modeling tool to model the BP quality characteristics during the modeling phase. The proposed approach also includes an evaluation of BPMN notation for expressing the proposed characteristics. The approach proposed in [4–6] concerns the determination of performance measures thresholds for BPs. It consists in a measure-based assessment to evaluate the performance of BPMN BP models. In [12], authors propose an approach and the corresponding tool to assess the overall performance of different process variants discovered by process mining. It also allows to select the best performing variants using conjoint analysis and regression analysis. In [7], authors define the quality metrics of BP design by adapting the quality metrics of object-oriented software design, namely cohesion and coupling. In our previous work, we proposed an approach and the corresponding tool for extending BP models with cost perspective in [10] and with business cost perspective in [11]. Even though some of the existing approaches address capturing the quality of BPs as a part of the BP model, most of them [1–6] considered a specific modeling notation (namely BPMN). For some others, namely [12], the BP model notation depends on the process mining algorithm used to discover the BP model/variant with the best performance in terms of quality. In [10, 11], the proposed approach is generic in terms of modeling notation but it deals with financial cost and business cost of a BP, not with its quality characteristics. 
Therefore, the lack of genericity motivated us to propose a general approach for BP model extension with quality perspective, at the activity level as a first stage.
3 Proposed Approach In this section, we present a general overview of the proposed approach. Then, we introduce the adapted BP quality meta-model. Next, the proposed quality-extended BP meta-model is described. The proposed approach is a generic extension of BP models with quality perspective. The proposed quality perspective is based on event logs recorded by the system executing the BP. In this paper, we focus on the activity level
Towards Business Process Model Extension with Quality Perspective
121
to perform the proposed extension. As shown in Fig. 1, the user selects the required quality characteristic. Then, for each activity identified in the BP model, the information corresponding to the selected quality characteristic and to the considered activity is extracted. The extracted information is used to calculate the values of the selected quality characteristic according to the corresponding measures.
Fig. 1. General overview of the proposed approach (activity level)
Next, the calculated values are associated to the corresponding activity in the BP model. This iteration is repeated for each activity of the BP model. The final output of the proposed approach is a BP model extended with quality perspective at the activity level. 3.1 Business Process Quality Meta-model In this section, we introduce the adapted BP quality meta-model. To the best of our knowledge, the quality characteristics defined in [3] are the most comprehensive proposal. Therefore, we adopted and adapted the part concerning the activity level and
revised its representation as a UML class diagram. Figure 2 shows the adapted BP quality meta-model at the activity level. The HLActivityQualityCharacteristic class represents a quality characteristic of an activity of the BP model. The classes inheriting from the latter class represent examples of quality characteristics: time behaviour, fault tolerance, productivity, maturity, accuracy, compliance, analyzability, security, testability, stability and changeability. Each of the latter classes has attributes defining the quality characteristic in hand. In turn, each attribute is defined by the corresponding measure. Details about quality characteristics, attributes and measures are provided in [3].
Fig. 2. Quality perspective data structure (activity level)
Some of these attributes and measures refer to the structural BP properties while others focus on the behaviour during the execution of the BP. In this paper, as the proposed approach basically uses event logs to extend the BP model with quality perspective at the activity level, we focus on characteristics related to the behaviour of activities during their execution. Table 1 illustrates examples of activity quality characteristics (maturity, time behaviour, fault tolerance) and the corresponding attributes and measures. Besides, the measure of each attribute is adapted so that it accurately meets the activity level of event logs. The definitions of the considered quality characteristics are as follows:
• Maturity is the capability of the activity to avoid failure as a result of faults in the activity. An attribute example defining this quality characteristic is the fault density, which is measured by the percentage of activity instances that terminate correctly. We define the fault density of an activity A by the following measure:

MFaultDensity(A) = Count(Correct(Ai)) / Count(Ai)    (1)
where: Ai is the instance number i of the activity A. Correct is a function that determines if an activity instance Ai terminated correctly.
• Time behaviour is the capability of the activity to provide appropriate transport and processing times and throughput rates when executed under stated conditions. An attribute example defining this quality characteristic is the waiting time, which is measured by the percentage of activity instances with very different processing time. We define the waiting time of an activity A by the following measure:

MWaitingTime(A) = Count(Long(Ai)) / Count(Ai)    (2)

where: Ai is the instance number i of the activity A.
Long is a function that determines if the duration of an activity instance Ai exceeds the average duration of A.
• Fault tolerance is the capability of the activity to maintain a specified level of performance in cases of faults or of infringement of its specified interface. An attribute example defining this quality characteristic is the exception handling, which is measured by the percentage of handled exceptions. We define the exception handling of an activity A by the following measure:

MExceptHandling(A) = Count(HandlExcept(Ai)) / Count(Except(Ai))    (3)

where: Ai is the instance number i of the activity A. HandlExcept is a function that determines if an exception was handled during the execution of an activity instance Ai. Except is a function that determines if an exception occurred during the execution of an activity instance Ai.

Table 1. Examples of activity quality characteristics, attributes and measures

Characteristic  | Attribute          | Measure
Maturity        | Fault density      | Percentage of activity instances that terminate correctly
Time behavior   | Waiting time       | Percentage of activity instances with very different processing time
Fault tolerance | Exception handling | Percentage of handled exceptions
3.2 Quality-Extended Business Process Meta-model The high-level process structure is a general meta-model designed to add information from different perspectives onto the control-flow perspective and to make it as generic and reusable as possible [10]. The proposed approach aims at extending the BP model with quality perspective at the activity level regardless of the BP model notation. Thus, we consider extending the high-level process structure with the quality perspective data structure. The quality-extended high-level process structure is illustrated in Fig. 3 as a UML class diagram. The yellow-colored classes together with their relations represent the high-level process structure. The HLProcess is the central class and holds the high-level information independently of the BP model type. It holds a list of process elements (HLProcessElement) such as activities (HLActivity) for the process. The HLModel class represents a bridge to match the nodes of an actual BP model to their corresponding elements in the HLProcess structure. The ModelGraph class represents the actual BP model. These classes represent the common elements that will be shared by all high-level processes, regardless of whether they refer to some Petri net model, a BPMN model, etc.
Fig. 3. Quality-extended high-level process structure (activity level)
In Fig. 3, the quality perspective data structure (defined in the previous section) is represented by the grey-colored classes together with their relations. As the proposed quality perspective data structure concerns the activity level of a BP, it is associated with the HLActivity class of the high-level process structure. Thus, quality information can be associated with the corresponding activity in the BP model regardless of its notation, thanks to the HLModel data structure that plays the role of a bridge between the high-level quality information and the actual BP model represented by the ModelGraph data structure.
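The class structure described above can be rendered as plain Python data classes for illustration (the attribute lists are simplified assumptions; the actual meta-model defines more attributes and subclasses than shown here):

```python
from dataclasses import dataclass, field

@dataclass
class HLQualityCharacteristic:
    """One quality characteristic value attached to an activity,
    e.g. ('Maturity', 'Fault density', 0.8)."""
    name: str
    attribute: str
    value: float

@dataclass
class HLActivity:
    """High-level activity holding its quality perspective."""
    name: str
    quality: list = field(default_factory=list)  # HLQualityCharacteristic items

@dataclass
class HLProcess:
    """Central class: model-independent high-level process information."""
    activities: list = field(default_factory=list)  # HLActivity items

@dataclass
class HLModel:
    """Bridge matching nodes of an actual BP model (ModelGraph)
    to their corresponding HLProcess elements."""
    process: HLProcess
    node_to_activity: dict = field(default_factory=dict)  # model node id -> HLActivity
```

Because quality values hang off `HLActivity` and the bridge maps model nodes to activities, the same quality data can annotate a Petri net transition or a BPMN task without change.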
4 Illustration of the Proposed Approach In this section, the proposed approach for BP extension with quality perspective at the activity level is illustrated. First, a BP model example is introduced along with a sample of the corresponding event log. Then, we illustrate the application of the proposed approach to the considered BP model example. 4.1 Business Process Model and Event Log Examples In order to illustrate the proposed approach, we considered the BP example of a simple phone repair process. As shown in Fig. 4, the considered BP example is represented as a Petri net model, as an example of BP modeling notation. The process begins with the registration of the broken phone. Then, the phone is analyzed to determine the defect type. Then, the customer is informed about the defect type. Simultaneously, depending on the severity of the defect, a simple repair or a complex repair is performed. Next, the phone is tested to check if it is fixed. If so, the repair details are archived and the phone is returned to the customer. If it is not fixed, the repair and the test are performed again. If the phone is still broken after the fifth repair test, the repair details are archived and the phone is returned to the customer [14].
Fig. 4. Simple phone repair process model (Petri net notation)
The simulation of the considered BP example generated an event log of 1000 cases (process instances). The obtained event log is preprocessed to obtain a file with the XES extension, which is the standard format used by process mining techniques [9, 10, 13]. For simplification reasons, we present five instances of the "Analyze Defect" activity in Table 2. Each instance is described in terms of start timestamp, end timestamp, end event and exception occurrence.

Table 2. Sample of instances of the "Analyze Defect" activity

Act. Inst. | Start               | End                 | End Event | Exception
1          | 2022-01-21T19:03:00 | 2022-01-21T19:09:00 | Complete  | No
2          | 2022-01-22T10:51:00 | 2022-01-22T10:59:00 | Complete  | No
3          | 2022-01-22T14:50:00 | 2022-01-22T14:59:00 | Complete  | No
4          | 2022-01-23T02:35:00 | 2022-01-23T02:40:00 | Withdrawn | Unhandled
5          | 2022-01-23T03:10:00 | 2022-01-23T03:18:00 | Complete  | Handled
4.2 Application of the Proposed Approach In this section, the proposed approach illustrated in Fig. 1 is applied to the BP example presented in Fig. 4 and the corresponding event log sample illustrated in Table 2. First, an instance of the meta-model presented in Fig. 3 is constructed as follows: the Petri net model example illustrated in Fig. 4 represents an instance of the PetriNetModel part of the meta-model. Besides, an instance of the HLPetriNet class is created to ensure the link between the actual activities in the Petri net model and the corresponding quality information held by the HLProcess part of the meta-model. Moreover, an instance of the HLProcess part is created, particularly of the HLActivity and HLQualityCharacteristic classes, which represent the quality perspective for each activity of the considered BP
example. Thus, for each identified activity in the Petri net model example, the event log is mined in order to extract the required data to calculate the attribute values of each quality characteristic. In the current application case, we assume that the user selects the following quality characteristics: maturity, time behaviour and fault tolerance. We also consider the example of the "Analyze Defect" activity (Petri net transition) and the quality characteristics attributes listed in Table 1. By applying the formulas (1), (2) and (3), we obtain the following results: • MFaultDensity("Analyze Defect") = 4/5 = 80%, which represents maturity. • MWaitingTime("Analyze Defect") = 3/5 = 60%, which represents time behavior. • MExceptHandling("Analyze Defect") = 1/2 = 50%, which represents fault tolerance. Then, the obtained results are used to construct the HLQualityCharacteristic class instance, which is associated with the HLActivity class instance corresponding to the "Analyze Defect" activity. Hence, the final result is the Petri net model example extended with quality perspective at the "Analyze Defect" activity level, as illustrated in Fig. 5. The quality perspective is shown in red as a rectangle associated with the corresponding activity ("Analyze Defect" transition) of the Petri net model.
Fig. 5. Quality-extended simple phone repair process model (“Analyze Defect” activity level)
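For illustration, the three measure calculations above can be reproduced from the Table 2 sample with a short script (the record layout and function names are ours, not from the paper's tool; durations are derived from the start/end timestamps):

```python
from datetime import datetime

# The five "Analyze Defect" instances from Table 2:
# (start, end, end_event, exception)
instances = [
    ("2022-01-21T19:03:00", "2022-01-21T19:09:00", "Complete",  None),
    ("2022-01-22T10:51:00", "2022-01-22T10:59:00", "Complete",  None),
    ("2022-01-22T14:50:00", "2022-01-22T14:59:00", "Complete",  None),
    ("2022-01-23T02:35:00", "2022-01-23T02:40:00", "Withdrawn", "Unhandled"),
    ("2022-01-23T03:10:00", "2022-01-23T03:18:00", "Complete",  "Handled"),
]

def duration(inst):
    """Processing time of one activity instance, in seconds."""
    start, end, _, _ = inst
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()

def fault_density(insts):       # measure (1): share of correct terminations
    return sum(i[2] == "Complete" for i in insts) / len(insts)

def waiting_time(insts):        # measure (2): share of instances longer than average
    avg = sum(duration(i) for i in insts) / len(insts)
    return sum(duration(i) > avg for i in insts) / len(insts)

def exception_handling(insts):  # measure (3): share of handled exceptions
    excepts = [i for i in insts if i[3] is not None]
    return sum(i[3] == "Handled" for i in excepts) / len(excepts)

print(fault_density(instances), waiting_time(instances), exception_handling(instances))
# 0.8 0.6 0.5 -> the 80%, 60% and 50% values computed for "Analyze Defect"
```

Running this on the sample recovers exactly the maturity, time behavior and fault tolerance values obtained by applying formulas (1)-(3) by hand.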
4.3 Discussion The proposed approach consists in extending BP models with quality perspective at the activity level regardless of the modeling notation. In the previous section, the application of the proposed approach was performed on a BP example with Petri net notation. This notation was arbitrarily chosen, but the approach is also applicable to any other BP modeling notation (like BPMN, EPC or Activity Diagrams) by defining the corresponding meta-model (ModelGraph part) and the corresponding high-level model (HLModel part). This provides the genericity of the proposed approach, which is the main challenge addressed in the current paper. Furthermore, the illustration considers three examples of quality
characteristics. It is also applicable for other quality characteristics by defining their attributes, measures and the way to discover them from the event log. Moreover, the illustration concerns an activity example of the considered BP for clarity reasons but the approach is similarly applicable for the other activities of the BP example.
5 Conclusion and Future Work In this paper, we proposed an approach for BP model extension with quality perspective at the activity level. It was presented by giving a general overview, the considered BP quality meta-model and the proposed quality-extended BP meta-model. Moreover, an illustration of the proposed approach was presented and discussed. The proposed approach is generic as it supports different modeling notations which is the main contribution of the current paper. Furthermore, it uses event logs for quality perspective construction which is a reliable data source for continuous BP improvement. However, the proposed approach considers only the activity level of a BP and it does not support all the defined quality characteristics. Therefore, the future work concerns the extension of the whole BP components with more significant quality characteristics. Furthermore, providing recommendations to improve the quality of the BP is also considered in our future work.
References 1. Heinrich, R., Kappe, A., Paech, B.: Modeling quality information within business process models. In: Proceedings of the 4th SQMB 2011 Workshop, pp. 4–13 (2011) 2. Heinrich, R., Kappe, A., Paech, B.: Tool support for the comprehensive modeling of quality information within business process models. In: Proceedings of the Enterprise Modelling and Information Systems Architectures, pp. 213–218. Hamburg (2011) 3. Heinrich, R., Paech, B.: Defining the quality of business processes. In: Proceedings of Modellierung Conference, pp. 133–148. Klagenfurt (2010) 4. Kchaou, M., Khlif, W., Gargouri, F.: A methodology for determination of performance measures thresholds for business process. In: Proceedings of the 15th International Conference on Evaluation of Novel Approaches to Software Engineering, online, pp. 144–157 (2020) 5. Khlif, W., Kchaou, M., Gargouri, F.: A framework for evaluating business process performance. In: Proceedings of the 14th International Conference on Software and Data Technologies, pp. 371–383. Prague (2019) 6. Khlif, W., Makni, L., Zaaboub, N., Ben-Abdallah, H.: Quality metrics for business process modeling. In: Proceedings of the 9th WSEAS International Conference on Applied Computer Science, pp. 195–200. Genova (2009) 7. Khlif, W., Zaaboub, N., Ben-Abdallah, H.: Business process quality metrics: state of the art and OO software metric adaptation. In: Proceedings BIR 2009 - 8th International Conference on Perspectives in Business Informatics Research, Kristianstad Academic Press (2014) 8. Lohrmann, M., Reichert, M.: Understanding business process quality. In: Glykas, M. (ed.) Business Process Management: Theory and Applications, pp. 41–73. Springer, Berlin (2013). https://doi.org/10.1007/978-3-642-28409-0_2 9. Rozinat, A.: Process Mining: Conformance and Extension. Dissertation, Eindhoven University (2010)
10. Thabet, D., Ghannouchi, S., Ben Ghezala, H.: A general solution for business process model extension with cost perspective based on process mining. In: Proceedings of the 11th International Conference on Software Engineering Advances, pp. 238–247. Rome (2016) 11. Thabet, D., Ghannouchi, S., Ben Ghezala, H.: Towards business cost mining - considering business process reliability. In: Borzemski, L., et al. (eds.) International Conference on Intelligent Systems Design and Applications. LNNS, vol. 364, pp. 127–137 (2021). https://doi.org/10.1007/978-3-030-54157-6_11 12. Van den Ingh, L., Eshuis, R., Gelper, S.: Assessing performance of mined business process variants. Enterprise Information Systems 15(5), 676–693 (2021) 13. Van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement. Springer, Berlin Heidelberg (2011) 14. Wynn, M.T., Low, W.Z., ter Hofstede, A.H.M., Nauta, W.E.: A framework for cost-aware process management: cost reporting and cost prediction. J. Univ. Comput. Sci. 406–430 (2014)
Image Compression-Encryption Scheme Based on SPIHT Coding and 2D Beta Chaotic Map Najet Elkhalil(B), Youssouf Cheikh Weddy, and Ridha Ejbali Research Team in Intelligent Machines (RTIM), National Engineering School of Gabes, Omar Ibn El Khattab, Avenue Zrig, 6072 Gabes, Tunisia [email protected], ridha [email protected] Abstract. In this paper, a new image compression-encryption algorithm is proposed. Compression is performed using the Set Partitioning in Hierarchical Trees (SPIHT) algorithm. After compression, the image is encrypted using the new two-dimensional Beta chaotic map (2D-BCM). The encryption process involves permutation, diffusion, and substitution stages. According to the experimental results and security analysis, the compression-encryption algorithm has high sensitivity and security. Thus, the algorithm is able to resist chosen-plaintext attacks. Keywords: image compression-encryption · chaotic encryption · SPIHT coding · 2D Beta Chaotic Map
1 Introduction
Images are one of the most used kinds of data and hold a lot of information. They are widely used in different domains, so the storage efficiency and security of these images become important issues. Chaotic systems and chaotic maps are among the most used techniques in image encryption. This was confirmed by Ghadirli et al. [4] in their survey on color image encryption. Another image encryption scheme for color images is proposed by Wu et al. [15]. The scheme is based on a 6D hyper-chaotic system and the 2D discrete wavelet transform (DWT). The initial image is first divided into four sub-bands using the 2D DWT. Second, each sub-band is permuted using a key stream. Third, an intermediate image is constructed from the four encrypted sub-bands using the inverse 2D DWT. To further increase security, each pixel of the intermediate image is modified using another key stream. The proposed algorithm is resistant to different plaintext attacks. Massood et al. [9] proposed a hybrid image encryption scheme based on the 3D Lorenz chaotic map. The algorithm involves shuffling and diffusion processes. Security tests validate the robustness of the scheme. Combining image compression and encryption not only reduces data but also offers much-needed security to the image information. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 129–138, 2023. https://doi.org/10.1007/978-3-031-35501-1_13
130
N. Elkhalil et al.
With the growth of advanced technologies, new approaches and techniques for image compression and encryption have been developed. Embedding encryption into the compression process was proposed by Zhang et al. [16]; their algorithm adopts three stages of scrambling and diffusion, and several chaotic systems are used to generate the encryption key. Security tests indicate that the algorithm has high security and good compression performance. In [5], the authors propose a compression-encryption algorithm based on the performance of chaotic sequences and the excellent properties of the wavelet transform; results show the effectiveness of the encryption security and the compression ratio. Benefiting from the data compression theory named Compressive Sensing (CS) and from chaos, many scholars have proposed lossy encryption schemes combining CS and chaos [2,7,8]. SPIHT coding is widely used in image compression due to its low computational complexity and compression performance [3,14,17].
2 Literature Review
2.1 1D Beta Chaotic Map
The one-dimensional Beta chaotic map was first introduced in 2017 by Zahmoul et al. [11,12]. This map is widely used in cryptography, watermarking, and steganography [10,13]. The mathematical formula of the Beta chaotic map is as follows:

x_{n+1} = k × Beta(x_n; x1, x2, p, q)    (1)

where

p = b1 + c1 × a    (2)
q = b2 + c2 × a    (3)

Here b1, c1, b2, and c2 are adequately chosen constants, a is the bifurcation parameter, and k is the amplitude control parameter, whose role is to adjust the amplitude of the Beta chaotic map. The motivation of this work is to start from the equation of the 1D Beta chaotic map and produce a new 2D Beta chaotic map. The proposed 2D-BCM has a large range of bifurcation parameters and the strong chaotic behavior shown in the bifurcation diagram of Fig. 1. Thus the chaotic encryption process becomes more secure.
2.2 The New 2D Beta Chaotic Map
A 2D chaotic map produces the iterate (x_{n+1}, y_{n+1}) from the previous (x_n, y_n). Derived from the equation of the 1D-BCM, the mathematical definition of the new 2D-BCM is as follows:

x_{n+1} = k × Beta(y_{n+1}; x1, x2, p, q)    (4)
y_{n+1} = k × Beta(x_n; y1, y2, p, q)    (5)

(y_{n+1} is computed first, from Eq. (5), and then used in Eq. (4)).
where

p = b1 + c1 × a    (6)
q = b2 + c2 × a    (7)

and b1, c1, b2, and c2 are adequately chosen constants, a is the bifurcation parameter, and k is the amplitude control parameter.
2.3 Bifurcation Diagram of 2D-BCM
A bifurcation diagram illustrates a qualitative change in dynamics resulting from a small change in one of the parameters. The dotted regions of a bifurcation diagram generally correspond to the system's chaotic behavior, whereas the solid lines indicate that the system's behavior is periodic [1]. Figure 1 shows different shapes of the new 2D Beta chaotic map. Our map is characterized by strong chaotic behavior, good pseudo-random sequences, a wide range of bifurcation parameters, and a large number of parameters. As a result, the encryption process becomes more efficient and can stand up to most attacks.
Fig. 1. Different shapes of the new 2D-BCM
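To make the iteration concrete, Eqs. (4)–(5) can be sketched in a few lines of Python. The closed form of the Beta function below follows the definition used in Zahmoul et al.'s Beta-map papers [11] and should be treated as an assumption here, as should the illustrative parameter values (k = 1, symmetric ranges, p = q = 3), which are not the ones used in the paper:

```python
def beta(x, x1, x2, p, q):
    """Beta function on the open interval (x1, x2); zero outside.

    Closed form assumed from Zahmoul et al.'s Beta chaotic map work.
    """
    if not (x1 < x < x2):
        return 0.0
    xc = (p * x1 + q * x2) / (p + q)  # position of the maximum
    return ((x - x1) / (xc - x1)) ** p * ((x2 - x) / (x2 - xc)) ** q


def bcm2d(x0, y0, n, k=1.0, x1=-1.0, x2=1.0, y1=-1.0, y2=1.0, p=3.0, q=3.0):
    """Iterate the 2D-BCM of Eqs. (4)-(5): y is updated first, then x."""
    x, y, seq = x0, y0, []
    for _ in range(n):
        y = k * beta(x, y1, y2, p, q)  # Eq. (5)
        x = k * beta(y, x1, x2, p, q)  # Eq. (4)
        seq.append((x, y))
    return seq


# With these symmetric ranges the Beta term reduces to (1 - x^2)^p,
# so every iterate stays inside [0, k].
stream = bcm2d(0.3, 0.7, 100)
```

A real key stream would quantize the iterates, e.g. taking `int(x * 10**6) % 256` to obtain byte values; identical seeds reproduce the same stream, which is what makes the map usable as a keyed generator.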
2.4 SPIHT Algorithm

SPIHT Tree Structure
SPIHT (Set Partitioning in Hierarchical Trees) encoding is an improved version of EZW (Embedded Zero-tree Wavelet). SPIHT is based on the Discrete Wavelet Transform (DWT): the image passes through a DWT block to determine its wavelet coefficients, and a spatial orientation tree structure is then constructed (shown in the figure). Each node of the spatial tree corresponds to a wavelet coefficient at position (i,j). Every node (i,j) has four children, (2i,2j), (2i,2j+1), (2i+1,2j), and (2i+1,2j+1), except in the lowest-frequency sub-band (LL).

SPIHT Encoding
SPIHT coding repeats the same steps in every stage, each described by a bit-plane carrying the significance of the wavelet coefficients organized in the hierarchical tree. Each coefficient in the orientation tree is coded from the Most Significant Bit-plane (MSB) to the Least Significant Bit-plane (LSB), starting with the coefficients of largest magnitude. SPIHT coding computes a threshold Tp for each bit-plane and assigns every coefficient to one of the three lists below:
– List of Insignificant Sets (LIS): contains sets of wavelet coefficients whose magnitudes are smaller than the threshold Tp.
– List of Insignificant Pixels (LIP): contains individual wavelet coefficients that are smaller than the threshold Tp.
– List of Significant Pixels (LSP): contains wavelet coefficients that are larger than the threshold Tp.

The SPIHT algorithm involves three main steps, defined below:
First step: Initialization. Initialize the threshold T = 2^n with n = ⌊log2(max_{(i,j)} |c_{i,j}|)⌋, where c_{i,j} is a wavelet coefficient. Initialize the three lists: LSP = Ø, LIP = all (i,j) in the LL sub-band, LIS = the tree structure.
Second step: Sorting Pass. Check the coefficients in the LIP for significance. For significant coefficients, output 1 followed by a sign bit (1 or 0 for positive or negative), then move the coefficient from the LIP to the LSP. For insignificant coefficients, keep them in the LIP and output '0'. Then check all the sets in the LIS for significance.
Third step: Refinement Pass. Output the refinement bit of every coefficient in the LSP, which contains the coefficients whose magnitudes exceed Tp.
The LSP keeps pixels that have already been evaluated and do not need to be evaluated again. Finally, the threshold Tp is updated (halved) and the passes are repeated.
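The initialization and significance test above can be sketched as follows; `coeffs` is a hypothetical flat list of wavelet coefficients (a real encoder would walk the DWT sub-band trees):

```python
import math

def initial_threshold(coeffs):
    """SPIHT initial threshold: T = 2**n with n = floor(log2(max |c|))."""
    cmax = max(abs(c) for c in coeffs)
    n = int(math.floor(math.log2(cmax)))
    return 2 ** n

def is_significant(c, t):
    """Significance test used by the sorting pass: |c| >= T."""
    return abs(c) >= t

# Example: the largest |coefficient| is 63, so n = 5 and T = 2**5 = 32.
t = initial_threshold([-63, 21, 5, -2])
```

Each subsequent bit-plane halves `t`, so the same significance test progressively exposes smaller coefficients, which is what makes the bit stream embedded (decodable at any prefix).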
2.5 The 2D Beta Chaotic Encryption Algorithm
Step 1: resize the M×N input image into a square dimension. Step 2: generate two different random chaotic sequences, Rseq1 and Rseq2. Varying one parameter of the 2D-BCM function produces a totally different map and thus different chaotic sequences.
Step 3: rearrange Rseq1 and Rseq2 into matrices M1 and M2 of dimension M×N. M1 and M2 are then used to shuffle the rows and columns of the input image. Step 4: divide the resulting matrix into four blocks and transform the blocks into a random matrix W using the equations below:

fN(d) = T(d) mod G     (8)
fR(d) = T(√d) mod G    (9)
fS(d) = T(d²) mod G    (10)
fD(d) = T(2d) mod G    (11)

And the matrix function is given below:

W = [ fN(B1,1) fR(B1,2) fS(B1,3) fD(B1,4)
      fR(B2,1) fS(B2,2) fD(B2,3) fN(B2,4)
      fS(B3,1) fD(B3,2) fN(B3,3) fR(B3,4)
      fD(B4,1) fN(B4,2) fR(B4,3) fS(B4,4) ]    (12)
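A common way to realize the Step 3 row/column shuffle (an assumption here, since the paper does not spell out the mapping from sequence to permutation) is to sort each chaotic sequence and use the resulting index order as a permutation:

```python
def permutation_from_sequence(seq):
    """Indices that sort the chaotic sequence, used as a permutation."""
    return sorted(range(len(seq)), key=lambda i: seq[i])

def shuffle_rows(image, perm):
    """Reorder the rows of a 2D image (list of rows) by `perm`."""
    return [image[p] for p in perm]

# Sorting [0.31, 0.07, 0.92, 0.55] visits indices 1, 0, 3, 2, so rows are
# reordered to [row1, row0, row3, row2]; applying the same construction
# to the transposed image shuffles columns.
perm = permutation_from_sequence([0.31, 0.07, 0.92, 0.55])  # -> [1, 0, 3, 2]
```

Because the permutation is determined entirely by the chaotic sequence, a receiver holding the same 2D-BCM parameters can regenerate it and invert the shuffle.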
3 Compression-Encryption Algorithm
Having detailed the compression and encryption processes in the previous section, we use the SPIHT algorithm to compress the input image and then apply the 2D-BCM encryption algorithm to the resulting compressed image. The flowchart of our compression-encryption scheme is shown in Fig. 2.
4 Simulation and Analysis of the Proposed Scheme
A good cryptosystem should have strong immunity against common attacks such as statistical attacks and differential attacks. In this section, we run a set of security analyses to test the security and efficiency of our scheme and compare it with state-of-the-art methods.
4.1 Statistical Analysis
In order to determine the resistance against brute-force attacks, we performed a statistical analysis by computing the histograms of the encrypted images and the information entropy.

Fig. 2. Flowchart of the compression-encryption algorithm

Histogram Analysis
It is crucial that the encrypted image has no statistical similarity to the original image, so that the information in the original image is protected from statistical attacks: the encrypted image's histogram must be uniform. Figure 3(a)(b) shows the original Cameraman image and its histogram, and Fig. 3(c)(d) the encrypted image and its histogram. As we can see, the histogram of the encrypted image is flat and totally different from the original one. Likewise, Fig. 4(a)(b) shows the original Chemical plant image and its histogram, and Fig. 4(c)(d) the encrypted image and its histogram; again, the histogram of the encrypted image is uniform and different from that of the original image.
Fig. 3. (a) Cameraman plain image (b) histogram of Cameraman (c) the encrypted image (d) histogram of the encrypted image
Fig. 4. (a) Chemical plant plain image (b) histogram of Chemical plant (c) the encrypted image (d) histogram of the encrypted image.
Information Entropy Analysis
Information entropy is one of the most important measures of randomness. It can be calculated as follows:

H(S) = Σ_{i=1}^{2^n − 1} P(S_i) log(1 / P(S_i))    (13)
where 2^n is the total number of states of the information source S and P(S_i) is the probability of symbol S_i. The closer the entropy value is to 8, the harder it is for attackers to decrypt the encrypted image. The entropy of the images encrypted with our algorithm is presented in Table 1. Analyzing the table, we can see that the entropy values are close to 8, so the proposed cryptosystem has good randomness. Furthermore, our algorithm performs better than those in [5,6].

Table 1. Entropy of our approach and other approaches

               Lena    Peppers  Boat    Baboon
Our Approach   7.9998  7.9993   7.9996  7.9993
Ref [5]        7.9989  7.9988   7.9989  7.9987
Ref [17]       7.9973  -        -       -
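Equation (13) is the Shannon entropy written with log(1/P). A short script (an illustrative sketch, not the authors' code) shows why a perfectly uniform 8-bit histogram yields a value of exactly 8:

```python
import math

def shannon_entropy(pixels):
    """Shannon entropy of Eq. (13): H = sum P(s) * log2(1 / P(s))."""
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    n = len(pixels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# A uniform 8-bit image: all 256 gray levels equally likely -> H = 8 bits.
uniform = [v for v in range(256) for _ in range(4)]
# A constant image carries no information -> H = 0.
constant = [128] * 1024
```

Cipher images whose entropy reaches ~7.999, as in Table 1, are therefore statistically close to the uniform ideal.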
4.2 Differential Attack Analysis
The differential test is another test of the robustness of a cryptosystem. We use the Number of Pixels Change Rate (NPCR) and the Unified Average Changing Intensity (UACI) to determine this property. NPCR measures the difference between two images by counting the number of differing pixels, while UACI measures the difference between two encrypted images by evaluating the change in visual intensity. NPCR and UACI are calculated by the following formulas:

NPCR = (Σ_{i,j} D(i,j)) / (M × N) × 100    (14)

UACI = (1 / (M × N)) Σ_{i,j} |C1(i,j) − C2(i,j)| / 255 × 100    (15)

where

D(i,j) = 0 if C1(i,j) = C2(i,j), and 1 otherwise    (16)
M and N are the height and width of the original and cipher images, and C1 and C2 are the encrypted images before and after one pixel of the original image is modified, respectively. Table 2 and Table 3 present the NPCR and UACI results of our cryptosystem compared with other algorithms. The obtained results demonstrate the robustness of the proposed cryptosystem; thus, it can resist differential attacks.

Table 2. NPCR of our approach and other approaches

               Lena   House  Boat   Lake
Our Approach   99.63  99.63  99.63  99.61
Ref [5]        99.91  89.99  99.61  99.02
Ref [17]       99.60  -      -      -
Table 3. UACI of our approach and other approaches

               Lena   House  Boat   Lake
Our Approach   33.51  33.52  33.53  33.39
Ref [5]        33.69  33.96  33.73  33.49
Ref [17]       30.66  -      -      -
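Equations (14)–(16) can be sketched as below (an illustrative implementation, not the authors' code); `c1` and `c2` are two cipher images given as equal-sized 2D lists of 8-bit values:

```python
def npcr_uaci(c1, c2):
    """NPCR and UACI (Eqs. 14-16) between two cipher images, in percent."""
    m, n = len(c1), len(c1[0])
    # Eq. (16): D(i,j) is 1 where the pixels differ; Eq. (14) averages it.
    changed = sum(1 for i in range(m) for j in range(n) if c1[i][j] != c2[i][j])
    npcr = changed / (m * n) * 100
    # Eq. (15): mean absolute intensity change, normalized by 255.
    uaci = sum(abs(c1[i][j] - c2[i][j])
               for i in range(m) for j in range(n)) / (255 * m * n) * 100
    return npcr, uaci

# Identical images give (0, 0); maximally different images give (100, 100).
# Good ciphers approach NPCR ~ 99.6 and UACI ~ 33.4, as in Tables 2 and 3.
```

For a random 8-bit cipher the expected NPCR is 255/256 ≈ 99.61% and the expected UACI is about 33.46%, which is why the tabulated values cluster around those figures.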
5 Conclusion and Future Work
In this work, we proposed a new 2D-BCM. Our new chaotic map has strong chaotic behavior and a large parameter range. We then used this map in a compression-encryption algorithm to demonstrate its effectiveness. The results of the statistical and differential attack analyses confirm that our new 2D-BCM can serve as the basis of a compression-encryption cryptosystem and gives good results. We further aim to test the efficiency of our 2D-BCM with other cryptosystems.
References
1. Chahar, V., Girdhar, A.: 2D logistic map and Lorenz-Rossler chaotic system based RGB image encryption approach. Multimedia Tools Appl. 80, 1–25 (2021)
2. Chai, X., Zheng, X., Gan, Z., Han, D., Chen, Y.: An image encryption algorithm based on chaotic system and compressive sensing. Signal Process. 148, 124–144 (2018)
3. Elkhalil, N., Zahmoul, R., Ejbali, R., Zaied, M.: A joint encryption-compression technique for images based on beta chaotic maps and SPIHT coding. In: The Fourteenth International Conference on Software Engineering Advances, ICSEA 2019, p. 130 (2019)
4. Ghadirli, H., Nodehi, A., Enayatifar, R.: An overview of encryption algorithms in color images. Signal Process. 164, 163–185 (2019)
5. Hamdi, M., Rhouma, R., Belghith, S.: A selective compression-encryption of images based on SPIHT coding and Chirikov standard map. Signal Process. 131 (2016)
6. Hamiche, H., Lahdir, M., Kassim, S., Tahanout, M., Kemih, K., Addouche, S.A.: A novel robust compression-encryption of images based on SPIHT coding and fractional-order discrete-time chaotic system. Optik 109, 534–546 (2019)
7. Huang, R., Rhee, K.H., Uchida, S.: A parallel image encryption method based on compressive sensing. Multimedia Tools Appl. 72, 71–93 (2014)
8. Lu, P., Xu, Z., Lu, X., Liu, X.: Digital image information encryption based on compressive sensing and double random-phase encoding technique. Optik - Int. J. Light Electron Opt. 124, 2514–2518 (2013)
9. Masood, F., Ahmad, J., Shah, S.A., Jamal, S.S., Hussain, I.: A novel hybrid secure image encryption based on Julia set of fractals and 3D Lorenz chaotic map. Entropy 22, 274 (2020)
10. Rim, Z., Afef, A., Ejbali, R., Zaied, M.: Beta chaotic map based image steganography, pp. 97–104 (2020)
11. Rim, Z., Ejbali, R., Zaied, M.: Image encryption based on new beta chaotic maps. Opt. Lasers Eng. 96, 39–49 (2017)
12. Rim, Z., Zaied, M.: Toward new family beta maps for chaotic image encryption, pp. 52–57 (2016)
13. Souden, H., Ejbali, R., Zaied, M.: A watermarking scheme based on DCT, SVD and BCM, p. 116 (2019)
14. Tong, X.-J., Chen, P., Zhang, M.: A joint image lossless compression and encryption method based on chaotic map. Multimedia Tools Appl. 76 (2017)
15. Wu, X., Wang, D., Kurths, J., Kan, H.: A novel lossless color image encryption scheme using 2D DWT and 6D hyperchaotic system. Inf. Sci. 349 (2016)
16. Zhang, H., Wang, X.-Q., Sun, Y.-J., Wang, X.-Y.: A novel method for lossless image compression and encryption based on LWT, SPIHT and cellular automata. Signal Process. Image Commun. 84, 115829 (2020)
17. Zhang, M., Tong, X.: Joint image encryption and compression scheme based on IWT and SPIHT. Opt. Lasers Eng. 90 (2017)
A Meta-analytical Comparison of Naive Bayes and Random Forest for Software Defect Prediction

Ch Muhammad Awais1(B), Wei Gu1, Gcinizwe Dlamini1, Zamira Kholmatova1, and Giancarlo Succi2

1 Innopolis University, Innopolis, Russia
{c.awais,g.wei,g.dlamini,z.kholmatova}@innopolis.university
2 Università di Bologna, Bologna, Italy
[email protected]
Abstract. Is there a statistical difference between Naive Bayes and Random Forest in terms of recall, f-measure, and precision for predicting software defects? We answer this question using a systematic literature review and meta-analysis. We conducted a systematic literature review by establishing criteria to search for and choose papers, resulting in five studies. We then used the meta-data and forest plots of the five chosen papers to conduct a meta-analysis comparing the two models. The results show no significant statistical evidence that Naive Bayes performs differently from Random Forest in terms of recall, f-measure, and precision.

Keywords: Random Forest · Naive Bayes · Defect Prediction · Software Defect Prediction · Meta-analysis

1 Introduction
140
C. M. Awais et al.

A defect that causes software to behave unexpectedly, in a way that does not meet the actual requirements, is known as a software defect [8]. Software defects can be heavily influential and can give rise to disasters. Nowadays, software systems are increasing steadily in size and complexity [26], and industry has developed sophisticated software testing methodologies [14]. To ensure quality, newly developed or modified engineering products are often subjected to rigorous testing. However, software testing is time-consuming and resource-hungry: typically, the testing process takes up approximately 40%–50% of the total development resources, 30% of the total effort, and 50%–60% of the total cost of software development [12]. Over the past two decades, a variety of machine learning approaches have been employed to detect software defects in an endeavor to reduce cost and automate the task of software testing [3]. In a typical workflow for building an ML defect prediction model, the data usually comes from the version control system and contains the source code and commit messages. Accordingly, an instance can be labelled as defective or non-defective. The instances in the data set are constructed from the tags and other messages in the version control system, and a defect prediction model can be built using a set of training instances. Finally, the prediction model can classify whether a new test instance has a defect or not [16]. Our focus is on comparing prediction methods based on two ML models: Naive Bayes (NB) and Random Forest (RF) [11]. In this paper, we present a systematic literature review and meta-analysis of these two machine learning methods for software defect prediction. Our motivation is to help researchers and practitioners in the field of software defect prediction build a better understanding of these two algorithms, which have been applied from a very early stage. The goal is to find out whether there is any significant performance difference between NB and RF for software defect prediction in terms of precision, recall, and f-measure. The remaining sections of this paper are organized as follows: Sect. 2 outlines related work; Sect. 3 describes the methodology, systematic literature review, and meta-analysis; Sect. 4 presents the results and discussion; Sect. 5 outlines the conclusion.
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 139–149, 2023. https://doi.org/10.1007/978-3-031-35501-1_14
2 Related Works
Software defect prediction is performed using software-related data containing information about the code, e.g., lines of code, number of methods, inheritance, etc. In [9], researchers discussed the effect of the metric set on software defect prediction, concluding that if the metrics/features of the dataset are carefully simplified, even the simplest model performs better; in other words, feature selection plays an important role in software defect prediction. Machine learning models [5] are used to perform software defect prediction. The researchers in [23] discussed techniques for building an ideal model capable of detecting unseen defects in software. Software defect prediction models usually focus on training on a specific domain of data, e.g., open-source projects, whereas an ideal model would also be trained on commercial projects. Moreover, preprocessing of the dataset (e.g., feature selection, handling multi-collinearity) can significantly improve the performance of a defect prediction model [23]. Meta-analysis is the procedure of combining different studies to find out whether the experiments performed in the different studies [13] follow the same distribution; it helps to validate results across studies. The limitations of meta-analysis for defect prediction [20] resulted in a guide for future researchers conducting meta-analytical studies on software defect prediction. Software defect prediction results can vary based on the knowledge and skills of the researcher [19]. In [19], a meta-analytical experiment helped to find bias in the results of software defect prediction by comparing four different effects (classifier, dataset, metrics, and researcher). The results showed that the effect of the classifier on the result was smaller than the effect of the researcher.
Meta-analysis for Software Defect Prediction
141
Soe et al. [22] performed experiments on 12 different datasets: in nine datasets RF performed better than NB by a marginal difference, in one the accuracy was the same, and in one NB outperformed RF by 0.07%. On the PC3 dataset, the accuracy of NB was 45.8% while RF achieved 89.76%, a large difference of 43.89%; because of these differences, the researchers stated that Random Forest is better. We test the claim of [22] using meta-analysis and a systematic literature review.
3 Methodology
Our proposed methodology is presented in Fig. 1. It contains two main stages: A) Literature Review and B) Meta-analysis. The following sections outline the details about each stage.
Fig. 1. Methodology Overview (research questions and hypothesis definition → search strategy definition → searching for related studies → study selection → data extraction → meta-analysis → conclusion)
3.1 Literature Review

To test our aforementioned hypothesis using a meta-analytical approach and address the research questions, we conducted a systematic literature review.

Research Questions and Hypothesis Definition: Research questions (RQs) help steer research toward valuable and constructive conclusions [2]. We applied the PICOC (Population, Intervention, Comparison, Outcomes, and Context) approach [17] to produce five main research questions, which drive the analysis of the application of machine learning techniques (i.e., Naive Bayes and Random Forest) to software defect prediction.
RQ1. Which repositories are popular for software defect prediction?
RQ2. What kinds of datasets are most commonly utilized in the prediction of software defects?
RQ3. Which programming languages are common in datasets for software defect prediction?
RQ4. What are the common methods used to predict software defects?
RQ5. What comparison metrics are commonly used for comparing software defect prediction performance?
Search Strategy: Before beginning the search, it is important to choose a suitable combination of databases to increase the chances of discovering highly relevant articles. Digital libraries are the databases where papers are published, and deciding which libraries to use is an important step. To obtain the broadest possible set of studies, it is important to search literature databases that offer a broad perspective on the field. We used Google Scholar (GS), ResearchGate (RG), and Springer (SP) as search databases. Based on our research questions, we formulated the following query string:
(software OR applicati* OR systems) AND (fault* OR defect* OR quality OR error-prone) AND (predict* OR prone* OR probability OR assess* OR detect* OR estimat* OR classificat*) AND (random* OR forest) AND (naive OR bayes)
Since search behavior differs between databases, the search string was adjusted to suit the specific requirements of each database, while the original string was also kept. The databases were then searched by title, keyword, and abstract. Search results were limited to publications from 1999 to 2021, and only English journal papers and conference proceedings were included.
– Exclusion criteria • Studies that do not include our research objects. • Studies with unpublished datasets. • Studies that are with no validation of experimental results or experimental process. • Non-peer-reviewed studies. • Systematic review studies. Initially, the literature reviews and non peer-reviewed papers were removed. We created groups in Mendeley [25] to collaborate on the screening of studies. Two reviewers independently reviewed all the works, decisions on whether to screen a study were made according to established exclusion and inclusion criteria. When reviewers disagree, a discussion will occur and an explanation for screening will be provided. If there is uncertainty for the final decision, then the literature will be included for analysis.
Selection Results: We collected 62 research studies based on our search query from the different databases. At this stage we applied screening based on title (removal of papers with unrelated titles) and removal of duplicates.

Assessment Criteria: We utilized a brief checklist inspired by Ming's [15] work to determine whether a study provides adequate contextual and methodological information to answer the research questions. After a pilot test, we fine-tuned the criteria for our research questions using the following stages.
– Stage 1: Context criteria
  • The study's objective and domain should be clearly stated.
  • The programming language used for benchmarking must be specified.
  • The source of the publicly available dataset should be provided.
– Stage 2: Model-building criteria
  • The features used for training (e.g., software metrics) must be explicit.
  • The dependent variable predicted by the models should be stated clearly.
  • The granularity of the dependent variable should be reported, i.e., whether the predicted outcome is a statistic on LOC, modules, files, or packages.
– Stage 3: Data criteria
  • There should be a report or reference about data collection, in order to have confidence in the dataset.
  • The data transformation or preprocessing techniques should be clearly stated in the study.
– Stage 4: Predictive-performance criteria
  • The prediction model must be tested on unseen data after training.

Assessment Results: We performed a full-text analysis of each study and kept the papers discussing Naive Bayes and Random Forest. We excluded studies that did not report the specific metrics (i.e., recall, precision, f-measure).

Data Extraction: We carried out a pilot test on three randomized studies to check whether the data extraction met the needs of the analysis.
Two group members separately designed the table headers; after the pilot test, they discussed and combined the two tables to ensure structural consistency. The data was extracted according to the established table structure, and the consistency of the data was checked to ensure that no mistakes were made in the extraction process.
3.2 Meta-analysis
In meta-analysis, we first determine the outcome of each study and compute statistics for it; second, we combine the statistics of all studies to estimate a summary using a weighted average over the studies. Weights depend on factors such as the number of experiments and the publishing year, e.g., a study with fewer experiments gets a lower weight [24]. To understand meta-analysis, we need to define a few terms:
– Effect Size: the measure of an outcome used to validate or invalidate the hypothesis. We used the mean difference because our study is based on the recall score.
– Forest Plot: indicates the variability in the results; based on the forest plot we decide on the presence of heterogeneity.
– Heterogeneity: the criterion for deciding whether to employ meta-analysis; high heterogeneity means the studies do not link up, which halts the meta-analysis. In our case, the low heterogeneity can later be seen in the forest plots.
– Effect Model: we utilized two effect models, fixed and random, discussed in the next section.
study. Based on the number of experiments, and the publishing year, the study is given weights or other factors, i.e. study with fewer experiments will get a lower weight etc. [24]. To understand meta-analysis, we need to define a few terms, which are as follows: – Effect Size: The measure of an outcome to validate/invalidate the hypothesis. We used mean-difference because our study is based on the recall score. – Forest Plot: The Forest plot indicates the variability in the results, based on the forest plot we decide the presence of heterogeneity. – Heterogeneity: The validation criterion to decide whether to employ metaanalysis, i.e. high heterogeneity means the studies are not linking up, and it leads to halt in meta-analysis. In our case, low heterogeneity can be later visualized in the forest plots. – Effect Model: We utilized two effects models, i.e. fixed and random. We will discuss them in the next section. Fixed vs Random Effect Models: Fixed effect model is used for best estimation of the effect size, it is presumed that all studies have a single effect. Whereas random effect model is used for average effect, in random effect it is assumed that each study has a different effect [24]. In order to estimate the individual effects of a assessment criterion and the difference between the experimental and control groups, we calculate the effect size. The below formulas from [4] are used for calculation of the Effect Size g and its standard deviation SEg : g =J ×d
(1)
3 4df −1
j is Hedge’s correction factor: J = 1 − and df = N umberof Studies − 1 d is Cohen’s standardized difference of the sample means: d=
¯1 − X ¯2 X Swithin
(2)
Here X1 and X2 are the means of the control group (Naive Bayes) and the experimental group (Random Forest) respectively, with Swithin being the pooled within-groups standard deviation (with an assumption that σ1 = σ2 ): (n1 − 1) S12 + (n2 − 1) S22 Swithin = (3) n1 + n2 − 2 where S1 ,S2 are the standard deviation of control group and experimental group and n1 ,n2 are the number of studies of control group and experimental group respectively. 2 The variance and standard deviation of each Effect Size g is: Vg = J × Vd and SEg = Vg where Vd is the variance of Cohen’s d. Here, again n1 ,n2 are the number of studies of control group and experimental group respectively: Vd =
n1 + n2 d2 + n1 n2 2 (n1 + n2 )
(4)
Meta-analysis for Software Defect Prediction
Finally, each study’s weight is calculated as: Wi∗ =
1 Vg∗
i
145
and Vg∗i = Vgi + T 2
where Vgi is the within-study variance for a study i and T 2 is the betweenstudies variance. In this study, τ 2 is estimated with T 2 using the DerSimonian and Laird method [7]
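Equations (1)–(4) can be sketched as follows. This is an illustrative implementation, not the authors' code; following the standard Borenstein-style formulas it takes df = n1 + n2 − 2 for the correction factor, and the numeric inputs at the bottom are made up:

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g and its variance from group means, SDs and sizes (Eqs. 1-4)."""
    # Eq. (3): pooled within-groups standard deviation
    s_within = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_within                      # Eq. (2): Cohen's d
    J = 1 - 3 / (4 * (n1 + n2 - 2) - 1)           # small-sample correction
    g = J * d                                     # Eq. (1)
    v_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))  # Eq. (4)
    v_g = J**2 * v_d
    return g, v_g

def random_effects_weight(v_g, tau2):
    """W*_i = 1 / (V_gi + T^2), the random-effects study weight."""
    return 1 / (v_g + tau2)

# Hypothetical recall statistics: NB mean 0.70 (SD 0.05, 20 experiments)
# vs RF mean 0.72 (SD 0.05, 20 experiments).
g, v_g = hedges_g(0.70, 0.05, 20, 0.72, 0.05, 20)
```

The combined estimate of the forest plots is then the weight-normalized sum of the per-study g values, with τ² = 0 reducing the weights to the fixed-effect case.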
4 Results and Discussions
This section presents a detailed analysis of the discovered literature, then the answers to the research questions, and finally the meta-analytical results, including the meta-data and forest plots.
Literature Review
We found 62 studies, the repository representation is as follows, 47.37% from PROMISE, 42.10% from NASA and 10.53% from OSS. The answers to the research questions are as follows: RQ1: Which repositories are popular for software defect prediction? There were two types of repositories, public and private, in most of the studies the three repositories were discussed, PROMISE, NASA being public and OSS being private. When we analyzed the results, we found out PROMISE, NASA are commonly used, because of public availability. RQ2: What kinds of datasets are the most commonly utilized in prediction of software defects? It can be visualized from the Fig. 2, that the researchers mostly used PC3, KCI, PC1, PC4. There is a marginal difference between the usage dataset, also the datasets are used in a combination with other datasets, but we can see that jEdit, KC3 and ANT were least commonly used. RQ3: What are the programming languages which commonly in datasets for software defect prediction? In the studies we observed that there were only four languages used, while Java and C were most common, whereas Perl was the least used Fig. 3.
Fig. 2. Most common datasets
Fig. 3. Dataset languages
RQ4: What are the common methods used to predict software defects? Since our search query pinpoints Naive Bayes and Random Forest, these were the most common; besides these models, we found that Linear Regression and Decision Tree are also commonly used, while the Multi-layer Perceptron was the least used (Fig. 5).
RQ5: What comparison metrics are commonly used for comparing software defect prediction performance? The most discussed metrics are accuracy, f-measure, recall, and precision. One reason these were at the top was the search query, and accuracy is in any case the most common metric when discussing ML models. Interestingly, precision was used less often, and the least used metrics were MAE and ROC (Fig. 4).
Fig. 4. Common Metrics
Fig. 5. Common Methods
4.2 Meta-analysis
Meta-analysis was performed on the data retrieved during the literature review; the studies selected for meta-analysis are presented in Table 1. In the table, NB Experiments is the number of experiments performed for Naive Bayes, and RF Experiments the number for Random Forest. Y indicates the presence of a specific measure in the study, and N its absence.

Table 1. Meta-data

| Title   | Year | NB Experiments | RF Experiments | Recall | Precision | F-Measure | Total Models | Total Metrics |
|---------|------|----------------|----------------|--------|-----------|-----------|--------------|---------------|
| S1 [10] | 2017 | 4              | 4              | Y      | Y         | Y         | 11           | 4             |
| S2 [18] | 2018 | 7              | 7              | Y      | Y         | Y         | 6            | 5             |
| S3 [6]  | 2018 | 3              | 3              | N      | N         | Y         | 4            | 2             |
| S4 [1]  | 2019 | 24             | 24             | Y      | Y         | Y         | 10           | 6             |
| S5 [21] | 2020 | 3              | 3              | Y      | N         | N         | 5            | 2             |
Meta-analysis for Software Defect Prediction
147
Further, we drew forest plots to analyze the meta-data. Each forest plot consists of 10 columns, and each row represents a study, identified by its authors. Total is the number of experiments in the study, and Mean and SD are the mean and standard deviation of the performance measure. The Standardized Mean Difference (SMD column) is the standardized difference between the mean performance of the two groups. The positions of the shaded rectangles represent the standardized mean differences of the individual studies, while their sizes represent the effect weights. The 95% CI column gives the confidence interval, used to test the hypothesis on the SMD. Weights quantify each study's contribution to the meta-analysis. The diamond represents the combined effect of all the studies. Heterogeneity identifies the model (random or fixed effect), and the test for overall effect reports the overall effect based on the degrees of freedom. The forest plots are shown in Fig. 6, Fig. 7 and Fig. 8.
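As an illustration, the per-study SMD, its confidence interval, and the pooled "diamond" estimate can be sketched as follows. The study summaries below are hypothetical numbers, not values taken from Table 1, and a fixed-effect inverse-variance pooling is assumed.

```python
import numpy as np

def smd(m1, s1, n1, m2, s2, n2):
    """Cohen's d: difference of group means divided by the pooled standard deviation."""
    sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))  # approximate variance of d
    return d, var

# Hypothetical per-study summaries (mean, SD, n) of a metric for NB vs. RF
studies = [((0.80, 0.05, 4), (0.78, 0.06, 4)),
           ((0.74, 0.08, 7), (0.76, 0.07, 7)),
           ((0.69, 0.04, 24), (0.71, 0.05, 24))]

ds, ws = [], []
for (m1, s1, n1), (m2, s2, n2) in studies:
    d, var = smd(m1, s1, n1, m2, s2, n2)
    ds.append(d)
    ws.append(1.0 / var)  # inverse-variance weight, as in a fixed-effect model
    print(f"SMD = {d:+.2f}, 95% CI [{d - 1.96*np.sqrt(var):.2f}, {d + 1.96*np.sqrt(var):.2f}]")

# Fixed-effect pooled SMD: the "diamond" of the forest plot
pooled = np.dot(ws, ds) / np.sum(ws)
se = 1.0 / np.sqrt(np.sum(ws))
print(f"pooled SMD = {pooled:+.2f}, 95% CI [{pooled - 1.96*se:.2f}, {pooled + 1.96*se:.2f}]")
```

When the pooled 95% CI contains zero, the diamond crosses the vertical line of the forest plot, which is the "no significant difference" situation discussed below.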
Fig. 6. Forest plot for precision
Fig. 7. Forest plot for recall
The standardized mean difference of each study is represented by a horizontal line in the forest plot. When a study's horizontal line crosses the vertical line, there is no statistically significant difference between the research groups in that study. The diamond symbolizes the combined effect of all the studies; when the diamond crosses the vertical line, there is no statistically significant difference between the groups overall. In the forest plots above, all the diamonds cross the vertical line, meaning that the effects of Naive Bayes and Random Forest on software defect prediction are the same. The implementation details are publicly available on GitHub (https://github.com/Muhammad0Awais/SDP_NB_RF_meta_analysis).
Fig. 8. Forest plot for F-measure
5 Conclusions
In this study, we conducted a thorough literature review and performed a meta-analysis to determine whether naive Bayes or random forest is more effective at predicting software defects. We collected the related studies and presented statistics about databases, datasets, languages, repositories, machine learning models, and comparison metrics. We retrieved 62 studies that address Naive Bayes and Random Forest, and five of them were chosen for meta-analysis based on our criteria. We conclude from the meta-analysis that there is no statistically significant difference between random forest and naive Bayes in terms of recall, precision, and F-measure for software defect prediction.

Acknowledgements. This research is supported by the Russian Science Foundation, Grant No. 22-21-00494.
References
1. Ali, U., Iqbal, A., Aftab, S.: Performance analysis of machine learning techniques on software defect prediction using NASA datasets (2019)
2. Agee, J.: Developing qualitative research questions: a reflective process. Int. J. Qual. Stud. Educ. (2009)
3. Aleem, S., Capretz, L.F., Ahmed, F.: Benchmarking machine learning technologies for software defect detection (2015)
4. Borenstein, M., et al.: Identifying and quantifying heterogeneity. In: Introduction to Meta-analysis (Chap. 16), pp. 107–125 (2009). ISBN 9780470743386
5. Challagulla, V.U.B.: Empirical assessment of machine learning based software defect prediction techniques. ACP J. Club. (2005)
6. Petric, J., Bowes, D., Hall, T.: Software defect prediction: do different classifiers find the same defects? Softw. Qual. J. (2018)
7. DerSimonian, R., Laird, N.: Meta-analysis in clinical trials. Controlled Clin. Trials (1986)
8. Devnani-Chulani, S.: Modeling software defect introduction. In: Proceedings of California Software Symposium (1998)
9. Bing, L., Peng, H.: An empirical study on software defect prediction with a simplified metric set. Inf. Softw. Technol. (2015)
10. Akour, M., Alazzam, I., Alsmadi, I.: Software fault proneness prediction: a comparative study between bagging, boosting, and stacking ensemble and base learner methods (2017)
11. Jacob, S.G., et al.: Improved random forest algorithm for software defect prediction through data mining techniques. IJCA (2015)
12. Kumar, D., Mishra, K.K.: The impacts of test automation on software's cost, quality and time to market (2016)
13. Leandro, G.: Meta-analysis in medical research: the handbook for the understanding and practice of meta-analysis. ACP J. Club. (2008)
14. Lewis, W.E.: Software Testing and Continuous Quality Improvement. CRC Press, Boca Raton (2017)
15. Li, M., et al.: Sample-based software defect prediction with active and semi-supervised learning. Autom. Softw. Eng. (2012)
16. Li, Z., Jing, X.-Y., Zhu, X.: Progress on approaches to software defect prediction. IET Softw. (2018)
17. Roberts, H., Petticrew, M.: Systematic reviews in the social sciences: a practical guide. ACP J. Club. (2008)
18. Jain, S., Kakkar, M.: Is open-source software valuable for software defect prediction of proprietary software and vice versa? (2018)
19. Bowes, D., Shepperd, M., Hall, T.: Researcher bias: the use of machine learning in software defect prediction (2016)
20. Miller, J.: Applying meta-analytical procedures to software engineering experiments. J. Syst. Softw. (2000)
21. Dubey, R., Khakhar, P.: The integrity of machine learning algorithms against software defect prediction (2020)
22. Soe, Y.N., Oo, K.K.: A comparison of Naïve Bayes and random forest for software defect prediction. In: ICCA (2018)
23. Khari, M., Son, L.H., Pritam, N.: Empirical study of software defect prediction: a systematic mapping. Symmetry (2019)
24. University of York: CRD's guidance for undertaking reviews in health care. In: Introduction to Meta-analysis (Chap. 1), pp. 54–55 (2009)
25. Zaugg, H., et al.: Mendeley: creating communities of scholarly inquiry through research collaboration. TechTrends 55, 32–36 (2011)
26. Zuse, H.: Software Complexity: Measures and Methods. Walter de Gruyter GmbH & Co KG, Munich (2019)
Skeleton-Based Human Activity Recognition Using Bidirectional LSTM

Monika(B), Pardeep Singh, and Satish Chand

School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi 110067, India
[email protected]

Abstract. Human activity recognition using 3D skeleton data has drawn the attention of the research community. It is an interesting and well-sought-after computer vision problem whose prime objective is to determine the actions performed by a human in a video or an image. Research investigating human activity recognition from 3D skeletons is still very limited. In this paper, we employ a bidirectional long short-term memory (Bi-LSTM) deep learning model that utilizes skeleton information for modeling the dynamics in sequential data. The proposed model has been evaluated on Stony Brook University's (SBU) dataset of two-person interactions with 8 different actions. Results demonstrate that the proposed method performs better than feature-based machine learning and state-of-the-art methods.

Keywords: human action recognition · long short term memory · 3D skeleton data · two person interaction

1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 150–159, 2023. https://doi.org/10.1007/978-3-031-35501-1_15

Human activity recognition is an important field studied extensively in computer vision, with several applications in human behavior learning, video surveillance and ambient assisted living [12,20]. The majority of studies in the literature primarily focus on single-person activity recognition. However, understanding two-person interactions may be more significant due to their societal ramifications and the potential action relationships between persons. Recognizing body pose, gesture, and body position is necessary for identifying these interactions. Traditionally, human activity recognition systems are developed using monocular RGB videos, which barely represent actions in 3D space [28]. Hence, computer vision researchers have paid attention to monitoring skeleton joint information to recognize human activities. The development of sensor technology provides a method to extract a person's 3D skeleton data for better human activity recognition. Skeleton data containing 3D joint coordinates have many advantages over RGB photos and videos: they are robust to illumination, lightweight and stable against the background. In recent years, deep learning models have gained attention in handling sequential data related to computer vision, language modeling, RGB video
analysis, etc., for classification tasks [4,13,15,16,18,29]. Hence, in this paper, we develop a human activity recognition system for two-person interaction using an LSTM sequence-to-sequence network bidirectionally. We summarize our contributions as follows:
– We present an extensive study exploring skeleton-based human activity recognition (HAR) using a Bi-LSTM sequence-to-sequence model.
– The proposed model works regardless of scene, illumination and pose.
– To protect the privacy of persons, we fed the 3D skeletal joint coordinates of the two persons, instead of video, for training the Bi-LSTM model. A pair of fifteen skeleton coordinates has been given to the network for recognizing the activities.
The remainder of the paper is organized as follows. An overview of previous work is presented in Sect. 2. Section 3 discusses the Bi-LSTM method for HAR. The experimental results are presented in Sect. 4. Finally, we discuss the conclusion and future suggestions.
2 Related Work
This section discusses related work in the area of human activity recognition. Researchers have applied several computational techniques to address this problem, ranging from sensors to real-time dash-cam video.
Action Recognition Based on RGB: There are two primary approaches for using RGB data as a visual cue: handcrafted feature-based methods [1,19] and deep learning-based techniques [29]. Before deep learning-based methods gained popularity, Wang's improved Dense Trajectories (iDT) [21–23] and the Bag-of-Words (BOW) framework reached the pinnacle of handcrafted feature development in action recognition and achieved the highest performance on most benchmark datasets, including UCF101 and HDMB51. To extract appearance features from RGB and motion features from optical flow, the authors of [25] introduced a two-stream framework based on the VGG model. To extract 3D action features, the authors of [24] developed a new two-stream inflated 3D ConvNet model (I3D) on the UCF101 dataset. Recently, low-cost depth sensors like the Kinect have received attention. Methods combining joint localization and low-resolution depth data from the Kinect have been used to detect falls as well as other activities to develop fall detection systems [3].
Skeleton-Based Action Recognition: Skeleton data has attracted greater attention for human action recognition because of its robustness to variations in body scale, orientation, and background. Traditional techniques have emphasized manually creating hand-crafted features and joint relationships, which are only capable of storing lower-level data and overlook the crucial semantic
connectivity of the human body. Conventional approaches typically reorganize the skeleton data into a grid-like layout and feed it directly into a traditional RNN [13] or CNN [7] architecture to take advantage of the deep learning model. Convolutional neural networks (CNNs) have been used in combination with other models in various earlier works on activity recognition [7,27]. The authors of [2] discussed a skeleton-based activity recognition system using hierarchical RNNs. They divided the human skeleton Kinect sensor data into five parts, which are then provided to five different bidirectional RNNs for prediction. Among various RNN architectures, LSTMs are the most popular due to their capability of remembering long-term sequences. Hence, in our human activity recognition task, we employ this model bidirectionally for better performance.
3 Methodology
In this section, we briefly introduce Bi-LSTM [14], a deep learning model that can focus on informative joints in each frame of the 3D skeleton sequence.
Bidirectional LSTM: It is a state-of-the-art approach for many classification challenges. LSTM has the edge over RNNs and conventional neural networks due to its ability to remember long-term dependencies. It also resolves the vanishing gradient problem of RNNs with the help of carefully regulated structures known as gates. The LSTM network has a chain-like structure of repeated modules known as cells that regulate the flow of information. Each LSTM cell has a cell state and three gates: an input gate, an output gate and a forget gate. The other parameters denote the weights of the respective gates, the output of the previous LSTM block, the input at the current timestamp and the biases of the respective gates. The cell state transfers the relevant information along the sequence chain. The basic LSTM architecture processes a sequence of information in one direction, known as forward LSTM [15]. This approach only preserves the previous context. However, the identification of a sequence is determined by both the previous context and the subsequent context. Hence, Bidirectional LSTM (Bi-LSTM) is used to solve this issue: Bi-LSTM processes the sequence in both the forward and backward directions to better capture the context of the sequence [14].
Figure 1 describes our Bi-LSTM deep learning architecture for human activity recognition. It is a layered architecture that comprises an input layer, a feature vector layer, LSTM memory blocks, a dense layer and a softmax layer for final prediction. In the top input layer, the 3D coordinates of the 15 major joint positions, i.e. J1, J2, .., J45, of the two persons are provided. The input sequence is passed to the next layer for feature vector representation F1, F2, .., Fn and further passed to the Bi-LSTM layer. This layer has a chain-like structure of LSTM blocks that helps capture the sequence's contextual information in both directions. The output of the Bi-LSTM is passed to the dense and softmax layers for the final prediction of results.
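A minimal sketch of such a layered architecture in Keras (the library used in Sect. 4) might look as follows. The sequence length, LSTM width and the 90-dimensional per-frame input (15 joints × 3 coordinates × 2 persons) are illustrative assumptions, not the paper's exact configuration, and the data are random stand-ins.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

T, F, C = 20, 90, 8  # frames per clip, features per frame, interaction classes (assumed shapes)

inputs = keras.Input(shape=(T, F))                 # sequence of 3D joint coordinates
h = layers.Bidirectional(layers.LSTM(32))(inputs)  # forward + backward pass over the sequence
h = layers.Dropout(0.3)(h)                         # dropout rate reported in the paper
outputs = layers.Dense(C, activation="softmax")(h) # one score per interaction class
model = keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Early stopping halts training once the monitored metric stops improving
stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

# Random stand-in data just to show the call shapes
x = np.random.rand(16, T, F).astype("float32")
y = np.eye(C)[np.random.randint(0, C, 16)].astype("float32")
model.fit(x, y, validation_split=0.1, epochs=1, callbacks=[stop], verbose=0)
```

The Bidirectional wrapper concatenates the final states of the forward and backward LSTM passes, which is what lets the classifier use both previous and subsequent context.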
Fig. 1. The proposed Bi-LSTM layered architecture. The 3D skeleton joint coordinates of the two persons are extracted and passed to a Bidirectional LSTM (Bi-LSTM) to extract automated spatial features. These features are passed to the dense and softmax layers to classify the activities.
4 Experimental Results
For executing the proposed work, we use the Scikit-learn [11] and Keras [6] Python libraries. We divide the dataset into train, validation and test sets, with 70% of the data for training, 10% for validation, and 20% for testing. We employ categorical cross-entropy as the loss function to measure the dissimilarity between the ground truth and the predicted value. To minimize the loss function and find the weights W that minimize the total loss, we use ADAM as the optimizer. To keep the model from overfitting, we use dropout and early stopping as regularization approaches. Dropout is a regularization technique where randomly selected neurons are ignored in the neural network during model training [17]. Ignoring specific neurons can prevent their over-adaptation, which could lead to over-fitting. While using this technique, it is essential to set the hyperparameter defining the probability with which neurons are dropped out of the network. We use a dropout of 0.3 in our work. Along with the dropout technique, we used another common regularization technique called early stopping. Early stopping automatically stops training when a specific performance metric (e.g., validation loss, accuracy) stops improving.
This approach stops the training before overfitting occurs, but not too early, so that the network has learned something. To evaluate the performance of the system, we use accuracy and Macro-F1 scores.

4.1 SBU Two-Person-Interaction Dataset
The SBU interaction dataset is a human action recognition dataset that captures two people interacting [26]. In this dataset, one person acts while the other reacts. The dataset has 282 sequences with a total of 6822 frames. This dataset poses a significant challenge due to the low measurement accuracy of the joint coordinates in many sequences. The RGB, depth and tracked skeleton data in the SBU Kinect Interaction Dataset were obtained via an RGB-D sensor. Figure 2 shows the 15 skeleton body coordinates for each person; each coordinate is represented as (xi, yi, zi). There are 8 different actions in total: approaching, departing, pushing, kicking, punching, exchanging, hugging and shaking-hands. The dataset consists of 21 sets, each of which comprises information about an individual person (7 participants) carrying out an action at a frame rate of 15 frames per second. Each interaction in the dataset is represented by a carefully segmented video lasting around 4 s, but each video generally begins with a standing pose before acting and concludes with a standing pose after acting. Figure 3 shows the class-wise distribution of the two-person-interaction actions.

Fig. 2. Two-person-interaction SBU Dataset skeletal joints labels [5].
4.2 Results and Discussions
Before using a classifier on a relevant domain problem, one of the key tasks in machine learning is the visualization of large multidimensional data. We visualize the human activities by mapping them from the n-dimensional space to a low-dimensional space using a non-linear technique called t-SNE (t-distributed stochastic neighborhood embedding) [9]. The class label information has been introduced to accurately represent the actions from related action classes, and this approach is used to construct relationships between diverse
Fig. 3. Class-wise distribution of the two-person-interaction SBU dataset for the action recognition task.
human behaviors. This falls under the category of machine learning known as manifold learning, where the action is classified using the class label information. The features of the eight classes, extracted using t-SNE, appear as groups of points, as shown in Fig. 4.
We evaluated the performance of the models using accuracy and Macro-Avg F1, since accuracy alone is not sufficient to verify the results. The accuracy and loss plots on training and validation data are shown in Fig. 5. The experimental results on the test set are shown in Table 2. The Bi-LSTM model performs better in comparison to the baselines and other state-of-the-art methods. The class-wise results of the proposed model are shown in Table 1, which shows how well our model performs for each class. Finally, the confusion matrix is shown in Fig. 6 to represent the results better.
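A t-SNE projection like the one behind Fig. 4 can be sketched with scikit-learn; the random features below are a stand-in for the real skeleton data, and the 90-dimensional feature size and 8 classes mirror the assumptions used elsewhere in this section.

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-in for flattened per-sequence skeleton features (8 hypothetical classes)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 90))
labels = rng.integers(0, 8, size=200)

# Non-linear projection to 2D; perplexity must be smaller than the number of samples
emb = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)
print(emb.shape)  # (200, 2)

# Scattering emb colored by `labels` then reproduces a figure like Fig. 4:
# plt.scatter(emb[:, 0], emb[:, 1], c=labels)
```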
Fig. 4. The t-SNE visualization of the training data for the 8 sample classes, shown in different colors.
Fig. 5. Model accuracy and loss curves using Bi-LSTM.
Fig. 6. Confusion matrix of the Bi-LSTM framework for the two person interaction dataset.

Table 1. Activity-wise precision, recall, F1-score and support using the Bi-LSTM

| S. No. | Activity      | Precision | Recall | F1-score | Support |
|--------|---------------|-----------|--------|----------|---------|
| 1.     | Approaching   | 0.87      | 0.77   | 0.81     | 256     |
| 2.     | Departing     | 0.79      | 0.93   | 0.86     | 197     |
| 3.     | Kicking       | 0.97      | 0.84   | 0.90     | 242     |
| 4.     | Punching      | 0.88      | 0.84   | 0.86     | 260     |
| 5.     | Pushing       | 0.95      | 0.93   | 0.94     | 151     |
| 6.     | Hugging       | 0.95      | 0.89   | 0.92     | 139     |
| 7.     | Shaking-hands | 0.94      | 0.99   | 0.96     | 251     |
| 8.     | Exchanging    | 0.86      | 0.82   | 0.84     | 210     |
Table 2. Human action recognition accuracy and Macro-F1 for the two person interaction Kinect dataset using different algorithms

| S. No. | Algorithm            | Model Accuracy | F1-score |
|--------|----------------------|----------------|----------|
| 1      | Gaussian Naive Bayes | 0.54           | 0.56     |
| 2      | MLP                  | 0.73           | 0.72     |
| 3      | SGD                  | 0.48           | 0.50     |
| 4      | RandomForest         | 0.86           | 0.85     |
| 5      | Voting               | 0.82           | 0.82     |
| 6      | H-RNN [2]            | 0.80           | –        |
| 7      | CHARM [8]            | 0.86           | –        |
| 8      | X-Mean [10]          | 0.88           | –        |
| 9      | Bi-LSTM              | 0.90           | 0.90     |

5 Conclusions and Future Scope
This paper introduces a Bi-LSTM architecture for human activity recognition based on skeleton data. The proposed model classifies 8 actions between two interacting persons. Experimentation on the SBU dataset showed the performance of the proposed approach. It is found empirically that Bi-LSTM performs better than the baselines and state-of-the-art methods. This finds applications in various domains, especially in understanding human behavior. In the future, we seek to explore transformer-based architectures without much feature engineering work.
References
1. Chen, C., Jafari, R., Kehtarnavaz, N.: Action recognition from depth sequences using depth motion maps-based local binary patterns. In: 2015 IEEE Winter Conference on Applications of Computer Vision, pp. 1092–1099. IEEE (2015)
2. Du, Y., Wang, W., Wang, L.: Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110–1118 (2015)
3. Gasparrini, S., Cippitelli, E., Spinsante, S., Gambi, E.: A depth-based fall detection system using a Kinect sensor. Sensors 14(2), 2756–2775 (2014)
4. Ibrahim, M.S., Muralidharan, S., Deng, Z., Vahdat, A., Mori, G.: A hierarchical deep temporal model for group activity recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1971–1980 (2016)
5. Islam, M.S., Bakhat, K., Khan, R., Naqvi, N., Islam, M.M., Ye, Z.: Applied human action recognition network based on SNSP features. Neural Process. Lett., pp. 1–14 (2022)
6. Ketkar, N.: Introduction to Keras. In: Deep Learning with Python, pp. 95–109. Apress, Berkeley, CA (2017). https://doi.org/10.1007/978-1-4842-2766-4_7
7. Lai, K., Yanushkevich, S.N.: CNN+RNN depth and skeleton based dynamic hand gesture recognition. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3451–3456. IEEE (2018)
8. Li, W., Wen, L., Chuah, M.C., Lyu, S.: Category-blind human action recognition: a practical recognition system. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4444–4452 (2015)
9. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)
10. Manzi, A., Fiorini, L., Limosani, R., Dario, P., Cavallo, F.: Two-person activity recognition using skeleton data. IET Comput. Vision 12(1), 27–35 (2018)
11. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
12. Hbali, Y., Hbali, S., Ballihi, L., Sadgal, M.: Skeleton-based human activity recognition for elderly monitoring systems. IET Comput. Vis. 12(1), 16–26 (2018). https://doi.org/10.1049/iet-cvi.2016.0231
13. Ren, B., Liu, M., Ding, R., Liu, H.: A survey on 3D skeleton-based action recognition using learning method. arXiv preprint arXiv:2002.05907 (2020)
14. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
15. Singh, P., Chand, S.: Pardeep at SemEval-2019 task 6: identifying and categorizing offensive language in social media using deep learning. In: Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 727–734 (2019)
16. Singh, P., Chand, S.: Predicting the popularity of rumors in social media using machine learning. In: Shukla, R.K., Agrawal, J., Sharma, S., Chaudhari, N.S., Shukla, K.K. (eds.) Social Networking and Computational Intelligence. LNNS, vol. 100, pp. 775–789. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-2071-6_65
17. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
18. Sundermeyer, M., Schlüter, R., Ney, H.: LSTM neural networks for language modeling. In: Thirteenth Annual Conference of the International Speech Communication Association (2012)
19. Thakur, D., Biswas, S.: An integration of feature extraction and guided regularized random forest feature selection for smartphone based human activity recognition. J. Netw. Comput. Appl. 204, 103417 (2022)
20. Taha, A., Zayed, H.H., Khalifa, M.E., El-Horbaty, E.S.M.: Skeleton-based human activity recognition for video surveillance. Int. J. Sci. Eng. Res. 6(1), 993–1004 (2015)
21. Wang, X., Zhu, Z.: Vision-based framework for automatic interpretation of construction workers' hand gestures. Autom. Constr. 130, 103872 (2021)
22. Wang, Y., Sun, M., Liu, L.: Basketball shooting angle calculation and analysis by deeply-learned vision model. Futur. Gener. Comput. Syst. 125, 949–953 (2021)
23. Xu, G.L., Zhou, H., Yuan, L.Y., Huang, Y.Y.: Using improved dense trajectory feature to realize action recognition. J. Comput. 32(4), 94–108 (2021)
24. Yifan, W., Doersch, C., Arandjelović, R., Carreira, J., Zisserman, A.: Input-level inductive biases for 3D reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6176–6186 (2022)
25. Yu, W., Yang, K., Bai, Y., Xiao, T., Yao, H., Rui, Y.: Visualizing and comparing AlexNet and VGG using deconvolutional layers. In: Proceedings of the 33rd International Conference on Machine Learning (2016)
26. Yun, K., Honorio, J., Chattopadhyay, D., Berg, T.L., Samaras, D.: Two-person interaction detection using body-pose features and multiple instance learning. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 28–35. IEEE (2012)
27. Zhang, S., et al.: Fusing geometric features for skeleton-based action recognition using multilayer LSTM networks. IEEE Trans. Multimedia 20(9), 2330–2343 (2018)
28. Zhang, X., Xu, C., Tian, X., Tao, D.: Graph edge convolutional neural networks for skeleton-based action recognition. IEEE Trans. Neural Netw. Learn. Syst. 31(8), 3047–3060 (2019)
29. Zhu, W., Lan, C., Xing, J., Zeng, W., Li, Y., Shen, L., Xie, X.: Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
A Data Warehouse for Spatial Soil Data Analysis and Mining: Application to the Maghreb Region

Widad Hassina Belkadi1(B), Yassine Drias2, and Habiba Drias1

1 LRIA, USTHB, BP 32 El Alia, Bab Ezzouar, 16111 Algiers, Algeria
[email protected], [email protected]
2 University of Algiers, 02 rue Didouche Mourad, 16000 Algiers, Algeria
[email protected]
Abstract. Soil health affects soil functions, food production, and climate change. Understanding our soil is important for our environment and food security. Therefore, we propose a data warehouse architecture for storing, processing, and visualizing soil data. In this study, we focus on the Maghreb region for the first time. This region is highly vulnerable to climate change and food insecurity, even though, as far as we know, it has not yet been considered. The proposed data warehouse architecture involves data mining and data science techniques to extract value from soil data and support decision-making at various levels. In our case, the warehoused data were analyzed using exploratory spatial data analysis tools to explore the spatial distribution, heterogeneity, and autocorrelation of soil properties. In our experiments, we present the results of the organic carbon (OC) analysis because of its importance in regulating climate change. The highest values of OC were situated in Morocco and Tunisia, and we noticed a great distribution similarity between Algeria and Libya. One of the most important results of the autocorrelation analysis is the presence of positive spatial autocorrelation, which leads to a perfect spatial clustering situation in the whole Maghreb region. This motivates future work that aims to detect and interpret these clusters.

Keywords: Data warehouse · Data mining · Digital soil mapping · Exploratory Spatial Data Analysis (ESDA) · Maghreb · Climate change
1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 160–170, 2023. https://doi.org/10.1007/978-3-031-35501-1_16

The Food and Agriculture Organization of the United Nations has confirmed that the Russian Federation and Ukraine are among the world's top producers of agricultural products. The war between these countries has created major global food security challenges. During this period, there will be a clear focus on food production. Soil is one of the most important natural resources, containing various metals, nutrients, and minerals that are essential for plant growth [15]. Poor soil management can lead to many problems. In the Maghreb, several soil phenomena and damages from soil loss, such as climate change, soil erosion, and drought, have increased over the last two decades. This motivated us to study
the Maghreb region's soil, which has not received much attention despite these threats. A spatially accurate soil map is urgently needed to address the above issues. In addition, good manipulation of these data is a prerequisite for their continued use in soil analysis applications. Our work has several objectives. First, to collect and store the Maghreb soil data in an effective data warehouse, where the main challenge is to process the data so that they are clean, consistent and useful for many soil science applications such as digital soil mapping and soil quality monitoring. Then, to analyze the spatial distribution, correlation and heterogeneity of the soil properties using exploratory spatial data analysis tools, to get a better picture of the Maghreb soil data.

Table 1. Instances distribution over the Maghreb region.

| Country | Algeria | Tunisia | Morocco | Libya | Mauritania |
|---------|---------|---------|---------|-------|------------|
| Count   | 295     | 98      | 128     | 138   | 132        |

2 Background
This section introduces the main concepts, technologies and methods used in this work: Sect. 2.1 introduces data warehouses and OLAP systems, and Sect. 2.2 describes exploratory spatial data analysis methods.

2.1 Data Warehouse
Data warehousing (DW) and online analytical processing (OLAP) are business intelligence technologies defined as decision support systems that enable online analysis of vast amounts of multidimensional data [16]. The DW conceptual model is defined by the concepts of fact and dimension tables. Facts are described by numerical attributes called measures, such as the quantity of products sold [16]. These data are analyzed at different levels, represented by the dimensions. Data from operational DBMSs and other external sources are extracted, transformed, and loaded (ETL) into the DW. Analysts explore the warehoused data using OLAP queries, which answer questions related to decision making, using graphical views to analyze and mine the data.
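The fact/dimension idea can be sketched as a toy star schema with an OLAP-style roll-up using pandas; the tables, column names and values here are hypothetical, not the schema of the proposed warehouse.

```python
import pandas as pd

# Dimension table: a spatial hierarchy country -> region (hypothetical rows)
dim_location = pd.DataFrame({
    "location_id": [1, 2, 3],
    "country": ["Algeria", "Tunisia", "Morocco"],
    "region": ["Maghreb"] * 3,
})

# Fact table: one row per soil sample, with measures as numeric columns
fact_soil = pd.DataFrame({
    "location_id": [1, 1, 2, 3],
    "depth_cm": [10, 30, 10, 10],
    "organic_carbon": [1.2, 0.8, 1.5, 2.1],
})

# OLAP-style roll-up: join facts to the dimension, aggregate a measure per country
cube = (fact_soil.merge(dim_location, on="location_id")
                 .groupby("country")["organic_carbon"].mean())
print(cube)
```

Rolling up along the `country -> region` hierarchy (grouping by `region` instead of `country`) would give the coarser Maghreb-wide aggregate, which is the kind of multi-level analysis an OLAP query expresses.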
2.2 Exploratory Spatial Data Analysis (ESDA)
Exploratory Spatial Data Analysis (ESDA) is an extension of Exploratory Data Analysis (EDA) for discovering spatial properties of data, recognizing whether there are geographic patterns in the data, forming hypotheses based on the data's geography, and evaluating spatial models [10]. It is important to be able to link the numerical and graphical steps to the map to answer questions like, "Where are these cases on the map?" [10]. The steps of the ESDA process are as follows. First, display the distribution of a variable and identify outliers using histograms, box plots, and choropleth maps, which are geographic maps that display statistics encoded
162
W. H. Belkadi et al.
in a color palette, such as a quantile map. Then, discover patterns of spatial autocorrelation and spatial heterogeneity. Spatial heterogeneity means that the observations are not homogenous over space. Spatial autocorrelation refers to the existence of a “functional relationship between what happens at one point in space and what happens elsewhere” [1]. We talk about global or local autocorrelation. By examining the global spatial autocorrelation, a statement can be made about the degree of clustering within the data set. The first choice for estimating global autocorrelation is a well-known statistic called Moran’s I. Local autocorrelation aims to identify these clusters in order to provide a more detailed picture. It is offered by Moran scatter plot and the Local Indicators of Spatial Association (LISA), which finally reveals spatial heterogeneity.
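As an illustration of the global statistic, Moran's I can be computed directly from its definition, I = (n/S0) · (zᵀWz)/(zᵀz), where z holds deviations from the mean and W is a spatial weights matrix. The sketch below is not the authors' implementation (their tooling is not specified at this point); it uses NumPy on a toy line of six sites with made-up values:

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I for a vector of observations and a
    spatial weights matrix W (row-standardized or not)."""
    z = values - values.mean()          # deviations from the mean
    num = z @ W @ z                     # spatially lagged cross-products
    den = z @ z                         # total variance term
    n, s0 = len(values), W.sum()
    return (n / s0) * (num / den)

# Toy example: 6 sites on a line, neighbors = adjacent sites.
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)    # row-standardize

clustered = np.array([1., 1., 1., 5., 5., 5.])   # similar values adjacent
alternating = np.array([1., 5., 1., 5., 1., 5.]) # dissimilar values adjacent
print(morans_i(clustered, W) > 0)    # True: positive spatial autocorrelation
print(morans_i(alternating, W) < 0)  # True: negative spatial autocorrelation
```

In practice a library such as PySAL provides this statistic together with inference; the point here is only the shape of the computation.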
3 Related Works
In this section, we give a brief review of the works related to DW as well as ESDA. Data warehouses have been widely investigated for the collection of environmental data such as climate, hydrological, and agricultural data. For climatic concerns, recent work was done by [9]: the authors developed a DW for the collection and mining of climatic data of the Maghreb region from 1990 to 2019 to observe climate change. Some data warehouses related to agriculture and soil science have been developed, which are of interest to this work. The authors of [16] used OLAP technology to protect olive trees from pests, considering the relationships among climate, crop, and pest data. A DW for the improvement of nitrogen catchment management was built in [5] to store and analyze simulated, spatially distributed agro-hydrological data. In more recent research [2], the authors proposed a system that first designed a DW and an OLAP tool to store and explore multidimensional data; they then used data mining and data science tools to explore the temporal dynamics of invertebrate diversity in farmland ecosystems. Agricultural and soil data analysis needs to deal with spatial data. The purpose of spatial analysis is to understand, estimate, and predict real-world phenomena that exhibit repeating spatial structures and shapes [3]. ESDA has so far received only limited attention in agriculture and soil science. In [6], the authors studied the spatial autocorrelation of grain production and agricultural storage using the Moran's I indicator. In [4,13], the authors explored the spatial distribution and correlation of potentially toxic elements in soil profiles using ESDA tools. The Mediterranean region was studied in [11], where the authors explored the spatial context of adaptations in irrigated agriculture. In addition, the authors of [14] evaluated the spatial distribution of agricultural activities to capture their impact on poverty. Another recent study [12] applied ESDA to analyze soil quality in Iran through exploratory regression, taking into consideration the effect of land use, elevation, slope, and vegetation indexes on soil quality. Our research has two original contributions. First, we create a ready-to-use DW for analyzing spatial soil data distribution, heterogeneity, and correlations. Second, for the first time, the study area is the Maghreb region, which has not yet been considered.
A Data Warehouse for Spatial Soil Data Analysis and Mining
4 Building the DW for Exploratory Spatial Soil Data Analysis

This section presents the architecture of our proposed business intelligence tool. Next, we describe the conceptual model of our DW and depict the resulting dataset.

4.1 The Data Warehouse Architecture
Figure 1 illustrates the proposed architecture of the spatial data warehouse, consisting of four layers. First, the data collection layer: multiple sources of soil data are available on the web, such as the Food and Agriculture Organization (FAO) soils portal, the International Soil Reference and Information Centre (ISRIC), and the European Soil Data Centre (ESDAC). Our DW is populated by the FAO/UNESCO soil map of the world, from which we extracted the data for the Maghreb region. In the near future, additional data will be extracted from regional and national soil maps with a view to obtaining a better spatial resolution of the soil data. The collected data is then continuously processed according to the ETL process to maintain the accuracy and value of the information. Third, the transformed data is loaded and stored in the warehouse. The warehoused soil data is then ready for spatial data analysis and data mining techniques, where different levels of analysis are considered to target the different types of users. Finally, a data visualization interface is provided to access the warehoused data and outcomes in order to facilitate decision-making by decision makers and scientists. In the following, we expose the details of each of the architecture components.

Data Acquisition and Operational Data Sources. There are various sources of soil data, such as legacy maps, laboratory soil spectroscopy, and remote sensing techniques.

Staging Area. An intermediate storage area including all the databases required for data processing during the ETL process, validation, and cleaning of the soil data.

Storage Area. The data warehouse designed for the soil of the Maghreb region. It contains a set of dimensions to support spatial analysis and regression.

The Data Mining Tools Area. This layer contains various data mining and visualization tools to access, analyze, and visualize the soil data, such as EDA, clustering, digital soil mapping to predict soil properties, and soil quality analysis. Artificial intelligence approaches like [7,8] might be explored to address such issues.

The Human-Machine Interface (HMI). The intermediate interface between users and the system, which aids in operations such as transferring data, accessing data, applying data mining tools to the data, and visualizing results.
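The flow through these layers can be sketched as a minimal extract–transform–load chain. Everything below is illustrative, not the authors' implementation: the field names (`country`, `oc_topsoil`) and the in-memory list standing in for the warehouse are hypothetical.

```python
# Minimal sketch of the ETL flow through the architecture's layers.
# Field names (country, oc_topsoil) are illustrative, not the actual
# FAO attribute names used in the paper's warehouse.

def extract(raw_rows):
    """Data collection layer: pull raw records from a source."""
    return list(raw_rows)

def transform(rows):
    """Staging area: clean and validate records before loading."""
    cleaned = []
    for r in rows:
        if r.get("oc_topsoil") is None:        # drop incomplete profiles
            continue
        r = dict(r, country=r["country"].strip().title())
        cleaned.append(r)
    return cleaned

def load(rows, warehouse):
    """Storage area: append validated records to the warehouse."""
    warehouse.extend(rows)
    return warehouse

warehouse = []
source = [
    {"country": " algeria ", "oc_topsoil": 1.2},
    {"country": "tunisia", "oc_topsoil": None},   # rejected in staging
]
load(transform(extract(source)), warehouse)
print(warehouse)  # [{'country': 'Algeria', 'oc_topsoil': 1.2}]
```

The real pipeline is continuous rather than one-shot, but the three-stage contract is the same.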
Fig. 1. The proposed architecture for the Maghreb Soil Data Warehouse.
4.2 The Conceptual Model
We implemented a star schema that allows multidimensional spatial analysis of soil data in the Maghreb region. This schema consists of two dimensions and a fact table. The Location DIM and SOIL UNITS DIM dimensions represent, respectively, the geographic location and the description of the FAO soil units. The fact table includes the soil properties for a given location depending on the FAO soil unit. The conceptual model, defined with a UML-based formalism, is illustrated in Fig. 2. In the next section, we describe the data attributes of each table.
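Such a star schema can be illustrated with a few lines of DDL. The sketch below uses SQLite as a stand-in for the MySQL DBMS used later in the paper, keeps only a handful of the 41 attributes, and populates it with hypothetical sample values before running one OLAP-style aggregation:

```python
import sqlite3

# Illustrative star schema: two dimensions and one fact table, with
# simplified column names (the real model has 41 attributes).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE location_dim (
    location_id INTEGER PRIMARY KEY,
    country TEXT, city TEXT, latitude REAL, longitude REAL);
CREATE TABLE soil_units_dim (
    soil_unit_id INTEGER PRIMARY KEY,
    faosoil TEXT, domsoil TEXT);
CREATE TABLE soil_fact (
    location_id INTEGER REFERENCES location_dim,
    soil_unit_id INTEGER REFERENCES soil_units_dim,
    oc_topsoil REAL, ph_water_topsoil REAL);
""")
con.execute("INSERT INTO location_dim VALUES (1,'Algeria','Algiers',36.75,3.06)")
con.execute("INSERT INTO soil_units_dim VALUES (1,'Bk35-3ab','Bk')")
con.execute("INSERT INTO soil_fact VALUES (1, 1, 1.2, 8.1)")

# An OLAP-style query: average topsoil organic carbon per country.
row = con.execute("""
    SELECT l.country, AVG(f.oc_topsoil)
    FROM soil_fact f JOIN location_dim l USING (location_id)
    GROUP BY l.country""").fetchone()
print(row)  # ('Algeria', 1.2)
```

Grouping by dimension attributes while aggregating fact measures is exactly the drill-down/roll-up pattern OLAP tools automate over this schema.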
Fig. 2. The proposed star schema for the Maghreb Soil Data Warehouse.
4.3 Description of the Generated Dataset
At the end of the data collection and preparation phase, we had a data set with a total of 41 attributes that describe a soil profile instance. Of these, 24 attributes represent chemical and physical soil properties. Table 2 shows these attributes along with their role and value domain.

Table 2. Summary of the main current attributes present in the dataset.

LOCATION DIM
Country | The country of origin | Character string
City | The displayed name of the city | Character string
Latitude | The latitude of the centroid | -90 .. +90
Longitude | The longitude of the centroid | -180 .. +180
Geometry | The geometry surface polygon | POLYGON
Centroid | The centroid of the surface | POINT
Area | The area of the belonging surface | Number

SOIL UNITS DIM
FAOSOIL | FAO soil mapping unit | Character string
DOMSOIL | The dominant soil symbol unit | Character string
Profile rule | The rule that represents the composition of FAOSOIL, expressed as an equation | Character string
MU Coarse Texture | The percentage of coarse texture in the map unit (class 1) | 0 .. 100%
MU Medium Texture | The percentage of medium texture in the map unit (class 2) | 0 .. 100%
MU Heavy Texture | The percentage of heavy texture in the map unit (class 3) | 0 .. 100%
MU Flat Topography | The percentage of flat slope in the map unit (class a) | 0 .. 100%
MU Rolling Topography | The percentage of rolling slope in the map unit (class b) | 0 .. 100%
MU Mountainous Topography | The percentage of mountainous slope in the map unit (class c) | 0 .. 100%

Fact table
Sand % topsoil | The percentage of sand in the topsoil | 0 .. 100%
Sand % subsoil | The percentage of sand in the subsoil | 0 .. 100%
Silt % topsoil | The percentage of silt in the topsoil | 0 .. 100%
Silt % subsoil | The percentage of silt in the subsoil | 0 .. 100%
Clay % topsoil | The percentage of clay in the topsoil | 0 .. 100%
Clay % subsoil | The percentage of clay in the subsoil | 0 .. 100%
pH water topsoil | The pH measured in water for the topsoil | double
pH water subsoil | The pH measured in water for the subsoil | double
OC % topsoil | The percentage of organic carbon (OC) in the topsoil | 0 .. 100%
OC % subsoil | The percentage of organic carbon (OC) in the subsoil | 0 .. 100%
N % topsoil | The percentage of nitrogen (N) in the topsoil | 0 .. 100%
N % subsoil | The percentage of nitrogen (N) in the subsoil | 0 .. 100%
BS % topsoil | The base saturation (BS) percentage of the topsoil | 0 .. 100%
BS % subsoil | The base saturation (BS) percentage of the subsoil | 0 .. 100%
CEC topsoil | The cation exchange capacity of the topsoil | double
CEC subsoil | The cation exchange capacity of the subsoil | double
CEC clay topsoil | The cation exchange capacity of the clay fraction of the topsoil | double
CEC clay subsoil | The cation exchange capacity of the clay fraction of the subsoil | double
CaCO3 % topsoil | The percentage of calcium carbonate (CaCO3) in the topsoil | 0 .. 100%
CaCO3 % subsoil | The percentage of calcium carbonate (CaCO3) in the subsoil | 0 .. 100%
BD topsoil | The bulk density of the topsoil | double
BD subsoil | The bulk density of the subsoil | double
C/N topsoil | The carbon-to-nitrogen ratio of the topsoil | double
C/N subsoil | The carbon-to-nitrogen ratio of the subsoil | double

5 Experiments

Extensive experiments have been conducted with various components of the developed DW architecture: data collection, the development of an HMI, and the ESDA tools. All these tasks were implemented using Python 3.9; some of these efforts are shown below. We chose MySQL as our DBMS because it is open source and supports spatial data types, which enables the storage and geographic analysis of these data.

5.1 Data Collection and Dataset Preparation
We fed our DW using the FAO-UNESCO World Soil Map downloaded from the FAO soil portal. This digitized soil map, at 1:5,000,000 scale, contains around 34111 rows and 1700 soil profiles. After the extract step, we performed several transformations, such as deriving the city names from the locations and calculating the percentage of each texture class and slope class for each soil mapping unit. Then, we created profile rules to represent the composition of each FAOSOIL unit in order to calculate the unit's soil property measurements. Finally, we created and populated the DW tables. The resulting dataset contains about 800 rows. Table 1 shows the distribution of instances over the Maghreb region.

Table 3. Summary of the main fixed parameters during the experiments.

Parameter | Possible values | Selected value
Soil properties | See the fact table attributes in Table 2 | OC % topsoil
Countries | {Algeria, Tunisia, Libya, Morocco, Mauritania} | All
Color palettes | 166 matplotlib colormaps {viridis, magma, inferno} | viridis_r
Number of neighbors (k) | 0 ≤ n ≤ N; N = number of observations | 8
Significance threshold | 0 .. 100% | 5%

5.2 Spatial Soil Data Analysis
After the processing and storage of the data in our DW, we designed a web interface that enables the user to visualize the spatial soil data analysis results. The analysis concerned the soil properties in the Maghreb region; the user can filter the results by soil property and by country. In the following, we show the results of our analysis for the 'OC % topsoil' property over all the countries of the Maghreb. We chose this property as an example because of its great importance for the environment, especially its impact on climate change. We followed the steps of ESDA; Table 3 shows the fixed parameters for these steps. First, in order to analyze the data distribution, we built some choropleth maps. Several map classification algorithms were implemented: deterministic algorithms, such as quantiles and box maps, and heuristic classification algorithms, such as the Jenks-Caspall and Fisher-Jenks algorithms. The box map, introduced by Luc Anselin, is based on the boxplot and displays the dispersion of the outliers on the map using a representative color palette. Figure 3 shows a box map of OC % topsoil for the Maghreb countries. We can see that the distribution of the values is somewhat similar in Algeria and Libya, with outliers in the extreme north of these two countries. Thus, by matching the boxplot and the box map, we can answer the question: where are the outliers located? This is relevant information in the case of spatial analysis. Then, for the global spatial autocorrelation analysis, we calculated the Moran's I statistic. The value of Moran's I for OC % topsoil, with k = 8 neighbors, was approximately 0.44, a positive and significant value that indicates positive spatial autocorrelation, i.e., a clustering of similar values. In Fig. 4a, the empirical distribution generated by simulating 999 random maps using the values of the OC % topsoil variable and computing Moran's I for each of these maps is shown in gray. A blue line marks the mean value and the red line shows the observed Moran's I (0.44). It is clear that the value of Moran's I is significantly higher than the values under randomness. Next, we examined the local spatial autocorrelation. Figure 4b shows the Moran scatter plot for the Maghreb region, which displays on the horizontal axis the standardized values of the OC % topsoil observations against their standardized spatial lag (the average of the neighbors' values) on the vertical axis. The slope of the line is the Moran's I coefficient, and the four quadrants in this plot correspond to the four types of local spatial association between an observation and its neighbors. Each quadrant describes an area where the original variable's value and its spatial lag can each be above the mean (high) or below the mean (low). The positive autocorrelation areas are located in the top right corner, labeled 'HH' (high/high), and in the bottom left, labeled 'LL' (low/low); the negative autocorrelation areas are in the top left corner, labeled 'LH' (low/high), and in the bottom right, labeled 'HL' (high/low). We can notice that even though we have positive global autocorrelation, we still have some cases of negative autocorrelation in the Maghreb area.
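The permutation scheme behind Fig. 4a can be reproduced in miniature: shuffle the observed values over the locations many times and recompute Moran's I each time. The sketch below is illustrative only; it uses a toy eight-site line graph with made-up values rather than the actual Maghreb data, and is not the authors' code:

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x and weights matrix W."""
    z = x - x.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)

# Toy data: 8 sites on a line (adjacent sites are neighbors),
# with a perfectly clustered variable.
n = 8
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W /= W.sum(axis=1, keepdims=True)            # row-standardize
x = np.array([1., 1., 1., 1., 5., 5., 5., 5.])

observed = morans_i(x, W)
# 999 random maps: permute the values over the sites and recompute I,
# which builds the empirical reference distribution under randomness.
reference = np.array([morans_i(rng.permutation(x), W) for _ in range(999)])

# Pseudo p-value: share of simulated maps at least as autocorrelated.
p = (1 + (reference >= observed).sum()) / (1 + 999)
print(observed)   # well above the reference mean
print(p)          # small: the clustering is unlikely under randomness
```

This is the same logic the paper applies at full scale, with the observed I of 0.44 compared against its own 999-permutation distribution.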
Fig. 3. Spatial distribution of OC % topsoil in the Maghreb region.
Finally, we plotted the local Moran's I (LISA statistic) values in a choropleth map, shown in Fig. 4c. In addition, to obtain a more insightful map, we need to answer two questions: where are the situations of positive and negative autocorrelation, and which local statistics are significant and which are non-significant (ns) from a statistical point of view? The map that answers these questions, called a cluster map, is shown in Fig. 4d, where we have five groups: 'HH', 'LL', 'LH', 'HL', and 'ns'. We chose a significance threshold of 5%. As we can see, positive autocorrelation of both kinds, 'HH' and 'LL', is present in all the countries of the Maghreb. The 'HH' area is located in the north of Algeria and Tunisia and, with a large share, in Morocco. The 'LL' group is present in the southern Sahara of these countries. There is only a reduced area of negative autocorrelation ('LH' and 'HL'). However, even though Moran's I showed positive autocorrelation, there is a notable area of non-significance in our map, more specifically in the center of the countries. Thus, we conclude that local spatial autocorrelation gives more detailed conclusions than global spatial autocorrelation.
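The local statistic and the quadrant labels of the cluster map can be sketched as follows. This is a simplification for illustration (six toy sites, the plain z·lag form of local Moran, and no permutation-based significance test, so there is no 'ns' class here):

```python
import numpy as np

def local_morans_i(x, W):
    """Local Moran statistic I_i = z_i * (spatial lag of z)_i (up to scaling)."""
    z = (x - x.mean()) / x.std()
    lag = W @ z                      # average value of each site's neighbors
    return z * lag, z, lag

def quadrant(z_i, lag_i):
    """Classify a site into the four Moran scatter plot quadrants."""
    if z_i >= 0:
        return "HH" if lag_i >= 0 else "HL"
    return "LH" if lag_i >= 0 else "LL"

# Six sites on a line, neighbors = adjacent sites, row-standardized weights.
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
W /= W.sum(axis=1, keepdims=True)

x = np.array([1., 1., 1., 5., 5., 5.])
I_local, z, lag = local_morans_i(x, W)
labels = [quadrant(z[i], lag[i]) for i in range(6)]
print(labels)   # a low cluster on the left, a high cluster on the right
```

Coloring each site by its label (plus 'ns' for sites whose local statistic fails the permutation test) yields exactly the cluster map of Fig. 4d.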
Fig. 4. Spatial autocorrelation of OC % topsoil in the Maghreb region.
6 Conclusion
This paper addresses the design and implementation of a DW architecture for storing, processing, and manipulating soil data of the Maghreb region for the first time. To achieve this, we first collected data from the FAO soil portal. Then, we processed the data using an ETL process for cleaning and formatting, to obtain high-quality estimations of the soil properties. The processed data was then stored in a DW in order to apply data mining, data exploration, and data science techniques to the soil science field. Afterwards, we exploited these soil data, following the whole step-by-step ESDA process, to study soil properties and features in the Maghreb region. Spatial distribution and autocorrelation were explored. We found a positive global autocorrelation, indicating a spatial clustering situation. We also calculated the LISA to dig deeper into where the positive autocorrelations were located. As future work, we plan to:
– Collect more soil data using more data sources to improve the spatial resolution.
– Perform clustering to detect and interpret the clusters of the Maghreb region.
– Use the soil property measures as ground truth to perform digital soil mapping of the Maghreb soil.
References

1. Anselin, L.: Spatial Econometrics: Methods and Models, vol. 4. Springer Science & Business Media (1988). https://doi.org/10.1007/978-94-015-7799-1
2. Bimonte, S., et al.: Collect and analysis of agro-biodiversity data in a participative context: a business intelligence framework. Ecol. Inform. 61, 101231 (2021)
3. Bimonte, S., Tchounikine, A., Miquel, M., Pinet, F.: When spatial analysis meets OLAP: multidimensional model and operators. Int. J. Data Warehousing Min. 6(4), 33–60 (2010)
4. Borojerdnia, A., Rozbahani, M.M., Nazarpour, A., Ghanavati, N., Payandeh, K.: Application of exploratory and spatial data analysis (SDA), singularity matrix analysis, and fractal models to delineate background of potentially toxic elements: a case study of Ahvaz, SW Iran. Sci. Total Environ. 740, 140103 (2020)
5. Bouadi, T., Cordier, M.O., Moreau, P., Quiniou, R., Salmon-Monviola, J., Gascuel-Odoux, C.: A data warehouse to explore multidimensional simulated data from a spatially distributed agro-hydrological model to improve catchment nitrogen management. Environ. Modell. Softw. 97, 229–242 (2017)
6. Cima, E., Uribe Opazo, M., Johann, J., Rocha, W., Dalposso, G.: Analysis of spatial autocorrelation of grain production and agricultural storage in Paraná. Engenharia Agrícola 38, 395–402 (2018)
7. Drias, H., Drias, Y., Houacine, N., Sonia, B.L., Zouache, D., Khennak, I.: Quantum OPTICS and deep self-learning on swarm intelligence algorithms for Covid-19 emergency transportation. Soft Computing (2022). https://doi.org/10.1007/s00500-022-06946-8
8. Drias, H., Drias, Y., Khennak, I.: A novel orca cultural algorithm and applications. Expert Systems (2022)
9. Drias, Y., Drias, H., Khennak, I.: Data warehousing and mining for climate change: application to the Maghreb region. In: Abraham, A., et al. (eds.) SoCPaR 2021. LNNS, vol. 417, pp. 293–302. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96302-6_27
10. Haining, R., Wise, S., Ma, J.: Exploratory spatial data analysis. J. Royal Statist. Soc. Ser. D (Statistician) 47(3), 457–469 (1998)
11. Harmanny, K., Malek, Z.: Adaptations in irrigated agriculture in the Mediterranean region: an overview and spatial analysis of implemented strategies. Reg. Environ. Change 19, 1401–1416 (2019). https://doi.org/10.1007/s10113-019-01494-8
12. Mirghaed, F.A., Souri, B.: Spatial analysis of soil quality through landscape patterns in the Shoor river basin, southwestern Iran. CATENA 211, 106028 (2022)
13. Protano, G., Lella, L., Nannoni, F.: Exploring distribution of potentially toxic elements in soil profiles to assess the geochemical background and contamination extent in soils of a metallurgical and industrial area in Kosovo. Environ. Earth Sci. 80, 486 (2021). https://doi.org/10.1007/s12665-021-09771-8
14. da Silva, G.S., Amarante, P.A., Amarante, J.C.A.: Agricultural clusters and poverty in municipalities in the northeast region of Brazil: a spatial perspective. J. Rural Stud. 92, 189–205 (2022)
15. Singh, S., Kasana, S.S.: Estimation of soil properties from the EU spectral library using long short-term memory networks. Geoderma Reg. 18, e00233 (2019)
16. Zaza, C., Bimonte, S., Faccilongo, N., Sala, P.L., Contò, F., Gallo, C.: A new decision-support system for the historical analysis of integrated pest management activities on olive crops based on climatic data. Comput. Electron. Agric. 148, 237–249 (2018)
A New Approach for the Design of Medical Image ETL Using CNN

Mohamed Hedi Elhajjej(B), Nouha Arfaoui, Salwa Said, and Ridha Ejbali

Research Team in Intelligent Machines (RTIM), National School of Engineers, University of Gabes, Gabes, Tunisia
elhajjej m [email protected], [email protected], ridha [email protected]
Abstract. Nowadays, the combination of digital images and machine learning techniques to solve COVID-19 problems has been one of the most explored topics. Most efforts have focused on the detection and classification of lung diseases, which requires a large number of images to process. Images extracted from different sources need to be loaded into a big database after the required transformations, to reduce errors and minimize data loss. This process is known as Extraction-Transformation-Loading (ETL); it is responsible for extracting, transforming, conciliating, and loading data to support decision-making requirements. This paper presents an innovative medical image extract, transform, load (MI-ETL) solution that feeds a large number of images of interest from heterogeneous data sources into a specialized database. The main objective of the paper is to present the three stages of the MI-ETL process, starting with the collection of medical images from several sources using different techniques, then applying deep learning techniques (a CNN filter) to extract only images of the lungs, and finally loading the features of the images into a big database.
1 Introduction
Although big data provides great opportunities for a broad range of areas, including e-commerce, industrial control, and smart medicine, it poses many challenging issues for data mining and information processing due to its characteristics of large volume, variety, velocity, and veracity. Deep learning, one of the most remarkable current machine learning techniques, has played an important role in big data analytics solutions and achieved great success in many applications such as image analysis. Normally, big data analytics starts with integrating the generated data into a data warehouse using various techniques; ETL (extract, transform, load) is a popular technique used for this purpose [1,2]. It ensures the integration of data from multiple sources into a data warehouse. It is used with data warehousing, big data, and business intelligence; it is therefore a mandatory step in the decision-making process. The ETL relies on connectors to extract or import data from various sources (databases, ERP, CRM, or others), and on processors that handle the data: aggregations, filters, conversions, and mappings.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 171–180, 2023. https://doi.org/10.1007/978-3-031-35501-1_17

Such a tool fetches the data of the enterprise and puts them together
and makes them usable in the context of decision support, to finally inject them into a data warehouse [3]. Configuring an ETL process is one of the key factors with a direct impact on the cost, time, and effort of establishing a successful data warehouse. Data modelling gives an abstract view of how the data will be arranged in an organization and how they will be managed. By applying data modelling techniques, the relationships between different data items can be visualized. The modelling concept greatly benefits organizational data by allowing it to be managed in a structured way. At the starting phase, it is highly recommended to produce an efficient model and design of the total workflow [4]. However, the data volume flowing into enterprises, especially those that process medical images, has increased considerably, not only from datasets but also from the web, social networks, and interconnected devices. This change in the volume, type, and speed of input data requires other platforms that can inexpensively process such high-volume data. Indeed, organizations are rapidly generating 'big data' from various sources, e.g., social media and e-commerce systems; the term big data is used to define data that are too voluminous and complex to be processed by traditional data processing systems [1]. This article proposes a medical image ETL (MI-ETL) designed to extract images from several sources, transform them using several pre-processing techniques (grayscale conversion, resizing, thresholding, filtering and noise removal, pivoting, etc.), select lung images using smart filters, and load the image features into a NoSQL database. The smart filters are implemented using a CNN deep learning algorithm; they are used mainly to distinguish lung images from the set of extracted ones. For that reason, the first section of this paper reviews the existing ETL approaches treating images. Then, in the second section, we present our MI-ETL by describing its three parts. This paper ends with a conclusion and perspectives for future work.
2 State of the Art
Few works have addressed the implementation of an ETL dealing only with images, but we have found some implementations of ETLs that deal with a variety of content, including images. We have distinguished mainly the following works. In their studies, Souissi et al. [5] were interested in the variety aspect of data in big data integration. They proposed an ETL tool called GENUS. Their solution extracts data from different document types (text, image, video), transforms them, and loads them into a data warehouse. They intervene in the transformation phase, which is divided into two parts: data cleaning and extracting main concepts. In data cleaning, the data are cleaned to prevent the errors resulting from the extraction phase. After cleaning, the main concepts and metadata are extracted. For image processing, they propose a new representation of the image, stored in simple XML files in order to be used in future analyses. Indeed, they propose to encode the image in Base64 and to extract some EXIF metadata judged important for knowledge extraction tasks. Although this approach allows in some cases the processing of images within an ETL process, it is not an image processing solution; in addition, the representation of the images is modified, since they are transformed into XML files. In [6], A. Haidar et al. proposed patient data collection and processing (PDCP), a set of tools implemented in Python to prepare radiotherapy (RT) data stored in an open-source picture imaging and archiving system (PACS) known as Orthanc. PDCP enables querying, retrieving, and validating patient imaging summaries; analysing associations between patient DICOM data; retrieving patient imaging data into a local directory; preparing the records for use in various research questions; and tracking the patient's data collection process and identifying reasons for excluding a patient's data. PDCP targeted simplifying the data preparation process in such applications, and it was made expandable to facilitate additional data preparation tasks. The authors implemented extract, transform, and load (ETL) tools for mapping a patient cohort to a format that can be used in data mining and machine learning applications. Images are extracted and saved as rows before being aggregated into a CSV file representing the patient's instances belonging to multiple studies and series. Although this approach describes tools to manage various cases in RT data identification and extraction, the clinical care decisions that might be taken by clinicians while treating patients are broad; these may include reimaging and replanning patient RT treatment, which might lead to new datasets related to the patient's treatment. In [7], X. Li et al. present a real-time data ETL framework to separately process historical data and real-time data. Combined with an external dynamic storage area, a dynamic mirror replication technology was proposed to avoid contention between OLAP queries and OLTP updates. Although this approach allows in some cases the processing of images with an ETL process, it is not an image processing solution. In [8], U.
Drescek et al. present an approach to data-driven 3D building modelling in a spatial ETL environment, using a UAV photogrammetric point cloud as input data. The proposed approach covers the complete modelling workflow, from initial photogrammetric point cloud processing to the final polyhedral building model. This ETL solution is used to reconstruct a 3D building model from a dense image-matching point cloud obtained beforehand from UAV imagery. The ETL presented in that article deals with images as input, but also with point clouds and other input data, and its outputs are not loaded into a target database. In [9], T. Godinho et al. propose an ETL framework for medical imaging repositories able to feed, in real time, a developed BI (business intelligence) application. The solution was designed to provide the necessary environment for leading research on top of live institutional repositories without requiring the creation of a data warehouse. This solution uses Dicoogle PACS as input to extract knowledge and metadata from the images, but it moves away from the classic ETL architecture since it eliminates the third step, the loading. Based on the cited works, we can conclude that the existing works do not preserve the structure of the input images but transform them, for example, into XML files. Also, they do not propose the integration of deep learning techniques as part of the transformation phase.
3 Methodology
We present a new MI-ETL tool that handles big data variety and is able to deal with medical image processing. It extracts images from a dataset or from the web in order to obtain a wide variety of images. It then applies several pre-processing techniques to transform the images, and applies a filter to identify the lung images, transforming them from their initial representation (images) into a new one (features) in order to load them into a NoSQL database. The output of our proposed tool is a NoSQL database of image features. In most cases, this database will then be processed by analysis and mining algorithms in order to extract knowledge, and the extracted knowledge directly influences the decisions made by the decision maker. Via our tool, we give the decider access to knowledge extracted from different types of data. The main architecture of our MI-ETL is represented in Fig. 1.
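The paper's smart filter is a trained CNN whose architecture is not detailed at this point. As a minimal stand-in, the sketch below applies a single hand-written convolution kernel (the basic building block a CNN layer learns) to a synthetic image, summarizes the feature map, and stores the result in a Python dict playing the role of one NoSQL document; all names and values are hypothetical:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation form, as in deep
    learning libraries) of a grayscale image with a small kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-written vertical-edge kernel stands in for a learned CNN filter.
kernel = np.array([[-1., 0., 1.],
                   [-2., 0., 2.],
                   [-1., 0., 1.]])

image = np.zeros((8, 8))
image[:, 4:] = 1.0                   # synthetic image: dark left, bright right

feature_map = conv2d(image, kernel)
features = {"mean": float(feature_map.mean()),
            "max": float(feature_map.max())}

# Stand-in for the NoSQL load step: one document per image.
database = {}
database["image_001"] = features
print(database["image_001"]["max"] > 0)   # True: the edge was detected
```

A real CNN stacks many such learned filters with nonlinearities and pooling; the deep feature vector before the classification head is what would be loaded as the image's features.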
Fig. 1. The main architecture of the proposed MI-ETL.
In this article, as a first part, we intervene in the first two phases, Extract and Transform.

3.1 Extract
A New Approach for the Design of Medical Image ETL Using CNN

Before data organization, the first step of the MI-ETL process is to extract data from all relevant sources and compile it. The ability to extract data from heterogeneous data sources is a key point when building an MI-ETL tool. Data needs to be extracted from structured sources (relational databases), unstructured sources (PDF files, images, emails, etc.), semi-structured sources (XML and other markup languages), legacy systems (mainframes), application packages (SAP), etc. [10]. Our method consists in extracting medical images from several sources, in order to increase the number of images to be analyzed. To do this, we extract them using two methods: from a dataset, and from the web with the web-scraping technique. Web scraping, also known as web extraction or harvesting, is a technique for extracting data from the World Wide Web. It extracts unstructured data from websites and transforms it into structured data that can be stored and analysed in a database. A scraper is therefore a piece of software that simulates human browsing on the web to collect detailed information from different websites. Its advantages lie in its speed and its ability to be automated and/or programmed. Whatever technique is used, however, the approach and the objectives remain the same: capture web data and present it in a more structured format [11].
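The image-collection part of web scraping can be sketched with the Python standard library alone. The page content and URLs below are invented placeholders; a real scraper would add downloading (e.g. via urllib.request), rate limiting and robots.txt handling.

```python
# Sketch: collect the image URLs referenced by a scraped HTML page.
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageLinkParser(HTMLParser):
    """Collects absolute URLs of <img> tags found in an HTML page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))

def extract_image_urls(html, base_url):
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls

# Placeholder page and URL, for illustration only.
page = '<html><body><img src="/scans/lung_001.png"><img src="logo.gif"></body></html>'
urls = extract_image_urls(page, "https://example.org/atlas/")
```

The returned URLs would then be fetched and stored as raw images for the Transform phase.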
3.2 Transform
Transform is the second step of the MI-ETL process: the data extracted from the sources is compiled, converted, reformatted and cleaned in the staging area before being introduced into the target database in the next step. This phase involves cleansing the data so that it complies with the target schema. Typical transformation activities include normalizing data, removing duplicates, checking for integrity-constraint violations, filtering data with regular expressions, sorting and grouping data, applying built-in functions where necessary, etc. [3]. Our transform method is divided into two parts, as shown in Fig. 1: an image-transformation part, performed through a set of pre-processing functions applied to the extracted images (conversion to grayscale, resizing, thresholding, filtering and noise removal, rotation, etc.), and a filtering part that retains only lung images. In our case study, we want to process the lung images (a) resulting from the extraction phase. However, the scraping technique will also extract several images that we do not need, such as images of doctors (c) or images for educational purposes (b), as shown in Fig. 2.
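A minimal sketch of the pre-processing functions mentioned above (grayscale conversion, resizing, thresholding), using Pillow as the paper also does for cleaning. The target size follows Sect. 3.3; the threshold value is an assumption, and the in-memory test image stands in for an extracted scan.

```python
# Sketch of the image-transformation sub-step with Pillow.
from PIL import Image

def preprocess(img, size=(150, 150), threshold=128):
    """Grayscale -> resize -> binary threshold."""
    gray = img.convert("L")                      # convert to grayscale
    small = gray.resize(size)                    # normalise dimensions
    binary = small.point(lambda p: 255 if p >= threshold else 0)
    return binary

# Synthetic stand-in for an extracted image.
img = Image.new("RGB", (600, 400), color=(200, 50, 50))
out = preprocess(img)
```

Noise removal and rotation would be further steps in the same chain (e.g. Pillow's ImageFilter and Image.rotate).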
Fig. 2. Filtering to extract lung images
3.3 Data Cleaning
In order to prevent errors resulting from the extraction phase, and before any processing task, the extracted images should be cleaned. Indeed, the data can contain several types of errors, such as inaccuracies, duplications, etc. The unsuitable part of the processed data can be replaced, modified or deleted. In this study, the only cleaning task proposed is to delete the data that do not meet our requirements, such as very small images. The remaining images are then loaded into an in-memory list, converted to RGB format using the Python Pillow library, and resized to 150 × 150 pixels.

Algorithm 1. Image cleaning algorithm
Inputs: LI0, lms
Outputs: LI1
Begin
1: read DS0
2: for each i in LI0
  2.1: if size(i) ≤ lms then
    2.1.1: Delete(i)
  2.2: else
    2.2.1: Convert(i) to RGB
    2.2.2: Resize(i) to 150 × 150 pixels
    2.2.3: if i not in LI1 then
      2.2.3.1: Copy(i) to LI1
3: Save LI1
End
With:
• LI0: the list containing the extracted images.
• lms: the minimum accepted image size.
• DS0: the folder containing the extracted images.
• LI1: the list containing the cleaned images.
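Algorithm 1 might be rendered in Python with Pillow as follows. The duplicate check of step 2.2.3 is implemented here by comparing pixel bytes after resizing, which is one possible reading of that step; the lms value is an assumption.

```python
# A possible Python rendering of Algorithm 1 (notation LI0/lms/LI1 as above).
from PIL import Image

def clean_images(LI0, lms=32):
    LI1, seen = [], set()
    for i in LI0:
        if min(i.size) <= lms:
            continue                              # 2.1: delete too-small images
        rgb = i.convert("RGB").resize((150, 150)) # 2.2.1 / 2.2.2
        key = rgb.tobytes()
        if key not in seen:                       # 2.2.3: skip duplicates
            seen.add(key)
            LI1.append(rgb)
    return LI1

# Synthetic stand-ins: one too-small image and two content-identical images.
imgs = [
    Image.new("RGB", (20, 20)),             # dropped: too small
    Image.new("RGB", (300, 200), "white"),
    Image.new("L", (300, 200), 255),        # duplicate of the white image
]
cleaned = clean_images(imgs)
```

Saving LI1 (step 3) would simply iterate over the list and call Image.save.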
3.4 Intelligent Filter
Deep learning, one of the most successful machine learning techniques available nowadays, has achieved considerable success in several applications such as image analysis, and CNNs in particular are now widely used in image recognition [12]. Taking advantage of these techniques, we use a CNN filter that performs the data-cleaning task in our MI-ETL model. This filter aims to detect images containing lungs among the images extracted in the first part of our MI-ETL process. A convolutional neural network is a form of ANN that preserves spatial correlations in the data by having fewer connections between layers, which keeps track of the relationships in the data as it is fed in. Each layer of the model operates on a small region of the previous layer, as shown in Fig. 3. In our system, we process X-ray image data, divided into two parts (training data and test data). We then create a training model that takes the inputs processed in the previous step and trains on the training portion of the data. The final system takes a new X-ray image and classifies it as lung/non-lung. The dataset used for training, called A Pulmonary Chest X-Ray, is available on Kaggle [13].

The input images are of size 150 × 150 × 3. An image first passes through the first convolution layer, composed of 32 filters of size 3 × 3; each of our convolution layers is followed by a ReLU activation function, which forces the neurons to return positive values. After this convolution, 32 feature maps of size 32 × 32 are created. These feature maps are fed into the second convolution layer, composed of 64 filters, again followed by a ReLU activation; max-pooling of size 2 × 2 with a stride of 2 is then applied to reduce the size of the image and thus the number of parameters and the amount of computation. The pooling configured for the model uses a two-dimensional global average. Adding batch normalization increases the stability of the neural network by normalizing activations during training. At the output of this layer, we obtain 64 feature maps of size 112 × 112. The outputs of this block are fed into block 2: a convolution layer composed of 64 filters of size 3 × 3 followed by a ReLU activation, then a max-pooling layer of size 2 × 2 with a stride of 2, giving 64 feature maps of size 56 × 56. The same scheme is repeated with 128 filters of size 3 × 3, finally giving 128 feature maps of size 28 × 28. The feature vector resulting from the convolutions thus has a dimension of 100352 (28 × 28 × 128). A Flatten layer is then added, which serializes the feature maps for the dense layer.
A dense layer of 64 neurons with ReLU activation was also added to the model, followed by a dropout with a rate of 0.4, which improves the generalization of the network. The last layer is a dense layer with 2 neurons and softmax activation [14, 15]. The model is then compiled according to the parameters given in Table 1. Adam is an optimizer for stochastic gradient descent based on adaptive estimation of first- and second-order moments. Binary cross-entropy computes the cross-entropy loss between the true and predicted labels. The metric is a function used to evaluate the performance of the model; here it is the accuracy ("acc"), which measures the proportion of predictions that match the labels.
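A hedged Keras sketch of the described filter network: the filter counts, kernel sizes, dropout rate and compilation settings follow the text and Table 1, while padding, pooling placement and the exact layer ordering are assumptions (the feature-map sizes quoted in the text cannot all be reproduced exactly).

```python
# Sketch of the lung/non-lung CNN filter; hyper-parameters per the text,
# layer ordering partly assumed.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(150, 150, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.BatchNormalization(),                 # stabilises training
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(2, activation="softmax"),       # lung / non-lung
])
# Compilation parameters of Table 1.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["acc"])
```

Training would then call model.fit on the X-ray training split described above.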
Fig. 3. Design of CNN Model
Table 1. Model compilation parameters

Parameter  Configuration
Optimizer  Adam
Loss       Binary cross-entropy
Metrics    Acc
The compilation results are illustrated by the model accuracy curve, which shows the accuracy of the model over the training epochs. As shown in Fig. 4, the model reached an accuracy of 0.95.
Fig. 4. CNN Model Accuracy
3.5 Load
Once the data is ready, we proceed to the last step of our MI-ETL. In this phase, the images are loaded into a schemaless database, a so-called NoSQL database, which is efficient at supporting large volumes of data. Nowadays, several technologies have been developed to leverage voluminous data, i.e. big data, and medical images fall into the big data category due to their volume and variety [16]. NoSQL databases are proficient in handling this kind of data: being non-relational, they do not follow a strict schema and use a more flexible data model [17]. We have chosen the HBase database in which to load our images. HBase is an open-source, non-relational, distributed database modeled on Google's Bigtable concept. It fits key-value workloads with high-volume random read and write access patterns. It can serve image files either by storing them itself or by storing references to them; handling larger numbers of images is complex and depends on the NameNode memory size. The techniques and approaches used to save the images in the HBase model will be detailed in our future work.
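Since the HBase storage scheme is left to future work in the paper, the following only sketches one plausible row layout: the row key is the image identifier and a single column family f holds the serialised feature vector. The table name and the commented happybase call are illustrative assumptions, not the paper's design.

```python
# Build a (row_key, columns) pair for a hypothetical HBase 'images' table.
import json

def to_hbase_put(image_id, features):
    """Row key = image id; family 'f' stores the feature vector and its size."""
    row_key = image_id.encode("utf-8")
    columns = {
        b"f:features": json.dumps(features).encode("utf-8"),
        b"f:dim": str(len(features)).encode("utf-8"),
    }
    return row_key, columns

row, cols = to_hbase_put("xray_0001", [0.12, 0.98, 0.33])
# With the happybase client this payload could be written as:
#   connection.table("images").put(row, cols)
```

Keeping features JSON-encoded in one cell is only one option; wide rows with one column per feature would also fit HBase's model.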
4 Conclusion
Machine learning has already been used for disease detection and, in healthcare, as a complementary tool to help cope with diverse health problems. Advances in deep learning have allowed algorithms to outperform medical teams in certain imaging tasks, such as COVID-19 detection, but these techniques require a large number of images. In this paper, we presented a new approach, MI-ETL, that can provide a very large number of images based on image extraction, transformation and loading. We presented the steps of the MI-ETL process: first, the collection of medical images from several sources based on web scraping; then, the application of a deep learning technique (CNN) to retain only the lung images; and finally, the loading of the image features into a big database (HBase), to be analysed in future work to predict images of COVID-19-infected lungs.

Acknowledgments. The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research (DGRST), Tunisia, under the ARUB.
References

1. Gudivada, V.N., Baeza-Yates, R., Raghavan, V.V.: Big data: promises and problems. Computer 48(3), 20–23 (2015)
2. Hilali, I., Arfaoui, N., Ejbali, R.: A new approach for integrating data into big data warehouse. In: Fourteenth International Conference on Machine Vision (ICMV 2021), SPIE 12084, pp. 475–480 (2022)
3. Vassiliadis, P., Simitsis, A.: Extraction, transformation, and loading. Encyclopedia of Database Systems (2009)
4. Çağıltay, N.E., Topallı, D., Aykaç, Y.E., Tokdemir, G.: Abstract conceptual database model approach. In: 2013 Science and Information Conference, pp. 275–281. IEEE (2013)
5. Souissi, S., Ben Ayed, M.: Genus: an ETL tool treating the big data variety. In: 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), pp. 1–8. IEEE (2016)
6. Haidar, A., Aly, F., Holloway, L.: PDCP: a set of tools for extracting, transforming, and loading radiotherapy data from the Orthanc research PACS. Software 1(2), 215–222 (2022)
7. Li, X., Mao, Y.: Real-time data ETL framework for big real-time data analysis. In: 2015 IEEE International Conference on Information and Automation, pp. 1289–1294. IEEE (2015)
8. Drešček, U., Fras, M.K., Tekavec, J., Lisec, A.: Spatial ETL for 3D building modelling based on unmanned aerial vehicle data in semi-urban areas. Remote Sens. 12(12), 1972 (2020)
9. Godinho, T.M., Lebre, R., Almeida, J.R., Costa, C.: ETL framework for real-time business intelligence over medical imaging repositories. J. Digital Imaging 32(5), 870–879 (2019)
10. Mukherjee, R., Kar, P.: A comparative review of data warehousing ETL tools with new trends and industry insight. In: 2017 IEEE 7th International Advance Computing Conference (IACC), pp. 943–948. IEEE (2017)
11. Sarr, E.N., Ousmane, S., Diallo, A.: Factextract: automatic collection and aggregation of articles and journalistic factual claims from online newspaper. In: 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), pp. 336–341. IEEE (2018)
12. Shin, A., Yamaguchi, M., Ohnishi, K., Harada, T.: Dense image representation with spatial pyramid VLAD coding of CNN for locally robust captioning. arXiv preprint arXiv:1603.09046 (2016)
13. Wei, W., Xiao, H., Ji, L., Peng, Z., Xin, W.: Detecting COVID-19 patients in X-ray images based on MAI-nets. Int. J. Comput. Intell. Syst. 14(1), 1607–1616 (2021)
14. Jain, G., Mittal, D., Thakur, D., Mittal, M.K.: A deep learning approach to detect COVID-19 coronavirus with X-ray images. Biocybernetics Biomed. Eng. 40(4), 1391–1405 (2020)
15. Rahmani, M.K.I., Taranum, F., Nikhat, R., Farooqi, M.D.R., Khan, M.A.: Automatic real-time medical mask detection using deep learning to fight COVID-19. Comput. Syst. Sci. Eng. 42(3), 1181–1198 (2022)
16. Chawla, N.V., Davis, D.A.: Bringing big data to personalized healthcare: a patient-centered framework. J. Gen. Intern. Med. 28(3), 660–665 (2013)
17. Krishna, T.H., Rajabhushanam, C.: Exploring NoSQL databases in medical image management. Int. J. Mod. Agric. 9(4), 1259–1265 (2020)
An Improved Model for Semantic Segmentation of Brain Lesions Using CNN 3D

Ala Guennich1,2(B), Mohamed Othmani2,3, and Hela Ltifi4,5

1 National Engineering School of Sfax, University of Sfax, 2100, Gafsa, Tunisia
[email protected]
2 Research Lab: Technology, Energy, and Innovative Materials Lab, Faculty of Sciences of Gafsa, University of Gafsa, Gafsa, Tunisia
3 Faculty of Sciences of Gafsa, University of Gafsa, 2100, Gafsa, Tunisia
4 Faculty of Sciences and Technology of Sidi Bouzid, 9100, Sidi Bouzid, Tunisia
[email protected]
5 Research Lab: RGIM-Lab ENIS, Gafsa, Tunisia
Abstract. The implementation of accurate automatic algorithms for brain tumor segmentation could improve disease diagnosis and treatment monitoring, and make large-scale studies of the pathology possible. In this study, we optimize an architecture for the challenging task of brain lesion segmentation that is based on a 3D convolutional neural network. In order to integrate broader local and contextual information, we use a dual-pathway architecture that processes the input images at multiple scales simultaneously. Furthermore, we show that optimizing the momentum value helps us achieve better results on the DSC, precision and sensitivity criteria. Our method was also tested on the BRATS 2015 training dataset, where it performed very well despite its simplicity.

Keywords: Semantic segmentation · CNN 3D · Deep learning · Medical image · Brain lesions
1 Introduction

Segmentation and subsequent quantitative analysis of lesions in medical images is an important process for the diagnosis and treatment of neuropathology. It is also crucial for determining how a disease may progress and for planning treatment options. To better understand disease pathophysiology, quantitative imaging can uncover indices of disease patterns and of their effects on particular anatomical structures. In recent years, several studies [1–3] have shown that quantification of lesion load can provide insight into the functional outcome of patients. As an additional example, the associations of different lesion types, spatial distributions and extents with acute and chronic sequelae after Traumatic Brain Injury (TBI) are still poorly understood [4] and are related to specific challenges that depend on the affected brain structure [5]. This is consistent with the expectation that functional deficits caused by an incident stroke are related to the severity of damage at particular sites in the brain [6]. Lesion load is generally described in terms of the number and size of lesions, and studies have shown that altered levels of such biomarkers are related to changes in cognitive deficits. For example, in [7] the authors demonstrated that accurate estimation of a tumor's relative size is crucial to planning radiotherapy and following up on treatments, which cannot be delivered without an accurate understanding of the tumor's size. Furthermore, the volumes of White Matter Lesions (WMLs) have been shown to correlate with decreased cognition and an increased risk of dementia. Lastly, in MS clinical research, the volume and number of lesions are used to determine the evolution of the patient's disease and the effectiveness of pharmaceutical treatments.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 181–189, 2023.
https://doi.org/10.1007/978-3-031-35501-1_18

A. Guennich et al.

Accurate lesion segmentation in multimodal, three-dimensional images is challenging for quantitative lesion analysis for several reasons. It is difficult to design effective segmentation rules because of the heterogeneous appearance of lesions, with significant variability in their location, shape, size and density. In [8], the authors showed that it is very difficult to delineate the subcomponents of brain tumors, such as proliferating cells and the necrotic core, or hemorrhage, contusion and cranial edema in trauma [9]. Manual intervention of a human expert in the delineation task is therefore required to achieve the best segmentation results, which is costly, laborious, time-consuming, impractical in larger studies, and leads to some divergence among observers. The difficulty of this task is that lesions can vary in size and form, at multiple sites, with image intensity profiles that largely overlap with healthy, unaffected parts of the lesions or of the brain outside the area of main interest. Hence, to obtain better segmentation accuracy, the level of expertise of the specialist is an important factor.
Also, it is necessary to consider many image sequences with different contrast levels to be able to determine whether a particular region belongs to a lesion. For this reason, as noted in [7, 10], simple qualitative visual inspection or, at best, coarse measurements such as the number of lesions or their approximate volume are frequently used in clinical routine. To capture and understand the increasing complexity of cerebral pathologies, it is necessary to carry out large studies with multiple subjects, to achieve the statistical power needed to draw conclusions on an entire group of patients. Therefore, developing precise and automated segmentation processes for quantitatively assessing brain lesions has become a major research focus in the field of medical image processing, with the added potential of enabling scalable, objective and reusable approaches.
2 Related Works

Many automatic lesion segmentation methods have been proposed in recent years, and several categories can be identified. In the first category, the lesion segmentation task is treated as an anomaly detection problem. In the earlier work of [11] and the more recent method of [12], discrepancies in tissue appearance between the atlas image and the patient are exploited for lesion detection. However, lesions can be difficult to identify and may go undetected, which can lead to erroneous registration of the brain's gray matter. In [13, 14], the authors both address this problem with new architectures that jointly solve the registration and segmentation tasks. Liu et al. [15] argued that low-rank decomposition, when used in conjunction with brain tissue registration, can also loosely detect abnormally dense tissues but is not precise enough to find smaller micro-lesions. Anomaly detection has also been approached through image synthesis; representative approaches are [16], which uses dictionary learning, and [17], which uses a patch-based approach. The aim is to obtain pseudo-healthy images which, when compared against the patient's own scan, can highlight abnormal areas. Against this background, the authors in [18] introduce a generative image-synthesis model that obtains a segmentation of anomalies probabilistically. An alternative unsupervised saliency-based method, which exploits brain asymmetry in pathological cases, is proposed in [19]. These methods are likely more useful for detecting lesions than for accurately segmenting them; their joint advantage is that they do not require an initial set of manually labelled training data. With respect to supervised segmentation, one of the most effective approaches for brain lesions is based on per-voxel classifiers such as random forests. For example, the work of Geremia et al. [20] on MS lesions used intensity features to measure and describe the appearance of particular regions at the level of each voxel. In [21], the authors combined a generative Gaussian mixture model (GMM) with this approach to determine tissue-specific probabilities. Multiple research publications have adopted this framework, including work on TBI by Rao et al. [22] and on brain tumors by Tustison et al. [23]; these two works include features such as morphology and context to capture the diversity of lesions. In [22], the authors used the results of a multi-atlas label propagation approach to contribute strong priors to the random forests, while [23] exploits a Markov Random Field (MRF) to incorporate spatial regularization.
It is reasonable to assume that these methods, despite their great success, still have limited capabilities; this is confirmed by the results of the latest challenge [24]. A powerful alternative to this kind of supervised learning is deep learning, which has the ability to learn highly discriminative features for the task at hand, with great modeling capacity [25, 26]. Its performance is considerably better than hand-crafted or predefined feature sets. Specifically, in biomedical imaging, convolutional neural networks (CNNs) have shown promising results on various problems, as in [25], and the subsequent CNN-based works [27, 28] were among the top-performing automatic approaches in the BRATS 2015 challenge [8]. These methods are based on two-dimensional networks, which have been widely used in computer-vision applications on natural images; the segmentation of a 3D brain scan is then performed by processing each 2D slice independently, which is a suboptimal use of the volumetric medical image data. Although this is a simple architecture, the results obtained so far by CNNs are promising and clearly show the potential of the technique. A fully 3D CNN operates on three dimensions (width, height and depth) and requires extensive memory and computation, as well as a large number of parameters, compared to 2D CNNs. Previous studies, such as [29] and [30], discuss the limitations of using a 3D CNN on medical imaging data, and multiple works have instead used 2D CNNs on three orthogonal 2D patches [29, 31] to embed 3D contextual information. In their research on brain segmentation [32], the authors extracted a number of large 2D patches at multiple image scales, to avoid the storage requirements of fully 3D networks, and combined them with small 3D patches at a single scale. Indeed, one of the main causes that have prevented the adoption of 3D CNNs is the slowness of inference due to 3D convolutions, which are very costly in terms of computation. As a solution to this problem, [33, 34] report computation times of a few seconds and of about a minute, respectively, for processing a single brain scan using dense inference with 3D CNNs, a technique that significantly reduces inference times compared to 2D/3D hybrid variants [32]. Their results on MS lesion and brain tumor segmentation were very promising despite the limitations of the networks they developed.

The performance of a CNN depends primarily on the strategy used to form the training samples. Some researchers have experimented with learning from a set of patches taken from each image class; however, this biases the classifier towards rare classes and can lead to over-segmentation. To combat this problem, the researchers in [35] propose to train a second CNN on samples whose class distribution is close to the actual distribution, oversampling the pixels misclassified in the first stage. A different second training stage was suggested by Havaei et al. [36], in which the classification layer is re-trained on uniformly extracted image patches. In practice, two-stage training schemes can be problematic, because they can overfit in the first stage and behave poorly on data not seen by the first classifier. Another option is dense training. This method [37] trains a neural network on all the voxels of one image in a single optimization step [33, 38]. This can cause a severe class imbalance, much like uniform sampling; as a solution, the latter two works proposed weighted cost functions. In the first work [38], the authors initially considered the cost of each class to be equal, which has a similar effect to taking equally sized samples from each class, and then adjusted it according to an estimate of how difficult each pixel is to segment. In the other work [33], Brosch et al. adjusted the network sensitivity manually; however, this method becomes difficult to calibrate for multi-class problems.
3 Proposed Approach

In this section, we improve the DeepMedic method [1], a fully automatic dual-pathway 3D convolutional neural network for lesion segmentation in multimodal brain MRI, incorporating residual connections [39] and using a new momentum value of 0.5 instead of 0.6, which gives better results. The first point addressed in the developed solution is the use of parallel convolutional pathways, allowing local and contextual information to be taken into account; this considerably improves the segmentation process by processing the input images at several scales simultaneously. In our study, the brain image is processed along two pathways: the finer details of appearance are captured by the first pathway, in contrast to the high-level features captured by the second. As a second point, we propose an efficient type of training, using dense training [37] on sampled segments to learn the most difficult parts of the dataset, and show how it adapts to class imbalance in segmentation problems. We also use larger training batches, which are preferable as they provide a more accurate representation of the input dataset and help improve the estimation of the true gradients. However, as the batch size increases, the memory requirements and computation times also increase; this is especially relevant for 3D CNNs, where GPU computing capabilities are required to achieve good performance. To address this issue, we designed a training strategy that applies the dense-inference technique to smaller image segments. Finally, we exploit a previous design approach, small kernels [40], to improve the performance of this model in 3D and to investigate how to better build deeper networks. We also use the recently introduced batch-normalization (BN) technique in our hidden layers [41], which helps to better maintain the signal in the activations throughout optimization (Fig. 1).
Fig. 1. The improved DeepMedic system [1] using residual connections. In each layer block, the operations are applied in the order: batch normalization [41], nonlinearity, then convolution. In [42], the authors showed empirically that this format gives better performance. The Up and C layers represent an upsampling and a classification layer, respectively. The number of filters and their size are given as (number × size). Further hyper-parameters as in [1].
4 Experimental Results

Our new solution is evaluated on challenging clinical brain-lesion data; we present a quantitative study and a comparative analysis with the state of the art. The dataset we used is BRATS 2015, which includes 220 multi-modality scans of patients with high-grade glioma (HGG) and 54 with low-grade glioma (LGG). The scans contain pre- and post-operative acquisitions, with T1-weighted, contrast-enhanced T1c, T2-weighted and FLAIR sequences. All images were registered in a common space, resampled to an isotropic resolution of 1 mm × 1 mm × 1 mm with an image size of 240 × 240 × 155, and skull-stripped by the organizers. The provided annotations consist of four labels: 1) necrotic core (NC), 2) edema (OE), 3) non-enhancing (NE), and 4) enhancing core (EC). The annotations of the training dataset were obtained semi-automatically, by merging predictions from several automatic algorithms, followed by expert review. For the formal evaluation, the labels were merged into three sets: whole tumor (all four labels), core (1, 3, 4) and enhanced tumor (4). Our experiments were performed on the Google Colab GPU. The details of the training setup are as follows. We exploit standard residual connections in layers 4, 6 and 8: the input of layer 3 is added to the output of layer 4, the input of layer 5 to the output of layer 6, and the input of layer 7 to the output of layer 8. In addition, we use the PReLU activation function, the learning rate is kept constant at 0.001, the RMSProp optimizer is used, and the momentum value is set to 0.5. The model was trained for 35 epochs with a batch size of 10. The results are reported in the table below.

Table 1. Results obtained by our solution on the BRATS 2015 training data, compared to other approaches visible at the time of manuscript submission. Only teams that submitted more than half of the 274 cases are shown.
Method          DSC               Precision         Sensitivity       Cases
                Whole Core  Enh   Whole Core  Enh   Whole Core  Enh
Ensemble+CRF    90.1  75.4  72.8  91.9  85.7  75.5  89.1  71.7  74.4  274
Ensemble        90.0  75.5  72.8  90.3  85.5  75.4  90.4  71.9  74.3  274
DeepMedic       89.7  75.0  72.0  89.7  84.2  75.6  90.5  72.3  72.5  274
DeepMedic+CRF   89.8  75.0  72.1  91.5  84.4  75.9  89.1  72.1  72.5  274
bakas1          88    77    68    90    84    68    89    76    75    186
peres1          87    73    68    89    74    72    86    77    70    274
anon1           84    67    55    90    76    59    82    68    61    274
thirs1          80    66    58    84    71    53    79    66    74    267
peyrj           80    60    57    87    79    59    77    53    60    274
Ours            92.8  91.3  95.0  93.0  91.6  94.7  92.8  91.3  95.4  274
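For reference, the DSC reported in Table 1 is the Dice similarity coefficient. A minimal implementation for flat binary masks (0/1 lists), shown here only to make the reported score concrete:

```python
# Dice similarity coefficient for two binary masks given as flat 0/1 lists.
def dice(pred, truth):
    inter = sum(p * t for p, t in zip(pred, truth))   # overlap
    total = sum(pred) + sum(truth)                    # mask sizes
    return 2.0 * inter / total if total else 1.0      # empty masks agree

# Identical masks score 1.0; disjoint masks score 0.0.
assert dice([1, 1, 0], [1, 1, 0]) == 1.0
assert dice([1, 0, 0], [0, 1, 1]) == 0.0
```

The per-class scores in Table 1 correspond to computing this over the voxels of each merged label set (whole, core, enhanced).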
We note that, with the new momentum value of 0.5, we obtain better results on the publicly available BRATS 2015 data. The results in Table 1 show that our new solution compares very favourably with the others, which supports its adoption in a variety of research and clinical settings.
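The update rule behind the training setup (RMSProp with momentum 0.5, learning rate 0.001, as in Sect. 4) can be sketched as follows. This toy example runs the update on a one-dimensional quadratic, not on the actual network; the decay rate rho is an assumption.

```python
# RMSProp-with-momentum update, applied to f(w) = w^2 for illustration.
def rmsprop_momentum_step(w, grad, state, lr=0.001, rho=0.9, mom=0.5, eps=1e-8):
    ms, vel = state
    ms = rho * ms + (1 - rho) * grad * grad            # running mean of squared grads
    vel = mom * vel - lr * grad / (ms ** 0.5 + eps)    # momentum-smoothed step
    return w + vel, (ms, vel)

w, state = 5.0, (0.0, 0.0)
for _ in range(200):
    grad = 2.0 * w                 # gradient of w^2
    w, state = rmsprop_momentum_step(w, grad, state)
# w has moved from 5.0 toward the minimum at 0
```

Raising the momentum from 0.5 toward 0.6 makes each step retain more of the previous velocity; the paper's finding is that 0.5 trains better for this model.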
5 Conclusion

In this work, a medical image segmentation method was designed based on a 3D CNN, with the main objective of improving several aspects of brain lesion segmentation, especially accuracy. Our implementation extended the DeepMedic model [1] with residual connections. We relied heavily on the standard evaluation criteria [43–47] for the overall assessment and did not experiment with various types of losses that might have been better suited to our model. We can use more images or other datasets to improve our method. We have also shown in this paper that a momentum value of 0.5 instead of 0.6 leads to a better result. We plan in future work to apply our approach to other datasets for remote monitoring [48].
An Improved Model for Semantic Segmentation of Brain Lesions
187
References

1. Kamnitsas, K., et al.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2016)
2. Ding, K., et al.: Cerebral atrophy after traumatic white matter injury: correlation with acute neuroimaging and outcome. J. Neurotrauma 25(12), 1433–1440 (2008)
3. Moen, K.G., et al.: A longitudinal MRI study of traumatic axonal injury in patients with moderate and severe traumatic brain injury. J. Neurol. Neurosurg. Psychiatry 83, 1193–1200 (2012)
4. Maas, A.I., et al.: Collaborative European neurotrauma effectiveness research in traumatic brain injury (CENTER-TBI): a prospective longitudinal observational study. Neurosurgery 76(1), 67–80 (2015)
5. Warner, M.A.: Assessing spatial relationships between axonal integrity, regional brain volumes, and neuropsychological outcomes after traumatic axonal injury. J. Neurotrauma 27(12), 2121–2130 (2010)
6. Carey, L.M., et al.: Beyond the lesion: neuroimaging foundations for post-stroke recovery. Future Neurol. 8(5), 507–527 (2013)
7. Wen, P.Y., et al.: Updated response assessment criteria for high-grade gliomas: response assessment in neuro-oncology working group. J. Clin. Oncol. 28(11), 1963–1972 (2010)
8. Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015)
9. Irimia, A., et al.: Neuroimaging of structural pathology and connectomics in traumatic brain injury: toward personalized outcome prediction. NeuroImage: Clin. 1(1), 1–17 (2012)
10. Yuh, E.L., Cooper, S.R., Ferguson, A.R., Manley, G.T.: Quantitative CT improves outcome prediction in acute traumatic brain injury. J. Neurotrauma 29(5), 735–746 (2012)
11. Prastawa, M., Bullitt, E., Ho, S., Gerig, G.: A brain tumor segmentation framework based on outlier detection. Med. Image Anal. 8(3), 275–283 (2004)
12. Doyle, S., Vasseur, F., Dojat, M., Forbes, F.: Fully automatic brain tumour segmentation from multiple MR sequences using hidden Markov fields and variational EM. In: Proc. NCI-MICCAI BRATS, pp. 18–22 (2013)
13. Gooya, A., Pohl, K.M., Bilello, M., Biros, G., Davatzikos, C.: Joint segmentation and deformable registration of brain scans guided by a tumor growth model. In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6892. Springer, Berlin, Heidelberg (2011)
14. Parisot, S., Duffau, H., Chemouny, S., Paragios, N.: Joint tumor segmentation and dense deformable registration of brain MR images. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol. 7511, pp. 651–658. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33418-4_80
15. Liu, X., Niethammer, M., Kwitt, R., McCormick, M., Aylward, S.: Low-rank to the rescue – atlas-based analyses in the presence of pathologies. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8675, pp. 97–104. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10443-0_13
16. Weiss, N., Rueckert, D., Rao, A.: Multiple sclerosis lesion segmentation using dictionary learning and sparse coding. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol. 8149, pp. 735–742. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40811-3_92
17. Ye, D.H., Zikic, D., Glocker, B., Criminisi, A., Konukoglu, E.: Modality propagation: coherent synthesis of subject-specific scans with data-driven regularization. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol. 8149, pp. 606–613. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40811-3_76
188
A. Guennich et al.
18. Cardoso, M.J., Sudre, C.H., Modat, M., Ourselin, S.: Template-based multimodal joint generative model of brain data. In: Ourselin, S., Alexander, D., Westin, C.F., Cardoso, M. (eds.) IPMI 2015. LNCS, vol. 9123. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19992-4_2
19. Erihov, M., Alpert, S., Kisilev, P., Hashoul, S.: A cross saliency approach to asymmetry-based tumor detection. In: Navab, N., Hornegger, J., Wells, W., Frangi, A. (eds.) MICCAI 2015. LNCS, vol. 9351. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_7
20. Geremia, E., Menze, B.H., Clatz, O., Konukoglu, E., Criminisi, A., Ayache, N.: Spatial decision forests for MS lesion segmentation in multi-channel MR images. In: Jiang, T., Navab, N., Pluim, J.P.W., Viergever, M.A. (eds.) MICCAI 2010. LNCS, vol. 6361, pp. 111–118. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15705-9_14
21. Zikic, D., et al.: Decision forests for tissue-specific segmentation of high-grade gliomas in multi-channel MR. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol. 7512, pp. 369–376. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33454-2_46
22. Rao, A., Ledig, C., Newcombe, V., Menon, D., Rueckert, D.: Contusion segmentation from subjects with traumatic brain injury: a random forest framework. In: 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), pp. 333–336. IEEE (2014)
23. Tustison, N., Wintermark, M., Durst, C., Brian, A.: ANTs and Árboles. In: Proc. of BRATS-MICCAI (2013)
24. http://braintumorsegmentation.org/, www.isles-challenge.org
25. Fourati, J., Othmani, M., Ltifi, H.: A hybrid model based on convolutional neural networks and long short-term memory for rest tremor classification. In: ICAART, vol. 3, pp. 75–82 (2022)
26. Salah, K.B., Othmani, M., Kherallah, M.: A novel approach for human skin detection using convolutional neural network. Vis. Comput. 38(5), 1833–1843 (2021). https://doi.org/10.1007/s00371-021-02108-3
27. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Deep convolutional neural networks for the segmentation of gliomas in multi-sequence MRI. In: Crimi, A., Menze, B., Maier, O., Reyes, M., Handels, H. (eds.) BrainLes 2015. LNCS, vol. 9556. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30858-6_12
28. Havaei, M., et al.: Brain tumour segmentation with deep neural networks. arXiv preprint arXiv:1505.03540 (2015)
29. Roth, H.R., et al.: A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8673, pp. 520–527. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10404-1_65
30. Li, R., et al.: Deep learning based imaging data completion for improved brain disease diagnosis. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8675, pp. 305–312. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10443-0_39
31. Lyksborg, M., Puonti, O., Agn, M., Larsen, R.: An ensemble of 2D convolutional neural networks for tumor segmentation. In: Paulsen, R., Pedersen, K. (eds.) SCIA 2015. LNCS, vol. 9127 (2015). https://doi.org/10.1007/978-3-319-19665-7_17
32. Brebisson, A., Montana, G.: Deep neural networks for anatomical brain segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 20–28 (2015)
33. Brosch, T., Yoo, Y., Tang, L.Y.W., Li, D.K.B., Traboulsee, A., Tam, R.: Deep convolutional encoder networks for multiple sclerosis lesion segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 3–11. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_1
34. Urban, G., Bendszus, M., Hamprecht, F., Kleesiek, J.: Multi-modal brain tumor segmentation using deep convolutional neural networks. In: Proc. of BRATS-MICCAI (2014)
35. Cireşan, D.C., Giusti, A., Gambardella, L.M., Schmidhuber, J.: Mitosis detection in breast cancer histology images with deep neural networks. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol. 8150, pp. 411–418. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40763-5_51
36. Havaei, M., et al.: Brain tumour segmentation with deep neural networks. arXiv preprint (2015)
37. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
38. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
39. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
40. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
41. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
42. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. arXiv preprint arXiv:1603.05027 (2016)
43. Ltifi, H., Benmohamed, E., Kolski, C., Ben Ayed, M.: Adapted visual analytics process for intelligent decision-making: application in a medical context. Int. J. Inf. Technol. Decis. Mak. 19(01), 241–282 (2020)
44. Ltifi, H., Ben Ayed, M., Kolski, C., Alimi, A.M.: HCI-enriched approach for DSS development: the UP/U approach. In: 2009 IEEE Symposium on Computers and Communications, pp. 895–900. IEEE (2009)
45. Ltifi, H., Ben Ayed, M., Trabelsi, G., Alimi, A.M.: Using perspective wall to visualize medical data in the Intensive Care Unit. In: 2012 IEEE 12th International Conference on Data Mining Workshops, pp. 72–78. IEEE (2012)
46. Jemmaa, A.B., Ltifi, H., Ayed, M.B.: Multi-agent architecture for visual intelligent remote healthcare monitoring system. In: Abraham, A., Han, S., Al-Sharhan, S., Liu, H. (eds.) HIS 2016. AISC, vol. 420. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-27221-4_18
47. Ellouzi, H., Ltifi, H., Ben Ayed, M.: New multi-agent architecture of visual intelligent decision support systems application in the medical field. In: 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), pp. 1–8. IEEE (2015)
48. Benjemmaa, A., Ltifi, H., Ayed, M.B.: Design of remote heart monitoring system for cardiac patients. In: Barolli, L., Takizawa, M., Xhafa, F., Enokido, T. (eds.) AINA 2019. AISC, vol. 926, pp. 963–976. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-15032-7_81
Experimental Analysis on Dissimilarity Metrics and Sudden Concept Drift Detection

Sebastián Basterrech1(B), Jan Platoš1, Gerardo Rubino2, and Michal Woźniak3

1 VŠB – Technical University of Ostrava, Ostrava, Czech Republic
{Sebastian.Basterrech,Jan.Platos}@vsb.cz
2 INRIA Rennes – Bretagne Atlantique, Rennes, France
[email protected]
3 Wroclaw University of Science and Technology, Wroclaw, Poland
[email protected]
Abstract. Learning from non-stationary data presents several new challenges. Among them, a significant problem comes from sudden changes in the incoming data distributions, the so-called concept drift. Several concept drift detection methods exist, generally based on distances between distributions that are either arbitrarily selected or context-dependent. This paper presents a straightforward approach for detecting concept drift based on a weighted dissimilarity metric over posterior probabilities. We also evaluate the performance of three well-known dissimilarity metrics when used by the proposed approach. Experimental evaluation has been done over ten datasets with injected sudden drifts in a binary classification context. Our results first suggest choosing the Kullback-Leibler divergence, and second, they show that our drift detection procedure based on dissimilarity measures is quite efficient.
1 Introduction
In many real-world problems, a data stream may suddenly change its distribution. This phenomenon is commonly called concept drift. One of the most commonly used ways of reacting to the occurrence of this phenomenon is to detect it with so-called drift detectors. A drift detector signals that the change in the data distribution is significant and requires reconstruction or an upgrade of the model in use [1]. So far, a number of methods have been proposed for constructing drift detectors. However, most of them require either access to labels or access to prediction metrics of the prediction model in use to make a decision [2]. Other concept drift detectors are based on distances over the underlying data distributions [3–6]. A recent experimental framework for drift detection evaluation can be found in [7]. To assess a concept drift detector's performance, among the metrics measuring how different two distributions are, some usually considered ones are the number of true positive drift detections, the number of false alarms, the drift detection

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 190–199, 2023. https://doi.org/10.1007/978-3-031-35501-1_19
delay, the confusion matrix, and so on. One difficulty here is that there is typically a cost-benefit trade-off to find between the different metrics [8]. These metrics are most often arbitrarily selected, or they are selected according to the characteristics of the data and the specific problem at hand. Nevertheless, the choice of metric for identifying changes in the probability distributions is a crucial decision, and it is the one addressed in this article. The contributions of this brief paper are two-fold. (i) First, we specify a universal drift detector method in a supervised context without making any assumptions about the data, and we explore the impact of the two main parameters of the proposed technique. (ii) Second, we empirically analyze the performance of our drift detector using three different and important dissimilarity metrics: the KL-divergence, the Hellinger distance and the Wasserstein distance. The selection of these metrics is based on the fact that they are nowadays often used in the learning area [5,9,10]. Our experimental results over ten simulated datasets with injected drifts show remarkable differences between the Hellinger distance and the other two evaluated metrics. In addition, there also seems to be a difference between the KL-divergence and the Wasserstein distance, which leads us to provide insights about the advantages of relative entropy in cases of sudden drifts in binary multidimensional data. This paper also briefly describes the studied dissimilarity metrics. Then, we present the drift detector method and our general methodology. We report the results in Sect. 4. Finally, we conclude with some discussion on further studies.
2 Background

2.1 Drift Detection Problem
Streaming data processing is usually related to problems where data comes in regular data chunks (blocks). Because we focus on the supervised context, we receive a long sequence of (input, output) values organized in chunks of common size K. Consider a system producing the output y ∈ Y when the input is u ∈ U. Formally, we receive a time series C1, C2, ..., where Ci is the chunk i, composed of the K-length sequence z_i^(1), ..., z_i^(K), with z_i^(k) = (u_i^(k), y_i^(k)) (that is, z_i^(k) is the (input, output) pair of the kth element in the ith chunk). We see the elements z_i^(1), ..., z_i^(K) as a sample of size K of a random variable z over a discrete set U × Y. The concept drift idea refers to the phenomenon that the probability distribution of z changes over time, i.e. there exists a point t such that the underlying distribution of {..., z_{t−2}, z_{t−1}, z_t} is different from the distribution of {z_{t+Δ}, z_{t+Δ+1}, ...}. We refer to sudden (also known as abrupt) drift when Δ = 1 [3,11]. Observe that the change in the joint distribution can be provoked either by a change in the posterior distribution Pr(y | u) (referred to as real concept drift) or by a change in the independent variables collected in u (referred to as virtual concept drift) [11]. In this contribution, we focus only on abrupt real drifts.
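As a toy illustration of this setting (not the data used in the paper, which comes from the stream-learn library), a sudden real drift with Δ = 1 can be simulated by flipping the posterior Pr(y | u) at a fixed chunk index:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_stream(n_chunks=100, chunk_size=250, n_features=3, drift_at=50):
    """Simulate a binary-classification stream of (input, output) chunks with
    one sudden (abrupt) real concept drift: Pr(y | u) flips at chunk `drift_at`."""
    chunks = []
    for i in range(n_chunks):
        u = rng.uniform(-1, 1, size=(chunk_size, n_features))
        y = (u[:, 0] > 0).astype(int)  # concept before the drift: y = sign(u0)
        if i >= drift_at:
            y = 1 - y                  # abrupt change of the posterior
        chunks.append((u, y))
    return chunks

chunks = make_stream()
print(len(chunks), chunks[0][0].shape)  # 100 (250, 3)
```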
2.2 Re-visiting the Concepts on Dissimilarities
Let us consider discrete probability distributions, our case of interest. The context is the following: we have two discrete probability distributions (two probability mass functions, pmfs) p and q, defined on some common space S, and we want to measure how different they are. We review three different ways proposed for this purpose in computer science applications, primarily used in data mining.

Kullback-Leibler divergence. The Kullback-Leibler (KL) divergence (also abusively called distance) between p and q (better said, from p to q) is [9]

    KL(p ‖ q) = Σ_{s∈S} p(s) log( p(s) / q(s) ).   (1)
Observe that this is not strictly a distance, unlike the other dissimilarities analyzed here. It is positive, and its value is zero if and only if p and q are identical. In information theory, we know that KL(p ‖ q) is the quantity of information lost when we use q instead of p, or as an approximation of p. The KL divergence does not satisfy the triangular inequality, in general. Several variations of the canonical KL divergence have been introduced in the literature to obtain the symmetry property. Also, observe that the sum defining this divergence must be taken over all values s where p and q are not zero. This leads to some technical issues relevant to our work; in the case of zero values, a correction is proposed, see [4,9]. Despite the mentioned inconveniences, the KL divergence also has several advantages: there exists a relationship with the expected value of the likelihood ratio, several hypothesis tests are equivalent to the KL divergence, for some specific distributions the KL divergence can be computed very fast, it satisfies Pinsker's inequality, and so on. For more details, please see [4,9,12].

Hellinger Distance. The definition is as follows [13]:

    H(p, q) = ( Σ_{s∈S} ( √p(s) − √q(s) )² )^{1/2}.   (2)
This is a distance, so it is equal to zero if and only if both distributions are the same [5]. A particularly interesting property of the Hellinger distance is that it is bounded: the H(p, q) values lie in [0, √2].

Wasserstein Distance. Let Γ = Γ(p, q) stand for the set of pmfs on S² having p and q as marginals. Then, given some real ν ≥ 1, the ν-Wasserstein distance Wν(p, q) between the two distributions is

    Wν(p, q) = inf_{f∈Γ} ( E_f [ dist(X, Y)^ν ] )^{1/ν},
where dist(.,.) denotes the Euclidean distance and (X, Y ) is a pair of random variables having distribution f ∈ Γ . The implementation of this distance has technical issues [10], and the usual approach is to get approximations of the theoretical value. This is provided by available packages, like the one used in this paper (see below).
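For concreteness, the three dissimilarities can be computed for discrete pmfs as in the following sketch. The epsilon correction for zero entries in the KL divergence and the use of scipy's 1-Wasserstein helper are our own choices, not prescribed by the text:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def kl_divergence(p, q, eps=1e-12):
    """KL(p||q) for discrete pmfs, with a small-epsilon correction
    so that zero entries do not produce log(0) or division by zero."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def hellinger(p, q):
    """Hellinger distance; a true metric, bounded in [0, sqrt(2)]."""
    return float(np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
support = np.arange(len(p))  # common support points of both pmfs

print(kl_divergence(p, q) > 0, hellinger(p, q) <= np.sqrt(2))  # True True
# 1-Wasserstein distance between the two pmfs on the same support:
print(wasserstein_distance(support, support, u_weights=p, v_weights=q))
```

Note that `scipy.stats.wasserstein_distance` computes the ν = 1 case on a one-dimensional support, which matches the library the authors report using.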
3 Methodology
Computation of Dissimilarity Scores. For computing the dissimilarities between two distributions, first we need to build a descriptor of the distribution of the data [3]. Here, we use the standard estimator as a descriptor, based on the binning strategy. Following the previous notation, we receive in the ith chunk a K-length sequence z_i^(1), ..., z_i^(K), with z_i^(k) = (u_i^(k), y_i^(k)). Then, for the latter, that is, for the output values of the system, the number Pr(y_i = ℓ), for any ℓ ∈ Y, is naturally estimated by its standard estimator

    P̂r(y_i = ℓ) = (1/K) Σ_{k=1}^{K} 1(y_i^(k) = ℓ),

where we denote by 1(·) the indicator function. For an input u to the system, we apply the binning strategy, decomposing the input space U into J disjoint "bins" b^(1), b^(2), ..., b^(J). For the chunk Ci, the conditional probability of having class ℓ in a specific bin b_i^j is estimated by

    P̂r( y_i = ℓ | {u_i : u_i^k ∈ b_i^j, ∀k} ) = Σ_{k=1}^{K} 1( (u_i^k ∈ b_i^j) ∩ (y_i^k = ℓ) ) / Σ_{k=1}^{K} 1( u_i^k ∈ b_i^j ).   (3)

Then, by applying expression (3) for each output class we may compute the probability mass function P̂r( y_i | {u_i : u_i^k ∈ b_i^j, ∀k} ). Hence, each bin has an associated pmf, and we can evaluate a change between any two chunks Ci and Ct as follows:

    d_{i,t}^j = φ( P̂r( y_i | {u_i : u_i^k ∈ b_i^j, ∀k} ), P̂r( y_t | {u_t : u_t^k ∈ b_t^j, ∀k} ) ),   (4)

where φ(·) is any selected function for estimating the distribution dissimilarity. Next, we aggregate the estimated dissimilarities to cover the whole input space (all the bins):

    Φ(Ci, Ct) = (1/J) Σ_{j=1}^{J} d_{i,t}^j.   (5)

Finally, we modify the previous aggregation using a weighted sum, taking into account the chance of sampling in a specific bin:

    Φ(Ci, Ct) = (1/J) Σ_{j=1}^{J} γ_i^j d_{i,t}^j,   (6)

where the weight γ_i^j is the estimated probability of sampling in the specific region b_i^j of the reference chunk Ci:

    γ_i^j = (1/K) Σ_{k=1}^{K} 1(u_i^k ∈ b_i^j).
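A minimal sketch of expressions (3) and (6), assuming for simplicity a one-dimensional input space with equal-width bins (the bin layout, the chunk generation and the helper names are our illustrative choices):

```python
import numpy as np

def binned_posteriors(u, y, n_bins, classes=(0, 1)):
    """Per-bin estimates of Pr(y = l | u in b_j) (expression (3)) and the bin
    weights gamma_j = fraction of the chunk falling into each bin."""
    edges = np.linspace(-1, 1, n_bins + 1)
    bin_idx = np.clip(np.digitize(u, edges) - 1, 0, n_bins - 1)
    pmfs = np.zeros((n_bins, len(classes)))
    gamma = np.zeros(n_bins)
    for j in range(n_bins):
        mask = bin_idx == j
        gamma[j] = mask.mean()
        if mask.any():
            for c, l in enumerate(classes):
                pmfs[j, c] = np.mean(y[mask] == l)
    return pmfs, gamma

def weighted_score(pmfs_ref, pmfs_new, gamma_ref, phi):
    """Expression (6): weighted average of per-bin dissimilarities d_j."""
    d = np.array([phi(p, q) for p, q in zip(pmfs_ref, pmfs_new)])
    return float(np.mean(gamma_ref * d))

rng = np.random.default_rng(1)
u1, u2 = rng.uniform(-1, 1, 250), rng.uniform(-1, 1, 250)
y1 = (u1 > 0).astype(int)
y2 = 1 - (u2 > 0).astype(int)  # flipped concept (a sudden real drift)
hell = lambda p, q: np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
p1, g1 = binned_posteriors(u1, y1, n_bins=5)
p2, _ = binned_posteriors(u2, y2, n_bins=5)
print(weighted_score(p1, p2, g1, hell) > 0.1)  # True: the flipped posteriors are far apart
```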
Decision Rule Using a Variance-Based Threshold. Now, let us consider windows of chunks W1, W2, ..., where Wi = (Ci, Ci+1, ..., Ci+N−1). We employ the previous approach again to look for changes in the data distributions, but seeing a block of chunks as a sliding window on the series of chunks, with KN instances. We proceed as before, except that instead of comparing two windows starting at chunks Ci and Ci+1, we shift the blocks by N individual chunks; that is, we compare the window starting at chunk i with the one starting at chunk i + N. For every N individual chunks it is possible to compute a new dissimilarity score by collecting the chunks in batches (windows) and computing a dissimilarity score Φ(Wj, Wj+1) by applying expression (6). Therefore, a sequence of dissimilarity scores is generated: Φ(W1, W2), Φ(W2, W3), ..., Φ(Wj, Wj+1). To make an automatic decision, it is necessary to define a procedure for identifying the locations where critical points occur. Let mj be the mean of the dissimilarity scores up to the last processed window Wj, and σj the standard deviation of this sequence. Given a new dissimilarity score value Φ(Wj, Wj+1), we decide that a drift occurs when

    Φ(Wj, Wj+1) ∉ [mj − ασj, mj + ασj],   (7)
where α is a threshold parameter. This specific decision rule is inspired by techniques for artifact and outlier detection [14,15]. The window length and the α value are the main parameters of the method. A larger α may increase the chance of false negatives; when α is too small, the chance of false positives increases. Here, we analyze only scenarios where the windows are disjoint and the α values are static (we do not modify them according to changes in the data). Another parameter that has an impact on the results is the number of bins: it affects the pmf estimation, and a large number also increases the computational costs. After a preliminary evaluation, we decided to present results using J = 5 × dim(U) homogeneous bins, where dim(U) denotes the dimensionality of the input space.

Methodological Approach Overview. The concept drift detector method analyzed here is summarized in the high-level workflow presented in Fig. 1. It has the following main steps: (i) Homogeneous partition of the input space. We decompose the input space into disjoint bins using parameterized range constraints. The search for the best splitting hyper-planes in U is out of the scope of this paper; here, we decided to create homogeneous partitions following the standard binning strategy. (ii) Posterior probability estimation. The probability mass function is estimated by applying expression (3). Note that the conditional distribution is estimated for each partition of the input space. (iii) Dissimilarity metric aggregation. In this step, we apply a weighted dissimilarity (expression (6)) to compute an aggregated score among the values computed in each partition. (iv) Decision rule. Given a new batch of data, we identify whether a drift occurred or not using expression (7).
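The decision rule (7) can be sketched as follows; the running mean/standard deviation bookkeeping and the warm-up length are our illustrative choices:

```python
import numpy as np

def detect_drifts(scores, alpha=3.0, warmup=5):
    """Flag a window as a drift when its dissimilarity score leaves
    [m - alpha*sigma, m + alpha*sigma], where m and sigma are the running
    mean and standard deviation of the scores seen so far (expression (7))."""
    drifts, history = [], []
    for j, s in enumerate(scores):
        if len(history) >= warmup:  # need a few scores before m, sigma are usable
            m, sigma = np.mean(history), np.std(history)
            if abs(s - m) > alpha * sigma:
                drifts.append(j)
        history.append(s)
    return drifts

# Flat scores around 0.05 with one sudden jump at index 10.
scores = [0.05, 0.06, 0.05, 0.04, 0.05, 0.06, 0.05, 0.05, 0.04, 0.05, 0.90]
print(detect_drifts(scores))  # [10]
```

Here every score is appended to the history, including flagged ones; after a real drift one would typically reset the history so that the new concept defines a fresh baseline.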
Fig. 1. High-level flowchart of the investigated algorithm.
4 Experimental Study
In the previous section, we provided a framework for explicitly monitoring the data stream and detecting whether a drift occurs. We hypothesize that the proposed measures can be used as the basis for a decision about drift. We designed experiments to compare the performance of the three dissimilarity metrics mentioned earlier. We used simulated data streams where the drift appearances are marked. In this ongoing work, we study only binary datasets with injected sudden concept drifts. The analyzed window lengths are {250, 500, 1000, 2000, 4000}, and we studied α values over a large domain (the specific range depended on the metric).

Benchmark Data Streams. We employed 10 datasets in our performance evaluation studies. We generated 5 binary datasets with 3 features and 5 datasets with 5 features, all of them created using the stream-learn library [16]. Each data stream has 10000 chunks with 250 instances and 20 induced sudden concept drifts. The stream-learn library is useful for generating a wide range of datasets with injected drifts; it has the additional advantage of providing the time-stamps where the drifts were injected. The stream-learn simulator has a parameter that determines how sudden the change of drift concept is; we used the maximum allowed value for this parameter. More details about the simulation of data streams with sudden concept drifts are given in [16].

Performance Evaluation. We chose the standard metrics: sensitivity, precision, balanced accuracy score (BAC) and F1-score [7,8].

Results. According to our empirical results, we do not observe notable differences between the results over data with 3 and 5 features. However, we obviously cannot affirm similar behavior in larger input space dimensions. Results obtained by the KL-divergence, the Hellinger distance, and the Wasserstein distance are presented in Figs. 2, 3, and 4, respectively. Each of these figures
has two graphics: the left graphic presents the specificity according to the window length, and the right one shows the precision according to the window length. We present the results of the specificity metric over the datasets with 5 features and the precision obtained over the data with 3 features. Each graphic has several curves resulting from different experiments over five datasets. A common behavior in the figures is that the window length is a relevant parameter, which is intuitive because it directly affects the distribution estimation. Another characteristic is that the specificity decreases when the window length is large. On the other hand, the precision is also impacted by the window length, but it seems more stable in the case of the KL and Hellinger metrics than in the case of the Wasserstein distance. Let us note that, from the previously described figures, the Hellinger distance seems less competitive than the other two metrics. For illustrative reasons, the α threshold used for creating the mentioned curves was empirically tuned to obtain 20 drifts during the whole stream. Figures 5 and 6 show results over different threshold values α. From Fig. 2 and Fig. 4 we see a minor difference between the KL and Wasserstein dissimilarities. Hence, we also present a specific comparison between the KL-divergence and the Wasserstein distance for different α thresholds using BAC and F1-score values. We fixed the window length to 500 instances (a value for which both metrics perform "pretty well" according to Fig. 2 and Fig. 4). Figure 5 presents two graphics with BAC results, and Fig. 6 has two graphics with a comparison between KL and Wasserstein using F1-scores. We observe a slight difference between both metrics in the obtained results, indicating that the KL-divergence has a better global performance. It also seems that KL is more robust, i.e., it is less sensitive to the window length and the α value. In addition, the KL-divergence is faster to compute than the Wasserstein distance.
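The evaluation metrics above can be computed by treating drift detection as a binary classification over window indices. The tolerance-based matching in this sketch is a hypothetical simplification of the evaluation protocol, not the exact procedure of [7,8]:

```python
import numpy as np

def detection_metrics(true_drifts, detected, n_windows, tol=1):
    """Sensitivity, precision, BAC and F1 for a drift detector: a detection
    within `tol` windows of a true drift counts as a true positive
    (hypothetical evaluation helper)."""
    true_set, det_set = set(true_drifts), set(detected)
    tp = sum(any(abs(d - t) <= tol for d in det_set) for t in true_set)
    fn = len(true_set) - tp
    fp = sum(all(abs(d - t) > tol for t in true_set) for d in det_set)
    tn = n_windows - tp - fn - fp
    sens = tp / (tp + fn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    spec = tn / (tn + fp)
    bac = (sens + spec) / 2
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return sens, prec, bac, f1

# Two true drifts; the detector fires near both, plus one false alarm.
sens, prec, bac, f1 = detection_metrics([10, 30], [11, 30, 45], n_windows=50)
print(round(sens, 2), round(prec, 2), round(f1, 2))  # 1.0 0.67 0.8
```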
Experimental Protocol and Implementation. We used Python v3.9 and the libraries numpy v1.19.5, stream-learn v0.8.16 and scipy.stats v1.5.4. We used the scipy.stats.wasserstein_distance function for computing the Wasserstein distance; the KL-divergence and the Hellinger distance were implemented by us based on numpy functions.
Fig. 2. KL-divergence: specificity and precision according to window length.
Fig. 3. Hellinger distance: specificity and precision according to window length.
Fig. 4. Wasserstein: specificity and precision according to window length.
Fig. 5. Comparison using BAC between KL-divergence and Wasserstein distance.
Fig. 6. Comparison using F1-score between KL-divergence and Wasserstein distance.
5 Conclusions and Future Work
We presented a drift detection method based on the evaluation of changes over the empirical statistical distributions of the data. The method does not require any assumptions about the data. We showed its performance using three well-known dissimilarity metrics over binary data with sudden drifts, and compared the behavior of each of the metrics. It is interesting to note that the KL divergence obtains better results globally, and with it, our proposed detector achieves good performance. Further work needs to be done exploring real data to analyze statistical differences between the results, as well as other types of concept drifts. Note that the number of bins grows exponentially with the number of dimensions; the binning strategy therefore has well-known limitations in high-dimensional data. For this reason, we also plan to explore other data descriptors.

Acknowledgements. This work was supported by the CEUS-UNISONO programme, which has received funding from the National Science Centre, Poland under grant agreement No. 2020/02/Y/ST6/00037, and by the GACR-Czech Science Foundation project No. 21-33574K "Lifelong Machine Learning on Data Streams". It was also supported by the ClimateDL project (code 22-CLIMAT-02) belonging to the Climate AmSud programme, whose central problem is forecasting extreme temperatures in future periods such as the following summer.
References

1. Sobolewski, P., Woźniak, M.: Concept drift detection and model selection with simulated recurrence and ensembles of statistical detectors. J. Univers. Comput. Sci. 19(4), 462–483 (2013)
2. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)
3. Hinder, F., Vaquet, V., Hammer, B.: Suitability of different metric choices for concept drift detection. ArXiv (2022)
4. Basterrech, S., Woźniak, M.: Tracking changes using Kullback-Leibler divergence for the continual learning. In: IEEE SMC 2022. ArXiv (2022)
5. Ditzler, G., Polikar, R.: Hellinger distance based drift detection for nonstationary environments. In: IEEE Symposium on Computational Intelligence in Dynamic and Uncertain Environments (CIDUE), pp. 41–48 (2011)
6. Brzezinski, D., Stefanowski, J.: Reacting to different types of concept drift: the accuracy updated ensemble algorithm. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 81–94 (2014)
7. Gonzalvez, P.M., Santos, S.D.C., Barros, R., Vieira, D.: A comparative study on concept drift detectors. Expert Syst. Appl. 41(18), 8144–8156 (2014)
8. Gustafsson, F.: Adaptive Filtering and Change Detection. Wiley (2000)
9. Dasu, T., Krishnan, S., Venkatasubramanian, S.: An information-theoretic approach to detecting changes in multidimensional data streams. Interfaces, 1–24 (2006)
10. Faber, K., Corizzo, R., Sniezynski, B., Baron, M., Japkowicz, N.: WATCH: Wasserstein change point detection for high-dimensional time series data. In: 2021 IEEE Int. Conf. on Big Data (Big Data), pp. 4450–4459 (2021)
11. Goldenberg, I., Webb, G.: Survey of distance measures for quantifying concept drift and shift in numeric data. Knowl. Inf. Syst., 591–615 (2019)
12. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley & Sons (2012)
13. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. ArXiv (2002)
14. Basterrech, S., Krömer, P.: A nature-inspired biomarker for mental concentration using a single-channel EEG. Neural Comput. Appl. (2019)
15. Basterrech, S., Bobrov, P., Frolov, A., Húsek, D.: Nature-inspired algorithms for selecting EEG sources for motor imagery based BCI. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2015. LNCS (LNAI), vol. 9120, pp. 79–90. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19369-4_8
16. Ksieniewicz, P., Zyblewski, P.: stream-learn—open-source Python library for difficult data stream batch analysis. Neurocomputing (2022)
Can Post-vaccination Sentiment Affect the Acceptance of Booster Jab?

Blessing Ogbuokiri1,2(B), Ali Ahmadi3, Bruce Mellado1,5, Jiahong Wu1,2, James Orbinski1,6, Ali Asgary1,4, and Jude Kong1,2(B)

1 Africa-Canada Artificial Intelligence and Data Innovation Consortium (ACADIC), York University, Toronto, Canada {blessogb,jdkong}@yorku.ca
2 Laboratory for Industrial and Applied Mathematics, York University, Toronto, Canada
3 Faculty of Computer Engineering, K.N. Toosi University, Tehran, Iran
4 Advanced Disaster, Emergency and Rapid-Response Simulation (ADERSIM), York University, Toronto, ON, Canada
5 School of Physics, Institute for Collider Particle Physics, University of the Witwatersrand, Johannesburg, South Africa
6 Dahdaleh Institute for Global Health Research, York University, Toronto, Canada
Abstract. In this paper, Twitter posts discussing the COVID-19 vaccine booster shot from nine African countries were classified according to sentiment to understand the effect of citizens' sentiments on accepting the booster shot. The number of booster shot-related tweets significantly positively correlated with the increase in booster shots across different countries (Corr = 0.410, P = 0.028). Similarly, the increase in the number of positive tweets discussing booster shots significantly positively correlated with the increase in positive tweet intensities (Corr = 0.992, P < 0.001). The increase in intensities of positive tweets also positively correlated with an increase in likes and re-tweets (Corr = 0.560, P < 0.001). Five topics were identified from the tweets using the LDA model: booster safety, booster efficacy, booster type, booster uptake, and vaccine uptake. Tweets discussing these topics are mostly from South Africa (77%), Nigeria (19%), and Namibia (3%). Our result showed that there is an average 45.5% chance of tweets discussing these topics carrying positive sentiments. The outcome suggests that users' expressions on social media regarding booster shots could
This research is funded by Canada's International Development Research Centre (IDRC) and the Swedish International Development Cooperation Agency (SIDA) (Grant No. 109559-001). JDK acknowledges support from IDRC (Grant No. 109981), the New Frontiers in Research Fund - Exploratory (Grant No. NFRFE-2021-00879) and an NSERC Discovery Grant (Grant No. RGPIN-2022-04559). B.O. and JDK acknowledge support from the Dahdaleh Institute for Global Health Research. The authors wish to acknowledge the Africa-Canada AI & Data Innovation Consortium (ACADIC) team at York University, Toronto, Canada and the University of the Witwatersrand, Johannesburg, South Africa.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 200–211, 2023. https://doi.org/10.1007/978-3-031-35501-1_20
likely affect the acceptance of booster shots either positively or negatively. This research should be relevant to health policy-makers in gathering insight from social media data for the management and planning of vaccination programs during a disease outbreak.

Keywords: Vaccination · Booster shot · Boostered · Social media · Data mining · Sentiment analysis

1 Introduction
A vaccine is a biological preparation that provides active acquired immunity against a particular infectious disease [12]. It contains agents that resemble a disease-causing micro-organism and stimulate the immune system to build up defences against it [7]. Vaccines are mostly made from weakened or dead forms of the microbe, its toxins, or one of its surface proteins. Vaccination is the administration of a vaccine to help the immune system develop immunity to a disease [9,17].

During the outbreak of coronavirus in 2019, the United States government supported many vaccine producers to accelerate their research into vaccine candidates that could be developed as fast as possible to fight the pandemic [11]. This process was called Operation Warp Speed. Normally, vaccine producers begin phase II trials only when phase I trials are done, and phase III only after phase II. This way, vaccine candidates that will not work can be detected at an earlier stage, saving time and money. With government support, however, vaccine producers were able to run the different phases concurrently, and production began soon after the vaccines were found to be safe [22]. This approach was greeted with many arguments and counter-arguments questioning the efficacy of the COVID-19 vaccines, given how fast they were produced.

Between July and August 2020, Moderna and Pfizer produced their first COVID-19 vaccines [4]. This generated controversy, as conspiracy theorists and many others believed these companies could not have produced an effective vaccine within such a short period. As vaccination programs continued across different countries, news about the effects of these vaccines increased. There were cases of recorded deaths and other reactions after receiving the vaccines, which increased the existing anxiety and bias against vaccination, especially among those who were already hesitant [16].
By September 2021, the U.S. Food and Drug Administration had approved the booster dose of Pfizer-BioNTech for use in the United States for certain age groups [24]. African countries were not left out in the distribution of these vaccines [13]. However, the vaccination program in Africa faced a lot of misinformation, mostly shared on social media [23]. Social media became an effective tool for users to share and re-share information about how they felt or reacted after taking the vaccine, from the comfort of their homes. Be it the first dose, the second dose, or the booster dose, the information travelled very fast. Twitter has been one of the popular social media
platforms used to share such information about the effects of the booster shot [2,15]. Information expressed on Twitter forms sentiments which can weaken or strengthen the confidence of other users well before they are vaccinated. People who believed their second dose was enough were happy to reject the booster shot given some of the sentiments on social media [15].

A lot of research has used Natural Language Processing (NLP) to analyse sentiment expressed in Twitter posts [15]. In [21], the emotional classification of Twitter posts and their corresponding intensities were combined with vaccination data to predict the dynamics of citizens' emotional behaviour towards vaccination during Omicron. Similarly, an NLP approach was used to study city-level variations in sentiments towards COVID-19 vaccine-related topics [19]. The topics were generated from geo-tagged Twitter posts in Cape Town, Durban, and Johannesburg, and the dynamics of sentiments towards the COVID-19 vaccine in these cities were identified [19]. Additionally, machine learning classification algorithms, including Naive Bayes, Logistic Regression, Support Vector Machines, Decision Tree, and K-Nearest Neighbour, were used to classify vaccine-related tweets to identify hesitancy hotspots in Africa [20].

In light of the above, identifying the sentiments and topics of discussion around the booster shot from Twitter posts is both feasible and interesting. Hence, we collected 242,393 Twitter posts discussing the COVID-19 vaccine booster shot from Botswana, Cameroon, Eswatini, Mozambique, Namibia, Nigeria, Rwanda, South Africa, and Zimbabwe, between January 2022 and October 2022. The tweets were classified according to sentiment and used to complement existing vaccination data in understanding the effect of citizens' sentiments on accepting the booster shot. The remainder of the paper is organised as follows.
Section 2 presents the materials and methods used in this research, followed by the results in Sect. 3. In Sect. 4, we present the discussion of the results, and finally, Sect. 5 is the conclusion.

2 Materials and Methods

Here, we present the different methodologies used in this study.

2.1 Data Collection
We extracted 242,393 historical tweets that mentioned the keywords #boosterdose, second jab, 2nd jab, booster shot, booster, booster dose, and boostered from Botswana, Cameroon, Eswatini, Mozambique, Namibia, Nigeria, Rwanda, South Africa, and Zimbabwe, between January 2022 and October 2022, using the Twitter academic researcher API. Each tweet contains most of the following features: Text, ID, Date, RetweetCount, ReplyCount, LikeCount, Location, UserName, etc. The word cloud showing the most common words is shown in Fig. 1.
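The extraction step can be sketched as follows. The keyword list comes from the text; the helper names, the country filter, and the field selection are illustrative. The academic-track full-archive endpoint is `https://api.twitter.com/2/tweets/search/all`, queried here with the standard library; a valid bearer token and pagination handling are assumed.

```python
import json
import urllib.parse
import urllib.request

# Keywords from the study; phrases are quoted so they match as phrases.
KEYWORDS = ['#boosterdose', '"second jab"', '"2nd jab"', '"booster shot"',
            'booster', '"booster dose"', 'boostered']

def build_search_params(country_code, start, end, max_results=500):
    """Assemble query parameters for the v2 full-archive search endpoint."""
    query = f'({" OR ".join(KEYWORDS)}) place_country:{country_code} lang:en'
    return {
        "query": query,
        "start_time": start,
        "end_time": end,
        "max_results": max_results,
        "tweet.fields": "created_at,public_metrics,geo",
    }

def fetch_page(bearer_token, params):
    """Fetch one page of results; pagination via next_token is not shown."""
    url = ("https://api.twitter.com/2/tweets/search/all?"
           + urllib.parse.urlencode(params))
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {bearer_token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example: one country, the study's date range.
params = build_search_params("ZA", "2022-01-01T00:00:00Z", "2022-10-31T23:59:59Z")
```

In practice a loop over the nine country codes, plus rate-limit and `next_token` handling, would wrap `fetch_page`.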
Fig. 1. Most common words used
Additionally, daily statistics of COVID-19 partial vaccinations, full vaccinations, and booster shots given by country were collected from the latest updates of the Africa Centres for Disease Control and Prevention on progress made in COVID-19 vaccinations on the continent [1]. Figure 2 shows a summary of the vaccination percentages by country on the African map.
Fig. 2. Percentage of vaccination by country.
2.2 Data Preprocessing
User tweets are highly unstructured and contain a lot of information that may not be useful. Preprocessing of the raw tweets is needed to make sense of the data. We extracted the tweet text, date created, time created, likes, retweets, and location from the dataset into a dataframe using Pandas version 1.2.4 [14]. The tweets were prepared for NLP by removing URLs, duplicate tweets, tweets with incomplete information, punctuation, special and non-alphabetical characters, non-English words, and stopwords using the tweet-preprocessor toolkit version 0.6.0 [10], the Natural Language Toolkit (NLTK version 3.6.2) [5], and the spaCy toolkit (version 3.2) [8]. We also used spaCy to tokenize the tweets. This process reduced the dataset to 136,466 tweets.
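A simplified version of this cleaning step is sketched below. It substitutes standard-library regular expressions and a small illustrative stopword set for the tweet-preprocessor, NLTK, and spaCy toolkits named above, so it is a stand-in rather than the study's exact pipeline.

```python
import re

# Illustrative subset; NLTK ships a much larger English stopword list.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "i", "my"}

def clean_tweet(text):
    """Strip URLs, mentions/hashtags, non-alphabetical characters and stopwords."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)        # mentions and hashtags (whole token)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)    # punctuation and special characters
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return " ".join(tokens)

def deduplicate(tweets):
    """Clean each tweet and drop exact duplicates, preserving order."""
    seen, out = set(), []
    for t in tweets:
        c = clean_tweet(t)
        if c and c not in seen:
            seen.add(c)
            out.append(c)
    return out
```

For example, `deduplicate(["Got my booster shot! https://t.co/x", "Got my booster shot!"])` keeps a single cleaned entry, mirroring how duplicate removal shrank the dataset to 136,466 tweets.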
Table 1. Topics generated from LDA model.

Topic Number | Ten Representative Words | Predicted Topic
0 | ['South Africa', 'health', '19', 'vaccination', 'vaccine', 'safe', 'dose', 'covid19', 'booster', 'boosterdose'] | Booster safety
1 | ['vaccination', 'works', 'effective', 'good', 'dose', 'booster', '1st', '2nd', 'second', 'jab'] | Booster efficacy
2 | ['pfizer', 'J&J', 'astrazeneca', 'vaccine', 'omicron', 'moderna', 'type', 'covid', 'shots', 'booster'] | Booster type for omicron
3 | ['people', 'boostered', 'time', 'getting', 'got', 'shots', 'received', 'covid', 'shot', 'booster'] | Booster uptake
4 | ['vaccine', 'vaccination', '1st', 'covid', 'shot', '2nd', 'booster', 'second', 'got', 'jab'] | Vaccine uptake
2.3 Sentiment Analysis
The Valence Aware Dictionary for Sentiment Reasoning (VADER) available in the NLTK package was used to calculate the compound scores of the tweets and assign labels to them [18,20]. The compound score x determined which label (positive, negative, or neutral) was assigned to a tweet: x ≥ 0.05 was labelled positive, x ≤ −0.05 negative, and −0.05 < x < 0.05 neutral. Further, we also used TextBlob [3], which builds on NLTK, to calculate the subjectivity and polarity of the tweets and again assign the labels positive, negative, and neutral. We then selected only the tweets that were assigned the same label by both the VADER and TextBlob models. The result was validated using Logistic Regression with an accuracy of 81% and a Long Short-Term Memory (LSTM) network with an accuracy of 88%. Overall, 49% of the tweets were classified as positive, 27% as negative, and 24% as neutral.
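The labelling-and-agreement step described above can be sketched as follows. The calls to VADER and TextBlob are replaced here by plain score arguments, and the ±0.05 cut-offs are the conventional VADER thresholds, an assumption since the exact cut-offs are not fully legible in the source.

```python
def label_from_compound(score, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def agreed_label(vader_score, textblob_polarity):
    """Keep a tweet's label only when both scorers agree, as described in the text."""
    a = label_from_compound(vader_score)
    b = label_from_compound(textblob_polarity)
    return a if a == b else None  # None => tweet discarded

# Illustrative (vader, textblob) score pairs for four tweets.
scores = [(0.6, 0.4), (0.7, -0.2), (-0.3, -0.1), (0.0, 0.01)]
labels = [agreed_label(v, t) for v, t in scores]  # disagreements become None
```

Only the tweets with a non-`None` label would be kept for the downstream correlation and topic analysis.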
2.4 Topic Modelling
The Latent Dirichlet Allocation (LDA) model, an unsupervised machine learning model, was used for topic modelling of the tweets through the sklearn package in Python. LDA was chosen because it is one of the most popular models for this type of analysis, is easy to use, and has been successfully applied in recent studies such as [19]. The LDA model predicted five relevant topics; the topics and their associated representative words are shown in Table 1. Table 2 summarises selected sample tweets, by country, and the topics they were discussing. It shows that, for a given topic, more than one sentiment can be expressed during discussion. That is, people see these topics differently; their opinions usually differ from one another even when the topic of discussion is the same.
Table 2. Selected tweets with associated topic sentiment by country

Topic No | Tweet | Sentiment | Country
0 | Boostered, go get yours, its safe guys | Positive | South Africa
0 | How safe is the booster, Nzanzi. Any after effect? | Neutral | South Africa
0 | This booster thing is not for me, no matter how safe it is | Negative | Zimbabwe
1 | If you can take the booster please do, Omicron is real | Positive | Rwanda
1 | Took the booster, still tested positive for Omicron. Nonsense! | Negative | Rwanda
1 | Does booster protect from Omicron infection? | Neutral | Nigeria
2 | Is pandemic shifting endemic, all the booster must accessible | Positive | South Africa
2 | Which is vaccine is better for booster, I hate feeling sick | Negative | Namibia
2 | My company requires JnJ booster, any side effect? | Neutral | South Africa
3 | Got a booster jab today, let's get vaccinated practising what I preach | Positive | Botswana
3 | I took vaccine booster I've sick since Tuesday, temperature high painful tonsils | Negative | South Africa
3 | Girl I need go booster thing | Neutral | South Africa
4 | Please get vaccinate and stay safe | Positive | Namibia
4 | My first vaccine jab 29 July 2021 bad all, even experience side effects | Negative | South Africa
4 | No vaccine, no life, funny people | Neutral | Cameroon

3 Results
In this section, we focus only on the positive and negative sentiment classes expressed in the booster tweet discussions and present the results of the correlations accordingly. The increase in the number of booster shots given in Fig. 2 was correlated with the increase in tweets discussing the booster shots. The result showed a significantly positive correlation (Corr = 0.410, P = 0.028) in all countries except Cameroon and Eswatini, which have no record of booster shot data. This goes to suggest that the sentiment of booster-related discussions could affect booster acceptance. We now present the results for negative and positive sentiments in relation to the booster shots.

3.1 Booster Negative Sentiment Analysis
The increase in negative tweets was first correlated with the tweet intensities (see Fig. 3). The result showed a strong negative correlation (Corr = −0.985, P < 0.001). This means that, as the negative tweets increased, the intensities continued to grow more negative. Between March and June, the tweet counts and negative intensities behaved the same way; however, from July 2022 to October 2022 the graphs grew in opposite directions. Secondly, the correlation of the negative tweets with the likes and retweets showed a weak negative correlation (Corr = −0.145, P = 0.06) (see Fig. 4). In June 2022, there was an upsurge in likes and retweets while the negative intensities grew more negative. There is
Fig. 3. Negative tweets vs negative intensities.
an intermittent upsurge in the likes and retweets while the negative tweet intensities increased. The increase in the negative tweet intensities negatively correlated with the increase in booster shots (Corr = −0.650, P = 0.03). This shows that the negative sentiments expressed in the discussions about the booster shot on Twitter could affect the acceptance of the booster vaccines negatively.
Fig. 4. Negative tweets intensities vs likes and retweets.
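The correlation figures quoted throughout this section are Pearson coefficients between two daily series. A minimal standard-library computation is sketched below; the series are illustrative stand-ins for the study's pairs (e.g. daily negative-tweet counts against daily negative intensities).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative daily series: counts rise while intensities grow more negative.
negative_tweets = [12, 30, 45, 51, 70]
negative_intensity = [-0.1, -0.3, -0.45, -0.5, -0.7]
r = pearson_r(negative_tweets, negative_intensity)  # strongly negative
```

The accompanying P-values in the text would come from a significance test on r, e.g. `scipy.stats.pearsonr`, which returns both the coefficient and its two-sided P-value.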
3.2 Booster Positive Sentiment Analysis
When the increase in positive tweets was correlated with the tweet intensities (see Fig. 5), we observed an upsurge in January 2022 and July
Can Post-vaccination Sentiment Affect the Acceptance of Booster Jab?
207
2022. Although the booster shot was not yet available in most African countries in January 2022, that surge could be attributed to the positive sentiments expressed in discussions of the other vaccines.
Fig. 5. Positive tweets vs positive intensities.
The result showed a statistically significant positive correlation (Corr = 0.992, P < 0.001). This means that, as the positive tweets increased, the intensities continued to increase positively as well. Secondly, the correlation of the positive tweets with the likes and retweets showed a moderate positive correlation (Corr = 0.560, P < 0.001) (see Fig. 6). However, while the likes and retweets behaved almost the same way over the period, the positive intensities grew very high in July 2022.
Fig. 6. Positive tweets intensities vs likes and retweets.
Table 3. Topic sentiment discussion by country tweets.

Country | Topic 0 Pos | Topic 0 Neg | Topic 1 Pos | Topic 1 Neg | Topic 2 Pos | Topic 2 Neg | Topic 3 Pos | Topic 3 Neg | Topic 4 Pos | Topic 4 Neg
Botswana | 120 | 2 | 60 | 16 | 22 | 4 | 52 | 4 | 100 | 110
Cameroon | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 2 | 0
Eswatini | 59 | 1 | 9 | 0 | 7 | 0 | 16 | 1 | 16 | 16
Mozambique | 59 | 1 | 16 | 5 | 9 | 2 | 23 | 2 | 32 | 37
Namibia | 1285 | 156 | 130 | 24 | 231 | 50 | 61 | 201 | 215 | 125
Nigeria | 1437 | 199 | 4090 | 1146 | 952 | 500 | 1171 | 574 | 4937 | 5105
Rwanda | 2 | 4 | 17 | 0 | 0 | 7 | 11 | 2 | 12 | 8
South Africa | 2609 | 357 | 4820 | 1286 | 11990 | 3169 | 20531 | 8804 | 11248 | 11505
Zimbabwe | 6 | 0 | 2 | 0 | 2 | 0 | 10 | 6 | 0 | 2
Additionally, the increase in the positive tweet intensities showed a statistically significant positive correlation with the increase in booster shots (Corr = 0.880, P < 0.001). This goes to show that the positive sentiments expressed in the discussions about the booster shot on Twitter could affect the acceptance of the booster vaccines positively.

3.3 Topic Analysis
The topics generated in the discussions about the booster shot were analysed. Table 3 summarises the identified topics according to their sentiments by country. Topic 0 represents booster safety, Topic 1 booster efficacy, Topic 2 booster types, Topic 3 booster uptake, and Topic 4 vaccine uptake. From Table 3, countries like Botswana, Eswatini, Mozambique, and Namibia expressed more positive sentiments towards booster safety (Topic 0) than any other topic. Cameroon showed very little sentiment expression on any topic. Nigeria expressed more positive sentiments towards general vaccine uptake (Topic 4) than other topics. Meanwhile, countries like South Africa and Zimbabwe expressed more positive sentiment on the booster uptake topic (Topic 3) than any other topic. In Rwanda, there are more positive expressions on the booster efficacy topic (Topic 1) than any other topic. Overall, there is an average 45.5% chance of tweets discussing these topics carrying a positive sentiment.

3.4 Limitations
The Twitter data used for this research only reflects the opinions of Twitter users whose location was Botswana, Cameroon, Eswatini, Mozambique, Namibia, Nigeria, Rwanda, South Africa, or Zimbabwe, from January 2022 to October 2022. The average age of online adults who use Twitter in these countries is between 18 and 34 years [6]. Therefore, this research does not, at large, represent the opinions of all the people of the selected countries towards booster vaccines. Rather, this research provides an insightful analysis of Twitter
data alongside existing booster vaccination data, showing how data from social media could be leveraged to complement existing data for planning and management during an outbreak. It is also important to state that VADER, TextBlob, and LDA are pre-trained models used for sentiment classification and topic modelling, and that VADER and TextBlob cannot properly label figurative language such as sarcasm, pidgin English, and vernacular. However, since we only selected the tweets to which VADER and TextBlob gave the same label, we obtained over 130k labelled tweets in our dataset. We verified the classification with the 81% and 88% accuracy scores achieved by the Logistic Regression and LSTM classification algorithms, and assume these models were able to deal with the noise generated by this challenge in multiclass classification and prediction.
4 Discussion
We used a Python script to scrape tweets discussing booster shots in nine African countries and processed them to identify the sentiments around the topics of discussion. We observed a steady upsurge in positive and negative tweet sentiment intensities in July 2022 and October 2022, respectively. Our results showed that positive and negative sentiments expressed towards the booster shot on social media could contribute to the acceptance of the booster shot, and we identified a correlation between each of the sentiment classes and the increase in booster vaccination.

We generated five topics using the LDA model: booster safety, booster efficacy, booster type for Omicron, booster uptake, and vaccine uptake. While there are more positive sentiment expressions towards booster safety in Botswana, only 16.2% of the population has received a booster shot; we believe these positive sentiments may have contributed to that uptake. Cameroon and Eswatini have no recorded booster shots. Cameroon has only 4.5% of its population fully vaccinated, while Eswatini has 33.5% (refer to Fig. 2); the comparatively high full-vaccination rate in Eswatini could be traced to the high number of positive sentiments expressed in the discussions about booster safety (refer to Table 3). In Mozambique, 44.7% of the population was fully vaccinated, yet only 2.1% received booster shots, even though there is more positive sentiment discussing booster safety. In Namibia, only 3.6% of the population received the booster shot despite more positive sentiments expressed about booster safety (see Table 3). At the time of this report, we are not sure whether this is due to the availability of the booster.
Meanwhile, although Nigeria has only 0.4% of its population having received the booster shot, there are more positive sentiments expressed about the general vaccination topic than any other topic. Rwanda has 42.9% of its population having received the booster shot (see Fig. 2), which is also reflected in the more positive sentiments expressed in the discussion on booster efficacy (see Table 3).
Meanwhile, South Africa and Zimbabwe have 6.2% and 7.6% of their populations, respectively, having received the booster shot, and both expressed more positive sentiments towards the booster uptake topic than any other topic (see Table 3). Our research suggests that sentiments expressed about the booster shot on social media, such as Twitter, could impact the acceptance of the booster vaccines in Africa.
5 Conclusion
In this research, Twitter posts containing daily updates of location-based booster vaccine-related tweets were collected. The number of booster shot-related tweets significantly positively correlated with the increase in booster shots across the different countries, except Cameroon and Eswatini for lack of data. The increase in the number of positive and negative tweets discussing booster shots correlated with the increase in positive and negative tweet intensities, respectively. The increase in intensities of positive and negative tweets also correlated with the increase in likes and re-tweets. The LDA model was used to generate five topics from the tweets. Our result showed that there is an average 45.5% chance of tweets discussing these topics carrying positive sentiment. However, while more users expressed positive sentiment in booster uptake discussions (Topic 3), the rate of booster vaccine uptake is still very low across the different countries; this is worth investigating in the future. The outcome suggests that users' expressions on social media could affect the acceptance of booster shots either positively or negatively. The authors believe that this research should be relevant to health policy-makers in gathering insight from social media data to support existing data for the management and planning of vaccination programs during a disease outbreak.

Ethical Considerations. All retrieved tweets are in the public domain and are publicly available. However, the authors strictly followed the highest ethical principles in handling the personal information of Twitter users; as such, all personal information was removed.

Conflict of Interest. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References

1. Africa Centres for Disease Control and Prevention, April 2022. Available online. Accessed 10 June 2022
2. Al-Zaman, M.: Covid-19-related social media fake news in India. J. Media 2(5), 100–114 (2021)
3. Aljedaani, W., et al.: Sentiment analysis on Twitter data integrating TextBlob and deep learning models: the case of US airline industry. Knowl.-Based Syst. 255, 109780 (2022)
4. Angyal, A., et al.: T-cell and antibody responses to first BNT162b2 vaccine dose in previously infected and SARS-CoV-2-naive UK health-care workers: a multicentre prospective cohort study. Lancet Microbe 3(1), e21–e31 (2022)
5. Ganguly, S., Morapakula, S.N., Coronado, L.M.P.: Quantum natural language processing based sentiment analysis using lambeq toolkit. In: 2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T), pp. 1–6 (2022)
6. Ghai, S., Magis-Weinberg, L., Stoilova, M., Livingstone, S., Orben, A.: Social media and adolescent well-being in the global south. Curr. Opin. Psychol. 46, 101318 (2022)
7. Hogan, M.J., Pardi, N.: mRNA vaccines in the Covid-19 pandemic and beyond. Annu. Rev. Med. 73, 17–39 (2022)
8. Honnibal, M.: spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. Sentometrics Res. 1(1), 2586–2593 (2017)
9. Hussain, A., et al.: mRNA vaccines for Covid-19 and diverse diseases. J. Control. Release 345, 314–333 (2022)
10. Jang, H., Rempel, E., Roe, I., Adu, P., Carenini, G., Janjua, N.Z.: Tracking public attitudes toward Covid-19 vaccination on tweets in Canada: using aspect-based sentiment analysis. J. Med. Internet Res. 24(3), e35016 (2022)
11. Kesselheim, A.S., et al.: An overview of vaccine development, approval, and regulation, with implications for COVID-19. Health Aff. 40(1) (2020)
12. Lavelle, E., Ward, R.: Mucosal vaccines - fortifying the frontiers. Nat. Rev. Immunol. 22, 236–250 (2022)
13. Lawal, L., et al.: Low coverage of Covid-19 vaccines in Africa: current evidence and the way forward. Hum. Vaccines Immunotherapeutics 18(1), 2034457 (2022)
14. Li, F., et al.: What's new in pandas 1.2.4. Available online. Accessed 01 June 2022
15. Marcec, R., Likic, R.: Using Twitter for sentiment analysis towards AstraZeneca/Oxford, Pfizer/BioNTech and Moderna Covid-19 vaccines. Postgrad. Med. J. 10(5), 1–7 (2021)
16. Medeiros, K.S., Costa, A.P.F., Sarmento, A.C.A., Freitas, C.L., Gonçalves, A.K.: Side effects of Covid-19 vaccines: a systematic review and meta-analysis protocol of randomised trials. BMJ Open 12(2) (2022)
17. Morens, D.M., Taubenberger, J.K., Fauci, A.S.: Universal coronavirus vaccines - an urgent need. N. Engl. J. Med. 386(4), 297–299 (2022). PMID: 34910863
18. Obaido, G., et al.: An interpretable machine learning approach for hepatitis B diagnosis. Appl. Sci. 12(21) (2022)
19. Ogbuokiri, B., et al.: Public sentiments toward Covid-19 vaccines in South African cities: an analysis of Twitter posts. Front. Publ. Health 10, 987376 (2022)
20. Ogbuokiri, B., et al.: Vaccine hesitancy hotspots in Africa: an insight from geotagged Twitter posts. TechRxiv, Preprint (2022)
21. Ogbuokiri, B., et al.: Determining the impact of the Omicron variant on vaccine uptake in South Africa using Twitter data. Submitted to Nat. Lang. Process. J. (2022)
22. Ritskes-Hoiting, M., Barell, Y., Kleinhout-Vliek, T.: The promises of speeding up: changes in requirements for animal studies and alternatives during Covid-19 vaccine approval - a case study. Animals 12(13), 1735 (2022)
23. Tasnim, S., Hossain, M., Mazumder, H.: Impact of rumors and misinformation on Covid-19 in social media. J. Prev. Med. Publ. Health 202(53), 171–174 (2021)
24. van Gils, M.J., et al.: A single mRNA vaccine dose in Covid-19 patients boosts neutralizing antibodies against SARS-CoV-2 and variants of concern. Cell Rep. Med. 3(1), 100486 (2022)
Emotion Detection Based on Facial Expression Using YOLOv5

Awais Shaikh(B), Mahendra Kanojia, and Keshav Mishra

Sheth L.U.J and Sir M.V College, Mumbai, Maharashtra, India
[email protected]
Abstract. Human emotions can be understood from facial expressions, a type of nonverbal communication. Facial emotion recognition is a technology that analyzes emotions from various sources such as images and videos. Emotion recognition is an important topic due to its wide range of applications. In this paper, the YOLOv5 (You Only Look Once) model was used to detect basic human emotions. Capturing facial expressions helps identify the emotion, based on which various suggestions, such as song and movie recommendations, can be proposed. There are numerous models for emotion detection in different architectural styles; in this article, we present the less explored YOLOv5 model, which is more accurate and gives real-time results compared with previous detection algorithms. 40 images from the FER2013 dataset were used to train our YOLOv5 model, and an accuracy of 50% was achieved with the proposed model.

Keywords: YOLOv5 · emotion detection · deep learning
1 Introduction

Facial emotion recognition (FER) is a technology that analyzes emotions from many sources, including images and videos. Recently, the increasing use of cameras and developments in pattern recognition, machine learning, and biometrics analysis have been key factors in the growth of FER technology. This technology can be adopted using the YOLO algorithm to build a recommender system for music and videos; it can also be used in the medical domain to identify mood-based disorders such as depression, changes in appetite or weight, and asthma. It can be used in virtual meetings to check a person's interest throughout the meeting, giving a response on whether the person is engaged in the communication or not. There are seven basic human emotions, including surprise, disgust, anger, fear, happiness, and sadness. There are many face detection and emotion detection models, but compared with them YOLO gives real-time performance. The main objective of our study was to train our model and compare its accuracy with previous versions of the YOLO model. YOLO can generalize representations of distinct objects, expanding its applicability to several different environments. For training, we collected 40 images from the FER2013 dataset, which is widely used for facial emotion detection, modified them according to YOLOv5 standards, and gained an mAP of 0.5.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 212–218, 2023. https://doi.org/10.1007/978-3-031-35501-1_21
2 Literature Review

YOLO and related deep learning computer vision models have been implemented in the past, majorly for face detection and real-time object detection. This section gives a cursory look at recent work in the domain.

Two works [1, 2] proposed in 2018 used the YOLO model for face detection. The authors of [1] reported an accuracy of 92.25% after training the model for 20 iterations with a learning rate of 0.0001, whereas [2] claimed a reduced average detection time of 0.02 seconds for 20 images extracted from the WIDER Face dataset, achieved after 30200 epochs with a 0.01 learning rate. Both works [1, 2] used a gradient descent algorithm to find an optimum error.

Lu et al. [3] proposed a multi-object detection hybrid model based on YOLO and ResNet: to boost the feature extraction phase and realize effective object detection, the ResNet algorithm was combined with YOLOv3, which gave an accuracy of 75.36%. Another work [4] in 2019 proposed real-time object detection to avoid traffic congestion and accidents. The authors claimed that nearly all traffic signs were detected by the proposed detector, which also regressed precise bounding boxes for the majority of the recognized signs; it ran at nearly 10 frames per second with a high mAP performance of 92.2%.

Article [5] in 2020 proposed a face recognition model with student behaviour (attentive or not attentive) detection. The authors used 400 face images as a dataset, split into training and testing sets, to train the YOLOv3 model using the ImageAI package, and reported 88.60% model accuracy. In the year 2020, [7] used YOLO and [8] used the YOLOv3 model for different objectives. Social distancing and face mask violations were detected by [8] using the WIDER Face dataset, with a reported accuracy of 94.75%.
Whereas [7] developed a real-time system to determine a person's gender from facial images and reported an accuracy of 84.69% at 4000 epochs. Article [9] in 2021 proposed an improved YOLOv3 object detection algorithm for remote sensing images; its mAP value was 5.33% higher than that of the previous YOLOv3 method. Another work [10] in 2022 proposed a model for fast object detection with distance estimation; compared with other models, its mAP was 77.1%.
3 Research Methodology

YOLO, the original deep learning architecture, was developed by [11] in 2015. It gained popularity among computer vision scientists due to its speed, precision, and learning capacity. YOLO is a method that provides real-time object detection using neural networks. Figure 1 illustrates the YOLO base model, which includes 24 convolutional layers (CL) and 2 fully connected neural network (NN) layers with logistic classifiers as the activation function. The NN layers are adaptable to the desired task. Every CL is connected after performing max pooling to select the desired features. The model was trained using the PASCAL VOC 2007 dataset. YOLOv2 [12] introduced the Darknet-19 architecture, in which the original 24 CL were replaced by an enhanced 19 layers with an upgraded 5 × 5 max pooling layer.
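As a concrete illustration of the base model's output size (drawn from the original YOLO formulation in [11], not stated explicitly above): the network divides the input into an S × S grid, and each cell predicts B bounding boxes (x, y, w, h, confidence) plus C class probabilities, giving S · S · (B · 5 + C) output values.

```python
# Size of the YOLO base model's final prediction tensor: the image is
# divided into an S x S grid; each cell predicts B boxes (x, y, w, h,
# confidence) plus C class probabilities.
def yolo_output_size(S, B, C):
    return S * S * (B * 5 + C)

# Original YOLO on PASCAL VOC uses S=7, B=2, C=20 -> 7 x 7 x 30 = 1470.
print(yolo_output_size(7, 2, 20))  # 1470
```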
214
A. Shaikh et al.
In their article "YOLOv3: An Incremental Improvement" [13], Joseph Redmon and Ali Farhadi published the updated algorithm in 2018. Softmax activation was replaced by independent logistic classifiers in the YOLOv3 model, and the binary cross-entropy loss function was applied during training. The Darknet-19 architecture of YOLOv2 was enhanced and transformed into the 53 convolutional layers known as Darknet-53.
Fig. 1. Darknet-53 architecture - illustration adopted from [3]
In YOLOv4 [14], two new data augmentation techniques were introduced, viz. Mosaic and Self-Adversarial Training (SAT). In Mosaic data augmentation, four training images are combined into one in specific ratios. The SAT method works in two stages, forward and backward: in the first stage, the neural network modifies the original image rather than the network weights; in the second stage, the network is trained in the conventional manner to recognize an object in this altered image. YOLOv5 [15] applied the PyTorch deep learning framework for the first time. YOLOv5 does not introduce any fundamentally new techniques, hence no official paper was published; it is essentially YOLOv3 ported to PyTorch. According to researchers [15], YOLOv5 is faster than YOLOv4 and performs better than both YOLOv4 and YOLOv3.
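As a toy sketch of the Mosaic idea (real implementations pick random crop ratios and adjust the bounding boxes accordingly; the fixed 2 × 2 tiling and tiny nested-list "images" below are purely illustrative):

```python
# Toy Mosaic augmentation: four small "images" (nested lists of pixel
# values) are tiled into one larger training image.
def mosaic(top_left, top_right, bottom_left, bottom_right):
    top = [a + b for a, b in zip(top_left, top_right)]
    bottom = [a + b for a, b in zip(bottom_left, bottom_right)]
    return top + bottom

img = mosaic([[1, 1], [1, 1]], [[2, 2], [2, 2]],
             [[3, 3], [3, 3]], [[4, 4], [4, 4]])
# -> a 4x4 image with the four sources in its four quadrants
```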
4 Proposed Model

Dataset Description. We used the FER 2013 dataset [https://www.kaggle.com/datasets/msambare/fer2013] to train our model. This dataset consists of 48 × 48 pixel grayscale images of human faces. The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in every image. The dataset includes 3,589 test images, and the training set has 28,709 instances. The images are labeled into seven categories of basic human emotions, viz. Angry, Happy, Sad, Surprised, Disgusted, Neutral, and Fearful.

Experimental Setup. In our proposed work we conducted a pilot study with 40 emotion-based human face images adopted from the FER 2013 dataset to train the YOLOv5 model. We found that YOLOv6 and YOLOv7 were still maturing, with the models still in the tuning phase. We simulated the dataset on a standard MacBook Air with a 1.1 GHz processor and Intel Iris Plus Graphics. The model took 54 s to train on the input data with a learning rate of 0.01.

The Model. In our proposed model for emotion detection, we have fine-tuned the YOLOv5 model. A human face image of size 48 × 48 pixels with its label as a text
file is given as input. The proposed model architecture consists of three parts: the backbone, CSP-Darknet, for feature extraction; the neck, consisting of a PANet head, for feature fusion; and the YOLO detection layers.
Fig. 2. Proposed Model Architecture
The bounding box for the human face image is identified using the data-labeling method. The coordinates of the calculated bounding box are stored in a separate text file. The tag format of the bounding box coordinates is: <object-class-ID> <x-center> <y-center> <width> <height>. Figure 2 reflects the proposed model architecture. The paths of our training, validation, and testing datasets, along with the seven emotion-labeled classes, are set in the YAML file before the model is initiated for training. The proposed model adds 3 convolutional layers to standard YOLOv5. With every layer down, the prominent features needed to detect the face and emotions are forwarded. The model showed promising results after 60 epochs with a batch size of 32. The learning rate was set to 0.01. The output layer consists of seven nodes. To get reliable results we used the Stochastic Gradient Descent optimization method and the Leaky ReLU activation function in the hidden and output layers. The sigmoid activation function is used in the final detection.
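A small helper illustrating how an absolute pixel bounding box can be converted into a label line of this format (the YOLO convention normalizes all coordinates to [0, 1]; the class index 3 and the full-frame face box below are hypothetical examples, and actual indices depend on the class order declared in the YAML file):

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert an absolute pixel bounding box to a YOLO-format label line:
    <object-class-ID> <x-center> <y-center> <width> <height>,
    with all coordinates normalized to [0, 1]."""
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# A face occupying the full 48x48 FER image (class index 3 is hypothetical):
print(to_yolo_label(3, 0, 0, 48, 48, 48, 48))
```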
5 Results and Discussion

The proposed model's metric-based results are discussed in this section. Figure 3 shows the output of the proposed model, where we can see that the model has successfully labeled the emotions for images 1, 6, and 8 and mispredicted the emotions for images 2, 3, 4, 5, and 7.
Fig. 3. Proposed model output.
Our model reported an accuracy of 50% for the pilot run of 40 images in a standard computational environment. The Precision vs. Recall plot in Fig. 4 shows that our model has a balanced prediction across the detected emotions. We can further infer that the fearful emotion can be recognized very well, whereas the happy emotion might be misrecognized. The F1 curve shown in Fig. 5 reflects that the confidence of the proposed model is high for most of the emotions other than fearful and neutral.
Fig. 4. Proposed model precision-Recall curve.
Fig. 5. F1 curve.
High precision correlates with a low false-positive rate, while high recall correlates with a low false-negative rate. A high area under the curve denotes both high recall and high precision. Figure 6 displays the loss functions, where box_loss denotes the bounding-box regression loss, obj_loss the objectness binary cross-entropy loss, and cls_loss the classification cross-entropy loss. Precision measures how many bounding-box (bbox) predictions are correct, and recall measures how many of the true bboxes were accurately predicted. We compared our work with three other works in a similar domain. The authors of [1, 6] worked only on face detection and [3] on multi-object detection, using older versions of the YOLO architecture. We detected faces with emotions at an accuracy of 50% using YOLOv5 (Table 1).
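These precision, recall, and F1 definitions can be sketched directly from detection counts (the counts passed in below are hypothetical, not taken from our experiments):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical counts: 20 correct detections, 20 mislabeled, none missed.
p, r, f1 = detection_metrics(20, 20, 0)
print(p, r, f1)  # 0.5 1.0 0.666...
```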
Fig. 6. Proposed model's Loss Function

Table 1. Result Comparison.

| Ref. No | Model and objective | Result in accuracy |
|---|---|---|
| [1] | Face detection using YOLO | 92.2% |
| [3] | Multi-object detection using YOLOv3 + ResNet | 75.36% |
| [6] | Face detection and recognition using YOLO | 78.2% |
| Proposed model | Face and emotion detection using YOLOv5 | 50% |
6 Conclusion

In this research, we proposed a real-time approach based on YOLO for human emotion detection. This study used the YOLOv5 architecture to recognize emotions, and the FER 2013 dataset was used to evaluate performance. Results indicate that YOLOv5 outperforms several approaches in terms of recognition rate. YOLOv5's emotion detection approach is more robust and has a faster detection time than conventional algorithms, which can lower the miss rate and error rate.
7 Future Scope

In the future we will use an advanced dataset and a state-of-the-art experimental setup, which will help us determine the model's performance on a larger dataset. We will compare this model's performance with its successors, YOLOv6 and YOLOv7, which will give us a better understanding of our model. We claim that the proposed model can be used in various fields to detect human emotions through facial behavior in a real-time environment.
References

1. Garg, D., Goel, P., Pandya, S., Ganatra, A., Kotecha, K.: A deep learning approach for face detection using YOLO. In: 2018 IEEE Punecon (2018). https://doi.org/10.1109/punecon.2018.8745376
2. Yang, W., Jiachun, Z.: Real-time face detection based on YOLO. In: 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII) (2018). https://doi.org/10.1109/ickii.2018.8569109
3. Lu, Z., Lu, J., Ge, Q., Zhan, T.: Multi-object detection method based on YOLO and ResNet hybrid networks. In: 2019 IEEE 4th International Conference on Advanced Robotics and Mechatronics (ICARM) (2019). https://doi.org/10.1109/icarm.2019.8833671
4. Rajendran, S.P., Shine, L., Pradeep, R., Vijayaraghavan, S.: Real-time traffic sign recognition using YOLOv3 based detector. In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (2019). https://doi.org/10.1109/icccnt45670.2019.8944890
5. Mindoro, J.N., Pilueta, N.U., Austria, Y.D., Lolong Lacatan, L., Dellosa, R.M.: Capturing students' attention through visible behavior: a prediction utilizing YOLOv3 approach. In: 2020 11th IEEE Control and System Graduate Research Colloquium (ICSGRC) (2020). https://doi.org/10.1109/icsgrc49013.2020.9232659
6. Mehta, K., Bhinge, A., Deshmukh, A., Londhe, A.: Facial detection and recognition among heterogenous multi object frames. Int. J. Eng. Res. V9(01) (2020). https://doi.org/10.17577/IJERTV9IS010175
7. E.K., V., Ramachandran, C.: Real-time gender identification from face images using you only look once (YOLO). In: 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI) (2020). https://doi.org/10.1109/icoei48184.2020.9142989
8. Bhambani, K., Jain, T., Sultanpure, K.A.: Real-time face mask and social distancing violation detection system using YOLO. In: 2020 IEEE Bangalore Humanitarian Technology Conference (B-HTC) (2020). https://doi.org/10.1109/b-htc50970.2020.9297902
9. Wu, K., Bai, C., Wang, D., Liu, Z., Huang, T., Zheng, H.: Improved object detection algorithm of YOLOv3 remote sensing image. IEEE Access 9, 113889–113900 (2021). https://doi.org/10.1109/access.2021.3103522
10. Vajgl, M., Hurtik, P., Nejezchleba, T.: Dist-YOLO: fast object detection with distance estimation. Appl. Sci. 12(3), 1354 (2022). https://doi.org/10.3390/app12031354
11. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016). https://doi.org/10.1109/cvpr.2016.91
12. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017). https://doi.org/10.1109/cvpr.2017.690
13. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv.org (2018). https://arxiv.org/abs/1804.02767. Accessed 19 Oct 2022
14. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv.org (2020). https://arxiv.org/abs/2004.10934v1. Accessed 19 Oct 2022
15. Nepal, U., Eslamiat, H.: Comparing YOLOv3, YOLOv4 and YOLOv5 for autonomous landing spot detection in faulty UAVs. Sensors 22(2), 464 (2022). https://doi.org/10.3390/s22020464
LSTM-Based Model for Sanskrit to English Translation

Keshav Mishra(B), Mahendra Kanojia, and Awais Shaikh

Sheth L.U.J and Sir M.V. College, Mumbai, Maharashtra, India
[email protected]
Abstract. Sanskrit, one of the earliest languages of the Asian subcontinent, was no longer widely spoken by approximately 600 B.C. [1]. The study of human communication languages and their use to interact with machines is a prominent research domain in Natural Language Processing (NLP). Sanskrit being among the oldest languages, we found that limited work has been done on its translation using NLP. In this study, we use NLP and deep learning techniques to translate Sanskrit to English utilizing Neural Machine Translation (NMT) methods based on a Long Short-Term Memory (LSTM) encoder-decoder architecture. We used corpus datasets to train our model and report 30% accuracy on the Bhagavad Gita dataset and 53% accuracy on the Bible dataset, which can be considered a good baseline. As the number of lines in the dataset increases, the model gives better accuracy. Our model performs better than earlier models used by different researchers to translate Sanskrit, and it will also aid the linguistic community by saving the time-consuming process of Sanskrit-to-English translation.

Keywords: Sanskrit Translation · NLP · Neural Machine Translation · LSTM
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 219–226, 2023. https://doi.org/10.1007/978-3-031-35501-1_22

1 Introduction

The Sanskrit language is one of the oldest languages in the history of the linguistic world [1]. Linguists have worked extensively to formalize the Sanskrit language. Panini, the ancient Indian linguistic scholar, formalized the grammar of the Sanskrit language, which became the basis of all Sanskrit language processing techniques. Due to the emergence of globalization and the rise of English as the most widely spoken language worldwide, translation between languages is now necessary. Since regional languages are still widely used in developing areas, translating material from English into those languages can help people access it. Machine translation is one of the most difficult problems in Sanskrit natural language processing, as Sanskrit does not follow noun-phrase models. Machine translation of Sanskrit into other languages is difficult because of translation divergence, ambiguity caused by multiple meanings, and a lack of parallel linguistic data [1, 2]. Numerous historical and cultural documents were originally written in Sanskrit, the majority of which have not been translated into other languages; examples include the Bhagavad Gita, Ramayana, Mahabharata, and the Hindu Vedas [2]. Although Sanskrit plays a significant role in Indian culture and history, little has been translated into or out of it, whereas many other natural languages are
accessible for this information. The speakers of various languages must use translation services or learn the other language to access this material. Since not everyone can learn numerous languages, translation becomes necessary. In such a situation, machine translation arises as a significant area of study in the field of computational linguistics that can provide access to a large amount of information in a brief time and at a reasonable price. The ideal of a flawless full-scale machine translation system remains an intellectually demanding endeavor, and automatic translation of natural languages is one of the most intricate and extensive applications of computational linguistics. In this research, we utilize Neural Machine Translation (NMT) methods based on a Long Short-Term Memory (LSTM) encoder-decoder architecture to translate Sanskrit to English. We also find that as we increase the number of lines, the model trains better and gives better accuracy.
2 Literature Review

In 2018, Sitender Malik and Seema Bawa [7] proposed a fully automatic machine translation system for the conversion of Sanskrit to Universal Networking Language (UNL). Their system used two databases, one consisting of 300 rules for analysis and another of 1500 rules for the generation of UNL. They selected 500 Sanskrit sentences covering UNL from various sources. Their system achieves a 0.85 BLEU score and 93.18% accuracy. In 2019, [2] proposed a neural machine translation-based model for Sanskrit to English. They used a Support Vector Machine (SVM) classifier for finding English words corresponding to Sanskrit words, which resulted in 10% more accurate translations compared to the Naive Bayes classifier. The SVM was applied after the data was modeled using a Recurrent Neural Network (RNN). The trained RNN allowed them to handle the multi-featured input that is part of Sanskrit sentence grammar. Each layer of the RNN was implemented for multiple processing steps. The Sanskrit dictionary and classifier were used together, and the translation accuracy was enhanced. Advanced translation techniques were used in 2020 [1, 3] to achieve Sanskrit translation. A supervised methodology was used by [1] on parallel aligned English-to-Sanskrit data to test multiple machine translation approaches. Transfer learning and reinforcement learning were leveraged to work with monolingual data for Neural Machine Translation (NMT). The experiment reported an 18.4 BLEU score with the transfer learning method. The proposed work [3] used a corpus-based machine translation system with a deep neural network for Sanskrit-to-Hindi translation. With a huge Sanskrit-to-Hindi word dataset, the machine translation system achieved a better BLEU score than the reported rule-based approach. It was observed that using bilingual and monolingual corpora improves the BLEU score.
Another work [5] in 2020 achieved Sanskrit-to-English machine translation using a hybrid of direct and rule-based approaches. A POS tagger, a CYK parser, and a parsing algorithm were utilized in direct machine translation (DMT) and rule-based machine translation (RBMT). The suggested system received a BLEU score of 0.7606, with 3.63 for fluency and 3.72 for adequacy. The overall efficiency of the developed system was 97.8%. Jivnesh Sandhan and Digumarthi Komal [9] proposed evaluating neural word embedding models for Sanskrit in 2021. In
LSTM-Based Model for Sanskrit to English Translation
221
this work, the authors investigated the effectiveness of word embeddings. To simplify systematic experimentation and evaluation, they categorized word embeddings into broad groups and rated them on four intrinsic tasks. The authors examined the effectiveness of embedding techniques for Sanskrit together with various linguistic difficulties that were originally studied for languages other than Sanskrit. The corpus was formed from the Digital Corpus of Sanskrit (DCS), data scraped from Wikipedia, and the Vedabase corpus. Word embeddings were used to transfer knowledge learned from readily available unlabelled data and improved the task-specific performance of data-driven approaches. In 2022, Aditya Vyawahare et al. [8] proposed a neural machine translation model for Kannada-to-Sanskrit translation. A bilingual dataset was used to train various Seq2Seq translation models such as LSTM, bidirectional LSTM, and Conv2Seq. The authors also trained state-of-the-art transformers from scratch and fine-tuned pre-trained models. The BLEU score for LSTM was 0.8085, for BiLSTM 0.8059, and for the Transformer model 0.5551. From the above review we infer that researchers have moved to advanced NLP techniques hybridized with deep learning and transfer learning methodologies. Little work was found where Sanskrit-to-English translation is done using an LSTM encoder-decoder-based model.
3 Research Methodology

Machine translation involves converting the source language to the destination language. Machine translation techniques include rule-based algorithms, statistical machine translation [4], and neural machine translation. Neural Machine Translation (NMT) [1] is the most widely adopted methodology in advanced natural language processing. NMT depends on the conditional probability of translating a source-language input to a target-language output. NMT requires less knowledge of the structure of the source as well as the target language. NMT can be configured as an encoder-decoder architecture [2]. The task of the encoder is to combine and generalize the semantics of the input language and represent it in a vector. The decoder takes the vector generated by the encoder and translates it to the target language. The encoder-decoder architecture is non-iterative in nature. To deep-train the encoder-decoder-based NMT we can use a Recurrent Neural Network (RNN) deep learning architecture. Further, a memory aspect is added to the RNN architecture to preserve the past information which the machine has learned during sequence analysis. This model is known as RNN with Long Short-Term Memory (LSTM) [2]. As illustrated in Fig. 1, the LSTM architecture includes an information-processing cell (C) and input (I), output (O), and forget (F) memory gates. LSTM can analyze SeqToSeq data using these cells and memory gates. The cell includes sigmoid and tanh activation functions. The input gate is responsible for identifying the amount of information to be used to change the memory, whereas the forget gate balances the total sequence stored in the cell by forgetting unreferenced information. Finally, the output is predicted by the output gate.
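The gate behavior described above corresponds to the standard LSTM update equations (with $\sigma$ the sigmoid function, $\odot$ element-wise multiplication, $x_t$ the current input, and $h_{t-1}$ the previous hidden state):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{c}_t &= \tanh\!\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell-state update)}\\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state / output)}
\end{aligned}
```

The forget gate $f_t$ scales down the previous cell state, the input gate $i_t$ admits new candidate information, and the output gate $o_t$ selects what part of the cell state is emitted as $h_t$.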
Fig. 1. LSTM Model adapted from Medium.com [11]
4 Proposed Model

We used Sanskrit and English corpus datasets adapted from an open-source GitHub repository [10]. The repository includes Sanskrit-to-English parallel data and Sanskrit-to-Hindi parallel data. The Sanskrit and English sentences in the corpus are extracted from Indian epics and scriptures: The Ramayana (12,733 lines), The Rigveda (13,453), The Bhagavad Gita (701), The Bible (5,294), and The Manu (2,193) (Fig. 2).
Fig. 2. Proposed Model Architecture
Data wrangling techniques are applied before training the LSTM model. The raw text file containing Sanskrit and English sentences is processed to remove stop words and white spaces; this is achieved using regular expressions. The extracted text is
converted to lowercase, and line tokenization followed by word tokenization is used to extract the principal words from the sentences. Since LSTM requires numerical input, we converted our string input into lists of integers. We used the Tokenizer function offered by the Keras preprocessing package to accomplish this. In sequence-to-sequence models, all input sequences must be of equal length; to make the sequences the same length, we padded the shorter token lists with trailing 0s. These padded sequences are sorted and given as input to the LSTM model. We wrapped the English sentences with 'start' and 'end' tags, which indicate to the decoder where to begin and when to stop decoding. The wrangled data is split in the ratio of 90:10 for training and validation. The LSTM model is trained using batch processing with a batch size of 128 and 400 epochs. The encoder receives the Sanskrit tokens and the decoder receives the English tokens; these are the parallel data tokens. The output node of the LSTM is attached to a softmax function for translation prediction. The root mean square propagation (RMSprop) optimizer with the categorical cross-entropy loss function gave the best possible accuracy. The model gave the best results after 400 epochs with a learning rate of 0.01.
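A minimal pure-Python sketch of this tokenize-and-pad step, mimicking the behavior of the Keras Tokenizer and pad_sequences utilities we used (simplified: indices are assigned in first-seen order rather than by word frequency, and the two sample sentences are illustrative only):

```python
def fit_tokenizer(sentences):
    """Build a word -> integer index map; index 0 is reserved for padding,
    as in keras_preprocessing's Tokenizer."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def texts_to_padded(sentences, vocab, maxlen):
    """Convert sentences to integer sequences and right-pad with 0s to a
    common length (mirroring pad_sequences(..., padding='post'))."""
    seqs = [[vocab[w] for w in s.lower().split()] for s in sentences]
    return [seq[:maxlen] + [0] * (maxlen - len(seq)) for seq in seqs]

# Illustrative English target sentences wrapped in 'start'/'end' tags:
eng = ["start he said end", "start go end"]
vocab = fit_tokenizer(eng)
padded = texts_to_padded(eng, vocab, maxlen=4)
print(padded)  # [[1, 2, 3, 4], [1, 5, 4, 0]]
```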
5 Results and Discussion

The proposed model is based on the deep learning SeqToSeq LSTM model. It is one of the early proposed models for the translation of Sanskrit into English. Using the Bhagavad Gita dataset, the training time of the model was 7 min, around approximately 150 ms per epoch on the GPU. The model training loss stabilized at 400 epochs with a loss value below 0.7, as seen in Fig. 3.
Fig. 3. Epoch Vs Loss graph using Bhagavad Gita Dataset
Using the Bible dataset, the training time of the model was 35 min, around 135 ms per epoch on the GPU. The model training loss stabilized at 400 epochs with a loss value below 0.8, as seen in Fig. 4. We can visualize the loss difference in both the training and validation phases: as epochs increase, the loss rate gradually decreases.
Fig. 4. Epoch Vs Loss graph using Bible Dataset
Figure 5 represents the epoch vs. accuracy graph for the Bhagavad Gita dataset. We can infer that at epoch 400 the model achieved 30% accuracy with the Bhagavad Gita dataset. As this is a pilot study with a smaller amount of data, the accuracy of the model can be further increased by enlarging the training set.
Fig. 5. Epoch Vs Accuracy graph using Bhagavad Gita Dataset
Figure 6 represents the epoch vs. accuracy graph for the Bible dataset. This result reflects that the Bhagavad Gita dataset contains only 701 lines while the Bible dataset contains 5,294; as the number of data lines increases, the model performs better and gives better accuracy (Table 1).
Fig. 6. Epoch Vs Accuracy graph using Bible Dataset
Table 1. Result Comparison.

| Dataset Name | No. of Lines | Execution Time | Result in accuracy |
|---|---|---|---|
| Bhagavad Gita | 701 | 7 min | 30% |
| Bible | 5294 | 35 min | 53% |
6 Conclusions and Future Scope

In this paper, we translated Sanskrit into English, having first reviewed techniques that were used previously. With the LSTM encoder-decoder architecture, we first cleaned the data. Additionally, we enhanced the LSTM model using the softmax activation function, and accuracy was somewhat improved. With the LSTM approach, we saw an accuracy of 53%. If we increase the number of lines in the data, better results can be achieved. Several unofficial repositories have worked on this task and published results, so we would not claim explicitly that our exploration of the LSTM encoder-decoder architecture is the first. To enhance the quality of the trained models, we will need to integrate additional data in the future. We intend to expand this work to include extrinsic evaluation tasks. Adding a GUI in the future can make this model more attractive and useful.
References

1. Punia, R., Sharma, A., Pruthi, S., Jain, M.: Improving neural machine translation for Sanskrit-English. ACL Anthology (2020). https://aclanthology.org/2020.icon-main.30/
2. Koul, N., Manvi, S.S.: A proposed model for neural machine translation of Sanskrit into English. Int. J. Inform. Technol. 13(1), 375–381 (2019). https://doi.org/10.1007/s41870-019-00340-8
3. Singha, M., Chenab, I., Kumara, R.: Corpus based machine translation system with deep neural network for Sanskrit to Hindi translation. ResearchGate (2019). https://www.researchgate.net/publication/340705524_Corpus_based_Machine_Translation_System_with_Deep_Neural_Network_for_Sanskrit_to_Hindi_Translation
4. Deshpande, D.S., Kulkarni, M.N.: A review of various approaches in machine translation for the Sanskrit language (2020). https://www.researchgate.net/publication/347909824_A_Review_on_various_approaches_in_Machine_Translation_for_Sanskrit_Language
5. Sitender, Bawa, S.: A Sanskrit-to-English machine translation using hybridization of direct and rule-based approach. Neural Comput. Appl. 33, 2819–2838 (2021). https://doi.org/10.1007/s00521-020-05156-3
6. Sandhan, J., Adideva, O., Komal, D., Behera, L., Goyal, P.: Evaluating neural word embeddings for Sanskrit. arXiv.org (2021). https://arxiv.org/abs/2104.00270
7. Sitender, Bawa, S.: SANSUNL: a Sanskrit to UNL enconverter system. IETE J. Res. 67(1), 117–128 (2018). https://doi.org/10.1080/03772063.2018.1528187
8. Vyawahare, A., Tangsali, R., Mandke, A., Litake, O., Kadam, D.: PICT@DravidianLangTech-ACL2022: neural machine translation on Dravidian languages. In: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages (2022). https://doi.org/10.18653/v1/2022.dravidianlangtech-1.28
9. Sitender, Bawa, S.: Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar. Multimedia Syst. (2020). https://doi.org/10.1007/s00530-020-00692-3
10. https://github.com/priyanshu2103/Sanskrit-Hindi-Machine-Translation/tree/main/parallel-corpus/sanskrit-english
11. https://medium.com/@vinayarun/from-scratch-an-lstm-model-to-predict-commodity-prices-179e12445c5a
Alzheimer Disease Investigation in Resting-State fMRI Images Using Local Coherence Measure

Sali Issa1(B), Qinmu Peng2, and Haiham Issa3

1 Electrical Information of Science and Technology, Hubei University of Education, Wuhan, China. [email protected]
2 Electrical Information and Communication, Huazhong University of Science and Technology, Wuhan, China
3 Electrical Engineering, Zarqa University, Zarqa, Jordan
Abstract. In this paper, an advanced voxel-based coherence measure is proposed for Alzheimer's disease detection and investigation. A public rs-fMRI dataset, including healthy elderly people as well as Alzheimer's and mild cognitive impairment patients, is used for evaluation purposes. First, several sequential pre-processing steps were performed to remove noise; then, the Local Coherence (LCOR) measure of the full frequency band was obtained within the first-level and group-level analysis. Finally, the proposed study accurately investigates the effect of LCOR connectivity, and discovered that the left Occipital Pole, left Cerebellum, right Superior Frontal Gyrus, and the left and right Caudate regions have a prominent role in Alzheimer's detection.

Keywords: rs-fMRI · Alzheimer Disease · Local Coherence (LCOR) · Group-Level Analysis

1 Introduction
Alzheimer disease (AD) is a type of dementia that starts with loss of episodic memory and cognitive functions, followed by deficiencies in language and visuospatial skills, and, in most cases, behavioural disorders. Many people develop Alzheimer's every year; it represents 60–70% of all dementia cases [1]. Early diagnosis of Alzheimer's disease could save lives and alleviate its dangerous symptoms [1,2]. Although neuroimaging techniques such as fMRI are a powerful diagnostic tool, many researchers and biomedical scientists are still competing to detect Alzheimer's in its early stages [2,3].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 227–236, 2023. https://doi.org/10.1007/978-3-031-35501-1_23

On the other hand, several artificial and computerized methods exploit resting-state fMRI neuroimages for Alzheimer's detection and investigation. For example, Yang et al. (2020) [4] proposed a spatiotemporal Network
Switching Rate (stNSR) parameter, calculated using Pearson correlation with the Louvain technique. They used a database of 171 subjects (61 Alzheimer's disease patients and 110 healthy people) from Xuanwu Hospital, Beijing, China, for verification. Compared with traditional calculations, stNSR parameter values provide a clear difference between patients and normal people in the left calcarine fissure and surrounding cortex, left lingual gyrus, left cerebellum, left parahippocampal gyrus, and left temporal and superior temporal gyrus. Shi et al. (2020) [3] used Independent Component Analysis (ICA) for voxel-activity estimation in the Alzheimer's Disease Neuroimaging Initiative (ADNI) public database. ICA values were calculated for both patient and normal groups. Their results show that ICA parameters of the prefrontal and parietal lobes have the main role in recognizing Alzheimer's disease. Baninajjar et al. (2020) [5] applied Canonical Correlation Analysis (CCA) for the fusion of magnetic resonance imaging (MRI) and functional MRI (fMRI) images from the ADNI public database. Sadiq et al. (2021) [6] calculated Pearson correlation analysis followed by the Relief feature selection method for recognizing Alzheimer's disease effects in two public rs-fMRI datasets. Both Baninajjar and Sadiq achieved acceptable results in the Alzheimer's diagnosis application. Mascali et al. (2015) [7] investigated the effect of Functional Connectivity (FC) and Amplitude of Low Frequency Fluctuation (ALFF) measures in fMRI images of Alzheimer's patients. A public dataset was used, which includes Alzheimer's patients, mild cognitive impairment patients, and healthy people. The two measures were recorded for three different frequency bands: 0.01–0.027 Hz; 0.027–0.073 Hz; and the full frequency band of 0.01–0.073 Hz.
They found that FC and ALFF are negatively correlated in Alzheimer patients in the anterior and posterior cingulate cortex within all frequency bands, in the temporal cortex within the full and 0.01–0.027 Hz bands, and in subcortical regions within the full and 0.027–0.073 Hz bands. Sadiq et al. (2021) [8] calculated the combination of Amplitude of Low Frequency Fluctuation (ALFF) and Fractional ALFF (fALFF) from the Blood Oxygen Level Dependent (BOLD) signal of fMRI data in the 0.01–0.1 Hz frequency band. Despite the power of fMRI in detecting this kind of brain disorder, working with neuroimages is not easy from several perspectives. Most important is the substantial noise in fMRI images, which requires several sequential pre-processing and denoising steps using toolboxes such as the CONN tool [3, 7]. Another point is the enormous dimensionality of fMRI images, which consumes a great deal of memory as well as analysis time. Consequently, most studies aim to provide robust and accurate results that overcome the complexity of fMRI calculations and analysis [6, 8]. At the same time, the need for early and accurate diagnosis of Alzheimer using fMRI images is urgent and is considered a competitive research field [1, 2].
Alzheimer Disease Investigation in Resting-State fMRI Images
In this paper, the Local Coherence (LCOR) parameters are calculated for a public database [9] including Alzheimer Disease (AD) patients, Mild Cognitive Impairment (MCI) patients, and healthy subjects (HC). First-level and group-level analyses are performed to investigate the effect of Alzheimer disease on brain regions. The overall contributions are summarized as follows:
1. An effective LCOR analysis is proposed to simulate and investigate the changes of voxel-level functional connectivity in AD patients against healthy elderly people.
2. Three different group-level comparisons were implemented to detect all possible positive/negative LCOR changes among the HC, AD, and MCI groups.
3. This study precisely reports the affected brain regions, with their effect sizes and MNI coordinates, in AD and MCI cases. Furthermore, it interprets the relation of these affected brain regions to common AD symptoms.
The remaining part of the paper is organized as follows: Sect. 2 presents the proposed methodology of pre-processing fMRI images and using the local coherence feature. Section 3 illustrates the analysis and results with further discussion, while the final section presents the overall conclusion and proposes possible directions for future work.
2 Methodology

2.1 Data Acquisition and Pre-processing
A public resting-state fMRI dataset of thirty subjects [9] was used: ten healthy elderly subjects, plus twenty patients diagnosed with Alzheimer Disease (AD) or Mild Cognitive Impairment (MCI). All participating subjects provided written consent prior to the MR session, and the collected database was approved by the ethics committee of the Santa Lucia Foundation. A Siemens Magnetom Allegra 3T MRI system (Siemens, Germany) was used for data acquisition. The resting-state fMRI images were scanned using Echo Planar Imaging (EPI) with a TR of 2080 ms, TE of 30 ms, a total of 32 axial slices parallel to the AC-PC plane, a matrix size of 64 × 64 with 3 × 3 mm² in-plane resolution, a slice thickness of 2.5 mm with 50% gap, and a flip angle of 70°. The session lasted 7 min and 20 s, during which subjects were instructed to close their eyes, relax and refrain from falling asleep. A 12-min T1-weighted three-dimensional equilibrium Fourier transform scan [10] was acquired for all participating subjects for anatomical localization and Grey Matter (GM) volumetry, with a TR of 1338 ms, TE of 2.4 ms, TI of 910 ms, flip angle of 15°, matrix size of 256 × 224 × 176, FOV of 256 × 224 mm², and slice thickness of 1 mm. Furthermore, Fluid Attenuated Inversion Recovery (FLAIR) images with a TR of 8170 ms, TE of 96 ms, and TI of 2100 ms were also collected from all subjects to exclude any remarkable sign of cerebrovascular disease [11].
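As a quick sanity check on these acquisition parameters (the volume count is not stated in the paper), the number of EPI volumes in the 7 min 20 s run follows directly from the TR:

```python
# estimate the number of EPI volumes acquired in the resting-state run
session_s = 7 * 60 + 20          # 7 min 20 s session length, in seconds
tr_s = 2.080                     # repetition time (TR) in seconds
n_volumes = int(session_s // tr_s)   # whole volumes that fit in the session
print(n_volumes)                 # about 211 volumes, before discarding any
```

With the first four images later discarded in pre-processing, roughly 207 volumes per subject remain for analysis.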
Resting-state fMRI images were preprocessed against noise and distortions using the CONN toolbox [12]. The sequential pre-processing steps are summarized as follows:
– The first four images were deleted.
– All brain images were corrected and realigned to the first one.
– The mean EPI image from the realignment step was taken as the source image for estimating the transformation parameters.
– All images were normalized into Montreal Neurological Institute (MNI) space coordinates, with a voxel size of 2 × 2 × 2 mm³.
– Brain images were smoothed using a full-width at half-maximum (FWHM) Gaussian kernel of 8 × 8 × 8 mm³.
– The six realignment parameters and the first five eigenvectors of the PCA decomposition of the EPI signal averaged over cerebrospinal fluid (CSF) and white matter (WM) were regressed out.
– Finally, the fMRI images were filtered using the full band of 0.01–0.073 Hz.
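The FWHM-to-σ conversion behind the smoothing step can be sketched as follows; this is a generic NumPy illustration (the kernel radius is an arbitrary choice), not the CONN implementation:

```python
import numpy as np

def fwhm_to_sigma(fwhm_mm, voxel_mm):
    # FWHM = 2*sqrt(2*ln 2) * sigma ≈ 2.3548 * sigma; return sigma in voxels
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm

def gaussian_kernel_1d(sigma_vox, radius=8):
    # separable 1-D kernel; applying it along x, y and z gives 3-D smoothing
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * sigma_vox ** 2))
    return k / k.sum()   # normalize so smoothing preserves mean intensity

sigma = fwhm_to_sigma(8.0, 2.0)   # 8 mm FWHM on the 2 mm MNI grid ≈ 1.70 voxels
kernel = gaussian_kernel_1d(sigma)
```

The 8 mm kernel thus corresponds to a Gaussian with σ of roughly 1.7 voxels on the 2 × 2 × 2 mm³ grid.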
2.2 Local Coherence (LCOR) Parameter
Voxel-level network measures illustrate the functional connectome between each voxel pair in fMRI brain images. Some of these measures address the properties of voxel pairs, represent those properties for each individual subject (first-level analysis), and estimate them at the group level [13]. Intrinsic connectivity and local and global correlations are common examples of this kind of network measure [14]. Local Coherence (LCOR) measures the local coherence of each voxel, estimated from the strength of connection between the voxel and its neighbouring area [12]. LCOR is calculated as the weighted average of correlation coefficients between each voxel and the voxels in its neighbourhood [12]. Equation 1 defines the LCOR measure [12]:

LCOR(x) = ∫ w(x − y) r(x, y) dy / ∫ w(x − y) dy    (1)

where LCOR(x) is the local coherence at voxel x; y ranges over neighbouring voxels; r is the voxel-to-voxel correlation matrix for each pair of voxels (Eq. 3); and w is the weighting function, given by [12]:

w(z) = e^(−|z|² / 2σ²)    (2)
where w(z) is an isotropic Gaussian weighting function and σ determines the size of the local neighbourhood. The voxel-to-voxel correlation matrix r is defined using orthogonal Singular Value Decomposition (SVD) components as follows [12, 15]:

r(x, y) = Σ_{k=1}^{m} σ_k² Q_k(x) Q_k(y) + ε_m(x, y)    (3)
Q_k(x) = argmin ∬ ε_k²(x, y) dx dy    (4)
where r is the voxel-to-voxel correlation matrix; Q_k is the orthogonal spatial basis of the m maximal-variance spatial components (eigenvectors) of r; and σ_k are the eigenvalues of r, characterizing the variance of these components. For computational simplicity, m is chosen to be lower than the rank of r, which is known as subject-level dimension reduction. Besides its simplicity, dimension reduction is necessary for subject-level denoising and for minimizing potential differences in the effective degrees of freedom of the residual BOLD signal among subjects [12, 15].
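To make Eqs. (1)–(2) concrete, the sketch below computes a toy LCOR map with NumPy: each voxel's time series is z-scored so that dot products become correlations, which are then averaged with Gaussian weights over the neighbourhood. The array sizes and σ are arbitrary; this illustrates the definition only, not the CONN implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((6, 6, 6, 40))   # toy (x, y, z, time) volume

def lcor(data, sigma=1.5):
    nx, ny, nz, nt = data.shape
    # z-score each voxel's time series so dot products become correlations
    ts = data - data.mean(axis=-1, keepdims=True)
    ts /= np.linalg.norm(ts, axis=-1, keepdims=True) + 1e-12
    out = np.zeros((nx, ny, nz))
    coords = np.array(np.meshgrid(range(nx), range(ny), range(nz), indexing="ij"))
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                d2 = (coords[0] - x) ** 2 + (coords[1] - y) ** 2 + (coords[2] - z) ** 2
                w = np.exp(-d2 / (2.0 * sigma ** 2))   # Eq. (2) weights
                w[x, y, z] = 0.0                       # exclude self-correlation (= 1)
                r = np.tensordot(ts, ts[x, y, z], axes=([-1], [0]))  # correlations
                out[x, y, z] = (w * r).sum() / w.sum()  # Eq. (1), discretized
    return out

m = lcor(data)
```

Each entry of `m` is a Gaussian-weighted average of correlation coefficients, so it is bounded by the largest correlation in the neighbourhood; for uncorrelated random data the map hovers near zero.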
3 Results and Discussion
All experiments and analyses were performed with the CONN Matlab-based toolbox [12]. First, fMRI images were pre-processed and denoised as discussed in Sect. 2 within the full frequency band of 0.008–0.09 Hz. Three groups (AD, MCI and HC) were created, and the Local Coherence (LCOR) connectivity analysis was computed individually for each subject, with a dimensionality reduction of 64 and a kernel size of 25 mm. Figure 1 presents the LCOR connectivity effects for HC, MCI and AD subjects. Yellow (+1) and magenta (−1) colours refer to positive and negative effects, respectively, while the numbers next to the images refer to z coordinates in Montreal Neurological Institute (MNI) space. For comparison purposes, a group-level analysis was implemented for the three groups, considering the following scenarios:
1. The effect of LCOR connectivity of the HC group against MCI and AD patients: in this scenario, the statistical analysis checks the noticeable positive/negative functional connectivity of the HC group against the MCI and AD patients. Figure 2 presents the observable LCOR changes of normal people against MCI and AD patients, while Table 1 provides complete information on the affected brain regions in MNI coordinates. According to the obtained results, the left Occipital Pole at MNI coordinates (−44 −90 +12) has the lowest LCOR values compared with MCI and AD patients. It is located in the posterior portion of the occipital lobe, receives a dual blood supply, and contributes to some parts of visual function [16]. Both the Vermis 7 and Angular Gyrus regions have high LCOR values in healthy elderly people. The Angular Gyrus at MNI coordinates (−46 −36 +52) lies in the posteroinferior region of the parietal lobe and is involved in transferring visual signals to Wernicke's area, as well as in language, spatial cognition, memory retrieval, and attention [17]. On the other hand, Vermis
Fig. 1. Color-coded map of first-level LCOR analysis for three different subjects from HC, MCI and AD groups, respectively.
Fig. 2. LCOR parameter positive/negative effect of HC group against MCI and AD patients.
7, at MNI coordinates (−08 −58 −18), is located within the cerebellar cortex and is responsible for eye movements and for fine-tuning body and limb movements [18].
2. The effect of LCOR connectivity of the HC group against AD patients: in this scenario, the statistical analysis checks the noticeable positive/negative effect of the HC group against AD patients only. Figure 3 presents the observable LCOR changes of normal people against AD patients, while Table 2 provides complete information on the affected brain regions in MNI coordinates. The Occipital Pole region at MNI coordinates (−44 −90 +12), once more, shows a noticeable decrease in the LCOR parameter, together with MNI coordinates (−08 −54 −20) within the Cerebellum region, which is responsible for fine and balanced movements, posture, and motor learning [19].
3. The difference in LCOR connectivity values between HC and AD patients: in this scenario, the statistical analysis checks the difference in the LCOR parameter of the HC group against AD patients (HC < AD). Figure 4 presents the LCOR parameter differences between normal people and AD patients, while Table 3 provides full information on the affected brain regions in MNI coordinates. There is an obvious difference in LCOR connectivity between normal people and AD patients in three brain regions: the Caudate at MNI coordinates (−16 −14 +30) and (+10 +22 +18) for the left and right hemispheres, respectively, and the Superior Frontal Gyrus at MNI coordinates (+00 +28 +52). The Caudate belongs to the Basal Ganglia and plays a main role in human motor processes, procedural and associative learning, and inhibitory action control [20]. The Superior Frontal Gyrus forms approximately one-third of the frontal lobe and is responsible for skeletal movements, speech control, emotional expression, and mental actions [21].
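A group-level contrast like those above amounts to a voxel-wise two-sample t-test on the subjects' LCOR values. The sketch below uses made-up per-subject values for a single voxel (not the study's data) to show the shape of the computation:

```python
import numpy as np

# hypothetical per-subject LCOR values at one voxel (10 HC vs 10 AD subjects)
hc = np.array([0.36, 0.33, 0.38, 0.31, 0.35, 0.37, 0.34, 0.32, 0.36, 0.35])
ad = np.array([0.26, 0.24, 0.28, 0.22, 0.25, 0.27, 0.23, 0.26, 0.24, 0.25])

def two_sample_t(a, b):
    # pooled-variance Student t statistic, as in a standard group contrast
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

t = two_sample_t(hc, ad)   # positive t: HC > AD at this voxel
```

Repeating this test at every voxel (with an appropriate multiple-comparison correction) yields peak t-value maps of the kind reported in Tables 1–3.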
Table 1. The positive/negative effect in brain regions of HC group against MCI and AD patients.

Brain Region    Hemisphere  Voxels  MNI coordinates  Peak t-value
Occipital Pole  L           2255    −44 −90 +12      −51.72
Vermis 7        —           191     −08 −58 −18      21.31
Angular Gyrus   L           575     −46 −36 +52      19.04
Fig. 3. LCOR parameter positive/negative effect of HC group against AD patients only.

Table 2. The positive/negative effect in brain regions of HC group against AD patients only.

Brain Region    Hemisphere  Voxels  MNI coordinates  Peak t-value
Occipital Pole  L           2204    −44 −90 +12      −48.61
Cerebellum 6    L           945     −08 −54 −20      23.38
Table 3. The differences of LCOR parameters in brain regions between HC and AD patients.

Brain Region            Hemisphere  Voxels  MNI coordinates  Peak t-value
Caudate                 L           7       −16 −14 +30      −6.39
Superior Frontal Gyrus  R           66      +00 +28 +52      −5.82
Caudate                 R           4       +10 +22 +18      −5.52
In summary, the achieved results demonstrate the influential role of LCOR analysis in distinguishing normal people from AD and MCI patients. The Occipital Pole shows highly negative LCOR values in healthy elderly subjects against AD and MCI patients, while the right Superior Frontal Gyrus and both left and right Caudate regions show a noticeable difference in the LCOR measure between healthy elderly subjects and AD patients. Compared with the current literature, this
Fig. 4. LCOR parameter differences between HC and AD patients.
study simulates and interprets the common symptoms of AD, such as mental action problems, movement imbalance, and impaired action control.
4 Conclusion
This study demonstrates the role of the LCOR connectivity measure in Alzheimer disease detection and investigation. The affected brain regions, with their exact corresponding MNI coordinates, are given for AD investigation. Furthermore, interpretations of common AD symptoms are provided and linked to the affected brain regions in AD patients. For future work, the proposed investigation method should be applied to more fMRI datasets, and further analysis improvements and machine learning techniques are required to characterize the exact difference between the MCI and AD disorders.
References
1. Silva, M., Loures, C., Alves, L., Souza, L., Borges, K., Carvalho, M.: Alzheimer's disease: risk factors and potentially protective measures. J. Biomed. Sci. 26(33) (2019)
2. Kanaga, P., Mohamed, A., Naleen, J., Logesh, E.: Early detection of Alzheimer disease in brain using machine learning techniques. In: International Conference on Smart Structures and Systems (ICSSS). IEEE, Chennai (2022)
3. Shi, Y., Zeng, W., Deng, J., Nie, W., Zhang, Y.: The identification of Alzheimer's disease using functional connectivity between activity voxels in resting-state fMRI data. Adv. Intell. Technol. Dementia, IEEE 4 (2020)
4. Yang, F., Li, Y., Han, Y., Jiang, J.: Use of multilayer network modularity and spatiotemporal network switching rate to explore changes of functional brain networks in Alzheimer's disease. In: Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, Montreal (2020)
5. Baninajjar, A., Zadeh, H., Rezaie, S., Nejad, A.: Diagnosis of Alzheimer's disease by canonical correlation analysis based fusion of multi-modal medical images. In: The 8th IEEE International Conference on E-Health and Bioengineering (EHB). IEEE, Web Conference, Romania (2020)
6. Sadiq, A., Yahya, N., Tang, T.: Diagnosis of Alzheimer's disease using Pearson's correlation and ReliefF feature selection approach. In: International Conference on Decision Aid Sciences and Application (DASA). IEEE, Sakheer (2022)
7. Mascali, D., et al.: Intrinsic patterns of coupling between correlation and amplitude of low-frequency fMRI fluctuations are disrupted in degenerative dementia mainly due to functional disconnection. PLoS ONE 10(4), 1–18 (2015)
8. Sadiq, A., Yahya, N., Tang, T.: Classification of Alzheimer's disease using low frequency fluctuation of rs-fMRI signals. In: International Conference on Intelligent Cybernetics Technology and Applications (ICICyTA). IEEE, Bandung (2022)
9. Mascali, D., et al.: Resting-state fMRI in dementia patients. Harvard Dataverse (2015). https://doi.org/10.7910/DVN/29352
10. Deichmann, R., Schwarzbauer, C., Turner, R.: Optimisation of the 3D MDEFT sequence for anatomical brain imaging: technical implications at 1.5 and 3 T. Neuroimage 21(2), 757–767 (2004)
11. Serra, L., Giulietti, G., Cercignani, M., Spanò, B., Torso, M., Castelli, D., et al.: Mild cognitive impairment: same identity for different entities. J. Alzheimers Dis. 33(4), 1157–1165 (2013)
12. Whitfield-Gabrieli, S., Nieto-Castanon, A.: CONN: a functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connect. 2(3), 125–141 (2012)
13. Network measures, CONN Toolbox. https://web.conn-toolbox.org/fmri-methods/connectivity-measures/networks-voxel-level. Accessed 19 Oct 2022
14. Deshpande, G., LaConte, S., Peltier, S., Hu, X.: Integrated local correlation: a new measure of local coherence in fMRI data. Hum. Brain Mapp. 30(1), 13–23 (2009)
15. Calhoun, V.D., Adali, T., Pearlson, G.D., Pekar, J.J.: A method for making group inferences from functional MRI data using independent component analysis. Hum. Brain Mapp. 14(3), 140–151 (2001)
16. Hacking, C.: Occipital pole. Radiopaedia. https://radiopaedia.org/articles/occipital-pole. Accessed 19 Oct 2022
17. Seghier, M.L.: The angular gyrus. Neuroscientist 19(1), 43–61 (2013)
18. Ghez, C., Fahn, S.: The Cerebellum. Principles of Neural Science, 2nd edn, pp. 502–522. Elsevier, New York (1985)
19. Fine, E., Ionita, C., Lohr, L.: The history of the development of the cerebellar examination. Semin. Neurol. 22(4), 375–384 (2002)
20. Malenka, R., Nestler, E., Hyman, S.: Molecular Neuropharmacology: A Foundation for Clinical Neuroscience, pp. 147–148. McGraw-Hill Medical, New York (2009)
21. Superior Frontal Gyrus. http://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=83. Accessed 19 Oct 2022
Enhanced Network Anomaly Detection Using Deep Learning Based on U-Net Model

P. Ramya, S. G. Balakrishnan(B), and A. Vidhiyapriya

Department of Computer Science and Engineering, Mahendra Engineering College, Namakkal, Tamil Nadu, India
[email protected]
Abstract. Enhanced network anomaly detection is an open topic that aims to recognize network traffic for security purposes. Anomaly detection is currently one of the critical challenges in many areas: as data multiplies, tools are needed to process and analyze different data types. The purpose of an anomaly detection method is to detect when an entity differs from its normal behaviour. Due to the increasing complexity of computations and the characteristics of the data, it is not easy to choose one tool for all types of anomalies. To overcome these issues, this work proposes Support Vector Regression Boosting (SVRB), used for anomaly detection, together with optimization-based deep learning rates to manage the nearest update and a static structure for random weight vector features. These nonlinear methods assess the elements in hidden features and their time complexities. Our model provides a complete description of the recurrence patterns resulting from complex traffic dynamics during "noisy" network anomalies, characterized by computable changes in the statistical properties of the traffic time series. The UNIBS dataset is used to evaluate the performance of the enhanced Support Vector Regression Boosting (SVRB). The simulation results show that the improved SVRB can achieve higher accuracy using this dataset's features.

Keywords: Network traffic · Anomaly detection · hidden · random weight · KDD Cup99 · Support Vector Regression Boosting (SVRB) · nonlinear techniques
1 Introduction

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 237–249, 2023. https://doi.org/10.1007/978-3-031-35501-1_24

Digital protection is a main concern in the present digital age, with billions of computers around the world connected to networks. Lately, the number of cyber attacks has increased significantly. Hence, cyber threat detection aims to identify these attacks by monitoring traffic data over a period of time and distinguishing abnormal behaviour from normal traffic. Network Anomaly Detection (NAD) is a technology that supports network security by identifying threats from rare patterns in traffic. Over the years, anomaly detection systems have been developed based on statistical algorithms, data mining techniques, and AI. Since most NAD techniques commonly depend on building models of typical behaviour, the resulting models can identify abnormal patterns. NAD systems have different
learning modes such as supervised, semi-supervised and unsupervised learning. One of the principal benefits of using domain ontologies is the ability to define a semantic model of the data and related domain knowledge. Ontologies can also describe associations between various types of semantic information; thus, an ontology can be used to develop different information retrieval techniques. Deep learning techniques have attracted attention recently because deep neural networks can learn complex traffic patterns directly from network traffic data. However, real traffic data is large, noisy, and class-imbalanced: samples are unevenly distributed, rare in anomalous traffic and numerous in standard traffic. Most existing network datasets do not meet realistic requirements and are not suitable for modern networks. Likewise, traditional datasets, such as KDD Cup99 and UNSW-NB15, have been studied extensively. Deep learning for anomaly detection, or simply deep anomaly detection, aims to learn feature representations or anomaly scores through neural networks. Several anomaly detection techniques have been presented that can substantially outperform traditional anomaly detection in addressing challenging detection problems in various real-world applications.
Fig. 1. Deep learning-based U-Net Anomaly detection
Figure 1 depicts U-Net anomaly detection based on deep learning; anomalous network traffic can be unauthorized network control or an external network attack. The problem comprises three fundamental stages. First, we monitor and collect anomalous network traffic data. The raw data is then cleaned and transformed into the two-input format expected by the next stage. Finally, a classification engine identifies network traffic as normal or anomalous. Support Vector Regression Boosting (SVRB) is proposed in light of this view, and it resembles the characteristics of the human brain. SVRB is also influenced by initial
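The three stages can be sketched as a minimal pipeline skeleton; the traffic records and the simple threshold rule below are hypothetical stand-ins for the paper's U-Net/SVRB classifier:

```python
# minimal three-stage sketch: collect -> preprocess -> classify
def collect():
    # stand-in for traffic monitoring: (bytes_per_s, packets_per_s) samples
    return [(1200.0, 10.0), (1150.0, 9.0), (98000.0, 900.0)]

def preprocess(records):
    # clean and scale into the two-input format used by the classifier
    max_b = max(r[0] for r in records)
    max_p = max(r[1] for r in records)
    return [(b / max_b, p / max_p) for b, p in records]

def classify(sample, threshold=0.5):
    # toy engine: flag traffic whose scaled load exceeds the threshold
    return "anomalous" if max(sample) > threshold else "normal"

labels = [classify(s) for s in preprocess(collect())]
```

In the proposed system, the `classify` stage would be replaced by the trained SVRB/U-Net model rather than a fixed threshold.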
weight vectors that correspond to the input modes. The function used to compare two vectors has some effect on clustering results. Given these aspects, we propose improved self-organizing maps for anomaly detection, with suitable initial weight vectors and an appropriate comparison function for similarity. Comparing the improved algorithm with conventional SVRB, we find that the improved SVRB achieves a higher accuracy rate. Machine learning-based SVRB has become a significant field for network intrusion detection, further developing system attack detection capability and ensuring network security. In information management systems, structured query languages are the means of information retrieval. Writing structured queries is a robust approach to data access, since it enables end users to create complex database queries by learning a specific query language. However, query generation is hard for computer users at various levels, beyond some visual query generation and refinement techniques. Lately, information retrieval has become more complex with the increasing use of data mining, decision support, and business analytics applications. Consequently, researchers focus on techniques including visual database interfaces and interactive query creation through diagrams. These are particularly significant in providing intuitive natural language interfaces to support query creation. More recently, semantic approaches using domain ontologies have been adapted to data modelling and information retrieval. The principal objective of ontology-based information retrieval is to improve the interface between data and search queries, bringing results closer to the user's analysis needs.

1.1 Contribution of the Research

• Different malware models vary in their ability to use big data techniques.
• Efficient visual detection of malware in real time using a scalable hybrid deep learning framework.
• The framework can process large numbers of malware samples in real time and on demand.
2 Preliminary Works

Anomalies located in the hazard space between the platform door and the train undermine the safe operation of the metro transport system. Most reconstruction-based anomaly detection methods cannot solve the problems of missing anomaly samples, highly imbalanced anomaly classes, and anomaly reconstruction defects [1]. A spectral-domain hyperspectral anomaly detection method is proposed based on the fractional Fourier transform (FFT) and saliency-weighted composite representation [2]. A deep residual Convolutional Neural Network (CNN) avoids exploding and vanishing gradients and ensures accuracy [3]. Anomaly detection (AD) frequently identifies objects significantly different from their surrounding neighbours but cannot separate the detected objects from one another [4]. With no prior knowledge of the data, a classification could not be made.
Complex and heterogeneous backgrounds, unknown prior information, and uneven models make it hard to separate the background from anomalies. Several features extracted from radar echoes of various domains were unable to recognize the target independently [5, 6]. Most popular techniques for hyperspectral anomaly detection focus on using complex algorithms to further improve accuracy, making it hard to balance gains in performance against complexity. A Generative Adversarial Network (GAN)-based fall detection technique uses a heart-rate sensor and an accelerometer; acquiring fall data can be a daunting task compared with normal behavioural data. Current GAN-based anomaly detection is partly addressed by the User Initial Generative Adversarial Network (UI-GAN) [7, 8]. An averaging technique then consolidates the multi-scale detection maps, and the fused detection maps select anomaly and background training samples [9]. Effective anomaly detection is fundamental to improving operational reliability and power generation; however, this is a difficult task, as it is highly complex and frequently raises various exceptional cases [10]. The number and complexity of new attacks are continually growing, so an efficient and intelligent solution is required. Unsupervised methods are extremely appealing for intrusion detection systems since they can recognize unknown attacks and zero-day attacks [11]. Two key limitations adversely affect the anomaly detection of these techniques: 1) not considering spatial pixel relationships and clutter correlation, and 2) low spatial resolution and high spectral resolution, resulting in mixed pixels and a high false-positive rate [12]. In the logistics networks of Industry 4.0, network attacks frequently occur, posing a threat to cybersecurity [13].
Cybersecurity anomaly detection is also considered in systems where data is distributed over a network and every node has a locally unknown sensitivity and probabilistic data model [14]. Data-driven solutions are designed and analyzed in which every node observes a stream of high-dimensional data and computes a local outlier score. Existing semi-supervised anomaly detection methods are generally trained on large amounts of labelled normal data and suffer from high labelling cost. Automatic classification is a significant task that uses learning techniques to assign classes to data objects via a learned class input function [16]. Novel unsupervised anomaly detection and localization methods are based on a deep spatiotemporal transformation network (DSTN), a generative adversarial network (GAN) and edge wrapping (EW); training uses only frames of normal events to generate the dense optical flow associated with temporal features [17]. Another issue is the absence of data indicating abnormal behaviour: abnormal behaviour is often costly to classify, so collecting sufficient data to represent this behaviour is complex, which makes developing and evaluating anomaly detection techniques difficult [18]. For intelligent surveillance video, anomaly detection is critical, but object identification and localization issues are frequently encountered in present work due to congestion and complex scenes [19]. Existing methods are slow because they operate on the entire
dataset. Also, the self-similarity of the data is rarely considered when a vehicle crosses unusual anomalies [20]. Deep learning techniques for anomaly detection (AD) have recently improved detection performance on complex datasets such as large collections of images and text. As the primary defence for network infrastructures, intrusion detection systems are expected to adapt to the changing threat landscape [21, 22]. The continued growth of Internet of Things (IoT) devices has created a large attack surface for cybercriminals to carry out highly damaging cyber attacks; as a result, cyber attacks in the security industry are growing exponentially [23]. The use of attribute networks in modelling various complex systems and anomaly detection in attribute networks has attracted considerable research interest. However, methods using graph autoencoders as the backbone do not fully utilize the network's rich information and are not optimized for performance [24]. An unavailable element or link can cause downtime and result in financial and performance losses [25]. It is also necessary to handle complex, high-dimensional data effectively: data from network connections show a high degree of nonlinearity, which explains why performance improvements are difficult [26]. Recent work approaches anomaly detection with various deep learning techniques, using feature extraction methods that describe the whole image by a single feature vector called a global feature [27]. Several methods have been proposed to extract critical metrics from massive data to represent the state of the overall system; by using these measurements to detect discrepancies in time, possible accidents and economic losses can be prevented [28]. Trust-based multimedia analytics paradigms are essential to meet growing user demands and provide more timely and actionable insights [29].
Deep learning is one of the exciting techniques that has recently been widely adopted in intrusion detection systems (IDS) to improve the effectiveness of protecting computer networks and hosts [30].

2.1 Problem Identification

• Lacks fast execution for detecting a malicious attack.
• Failure to discuss the importance of features for detecting malware in real time.
• A lot of time is spent analyzing runtime behaviours.
3 Implementation of the Proposed Method

We consider a machine learning network for the malware dynamic gesture classification task, and develop and test a machine learning-based Support Vector Regression Boosting (SVRB) algorithm. A malware detection framework powered by a machine learning classifier is proposed. The sequence of application programming interface (API) calls and the frequency of API calls are passed to similarity-based mining and machine learning methods to handle malicious code variants. A comprehensive experimental analysis of a large dataset is conducted to propose a unified framework for feature extraction from malware binaries. The proposed method is faster than static and dynamic analysis because it works on raw bytes and avoids disassembly and processing. On the other hand, bundled malware variants of unrelated malware may have visual similarities.
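The frequency-of-API-calls feature can be illustrated with hypothetical call traces; here the similarity-based mining step is reduced to cosine similarity between frequency vectors (the API names are invented for illustration, not taken from the paper's dataset):

```python
from collections import Counter
import math

# hypothetical API-call traces from two malware samples
trace_a = ["CreateFile", "WriteFile", "RegSetValue", "WriteFile"]
trace_b = ["CreateFile", "WriteFile", "WriteFile", "WriteFile"]

def freq_vector(trace, vocab):
    # count how often each API in the shared vocabulary is called
    c = Counter(trace)
    return [c[api] for api in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

vocab = sorted(set(trace_a) | set(trace_b))
sim = cosine(freq_vector(trace_a, vocab), freq_vector(trace_b, vocab))
```

High similarity between frequency vectors suggests the two samples are variants of the same malicious code, which is the intuition behind similarity-based mining of API call behaviour.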
242
P. Ramya et al.
Fig. 2. Proposed block diagram
Figure 2 describes the analytical and detection methods, implemented using big data analytics and machine learning, that are used to develop general detection models. Learning from big data makes machine learning more efficient.

3.1 Preprocessing Stage or Filtering the Data

Data preprocessing is essential to avoid noisy, missing and inconsistent data in the dataset. Because dataset records are collected from multiple heterogeneous sources, data quality is reduced and preprocessing is required. Many factors influence the data, including accuracy, completeness, consistency, timeliness, reliability and interpretability. The proposed preprocessor checks for null values and efficiently fills missing values.

Steps for Preprocessing. The goal of the preprocessing stage is to reduce the data as much as possible without losing information; special planning, training, and testing are also required to analyze the system. The following steps are applied:
Enhanced Network Anomaly Detection
243
• Select the best, adequate calculation data for the dataset.
• Filter errors to reduce the false detection rate.
• Discover the attack mode to formulate the policy, then display the corresponding data type to the administrator.

Data filtering preprocesses the dataset into model form. The unsupervised anomaly detection technique assumes that on average only 25% of the input data is anomalous, whereas 75% of the UNIBS collection of records were flagged as abnormal; we therefore used subsampling to create a dataset with the required sizes.

3.2 Extracting the Features Using a Boosting Algorithm

Machine learning and big-data-based anomaly detection rest on training and testing features within the overall security architecture, and have become an effective security tool. Feature extraction is an essential preprocessing step that can be thought of as a pattern recognition system; it includes the selection of the feature structure and function.

Steps for Feature Extraction
Input: F = {f1, f2, …, fn} (filtered or normalized features)
Output: S (selected features)
Step 1: Load the filtered or normalized features F
Step 2: Create an empty dataset S to save the feature scores
Step 3: Train the boosting classifier to extract features
Step 4: Generate the feature score values
Step 5: Set F1 as the threshold value Ftp
Step 6: For each f in F do
            If score(f) > Ftp then add f to S
        End for
Step 7: Use the feature scores in S to generate the final feature set
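The steps above can be sketched as follows, assuming scikit-learn's gradient-boosting feature importances stand in for the paper's Feature Rank scores; the threshold value `f_tp` and the synthetic data are illustrative assumptions, not from the paper.

```python
# Minimal sketch of boosting-based feature scoring and threshold selection.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

booster = GradientBoostingClassifier(n_estimators=50, random_state=0)
booster.fit(X, y)                       # Step 3: boosting classifier
scores = booster.feature_importances_   # Step 4: one score per feature

f_tp = 0.05                             # Step 5: threshold Ftp (assumed value)
S = [i for i, s in enumerate(scores) if s > f_tp]  # Step 6: keep high scorers
```

Since the importances are normalized to sum to one, at least one feature always exceeds a small threshold such as 0.05.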
where F1 is the feature filtering, ai the variable initialization, and Ftp the feature threshold. The boosting algorithm can compute scores for all features in a specified dataset; this total metric is labelled the Feature Rank.

3.3 Classification for Machine Learning Algorithms Using Support Vector Regression Boosting (SVRB) Based on a U-Net Model

In the U-Net-based model, abnormal behaviors in observational scenarios are typically behavioral events that differ significantly from normal behavior. Moreover, the sample of abnormal-behavior cases is small, much smaller than the sample of normal
behavior. Anomalous behavior is determined based on constraints such as the reconstruction error of predicted latent feature vectors, learned from frames and sequences. In Support Vector Regression Boosting (SVRB), the dataset is transformed into a high-dimensional feature space using kernels for the U-Net-based abnormality model. Packet traffic and response metrics are collected during periods when the structure of the original reference training set is not corrupted. All dimensions are scaled between 0 and 1 to reduce the time required to compute the SVRB model. After training, the packet flow measurements are fed as input to the SVRB model, and its predictions are compared with the measured current reflection factors:

a_x = a_ux − a_vx,  x = 1, 2, …, n

• a_x – difference between the predicted scores
• a_ux, a_vx – feature scores

This equation defines the prediction error and the structure of expected behaviours.

Steps for SVRB
Step 1: Initialize the dataset
Step 2: While the stop condition is not met:
            For x = 1 to the number of particles:
                Choose attributes
                Segment the data into training data and test data
                Train the model on the training data
Step 3: Classify the test data with SVRB
Step 4: Store the anomaly detection rate in an array
            Next x; update the node position; repeat for the next group until the ending criterion is met
Step 5: End

Here, SVRB is Support Vector Regression Boosting and x indexes the feature values. A normal record is labelled 1 and every attack is labelled 2; the classes are then selected with the help of the features, and the SVRB algorithm performs binary classification. Anomaly detection adapts the category that best predicts the anomalies in the group, taken as the best of the elements. In each group, feature selection and classification are performed.
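A hedged sketch of an SVRB-style detector following the steps above: the paper does not give an exact formulation, so this example boosts support-vector regressors with AdaBoost and thresholds the continuous output into the labels described above (1 = normal record, 2 = attack). The synthetic dataset, RBF kernel, 1.5 threshold, and split ratio are our assumptions.

```python
# Boosted SVR ensemble whose regression output is thresholded into two classes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
y = y + 1                                  # map {0, 1} -> {1, 2} as in the paper

X = MinMaxScaler().fit_transform(X)        # scale all dimensions to [0, 1]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=1)

svrb = AdaBoostRegressor(SVR(kernel="rbf"), n_estimators=10, random_state=1)
svrb.fit(X_tr, y_tr)                       # boosted SVR ensemble

pred = np.where(svrb.predict(X_te) >= 1.5, 2, 1)  # binary decision
rate = (pred == y_te).mean()               # anomaly detection rate
```

Scaling to [0, 1] before fitting matches the paper's note that it reduces SVRB computation time.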
4 Result and Discussion

The tests presented in this work were performed on a Windows 10 OS with an Intel(R) Core(TM) i7-8568U CPU @ 1.80 GHz (boost 1.99 GHz).
Table 1. Proposed simulation parameters

Parameters           | Values
Tool                 | Anaconda
Language             | Python
Total number of data | 1500
Training             | 1000
Testing              | 500
Method               | SVRB
We create, train, evaluate, and test ML models with the Scikit-Learn ML Python framework. Scikit-Learn makes use of Matplotlib, NumPy and the SciPy Python library. In addition, Scikit-Learn can be used to perform all classification, regression, and clustering tasks. Table 1 shows the simulation parameters for the dataset; Support Vector Regression Boosting is used to evaluate the training and testing data while analyzing the different methods.
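The Table 1 split (1,500 records: 1,000 for training, 500 for testing) can be sketched with Scikit-Learn as follows; the synthetic data stands in for the real dataset.

```python
# Reproduce the train/test sizes from Table 1 with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=1000, test_size=500, random_state=0)
```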
Fig. 3. Analysis of Detection Accuracy
Figure 3 shows the accuracy analysis on the dataset, comparing the existing and proposed systems. The proposed Support Vector Regression Boosting (SVRB) achieves 92% detection accuracy, outperforming previous methods.
Fig. 4. Analysis of Performance Metrics
Figure 4 shows the precision and recall analysis on the dataset, using true and false values to evaluate the confusion matrix. Comparing the existing and proposed systems, the proposed Support Vector Regression Boosting (SVRB) method reaches 92%, better than previous methods.
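For reference, the precision and recall values follow from confusion-matrix counts in the usual way; the labels below are illustrative, not the paper's data.

```python
# Precision and recall derived from confusion-matrix counts.
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)   # fraction of flagged records that are attacks
recall = tp / (tp + fn)      # fraction of attacks that are flagged
```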
Fig. 5. Analysis of False Rate
Figure 5 describes the false rates on the dataset, comparing the results of the previous and proposed methods. The proposed Support Vector Regression Boosting (SVRB) reduces the false rate by 41% compared with previous methods.
Fig. 6. Analysis of time complexity
Figure 6 analyzes the time complexity by comparing the existing and proposed methods. Support Vector Regression Boosting (SVRB) reduces the computation time to 38 s, making it efficient at finding attacks quickly.
5 Conclusion

Anomaly detection and classification techniques for network security have been proposed. The approach demonstrates a model fusion that combines binary normal/attack SVRBs to detect arbitrary attacks with multiple attack-specific SVRBs to classify attacks. In addition, this scheme addresses the problem of highly unbalanced traffic data at the million level. The proposed solutions are trained, validated and tested using real-world datasets, with encouraging results. Moreover, the proposed solution is crucial in reducing the false positive rate suffered by most NAD systems: false positives tend to make NAD systems less reliable, so reducing the false positive rate increases the robustness and reliability of the NAD system. At the same time, the proposed solution's low false positive rate does not reduce its ability to detect real-world attacks.
References 1. Liu, R., et al.: Metro anomaly detection based on light strip inductive key frame extraction and MAGAN network. IEEE Trans. Instrum. Meas. 71, 1–14 (2022). Article no. 5000214. https://doi.org/10.1109/TIM.2021.3128961 2. Yousef, W.A., Traoré, I., Briguglio, W.: UN-AVOIDS: unsupervised and nonparametric approach for visualizing outliers and invariant detection scoring. IEEE Trans. Inf. Forensics Secur. 16, 5195–5210 (2021). https://doi.org/10.1109/TIFS.2021.3125608 3. Zhao, C., Li, C., Feng, S., Su, N., Li, W.: A spectral-spatial anomaly target detection method based on fractional Fourier transform and saliency weighted collaborative representation for hyperspectral images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 13, 5982–5997 (2020). https://doi.org/10.1109/JSTARS.2020.3028372 4. Wang, W., et al.: Anomaly detection of industrial control systems based on transfer learning. Tsinghua Sci. Technol. 26(6), 821–832 (2021). https://doi.org/10.26599/TST.2020.9010041
5. Wang, Y., et al.: A posteriori hyperspectral anomaly detection for unlabeled classification. IEEE Trans. Geosci. Remote Sens. 56(6), 3091–3106 (2018). https://doi.org/10.1109/TGRS. 2018.2790583 6. Zhong, J., Xie, W., Li, Y., Lei, J., Du, Q.: Characterization of background-anomaly separability with generative adversarial network for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 59(7), 6017–6028 (2021). https://doi.org/10.1109/TGRS.2020.3013022 7. Guo, Z.-X., Shui, P.-L.: Anomaly based sea-surface small target detection using k-nearest neighbor classification. IEEE Trans. Aerosp. Electron. Syst. 56(6), 4947–4964 (2020). https:// doi.org/10.1109/TAES.2020.3011868 8. Wang, W., Song, W., Li, Z., Zhao, B., Zhao, B.: A novel filter-based anomaly detection framework for hyperspectral imagery. IEEE Access 9, 124033–124043 (2021). https://doi. org/10.1109/ACCESS.2021.3110791 9. Nho, Y.-H., Ryu, S., Kwon, D.-S.: UI-GAN: generative adversarial network-based anomaly detection using user initial information for wearable devices. IEEE Sens. J. 21(8), 9949–9958 (2021). https://doi.org/10.1109/JSEN.2021.3054394 10. Li, S., Zhang, K., Hao, Q., Duan, P., Kang, X.: Hyperspectral anomaly detection with multiscale attribute and edge-preserving filters. Geosci. Remote Sens. Lett. 15(10), 1605–1609 (2018). https://doi.org/10.1109/LGRS.2018.2853705 11. Zhao, Y., Liu, Q., Li, D., Kang, D., Lv, Q., Shang, L.: Hierarchical anomaly detection and multimodal classification in large-scale photovoltaic systems. IEEE Trans. Sustain. Energy 10(3), 1351–1361 (2019). https://doi.org/10.1109/TSTE.2018.2867009 12. Li, Z., Zhang, Y.: Hyperspectral anomaly detection via image super-resolution processing and spatial correlation. IEEE Trans. Geosci. Remote Sens. 59(3), 2307–2320 (2021). https:// doi.org/10.1109/TGRS.2020.3005924 13. Pu, G., Wang, L., Shen, J., Dong, F.: A hybrid unsupervised clustering-based anomaly detection method. Tsinghua Sci. Technol. 26(2), 146–153 (2021). 
https://doi.org/10.26599/TST.2019.9010051
14. Qi, L., Yang, Y., Zhou, X., Rafique, W., Ma, J.: Fast anomaly identification based on multiaspect data streams for intelligent intrusion detection toward secure industry 4.0. IEEE Trans. Ind. Inf. 18(9), 6503–6511 (2022). https://doi.org/10.1109/TII.2021.3139363
15. Kurt, M.N., Yılmaz, Y., Wang, X., Masterman, P.J.: Online privacy-preserving data-driven network anomaly detection. IEEE J. Sel. Areas Commun. 40(3), 982–998 (2022). https://doi.org/10.1109/JSAC.2022.3142302
16. Gao, F., Li, J., Cheng, R., Zhou, Y., Ye, Y.: ConNet: deep semi-supervised anomaly detection based on sparse positive samples. IEEE Access 9, 67249–67258 (2021). https://doi.org/10.1109/ACCESS.2021.3077014
17. Guezzaz, A., Asimi, Y., Azrour, M., Asimi, A.: Mathematical validation of proposed machine learning classifier for heterogeneous traffic and anomaly detection. Big Data Min. Anal. 4(1), 18–24 (2021). https://doi.org/10.26599/BDMA.2020.9020019
18. Ganokratanaa, T., Aramvith, S., Sebe, N.: Unsupervised anomaly detection and localization based on deep spatiotemporal translation network. IEEE Access 8, 50312–50329 (2020). https://doi.org/10.1109/ACCESS.2020.2979869
19. Sabuhi, M., Zhou, M., Bezemer, C.-P., Musilek, P.: Applications of generative adversarial networks in anomaly detection: a systematic literature review. IEEE Access 9, 161003–161029 (2021). https://doi.org/10.1109/ACCESS.2021.3131949
20. Zheng, Z., Zhou, M., Chen, Y., Huo, M., Sun, L.: QDetect: time series querying based road anomaly detection. IEEE Access 8, 98974–98985 (2020). https://doi.org/10.1109/ACCESS.2020.2994461
21. Ruff, L., et al.: A unifying review of deep and shallow anomaly detection. Proc. IEEE 109(5), 756–795 (2021). https://doi.org/10.1109/JPROC.2021.3052449
22. Naseer, S., et al.: Enhanced network anomaly detection based on deep neural networks. IEEE Access 6, 48231–48246 (2018). https://doi.org/10.1109/ACCESS.2018.2863036
23. Ullah, I., Mahmoud, Q.H.: Design and development of a deep learning-based model for anomaly detection in IoT networks. IEEE Access 9, 103906–103926 (2021). https://doi.org/10.1109/ACCESS.2021.3094024
24. Liu, Y., Li, Z., Pan, S., Gong, C., Zhou, C., Karypis, G.: Anomaly detection on attributed networks via contrastive self-supervised learning. IEEE Trans. Neural Netw. Learn. Syst. 33(6), 2378–2392 (2022). https://doi.org/10.1109/TNNLS.2021.3068344
25. Bhanage, D.A., Pawar, A.V., Kotecha, K.: IT infrastructure anomaly detection and failure handling: a systematic literature review focusing on datasets, log preprocessing, machine & deep learning approaches and automated tool. IEEE Access 9, 156392–156421 (2021). https://doi.org/10.1109/ACCESS.2021.3128283
26. Malaiya, R.K., Kwon, D., Suh, S.C., Kim, H., Kim, I., Kim, J.: An empirical evaluation of deep learning for network anomaly detection. IEEE Access 7, 140806–140817 (2019). https://doi.org/10.1109/ACCESS.2019.2943249
27. Choi, K., Yi, J., Park, C., Yoon, S.: Deep learning for anomaly detection in time-series data: review, analysis, and guidelines. IEEE Access 9, 120043–120065 (2021). https://doi.org/10.1109/ACCESS.2021.3107975
28. Garg, S., Kaur, K., Kumar, N., Rodrigues, J.J.P.C.: Hybrid deep-learning-based anomaly detection scheme for suspicious flow detection in SDN: a social multimedia perspective. IEEE Trans. Multimedia 21(3), 566–578 (2019). https://doi.org/10.1109/TMM.2019.2893549
29. Mezina, A., Burget, R., Travieso-Gonzalez, C.M.: Network anomaly detection with temporal convolutional network and U-Net model. IEEE Access 9 (2021)
30.
Pena, E.H.M., Carvalho, L.F., Barbon, S.J., Rodrigues, J.J.P.C., Proença, M.L.J.: Anomaly detection using the correlational paraconsistent machine with digital signatures of the network segment. Inf. Sci. 420, 313–328 (2017)
An Improved Multi-image Steganography Model Based on Deep Convolutional Neural Networks Mounir Telli1,2(B) , Mohamed Othmani3,4 , and Hela Ltifi5,6 1
National Engineering School of Sfax, University of Sfax, BP 1173, Sfax, Tunisia 2 Research Lab: Technology, Energy, and Innovative Materials Lab, Faculty of Sciences of Gafsa, University of Gafsa, Gafsa, Tunisia [email protected] 3 Faculty of Sciences of Gafsa, University of Gafsa, Gafsa, Tunisia [email protected] 4 Applied College, Qassim University, Buraydah, Saudi Arabia 5 Computer Science and Mathematics Department, Faculty of Sciences and Techniques of Sidi Bouzid, University of Kairouan, Kairouan, Tunisia [email protected] 6 Research Groups in Intelligent Machines, BP 1173, Sfax 3038, Tunisia
Abstract. In this paper, we present a Deep-CNN-grounded, autoencoder-based picture steganography model that enables the extraction of spatio-temporal information from images. This model's innovation is the process of concealing four pictures within one other image while taking size equivalency into account. To conceal many hidden pictures under a single, high-resolution cover image, we attempt to encode and decode them. This model's quantitative output was organized using the quantitative indices error per pixel, MSE, SSIM, and PSNR, and it performs admirably in comparison to earlier methods.

Keywords: Steganography · Image · Spatio-temporal · Deep CNN · Auto-Encoder

1 Introduction
In information security [1,2], steganography [3] is an essential technology in the realm of information [1,4]. Steganography is a method for encoding confidential data (such as a message, a picture, or a sound) into a non-secret object (such as an image, a sound, or a text message) known as a cover object. The majority of the work in picture steganography [3] has been done to disguise specific content in a cover photo. To include as much concealed information as feasible without altering the original image, all existing techniques have concentrated on finding either noisy regions or low-level picture components like edges, textures, and color in the cover image [5,6]. In this research, we provide an improved steganography strategy for hiding four photos in one image, with the intention
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 250–262, 2023. https://doi.org/10.1007/978-3-031-35501-1_25
An Improved Multi-image Steganography Model Based on Deep CNN
251
of transmitting the greatest amount of concealed data with the least possible loss of cover quality. To do this, we created a deep learning network that automatically selects the best characteristics from both the cover and hidden photographs, allowing us to integrate the data. The main advantage of our approach is that it is universal and adaptable to any type of image. By using the concepts from the aforementioned articles to combine four photos into a single cover image, we make an effort in a similar direction. The convolutional neural network is the deep neural network of choice for us, since it is widely used nowadays in many domains, including security [1,3,7], object detection and tracking [8,9], medicine for neurological ailments [10] and monitoring [11].
2 Related Work

Of several implementations, four are the most important to our research.

2.1 Hiding Images in Plain Sight: Deep Steganography
Baluja [12] proposed an image steganography model founded on a deep CNN to embed an entire image within another image. This task uses an autoencoder-based deep learning model [13], and validation is done using the weighted total of the reconstruction losses between secret and revealed secret images, and also between cover and container images. The system is composed of three components. The Preparation Network is responsible for preparing the secret picture to be hidden. If the secret picture (size M×M) is smaller than the cover image (size N×N), the preparation network evenly distributes the bits of the secret picture over N×N pixels, gradually increasing the size of the secret image until it meets the size of the cover. The preparation network's output and the cover image are inputs into the Hiding Network, which creates the container image. The input to the network is an N×N pixel field consisting of depth-concatenated RGB channels from the cover image and transformed channels from the concealed image. The Reveal Network is utilized by the image receiver. Figure 1 [12] shows the system's three components.
Fig. 1. The three parts that make up the entire system. Left: Preparing the SecretImage. Center: The image on the cover is being hidden. Right: The revealed network, which is simultaneously trained but is used by the receiver, uncovers the secret picture.
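A structural sketch of this three-part pipeline in Keras, with one toy convolution per component (the real networks are deeper); all layer sizes here are placeholders, not Baluja's architecture.

```python
# prepare -> concatenate with cover -> hide -> reveal, as a functional model.
import tensorflow as tf
from tensorflow.keras import layers

def conv(x, ch):
    return layers.Conv2D(ch, 3, padding="same", activation="relu")(x)

cover = tf.keras.Input(shape=(64, 64, 3))
secret = tf.keras.Input(shape=(64, 64, 3))

prepared = conv(secret, 50)                        # Preparation Network
merged = layers.Concatenate()([cover, prepared])   # depth-concatenated channels
container = layers.Conv2D(3, 3, padding="same")(conv(merged, 50))    # Hiding
revealed = layers.Conv2D(3, 3, padding="same")(conv(container, 50))  # Reveal

model = tf.keras.Model([cover, secret], [container, revealed])
container_out, revealed_out = model(
    [tf.zeros((1, 64, 64, 3)), tf.zeros((1, 64, 64, 3))])
```

In training, both the container and the revealed secret contribute reconstruction losses, which is what couples the three sub-networks.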
252
M. Telli et al.
Baluja explains how a trained system must discover how to condense data from the concealed picture into the areas of the cover image where it is least apparent. However, there has been no clear attempt to purposely disguise the existence of that data from machine detection. The limit of this model is that the networks were trained only on natural images from the ImageNet challenge. Consequently, there are significant errors in the cover and hidden-image restoration when another type of image is used, though the secret image remains recognizable. The paper establishes a standard for encoding single secret images; it does not, however, address multi-image steganography.

2.2 Reversible Image Steganography Scheme Based on a U-Net Structure
A novel picture steganography method based on a U-Net structure is proposed by Duan [14]. The trained deep neural network first combines an extraction network and a concealing network in the form of paired training; the sender then utilizes the hiding network to seamlessly integrate the hidden picture into another full-size image prior to mailing it to the recipient. Finally, the receiver reconstructs the original cover picture and secret image using the extraction network. Their method compresses and distributes the secret image over all available bits of the cover image. They try different network structures so that steganalysis cannot detect the existence of secret information. The hiding and extraction networks of this paper also adopt the encoder and decoder idea. For the hiding network, the encoder network is used directly to encode the secret image into the cover image: using a U-Net-structured convolutional neural network, the cover image and the secret image are concatenated into a 6-channel tensor as input to the hiding network. For the extraction network, the trained decoder network is used to extract the secret image from the stego images. The network has 6 convolutional layers with a convolution kernel size of 3×3; the last layer uses the Sigmoid activation function [14], while every other layer is followed by a BN layer and a ReLU activation layer. A stego image generated by the hiding network is used directly as input to the extraction network [14]. Under the ImageNet dataset, the averages of PSNR [15] and SSIM [15] reached (40.4716/0.9794) for the cover image and (40.6665/0.9842) for the secret image. This method has significant advantages in visual effects [16], but still cannot show results for more than one picture as the secret input of the model.

2.3 SteganoCNN: Image Steganography with Generalization Ability Based on Convolutional Neural Network
Duan [17] developed a novel Steganography Convolution Neural Network (SteganoCNN) model that successfully reconstructs two hidden pictures, resolving the issue of two images being contained in a carrier image. The SteganoCNN model consists of two modules: an encoding network and a decoding network, with two extraction networks included in the decoding network.
The secret picture is automatically embedded into the carrier image by the encoding network after which the decoding network is utilized to reassemble two separate secret images. The entire model architecture of SteganoCNN is presented in Fig. 2.
Fig. 2. SteganoCNN model Architecture
Visual and quantitative assessment techniques, such as SSIM and PSNR, are used to examine the experiment's final outcome [15]. With this approach, the average PSNR and SSIM are, respectively, (20.274/0.861) for the cover image, (22.932/0.859) for the first secret image, and (22.271/0.831) for the second secret image. This shows that the model has a certain degree of success, but only with two images as input; there is no result for more than two pictures as the secret.

2.4 Multi-Image Steganography Using Deep Neural Networks
Abhishek [18] used multi-image steganography to conceal three photos in a single cover photo. The hidden pictures included in the code must be retrievable with little loss, and the cover image must match the original in every way. They combine the ideas from Kreuk et al., 2019 [19] and Baluja, 2017 [12] to achieve this. From a network implementation standpoint, they use the preparation and concealing networks as an encoder and the reveal networks as a decoder. To make this work for numerous pictures, they use the prep network to transmit several secret images, then concatenate the resultant data with the carrier image and send it through the hiding network. The idea of several decoders, one for each hidden image, is then used to extract all of the hidden pictures from the container image. To improve the security of their image retrieval approach, they extend Baluja's idea by adding noise to the hidden images rather than placing them at the LSBs of the original cover image. Each sub-network's basic architecture [18] is composed of three components. The Preparation Network is made up of two layers stacked on top of each other. Three separate
Conv2D layers make up each layer; these three Conv2D layers have, respectively, 50, 10, and 5 channels, with kernel sizes of 3, 4, and 5. The stride length is fixed at one along both axes, and padding is applied to each Conv2D layer to maintain the output image's dimensions. A ReLU activation follows each Conv2D layer. The Concealing (Hiding) Network is a five-layer aggregation; these layers are made up of the same three separate Conv2D layers, so the Conv2D levels in the prep network and the hiding network share a similar fundamental structure. Each reveal network has the same basic architecture as the hiding network, with five similarly shaped tiers of Conv2D layers. The limitation of this method is that the loss is projected to rise as the number of photos increases, since more image characteristics are buried in a single image. As a result, a limit on how many photos may be placed in the cover image is needed to obtain acceptable results.
3 Proposed Model
In this part, we describe our steganography model, which consists of a generic encoder-decoder architecture based on deep learning [13,14] for image steganography. The training procedure for the suggested model is then described. Figure 3 depicts the whole pipeline of the suggested encoder model, and Fig. 4 depicts the decoder. We aim to combine the model of Baluja [12] with the pipeline of Kumar [2]. The suggested steganography framework differs from many prominent steganographic approaches for encoding secret communications [19]: our method uses a convolutional neural network to spread the secret images across all of the cover image's bits. The decoder and encoder concepts are used in this paper's hiding and extraction networks [20]. The Preparation Network, Hiding Network (encoder), and Reveal Network are the three components that make up the model. Our objective is to encode data about the secret images S1, S2, S3, and S4 into the cover image C, resulting in C' that is very similar to C, while still being able to decode data from C' to produce the decoded secret images S1', S2', S3', and S4', which should resemble the secret images as closely as possible. Data from the secret pictures must be prepared by the Preparation Network before being concatenated with the cover image and transmitted to the Hiding Network. This input is subsequently converted into the encoded cover picture C' by the Hiding Network. Finally, the secret images S1', S2', S3', and S4' are decoded from C' by the Reveal Network. We introduce noise prior to the Reveal Network for stability. We employ 5 layers of 65 filters for the Hiding and Reveal networks (50 3 × 3 filters, 10 4 × 4 filters, and 5 5 × 5 filters). We employ just two identically structured layers for the preparation network.
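The 65-filter layer used throughout the network tables (50 3 × 3, 10 4 × 4, and 5 5 × 5 Conv2D branches whose outputs are depth-concatenated to 65 channels) might be sketched in Keras as follows; ReLU activations and "same" padding are assumptions consistent with the constant 64 × 64 spatial sizes in the tables.

```python
# One multi-kernel layer producing a 65-channel feature map.
import tensorflow as tf
from tensorflow.keras import layers

def layer_65(x):
    b50 = layers.Conv2D(50, 3, padding="same", activation="relu")(x)
    b10 = layers.Conv2D(10, 4, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(5, 5, padding="same", activation="relu")(x)
    return layers.Concatenate()([b50, b10, b5])   # 50 + 10 + 5 = 65 channels

inp = tf.keras.Input(shape=(64, 64, 3))
block = tf.keras.Model(inp, layer_65(inp))
feat = block(tf.zeros((1, 64, 64, 3)))            # 1 x 64 x 64 x 65 features
```

Stacking five such blocks gives the hiding and reveal networks; the preparation network stacks two.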
Fig. 3. Pipeline of the Encoder
Detailed specifications of the preparation network are found in Table 1, and those of the hiding network in Table 2. For the revealing network, the detailed representation for a single secret image is found in Table 3. The Output column displays the size of the output feature map; the stride and padding are equal to one for each layer.
Table 1. Common CNN Preparation network structure

Index  | Layer   | Type   | Filters | Kernel | Input    | Output    | Concatenation
01     | Prep111 | Conv2D | 50      | 3×3    | 64×64×3  | 64×64×50  | N/A
01     | Prep112 | Conv2D | 10      | 4×4    | 64×64×3  | 64×64×10  | N/A
01     | Prep113 | Conv2D | 5       | 5×5    | 64×64×3  | 64×64×65  | Concat with Prep111, Prep112
01     | Prep121 | Conv2D | 50      | 3×3    | 64×64×3  | 64×64×50  | N/A
01     | Prep122 | Conv2D | 10      | 4×4    | 64×64×3  | 64×64×10  | N/A
01     | Prep123 | Conv2D | 5       | 5×5    | 64×64×3  | 64×64×65  | Concat with Prep121, Prep122
01     | Prep131 | Conv2D | 50      | 3×3    | 64×64×3  | 64×64×50  | N/A
01     | Prep132 | Conv2D | 10      | 4×4    | 64×64×3  | 64×64×10  | N/A
01     | Prep133 | Conv2D | 5       | 5×5    | 64×64×3  | 64×64×65  | Concat with Prep131, Prep132
01     | Prep141 | Conv2D | 50      | 3×3    | 64×64×3  | 64×64×50  | N/A
01     | Prep142 | Conv2D | 10      | 4×4    | 64×64×3  | 64×64×10  | N/A
01     | Prep143 | Conv2D | 5       | 5×5    | 64×64×3  | 64×64×65  | Concat with Prep141, Prep142
02     | Prep211 | Conv2D | 50      | 3×3    | 64×64×65 | 64×64×50  | N/A
02     | Prep212 | Conv2D | 10      | 4×4    | 64×64×65 | 64×64×10  | N/A
02     | Prep213 | Conv2D | 5       | 5×5    | 64×64×65 | 64×64×65  | Concat with Prep211, Prep212
02     | Prep221 | Conv2D | 50      | 3×3    | 64×64×65 | 64×64×50  | N/A
02     | Prep222 | Conv2D | 10      | 4×4    | 64×64×65 | 64×64×10  | N/A
02     | Prep223 | Conv2D | 5       | 5×5    | 64×64×65 | 64×64×65  | Concat with Prep221, Prep222
02     | Prep231 | Conv2D | 50      | 3×3    | 64×64×65 | 64×64×50  | N/A
02     | Prep232 | Conv2D | 10      | 4×4    | 64×64×65 | 64×64×10  | N/A
02     | Prep233 | Conv2D | 5       | 5×5    | 64×64×65 | 64×64×65  | Concat with Prep231, Prep232
02     | Prep241 | Conv2D | 50      | 3×3    | 64×64×65 | 64×64×50  | N/A
02     | Prep242 | Conv2D | 10      | 4×4    | 64×64×65 | 64×64×10  | N/A
02     | Prep243 | Conv2D | 5       | 5×5    | 64×64×65 | 64×64×65  | Concat with Prep241, Prep242
Concat | CT31    | Concat | –       | –      | 64×64×3  | 64×64×263 | Concat with Prep243, Prep233, Prep223 and Prep213
Table 2. Common CNN hiding network structure

Index | Layer  | Type   | Filters | Kernel | Input     | Output   | Concatenation
01    | Hide11 | Conv2D | 50      | 3×3    | 64×64×263 | 64×64×50 | N/A
01    | Hide12 | Conv2D | 10      | 4×4    | 64×64×263 | 64×64×10 | N/A
01    | Hide13 | Conv2D | 5       | 5×5    | 64×64×263 | 64×64×65 | Concat with Hide11, Hide12
02    | Hide21 | Conv2D | 50      | 3×3    | 64×64×65  | 64×64×50 | N/A
02    | Hide22 | Conv2D | 10      | 4×4    | 64×64×65  | 64×64×10 | N/A
02    | Hide23 | Conv2D | 5       | 5×5    | 64×64×65  | 64×64×65 | Concat with Hide21, Hide22
03    | Hide31 | Conv2D | 50      | 3×3    | 64×64×65  | 64×64×50 | N/A
03    | Hide32 | Conv2D | 10      | 4×4    | 64×64×65  | 64×64×10 | N/A
03    | Hide33 | Conv2D | 5       | 5×5    | 64×64×65  | 64×64×65 | Concat with Hide31, Hide32
04    | Hide41 | Conv2D | 50      | 3×3    | 64×64×65  | 64×64×50 | N/A
04    | Hide42 | Conv2D | 10      | 4×4    | 64×64×65  | 64×64×10 | N/A
04    | Hide43 | Conv2D | 5       | 5×5    | 64×64×65  | 64×64×65 | Concat with Hide41, Hide42
05    | Hide51 | Conv2D | 50      | 3×3    | 64×64×65  | 64×64×50 | N/A
05    | Hide52 | Conv2D | 10      | 4×4    | 64×64×65  | 64×64×10 | N/A
05    | Hide53 | Conv2D | 5       | 5×5    | 64×64×65  | 64×64×65 | Concat with Hide51, Hide52
05    | Hide54 | Conv2D | 3       | 3×3    | 64×64×65  | 64×64×3  | N/A
Fig. 4. Pipeline of the Decoder

Table 3. Single image CNN revealing network structure

Index | Layer | Type   | Filters | Kernel | Input    | Output   | Concatenation
01    | Rev11 | Conv2D | 50      | 3×3    | 64×64×3  | 64×64×50 | N/A
01    | Rev12 | Conv2D | 10      | 4×4    | 64×64×3  | 64×64×10 | N/A
01    | Rev13 | Conv2D | 5       | 5×5    | 64×64×3  | 64×64×65 | Concat with Rev11, Rev12
02    | Rev21 | Conv2D | 50      | 3×3    | 64×64×65 | 64×64×50 | N/A
02    | Rev22 | Conv2D | 10      | 4×4    | 64×64×65 | 64×64×10 | N/A
02    | Rev23 | Conv2D | 5       | 5×5    | 64×64×65 | 64×64×65 | Concat with Rev21, Rev22
03    | Rev31 | Conv2D | 50      | 3×3    | 64×64×65 | 64×64×50 | N/A
03    | Rev32 | Conv2D | 10      | 4×4    | 64×64×65 | 64×64×10 | N/A
03    | Rev33 | Conv2D | 5       | 5×5    | 64×64×65 | 64×64×65 | Concat with Rev31, Rev32
04    | Rev41 | Conv2D | 50      | 3×3    | 64×64×65 | 64×64×50 | N/A
04    | Rev42 | Conv2D | 10      | 4×4    | 64×64×65 | 64×64×10 | N/A
04    | Rev43 | Conv2D | 5       | 5×5    | 64×64×65 | 64×64×65 | Concat with Rev41, Rev42
05    | Rev51 | Conv2D | 50      | 3×3    | 64×64×65 | 64×64×50 | N/A
05    | Rev52 | Conv2D | 10      | 4×4    | 64×64×65 | 64×64×10 | N/A
05    | Rev53 | Conv2D | 5       | 5×5    | 64×64×65 | 64×64×65 | Concat with Rev51, Rev52
05    | Rev54 | Conv2D | 3       | 3×3    | 64×64×65 | 64×64×3  | N/A
4 Experimental Results
We used the Tiny ImageNet Visual Recognition Challenge dataset. Our training set is a random selection of photographs from each of the 200 classes; 50 photos are taken per class, totaling 10,000 images for training and evaluation. Our experiments were run on a workstation equipped with an Nvidia GeForce 1650 Ti GPU card. The model is built with Python under Conda (Anaconda toolkit), along with its dependencies (TensorFlow, NVIDIA CUDA Toolkit, and NVIDIA cuDNN). Training proceeds as follows. The train set has two parts: the first 8,000 photographs are used as secret training images, while the remaining 2,000 images are used as cover images. The Adam optimizer was applied. The learning rate is held at 0.001 for the first 200 epochs, decreased to 0.0003 from epoch 200 to epoch 400, and decreased further to 0.00003 for the remaining epochs. The model was trained for 500 epochs with a batch size of 20. Gaussian noise with a standard deviation of 0.01 is added to the encoder's output before it is passed through the decoder. The loss function used for the decoder is:

Loss = ‖C − C′‖² + β1‖S1 − S1′‖² + β2‖S2 − S2′‖² + β3‖S3 − S3′‖² + β4‖S4 − S4′‖²  (1)
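The weighted loss above can be sketched numerically as follows. This is an illustrative NumPy version (the paper trains in TensorFlow); the function and argument names are our own choosing, not the paper's code:

```python
import numpy as np

def stego_loss(cover, cover_out, secrets, secrets_out, betas=(1.0, 1.0, 1.0, 1.0)):
    """Weighted squared-error loss of Eq. (1): cover reconstruction error plus
    beta-weighted reconstruction errors of the four revealed secret images."""
    loss = np.sum((cover - cover_out) ** 2)
    for beta, s, s_out in zip(betas, secrets, secrets_out):
        loss += beta * np.sum((s - s_out) ** 2)
    return float(loss)
```

With all β values equal to 1.00, as in the training described here, the cover error and the four secret-image errors contribute with equal weight.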
Training was performed with β1, β2, β3, and β4 all equal to 1.00. Figure 5 shows the result of hiding four secret images within a single cover image. The encoder/decoder outputs are shown on the left side, while the input pictures are shown on the right.
Fig. 5. Result of hiding 4 images. Left-Right: Cover Image, Secret Image1, Secret Image2, Secret Image3, Secret Image4, Encoded Cover Image, Decoded Secret Image1, Decoded Secret Image2, Decoded Secret Image3, and Decoded Secret Image4 are listed in the columns.
An Improved Multi-image Steganography Model Based on Deep CNN

The encoded cover picture resembles the original cover in appearance but contains no information about the secret images. Compared with the scenario where only two secret photos are employed [18], the encoded cover is more blurry. The error-per-pixel values obtained by our proposed implementation after 500 epochs are reported in Table 4.

Table 4. Values of error per pixel for our proposed model

Image          Error per pixel [0, 255]
Secret1 error  15.421788
Secret2 error  14.51199
Secret3 error  34.47326
Secret4 error  82.736496
Cover error    20.335892
The loss curves for hiding and revealing are computed from Eq. 1, with all β values set to 1.00 and a batch size of 20 (representing 4 covers and 16 secrets). The results are shown in Fig. 6.
Fig. 6. Loss Curves.
In terms of quantitative analysis, our model's performance is evaluated on error per pixel and also gauged with metrics such as mean square error (MSE), peak signal-to-noise ratio (PSNR), absolute pixel error, and the structural similarity index (SSIM) [15]. According to Table 5, the average MSE values are very close to 0, the average PSNR is high, and the average SSIM values are very near 1.

Table 5. Quantitative analysis of model performance

                           MSE      PSNR      SSIM
Cover − DecodedCover       0.00322  25.30165  0.86907
Secret1 − DecodedSecret1   0.00261  26.15695  0.91129
Secret2 − DecodedSecret2   0.00258  26.38304  0.90912
Secret3 − DecodedSecret3   0.00918  20.76433  0.75501
Secret4 − DecodedSecret4   0.09977  10.57023  0.35078
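The MSE and PSNR metrics used above can be computed from pixel data in a few lines. A minimal sketch, assuming images scaled to [0, 1] (SSIM needs a windowed computation and is omitted here):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images with pixels in [0, 1]."""
    return float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means the images are closer."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)
```

For example, an image that is uniformly off by 0.1 from its reference has MSE 0.01 and PSNR 20 dB.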
As a result, the different resultant values of MSE, PSNR, and SSIM show the effectiveness of the proposed model. In Table 6 we report the comparative results of our proposed model against the Baluja image model [12], the Duan model [14], their newer model SteganoCNN [17], and the Abhishek model [18] on the performance metrics SSIM and PSNR. Because Baluja and Abhishek report only error-per-pixel results, we trained the Baluja image model and the Abhishek model on our dataset.

Table 6. Qualitative analysis of model performance with SSIM and PSNR

         Baluja          Duan            SteganoCNN      Abhishek        Ours
         SSIM   PSNR     SSIM   PSNR     SSIM   PSNR     SSIM   PSNR     SSIM   PSNR
C-C'     0.984  36.153   0.979  40.471   0.861  20.274   0.915  27.889   0.869  25.301
S1-S1'   0.983  35.439   0.984  40.666   0.859  22.932   0.936  28.093   0.911  26.156
S2-S2'   N/A    N/A      N/A    N/A      0.831  22.271   0.954  29.809   0.909  26.383
S3-S3'   N/A    N/A      N/A    N/A      N/A    N/A      0.755  20.764   0.755  20.764
S4-S4'   N/A    N/A      N/A    N/A      N/A    N/A      N/A    N/A      0.350  10.570
These comparative results show that our model outputs a container with a visual appearance closer to the cover image. Not only the container image, but also the secret images revealed by our model are good compared with the results of the other image models. The results shown in Table 6 for the Duan models differ from ours because the dataset sizes differ: SteganoCNN used a dataset of 20,000 images, and the first Duan model used a dataset of only 1,000 images, whereas our dataset is composed of 10,000 images. There is still some blur in the fourth image, so we could use more images or other datasets to make our method more efficient. As future work, we could combine our model with a key to make the hiding network more efficient, like the model of Kweon [21].
5 Conclusion

In this work, a multi-image steganography method has been designed based on CNNs, with the main goal of improving some stego aspects, including visibility. Our implementation extends a recently proposed single-image steganography model. We relied heavily on visual perception for the overall loss and did not experiment with other types of losses that could have been better suited to our model.
Future work will focus on improving our model to increase capacity, with tests on different datasets and against different steganalysis methods. We can also vary the β values.

Declarations

Conflict of interest. All authors declare that they have no conflict of interest.

Ethical approval. This article does not contain any studies with human participants or animals performed by any of the authors.
References

1. Singh, L., Singh, A.K., Singh, P.K.: Secure data hiding techniques: a survey. Multimedia Tools and Applications (2020)
2. Kumar Sharma, D., Chidananda Singh, N., Noola, D.A., Nirmal Doss, A., Sivakumar, J.: A review on various cryptographic techniques and algorithms. Materials Today: Proceedings (2021)
3. Subramanian, N., Elharrouss, O., Al-Maadeed, S., Bouridane, A.: Image steganography: a review of the recent advances. IEEE Access 9, 23409–23423 (2021)
4. Zhu, J., Kaplan, R., Johnson, J., Fei-Fei, L.: HiDDeN: hiding data with deep networks. CoRR abs/1807.09937 (2018)
5. Mielikainen, J.: LSB matching revisited (2006)
6. Neeta, D., Snehal, K., Jacobs, D.: Implementation of LSB steganography and its evaluation for various bits. In: 2006 1st International Conference on Digital Information Management, pp. 173–178 (2007)
7. Li, Q., et al.: Image steganography based on style transfer and quaternion exponent moments. Appl. Soft Comput. 110, 107618 (2021)
8. Salah, K.B., Othmani, M., Kherallah, M.: A novel approach for human skin detection using convolutional neural network. Vis. Comput. 38(5), 1833–1843 (2021). https://doi.org/10.1007/s00371-021-02108-3
9. Othmani, M.: A vehicle detection and tracking method for traffic video based on faster R-CNN. Multimedia Tools and Applications (2022)
10. Fourati, J., Othmani, M., Ltifi, H.: A hybrid model based on convolutional neural networks and long short-term memory for rest tremor classification, pp. 75–82 (2022)
11. Benjemmaa, A., Ltifi, H., Ben Ayed, M.: Design of remote heart monitoring system for cardiac patients, pp. 963–976 (2020)
12. Baluja, S.: Hiding images in plain sight: deep steganography. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)
13. Baldi, P.: Autoencoders, unsupervised learning, and deep architectures. In: Guyon, I., Dror, G., Lemaire, V., Taylor, G., Silver, D. (eds.) Proceedings of ICML Workshop on Unsupervised and Transfer Learning, Proceedings of Machine Learning Research, vol. 27, pp. 37–49. PMLR, Bellevue, Washington, USA (2012)
14. Duan, X., Jia, K., Li, B., Guo, D., Zhang, E., Qin, C.: Reversible image steganography scheme based on a U-Net structure. IEEE Access 7, 9314–9323 (2019)
15. Horé, A., Ziou, D.: Image quality metrics: PSNR vs. SSIM. In: 2010 20th International Conference on Pattern Recognition, pp. 2366–2369 (2010)
16. Ltifi, H., Benmohamed, E., Kolski, C., Ben Ayed, M.: Adapted visual analytics process for intelligent decision-making: application in a medical context. Int. J. Inf. Technol. Decision Making 19(01), 241–282 (2020)
17. Duan, X., Liu, N., Gou, M., Wang, W., Qin, C.: SteganoCNN: image steganography with generalization ability based on convolutional neural network. Entropy 22(10), 1140 (2020)
18. Das, A., Wahi, J.S., Anand, M., Rana, Y.: Multi-image steganography using deep neural networks. CoRR abs/2101.00350 (2021)
19. Kreuk, F., Adi, Y., Raj, B., Singh, R., Keshet, J.: Hide and speak: deep neural networks for speech steganography. CoRR abs/1902.03083 (2019)
20. Jaiswal, A., Kumar, S., Nigam, A.: En-VstegNET: video steganography using spatio-temporal feature enhancement with 3D-CNN and hourglass. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2020)
21. Kweon, H., Park, J., Woo, S., Cho, D.: Deep multi-image steganography with private keys. Electronics 10(16), 1906 (2021)
A Voting Classifier for Mortality Prediction Post-Thoracic Surgery

George Obaido 1, Blessing Ogbuokiri 2, Ibomoiye Domor Mienye 3, and Sydney Mambwe Kasongo 4

1 Center for Human-Compatible Artificial Intelligence (CHAI) - Berkeley Institute for Data Science (BIDS), University of California, Berkeley, USA, [email protected]
2 Department of Mathematics and Statistics, York University, Toronto, ON M3J 1P3, Canada, [email protected]
3 Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa, [email protected]
4 Department of Industrial Engineering, Faculty of Engineering, Stellenbosch University, Stellenbosch 7600, South Africa
Abstract. Thoracic surgery involves the surgical treatment of vital organs inside the thoracic cavity to treat conditions of the lungs, heart, trachea, diaphragm, etc. Such procedures are primarily classified as high-risk, and chances of mortality are high. Due to this challenge, post-surgical risk assessment is of crucial relevance. This study used an ensemble machine learning approach based on a voting mechanism with different base learners. We evaluated the performance of the voting methods with classification metrics, including accuracy, precision, recall, F1-score, and AUC-ROC. The soft voting classifier showed the best performance, with 97.73% accuracy. Our study demonstrates the predictive power of machine learning algorithms in analyzing and predicting outcomes after thoracic surgery.

Keywords: Thoracic surgery · Voting classifiers · Machine learning · Ensemble methods

1 Introduction
Thoracic surgery involves the surgical treatment of vital organs inside the thoracic cavity to treat conditions of the lungs, heart, trachea, diaphragm, etc. [11,14,23]. Common diseases that require thoracic surgery include coronary artery disease, chest trauma, lung cancer, emphysema, and many others. According to [22], an estimated 530,000 thoracic surgical procedures are conducted annually in the United States by about 4,000 cardiothoracic surgeons. Around the world, nearly two million cardiac surgical procedures are performed on patients annually [12]. These procedures have since increased due to the spread of the novel Coronavirus Disease 2019 (COVID-19) [5,6,24].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 263–272, 2023. https://doi.org/10.1007/978-3-031-35501-1_26

G. Obaido et al.

Despite successes and improvements in clinical outcomes, especially with patients undergoing cardiac procedures, morbidity and mortality remain common, and hazard is often highest in the post-operative period [29]. Due to their predictive power, there is a longstanding interest in applying machine learning to health care [1,2,15,17,26]. Even within cardiothoracic settings, especially post-surgical prognosis, there have been vast applications of machine learning to estimate surgical complication rates, with many areas still unexplored. For example, [13] applied the ensemble machine learning model XGBoost to estimate mortality risk in 11,190 patients compared with the Society of Thoracic Surgeons state-of-the-art Predicted Risk of Mortality (STS PROM) score. Another study by [30] used machine learning classifiers, such as an artificial neural network (ANN), naïve Bayes (NB), logistic regression (LR), and a few other models for predicting long-term mortality and identifying risk factors in 7,368 patients. A recent study by [27] used machine learning models on 8,947 patients to predict patient outcomes associated with post-cardiac surgery. The study showed that machine learning models performed considerably better than the Society of Thoracic Surgeons (STS) risk model. The approach used in this work is based on voting classification, an ensemble machine learning technique that trains several models to ideally achieve better performance than any single model in the ensemble. This is the main contribution of this work. The voting ensemble approach was used on multiple models, such as logistic regression, support vector machine, random forest, XGBoost, and AdaBoost, to predict the risk of mortality post-thoracic surgery. The rest of this paper is organized as follows. Section 2 describes the methodology of the proposed study.
Section 3 presents the results and highlights the potential of this study. Finally, Sect. 4 concludes the study and highlights potential future directions.
2 Material and Methods

This study uses voting classifiers with base learners to predict mortality in post-thoracic surgery. The section commences with a dataset overview and an explanation of the chosen algorithms. Subsequently, we delve into the voting concepts and elaborate on various performance metrics.

2.1 Description of Dataset
The dataset used for the classification problem was obtained from the University of California, Irvine (UCI) machine learning repository [3], and it contains information on the post-operative life expectancy of lung cancer patients. The dataset consists of 470 instances and seventeen (17) attributes, with a one-year risk period as the class. The class value is binary, denoted Risk1Y, and represents the one-year survival period. The value is (T)rue if a patient died within a year
after surgery. It is worth noting that the thoracic dataset is imbalanced, with a sample distribution of 70 true and 400 false values. Of the 17 attributes, 14 are nominal and three are numeric, namely PRE4, PRE5, and age at surgery. A complete description of the dataset is presented in Table 1:

Table 1. Description of the post-thoracic dataset

Variable  Type     Description
DGN       Nominal  Diagnosis after surgery
PRE4      Numeric  Amount of air forcibly exhaled
PRE5      Numeric  Volume that has been exhaled
PRE6      Nominal  Zubrod rating scale
PRE7      Nominal  Pain before surgery
PRE8      Nominal  Coughing up blood (hemoptysis)
PRE9      Nominal  Difficulty breathing (dyspnoea)
PRE10     Nominal  Cough before surgery
PRE11     Nominal  Weakness before surgery
PRE14     Nominal  Original tumour size
PRE17     Nominal  Type 2 diabetes mellitus
PRE19     Nominal  Myocardial infarction up to six months
PRE25     Nominal  Peripheral arterial diseases
PRE30     Nominal  Smoking
PRE32     Nominal  Asthma
AGE       Numeric  Age
Risk1Y    Nominal  One-year survival period
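As a hedged illustration of preparing this dataset, the sketch below builds a few toy rows with column names from Table 1 (the values are invented; the real UCI file has 470 rows and would be loaded from disk) and encodes the binary Risk1Y class:

```python
import pandas as pd

# Toy rows mimicking the UCI thoracic-surgery data; values are made up.
df = pd.DataFrame({
    "PRE4": [2.88, 3.40, 2.76],   # amount of air forcibly exhaled (numeric)
    "AGE": [60, 51, 73],
    "Risk1Y": ["F", "T", "F"],    # T = patient died within one year of surgery
})
df["Risk1Y"] = df["Risk1Y"].map({"T": 1, "F": 0})  # encode the class label
print(df["Risk1Y"].value_counts().to_dict())       # class balance, e.g. {0: 2, 1: 1}
```

On the real data this count would show the 400/70 imbalance noted above, which matters when interpreting accuracy.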
2.2 Algorithms Chosen
The primary outcome of this study is the one-year risk of mortality after thoracic surgery. Based on the available data, we used the following machine learning models (base learners) to predict the likelihood of postoperative mortality after surgical procedures, and evaluated the prediction performance of each:

Logistic Regression. Logistic regression (LR) is a technique used for binary or multinomial tasks based on one or more predictor variables [9,25]. The output usually has two possible classes (dichotomous), where p captures the probability of an instance belonging to the risk class and 1 − p the probability of belonging to the non-risk class. With parameters βi, the log-odds are written as:

log(π / (1 − π)) = β0 + β1x1 + β2x2 + ... + βmxm  (1)
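As a quick numerical illustration of the log-odds in Eq. (1), the transform and its inverse (the logistic sigmoid) can be written as:

```python
import math

def log_odds(p):
    """Map a probability p in (0, 1) to its log-odds."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the log-odds: recover p from the linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(log_odds(0.8)))  # round-trips back to 0.8
```

A probability of 0.5 corresponds to log-odds of 0, which is why the linear term's sign decides the predicted class.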
Random Forest. Random forest (RF) is an ensemble model in which each decision tree is built from a training sample drawn from the entire dataset [4]. It considers the information gain or Gini index to find feature subsets, trains multiple decision trees, and classifies instances using majority voting. The importance of a node in a tree is given by:

ni_j = w_j C_j − w_left(j) C_left(j) − w_right(j) C_right(j)  (2)
Here, ni_j is the importance of node j, w_j is the weighted number of samples reaching node j, C_j is the impurity value of node j, and left(j) and right(j) are the child nodes from the left and right splits, respectively.

Support Vector Machine. The support vector machine (SVM) is a supervised learning algorithm used for classification and regression tasks [20,21]. Assuming a training set of N linearly separable examples with feature vectors x of dimension d, the dual optimization problem, where α ∈ R^N and y ∈ {1, −1}, is solved by:

maximize_α  Σ_{i=1..n} α_i − (1/2) Σ_{i=1..n} Σ_{j=1..n} α_i α_j y_i y_j (x_i^T x_j)  (3)

subject to α_i ≥ 0 and Σ_{i=1..n} α_i y_i = 0  (4)
AdaBoost. Adaptive boosting (AdaBoost) is an ensemble algorithm that constructs a strong classifier through a linear combination of weak classifiers [16,17,31]. Assuming D = {(x1, y1), ..., (xi, yi), ..., (xn, yn)} represents the training set, where yi ∈ {−1, +1} is the class label of sample xi, weights are assigned to every example in D at each iteration [20]. Given t = {1, ..., T} iterations and weak classifiers ht(x) trained using the base learner L, the initial sample weights D1 and the weight update Dt+1 are computed as:

D1(i) = 1/n,  i = 1, 2, ..., n  (5)

Dt+1(i) = (Dt(i) / Zt) exp(−αt yi ht(xi)),  i = 1, 2, ..., n  (6)

Here, Zt represents a normalization parameter and αt is the weight of the classifier ht(x).

XGBoost. Extreme gradient boosting (XGBoost) is a decision tree-based ensemble learning algorithm that uses the gradient boosting framework [10,18,20]. An integral part of the XGBoost algorithm is the introduction of a regularization term in the loss function to prevent overfitting:

L_M(F(x_i)) = Σ_{i=1..n} L(y_i, F(x_i)) + Σ_{m=1..M} Ω(h_m)  (7)
Here, F(x_i) denotes the prediction on the i-th sample at the M-th training cycle, L(·) is the loss function, and Ω(h_m) is the regularization term.

2.3 Voting Mechanism
A voting classifier trains on an ensemble of machine learning models and predicts an output based on the highest probability of the chosen class [7]. Voting is an ensemble machine learning approach that combines several base classifiers to solve a problem. Voting ensemble methods come in two variations: hard voting and soft voting.

Hard Voting. In hard voting, the predicted output is the class with the majority of votes. Here, ŷ is the class label obtained by voting over each classifier Cj:

ŷ = mode{C1(x), C2(x), ..., Cm(x)}  (8)

Soft Voting. In soft voting, the predicted output is based on the predicted class probabilities p of the classifiers:

ŷ = argmax_i Σ_{j=1..m} w_j p_ij  (9)
Here, w_j is the weight assigned to the jth classifier.
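A minimal scikit-learn sketch of the hard/soft voting setup described above. The hyperparameters and synthetic data are illustrative, not the paper's tuned configuration, and XGBoost is omitted to keep the dependencies to scikit-learn alone:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic stand-in for the imbalanced 470-row thoracic dataset.
X, y = make_classification(n_samples=470, n_features=16, weights=[0.85], random_state=0)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),  # probability=True is needed for soft voting
    ("ada", AdaBoostClassifier(random_state=0)),
]
soft = VotingClassifier(estimators=base, voting="soft").fit(X, y)  # averages probabilities, Eq. (9)
hard = VotingClassifier(estimators=base, voting="hard").fit(X, y)  # majority vote, Eq. (8)
print(soft.score(X, y), hard.score(X, y))
```

`voting="soft"` averages the base learners' predicted probabilities (optionally weighted), while `voting="hard"` takes the modal class label.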
2.4 Performance Metrics
The performance evaluation metrics are defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (10)

Sensitivity (Recall) = TP / (TP + FN)  (11)

Precision = TP / (TP + FP)  (12)

F1-score = 2 · (Precision · Recall) / (Precision + Recall)  (13)

A true positive (TP) occurs when a positive test sample is classified as positive, while a false positive (FP) occurs when a negative test sample is classified as positive. A true negative (TN) occurs when a negative test sample is correctly classified as negative, while a false negative (FN) occurs when a positive test sample is incorrectly classified as negative [16,18,19]. Area under the curve (AUC) - receiver operating characteristic (ROC) curves are used to evaluate the classification models' true positives versus false positives at different thresholds.
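The four formulas above translate directly into code. A small sketch computing them from raw confusion-matrix counts (the counts below are invented for illustration):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)              # sensitivity
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts on a 470-sample evaluation:
print(classification_metrics(tp=40, tn=380, fp=20, fn=30))
```

Note how, on an imbalanced set like this one, accuracy stays high even when recall on the minority (risk) class is modest, which is why the F1-score is reported alongside it.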
3 Results and Discussion
This section presents the experimental results of this work. The machine learning models were built using the scikit-learn package in Python 3, a library predominantly used for regression, classification, and clustering tasks [28]. Performance was evaluated using sensitivity, specificity, accuracy, and AUC. The experimental results of the classifiers were obtained using the 10-fold cross-validation procedure. For optimal performance, hyperparameter tuning was carried out on the base learners. The performance of the classification models was measured using the AUC-ROC, as seen in Fig. 1. The RF classifier achieved the highest performance at 99.43%, followed by AdaBoost at 99.28%, XGBoost at 99.15%, LR at 98.38%, and SVM at 95.63%. This result indicates how well each model distinguishes between the risk and non-risk classes: the higher the AUC, the better the model distinguishes the chances of mortality associated with postoperative thoracic surgery. We also present the performance of the classifiers on the testing data. We selected all classifiers, including LR, RF, SVM, AdaBoost, and XGBoost, and combined them into the hard and soft voting schemes. Since all models had an accuracy higher than 90%, we can conclude that the classifiers performed exceedingly well on the data. Other metrics, such as precision, recall, F-measure, and AUC, showed better outcomes, similar to the accuracy result, as depicted in Fig. 2 (Table 2).
Fig. 1. ROC-AUC Curve of the models
Fig. 2. Performance of the model on testing data
Soft voting gave the best performance among the classifiers. Using the F1-score, which accounts for both precision and recall, soft voting achieved a good F1-score of 97.51%, followed by LR with 96.71% and SVM with 93.72%. We observe that the soft voting model can distinguish between the risk and non-risk classes. Hard voting also performed well on the data, with an accuracy of 95.72%, 96.40% precision, 91.77% recall, and an F1-score of 92.62%. Our results are consistent with those of [8], where the soft voting method and other machine learning classifiers improved performance and were used to predict the long-term risk associated with hypercholesterolemia (high cholesterol).

Table 2. Cross-validation scores of the models

Base Learners            Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
Logistic Regression      96.97         97.17          94.48       96.71
Random Forest            94.72         95.59          86.92       91.92
Support Vector Machine   95.72         93.24          92.42       93.87
AdaBoost                 93.46         91.20          91.08       89.30
XGBoost                  94.22         94.26          89.70       90.60
Soft Voting              97.73         98.56          95.17       97.51
Hard Voting              95.72         96.40          91.77       92.62
4 Conclusion and Future Work
This study presents the use of supervised machine learning models for predicting the risk of mortality associated with post-thoracic surgery. It can help healthcare professionals and clinical experts identify risk factors common in post-thoracic surgery and provide interventions. Thorough data exploration may also help find associations among features that signal mortality likelihood post-thoracic surgery. Experimental results showed that soft voting, with base classifiers such as logistic regression, support vector machine, random forest, AdaBoost, and XGBoost, achieved the best performance, with an accuracy of 97.73%, a precision of 98.56%, a recall of 95.17%, and an F-measure of 97.51%. These results make the approach a strong candidate for mortality prediction post-thoracic surgery. Although the data available for this study was relatively small, we believe these results could improve: machine learning classifiers thrive on large datasets, which can considerably improve the results and applicability of the models. Future work could explore deep learning algorithms such as Long Short-Term Memory (LSTM) networks for time-series analysis; such an approach may help predict the spatiotemporal risk of mortality common in post-thoracic surgery.
References

1. Aruleba, K., et al.: Applications of computational methods in biomedical breast cancer imaging diagnostics: a review. J. Imaging 6(10), 105 (2020)
2. Aruleba, R.T., et al.: COVID-19 diagnosis: a review of rapid antigen, RT-PCR and artificial intelligence methods. Bioengineering 9(4), 153 (2022)
3. Asuncion, A., Newman, D.: UCI machine learning repository (2007)
4. Breiman, L., Cutler, A.: Random forests-classification description. Department of Statistics, Berkeley 2 (2007)
5. Chang, S.H., et al.: Thoracic surgery outcomes for patients with Coronavirus Disease 2019. J. Thorac. Cardiovasc. Surg. 162(6), 1654–1664 (2021)
6. Deng, J.Z., et al.: The risk of postoperative complications after major elective surgery in active or resolved COVID-19 in the United States. Ann. Surg. 275(2), 242 (2022)
7. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1
8. Dritsas, E., Trigka, M.: Machine learning methods for hypercholesterolemia long-term risk prediction. Sensors 22(14), 5365 (2022)
9. Ebiaredoh-Mienye, S.A., Swart, T.G., Esenogho, E., Mienye, I.D.: A machine learning method with filter-based feature selection for improved prediction of chronic kidney disease. Bioengineering 9(8), 350 (2022)
10. Esenogho, E., Mienye, I.D., Swart, T.G., Aruleba, K., Obaido, G.: A neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access 10, 16400–16407 (2022)
11. Fischer, C., Silverstein, D.C.: Chest wall disease. Small Animal Critical Care Medicine, p. 166 (2022)
12. Kang, H.C., Chung, M.Y.: Peripheral artery disease. N. Engl. J. Med. 357(18), e19 (2007)
13. Kilic, A., et al.: Predictive utility of a machine learning algorithm in estimating mortality risk in cardiac surgery. Ann. Thorac. Surg. 109(6), 1811–1819 (2020)
14. Hildebrand, F., Andruszkow, H., Pape, H.-C.: Chest trauma: classification and influence on the general management. In: Pape, H.-C., Peitzman, A.B., Rotondo, M.F., Giannoudis, P.V. (eds.) Damage Control Management in the Polytrauma Patient, pp. 79–95. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52429-0_8
15. Mgboh, U., Ogbuokiri, B., Obaido, G., Aruleba, K.: Visual data mining: a comparative analysis of selected datasets. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds.) ISDA 2020. AISC, vol. 1351, pp. 377–391. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71187-0_35
16. Mienye, I.D., Ainah, P.K., Emmanuel, I.D., Esenogho, E.: Sparse noise minimization in image classification using genetic algorithm and DenseNet. In: 2021 Conference on Information Communications Technology and Society (ICTAS), pp. 103–108. IEEE (2021)
17. Mienye, I.D., Obaido, G., Aruleba, K., Dada, O.A.: Enhanced prediction of chronic kidney disease using feature selection and boosted classifiers. In: Abraham, A., Gandhi, N., Hanne, T., Hong, T.-P., Nogueira Rios, T., Ding, W. (eds.) ISDA 2021. LNNS, vol. 418, pp. 527–537. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96308-8_49
18. Mienye, I.D., Sun, Y.: Effective feature selection for improved prediction of heart disease. In: Ngatched, T.M.N., Woungang, I. (eds.) Pan-African Artificial Intelligence and Smart Systems, PAAISS 2021, vol. 405, pp. 94–107. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-93314-2_6
19. Mienye, I.D., Sun, Y.: Improved heart disease prediction using particle swarm optimization based stacked sparse autoencoder. Electronics 10(19), 2347 (2021)
20. Mienye, I.D., Sun, Y.: A survey of ensemble learning: concepts, algorithms, applications, and prospects. IEEE Access (2022)
21. Mienye, I.D., Sun, Y., Wang, Z.: Improved predictive sparse decomposition method with DenseNet for prediction of lung cancer. Int. J. Comput. 1, 533–541 (2020)
22. Moffatt-Bruce, S., Crestanello, J., Way, D.P., Williams, T.E., Jr.: Providing cardiothoracic services in 2035: signs of trouble ahead. J. Thorac. Cardiovasc. Surg. 155(2), 824–829 (2018)
23. Murphy, A.J., Talbot, L., Davidoff, A.M.: Mediastinum, lung, and chest wall tumors. In: Pediatric Surgical Oncology, pp. 97–112. CRC Press (2022)
24. Nguyen, D.M., Kodia, K., Szewczyk, J., Alnajar, A., Stephens-McDonnough, J.A., Villamizar, N.R.: Effect of COVID-19 on the delivery of care for thoracic surgical patients. JTCVS Open (2022)
25. Nusinovici, S., et al.: Logistic regression was as good as machine learning for predicting major chronic diseases. J. Clin. Epidemiol. 122, 56–69 (2020)
26. Obaido, G., et al.: An interpretable machine learning approach for hepatitis B diagnosis. Appl. Sci. 12(21) (2022)
27. Park, J., Bonde, P.N.: Machine learning in cardiac surgery: predicting mortality and readmission. ASAIO J. 10–1097 (2022)
28. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
29. Pienta, M.J., et al.: Advancing quality metrics for durable left ventricular assist device implant: analysis of the Society of Thoracic Surgeons Intermacs database. Ann. Thorac. Surg. 113(5), 1544–1551 (2022)
30. Tseng, P.Y., et al.: Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Crit. Care 24(1), 1–13 (2020)
31. Ying, C., Qi-Guang, M., Jia-Chen, L., Lin, G.: Advance and prospects of AdaBoost algorithm. Acta Automatica Sinica 39(6), 745–758 (2013)
Hybrid Adaptive Method for Intrusion Detection with Enhanced Feature Elimination in Ensemble Learning

S. G. Balakrishnan, P. Ramya, and P. Divyapriya

Department of Computer Science and Engineering, Mahendra Engineering College, Namakkal, Tamil Nadu, India
[email protected]
Abstract. Data leakage prevention is a system that detects sensitive information and tracks it effectively as it travels around a business organization, guarding against any unwanted disclosure. Because personal information can reside on a wide variety of computing devices and cross countless networks, the point of entry extends to social networks. Email leakage occurs when an email goes, either intentionally or unintentionally, to an address to which it should not be sent. Data leakage protection is the strategy or product that seeks to minimize such risks. In this article, a clustering approach is blended with time-span frequency to determine the relevant centroids for evaluating the variety of emails exchanged between members of an organization. Each participant contributes to a number of topic clusters, and members of the company who have not previously communicated with each other may still fall into the same topic cluster. When a new email is composed, every addressee is classified either as a possible leak recipient or as a legitimate one. This grouping is based solely on the email exchanged between the source and the recipient and on its topic clusters. In this context, the K-means, Tabu K-means, and FPCM clustering algorithms were considered to find the most effective cluster centers. Our observations verified that, for both known and unknown addressees, the suggested solution achieves a higher TPR and reduces the FPR. Keywords: K-means clustering · FPCM algorithm · TPR and FPR · Tabu K-means
1 Introduction Social networks are constructing on-line communities of human beings who share pastimes and activities or who are interested in exploring the interest and activities of others. It is a pressure that has chanced several factors of the manner we alive. If one was to evaluate such pitches as Scientific, Medicine, Crypto, service, Business, finance, Architectures and Engineering. That have an effect on of social community throughout the © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 273–280, 2023. https://doi.org/10.1007/978-3-031-35501-1_27
S. G. Balakrishnan et al.
past few decades has been massive. The way these arenas function today is immensely different from the way they functioned in the past. Most social network services are web based and provide a variety of ways for users to interact, such as email and instant messaging. Currently, corporations suffer from documents being exposed in unauthorized incidents, and these data leaks can cause damage in diverse ways. If confidential documents are handled inappropriately, the rules of the authorities may be broken and fines may follow. Organizations are held accountable for the release of information about employees or customers, such as social security records or credit card details. In addition, loss of proprietary information to rivals can result in market failure and may pose a risk to the company. A data breach involves the exposure of sensitive records to untrusted parties and can be either deliberate or accidental. There are established vendors selling products to prevent data leakage, but scholarly study of the subject is comparatively rare. According to intelligence gathered from survey studies [1], most threats to privacy are triggered by data leakage. Such internal dangers include unintentional leakage of private or sensitive information, leakage of intellectual property, fraud, and theft of customer information or financial details. There is also a consensus among administrations that harm from insider threats is more severe than harm from external risks. Because modern business practice relies heavily on email, an email leak to incorrect recipients has emerged as a critical source of serious harm and a stressful problem for administrations [2].
Numerous alternatives have been developed to examine email exchange and keep messages from being sent to the wrong addresses, but they have failed to provide satisfactory results. Many email addressing errors remain undetected, and in countless situations correct recipients are mistakenly flagged as addressing faults.
2 Related Works

Shetty et al. [3] discussed a prototype algorithm that uses Tabu search to optimize the local search capabilities and thereby improve the clustering. The algorithm maintains a dimensional Tabu space over the solutions obtained at each iteration; additionally, from the objects not located in the Tabu space, the object with the minimum radius to another cluster may be picked. Alsayat et al. [4] proposed an approach to community recognition that clusters information from major sources of social data. The suggested architecture uses the Optimized Cluster and Distance (OCD) Genetic Algorithm K-means clustering algorithm: it overcomes problems of the widely used K-means, selects good initial centroids with a genetic algorithm, and maximizes the pairwise distance detected by the OCD approach to obtain precise clusters. Zilberman et al. [5] designed a strategy for the avoidance of email errors. In this approach, the study of mail exchange within the company focuses on recognizing associates who exchange emails on shared topics. During the compliance phase, each associate's topics are used to uncover possible future leakage. The
Hybrid Adaptive Method for Intrusion Detection
receiver of every email is evaluated at the time a new email is produced and about to be delivered. Nevertheless, a key obstacle is avoiding document leakage through email when a message is mistakenly addressed to unintended recipients; this has become a widespread problem that can harm both individuals and companies. Carvalho et al. [6] redefined this as an outlier detection task in which all unintended recipients are outliers. Real emails are then mixed with accurately simulated leak recipients to examine the textual and network patterns associated with email leaks. The technique can discriminate mail leaks from normal cases and outperforms existing techniques. Kalyan et al. [7] coined a simple analysis scheme to detect data leakage through the use and measurement of mail patterns, selecting input variables specific to this domain. The methodology used real-life emails gathered from organizations in the financial sector, choosing these factors to recognize mail habits and identify breaches. Shvartzshnaider et al. [8] proposed, for DLP systems, a novel graph approach based on the concept of contextual integrity. The work used a contextual integrity framework to map real-world correspondence onto a well-demarcated flow of documents, with security policies labelling the set of all permissible flows; misdirected emails are caught at the gateway, unlike with contemporary techniques. Pu et al. [9] proposed a new form of misdirected email identification, grounded on multiple attributes and implemented on the server side. The distinct features are email content, metadata and social-association fingerprinting, fed into an SVM classification algorithm.
3 Methodology

To perform effective data-leakage detection, and to understand the limitations to be overcome, the existing methodologies are discussed first.

3.1 Hierarchical Agglomerative Clustering

In contrast to divisive techniques, which split the x instances into finer sets, agglomerative approaches repeatedly merge pairs of the x instances. In practice, hierarchical agglomerative clustering is used by defining the clusters in a bottom-up manner. Popular criteria for hierarchical agglomerative clustering include the within-group sum of squares and the minimum distance between sets, which yields the single-link approach [11]. Hierarchical agglomerative clustering iteratively builds a tree T over the dataset according to a linkage function. The linkage function determines which two distinct nodes, corresponding to clusters of the data points stored at their leaves, are fused next; initially each data point occupies its own node. The algorithm then proceeds in a sequence of rounds: in each round, the two nodes minimizing the linkage criterion are combined, rendering them siblings and forming a new node as
their parent. After the last combination, the algorithm terminates and the root of the tree is created [12]. The benefits of hierarchical agglomerative clustering are convenient computing and implementation, a reduced number of parameters and additional degrees of versatility, at the cost of significant resource requirements for cluster organization [13]. It operates without the necessity to re-cluster or update the structure regularly, and it does not need periodic updates.

3.2 K-Means Clustering Algorithm

K-means is a simple, well-known unsupervised algorithm for clustering problems. It partitions the dataset into a number N of clusters that is fixed a priori. Each cluster has a corresponding centroid, the centre point of its members. The main idea is to divide the feature-vector samples into N groups such that the objects in each cluster share common features, maximizing inter-cluster distances while simultaneously reducing intra-cluster distances. The algorithm starts with an arbitrary set of N initial centre points denoting the current cluster centres. First, the distance from each object to each cluster centre is determined and the object is assigned to its nearest centroid; the N new centroids are then recomputed. Both steps are repeated iteratively until the centroids no longer change or a fixed number of iterations is reached. The objective function it optimizes measures the distance between instances and their clusters. The K-means algorithm is very prominent because it is easy to implement, concise and quick. While it scales well, it suffers from some limitations: the number N of clusters must be chosen appropriately in advance.
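As a concrete illustration of the procedure just described, the following minimal sketch runs K-means on synthetic two-dimensional data with scikit-learn. The data and parameters are illustrative assumptions, not the email features used in this paper.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic feature vectors (illustrative only): two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(1.0, 0.1, (20, 2))])

# N = 2 clusters fixed a priori, as the text describes; the assign/recompute
# loop runs until the centroids stop changing.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = km.labels_            # nearest-centroid assignment per sample
centroids = km.cluster_centers_
```

On such well-separated data the two generated groups are recovered exactly; in practice the result depends on the initial centroids, which motivates the Tabu variant discussed below.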
The K-means objective function is non-convex and may contain local minima, so the result depends on the initial centroids; it is also very vulnerable to outliers and noise. The clustering is not well suited to clusters of differing density, and it is limited to numerical data, so it cannot be applied to arbitrary categorical data.

3.3 Tabu K-Means Clustering Algorithm

Tabu search is a meta-heuristic approach used for a wide variety of optimization problems. It is a single-solution neighbourhood search that uses adaptive memory to avoid getting stuck in a local optimum: when the search encounters a local optimum, moving back to solutions visited in the past is forbidden by a memory known as the Tabu list. This list tracks all recent moves and guides the search. A mechanism for overriding the prohibition, called the aspiration criterion, allows a Tabu move to be retracted when it is clearly beneficial; a simple aspiration criterion accepts a move whose objective value is better than the current best solution. There are a range of conditions for halting the search: a pre-defined number of iterations is completed, or the objective value falls below a fixed threshold. The classification scheme is based on the exchange of email traffic. It is assumed that
each user belongs to a range of groups that work on numerous dissimilar subjects. In the subsequent phase, every new outgoing email is examined: for each intended recipient, a check is made whether the recipient and sender belong to a commonly used clique. If there is no such group, it is concluded that there is no frequent subject shared by the two users, and the recipient referred to above is flagged as possibly incorrect. Otherwise, the email and its content are compared with the email content previously exchanged. The classification framework has two phases, learning and classification. Training is carried out on a set of past emails, and the resulting labels are used to classify new emails, which are treated as queries.

3.4 Fuzzy Based Algorithm of Possibilistic Clustering

FCM (Fuzzy C-Means) is the fuzzified version of the k-means algorithm. It is a clustering tactic that allows one piece of data to belong to many clusters and is widely used in pattern recognition. The framework is an iterative clustering process that produces an efficient c-partition by minimizing the weighted within-group sum of squared errors. The data set lies in a p-dimensional vector space; c denotes the number of clusters, vi represents the p-dimensional centre of cluster i, and D2(xj, vi) denotes the distance between object xj and cluster centre vi. The possibilistic value of training sample xj with respect to cluster i is governed by a penalty term measuring the possibilistic constraint. The PCM is likewise distinct from other clustering approaches in its sensitivity to initialization.
In most PCM approaches, the clusters do not have much mobility, because each data point is assigned to a single cluster at a time rather than being shared among all clusters. A proper initialization is therefore needed for the algorithm to converge to a nearly global minimum. Fuzzy possibilistic c-means (FPCM) combines the features of both fuzzy and possibilistic strategies: memberships and typicalities are both very substantial characteristics for precisely modelling the core data structure in clustering problems. FPCM computes memberships and possibilities simultaneously, together with the usual point iterations of the cluster centres. FPCM is the synthesis of PCM and FCM, and it avoids several problems of PCM and FCM: FCM's noise-sensitivity deficiency is eliminated, and PCM's problem of coincident clusters is prevented. Yet noisy data can still dominate the estimation of the centroids. Hence a fresh weight parameter is added to every vector, which gives better-clustered data. The weight is calculated by the following equation:

w_ji = exp[ −‖x_j − v_i‖² / ( (c/n) · Σ_{j=1}^{n} ‖x_j − v̄‖² ) ]   (1)
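A direct transcription of the weight in Eq. (1) might look as follows. This is a sketch under the assumption that v̄ denotes the grand mean of the data; the function name and example values are illustrative.

```python
import numpy as np

def fpcm_weights(X, V):
    """Weight of Eq. (1) for every (sample, cluster) pair.

    X: (n, p) data matrix, V: (c, p) cluster centres.
    Assumes v-bar in Eq. (1) is the grand mean of the data.
    """
    n = X.shape[0]
    c = V.shape[0]
    x_bar = X.mean(axis=0)
    # Denominator: sum_j ||x_j - x_bar||^2, scaled by c / n.
    denom = (np.linalg.norm(X - x_bar, axis=1) ** 2).sum() * c / n
    # Squared distances ||x_j - v_i||^2, shape (n, c), via broadcasting.
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / denom)

# Tiny example: a point sitting exactly on a centre gets weight 1,
# and weights decay towards 0 with distance from the centre.
X = np.array([[0.0, 0.0], [1.0, 1.0]])
V = np.array([[0.0, 0.0]])
W = fpcm_weights(X, V)
```

Down-weighting distant points in this way is what limits the influence of noisy vectors on the centroid estimates.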
4 Experimental and Results

In this section, the hierarchical agglomerative clustering, K-means clustering, Tabu K-means clustering and fuzzy possibilistic clustering approaches are compared. Table 1 and Table 2 show the performance of
Table 1. Performance Comparison of True Positive Rate

Email Data        | Hierarchical Agglomerative Clustering | K-Means Clustering | Tabu K-Means Clustering | Fuzzy Possibilistic Clustering Algorithm
Known Addressee   | 0.8231                                | 0.8377             | 0.8468                  | 0.8924
Unknown Addressee | 0.8407                                | 0.8466             | 0.8642                  | 0.9252

Table 2. Performance Comparison of False Positive Rate

Email Data        | Hierarchical Agglomerative Clustering | K-Means Clustering | Tabu K-Means Clustering | Fuzzy Possibilistic Clustering Algorithm
Known Addressee   | 0.1593                                | 0.1534             | 0.1358                  | 0.1076
Unknown Addressee | 0.1769                                | 0.1623             | 0.1532                  | 0.7480
the various methods: the TPR and FPR obtained for known and unknown addressees.
Fig. 1. Performance Comparison of True Positive Rate
From Fig. 1 it can be observed that fuzzy possibilistic clustering achieves a higher true positive rate for known and unknown addressees than hierarchical agglomerative clustering, K-means clustering and Tabu K-means clustering.
Fig. 2. Performance Comparison of False Positive Rate
It can be observed from Fig. 2 that fuzzy possibilistic clustering yields a lower false positive rate for known and unknown addressees than hierarchical agglomerative clustering, K-means clustering and Tabu K-means clustering.
5 Conclusion

The risk of data leakage has evolved into an imperative security problem for organizations as leakage events, and their corresponding costs, expand. Clustering techniques are used to predict leaks: the standard technique clusters the organization's artefacts into many classes so that the artefacts in each group or cluster are close according to the chosen criteria. The crucial partition-clustering method, widespread because of the simplicity of its calculation, is the K-means algorithm. A fuzzy clustering algorithm was built in this paper to replace the initializer, preventing any single case from being picked out as the core of the cluster; the algorithm used to minimize the influence of every specific case for improved clustering is Fuzzy Possibilistic Clustering. The findings show that fuzzy clustering achieves a higher True Positive Rate for both known and unknown recipients when compared with the hierarchical agglomerative clustering, K-means and Tabu K-means methods.
References 1. Shvartzshnaider, Y., et al.: VACCINE: using contextual integrity for data leakage detection. In: The World Wide Web Conference, pp. 1702–1712. ACM, May 2019 2. Pu, Y., Shi, J., Chen, X., Guo, L., Liu, T.: Towards misdirected email detection based on multi- attributes. In: 2015 IEEE Symposium on Computers and Communication (ISCC), pp. 796–802. IEEE, July 2015
3. Shetty, J., Adibi, J.: The Enron email dataset database schema and brief statistical report. Inf. Sci. Inst. Techn. Rep. Univ. South. Calif. 4(1), 120–128 (2004)
4. Tu, Q., Lu, J.F., Yuan, B., Tang, J.B., Yang, J.Y.: Density-based hierarchical clustering for streaming data. Pattern Recognit. Lett. 33(5), 641–645 (2012)
5. Yadav, N., Kobren, A., Monath, N., McCallum, A.: Supervised hierarchical clustering with exponential linkage. arXiv preprint arXiv:1906.07859 (2019)
6. Dang, N.C., De la Prieta, F., Corchado, J.M., Moreno, M.N.: Framework for retrieving relevant contents related to fashion from online social network data. In: de la Prieta, F., et al. (eds.) PAAMS 2016. AISC, vol. 473, pp. 335–347. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40159-1_28
7. Raman, P., Kayacık, H.G., Somayaji, A.: Understanding data leak prevention. In: 6th Annual Symposium on Information Assurance (ASIA 2011), p. 27, June 2011
8. Yu, X., Tian, Z., Qiu, J., Jiang, F.: A data leakage prevention method based on the reduction of confidential and context terms for smart mobile devices. Wirel. Commun. Mob. Comput. 2018 (2018)
9. Yaghini, M., Ghazanfari, N.: Tabu-KM: a hybrid clustering algorithm based on Tabu search approach (2010)
10. Alsayat, A., El-Sayed, H.: Social media analysis using optimized K-means clustering. In: 2016 IEEE 14th International Conference on Software Engineering Research, Management and Applications (SERA), pp. 61–66. IEEE, June 2016
11. Zilberman, P., Dolev, S., Katz, G., Elovici, Y., Shabtai, A.: Analyzing group communication for preventing data leakage via email. In: Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, pp. 37–41. IEEE, July 2011
12. Carvalho, V.R., Cohen, W.W.: Preventing information leaks in email. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 68–77. Society for Industrial and Applied Mathematics, April 2007
13. Kalyan, C., Chandrasekaran, K.: Information leak detection in financial e-mails using mail pattern analysis under partial information. In: AIC 2007: Proceedings of the 7th WSEAS International Conference on Applied Informatics and Communications, pp. 104–109, August 2007
Malware Analysis Using Machine Learning

Bahirithi Karampudi, D. Meher Phanideep, V. Mani Kumar Reddy, N. Subhashini(B), and S. Muthulakshmi

Vellore Institute of Technology, Chennai, India
[email protected]
Abstract. Malware is malicious software that attempts to steal data or infect code or files according to the intention of the attacker. Effective detection methods are essential for security and malware avoidance. The performance of a decision tree algorithm is low with a static technique, but with a hybrid method its efficiency is greatly enhanced. Improving the performance of existing procedures is therefore critical, and it is significantly easier than establishing new methods. Here, the AdaBoost classifier is utilized to increase the efficiency of the current model in a hybrid approach. A hybrid approach integrates two or more ML algorithms with the support of various optimization methodologies. Ensemble learning methods are central to hybrid approaches: by mixing several learners, ensemble learning increases the performance of machine learning systems and produces results that are more efficient and accurate. The Decision Tree, Gaussian Naïve Bayes, Random Forest and Linear SVM algorithms are used to analyze the malware content in the dataset, and their performance is compared. The algorithms that yielded the lowest accuracy are then AdaBoosted to increase performance with the fewest false positives, which boosts the accuracy of the Decision Tree and Gaussian Naïve Bayes methods to 98.623% and 79.607% respectively, from 71.322% and 32.245%. Linear SVM yields an accuracy of 96.068% and the Random Forest classifier yields 98.455% without using AdaBoost. Keywords: Machine Learning · Signature-based detection · Behavior-based system · Decision Tree Classifier · Linear SVM Classifier
1 Introduction

Malware detection is essential for obtaining early warnings about malware and cyberattacks on computer security. It helps prevent hackers from obtaining access to our networks and protects our sensitive data. Malware detection software detects and prevents such malware and intrusions from accessing our systems, countering malware without exposing information or data stored on PCs. Malware analysis is also considered vital for any crimeware analysis an organization must undertake. While firewalls and anti-malware software are important, they are
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 281–290, 2023. https://doi.org/10.1007/978-3-031-35501-1_28
not always enough to limit access. Malware analysis should be undertaken to further update current tools, monitor malware activity, and understand how malware samples are collected, in order to identify any malware. Malware path analysis is used in most malware detection solutions, since malware behaviour analysis only predicts whether an infection is an existing one or a freshly produced approach by comparing its characteristics to previous attacks. Traditional detection methods, like signature-based detection, work by creating a profile or signature of the hazard and storing it for future reference. Similarly, we examine malware paths using Linear SVM and Decision Tree classifiers. Ficco et al. [1] suggested a system featuring an ensemble detector that exploits the diversity of detection methods, combining an ensemble detector and malware activation techniques as the primary analytical components. The article outlines various strategies for combining generic and specialized detectors in the most effective way. These strategies can be utilized to raise the unpredictability of the detection strategy, raise the detection rate in the presence of unknown malware families, and yield better performance without the need for ongoing detector retraining to keep up with malware evolution. The study also offers an alpha-count method for analysing the impact of the time-window length on the speed and accuracy of various combinations of detectors. The AdaBoost technique was used to launch hybrid approaches of the Decision Tree and Linear SVM classifier algorithms to improve the performance of present techniques. Souri et al. [2] summarized the existing issues associated with malware detection methodologies in data mining. In addition, essential features of malware classification methodologies in data mining were examined.
When compared to a static technique, this results in a significant increase in algorithm performance. For improved accuracy, the ensemble technique combines weak learners into strong learners using a range of machine learning algorithms. Roseline et al. [3] introduced a malware detection method called hybrid stacked multilayered ensembling, a more robust and effective approach than current deep learning models. The recommended approach performs effectively for both small-scale and large-scale data due to its adaptive nature of automatically adjusting parameters (the number of consecutive levels); with an accuracy of 98.91%, it beats machine learning and deep learning models. Boosting algorithms have recently emerged as incredibly popular methodologies in machine learning and data science. Boosting approaches increase algorithm performance and accuracy by combining many low-accuracy models to improve precision. Machine learning techniques such as AdaBoost, Gradient Boosting, and XGBoost are applied to increase the precision of existing methods. Bagging combines a number of models trained in parallel on one event, reducing the variance of the prediction. The stacking method aggregates the predictions of several basic classification models of an event into a single data set, which is subsequently utilized as the input for an additional classifier. In this paper we consider the Decision Tree, Gaussian Naïve Bayes, Random Forest and Linear SVM algorithms, since they yield high accuracy and are among the best classification algorithms. The literature survey is presented in Sect. 2 of this paper. The technique and criteria used to evaluate the suggested work are discussed in Sect. 3. The outcomes of the proposed work are
analyzed in Sect. 4, and the article is concluded with recommendations for future work in Sect. 5.
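To make the bagging and stacking ideas above concrete, the following sketch trains both ensemble styles on a synthetic dataset with scikit-learn. The dataset and all parameters are illustrative assumptions, unrelated to the malware data used later in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labelled malicious/benign dataset.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Bagging: many trees fitted on bootstrap resamples, votes aggregated
# (scikit-learn's default base estimator is a decision tree).
bagged = BaggingClassifier(n_estimators=30, random_state=0).fit(Xtr, ytr)

# Stacking: base-model predictions become features for a final classifier.
stacked = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000)).fit(Xtr, ytr)

print("bagging accuracy :", round(bagged.score(Xte, yte), 3))
print("stacking accuracy:", round(stacked.score(Xte, yte), 3))
```

The key design difference is that bagging reduces variance by averaging independently trained models, while stacking learns how to weight heterogeneous base models.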
2 Literature Survey

Sethi et al. [4] recommended a model employing a machine learning analysis framework. Chi-Square and Random Forest were utilized as feature-selection methods, and the Cuckoo sandbox was used for dynamic analysis to find and examine malware in a secure isolated environment. Kaur et al. [5] offered nine major types of identifying ransomware samples in both dynamic and static analysis. The Cuckoo sandbox's isolated environment is employed since dynamic analysis may infect the computer; without executing the malware, reverse engineering was used to understand its behaviour. Sabhadiya et al. [6] suggested a deep-learning-based approach for malware detection on Android devices employing techniques such as Droid Detector, Droid Deep Learner, MalDozer, Droid Delver, and Deep Flow. MalDozer employs a convolutional neural network for static analysis, while infection-detection analysis uses API method calls. Ito et al. [7] suggested a malware detection method that blends NLP algorithms with ASCII strings, removing uncommon terms to improve the detection rate. They used a dataset provided by FFRI containing over 23,000 malware samples (more than 2,100 malware families), as well as over 16,000 benign files downloaded from "download.cnet.com". The suggested approach identifies unknown malware with excellent accuracy, according to the results. Chaudhary et al. [8] proposed a malware detection method based on Principal Component Analysis and Support Vector Machine, using software and libraries such as Python, Pandas, Scikit-learn, and Eclipse. Different malware and benign samples previously detected from various sources were gathered and subjected to feature extension. During validation, the suggested model obtained an accuracy of 97.75%, with 97% precision, 99% recall, and an F1 score of 0.98 for genuine malware.
The main disadvantage of this suggested approach is the small dataset; as a result, issues of underfitting and outlier noise occur. Qin et al. [9] suggested an opcode-based malware detection methodology and a realistic implementation. A hierarchical clustering approach was used to display the effects of data and trends in malware detection; the findings were cross-checked and the accuracy of four clustering algorithms was evaluated. Samet et al. [10] examined several malware detection algorithms. A consolidated study of several ML algorithms was undertaken, and accuracy was evaluated using a reverse engineering technique. Numerous methods for various areas of malware detection, such as signature-based, heuristic-based and behaviour-based, were also mentioned. Datta et al. [11] compared various malware analysis tools, noting that each malware sample requires a different measure for validating malicious code, and concluded that among all tools, integer analysis software gives the best results. Mokoena et al. [12] compared different techniques of static and dynamic analysis to reduce the risk to organizations' systems; comparing both analyses on different data sets that can damage enterprises, they concluded that dynamic analysis is the better method. Gandotra et al. [13] provided a survey of various papers, with separate classifications for dynamic and static analysis methods, and summarized
them in a table for better comparison. Mira et al. [14] used a model in which the Longest Common Subsequence Algorithm and the Longest Common Substring Algorithm were tested in an application. The observed findings demonstrate that the suggested method outperforms competing approaches that utilize API call sequences; in terms of detection, LCSS outperformed LCS.
3 Methodology

Machine learning may be used to analyze malware using many traditional classification approaches. The dataset is initially pre-processed to eliminate impurities. We then applied the Decision Tree, Gaussian Naïve Bayes, Random Forest and Linear SVM algorithms and calculated their accuracy. Finally, the algorithms that yielded lower accuracy are AdaBoosted to increase their performance.

3.1 Algorithms Used

3.1.1 Decision Tree Classifier

A decision tree predicts the class of a sample starting from the root node. It compares the sample against the value of the root attribute and advances to the next node along the corresponding branch. The criteria for attribute selection are information gain and the Gini index. Information gain computes how much information a feature provides about a class, based on the change in entropy after segmenting the data set on an attribute. The Gini index is an impurity/purity metric used during tree construction. The complete data set is delivered to the root node, which splits it into decision nodes; if the data demands further division, a decision node splits again, otherwise it becomes a leaf node.

3.1.2 Linear SVM Classifier

The Support Vector Machine, frequently used for classification and regression problems, is a popular supervised learning technique. The SVM technique seeks to establish a decision boundary that partitions an n-dimensional space into groups, enabling a new data point to be assigned to the appropriate category; this decision boundary is called a hyperplane. Because it picks extreme points to form the decision boundary, the classifier is referred to as a Linear SVM classifier.
Multiple lines or decision boundaries can separate classes in a multidimensional space; choosing the appropriate decision boundary aids exact classification. The optimal line or decision boundary is referred to as the hyperplane. Support vectors are the data points closest to the hyperplane that determine its position.

3.1.3 Random Forest Classifier

Random forest is a supervised machine learning algorithm frequently used in regression and classification problems. This model constructs decision trees from several samples
and uses the majority vote for classification and the mean for regression. An important characteristic of the Random Forest algorithm is that datasets with both continuous and categorical variables can be handled, for both classification and regression; it outperforms other algorithms on categorization tasks.

3.1.4 Gaussian Naive Bayes

The Naive Bayes family of supervised machine learning classification algorithms is built on Bayes' theorem. It is a powerful categorization approach, essential when the input dimensionality is high, and it can also be used to resolve challenging categorization problems. Because the existing models do not provide the needed accuracy and performance, a hybrid method with an AdaBoost classifier is utilized to increase the performance and accuracy of the existing models. The basic AdaBoost classification approach begins by iterating over training data samples: it collects a data set, selects training data based on accuracy, and in the following iteration adds a larger weight to wrongly predicted samples to boost their classification likelihood; it also gives more weight to accurate classifiers. The procedure continues until the sampling error reaches zero.

3.2 Methodology

The dataset used in our model is downloaded from the Kaggle website. It contains 19611 rows and 79 columns. Each row contains the URL or information of a malware site or exe file; the columns contain the vulnerable features that the particular website usually exhibits and their intensity values. Various conventional classification techniques may be used to analyze malware with machine learning; we employ a hybrid strategy to improve prediction accuracy. Bagging and boosting are the prime examples of hybridization techniques for classifying data. To remove contaminants, the dataset is initially pre-processed.
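The AdaBoost re-weighting loop described above can be sketched as follows. The synthetic data stands in for the pre-processed feature matrix, and all parameters are illustrative assumptions rather than the authors' actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pre-processed malware feature matrix.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=1)

# A single shallow tree (weak learner) versus AdaBoost, whose default base
# learner is a depth-1 tree re-fitted on re-weighted samples each round.
weak = DecisionTreeClassifier(max_depth=1, random_state=1).fit(Xtr, ytr)
boosted = AdaBoostClassifier(n_estimators=100, random_state=1).fit(Xtr, ytr)

print("weak-learner accuracy:", round(weak.score(Xte, yte), 3))
print("boosted accuracy     :", round(boosted.score(Xte, yte), 3))
```

On data like this the boosted ensemble clearly outperforms the single weak learner, which is the effect the hybrid approach relies on.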
The cleaned data is then sent to the AdaBoost classifier, which is essentially a hybrid classifier. We do not require every column in the dataset: the most responsive features of the data are taken into account, and a data frame is generated. The data is used to build and train a classifier model. Once the model has been trained, it should be able to predict the correct class. The working algorithm of the proposed model is shown in the flowchart in Fig. 1. Several performance measures are available to verify the proper operation of a trained model.

3.3 Simulation Metrics

Confusion Matrix, True Positives, True Negatives, False Positives, False Negatives, Precision, and F1 Score are the simulation metrics taken into consideration for the proposed work. A confusion matrix describes the performance of a classification model on a set of test data with known true values. The associated terminology can be perplexing; however, the confusion matrix itself is straightforward to grasp, and it is a much better technique to evaluate a classifier's performance. The goal is to determine
B. Karampudi et al.
Fig. 1. Working Algorithm of the proposed model
how many occurrences of class A (malware positive) are classified as class B (malware negative). When the model accurately predicts the positive class, the result is a true positive, and when it accurately predicts the negative class, the result is a true negative. An outcome where the model predicts the positive class inaccurately is known as a false positive, and an outcome where the model predicts the negative class inaccurately is known as a false negative. Precision is a statistic that counts the number of positive predictions that are correct. In an unbalanced classification problem with two classes, precision is computed as the number of true positives divided by the total number of true positives and false positives, i.e., TP/(TP+FP), where TP stands for the number of true positives and FP stands for the number of false positives. In binary classification, the F-measure or F-score measures how accurate a test is. It is calculated from the test's precision and recall, where recall is the number of samples correctly identified as positive divided by the total number of samples that should have been identified as positive, and precision is the number of true positive results divided by the total number of positive results, including those that were incorrectly identified. The F1 score is calculated as the harmonic mean of precision and recall. The more general F-beta score includes extra weights, favoring precision or recall over the other:

F1-score = 2 × (recall × precision) / (recall + precision),

where TP/(TP+FN) is recall and TP/(TP+FP) is precision.
Accuracy is defined as the number of correct predictions divided by the total number of input samples, expressed as a percentage.
4 Results

By using traditional algorithms, we obtained a certain level of accuracy for each algorithm; the accuracy results for Linear SVM, Decision Tree, Gaussian Naive Bayes, and Random Forest are 96.068%, 71.322%, 32.245%, and 98.445%, respectively. The data set has 74.4% of rows with malware present and 25.6% of rows belonging to the no-malware category. The percentages of malware and non-malicious content, along with the confusion matrix using a non-binary classifier, are presented in Figs. 2, 3, and 4, respectively.
Fig. 2. Plot of percentage of malicious and non-malicious content
Figure 4 shows the number of True Negatives = 941, False Positives = 19, False Negatives = 42, and True Positives = 2921. Figure 5 shows the precision and f1-score for the Random Forest classifier as 0.99 and 0.99, respectively. Figure 6 shows the precision and f1-score for Gaussian Naive Bayes as 0.80 and 0.88, respectively. The accuracy of the algorithms before and after AdaBoost is shown in the tables below, where the traditional algorithms and their accuracy are compared. Table 1 shows the accuracy of the algorithms before AdaBoost. After AdaBoost, the accuracy of Gaussian Naive Bayes and Decision Tree increased, as shown in Table 2. In the classic approach, the Random Forest classifier and Linear SVM outperformed the other methods. The Gaussian Naive Bayes and Decision Tree algorithms improved their accuracy on a larger scale after AdaBoost.
Fig. 3. Percentage of malware content and non-malicious content in the dataset
Fig. 4. Confusion Matrix Plot using non-binary classifier
Fig. 5. Precision and f1-score for Random Forest classifier
Fig. 6. Precision and f1-score for Gaussian Naive Bayes
Table 1. Traditional algorithms vs accuracy before AdaBoost

Algorithm             Accuracy in %
Random Forest         98.445
Linear SVM            96.068
Decision Tree         71.322
Gaussian Naive Bayes  32.245
Table 2. Traditional algorithms vs accuracy after AdaBoost

Algorithm             Accuracy in %
Gaussian Naive Bayes  79.607
Decision Tree         98.623
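The reweighting loop described in Sect. 3.1.4 can be illustrated from scratch. The sketch below is a minimal discrete-AdaBoost implementation over one-feature threshold stumps, not the paper's exact hybrid with Gaussian Naive Bayes or Decision Tree base learners; all names and the toy data are illustrative:

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the threshold stump (feature, threshold, polarity) with the
    lowest weighted error; labels y are in {-1, +1}."""
    best = (0, 0.0, 1, float("inf"))
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, pol, err)
    return best

def adaboost_fit(X, y, rounds=10):
    w = np.full(len(y), 1.0 / len(y))            # uniform sample weights
    ensemble = []
    for _ in range(rounds):
        j, t, pol, err = train_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)  # weak-learner weight
        pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)           # boost misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, t, pol))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * np.where(pol * (X[:, j] - t) >= 0, 1, -1)
                for a, j, t, pol in ensemble)
    return np.sign(score)

# toy, linearly separable data for a sanity check
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
ens = adaboost_fit(X, y, rounds=5)
```

The key step is the weight update `w *= exp(-alpha * y * pred)`, which is exactly the "larger weight to wrongly predicted samples in the following iteration" described above.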
5 Conclusion

The proposed paper uses the Linear SVM, Decision Tree, Gaussian Naive Bayes, and Random Forest algorithms to process the data set imported from the Kaggle website, and the accuracy is calculated. Of these, the Random Forest and Linear SVM algorithms produced outstanding accuracy, while Gaussian Naive Bayes obtained the lowest accuracy among the four. To improve the algorithms that resulted in lower accuracy, the AdaBoost classifier technique is used. As a result of AdaBoost, the performance of the Gaussian Naive Bayes algorithm increased at a greater rate and its accuracy improved. Similarly, the Decision Tree algorithm's performance was also improved by AdaBoost.
References

1. Ficco, M.: Malware analysis by combining multiple detectors and observation windows. IEEE Trans. Comput. (2021). https://doi.org/10.1109/tc.2021.3082002
2. Souri, A., Hosseini, R.: A state-of-the-art survey of malware detection approaches using data mining techniques. Hum. Centric Comput. Inf. Sci. 8(3) (2018). https://doi.org/10.1186/s13673-018-0125-x
3. Roseline, S.A., Sasisri, A.D., Geetha, S., Balasubramanian, C.: Towards efficient malware detection and classification using multilayered random forest ensemble technique. In: 2019 International Carnahan Conference on Security Technology (ICCST) (2019). https://doi.org/10.1109/ccst.2019.8888406
4. Sethi, K., Kumar, R., Sethi, L., Bera, P., Patra, P.K.: A novel machine learning based malware detection and classification framework. In: 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security) (2019). https://doi.org/10.1109/cybersecpods.2019.888
5. Kaur, G., Dhir, R., Singh, M.: Anatomy of ransomware malware: detection, analysis, and reporting. Int. J. Secur. Netw. 12(3), 188 (2017). https://doi.org/10.1504/ijsn.2017.084399
6. Sabhadiya, S., Barad, J., Gheewala, J.: Android malware detection using deep learning. In: 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI) (2019). https://doi.org/10.1109/icoei.2019.8862633
7. Ito, R., Mimura, M.: Detecting unknown malware from ASCII strings with natural language processing techniques. In: 2019 14th Asia Joint Conference on Information Security (AsiaJCIS) (2019). https://doi.org/10.1109/asiajcis.2019.00-12
8. Chaudhary, S., Garg, A.: A machine learning technique to detect behavior-based malware. In: 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence) (2020). https://doi.org/10.1109/confluence47617.2020
9. Yin, H., Zhang, J., Qin, Z.: A malware variants detection methodology with an opcode-based feature learning method and a fast density-based clustering algorithm. Int. J. Comput. Sci. Eng. 21(1) (2020)
10. Aslan, Ö., Samet, R.: A comprehensive review on malware detection approaches. https://doi.org/10.1109/ACCESS.2019
11. Datta, A.: An emerging malware analysis techniques and tools: a comparative analysis. Int. J. Eng. Res. Technol. (IJERT) (2021)
12. Mokoena, T., Zuva, T.: Malware analysis and detection in enterprise system. IEEE (2017). 0-7695-6329-5
13. Gandotra, E., Bansal, D., Sofat, S.: Malware analysis and classification: a survey. J. Inf. Secur. (2014). https://doi.org/10.4236/jis.2014.52006
14. Mira, F., Brown, A., Huang, W.: Novel malware detection methods by using LCS and LCSS. In: 2016 22nd International Conference on Automation and Computing, ICAC 2016: Tackling the New Challenges in Automation and Computing, Wurzburg, Germany, pp. 554–559 (2016). https://doi.org/10.1109/IConAC.2016.7604978
SiameseHAR: Siamese-Based Model for Human Activity Classification with FMCW Radars

Mert Ege1,2(B) and Ömer Morgül2

1 Huawei Turkey R&D Center, Istanbul, Turkey
[email protected]
2 Bilkent University, Ankara, Turkey
Abstract. Human Activity Recognition (HAR) is an attractive task for academic researchers. Furthermore, HAR is used in many areas such as security, sports activities, health, and entertainment. Frequency Modulated Continuous Wave (FMCW) radar data is a suitable option for classifying human activities since it operates more robustly than a camera in difficult weather conditions such as fog and rain. Additionally, FMCW radars cost less than cameras. However, FMCW radars are less popular than camera-based HAR systems, mainly because the accuracy of FMCW radar data is lower than that of the camera when classifying human activity. This article proposes the SiameseHAR model for the classification of human movement with FMCW radar data. In this model, we use LSTM and GRU blocks in parallel. In addition, we feed radar data operating at different frequencies (10 GHz, 24 GHz, 77 GHz) to the SiameseHAR model in parallel in a Siamese architecture, so the weights of the paths that use the different radar data as inputs are tied. As far as we know, this is the first time a multi-input Siamese architecture has been used for human activity classification. The SiameseHAR model we propose is superior to most state-of-the-art models.

Keywords: Human Activity Recognition · Siamese Network · FMCW Radar Data · micro-Doppler Signature · Deep Learning

1 Introduction
Human Activity Recognition (HAR) is an active field of study that tries to classify human movement from camera and sensor data. Human movement is classified by processing spatial or temporal data from a sensor or camera. Since human movement classification is widely used in industry, academic studies in this field are also increasing. For example, human interaction [3], simple actions [2], and healthcare [10] are some of the academic studies on HAR. As the effectiveness of Human Activity Recognition grows in the academic field, its performance in classifying human movements increases, and its use in industry is also growing.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 291–302, 2023. https://doi.org/10.1007/978-3-031-35501-1_29

Security, healthcare systems, and entertainment are the main areas where HAR systems are used. A critical security feature is a system that can classify people who behave abnormally. Thanks to HAR, abnormal movements can be detected, and possible dangers can be avoided. In addition to security systems, we can get information about a patient's health status by classifying human movements with wearable sensors. Thus, the doctor can be warned when the patient performs movements that pose a health risk [19]. In addition, the increased interaction between humans and computers in the entertainment industry allows the user to be more involved in the game. For this reason, games are played according to the player's movement, as on the Wii Game Console System [12].

Radar data has advantages and disadvantages over camera images. Radar does not give an output that humans can understand. On the other hand, it continues to produce a stable output in rainy and foggy weather conditions where the camera image is lost. Also, radar data is more reliable than a camera in protecting personal data, because radars cannot observe the private environment [20]. To use FMCW radar data to classify objects, it must be preprocessed. As a result of this preprocessing, the radar data can be represented as a 2D plot. The micro-Doppler signature is an instance of these radar data representations. Applying the Short-Time Fourier Transform (STFT) to the FMCW radar data creates the micro-Doppler signature. The STFT is

X(k, ω) = Σ_{n=−∞}^{+∞} x_n w_{n−k} e^{−iωn},    (1)
where x_n is the input signal to which the STFT is applied and w_n is the window function. The difference between the STFT and the Discrete Fourier Transform (DFT) is that the time interval of the transform is kept short. In this way, the frequency content can be localized in time.

This paper proposes a new method to improve performance in classifying human activity using FMCW radar. In [6], three radars facing the same scene are lined up side by side. The operating frequencies of these three radars differ from each other: 10 GHz, 24 GHz, and 77 GHz. We propose a novel Siamese architecture to use these three radars together. This architecture includes parallel paths for the spectrogram outputs of the three radars, and all paths share the same parameters. We call our proposed solution SiameseHAR. We also compared the performance of different numbers of layers in the proposed SiameseHAR architecture. Finally, we obtain state-of-the-art results on the dataset of [6] compared to the method proposed in [6]. Our main contributions include:

– As far as we know, for the first time, three different FMCW radars were used at the same time for the Human Activity Recognition task. When the performance of using three different FMCW radars is compared in detail with a single radar, it is seen that multiple radars give superior performance.
– The proposed SiameseHAR uses tied weights to reduce the number of parameters to at least one-third of the multi-input model while providing better results.
– The SiameseHAR model uses GRU and LSTM networks in parallel paths. The accuracy advantage of using GRU and LSTM blocks together instead of LSTM blocks alone is demonstrated in detail.
– Ablation studies for the Attention Mechanism and Global Average Pooling were performed.

The rest of this paper is organized as follows. In Sect. 2, methods in the field of Human Activity Recognition from the relevant literature are summarized. Section 3 contains a description of the preprocessing of FMCW radar data and a detailed description of the proposed model. Experiments and results of the proposed deep learning model are presented in Sect. 4. Finally, we evaluate the results of the experiments and suggest potential improvements for future work in Sect. 5.
2 Related Work

2.1 Conventional Methods
In the early 2000s, distance information from radar was used for the classification of time series data [1]. Dynamic Time Warping (DTW) and Euclidean distance are the main methods for classifying objects with radar data. Using these methods, the types of malfunctions that may occur in a nuclear power plant are classified [9]. For decades, DTW has been used with the k-nearest neighbor classifier to classify time series [17], because DTW allows the classification of similar shapes along the time axis.

2.2 Deep Learning Based Methods
Due to the high performance of deep learning in many areas such as Computer Vision (CV), Natural Language Processing (NLP), and speech recognition, researchers have started to use deep learning for time series classification. In [21], multivariate time series data is classified using a deep convolutional neural network (CNN). In another study [8], the features of time-series data are extracted separately with Long Short-Term Memory (LSTM) and FCN branches. The extracted features are concatenated, and classification is then made with the merged features. LSTNet [11], on the other hand, addresses the scale-insensitivity issue, one of the main problems in classifying time-series data, by using both CNN and RNN. This model classifies time-series data by extracting long- and short-term patterns.

2.3 Classification Using FMCW Radar
Investigations in the time-series domain have also influenced models that take FMCW radar data as input. Thus, in systems where FMCW radars are used as input, improvements have been achieved in object detection [14,22], object classification [15], and human activity recognition [6].
In [22], objects are detected using FMCW radar data. A Range-Azimuth-Doppler (RAD) plot obtained from FMCW data was used to draw 2D boxes on the detected objects. The authors of [22] apply feature extraction with ResNet [7] and then use YOLO [16] to detect objects. On the other hand, in [14], LSTM and CNN models were used together for object detection. These models extract both temporal and spatial information from the RAD plot. Then, object detection is done with an SSD [13] head.

2.4 Human Activity Recognition Using FMCW Radar
Another classification task that can be done using FMCW radar data is Human Activity Recognition (HAR). A dataset containing 11 different human movements was created in [6], which we use in our paper. These are “sitting on a chair”, “walking towards the radar”, “kneeling”, “scissor gait”, “crawling towards the radar”, “walking on both toes”, “bending”, “picking up an object from the ground”, “limping with right leg stiff”, “walking with short steps”, and “walking away from the radar”. They used three different FMCW radars to collect the samples, operating at 10 GHz, 24 GHz, and 77 GHz. That article observes classification performance when the radar used in training and the radar used in testing differ. For example, the training data may consist of samples collected from the 24 GHz radar, while the test data consists only of 77 GHz radar data. In this case, synthetic data was generated with a Generative Adversarial Network (GAN) [5], and classification was performed with a Convolutional Autoencoder (CAE) to increase performance. On the other hand, the authors of [18] use a Bi-LSTM. The main reason for using Bi-LSTM is to extract both forward-time and reverse-time motion features. After the features are extracted, classification can be done with a Dense Neural Network (DNN).
3 Model Architecture

3.1 System Description
Three different radar data streams were collected to create the dataset of [6], which is used in the experiments with the proposed model. The first radar, a Texas Instruments IWR1443 FMCW transceiver, operates at a 77 GHz main frequency and has a bandwidth of 750 MHz. The second FMCW radar is an Ancortek SDR-Kit, with an operating frequency of 24 GHz and a bandwidth of 1500 MHz. The last radar is a XeThru X4, which transmits between 7 and 10 GHz. All radars are placed adjacent to each other, so they can transmit their signals toward the same target. Spectrogram samples from these radars are shown in Fig. 1.
Fig. 1. Spectrogram examples of a person walking away from the radar in 10 GHz, 24 GHz, and 77 GHz radar data are given in order.
3.2 Spectrogram Input Pre-processing
Raw data from the radar passes through the preprocessing blocks shown in Fig. 2 before being converted to a spectrogram. First, the raw radar data is corrected using phase and in-phase/quadrature (I&Q) channel corrections. After the I&Q correction, the 1D signal is converted to a 2D signal by reshaping. Then, a Moving Target Indication (MTI) operation is applied to isolate moving targets. Finally, the STFT is used to obtain the spectrogram plot.
Fig. 2. Block diagram of the preprocessing of the dataset of [6]. This figure is adapted from [4].
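A minimal NumPy sketch of the last two stages of this chain, with the MTI step approximated as slow-time mean subtraction and the STFT of Eq. (1) windowed with a Hann window (both are common choices; the paper does not specify its exact filter and window):

```python
import numpy as np

def mti(range_profiles):
    """Moving Target Indication approximated as static-clutter removal:
    subtract the mean over slow time (columns) from each range bin."""
    return range_profiles - range_profiles.mean(axis=1, keepdims=True)

def stft_spectrogram(x, win_len=64, hop=32):
    """Magnitude STFT of Eq. (1): slide a Hann window over the slow-time
    signal and FFT each frame; returns a (frequency, time) spectrogram."""
    w = np.hanning(win_len)
    frames = np.array([x[k:k + win_len] * w
                       for k in range(0, len(x) - win_len + 1, hop)])
    return np.abs(np.fft.fft(frames, axis=1)).T
```

A pure tone at 8 cycles per window, for example, shows up as a ridge in Doppler bin 8 of the resulting spectrogram, which is how a constant-velocity scatterer appears in the micro-Doppler plot.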
3.3 Attention Mechanism
We use an Attention Mechanism block in the models we propose. Thanks to the Attention Mechanism, we give more importance to the features that affect the result. The typical Attention Mechanism is time-dependent, applying its weights to time steps. We instead use a feature-dependent Attention Mechanism rather than a time-dependent one, because each spectrogram feature gives the energy density at a different frequency value. By building a feature-dependent model, we can give more importance to the frequencies that affect the result. We show the effect of the Attention Mechanism in Table 1 in the Ablation Study.

3.4 Proposed Method
LSTM- and GRU-Based Model. In our proposed method, we use LSTM and GRU blocks together: stacked GRU and LSTM blocks are used on two separate parallel paths. GRU blocks are less complex because they contain fewer gates than LSTM blocks. Therefore, LSTM and GRU blocks can extract different features from the same input. In order to use the different features from both blocks simultaneously, we feed the spectrogram input to the LSTM and GRU blocks in parallel. As seen in Fig. 3, the spectrogram plot is fed as input to the LSTM-based
Fig. 3. Dataflow of the LSTM- and GRU-based architecture. This figure is adapted from [4].
Encoder and the GRU-based Encoder in parallel. The two paths are then merged by concatenation. After a normalization process, the spectrogram plot is fed to the LSTM-based and GRU-based Encoders. The Encoder blocks include Average Pooling to halve the time-step size of the input. The Average Pooling block reduces the number of parameters, and the time-series blocks can use longer time intervals. The proposed model includes a total of three consecutive GRU and LSTM blocks. Then, using the feature-related Attention block, we place more emphasis on features that have a more positive impact on the outcome. The positive effect of this block on test accuracy can be seen in Table 1. At the end of the Encoder block, each feature is averaged over the time steps using Global Average Pooling. Hence, the number of parameters is significantly reduced, as shown in Table 2. The outputs of the Encoder blocks are concatenated side by side and passed to the Decoder block. At the beginning of the Decoder block, we use Batch Normalization to eliminate the scale effects of the different features coming out of the Encoders. Then we complete the classification of human movement using a Dense layer, the Softmax activation function, and Categorical Cross-Entropy. In the model we propose, we feed the outputs of three radars operating at different frequencies in parallel. Since all three radars extract different features from the movements of the target, feeding the spectrogram plots of the different radars to our model in parallel, as in Fig. 4, increases the test accuracy results, as shown in Table 3.
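The Global Average Pooling step at the end of the Encoder, and the dense-head parameter saving it buys over Flatten, can be sketched as follows (the time-step/feature/class sizes are illustrative, not the paper's exact ones):

```python
import numpy as np

def global_average_pool(H):
    """Average each feature over all time steps: (T, F) -> (F,)."""
    return H.mean(axis=0)

def head_params(time_steps, n_features, n_classes, use_gap):
    """Dense-layer parameters (weights + biases) after GAP vs Flatten."""
    in_dim = n_features if use_gap else time_steps * n_features
    return (in_dim + 1) * n_classes

# Illustrative sizes: Flatten multiplies the dense input by the number
# of time steps, which is where the blow-up seen in the GAP ablation
# comes from.
gap_p = head_params(128, 64, 11, use_gap=True)     # (64 + 1) * 11
flat_p = head_params(128, 64, 11, use_gap=False)   # (128 * 64 + 1) * 11
```

Even at these toy sizes, the Flatten head needs over a hundred times more dense-layer parameters than the GAP head.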
SiameseHAR. The last step of the model we propose is to tie the weights of the paths we feed in parallel, ensuring that all paths share the same weights. Weights that share the same values on different paths are labeled “Tied Weight” in Fig. 4. With this model, which we call SiameseHAR, we reduce the number of parameters to one-third and increase performance, as seen in Table 3. The main reason for the increase in performance is the reduced possibility of overfitting due to the smaller number of parameters.
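The effect of tying can be illustrated with a toy NumPy sketch in which a single dense layer stands in for the stacked LSTM/GRU encoder (the class count of 11 follows the paper; all other sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEAT, N_ENC, N_CLS = 64, 16, 11        # illustrative sizes; 11 activities

# One shared ("tied") weight matrix used by all three radar paths.
W_shared = rng.normal(size=(N_FEAT, N_ENC))
W_head = rng.normal(size=(3 * N_ENC, N_CLS))

def encoder(x):
    """Stand-in for the stacked LSTM/GRU encoder: the same W_shared is
    applied no matter which radar produced the input."""
    return np.tanh(x @ W_shared)

def siamese_har(x10, x24, x77):
    """Encode the three radar inputs with the tied encoder and classify."""
    z = np.concatenate([encoder(x10), encoder(x24), encoder(x77)])
    logits = z @ W_head
    e = np.exp(logits - logits.max())
    return e / e.sum()                   # softmax over activity classes

# Tying cuts the encoder parameter count to one-third of what three
# independent per-radar encoders would need.
tied, untied = W_shared.size, 3 * W_shared.size
```

Replacing `W_shared` with three independent matrices turns this into the untied multi-input model, which is exactly the comparison made in Table 3.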
Fig. 4. Dataflow of our proposed SiameseHAR architecture. This figure is adapted from [4].
4 Experiments

4.1 Training Setups and Data Split
The model we propose, seen in Fig. 4, has been tuned to obtain the best trade-off between accuracy, inference time, and number of parameters. Grid search is used to tune the proposed model. All experiments were done in the Google Colab Pro application using Tesla P100 GPU or NVIDIA Tensor Core GPU machines. Before use, the dataset of [6] is split into three parts: training, validation, and testing. Training operations are performed using the training dataset. Validation data is selected from the training data, and we use this dataset for the Early-Stopping method: when the validation loss increases, we stop training. Hence, overfitting is prevented. We use the test data to test our model after completing the training process. We use the five-fold cross-validation method to test our model. Hence, five equal parts are constructed from the dataset. The first part is assigned as the test dataset, wholly separated from training and validation data to prevent data leakage. This process continues five times, and another part is assigned as test data each time. By taking the average of the results of this process, the bias problem that may occur due to the splitting of the dataset is avoided. In addition, to increase the reliability of the results, the five-fold cross-validation method is repeated three times, and all results are averaged.
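This evaluation protocol (three repetitions of five-fold cross-validation with the test fold held out) can be sketched without any ML library; the function below is an illustrative splitter, not the authors' code:

```python
import numpy as np

def repeated_kfold(n_samples, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs: k-fold CV with a fresh shuffle
    per repetition, so each sample serves as test data once per repeat."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        idx = rng.permutation(n_samples)
        folds = np.array_split(idx, k)
        for i, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            yield train, test
```

Validation data for early stopping would then be carved out of each train split, keeping the test fold completely unseen until the final evaluation.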
4.2 Ablation Study
To show the effect of the Attention and Global Average Pooling blocks on the performance of the proposed model, these blocks are removed from the proposed architecture and an ablation study is applied. We use a feature-dependent Attention Mechanism, as described in Sect. 3.3. Thanks to this method, we emphasize the features that positively affect the result. The second block on which we perform an ablation study is Global Average Pooling, used in the last part of the Encoder block. Global Average Pooling averages all time steps of each feature to convert the two-dimensional input to one dimension. In our ablation study, Flatten is used instead of Global Average Pooling to convert the 2D input to 1D.

Attention Mechanism. When we test our model with and without the Attention block using the same parameters, we observe that the Attention Mechanism provides a 3% increase in test accuracy, as can be seen in Table 1. When we compare the number of parameters, it can be observed that the Attention block adds few parameters. Thus, the positive impact of the Attention Mechanism on test accuracy demonstrates the importance of this block.

Table 1. An ablation study of the Attention block was conducted to investigate the impact on test accuracy results in a Siamese-based Network.

                                   Test accuracy            Parameters
                                   Min    Max    Mean
Attention Mechanism is not added   0.913  0.929  0.913      1,283,637
Attention Mechanism is added       0.930  0.952  0.943      1,284,107
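A minimal sketch of such feature-wise (frequency-wise) attention: a learned score per spectrogram feature, softmax-normalized and applied uniformly across time steps (parameter shapes are illustrative, not the paper's exact layer):

```python
import numpy as np

def feature_attention(H, scores):
    """H: (time_steps, n_features) spectrogram encoding;
    scores: (n_features,) learned per-frequency attention logits.
    The same feature weighting is applied at every time step."""
    a = np.exp(scores - scores.max())
    a /= a.sum()                 # softmax over features, not time steps
    return H * a                 # broadcast: emphasize informative bins

# toy check: a large logit makes its frequency bin dominate
H = np.ones((4, 3))
out = feature_attention(H, np.array([0.0, 0.0, 10.0]))
```

In a time-dependent attention block the softmax would instead run over the time axis; running it over features is what lets the model weight individual Doppler frequencies.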
Global Average Pooling. Another ablation study was on Global Average Pooling. To perform it, a Flatten layer is used instead of Global Average Pooling to convert the two-dimensional input to one dimension. Global Average Pooling averages the time steps in each feature of the spectrogram representation. Flatten, however, arranges all time steps side by side. Therefore, the number of parameters increases significantly when Flatten is used, as seen in Table 2. There is also a decrease in the test accuracy results. Examining the results in Table 2, Global Average Pooling appears to be the best option for converting 2D data to 1D.

4.3 Architecture Search
Table 2. An ablation study of the Global Average Pooling layer was performed to investigate the impact on test accuracy results in a Siamese-based Network.

                      Test Accuracy            Parameters
                      Min    Max    Mean
Flatten layer         0.927  0.948  0.935      35,412,725
Global Pooling layer  0.930  0.952  0.943      1,284,107

Since the dataset of [6] contains spectrogram plots of radars operating at three different frequencies (10 GHz, 24 GHz, 77 GHz), we tested each frequency separately, as can be seen in Table 3. In addition, experiments were carried
out in which all frequencies were used together, under the name LSTM- and GRU-based model. The experimental results of the LSTM-based model, in which only the LSTM block is used (called LSTM-based), are also added to Table 3 to show the effect of using LSTM and GRU blocks in parallel on the results. The proposed SiameseHAR model is also included in Table 3 to show the effect of the Siamese architecture on the results. Experiment results include min, max, and mean test accuracy values, number of parameters, training time, and inference time. In addition to the mean accuracy, min and max accuracy results are shared to show the consistency of the results. Thus, the models can be compared from different aspects.

Table 3. This table compares our proposed models with the Baseline Method.

                               Test Accuracy           Params     Training   Inference
                               Min    Max    Mean                 Time (ms)  Time (ms)
LSTM- and GRU-based 10 GHz     0.865  0.887  0.876     2,370,187  260.82     4.53
LSTM- and GRU-based 24 GHz     0.842  0.855  0.850     2,370,187  322.54     3.34
LSTM- and GRU-based 77 GHz     0.861  0.884  0.873     2,370,187  265.50     4.40
LSTM-based                     0.919  0.924  0.922     2,405,769  161.12     3.48
LSTM- and GRU-based            0.915  0.946  0.939     4,286,727  183.60     6.77
SiameseHAR                     0.930  0.952  0.943     1,284,107  224.00     4.86
Baseline Model [6]             –      –      0.915     –          –          –
When the LSTM-based and the LSTM- and GRU-based networks are compared in Table 3, it is seen that using LSTM and GRU together increases the test accuracy by 1.7%. However, the number of parameters also approximately doubles. All radar data (10 GHz, 24 GHz, and 77 GHz) are fed into the LSTM- and GRU-based model separately and together for the experiment. When the results of these experiments are compared, it is seen in Table 3 that the mean test accuracy increases by an average of 7% when all radar data are fed together in parallel. However, the number of parameters again approximately doubles.
We use the SiameseHAR model to reduce the increased number of parameters. Since the weights in the paths of the different radars are tied, the number of parameters is significantly reduced. The number of parameters of SiameseHAR is three times smaller than that of the LSTM- and GRU-based model, as seen in Table 3. In addition, there is no decrease in the test accuracy value. In this way, systems that need fewer parameters can use the SiameseHAR model to achieve high accuracy. The bottom line of Table 3 shows the test accuracy result of the model proposed in [6] on the same dataset. Parameter size, inference, and training times are not included because the authors of [6] do not share these results. Since the dataset we used for the experiments was taken from this article, this model is defined as the Baseline Model. The authors of [6] trained a Convolutional Autoencoder (CAE) model using data from a single radar instead of multiple radar inputs with different main frequencies. Hence, our proposed SiameseHAR model gives better results than the Baseline Model with a lower number of parameters.
5 Conclusion and Future Work
This paper proposes the SiameseHAR model to classify human activity with FMCW radar data. To the best of our knowledge, this is the first time LSTM and GRU blocks are used together in parallel to classify human movement. When we add the GRU blocks in parallel with the LSTM blocks, the mean test accuracy increases by 1.7%. Furthermore, as far as we know, SiameseHAR is the first Siamese architecture in which three different radar data streams are fed in parallel. Thanks to SiameseHAR, the number of parameters is reduced to one-third, and the mean accuracy is increased by 0.4%. Comparing the proposed SiameseHAR model with the Baseline Model [6] shows a 2.8% increase in accuracy. Our ablation studies of the Global Average Pooling and Attention Mechanism blocks used in the proposed SiameseHAR model show an accuracy gain of 3% for the Attention Mechanism and 1% for Global Average Pooling. In addition, the number of parameters is reduced to one-fifteenth thanks to Global Average Pooling.

We tested SiameseHAR with the dataset of [6], but we need to experiment with different datasets to increase the reliability of our results. The outputs of different FMCW radars facing the same scene are required for the proposed model to work. However, the number of datasets created in this way is insufficient to perform different experiments. Therefore, we plan to create a dataset containing FMCW radar data with different main frequencies. Furthermore, reducing the inference time is necessary to classify human activity in real time. To do this, quantization methods can be applied to the SiameseHAR model. Thus, we can reduce both the number of parameters and the inference time.
References

1. Chiang, H.C., Moses, R.L., Potter, L.C.: Model-based classification of radar images. IEEE Trans. Inf. Theory 46(5), 1842–1854 (2000)
2. Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72. IEEE (2005)
3. Du, Y., Chen, F., Xu, W.: Human interaction representation and recognition through motion decomposition. IEEE Signal Process. Lett. 14(12), 952–955 (2007)
4. Ege, M.: Human activity classification with deep learning using FMCW radar. Ph.D. thesis, Bilkent University (2022)
5. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, vol. 27 (2014)
6. Gurbuz, S.Z., Rahman, M.M., Kurtoglu, E., Macks, T., Fioranelli, F.: Cross-frequency training with adversarial learning for radar micro-Doppler signature classification (rising researcher). In: Radar Sensor Technology XXIV, vol. 11408, p. 114080A. International Society for Optics and Photonics (2020)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
8. Karim, F., Majumdar, S., Darabi, H., Chen, S.: LSTM fully convolutional networks for time series classification. IEEE Access 6, 1662–1669 (2017)
9. Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)
10. Kuo, Y.M., Lee, J.S., Chung, P.C.: A visual context-awareness-based sleeping-respiration measurement system. IEEE Trans. Inf. Technol. Biomed. 14(2), 255–265 (2009)
11. Lai, G., Chang, W.C., Yang, Y., Liu, H.: Modeling long- and short-term temporal patterns with deep neural networks. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 95–104 (2018)
12. Lawrence, E., Sax, C., Navarro, K.F., Qiao, M.: Interactive games to improve quality of life for the elderly: towards integration into a WSN monitoring system. In: 2010 Second International Conference on eHealth, Telemedicine, and Social Medicine, pp. 106–112. IEEE (2010)
13. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
14. Major, B., et al.: Vehicle detection with automotive radar using deep learning on range-azimuth-doppler tensors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)
15. Patel, K., Rambach, K., Visentin, T., Rusev, D., Pfeiffer, M., Yang, B.: Deep learning-based object classification on automotive radar spectra. In: 2019 IEEE Radar Conference (RadarConf), pp. 1–6. IEEE (2019)
16. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
17. Saggio, G., Cavallo, P., Ricci, M., Errico, V., Zea, J., Benalcázar, M.E.: Sign language recognition using wearable electronics: implementing k-nearest neighbors with dynamic time warping and convolutional neural network algorithms. Sensors 20(14), 3879 (2020)
M. Ege and Ö. Morgül
18. Shrestha, A., Li, H., Le Kernec, J., Fioranelli, F.: Continuous human activity classification from FMCW radar with Bi-LSTM networks. IEEE Sens. J. 20(22), 13607–13619 (2020)
19. Uddin, M.Z., Soylu, A.: Human activity recognition using wearable sensors, discriminant analysis, and long short-term memory-based neural structured learning. Sci. Rep. 11(1), 1–15 (2021)
20. Vandersmissen, B., et al.: Indoor person identification using a low-power FMCW radar. IEEE Trans. Geosci. Remote Sens. 56(7), 3941–3952 (2018)
21. Yang, J., Nguyen, M.N., San, P.P., Li, X.L., Krishnaswamy, S.: Deep convolutional neural networks on multichannel time series for human activity recognition. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
22. Zhang, A., Nowruzi, F., Laganiere, R.: RADDet: range-azimuth-doppler based radar object detection for dynamic road users. In: 2021 18th Conference on Robots and Vision (CRV), Los Alamitos, CA, USA, pp. 95–102. IEEE Computer Society (2021). https://doi.org/10.1109/CRV52889.2021.00021
Automatic Bidirectional Conversion of Audio and Text: A Review from Past Research

Pooja Panapana(B), Eswara Rao Pothala, Sai Sri Lakshman Nagireddy, Hemendra Praneeth Mattaparthi, and Niranjani Meesala

GMR Institute of Technology, GMR Nagar, Rajam 532127, India
[email protected]
Abstract. Speech represents the most natural and basic method of communication for living beings. It provides the most direct and natural way for humans, and even humans and machines, to communicate. People without disabilities can converse with each other in natural language, whereas people with disabilities such as deafness or muteness can communicate only by texting and sign language, and sign language works only when the other person is nearby. Speech detection/recognition is a branch of computer science that allows a computer to recognize spoken language and translate it into text, and speech detection technology gives machines the ability to identify and respond to spoken commands. If we need to send information, we can record audio and send it. Every spoken utterance or played recording consists of signals, and these signals are used for communication between humans and machines. Current systems offer applications only for speech-to-text conversion. The proposed system goes further by converting audio to text as well as text to speech, which is more useful. This project aids in the conversion of audio to manuscript and manuscript to speech, and it also translates between languages, which is helpful for illiterate people too.

Keywords: Speech Recognition · Communication · Signal Processing · Language Translation
1 Introduction

Speech recognition began in the early 1950s with research at Bell Labs. Early versions supported a single speaker and only a few dozen words [3]. Speech recognition technologies have progressed significantly since those early systems; they now have vast vocabularies in various languages and can recognize speech from multiple speakers [10]. Speech recognition is an area of computer science that allows a computer to recognize uttered language and translate it into text, primarily for search capabilities [8]. Speech detection technology gives machines the ability to identify and respond to spoken commands [7], and it has become one of the biggest marketing strategies. Present systems offer only speech detection applications such as Siri, Alexa, and Google Assistant [9], but these are not useful for people with disabilities such as deafness or muteness. The present scheme is constructed in such a manner that it can convert audio/video input into text as well as text into speech, which makes it more useful [1, 3, 5, 6]. The method states that the voice signal is captured by a microphone, the signal is then processed, decoding is performed using language models and acoustic models, and the result is converted into text for applications in our daily life. Many methods have been used, such as Dynamic Time Warping (DTW) [1], Hidden Markov Model (HMM) [1, 6], Artificial Neural Network (ANN) [1], Hashbot algorithm [2], Convolutional Neural Networks (CNN) [3], Support Vector Machines (SVM) [4], Voice Activity Detection (VAD) [4, 10], Adaptive Boosting (AdaBoost) algorithm [5], Recurrent Neural Network (RNN) [6], Speech Chain algorithm [7], Text and Speech with Speaker Change Detection Using Meta Data (TSMD-SCD) [8], FastSpeech [9], Autoregressive Transformer [9], and Auto Associative Neural Network (AANN) [10].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 303–312, 2023. https://doi.org/10.1007/978-3-031-35501-1_30
2 Literature Survey

The authors of [1] examined various methods and approaches used to build Speech-to-Text (STT) and Text-to-Speech (TTS) systems. They utilized Dynamic Time Warping (DTW), Hidden Markov Model (HMM), and Artificial Neural Network (ANN) classifiers to convert speech to text, and a speech synthesis technique to convert text to speech. A number of speech synthesis techniques exist, including articulatory, formant, and concatenative synthesis [1]. The paper [2] describes the possibilities made available by open Application Programming Interfaces (APIs). The developed scheme made use of the "Hashbot" algorithm, a mathematical algorithm that maps data of any size to a fixed-size hash. Future scope of this project includes creating a custom model based on the Keras library and training it as a recurrent neural network; the next stage of research will be the development of such a custom system [2]. In [3], deep neural network methods, which have contributed significantly to the development of speech recognition, were used, mainly via deep learning. The prediction by CNN is better than the comparative approach in terms of Mean Absolute Error (MAE) and Kendall scores. Finally, they intend to construct a user interface that enables decisions in real time [3]. The paper [4] outlines a technique for recognizing speech utilizing Support Vector Machines (SVM) and Mel Frequency Cepstral Coefficients (MFCC). SVM models were built for every individual word on which the framework was trained; each word segment automatically extracted from the test utterance using Voice Activity Detection (VAD) is matched against all of these models to recognize the test input speech [4]. The AdaBoost algorithm is investigated in [5].
Ensemble boosting procedures compete with neural network models in terms of consistency, and they are comparatively steady and comprehensible, which puts them ahead as a choice. The approach's restriction is that the ensemble's quality depends on the dataset: an imbalanced dataset will result in lower classification performance. Ensemble development from weak classifiers is a future extension of the work [5]. In [6], the authors used RNNs, a field that is evolving at such a rapid pace that it has become increasingly difficult to keep apprised of new, more intriguing, and more sophisticated solutions for accomplishing more complicated and challenging tasks. The next stage of this research could include training the model on large data volumes, as well as analyzing and comparing the speed and quality of its output [6]. The study [7] demonstrates a novel deep learning-based machine speech chain mechanism. The results show that when ASR and TTS are enabled, they enhance efficiency by teaching each other using only unpaired data. Deep learning is used in this project to build a speaker encoding (Deep Speaker) architecture; in the future, the application's effectiveness must be improved [7]. Text and Speech with Speaker Change Detection Using Meta Data (TSMD-SCD) is a model that employs meta-data features to detect speaker changes. It significantly outperforms a combined Sequence-2-Sequence approach to the SCD problem, as well as a purely voice-based solution to the speaker diarization problem; however, its effectiveness is limited, and it cannot handle multiple datasets at once [8]. The FastSpeech model is used in [9]; it is made up of six feed-forward Transformer (FFT) blocks on both the phoneme and mel-spectrogram sides. In sequence-level knowledge distillation, the autoregressive Transformer TTS model is utilized to extract phoneme durations and generate mel-spectrograms. FastSpeech can match the autoregressive Transformer TTS model in terms of speech intelligibility while also speeding up mel-spectrogram generation and end-to-end speech synthesis [9]. The paper [10] demonstrates an approach for recognizing speech based on sonogram features using an Auto Associative Neural Network (AANN). Voice Activity Detection (VAD), a method of identifying voiced snippets in speech that is useful in speech mining applications, is applied to separate individual words from continuous speech [10].
The paper [11] investigates the complicated underlying relationship among speech recognition technology, university educational environments, and disability issues. Specially adapted speech recognition technology is used in Liberated Learning courses to provide better access to lecture content; the Liberated Learning Project's main objective is to put speech recognition applications in actual university classrooms [11]. The authors of [12] present a novel zero-shot method (ZSM-SS) that combines a speaker embedding layer and normalization design with a feed-forward Transformer architecture. The results include both qualitative and quantitative evaluations, as well as high-quality audio output. Two architectures are proposed to capture the prosodic features of the speech, such as the rise and fall of a speaker's voice: the first is based on convolution, while the second is based on multi-head attention [12]. The article [13] develops an adaptive machine speech chain framework to aid TTS in noisy environments. Non-incremental TTS and incremental TTS (ITTS) with auditory feedback were described as the two TTS systems in this study, and two architectures were created, one of which uses only a semi-supervised technique; a machine speech chain with a dynamically adaptive inference approach is included [13]. The authors of [14] suggested two effective regularization strategies for end-to-end TTS models to enhance the models' reliability, so that the end-to-end TTS model can be trained with effective regularization. They create a collaborative training technique for both approaches so that forward and backward decoding can interact to improve one another. A typical TTS synthesis system has two basic parts: the front end and the back end [14]. In [15], the authors suggested using recurrent neural networks to eliminate reverberation and additive noise from speech recordings used to train text-to-speech systems. To train the network, spectral representations of both the clean and the distorted speech serve as the network's target and input, respectively [15]. Human-computer interaction refers to the communication between computers and people, and speech has the possibility of becoming a crucial method of computer interaction. The review [16] covers the major science and technology viewpoints on the fundamental advancement of speech recognition, in addition to an executive summary of the technology created at each stage of voice recognition; it aids in the selection of techniques by outlining their relative advantages and disadvantages [16]. Speech-to-text takes speech input from the microphone and transforms it into text displayed on the desktop. The article [17] explores various speech-to-text methods that can be used in a voice-based email system; a speech analysis method is used, and STT (Speech-to-Text) conversion is executed using an HMM (Hidden Markov Model) along with a neural network [17]. The use of neural networks for text-to-speech conversion has been found to be more precise. In [18], the neural network approach is explained using different types of neural networks such as DNNs, RNNs, and CNNs; the article explains the functions, applications, pros, and cons of the different types and techniques of neural networks, which is useful for choosing the best approach for a TTS system.
Further research is ongoing to select a suitable methodology for designing a text-to-speech system for the Kannada language [18].
3 Various Methods Used in Audio to Text Conversion

3.1 Dynamic Time Warping

Based on dynamic programming, DTW [1] is used to compare the similarity of two time series with varying speeds. Its goal is to iteratively align two feature vector sequences until an optimal match is found [7]. DTW [1] works as follows:

1. Split the two series into equal portions.
2. Compute the distance between the current point in the first series and each of the points in the second series, and save the smallest distance found.
3. Repeat step 2 until all points in the first series are exhausted.
4. Sum all the saved minimum distances to obtain an overall measure of similarity between the two sequences.
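In practice, the alignment DTW computes is implemented with dynamic programming rather than the greedy point-by-point matching sketched in the steps above. A minimal textbook version, using absolute difference as the local cost (an assumption, since [1] does not fix the cost function), looks like this:

```python
import math

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance between points
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

# Identical sequences align with zero cost; a time-stretched copy stays at zero too,
# which is exactly why DTW suits utterances spoken at varying speeds.
print(dtw_distance([1, 2, 3], [1, 2, 3]))        # 0.0
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3]))  # 0.0
```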
For conversion of speech to text, DTW [1] is used. For feature extraction in speech recognition, the authors state that DTW works well because of its simple hardware implementation [1].

3.2 Artificial Neural Network

Neural networks are a collection of algorithms that attempt to identify trends, relationships, and information in data using a process inspired by the human brain. ANN [2] is used for better communication, better recognition, and to remove unwanted noise [3]. Steps for ANN [2]:

1. Input units are passed, i.e., data with some weights attached to it is fed into the hidden layer.
2. Each hidden layer consists of neurons, and all inputs are connected to each neuron.
3. After passing on the inputs, all computation is performed in the hidden layer.
4. After passing through every hidden layer, we move to the last layer, the output layer, which gives the final output.
5. After getting the predictions from the output layer, the error is calculated, i.e., the difference between the actual and the predicted output.

3.3 Convolutional Neural Network

Convolutional Neural Networks (CNNs) [3] are a deep learning technique applied to joint inputs of texts and signals; the network correctly predicts the distribution of word error rates on a collection of records. Layers involved in a CNN [3]:

1. Input layer
2. Convolutional layer
3. Pooling layer
4. Second convolutional layer and pooling layer
5. Dense layer
6. Logit layer
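The feature-map sizes produced by a layer stack like the one above follow mechanically from the convolution output formula out = (in - kernel + 2 * padding) / stride + 1; the input size, kernel sizes, and strides below are hypothetical, since the paper does not specify them:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size - kernel + 2 * padding) // stride + 1

# Hypothetical 32x32 input walked through the layer order listed above.
size = 32
size = conv_out(size, kernel=5)            # first convolutional layer  -> 28
size = conv_out(size, kernel=2, stride=2)  # pooling layer              -> 14
size = conv_out(size, kernel=5)            # second convolutional layer -> 10
size = conv_out(size, kernel=2, stride=2)  # second pooling layer       -> 5
print(size)  # the 5x5 maps are then flattened into the dense layer and the logit layer
```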
CNN predictions outperform the comparative technique in terms of Mean Absolute Error (MAE) [3] and Kendall scores.

3.4 Support Vector Machine

Support vector machines [4] are a type of supervised learning method capable of classification, regression, and outlier detection [7], all common machine learning tasks. Steps for SVM:

1. Separate the data into features and labels.
2. Split the data into training and testing sets.
3. Train the SVM algorithm.
4. Make predictions.
5. Evaluate the results of the algorithm.
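Steps 4 and 5 can be made concrete with the SVM decision function and the hinge loss it is trained to minimize. The toy data and the hand-picked hyperplane below are purely illustrative; step 3, the actual fitting, would normally be done by a library optimizer:

```python
def svm_decision(w, b, x):
    """SVM decision function w.x + b; the sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def hinge_loss(w, b, X, y):
    """Average hinge loss max(0, 1 - y*(w.x + b)) that SVM training minimizes."""
    return sum(max(0.0, 1.0 - yi * svm_decision(w, b, xi))
               for xi, yi in zip(X, y)) / len(X)

# Steps 1-2: toy labeled data, already split into training and test sets.
X_train = [[1.0, 2.0], [2.0, 1.0], [6.0, 5.0], [7.0, 7.0]]
y_train = [-1, -1, 1, 1]
X_test, y_test = [[0.5, 1.0], [8.0, 6.0]], [-1, 1]

# Step 3 would fit w and b by minimizing hinge loss plus a regularizer;
# here a separating hyperplane is plugged in by hand for illustration.
w, b = [0.5, 0.5], -4.0

# Steps 4-5: predict and evaluate.
preds = [1 if svm_decision(w, b, x) >= 0 else -1 for x in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(preds, accuracy)                       # [-1, 1] 1.0
print(hinge_loss(w, b, X_train, y_train))    # 0.0: every point is outside the margin
```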
3.5 Adaptive Boosting

AdaBoost [5], short for Adaptive Boosting, is used in this paper because of its basic algorithms, integration convenience, good generalization property (which can be improved further by increasing the number of base classifiers), and capacity to spot outliers. Steps for AdaBoost:

1. Give all observations equal weights.
2. Classify random samples using stumps.
3. Determine the total error.
4. Determine the stump's performance.
5. Iteratively update the weights.
6. Make the final predictions.
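Steps 3-5 reduce to a little arithmetic: the stump's performance ("amount of say") is alpha = 0.5 * ln((1 - error) / error), and sample weights are rescaled by exp(+alpha) when misclassified or exp(-alpha) when correct, then renormalized. A sketch with five observations (the error value is made up for illustration):

```python
import math

def stump_say(error):
    """Step 4: amount of say (alpha) of a stump, from its weighted error."""
    return 0.5 * math.log((1 - error) / error)

def update_weights(weights, correct, alpha):
    """Step 5: raise weights of misclassified samples, lower the rest, renormalize."""
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new]

# Step 1: five observations with equal weights.
weights = [0.2] * 5
# Steps 2-3: suppose a stump misclassifies only the last observation (total error 0.2).
correct = [True, True, True, True, False]
alpha = stump_say(0.2)
weights = update_weights(weights, correct, alpha)
print(round(alpha, 3))                  # 0.693
print([round(w, 3) for w in weights])   # [0.125, 0.125, 0.125, 0.125, 0.5]
```

The misclassified sample now carries half the total weight, so the next stump concentrates on it; that reweighting loop is the core of the algorithm.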
AdaBoost [5] is a gradient-descent-style algorithm, which means that it is sensitive to variability in the data and susceptible to errors when compared to other approaches.

3.6 Voice Activity Detection

Voice activity detection (VAD) [4, 10] is a method used to detect the presence or absence of human speech. VAD [4, 10] has been used in speech-controlled applications [7] and in smartphones and other devices that can be controlled via speech commands [7]. Typical components and application areas of VAD [4, 10]:

1. Segmentation
2. Voice over Internet Protocol
3. Signal-to-Noise Ratio
4. Detection Algorithm
5. Home Automation
6. Noise Suppression
7. Speaker Diarization
Voice Activity Detection (VAD) is a tool for identifying voiced snippets in speech [4, 10] and a technique for extracting individual words from continuous speech [7].
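One common way to realize the detection idea above is a minimal energy-based VAD: split the signal into frames and mark a frame as speech when its short-term energy clears a threshold. Frame length, threshold, and the toy signal are arbitrary illustrative choices:

```python
def frame_energy(frame):
    """Short-term energy of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def simple_vad(samples, frame_len=4, threshold=0.01):
    """Mark each frame as speech (True) when its energy clears the threshold."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [frame_energy(f) > threshold for f in frames]

# Toy signal: near-silence, then a burst of 'speech', then silence again.
signal = [0.001, -0.002, 0.001, 0.0,
          0.5, -0.6, 0.55, -0.4,
          0.002, -0.001, 0.0, 0.001]
print(simple_vad(signal))  # [False, True, False]
```

Runs of True frames delimit the word boundaries that the recognizers above consume; real systems add smoothing and adaptive thresholds on top of this idea.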
3.7 Auto Associative Neural Network

AANN [7] is a five-layer network that captures the distribution of a feature vector [4]. Autoassociative neural networks [7] are trained with backpropagation [1] or similar learning procedures to approximate the identity mapping between the network's inputs and outputs; the network has a dimensional bottleneck between input and output [7]. For capturing the distribution of speech signals, the AANN [7] model used in this study is 13L 26N 4N 26N 13L, where L denotes a linear unit and N a non-linear unit [7]. The AANN learning method [7] has a high recognition rate of 90% (Table 1).

Table 1. Comparison of different methods

Title of the paper | Methods and tools used | Merits | Demerits
Systems for speech to text and text to speech recognition | Dynamic Time Warping (DTW); Hidden Markov Model (HMM); Artificial Neural Network (ANN) | DTW works better for extracting features; HMM works better as a speech-to-text converter | Computational feasibility; no detailed work is presented; the architecture is more complicated than usual
Development of a Google API-based speech-to-text chatbot interface | Hashbot algorithm | Hashbot's time complexity for finding key phrases is lower than that of well-known algorithms | Less known; only gives auto-generated responses and cannot recognize the correct meaning
A study on automatic speech recognition | Convolutional Neural Network (CNN) | The prediction by CNN is better than the comparative approach | Works mainly on specific datasets, with results only for a limited set of words
Speech recognition using SVM | Support Vector Machines (SVM); Voice Activity Detection (VAD) | Good performance; 95% speech recognition rate | Method requires large and complete datasets for training
Creating a system for automatic audio-to-text conversion | Adaptive Boosting (AdaBoost) | Steady and comprehensible; consistency competes with neural network models | Sensitive to variability in the data and prone to overfitting; classification performance is lower for an imbalanced dataset
Creating a system for automatic audio-to-text conversion | Recurrent Neural Network (RNN); Hidden Markov Model (HMM) | Machine translation using RNN has given the maximum accuracy | Handles large data volumes, so time complexity is higher
Audio, speech, and language understanding: machine speech chain | Speech chain algorithm | Enables training the model on a combination of labelled and unlabelled data | Its effectiveness is lower compared to other algorithms
Hybrid speech and text analysis methods for speaker change detection | Text and Speech with Speaker Change Detection Using Meta Data (TSMD-SCD) | Greater robustness than a hybrid Sequence-2-Sequence approach | Solves the SCD problem only to some extent; effectiveness is limited
Fast, robust and controllable text to speech | FastSpeech model; autoregressive Transformer | Nearly matches the autoregressive Transformer TTS model | Less effective; the quality of synthesized speech is lower
Sonogram and AANN are used to recognize speech | Voice Activity Detection (VAD); Auto Associative Neural Network (AANN) | The AANN learning method boasts a high recognition rate of 90% | Complex task; if signals are not recorded correctly, efficiency drops
Speech recognition in university classrooms: a project for liberated learning | Automatic Speech Recognition (ASR) | Achieves about 95% accuracy for automatic speech recognition | System is only understood by students who have basic English knowledge
Multi-speaker text-to-speech synthesis using zero-shot normalization | Zero-shot multi-speaker voice synthesis (ZSM-SS) | Produces spectrograms used for detailed visualization | Complicated method; requires perfect data
A machine speech chain approach for static and dynamic noise environments | Lombard text-to-speech technique; dynamically adaptable inference approach | - | If the surrounding noise is high, it is hard to predict the correct words
Forward-backward decoding sequence for regularizing end-to-end TTS | End-to-end TTS model based on Tacotron | Forward and reverse decoding work together to enhance one another | Not much increase in robustness
Speech enhancement of noisy and reverberant speech for text-to-speech | Reverberation and dereverberation algorithms | Important parameters can be controlled in real time | Reverberation is more difficult to eliminate than additive noise
A review on speech recognition technique | Segment analysis technique; Gaussian Mixture Model (GMM) | By comparison, GMM and HMM are the best | Error rate is higher compared to other methods
Speech to text conversion using deep learning neural net methods | Speech analysis method; Hidden Markov Model (HMM) | HMM gives better efficiency to the system | Without a microphone, the system will not run
Survey on text-to-speech Kannada using neural networks | Convolutional, recurrent, and deep neural networks | Helpful to know how speech recognition systems actually work | The system is not yet developed; various algorithms must be used to get high accuracy
4 Conclusion

Speech and audio are the easiest and fastest ways to communicate. We do not need writing or typing skills to communicate by voice, yet until the 1950s, communicating with machines required typing words into the system. Speech recognition then made it possible to communicate with machines easily as well. Speech-to-text conversion is a fast-expanding aspect of computer technology that is becoming extremely relevant to how we interact with systems, and its use has since taken deep root in many applications. A speech-to-text converter is a valuable tool that is becoming more common, and implementations are simple to create with Python, one of the most prevalent programming languages in the world. Speech recognition also has limitations: mute people cannot speak and deaf people cannot hear. To address the problems of deaf and mute people, audio-to-text and text-to-speech systems are introduced in this project, with the addition of language translation; the audio-to-text and text-to-speech conversion with language translation gives accurate results. Future work includes emotion recognition for the speech in the audio input, and language translation both for the text acquired from audio-to-text conversion and for the speech produced by text-to-speech conversion. Emotion recognition helps to understand the situation and emotion of a person, and language translation helps people understand the depth of the speech and react to the situation correctly.
References

1. Trivedi, A., Pant, N., Shah, P., Sonik, S., Agrawal, S.: Speech to text and text to speech recognition systems - a review. IOSR J. Comput. Eng. 20(2), 36–43 (2018)
2. Shakhovska, N., Basystiuk, O., Shakhovska, K.: Development of the speech-to-text chatbot interface based on Google API. In: MoMLeT, pp. 212–221 (2019)
3. Benkerzaz, S., Elmir, Y., Dennai, A.: A study on automatic speech recognition. J. Inf. Technol. Rev. 10(3), 80–83 (2019)
4. Thiruvengatanadhan, R.: Speech recognition using SVM. Int. Res. J. Eng. Technol. (IRJET) 5(9), 918–921 (2018)
5. Basystiuk, O., et al.: The developing of the system for automatic audio to text conversion. In: IT&AS, pp. 1–8 (2021)
6. Tsap, V., Shakhovska, N., Sokolovskyi, I.: The developing of the system for automatic audio to text conversion. In: MoMLeT+DS, pp. 75–84 (2021)
7. Tjandra, A., Sakti, S., Nakamura, S.: Machine speech chain. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 976–989 (2020)
8. Anidjar, O.H., Lapidot, I., Hajaj, C., Dvir, A., Gilad, I.: Hybrid speech and text analysis methods for speaker change detection. IEEE/ACM Trans. Audio Speech Lang. Process. 29, 2324–2338 (2021)
9. Ren, Y., et al.: FastSpeech: fast, robust and controllable text to speech. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
10. Thiruvengatanadhan, R.: Speech recognition using sonogram and AANN (2019)
11. Bain, K., Basson, S.H., Wald, M.: Speech recognition in university classrooms: liberated learning project. In: Proceedings of the Fifth International ACM Conference on Assistive Technologies (2002)
12. Kumar, N., Narang, A., Lall, B.: Zero-shot normalization driven multi-speaker text to speech synthesis. IEEE/ACM Trans. Audio Speech Lang. Process. 30, 1679–1693 (2022)
13. Novitasari, S., Sakti, S., Nakamura, S.: A machine speech chain approach for dynamically adaptive Lombard TTS in static and dynamic noise environments. IEEE/ACM Trans. Audio Speech Lang. Process. (2022)
14. Zheng, Y., Tao, J., Wen, Z., Yi, J.: Forward-backward decoding sequence for regularizing end-to-end TTS. IEEE/ACM Trans. Audio Speech Lang. Process. 27(12), 2067–2079 (2019)
15. Valentini-Botinhao, C., Yamagishi, J.: Speech enhancement of noisy and reverberant speech for text-to-speech. IEEE/ACM Trans. Audio Speech Lang. Process. 26(8), 1420–1433 (2018)
16. Gaikwad, S.K., Gawali, B.W., Yannawar, P.: A review on speech recognition technique. Int. J. Comput. Appl. 10(3), 16–24 (2015)
17. Babu Pandipati, D.R.: Speech to text conversion using deep learning neural net methods. Turkish J. Comput. Math. Educ. (TURCOMAT) 12(5), 2037–2042 (2021)
18. Nadig, P.P.S., Pooja, G., Kavya, D., Chaithra, R., Radhika, A.D.: Survey on text-to-speech Kannada using neural networks. Int. J. Adv. Res. Ideas Innov. Technol. 5(6), 128 (2019)
Content-Based Long Text Documents Classification Using Bayesian Approach for a Resource-Poor Language Urdu

Muhammad Pervez Akhter1(B), Muhammad Atif Bilal2, and Saleem Riaz3

1 Riphah College of Computing, Riphah International University, Faisalabad Campus, Faisalabad 38000, Pakistan
[email protected]
2 College of Instrumentation and Electrical Engineering, Jilin University, Changchun 130061, China
[email protected]
3 School of Automation, Northwestern Polytechnical University, Xi'an 710072, People's Republic of China
[email protected], [email protected]
Abstract. Owing to emerging technologies such as hand-held devices and fast, reliable internet services, millions of text documents are produced every day by online libraries, blogs, and news forums. To reuse and extract useful information from this enormous number of documents, they must be processed and assigned labels automatically. Text classification is an important text mining task in which a document is labeled based on its content. Long text classification is more challenging than short text classification because the high-dimensional feature space and irrelevant features demand more time and resources for processing. Urdu text processing is more challenging still because of complex language features such as rich morphology, as well as a lack of public datasets. To overcome the above challenges, in this study we investigate the potential of six machine learning classifiers based on Bayes' theorem in the context of long text document classification for Urdu. For a systematic performance comparison, we perform experiments on three datasets of small, medium, and large size. Performance measures such as F-measure, Root Mean Square Error (RMSE), and time are used to evaluate the models. Naïve Bayes is usually considered an efficient classifier, but our experimental results show that DMNBtext is faster, more reliable, and more accurate than the other five Bayesian and the three non-Bayesian models.

Keywords: Urdu Text Classification · Natural Language Processing · Information Retrieval · Document Text Classification · Bayesian Models
1 Introduction
A wide range of sources produce enormous amounts of unstructured data, such as emails, web pages, online news blogs and forums, social media, and online libraries publishing articles. The growth of electronic data will no doubt continue to increase in
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 313–321, 2023. https://doi.org/10.1007/978-3-031-35501-1_31
314
M. P. Akhter et al.
future with the development of new technologies such as the Internet of Things (IoT), cloud computing, and mobile devices. Manual conversion of unstructured data into structured (meaningful) data is a time- and effort-consuming process, so automatic processing and classification of this data is required. Over the last two decades, machine learning methods have become very popular for handling and classifying such large amounts of unstructured data. In computational linguistics, natural language processing (NLP) is concerned with designing and developing systems that can process and understand languages for better human-computer interaction. A primary goal of NLP is to enable a machine to analyze, understand, process, and output text for better human understanding [1]. Text categorization, sometimes referred to as text classification, is a crucial NLP task that involves grouping a collection of text documents according to a predetermined set of categories [2]. Spam detection, sentiment analysis, and document organization are some important applications of text classification. In the past, the task of text classification has been performed for popular languages such as Arabic, Chinese, and Turkish. Pakistan's official language is Urdu, which is also one of the official languages in a few Indian states. There are more than 11 million native speakers of Urdu worldwide. In contrast to English, Urdu is regarded as a resource-poor but important language due to the lack of linguistic resources [3]. Because of its difficult morphology, distinctive features, and lack of linguistic resources, Urdu has historically received little attention from researchers [4]. At present, Urdu is one of the most actively researched South Asian languages. Machine learning techniques have been very popular for the classification of both short and long text over the last few decades.
Compared to short text, automatic processing of long text documents is more challenging: long text has a large vocabulary size and more noisy text, sentences or paragraphs within a document may belong to multiple categories, training and testing take more time, and more hardware resources such as RAM and CPU are required [5]. Naïve Bayes is a popular classifier and is usually considered an efficient and accurate machine learning classifier for classification tasks. Many studies conclude that Naïve Bayes outperforms the others on text classification tasks [6, 7]. Our experiments reveal that Naïve Bayes is not a good classifier for this task, while DMNBText is the best classifier for long text classification. In this paper, our contributions are as follows:
• We perform an intensive analysis to evaluate the effectiveness of Bayesian machine learning models for the Urdu text classification task.
• To make our findings useful for future work, we use three datasets (two publicly available and one designed by us) of small, medium, and large size.
• We use three metrics, F-measure, RMSE, and time, to compare the performance of our models.
• We also evaluate the performance of three non-Bayesian models to put the effectiveness of the Bayesian models into context.
The rest of the paper is organized as follows: Sect. 2 gives a short review of the Urdu language, Sect. 3 presents the related work, Sect. 4 discusses the datasets and Bayesian models, results are discussed in Sect. 5, and the conclusion is given at the end of this paper.
Content-Based Long Text Documents Classification
315
2 Review of Urdu Language
There are 38 alphabets in Urdu, and it is written from right to left. The text is of the Nastaleeq font family. In the Urdu language, ligatures, which can be made up of one or several characters, are combined to form words. Sentences are constructed by connecting words in a right-to-left arrangement [4]. The Urdu language has certain features that make automated text processing more difficult compared to other languages. Table 1 outlines a number of the features of English and Urdu.

Table 1. Some similarities and differences of the Urdu and English languages

| Language | Characters | Writing direction | Diacritics | Free word order | Sentence order | Capitalization | Cursive |
|----------|------------|-------------------|------------|-----------------|----------------|----------------|---------|
| Urdu     | 38         | Right to left     | Yes        | Yes             | SOV            | No             | Yes     |
| English  | 26         | Left to right     | No         | No              | SVO            | Yes            | No      |
In NLP, researchers have given most of their focus to major languages such as English, Arabic, and Chinese, while Urdu has been neglected for many years for three main reasons: 1) the unique characteristics and complex morphology of Urdu, 2) the lack of linguistic resources such as stemmers and tokenizers, and 3) the lack of a standard corpus for research work.
3 Related Work
Urdu is a resource-poor language, and because of the lack of language resources only limited work has been performed on it. In most studies, the authors design their own dataset and, unfortunately, do not share it for public and future research. Over the last decade, the development and availability of the internet and the WWW have made it possible to design resources such as datasets for resource-poor languages. For text document classification, online news blogs and forums help researchers collect and design datasets. Many studies used such news blogs to collect news articles and design datasets, e.g., for Turkish [8] and Arabic [9]. Three datasets of Urdu text documents from internet news blogs were used in this work. To prepare a dataset so that classifiers can process it automatically, preprocessing is applied. Typically, punctuation and space characters are used to tokenize a document [10]. Characters other than those of the Urdu language, special symbols, integer or real values, and website URLs are eliminated from documents to make sure that they contain only words in the intended language. Stopwords are eliminated, and the remaining words are stemmed to their base form. Typically, TF and IDF methods are used to create a vector representation of documents. These vectors are then used for feature selection and classification [11]. According to several studies on document categorization, preprocessing a dataset improves performance [5]. A comparison of ten automated classification approaches on forty-one datasets concluded that NB performed best on small datasets [12]. Tehseen
also concludes that NB shows better performance on a small dataset of Urdu text documents [13]. Multinomial NB outperformed the others in an Indonesian text document classification task [14]. The study of [15] concluded that NB achieved better scores than others on an Urdu sentiment analysis task. In this research, we apply Bayesian and non-Bayesian models to three sets of news articles gathered from various news blogs and categorize them. We systematically compare these models and find that DMNBText outperforms the others.
4 Datasets and Bayesian Models
4.1 Text Documents Datasets
Urdu is a resource-poor but important language of the world, and there is no publicly available standard dataset for the text classification task. In this study, we used three datasets of text documents collected from various online news blogs such as BBC, Express, and Geo. Each document has a title and a detailed description of the title. Documents are distributed into various classes. Table 2 shows a comparison of the three datasets used in this study. The COUNTER dataset is publicly available. Documents in the naïve and COUNTER datasets are in XML format, while NPUU is in Unicode format. We use MATLAB code to clean and convert these XML files to Unicode. Along with other preprocessing steps, we also remove XML tags. NPUU is a large and imbalanced dataset compared to the others.

Table 2. Statistical comparison of the three datasets used in this study

| Statistic           | COUNTER | naïve     | NPUU      |
|---------------------|---------|-----------|-----------|
| Size                | Small   | Medium    | Large     |
| No. of docs.        | 1,200   | 5,003     | 10,819    |
| Maximum length doc. | 2,480   | 4,129     | 3,254     |
| Minimum length doc. | 43      | 47        | 08        |
| Average length doc. | 215     | 439       | 325       |
| No. of classes      | 5       | 4         | 6         |
| No. of words        | 288,835 | 2,216,845 | 3,611,756 |
| Imbalance level     | High    | Low       | High      |
| Split ratio         | 5       | 5         | 5         |
4.2 Dataset Preprocessing
Machine learning models cannot process a raw dataset directly; a dataset must be preprocessed and converted into a meaningful form. We tokenize the text, remove noise (URLs, acronyms, non-language characters), and remove stopwords. An example of common preprocessing steps is shown in Fig. 1.
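The steps above can be sketched as follows. This is a minimal illustration, not the exact pipeline of the paper: the regular expressions, the sample sentence, and the tiny stopword list are illustrative assumptions.

```python
import re

# Hypothetical, minimal Urdu stopword list for illustration only.
STOPWORDS = {"اور", "کے", "کی", "سے", "میں"}

def preprocess(document: str) -> list:
    """Tokenize, remove noise (URLs, non-Urdu characters), and drop stopwords."""
    document = re.sub(r"https?://\S+", " ", document)        # remove URLs
    document = re.sub(r"[^\u0600-\u06FF\s]", " ", document)  # keep only Arabic-script (Urdu) characters
    tokens = document.split()                                # tokenize on whitespace
    return [t for t in tokens if t not in STOPWORDS]

tokens = preprocess("خبر پڑھیں: https://example.com اور میں خبر 123")
```

Stemming to base forms, which the paper also mentions, would follow this step and typically requires a dedicated Urdu stemmer.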
Fig. 1. Commonly used preprocessing procedure for Urdu text categorization.
4.3 Feature Selection
A dataset includes thousands or millions of words, which together form a high-dimensional feature space. After preprocessing the datasets, we employed Information Gain (IG) to evaluate the usefulness of the features, as suggested by [6] and [19] for Urdu text. The number of features selected by IG and BoW from each dataset is shown in Fig. 2.

Fig. 2. Number of selected features from each dataset (bar chart comparing BoW and IG feature counts, ranging from 1,799 to 3,561, for COUNTER, naïve, and NPUU).
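As a minimal sketch of how IG can score a binary term feature (the toy presence vectors and labels below are invented for illustration), the gain of a term is the class entropy minus the entropy remaining after splitting on the term's presence:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(presence, labels):
    """IG of a binary term-presence vector with respect to class labels."""
    ig = entropy(labels)
    for v in (0, 1):
        mask = presence == v
        if mask.any():
            ig -= mask.mean() * entropy(labels[mask])  # weighted conditional entropy
    return ig

# Toy example: a term appearing only in class-1 documents is maximally informative.
labels = np.array([0, 0, 1, 1])
perfect = np.array([0, 0, 1, 1])   # perfectly predictive term
useless = np.array([1, 0, 1, 0])   # uncorrelated term
```

Ranking all terms by this score and keeping the top-k gives the reduced feature sets reported in Fig. 2.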
4.4 Bayesian Models
The Bayes theorem (or rule) can be used to calculate the probability of an event when only limited information is available. It is a relation in probability theory that connects the conditional probability and the marginal probabilities of two random events [5]. The Bayes theorem can be stated as follows:

P(C|D) = P(D|C) * P(C) / P(D)    (1)

where C and D are two events and P(D) ≠ 0. P(D) and P(C) are the prior probabilities of observing D and C without regard to each other, and P(D|C) is the probability of observing event D given that C is true. For text document classification, the Bayes theorem can be
defined as given in Eq. 2, where d_j is the weighted feature vector of the jth document and q_kj is the weight of the kth feature in the jth document:

P(c_i | d_j) = P(d_j | c_i) * P(c_i) / P(d_j)    (2)

Here P(c_i) is the probability that a randomly selected document belongs to class c_i, P(d_j) is the probability of a randomly chosen document having weighted vector d_j, and P(d_j | c_i) is the probability of observing document vector d_j given class c_i. The following Bayesian models have been used in this study.
• Naïve Bayes (NB): NB is a simple probabilistic model based on the Bayesian theorem. It assumes that the features in a dataset are conditionally independent given the class. Although this assumption is rarely true in practice, NB works well with high-dimensional data and text classification tasks. Typically, NB tends to place a document into a category that contains more documents than the other categories.
• Naïve Bayes Kernel: This variant estimates a kernel density for each class independently from the available training data. It requires more computation time and data storage space than the normal-distribution variant, and works best with continuous data [16].
• Complement Naïve Bayes (CNB): CNB estimates a category's parameters using data from all categories except the one in focus [12]. Contrary to NB, CNB frequently assigns a document to a class with fewer documents than NB would. CNB often performs better than MNB.
• Multinomial Naïve Bayes (MNB): A variation of NB that uses the multinomial distribution, which gives the likelihood of any combination of outcomes for different categories. It is a common technique for classification tasks with more than two classes.
• Discriminative Multinomial Naïve Bayes Text (DMNBText): A Bayesian network has two core components, structure learning and parameter learning. DMNBText combines Multinomial Naïve Bayes with discriminative parameter learning, inheriting the advantages of Bayesian networks and discriminative learning. It offers an effective, practical, and simple method that discriminatively estimates frequencies from a dataset and uses these frequencies to estimate the parameters.
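As a concrete illustration of Eq. 2 (not the WEKA implementations evaluated in this paper), a minimal multinomial Naïve Bayes can be built directly from word counts; the toy vocabulary, counts, and class names are invented for illustration:

```python
import numpy as np

# Toy term-count matrix: rows = documents, columns = vocabulary terms.
X = np.array([[3, 0, 1],   # class 0 documents
              [2, 1, 0],
              [0, 3, 2],   # class 1 documents
              [1, 4, 1]])
y = np.array([0, 0, 1, 1])

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log P(c) and log P(term | c) with Laplace smoothing alpha."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_like = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_like

def predict(x, classes, log_prior, log_like):
    """argmax_c [ log P(c) + sum_k x_k * log P(term_k | c) ], i.e. Eq. 2 in log space."""
    scores = log_prior + log_like @ x
    return classes[np.argmax(scores)]

model = fit_multinomial_nb(X, y)
```

P(d_j) is omitted from the score because it is the same for every class and does not change the argmax.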
5 Experimental Results and Discussion
5.1 Experimental Setup and Performance Measures
Average F-measure, time, and Root Mean Square Error (RMSE) are used to evaluate how well the models perform on Urdu text document classification. For all experiments, we use the well-known data mining tool WEKA [7] with ten-fold cross-validation, and for all nine models we use the default parameters given in WEKA.
5.2 Results Discussion
Average F-measure values achieved by all models are shown in Fig. 3. For all datasets, the DMNBText model outperforms the others, achieving F-measure scores of 96.9%, 96.6%, and 92.4% on the COUNTER, naïve, and NPUU datasets, respectively. It can be seen from Fig. 3 that NB shows the worst performance on all three datasets for classifying long text documents of the Urdu language.
Fig. 3. F-measure values of Bayesian models
To measure the prediction error of the models, RMSE scores are shown in Fig. 4. DMNBText performs better than the other Bayesian models, achieving the minimum RMSE when classifying the small, medium, and large datasets. DMNBText achieved the lowest residual values (the difference between the actual and predicted values), with RMSE scores of 0.1006, 0.1157, and 0.1466 on the COUNTER, naïve, and NPUU datasets, respectively.
Fig. 4. Error analysis of Bayesian models on three datasets
Time is another important metric for evaluating the performance of a model. From Table 3, it can be seen that NB-Multinomial, NB-Complement, and DMNBText take
very little time (less than a second) to build the model. With respect to time, the performance of NB is not good; it is not a fast classifier for text document classification. From the analysis of the results achieved by all the Bayesian models, it is concluded that DMNBText is the best classifier with respect to time, accuracy, and error: it is a fast, reliable, and accurate classifier.

Table 3. Time taken by each model on the three datasets (in seconds)

| Classifier     | COUNTER | naïve | NPUU  |
|----------------|---------|-------|-------|
| NB-Multinomial | 0       | 0.03  | 0.18  |
| NB-Complement  | 0.03    | 0.02  | 0.05  |
| DMNBText       | 0.09    | 0.14  | 0.33  |
| NB             | 0.5     | 4.1   | 13.83 |
| NB-Kernel      | 0.56    | 3.33  | 14.98 |
| BayesNet       | 1.5     | 14.85 | 39.86 |
We also compare the performance of the DMNBText model with three non-Bayesian models: J48, Random Forest, and LibLINEAR. F-measure values and the time taken to build each model are given in Table 4. The experimental results show that DMNBText also performs better than these models.

Table 4. F-measure (F) and time in seconds (T) of DMNBText and the non-Bayesian models

| Model         | COUNTER F | COUNTER T | naïve F | naïve T | NPUU F | NPUU T |
|---------------|-----------|-----------|---------|---------|--------|--------|
| DMNBText      | 96.9      | 0.09      | 96.6    | 0.14    | 92.4   | 0.33   |
| J48           | 89.6      | 6.92      | 89.9    | 74.34   | 87.3   | 95.3   |
| Random Forest | 94.9      | 6.32      | 95.5    | 22.81   | 90.8   | 84.12  |
| LibLINEAR     | 96.4      | 0.14      | 96.1    | 1.09    | 88.3   | 6.24   |
6 Conclusion
In this study, we compare the performance of six machine learning models based on the Bayes theorem on three Urdu text document datasets of small, medium, and large size. Although NB is a popular model for information retrieval and text classification in other languages, it is not a good classifier for the Urdu text document classification task. Comparing three performance measures (F-measure, RMSE, and time), the performance of DMNBText is outstanding, while the performance of NB is the worst
on all three datasets. A comparison of DMNBText with non-Bayesian models also shows the superiority of this model in the context of long Urdu text document classification. In the future, we aim to investigate the performance of Bayesian models on short Urdu text, e.g., for sentiment analysis. We also aim to apply these models to Roman Urdu text, which is written similarly to English text but differs from Urdu-script text. Further, applying ensemble learning or hybrid approaches to classify these datasets would be a good future contribution.
References
1. Sarkar, D.: Text classification. In: Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data (2016)
2. Aggarwal, C.C.: Text sequence modeling and deep learning. In: Machine Learning for Text (2018)
3. Riaz, K.: Comparison of Hindi and Urdu in computational context. Int. J. Comput. Linguist. Nat. Lang. Process. 01, 92–97 (2012)
4. Daud, A., Khan, W., Che, D.: Urdu language processing: a survey. Artif. Intell. Rev. 47(3), 279–311 (2016). https://doi.org/10.1007/s10462-016-9482-x
5. Akhter, M.P., Jiangbin, Z., Naqvi, I.R., Abdelmajeed, M., Mehmood, A., Sadiq, M.T.: Document-level text classification using single-layer multisize filters convolutional neural network. IEEE Access 8, 42689–42707 (2020)
6. Bilal, M., Israr, H., Shahid, M., Khan, A.: Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques. J. King Saud Univ. Comput. Inf. Sci. 28, 330–344 (2016)
7. Akhter, M.P., Jiangbin, Z., Naqvi, I.R., AbdelMajeed, M., Zia, T.: Abusive language detection from social media comments using conventional machine learning and deep learning approaches. Multimedia Syst. (2021)
8. Yüksel, A.E., Türkmen, Y.A., Özgür, A., Altınel, A.B.: Turkish tweet classification with transformer encoder. In: International Conference on Recent Advances in Natural Language Processing, RANLP 2019, pp. 1380–1387, September 2019
9. Alshammari, R.: Arabic text categorization using machine learning approaches. Int. J. Adv. Comput. Sci. Appl. 9, 226–230 (2018)
10. Jabbar, A., Iqbal, S., Khan, M.U.G., Hussain, S.: A survey on Urdu and Urdu like language stemmers and stemming techniques. Artif. Intell. Rev. 49(3), 339–373 (2016). https://doi.org/10.1007/s10462-016-9527-1
11. Mirończuk, M.M., Protasiewicz, J.: A recent overview of the state-of-the-art elements of text classification. Expert Syst. Appl. 106, 36–54 (2018)
12. Hartmann, J., Huppertz, J., Schamp, C., Heitmann, M.: Comparing automated text classification methods. Int. J. Res. Mark. 36, 20–38 (2019)
13. Tehseen, Z., Akhter, M.P., Abbas, Q.: Comparative study of feature selection approaches for Urdu text categorization. Malays. J. Comput. Sci. 28, 93–109 (2015)
14. Wongso, R., Luwinda, F.A., Trisnajaya, B.C., Rusli, O.: News article text classification in Indonesian language. Procedia Comput. Sci. 116, 137–143 (2017)
15. Bilal, A., Rextin, A., Kakakhel, A., Nasim, M.: Roman-txt: forms and functions of Roman Urdu texting (2017)
16. Pérez, A., Larrañaga, P., Inza, I.: Bayesian classifiers based on kernel density estimation: flexible classifiers. Int. J. Approx. Reason. 50, 341–362 (2009)
Data-driven Real-time Short-term Prediction of Air Quality: Comparison of ES, ARIMA, and LSTM

Iryna Talamanova1 and Sabri Pllana2(B)

1 Stockholm, Sweden
2 Center for Smart Computing Continuum, Forschung Burgenland, Eisenstadt, Austria [email protected]
Abstract. Air pollution is a worldwide issue that affects the lives of many people in urban areas and is considered to contribute to heart and lung diseases. A careful and timely forecast of air quality could help reduce the exposure risk for affected people. In this paper, we use a data-driven approach to predict air quality based on historical data. We compare three popular methods for time series prediction: Exponential Smoothing (ES), Auto-Regressive Integrated Moving Average (ARIMA), and Long Short-Term Memory (LSTM). Considering prediction accuracy and time complexity, our experiments reveal that for short-term air pollution prediction, ES performs better than ARIMA and LSTM.
Keywords: air pollution · Exponential Smoothing · ARIMA · LSTM

1 Introduction
The high density of urban population often leads to high levels of various kinds of pollution, such as air pollution [25], light pollution [11], or noise pollution [2,3]. Air pollution is an important indicator of the quality of life in a city. It may affect human health by causing heart and lung diseases [7]. Furthermore, air pollution is a significant contributor to climate change [19]. Table 1 describes the health impact of some of the most harmful air pollutants. Sensors for personal use make it possible to measure air pollution at a specific location, and based on the retrieved data, future air pollution can be predicted using time series forecasting methods. Existing solutions are already capable of quantifying the state of air quality; however, they do not provide short-term forecasts in real time. Prediction of time series in near real time implies that the model should be periodically updated to account for recently generated data. Amjad et al. [5] propose to monitor the trend of a financial time series and update the model if the trend changes direction. Qin et al. [18] suggest creating a schedule for
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 322–331, 2023. https://doi.org/10.1007/978-3-031-35501-1_32
Short-term Prediction of Air Quality
323
Table 1. Impact of air pollution on human health.

| Pollutant | Health Impact |
|-----------|---------------|
| CO | Mostly affects the cardiovascular system and leads to a lack of oxygen in the human body. The outcome is poor concentration and slow reflexes. It may also cause lung inflammation. |
| O3 | Affects the lungs and respiratory system and may lead to lung inflammation. It also reduces lung function and may cause asthma or even lung cancer. |
| NO2, SO2 | Affect the respiratory system by reducing its resistance to respiratory infections. |
| PM | May cause inflammatory lung changes. It is dangerous due to its small size, which allows it to reach the heart and the brain and may cause inflammation there. |
updating the data within a fixed time frame. For adaptive model selection, Le et al. [10] suggest updating the model each time new data arrives. A trend in a series of air pollution data may change unpredictably because of environmental changes; therefore, updating the model each time new data arrives is a promising approach for air pollution data. In this paper, we compare methods that may be applied to real-time prediction of air quality. We conduct a performance comparison with respect to prediction accuracy and time complexity for three methods: Exponential Smoothing, ARIMA, and LSTM. For the experimental evaluation we use a data set [16] that comprises air pollution data collected in Skopje. Our experimental results indicate that Exponential Smoothing outperforms ARIMA and LSTM for real-time prediction of air quality. The rest of the paper is structured as follows. Related work is discussed in Sect. 2. We describe the data set, code implementation, and metrics in Sect. 3. Section 4 describes the experimental evaluation. Section 5 concludes the paper.
2 Related Work
In this section, we provide an overview of the related work. Shaban et al. [20] investigate three machine learning algorithms that could be used for alarming applications in the context of air pollution: Support Vector Machines, M5P Model Trees, and Artificial Neural Networks. In the future, they plan to study real-time prediction of air pollution. Subramanian [23] studies the application of Multiple Linear Regression and Neural Networks to forecasting the pollution concentration. Le et al. [9] describe an air pollution prediction model based on spatiotemporal data collected from air quality sensors installed on taxis running across the city of Daegu, Korea. The prediction model is based on a Convolutional Neural Network for an image-like spatial distribution of air pollution.
324
I. Talamanova and S. Pllana
The temporal information in the data is handled using a combination of a Long Short-Term Memory unit for time series data and a Neural Network model for other air pollution impact factors (such as weather conditions). Ochando et al. [15] use traffic and weather data for prediction of air pollution. Their aim is to provide general information about the air quality of the city rather than focusing on a particular spot within the city. In their study, the Random Forest model performs best. Related work mostly focuses on studying air pollution data that has already been generated and stored. In contrast, we focus on using continuously updated data (as may be generated by sensors) for short-term prediction of air pollution in near real time.
3 Methodology
In this section, we describe the data preparation, the code implementation of the prediction methods, and the metrics.

3.1 Data Preparation
For the experimental evaluation, we use the data collected in Skopje [16]. This data set contains air pollution measurements from seven stations and includes CO, NO2, O3, PM10, and PM2.5. The pollutants behave quite similarly over time, and our focus in this study is on PM2.5. Figure 1 visualizes the data set, which originally had missing values and outliers. We use linear interpolation to fill the missing values; it is assumed that a missing value lies on the line drawn through a set of known neighboring points. The data set after applying interpolation is depicted in Fig. 2. Because we are interested in short-term predictions, we decided to predict the air pollution 24 h ahead; that is, the test interval is 24 h. In accordance with [21], the data are split into training and test sets of 70% and 30%, respectively. We should consider that the prediction accuracy of a Neural Network usually improves with the amount of training data; therefore, for the LSTM model the training set is much larger than for the statistical models.

3.2 Code Implementation
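As a first implementation sketch, the linear interpolation described in Sect. 3.1 can be written with NumPy; the short PM2.5 series below is invented for illustration and is not the Skopje data.

```python
import numpy as np

# Hourly PM2.5 readings with gaps encoded as NaN (illustrative values).
pm25 = np.array([12.0, np.nan, 18.0, 21.0, np.nan, np.nan, 30.0])

def fill_linear(series):
    """Replace NaNs by linear interpolation between the nearest known points."""
    series = series.copy()
    idx = np.arange(len(series))
    known = ~np.isnan(series)
    series[~known] = np.interp(idx[~known], idx[known], series[known])
    return series

filled = fill_linear(pm25)
```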
We have implemented all algorithms in this study in Python using open source libraries and frameworks, including statsmodels [12], pmdarima [22], Keras [8], and TensorFlow [1].

3.3 Measurement Metrics
For the real-time environment emulation, we use a rolling window technique: after building the model and making the forecast, the data window is moved one hour ahead, and model building and forecasting are repeated.
Fig. 1. Initial data set.
Figure 3 depicts the rolling forecast. The forecast is made multiple times using different data, and the overall error is calculated by averaging the individual errors. This technique is independent of the data and provides correct results. We use the Root Mean Squared Error (RMSE), the square root of the average of the squared differences between the predicted and actual values of the series:

RMSE = sqrt( (1/n) * sum_{i=1}^{n} (y_i - x_i)^2 )    (1)

All experiments were conducted multiple times on different time intervals, and the mean of the results has been taken. Also, for each method a multi-step forecast has been produced for comparison.
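The rolling (rolling-origin) evaluation can be sketched as follows. The naive last-value forecaster is a stand-in for ES, ARIMA, or LSTM, and the synthetic series is invented for illustration.

```python
import numpy as np

def naive_forecast(train, horizon):
    """Placeholder model: repeat the last observed value (stand-in for ES/ARIMA/LSTM)."""
    return np.full(horizon, train[-1])

def rolling_rmse(series, train_len, horizon, step=1):
    """Slide the training window one step at a time and average the RMSE of each forecast."""
    errors = []
    for start in range(0, len(series) - train_len - horizon + 1, step):
        train = series[start:start + train_len]
        actual = series[start + train_len:start + train_len + horizon]
        pred = naive_forecast(train, horizon)
        errors.append(np.sqrt(np.mean((pred - actual) ** 2)))
    return float(np.mean(errors))

series = np.arange(20, dtype=float)  # synthetic hourly series with a linear trend
score = rolling_rmse(series, train_len=5, horizon=2)
```

Swapping `naive_forecast` for a real model reproduces the evaluation loop of Fig. 3.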
4 Evaluation
In this section, we present the training intervals and the Root Mean Squared Error (RMSE) for Exponential Smoothing, ARIMA, and LSTM. Additionally, a performance comparison of the studied methods is provided.

4.1 Exponential Smoothing (ES)
To determine the best training interval, we conducted the following experiment: we trained the model on various numbers of hours and selected the interval that resulted in the lowest RMSE. Because we are aiming for short-term predictions, it is not necessary to consider larger training intervals.
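Simple exponential smoothing, the core of the ES family, can be sketched in a few lines. The statsmodels implementation used in this study also fits trend and seasonal components; the smoothing factor and the series below are illustrative assumptions.

```python
def exponential_smoothing(series, alpha=0.5):
    """s_t = alpha * y_t + (1 - alpha) * s_{t-1}; the final level is the 1-step forecast."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

forecast = exponential_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
```

Larger `alpha` weighs recent observations more heavily, which matters when the pollution trend changes unpredictably.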
Fig. 2. Data set after filling missing values.
Table 2 presents the average RMSE for different training intervals. We may observe that the best training interval for Exponential Smoothing is 96 h, because the RMSE for this period is the lowest.

Table 2. RMSE of the ES model for various training intervals.

| Time [h] | 48    | 72   | 96   | 120  | 144   | 168  | 196   |
|----------|-------|------|------|------|-------|------|-------|
| RMSE     | 10.54 | 9.76 | 6.39 | 8.26 | 10.47 | 12.6 | 10.12 |

4.2 ARIMA
To make the prediction process faster and reduce the model selection time, we conducted an experiment to determine the hyperparameter intervals: we evaluated various time intervals to determine the minimum and maximum values of the hyperparameters. The results of this experiment are presented in Table 3. After the hyperparameter boundaries were determined, we observed a significant improvement in the speed of building an ARIMA model. Table 4 indicates that the best training interval for the ARIMA model is 120 h, because it results in the lowest RMSE across the considered intervals.
Fig. 3. The rolling forecast enables emulation of the real-time environment.

Table 3. ARIMA hyperparameter intervals.

| Hyperparameter | Minimum | Maximum |
|----------------|---------|---------|
| p              | 0       | 6       |
| d              | 0       | 2       |
| q              | 0       | 5       |
| P              | 0       | 3       |
| D              | 0       | 1       |
| Q              | 0       | 2       |
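This study searches the (p, d, q)(P, D, Q) space of Table 3 with pmdarima's auto_arima. As a self-contained illustration of the autoregressive component only (not the full ARIMA search), an AR(p) model can be fitted by ordinary least squares; the synthetic series below is invented and is not the Skopje data.

```python
import numpy as np

def fit_ar(series, p):
    """Fit y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} by ordinary least squares."""
    rows = [series[t - p:t][::-1] for t in range(p, len(series))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])  # intercept + lags
    y = series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [c, a_1, ..., a_p]

# Synthetic AR(1) process: y_t = 0.8 * y_{t-1} + noise
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + rng.normal(scale=0.1)

coef = fit_ar(y, p=1)
```

The differencing order d and the moving-average terms q, which complete an ARIMA(p, d, q) model, are handled by the library in the actual experiments.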
In our experiment, the time for building the ARIMA model and performing the prediction is 19.8 s. While this is shorter than for Exponential Smoothing (20.5 s), the prediction error of the ARIMA model is larger.

4.3 LSTM
We have implemented LSTM using the Keras framework [8]. In the first stage of the experiment, we evaluated different LSTM configurations: Simple, Stacked, Bidirectional, and Encoder-decoder. The evaluation results for these LSTM network configurations are presented in Table 5. We may observe that the best accuracy is achieved with the Simple LSTM configuration; in what follows, we describe experimental results for this configuration. Table 6 shows the selected hyperparameter values for the Simple LSTM configuration, and Table 7 shows its RMSE and execution time for various values of the prediction horizon. We may observe that the execution time of LSTM is significantly higher than for the other methods considered in this study (ES and ARIMA). Furthermore, the prediction accuracy worsens as the prediction horizon increases.
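The experiments use Keras; as a library-free illustration of what a single LSTM unit computes per time step, the gate equations can be written in NumPy. The weights here are random and the input sequence is invented, purely for shape checking.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step: input, forget, output gates and candidate cell state."""
    z = W @ x + U @ h + b            # all four gate pre-activations stacked: shape (4*n,)
    n = h.shape[0]
    i = sigmoid(z[0:n])              # input gate
    f = sigmoid(z[n:2 * n])          # forget gate
    o = sigmoid(z[2 * n:3 * n])      # output gate
    g = np.tanh(z[3 * n:4 * n])      # candidate cell state
    c_new = f * c + i * g            # cell state carries long-term memory
    h_new = o * np.tanh(c_new)       # hidden state is the unit's output
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid = 1, 4                   # e.g. one PM2.5 value in, 4 hidden units
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x_t in ([0.3], [0.5], [0.7]):    # a tiny input sequence
    h, c = lstm_step(np.array(x_t), h, c, W, U, b)
```

A Simple configuration uses one such layer; Stacked feeds one layer's hidden states into another, and Bidirectional runs the sequence in both directions.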
Table 4. RMSE of the ARIMA model for various training intervals.

| Time [h] | 72    | 96   | 120  | 144  | 168   | 196   | 220   |
|----------|-------|------|------|------|-------|-------|-------|
| RMSE     | 11.24 | 8.89 | 8.59 | 12.1 | 14.32 | 12.83 | 11.56 |

Table 5. Accuracy and execution time of various LSTM network configurations.

| Network Configuration | RMSE | Time [s] |
|-----------------------|------|----------|
| Simple                | 3.26 | 1196.32  |
| Stacked               | 3.32 | 1253.21  |
| Bidirectional         | 4.19 | 1372.65  |
| Encoder-decoder       | 3.75 | 1476.87  |

4.4 Performance Comparison of ES, ARIMA, and LSTM
Figure 4 depicts the relationship between RMSE and the prediction horizon for ES, ARIMA, and LSTM. RMSE increases with the prediction horizon for all three considered methods. For all considered horizons, ES has the lowest RMSE (that is, the highest prediction accuracy) compared to ARIMA and LSTM.
Fig. 4. RMSE and prediction Horizon [h] for ES, ARIMA, and LSTM.
With respect to the time for building a model and making a forecast, ARIMA and ES require about 20 s in our experiments. LSTM is significantly slower and requires about 1000 s for model building and forecasting.
Table 6. LSTM hyperparameter values.

| Hyperparameter                 | Value    |
|--------------------------------|----------|
| Epochs                         | 800      |
| Patience coefficient           | 0.1      |
| Validation size                | 72 h     |
| Dropout                        | 0.1      |
| Recurrent dropout              | 0.3      |
| Batch size                     | 12       |
| Type                           | Stateful |
| Coefficient for counting units | 3        |
| Training size                  | 8000     |
Considering prediction accuracy (that is, RMSE) and model building and forecasting time, we may conclude that ES is the most suitable among the studied methods for short-term prediction of air pollution.

Table 7. RMSE and execution time of the Simple LSTM configuration for various values of the horizon.

| Horizon [h] | RMSE  | Time [s] |
|-------------|-------|----------|
| 1           | 3.26  | 1282.46  |
| 2           | 4.47  | 953.40   |
| 3           | 5.35  | 1140.42  |
| 4           | 6.17  | 1185.93  |
| 5           | 7.07  | 1101.65  |
| 6           | 7.31  | 967.06   |
| 7           | 8.24  | 1303.88  |
| 8           | 6.21  | 900.61   |
| 9           | 9.06  | 1111.17  |
| 10          | 9.14  | 1055.00  |
| 11          | 9.67  | 1354.31  |
| 12          | 10.41 | 1081.18  |
| 13          | 10.44 | 1258.37  |
| 14          | 11.06 | 1195.12  |
| 15          | 10.71 | 936.59   |
| 16          | 11.29 | 1107.04  |
| 17          | 11.88 | 1213.67  |
| 18          | 13.40 | 1155.57  |
| 19          | 12.02 | 977.91   |
| 20          | 12.55 | 966.46   |
| 21          | 12.71 | 1199.05  |
| 22          | 12.93 | 1078.39  |
| 23          | 11.74 | 1003.03  |
| 24          | 14.69 | 1117.89  |
4.5 Future Research Directions
Future work may investigate the performance of additional prediction methods in the context of air pollution. Techniques for parallel processing [4,17], acceleration [6,13,24], and intelligent parameter selection [14] could be studied to further improve efficiency.
5 Conclusions
We have presented various methods that may be applied to short-term air quality prediction in real time. Using a real-world data set, we compared the performance of three methods: Exponential Smoothing, ARIMA, and LSTM. We have observed that Exponential Smoothing performs more efficiently for short-term air pollution prediction than ARIMA and LSTM; LSTM has a larger prediction error and takes more time to produce a forecast. For multi-step-ahead forecasts, the gap between the prediction errors of these methods grows with the number of steps.
References

1. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283 (2016)
2. Alsouda, Y., Pllana, S., Kurti, A.: A machine learning driven IoT solution for noise classification in smart cities. In: Machine Learning Driven Technologies and Architectures for Intelligent Internet of Things (ML-IoT), pp. 1–6. Euromicro (2018). https://doi.org/10.48550/arXiv.1809.00238
3. Alsouda, Y., Pllana, S., Kurti, A.: IoT-based urban noise identification using machine learning: performance of SVM, KNN, bagging, and random forest. In: Proceedings of the International Conference on Omni-Layer Intelligent Systems (COINS 2019), New York, NY, USA, pp. 62–67. ACM (2019). https://doi.org/10.1145/3312614.3312631
4. Amaral, V., et al.: Programming languages for data-intensive HPC applications: a systematic mapping study. Parallel Comput. 91, 102584 (2020). https://doi.org/10.1016/j.parco.2019.102584
5. Amjad, M., Shah, D.: Trading bitcoin and online time series prediction. In: NIPS 2016 Time Series Workshop, pp. 1–15 (2017)
6. Benkner, S., et al.: PEPPHER: efficient and productive usage of hybrid computing systems. IEEE Micro 31(5), 28–41 (2011). https://doi.org/10.1109/MM.2011.67
7. Das, P.K., A, D.V., Meher, S., Panda, R., Abraham, A.: A systematic review on recent advancements in deep and machine learning based detection and classification of acute lymphoblastic leukemia. IEEE Access 10, 81741–81763 (2022). https://doi.org/10.1109/ACCESS.2022.3196037
8. Gulli, A., Pal, S.: Deep Learning with Keras. Packt Publishing Ltd (2017)
9. Le, D.: Real-time air pollution prediction model based on spatiotemporal big data. arXiv preprint arXiv:1805.00432 (2018)
10. Le Borgne, Y.A., Santini, S., Bontempi, G.: Adaptive model selection for time series prediction in wireless sensor networks. Signal Process. 87(12), 3010–3020 (2007)
11. Longcore, T., Rich, C.: Ecological light pollution. Front. Ecol. Environ. 2(4), 191–198 (2004)
12. Massaron, L., Boschetti, A.: Regression Analysis with Python. Packt Publishing Ltd. (2016)
13. Memeti, S., Pllana, S.: Accelerating DNA sequence analysis using Intel(R) Xeon Phi(TM). In: 2015 IEEE Trustcom/BigDataSE/ISPA, vol. 3, pp. 222–227 (2015). https://doi.org/10.1109/Trustcom.2015.636
14. Memeti, S., Pllana, S.: Optimization of heterogeneous systems with AI planning heuristics and machine learning: a performance and energy aware approach. Computing 103(12), 2943–2966 (2021). https://doi.org/10.1007/s00607-021-01017-6
15. Ochando, L.C., Julián, C.I.F., Ochando, F.C., Ferri, C.: AirVLC: an application for real-time forecasting urban air pollution. In: Proceedings of the 2nd International Conference on Mining Urban Data (MUD 2015), vol. 1392, pp. 72–79. CEUR-WS.org, Aachen, DE (2015)
16. Petrushevski, S.: Air pollution in Skopje from 2008 to 2018 (2018). https://www.kaggle.com/cokastefan/pm10-pollution-data-in-skopje-from-2008-to-2018
17. Pllana, S., Xhafa, F.: Programming Multicore and Many-core Computing Systems. Wiley, Hoboken (2017). https://doi.org/10.1002/9781119332015
18. Qin, X., Mahmassani, H.S.: Adaptive calibration of dynamic speed-density relations for online network traffic estimation and prediction applications. Transp. Res. Record 1876(1), 82–89 (2004)
19. Seinfeld, J.H., Pandis, S.N.: Atmospheric Chemistry and Physics: From Air Pollution to Climate Change. John Wiley & Sons (2016)
20. Shaban, K.B., Kadri, A., Rezk, E.: Urban air pollution monitoring system with forecasting models. IEEE Sensors J. 16(8), 2598–2606 (2016). https://doi.org/10.1109/JSEN.2016.2514378
21. Siami-Namini, S., Namin, A.S.: Forecasting economics and financial time series: ARIMA vs. LSTM. arXiv preprint arXiv:1803.06386 (2018)
22. Smith, T.G.: pmdarima: ARIMA estimators for Python (2017)
23. Subramanian, V.N.: Data analysis for predicting air pollutant concentration in smart city Uppsala (2016)
24. Viebke, A., Memeti, S., Pllana, S., Abraham, A.: CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi. J. Supercomput. 75(1), 197–227 (2019). https://doi.org/10.1007/s11227-017-1994-x
25. Yang, D., Wang, J., Yan, X., Liu, H.: Subway air quality modeling using improved deep learning framework. Process Saf. Environ. Prot. 163, 487–497 (2022). https://doi.org/10.1016/j.psep.2022.05.055
A Flexible Implementation Model for Neural Networks on FPGAs Jesper Jakobsen, Mikkel Jensen, Iman Sharifirad, and Jalil Boudjadar(B) Aarhus University, Aarhus, Denmark {201708777,201708684}@post.au.dk, {rad,jalil}@ece.au.dk
Abstract. Machine Learning (ML) is widely used to enhance the performance and customize the service of different products and applications using actual operation data. Most ML-empowered systems are delivered as cloud solutions, which may present different challenges, specifically related to connectivity, privacy, security, and stability. Different implementation architectures of ML algorithms on edge and embedded devices have been explored in the literature. With respect to FPGAs, given that software functions can be implemented on the hardware logic fabric for acceleration purposes, synthesizing the hardware and configuring the implementation integration following changes in the ML models is a challenging task. In this paper we propose an adaptive architecture and flexible implementation model of Neural Networks (NN) on Xilinx FPGAs where changing the NN structure and/or size requires neither a re-synthesis of the hardware nor a reconfiguration of the system integration. Furthermore, we carried out a hardware optimization to enhance the performance of the IP acceleration cores and of data transfer, and analyzed the underlying performance, accuracy, and scalability for different NNs with up to 180000 parameters.
Keywords: Neural networks · Embedded systems · FPGA · Software architecture · IP cores

1 Introduction
Over the last decade, Machine Learning (ML) [21] has gained tremendous attention in different industrial applications thanks to its ability in event detection, classification, prediction, performance optimization, and service customization [14,15,18,19,22–25]. A key enabler has been data availability, thanks to handy sensing technologies, together with the existence of powerful computation infrastructures such as cloud solutions. The accuracy of an ML algorithm, implementing an operation model such as a Neural Network (NN) that is in turn synthesized from a data set, is mostly dependent on the data accuracy, reliability, and timeliness [16,17,22]. Thus, securing data availability is essential to the functionality and viability of ML-based decision making and control systems.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 332–342, 2023. https://doi.org/10.1007/978-3-031-35501-1_33

However, securing data availability
through connectivity might be challenging due to cyber attacks, hostile environments, or simply the absence of connectivity, such as in deep water or high seas [22]. Moreover, nascent privacy concerns, the underlying GDPR policies, and conventional security challenges have motivated the use of edge and embedded computing to process data locally for different applications [20].
NNs have been executed on CPUs and GPUs; however, their low throughput and/or energy efficiency presents a bottleneck [10,12,13]. Alternatively, FPGAs present promising platforms for the execution of NN algorithms [9,11] thanks to: (1) custom hardware acceleration, which is vital to boost NN performance; (2) full customization and flexibility in partitioning software and hardware; and (3) high energy efficiency. However, FPGAs suffer from a complex design cycle and expensive reconfiguration for hardware synthesis, which is especially critical given the dynamic architectures of NNs following data changes.
To tackle the aforementioned issues, we propose an adaptive architecture and flexible implementation model of NNs on Xilinx FPGAs where changing the NN structure and/or size requires neither a re-synthesis of the hardware nor a reconfiguration of the system integration. Furthermore, we carried out a hardware optimization to enhance the performance of the IP acceleration cores and of data transfer. NN processing is split into different functions: the ARM processor executes NN instantiation and manages the data exchange between layers, whereas the synthesized IP core processes layers iteratively. Thanks to its modularity, our architecture and implementation model enable extensibility and updates to the NN weights and biases in a straightforward manner.
Performance, accuracy, and scalability properties have been tested for different NNs (up to 256 neurons per layer and 180000 parameters in total), three different activation functions, and different SW-HW implementation configurations on a Xilinx ZYBO Z7 FPGA board.
The rest of the paper is structured as follows: Sect. 2 presents the background. Section 3 cites relevant related work. In Sect. 4, we introduce the flexible architecture of NN models, whereas Sect. 5 illustrates an adaptive implementation model of NNs on FPGAs. In Sect. 6, testing and experimental results are discussed. Section 7 concludes the paper.
2 Background
This section introduces background on FPGAs and neural networks.

2.1 Design on FPGAs

A System on Chip (SoC) integrates most components of a computer system, such as the central processing unit (CPU), memory interfaces, input/output devices, storage, and possibly programmable logic. FPGA stands for Field Programmable Gate Array: reconfigurable hardware formed by configurable logic blocks and interconnections [2]. The configurable logic blocks can be programmed independently and integrated to form
hardware components (IP cores) that can serve to implement a software functionality. An FPGA board also includes a processing system (PS) formed by a set of ARM processing cores and controllers. Hardware IP cores are practical for accelerating computations and performing calculations off the CPU, and integrating a PS with an FPGA makes it possible to parallelize executions. The process of generating an IP core from software code is called high-level synthesis (HLS); it can be tedious, but tools like Vivado HLS can be used to semi-automate the creation and configuration of IP cores with limited knowledge of hardware programming. An IP core resulting from software code can be exported and made accessible to other IP cores within the design flashed to the FPGA.
2.2 Neural Networks
Algorithm 1. Neural network output calculation

    for l ← 1 to L do
        for j ← 1 to N^l do
            for i ← 1 to N^(l-1) do
                Z_j^l += W_ji^(l-1) × X_i^(l-1)
            Z_j^l += b_j^l
            X_j^l = Act_Fun(Z_j^l)

where L is the number of layers, N^l is the number of nodes in layer l, N^(l-1) is the number of nodes in layer l−1, W_ji^(l-1) are the connection weights, b_j^l is the bias of node j in layer l, and Act_Fun(n) is the activation function of neuron n.
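The triple loop of Algorithm 1 can be transcribed directly into plain Python (an illustrative sketch, not the paper's hardware implementation; the small weight matrices below are made-up values):

```python
# Direct transcription of Algorithm 1: W[l][j][i] is the weight from node i
# of layer l-1 to node j of layer l; b[l][j] is the bias of node j in layer l.
def nn_output(x, W, b, act_fun):
    for l in range(len(W)):                 # for l <- 1 to L
        z = [0.0] * len(b[l])
        for j in range(len(b[l])):          # for j <- 1 to N^l
            for i in range(len(x)):         # for i <- 1 to N^(l-1)
                z[j] += W[l][j][i] * x[i]   # Z_j^l += W_ji^(l-1) * X_i^(l-1)
            z[j] += b[l][j]                 # Z_j^l += b_j^l
        x = [act_fun(v) for v in z]         # X_j^l = Act_Fun(Z_j^l)
    return x

relu = lambda v: max(0.0, v)
W = [[[1.0, -1.0], [0.5, 0.5]],  # hidden layer: 2 inputs -> 2 nodes
     [[1.0, 2.0]]]               # output layer: 2 nodes -> 1 node
b = [[0.0, 0.0], [0.5]]
print(nn_output([1.0, 2.0], W, b, relu))  # → [3.5]
```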
A neural network is a collection of algorithms that attempt to recognize underlying relationships in a set of data using a process that mimics how the human brain works. NNs have an input layer, several hidden layers, and an output layer, which are fully connected by weights. Figure 1 shows the architecture of the neural network, and Algorithm 1 demonstrates the way the output of a neural network is calculated.

Fig. 1. Multi-layer neural network architecture

Implementing NNs on FPGAs has the benefit of unrolling for-loops, thus executing the neuron computations in a single clock cycle instead. This implementation pattern, however, requires a very large amount of hardware resources. Instead, pipelining of the loops can be done, which enables the compiler to determine which parts of the calculations can be parallelized. In practice, the key loop to parallelize in the neuron computations is the one that takes the input vector and applies the weights and biases.
3 Related Work
Different implementation models for neural networks on FPGAs have been proposed in the literature [4,5,7–11]. The authors of [6,7] proposed a catalog of NNs statically implemented on an FPGA, where the choice of the NN model to use at runtime is inferred from the inputs by considering the desired accuracy and inference time. To automate the choice of NN models for given inputs, the authors employed a predictive model to quickly select a pre-trained NN. However, having a multi-model NN plus a predictive model to infer NN activation drains FPGA resources that could be better allocated to the execution of the NN itself.
In [4], the authors presented a Neural Engineering Framework (NEF) network with online learning. They used a tuning tool to generate optimized hardware designs in under 500 iterations of Vivado HLS before integrating the entire solution implementation. However, the number of iterations needed to synthesize the IP cores is rather high, and there is no guarantee that the end solution converges to the optimal performance. One can instead focus on synthesizing a generic IP core that can be reused to pay back the investment cost of HLS.
Loni et al. [8] introduced a framework with the goal of automating the design of highly robust deep NN architectures for embedded devices through a design space exploration approach where accuracy and network size are the primary parameters of the objective function. This customized design approach delivers high computation performance; however, it lacks adaptability of the hardware implementation, in that for each NN model a new HLS is needed.
The authors of [3,5] proposed a low-level implementation of adaptive NNs on Xilinx FPGAs. The implementation model is optimized by design in such a way that it incorporates delays simulating the access to shared memories, so that the computation resources are synchronized with the data transfer delays.
Our paper differs from that work by making the IP core parameterized, thus enabling the execution of different NN structures as hardware without the need to perform a high-level synthesis for each new NN architecture or configuration.
The authors of [1] defined a swap architecture for neural networks where different nodes, at different layers, can randomly be deactivated in a probabilistic way. Compared to that, to secure high accuracy we rather adopt a deterministic
model where the nodes to deactivate, and potentially the layers to ignore, are identified through the input parameters.
4 Adaptive ML Architecture Model
A Neural Network's structure usually contains several hidden layers, and each layer has its own parameters, hyperparameters, and inputs, which necessitates creating a separate IP core corresponding to each layer. These IP cores need to be re-synthesized when the structure is changed. Moreover, the IP cores are synthesized for a specific set of inputs. Therefore, if a layer's input changes, for instance when the input sample of the neural network changes, the corresponding IP core needs to be re-synthesized, which involves a complex design cycle and incurs an expensive HLS cost. To solve these problems, this paper proposes a novel adaptive architecture for executing a variety of neural networks by introducing an IP core parameterized in accordance with the parameters of the layers [1]. In fact, the combination of parameters, hyperparameters, and input data of a layer of an NN is defined as the input parameters of the designed IP core. We define the maximum number of neurons per layer that the IP core supports. A layer having a smaller width can then be fit into the IP core by setting the weights of the remaining neurons
Fig. 2. Dynamic Neural Network Model
to zero, thereby neutralizing the contribution of the zero-weighted neurons. Consequently, a single parameterized IP core, synthesized once, can iteratively process the cascade of layers in the neural network. Thus, the designed IP core is independent of the depth and width of the NN, and able to efficiently compute the output of structurally varied layers. The proposed model creates a dynamic, scalable neural network model that can be utilized on an embedded platform. The flexible architecture of the NN can be seen in Fig. 2, where the dynamic neural network is created from a static, hardware-parameterized IP core and a configuration file that contains the parameters and hyperparameters of the neural network.
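The zero-padding idea can be sketched in Python (an illustrative model of the fixed-width core, using a width of 4 instead of the paper's 256 for brevity; all names and values are made up):

```python
# A layer narrower than the core's fixed width is embedded by setting the
# extra weights and biases to zero, so the padded neurons contribute
# nothing and the real outputs are unchanged.
WIDTH = 4  # stand-in for the IP core's maximum of 256 neurons

def pad_layer(W, b, width=WIDTH):
    """Pad an (n_out x n_in) weight matrix and bias vector with zeros."""
    padded_W = [row + [0.0] * (width - len(row)) for row in W]
    padded_W += [[0.0] * width for _ in range(width - len(W))]
    padded_b = b + [0.0] * (width - len(b))
    return padded_W, padded_b

def core(x, W, b):
    """Fixed-width matrix-vector multiply, as the hardware core would do."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj
            for row, bj in zip(W, b)]

W, b = [[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5]   # a real 2x2 layer
pW, pb = pad_layer(W, b)
x = [1.0, 1.0, 0.0, 0.0]                        # input padded with zeros
print(core(x, pW, pb)[:2])                      # → [3.5, 6.5]
```

The last two entries of the padded output are exactly zero, so truncating to the real layer width recovers the unpadded result.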
5 Implementation Model
Initially, the neural network is trained on a cloud server or a PC, and the trained weights are transferred to the PS (ARM processor) using a configuration file stored on an SD card. The Neural Network Manager and the synthesized IP core are implemented on the PS and the FPGA fabric, respectively. The parameterized IP core is synthesized once for any NN architecture. The configuration file, provided on-the-fly, contains the parameters and hyperparameters of the neural network, such as weights, sizes of the layers, and activation functions. Given the configuration file, the NN Manager parses the stored parameters, instantiates the structure of the neural network layers together with the weight values, pushes each instantiated layer to the IP core to be processed, and fetches the results back from the IP core to instantiate the next layer of the neural network. Figure 3 shows the workflow of the proposed model. It is possible to increase the execution speed by using more and larger IP cores;
Fig. 3. Execution workflow of the Adaptive ML Architecture Model
however, there is a critical challenge: transferring data between the PS and the FPGA is the bottleneck of the proposed model and takes most of the total computation time corresponding to a layer of the NN.

5.1 IP Core
The implementation of the IP core is one of the main parts of the system and therefore took considerable development time. Xilinx Vivado is used for this task since it allows assisted and configurable synthesis. Fixed-point arithmetic is used since it can have a big impact on the overall latency and the number of resources used. The fixed-point format has a total length of 16 bits, 8 of which are fractional bits. In addition, the AXI-Lite interface is used to transfer the input, output, bias, and weight vectors between the PS and the FPGA. The maximum number of neurons in a layer supported by the IP core is set to 256, which allows the IP core to dynamically accommodate neural network layers with sizes up to 256. This size can be increased in the future if a bigger FPGA is used or the data transfer protocol is improved. Theoretically, it seems possible to run the IP core multiple times in order to use layer sizes over 256. Considering the aforementioned factors, two IP core designs are provided, synthesized, and used for testing. For comparison, the hardware utilization reports of the two designs can be seen in Table 1.
Table 1. IP core implementation hardware usage

Version        BRAM  DSP  FF   LUT
Non Optimized  7%    3%   2%   6%
Optimized      7%    3%   13%  21%
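The 16-bit fixed-point format with 8 fractional bits described above (Q8.8, scale factor 256) can be sketched as follows; the saturation and rounding behavior shown here is an assumption, not taken from the paper:

```python
# Q8.8 fixed-point helpers: 16 bits total, 8 fractional bits (scale 256).
SCALE = 1 << 8

def to_fixed(x):
    """Quantize a float to a saturating signed 16-bit Q8.8 integer."""
    v = int(round(x * SCALE))
    return max(-(1 << 15), min((1 << 15) - 1, v))

def to_float(v):
    """Convert a Q8.8 integer back to a float."""
    return v / SCALE

def fixed_mul(a, b):
    """Multiply two Q8.8 values; rescale the Q16.16 product back to Q8.8."""
    return (a * b) >> 8

w, x = to_fixed(0.75), to_fixed(2.5)   # 192 and 640 in Q8.8
print(to_float(fixed_mul(w, x)))       # → 1.875 (0.75 × 2.5, exact here)
```

Values outside roughly ±128 saturate, which bounds the dynamic range available for weights and activations.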
5.2 Instantiation of Neural Network
In this paper, a simple neural network trained on the MNIST dataset [14] is used as a benchmark. The weights and biases are generated from the training process using TensorFlow and stored in the configuration file, which is then parsed by the processor to extract the corresponding values of weights, biases, and activation functions. These values are used to create a vector of NN Layer class instances, each of which represents one NN layer. Since the IP core executes a single layer at a time, it is necessary to create logic that retrieves the weights of the new layer and the outputs of the previous layer for the new layer's calculation. For this purpose, a class called Network Manager uses the vector of NN Layer instances, which contains all the information about the structure of the neural network to be run. The Network Manager applies the input, bias, and weights of the first layer and, if there is more than one entry in the vector,
starts iterating through the entire vector. In this way, the Network Manager class acts as the glue between the layers of the NN, no matter how the NN architecture is defined.
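The Network Manager logic described above can be sketched in Python (hypothetical class and method names; the paper's actual implementation, which dispatches each layer to the hardware IP core, is not shown):

```python
# Layer descriptions parsed from a configuration file drive repeated passes
# through one generic layer computation; each layer's output is fetched back
# and fed to the next layer.
class NNLayer:
    def __init__(self, weights, biases, act):
        self.weights, self.biases, self.act = weights, biases, act

class NetworkManager:
    def __init__(self, layers):
        self.layers = layers  # vector of NNLayer instances

    def run(self, x):
        for layer in self.layers:  # iterate through the entire vector
            z = [sum(w * xi for w, xi in zip(row, x)) + b
                 for row, b in zip(layer.weights, layer.biases)]
            x = [layer.act(v) for v in z]  # output feeds the next layer
        return x

relu = lambda v: max(0.0, v)
net = NetworkManager([
    NNLayer([[1.0, 1.0]], [0.0], relu),  # layer 1: 2 inputs -> 1 node
    NNLayer([[2.0]], [1.0], relu),       # layer 2: 1 node  -> 1 node
])
print(net.run([1.0, 2.0]))  # → [7.0]
```

In the real system, the list-comprehension step would be replaced by a call into the fixed-width IP core for each layer.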
6 Experiments and Results
To analyze the performance of our implementation model, we conducted 8 experiments in which the system dynamically changes the NN architecture, activation functions, number of layers, and number of neurons. The first experiment checks whether the implementation can dynamically change the NN configuration at runtime. The second experiment analyzes the execution time of the NN on a purely software implementation versus a hardware-software co-designed implementation. The third experiment checks how accurate the result of the approximated softmax function is in comparison to the conventional software function. The different implementation configurations are executed on a Xilinx Zynq 7000 FPGA board.
Figure 4 depicts the average execution time versus the average total time on non-optimized hardware, optimized hardware, and the software implementation. It shows that both hardware implementations significantly outperform the software implementation, and that the introduction of pipelining in hardware further improves the run-time at a very low utilization cost. The main issue with both hardware implementations is that more than half of the time is spent transferring data, which is a significant part of the total execution time. This makes further optimization of the hardware less useful, since the majority of the time is still spent on transferring data. Table 2 presents the execution time of the activation functions on the hardware and software implementations, demonstrating that the hardware implementations execute faster than their software
Fig. 4. Experiment 1 - Each test was run in software, on non-optimized hardware, and on optimized hardware.
Fig. 5. Experiment 1 - Accuracy from each test. Every implementation produced the same result and is therefore depicted only once.
implementation. Besides, Fig. 6 illustrates the execution time for different numbers of parameters, up to 180000 (Fig. 5). As the figure shows, the execution time grows linearly as the number of parameters increases, and improving the transfer speed between the IP core and the PS would be effective in reducing execution time.
Table 2. Execution time of activation functions on hardware and software

Act Func  HW Exec Time  SW Exec Time
Softmax   0.0053169 s   0.0073846 s
Sigmoid   0.00001 s     0.0366277 s
Fig. 6. Graph of Total Execution Time as a function of Number of Parameters
7 Conclusion
This paper presented a flexible and adaptive implementation model for neural networks on FPGAs. This adaptivity enables on-the-fly changes to the NN structure and configuration without re-synthesis of the hardware or reconfiguration of the system integration. NN layer processing is implemented as a parameterized IP core whose functionality changes following the actual structure of the NN. The NN parameters are provided to the FPGA on an SD card, where the ARM-based processing system instantiates the NN structure and maps the parameters to the neurons and connections, while the IP core processes the instantiated layers iteratively. Furthermore, we carried out a hardware optimization to enhance the execution performance.
To analyze the performance and scalability of our flexible model, we conducted different implementation test cases by instantiating different NNs (up to 256 neurons per layer and 180000 parameters in total) with three different activation functions on a Xilinx ZYBO Z7 FPGA board. Our experiments showed good performance and scalability metrics even though the IP core utilizes 30% of the FPGA fabric.
As future work, we plan to optimize the bottleneck related to memory interference and data transfer between the ARM processing system and the IP cores, so as to maximize the parallel computation of the NN layers [2]. Another relevant future work is to improve the scalability of the implementation model through a swapping operation, so that large NN layers can be split into smaller portions that are processed individually and stitched together afterward to compute the layer outputs.
References

1. Yamashita, T., Tanaka, M., Yamauchi, Y., Fujiyoshi, H.: SWAP-NODE: a regularization approach for deep convolutional neural networks. In: 2015 IEEE International Conference on Image Processing (ICIP), pp. 2475–2479 (2015)
2. Boudjadar, J., Kim, J.H., Nadjm-Tehrani, S.: Performance-aware scheduling of multicore time-critical systems. In: 2016 ACM/IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE), pp. 105–114 (2016)
3. Semmad, A., Bahoura, M.: Serial hardware architecture of multilayer neural network for automatic wheezing detection. In: International Midwest Symposium on Circuits and Systems (2021)
4. Morcos, B., Stewart, T.C., Eliasmith, C., Kapre, N.: Implementing NEF neural networks on embedded FPGAs. In: International Conference on Field-Programmable Technology (FPT), pp. 22–29 (2018)
5. Bahoura, M., Park, C.-W.: FPGA-implementation of an adaptive neural network for RF power amplifier modeling. In: 2011 IEEE 9th International New Circuits and Systems Conference, pp. 29–32 (2011)
6. Taylor, B., Marco, V.S., Wolff, W., Elkhatib, Y., Wang, Z.: Adaptive selection of deep learning models on embedded systems. CoRR - Comput. Res. Repository J. (2018)
7. Taylor, B., Marco, V.S., Wolff, W., Elkhatib, Y., Wang, Z.: Adaptive deep learning model selection on embedded systems. In: Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference LCTES 2018 (2018)
8. Loni, M., Daneshtalab, M., Sjodin, M.: ADONN: adaptive design of optimized deep neural networks for embedded systems. In: 2018 21st Euromicro Conference on Digital System Design (DSD), pp. 397–404 (2018)
9. Suda, N., et al.: Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks. In: International Symposium on Field-Programmable Gate Arrays (2016)
10. Mittal, S., Vetter, J.: A survey of methods for analyzing and improving GPU energy efficiency. ACM Comput. Surv. 47, 19 (2015)
11. Ovtcharov, K., Ruwase, O., Kim, J.-Y., Fowers, J., Strauss, K., Chung, E.S.: Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research White Paper, vol. 2-11 (2015)
12. Preuveneers, D., Tsingenopoulos, I., Joosen, W.: Resource usage and performance trade-offs for machine learning models in smart environments. Sens. J. (2020)
13. Boudjadar, J., David, A., Kim, J.H., Larsen, K.G., Nyman, U., Skou, A.: Schedulability and energy efficiency for multi-core hierarchical scheduling systems. In: Proceedings of ERTS14, Embedded Real Time Systems and Software Conference (2014)
14. Boudjadar, J., Tomko, M.: A digital twin setup for safety-aware optimization of a cyber-physical system. In: Proceedings of the International Conference on Informatics in Control, Automation and Robotics (2022)
15. Saleem, B., Badar, R., Manzoor, A., Judge, M.A., Boudjadar, J., Islam, S.U.: Fully adaptive recurrent neuro-fuzzy control for power system stability enhancement in multi machine system. IEEE Access 10, 36464–36476 (2022)
16. Wu, Z., Wang, Q., Hu, J.X., Tang, Y., Zhang, Y.N.: Integrating model-driven and data-driven methods for fast state estimation. Int. J. Electr. Power Energy Syst. 139 (2022)
17. Banaei, M., Boudjadar, J., Khooban, M.-H.: Stochastic model predictive energy management in hybrid emission-free modern maritime vessels. IEEE Trans. Industr. Inform. 17(8), 5430–5440 (2021)
18. Bega, D., Gramaglia, M., Banchs, A., Sciancalepore, V., Costa-Perez, X.: A machine learning approach to 5G infrastructure market optimization. IEEE Trans. Mob. Comput. 19(3), 498–512 (2020)
19. Memon, S., Maheswaran, M.: Using machine learning for handover optimization in vehicular fog computing. In: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (2019)
20. Liu, Y., Sun, Y.L., Ryoo, J., Rizvi, S., Vasilakos, A.V.: A survey of security and privacy challenges in cloud computing: solutions and future directions. J. Comput. Sci. Eng. (2015)
21. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)
22. Wang, S., et al.: When edge meets learning: adaptive control for resource-constrained distributed machine learning. In: IEEE INFOCOM (2018)
23. Min, Q., Lu, Y., Liu, Z., Su, C., Wang, B.: Machine learning based digital twin framework for production optimization in petrochemical industry. Int. J. Inf. Manag. 49 (2019)
24. Konstantakopoulos, I.C., Barkan, A., He, S., Veeravalli, T., Liu, H., Spanos, C.: A deep learning and gamification approach to improving human-building interaction and energy efficiency in smart infrastructure. Appl. Energy J. 237 (2019)
25. Liu, T., Tian, B., Ai, Y., Wang, F.-Y.: Parallel reinforcement learning-based energy efficiency improvement for a cyber-physical system. IEEE/CAA J. Automatica Sinica 7(2), 617–626 (2020)
SearchOL: An Information Gathering Tool Farhan Ahmed1 , Pallavi Khatri1(B) , Geetanjali Surange1 , and Animesh Agrawal2 1 ITM University, Gwalior, India
{Pallavi.khatri.cse,Geetanjali.surange}@itmuniversity.ac.in 2 NFSU, Gandhinagar, India [email protected]
Abstract. Most organizations and their network administrators are familiar with penetration testing and the possible attacks that can be launched against a system through any of its software or hardware vulnerabilities. System administrators, however, neglect the quantity of system and user information that can be extracted anonymously from content that is publicly available on the internet. This publicly available information is critical and of great use to penetration testers who wish to exploit the system. This work proposes a tool called 'SearchOL', developed in Python, for gathering user-related data from social sites using multiple search engines. The tool collects data passively and, as the results show, proves to be a comprehensive data aggregator across multiple social platforms. The tool can be used for information gathering, which is the first phase of ethical hacking. The novelty of this tool is that it returns the most important, most relevant, and concise results from various search engines, reducing the effort pen testers spend gathering information from public domains.

Keywords: Ethical Hacking · Reconnaissance · Penetration Testing · Vulnerabilities
1 Introduction

Nowadays, everyone uses social media and posts their daily activities there. Autofill forms filled in while creating an account on any social site store a lot of an individual's personal information. We unknowingly provide a lot of sensitive information about our likes, dislikes, location, and friends that can be exploited if gathered by attackers. The footprinting and reconnaissance process is used by hackers to gather sensitive information from social sites and online presence. It is also used by ethical hackers to check what sensitive information is discoverable from social sites and to suggest techniques to hide it. Many tools are available online for gathering information, with different functionality and use cases. Reconnaissance can be done in two ways:

1. Passive
2. Active

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 343–349, 2023. https://doi.org/10.1007/978-3-031-35501-1_34
344
F. Ahmed et al.
Passive reconnaissance means gathering information without direct interaction or connection with the target. It can be done through social sites and internet searches; social engineering is also considered a passive information gathering method. In active reconnaissance we interact directly with the target network or system to gather information. This carries a higher risk of detection than passive reconnaissance and involves discovering hosts, IP addresses, services, and routers on the network. The proposed tool 'SearchOL' decreases the effort ethical hackers spend on passive reconnaissance about a target: the user simply enters a keyword (a person or organization name) and the tool searches for the most relevant informative website links on the Google, Ask, Yahoo, and Bing search engines and saves the links in a text file for further analysis. This removes the overhead the ethical hacker faces in going through different search engines one by one to search for information about the target, as this is done automatically by 'SearchOL'. This article is organised into six sections: the concept is introduced in Sect. 1, and the survey of literature is summarised in Sect. 2. Sections 3 and 4 describe the proposed work and the experimental setup used to conduct the experiment. Results are discussed in Sect. 5, and the complete work is concluded in Sect. 6.
2 Literature Survey The process of ethical hacking starts with gathering as much information as possible about the target; both simple and sensitive information can be gathered from social media and internet sites. This gathering of information about the target is known as footprinting, and many tools are available to do it. With the help of footprinting we can gather information about a network such as the network ID, domain name, IP addresses, protocols, news articles, web server links, etc. If a hacker obtains some very sensitive information, he or she can use it for malicious activities [1]. The authors in [2] propose a cyber-reconnaissance tool named SearchSimplified, built using Java. This tool gathers data related to the organization entered, with the help of Google, using Google's cache system and advanced query operators such as intitle:, site:, and filetype:. The work proposed in [3] provides a survey and taxonomy of adversarial reconnaissance techniques; it covers the cyber kill chain, open-source intelligence, sniffing, cyber deception, case studies of cybercrimes, categories of target information for reconnaissance, external and internal reconnaissance, a taxonomy of reconnaissance techniques, defensive measures against reconnaissance techniques, etc. In [4] the author shows various web-based platforms for collecting and tracking IP information. The author performed an experiment in a specialized university computer lab, connecting all host machines in the lab in a Local Area Network (LAN); the results provide host name, autonomous system, Internet Service Provider, country, continent, etc. A comparative study on web scraping is presented in [5], which surveys various web scraping practices, shows multiple techniques by which we can easily scrape websites, and compares various web scraping software, giving knowledge of the available web scraping tools and techniques.
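The advanced query operators mentioned above (intitle:, site:, filetype:) can be combined programmatically. A minimal sketch of such a query builder (our own illustration, not the SearchSimplified tool from [2]; the function name is ours):

```python
def dork_query(keyword, site=None, filetype=None, intitle=None):
    """Compose a Google advanced-search ('dork') query string
    from a keyword plus optional operators."""
    parts = [keyword]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    return " ".join(parts)

# e.g. dork_query("annual report", site="example.org", filetype="pdf")
# builds a query restricted to PDF files on one domain.
```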
We can easily and efficiently gather data from publicly available sources using OSINT (Open-Source Intelligence).
SearchOL: An Information Gathering Tool
OSINT tools are used in the investigation phase for collecting information about a target. The use of OSINT to gather information is shown in [6]. The proposed work there uses the API keys of social media platforms and Python libraries to check whether a username exists; if it does, the tool gathers data, stores the results in a database, and displays them in a UI. The outcome of that study offers a review of web scraping techniques and software that can be used to extract data from websites.
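The username-existence check described in [6] starts from per-platform profile URLs. A minimal sketch of that first step (the URL patterns and helper name are our illustrative assumptions; real platforms typically require API keys and rate-limit such probes, so no network request is made here):

```python
from urllib.parse import quote

# Example profile-URL patterns (assumed for illustration).
PLATFORMS = {
    "GitHub": "https://github.com/{}",
    "Twitter": "https://twitter.com/{}",
    "Instagram": "https://www.instagram.com/{}/",
}

def candidate_profiles(username):
    """Build the profile URLs an existence checker would probe:
    an HTTP 200 response suggests the username exists, a 404 that
    it does not."""
    safe = quote(username, safe="")
    return {site: pattern.format(safe) for site, pattern in PLATFORMS.items()}
```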
3 Proposed Work From the extensive literature survey, the following conclusions are derived:
• Existing systems cannot search using multiple search engines.
• They only search for usernames on different social sites.
• They require multiple dependencies to be installed before data searching.
• They search data from organizations only and not from social accounts.
To extract more precise information from the web, this work proposes a Python-based web scraping tool called SearchOL that works on Google, Bing, Yahoo, and Ask and retrieves the most relatable URLs from various websites. The tool also stores the retrieved information in a text file that can be further used by an attacker to exploit the system or a user. The proposed tool uses the advanced search technique of the Google search engine called Google Dorking [7] to discover data. The working methodology of SearchOL is as follows. It uses the Python Requests module [10], which allows HTTP requests to be sent from Python; the requests.get(url) method [10] sends a GET request to the specified URL and returns a Response object that contains the server's response to the HTTP request. It uses the Python library Beautiful Soup [11] for parsing structured data, which allows interacting with HTML in a way similar to how you interact with a web page using developer tools. It uses the os module [12] of Python, which provides functions for interacting with the operating system; this module is used to save the information gathered using SearchOL. The workflow of SearchOL is described below: • Take the input keyword • Create the URL for the input keyword • Make a request for the URL using the Requests module [10], one by one, on the Google, Bing, Yahoo, and Ask search engines [13]. • Find all links in the search result using the Beautiful Soup module [11]. • Filter the most related and useful links • Append the sites to a list named 'sitelist' • Append the links to a list named 'links' • Iterate through 'sitelist' and 'links' to print the information • Save the gathered information in a text file using the os module [12].
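The workflow above can be sketched in a simplified, standard-library-only form. The live HTTP request and the Beautiful Soup parsing step are reduced here to URL construction and link filtering, since the paper does not publish SearchOL's source; the engine URL templates and helper names (build_search_urls, filter_links, save_links) are our illustrative assumptions:

```python
from urllib.parse import quote_plus, urlparse

# Query templates, one per supported search engine (illustrative).
ENGINES = {
    "Google": "https://www.google.com/search?q={}",
    "Bing": "https://www.bing.com/search?q={}",
    "Yahoo": "https://search.yahoo.com/search?p={}",
    "Ask": "https://www.ask.com/web?q={}",
}

def build_search_urls(keyword):
    """Steps 1-2: create one request URL per engine for the keyword."""
    return {name: url.format(quote_plus(keyword)) for name, url in ENGINES.items()}

def filter_links(hrefs):
    """Step 5: keep only absolute http(s) result links, dropping
    engine-internal paths such as '/preferences'."""
    return [h for h in hrefs
            if urlparse(h).scheme in ("http", "https") and urlparse(h).netloc]

def save_links(path, links):
    """Final step: persist the gathered links to a text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(links))
```

In the real tool, each URL from `build_search_urls` would be fetched with `requests.get(url)` and the `href` attributes extracted with Beautiful Soup before being passed to `filter_links`.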
4 Experimental Setup The tool, developed in Python version 3.9.7 [8], is tested on a system with the following configuration: Processor: Intel(R) Core(TM) i5-10210U CPU @ 1.60 GHz (2.11 GHz), System type: 64-bit operating system, x64-based processor, RAM: 8.00 GB. The search keyword is 'Farhan Ahmed', and scraping is done from Google, Bing, Yahoo, and Ask, as can be seen in Fig. 1.
Fig. 1. Output results of ‘SearchOL’
All information gathered from running the tool can be saved in a text file as shown in Fig. 2. The gathered information is vital as any penetration tester can use this information to exploit the target system.
5 Results A sample output from the tool is displayed in Fig. 3. The user enters the name of an individual and proceeds to search the social websites; the tool lists all the findings and allows the user to store the complete data in a text file.
Fig. 2. Saving all links in a text file.
Fig. 3. Saved text file
As summarised in Table 1, the SearchOL tool has been compared with existing tools doing the same kind of work, and the results show that the amount of information that can be gathered using SearchOL is greater than with the others. This makes the SearchOL tool more usable for information gathering about a person.
Table 1. Comparison of 'SearchOL' and other tools
Tools/Techniques | Source of data | Type of data | Restrictions
Sherlock [9] | Social media sites | Usernames | Only usernames
Google Dorking [7] | Google | Files and sites | Only works with Google
SearchSimplified [2] | Google | Only organizational data | Only gets organizational data
SearchOL (proposed) | Google, Bing, Yahoo, Ask | All links related to a person or organization | None; it works with all four search engines and can get organizational as well as personal data
6 Conclusions As seen above, a great deal of information is available on the internet, and if this information is used for wrong purposes it can cost a lot. Many fraud calls, schemes, OTP frauds, and bank frauds are carried out using just the small pieces of information available online, and many hackers create fake accounts of a victim to deface him or her. Information we think is useless can have a great impact on our lives if it falls into the wrong hands. This paper presents the Python-based tool 'SearchOL' to gather important links related to an input keyword from the Google, Ask, Bing, and Yahoo search engines and save them easily in a text file for further analysis. The tool can be used by penetration testers to look for sensitive information released on the internet, so that they can take appropriate measures to protect it. More features can be added to the tool for data gathering: more search engines [13] from which data can be scraped can be added, and secure web browsing [14] can be taken up as future work.
References
1. Shreya, S., Kumar, N.S., Rao, K., Rao, B.: Footprinting: techniques, tools and countermeasures for footprinting. J. Crit. Rev. 7, 2019–2025 (2020). https://doi.org/10.31838/jcr.07.11.311
2. Roy, A., Mejia, L., Helling, P., Olmsted, A.: Automation of cyber-reconnaissance: a Java-based open-source tool for information gathering. In: 2017 12th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 424–426 (2017). https://doi.org/10.23919/ICITST.2017.8356437
3. Roy, S., et al.: Survey and taxonomy of adversarial reconnaissance techniques. ACM Comput. Surv. (2022)
4. Boyanov, P.Kr.: Implementation of the web based platforms for collecting and footprinting IP information of hosts in the computer network and systems. Space Research and Technology Institute-BAS, Bulgaria, Konstantin Preslavsky University, Faculty of Technical Sciences, Association Scientific and Applied Research, vol. 16, p. 42 (2019)
5. Sirisuriya, S.C.M.de.S.: A Comparative Study on Web Scraping (2015)
6. Sambhe, N., Varma, P., Adlakhiya, A., Mahakalkar, A., Nakade, N., Lakhe, R.: Using OSINT to gather information about a user from multiple social networks. Inf. Technol. Ind. 9(2), 207–211 (2021)
7. Parmar, M.: Google Dorks - Advance Searching Technique (2019). https://doi.org/10.13140/RG.2.2.24202.62404
8. Van Rossum, G., Drake, F.L.: Python 3 Reference Manual. CreateSpace, Scotts Valley, CA (2009)
9. Sherlock-project. https://github.com/sherlock-project/sherlock
10. Chandra, R.V., Varanasi, B.S.: Python Requests Essentials. Packt Publishing Ltd. (2015)
11. Richardson, L.: Beautiful Soup documentation (2007). https://www.crummy.com/software/BeautifulSoup/bs4/doc/. Accessed 7 July 2018
12. Pilgrim, M.: Exceptions and file handling. In: Dive Into Python, pp. 97–120. Apress, Berkeley, CA (2004)
13. Croft, W.B., Metzler, D., Strohman, T.: Search Engines: Information Retrieval in Practice, vol. 520, pp. 131–141. Addison-Wesley, Reading (2010)
14. Tang, S.: Towards secure web browsing. University of Illinois at Urbana-Champaign (2011)
Blockchain for Smart Healthcare: A SWOT Analysis from the Patient Perspective Kamal Bouhassoune1(B)
, Sam Goundar2
, and Abdelkrim Haqiq1
1 Faculty of Sciences and Techniques, Computer, Networks, Mobility and Modeling
Laboratory: IR2M, Hassan First University, Casablanca Street, Box 577, 26000 Settat, Morocco [email protected] 2 School of Computing, RMIT University, Handi Resco Building, 521 Kim Ma, Ba Dinh District, Hanoi, Vietnam
Abstract. Smart healthcare lies in the technologies that enable physicians and care providers to cooperate in the best interest of the patients. The expansion of smart healthcare services raises relevant challenges to keep the patient at the heart of a cooperative, trusted, and datafied healthcare system. From the patient perspective, the purpose of this contribution is to highlight the relevance of blockchains and related technologies for smart healthcare. Recent blockchains are programmable, decentralized, and trust technologies with suitable properties for intelligent, personalized, preventive, and participatory healthcare. Through a Strengths, Weaknesses, Opportunities, and Threats analysis, the key factors of blockchain's relevance have been grouped and analyzed to build the four quadrants of a patient-driven SWOT matrix. Decentralized by design, the recent blockchains are promising extensions to the current centralized healthcare systems. Subject to the identified weaknesses and threats, blockchain might be the next patient-centric trust technology that enables trustworthy infrastructures for smart healthcare services.
Keywords: Blockchain · Smart healthcare · Patient-centric trust technology · Strength · Weakness · Opportunity · Threat · Analysis
1 Introduction The healthcare system consists of multiple participants. Physicians, caregivers, and researchers cooperate in the best interest of patient health and recovery. The Hippocratic oath is an ethical code based on practicing medicine, diagnosing, and delivering face-to-face treatment with integrity and full respect for the patients and other parties. The patient’s involvement is as important as all the data his healthcare generates (see Fig. 1). Emerging innovations resulting in information technology-mediated services support the required cooperation in multidisciplinary healthcare fields, not to mention that they also might dehumanize the relationship with the patient [1]. As per the Smart Health journal introduction, smart healthcare refers to methods, models, information technologies, and patient care outcomes in favor of personalized, © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 350–357, 2023. https://doi.org/10.1007/978-3-031-35501-1_35
Fig. 1. The patient is at the heart of a trusted, cooperative, and datafied healthcare
participatory, preventive, and programmable healthcare. The patient experience in smart healthcare is enhanced in terms of disease prevention, risk monitoring, diagnosis, treatment, and drug development [2]. Ease of use, perceived usefulness, and mainly personalization have positive effects on the trust underpinning patients' acceptance of smart healthcare services [3]. Blockchains are programmable and decentralized infrastructures that would serve patients' trust. The underpinning technology might provide a smart profile for the patients [4]. Blockchains are also cooperative frameworks for healthcare; they would primarily be dedicated to parties engaged in care. The remainder of the contribution is structured as follows: Sect. 2 provides preliminary knowledge about blockchain technology and smart healthcare. Subsequently, the method used for the study is described in Sect. 3. Before concluding with research perspectives in Sect. 5, the analysis and discussion are addressed in Sect. 4.
2 Background 2.1 Blockchain Basics Beyond the hype and expectations surrounding blockchain-based technologies, the undeniable reality is that Bitcoin, regarded as the first generation of blockchains, has served, from its genesis block to the latest mined one, as a bit-based value exchange system devoid of any central trusted party. Bitcoin is a public, open, and time-tested transaction ledger where all and only involved nodes carry the network trust. The second generation of blockchains, such as Ethereum, moved the paradigm from a current-state-of-value approach (Unspent Transaction Output, UTXO) to an account-based model [4]. In programmable blockchains, nodes have a greater say in the network,
and autonomous components, namely smart contracts, allow script self-execution and the development of decentralized applications (Dapps). Decentralized by design, blockchains are, in healthcare, promising alternatives to vulnerable and centralized systems. They merge engaged parties into a patient-focused direction that tackles healthcare operational efficiency matters, mainly trust, traceability, and transparency. Blockchain technology has the potential to elevate telehealth care transparency and improve communication between patients and healthcare providers. More efficiency, time, and cost savings are some of blockchain's benefits that may empower patient-centered healthcare delivery [5]. 2.2 Smart Healthcare Smart healthcare lies in the technologies that enable patients, physicians, and caregivers to cooperate and proactively monitor disease risk, assist in diagnosis, and assure treatment. The growing appeal of technologies such as the Internet of Medical Things (IoMT) based on sensors and wearable devices, robotics, and Artificial Intelligence (AI) contributes to the emergence of new levers for gathering Electronic Health Records (EHRs) and building smart and connected healthcare environments. In a world with populations more concerned about health conditions and with more elderly individuals, the use of personal health devices (PHDs) to monitor and share health information is increasing; PHDs also have an impact on patients' IT identity and post-adoption use attitudes [6]. Health data integrity and privacy are subject to continuous regulations and ethical considerations, and they are the main concerns for the practical adoption of emerging technologies such as blockchains. The current contribution focuses on personalized traits, as well as cooperative and programmable features for smart healthcare delivery.
A decentralized patient-centered approach requires data and information security, whereas a Secured Smart Healthcare System would rely on blockchain fundamentals to encrypt patients’ sensitive electronic health records in a distributed infrastructure rather than vulnerable and centralized systems [7].
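The shift from Bitcoin's UTXO ledger to Ethereum's account-based model mentioned in Sect. 2.1 can be made concrete with a deliberately simplified sketch (our own illustration, not code from any surveyed system; cryptographic signatures, fees, and consensus are omitted):

```python
def utxo_transfer(utxos, spend, outputs):
    """UTXO model: whole unspent outputs are consumed and new ones
    created; change must be an explicit new output.
    utxos, spend, outputs are lists of (owner, amount) pairs; equal
    duplicate outputs are not distinguished in this toy version."""
    funds = sum(amount for _, amount in spend)
    assert funds >= sum(amount for _, amount in outputs), "inputs must cover outputs"
    remaining = [u for u in utxos if u not in spend]
    return remaining + outputs

def account_transfer(balances, sender, receiver, amount):
    """Account model: running balances are debited/credited directly,
    as in Ethereum's world state."""
    assert balances.get(sender, 0) >= amount, "insufficient balance"
    balances[sender] -= amount
    balances[receiver] = balances.get(receiver, 0) + amount
    return balances
```

In the UTXO view the ledger is a set of spendable coins; in the account view it is a mapping from identities to balances, which is what makes per-account program state (and hence smart contracts) natural.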
3 Methodology Blockchain has raised, as an emerging technology, high expectations. The technology has led to multiple experiments in several sectors including healthcare, and the hype surrounding the exploration of blockchains has resulted in many projects and proofs of concept. To further identify how such a decentralized paradigm might benefit the healthcare industry, an analysis of Strengths (S), Weaknesses (W), Opportunities (O), and Threats (T) has been conducted. The deliberate choice of a patient-centered perspective, distinctive of our contribution, lies in the fact that, in a healthcare context, the primary interest of the patients is above all concerns; it is also a relevant driver for care providers' cooperation. The patient's best interest is a lofty purpose and a cooperative ground for committed participants who would use decentralized trust technologies. The SWOT analysis has been widely used for blockchain adoption within several industries. However, in the healthcare sector, the SWOT matrix has mostly been used with a focus on institutions, data, and healthcare sector issues [8]. Unfortunately, the decentralized
nature of blockchains has not been fully captured [9] since patients are not sufficiently involved in the analysis building. In the following SWOT matrix, decentralization is considered the key driver for patient enrolment in the blockchain network. Patients as well as healthcare providers might assume a more cooperative role in the healthcare path. Blockchain technology has significant characteristics that would drive medical decision-making in inclusive and consensual directions.
4 Analysis and Discussion The four quadrants (see Fig. 2), strengths, weaknesses, opportunities, and threats, are the preliminary groundwork for developing strategies around blockchains for smart healthcare. The proposed patient-driven SWOT matrix of internal factors (S & W) and external ones (O & T) provides the first tier of understanding blockchain as a patient-centric trust technology. Many challenges lie ahead of blockchain-enabled technologies in the healthcare domain [10].
Fig. 2. Blockchain for smart healthcare through a patient-centered SWOT analysis
Strengths S1: Blockchain attributes for individuals In a chain of blocks, each node benefits from the relevant following attributes: – Total or pseudo-anonymity in interacting within the network – Privacy of transactions and data ownership – Immutability of records immune to possible tampering, repudiation, and forgery
From an individual standpoint, such socio-material characteristics of blockchain influence the user's self-conceptions toward commoditization, self-sovereign ownership of data, transaction sequentialization, mediated interactions, and reputation [11]. These self-construal behaviors fit the patient's needs in the healthcare context. Data forgery protection, medical secrecy, and patient control of Personal Health Records (PHRs) would be achieved through blockchain's characteristics for individuals. It would also provide personalized access control to Electronic Health Records (EHRs) [12]. S2: Collective gains through the blockchain Blockchain is a peer-to-peer (P2P) network that allows a growing number of participants to exchange a trusted value in a cooperative framework. Decentralized by design, the purpose of a blockchain such as Bitcoin is to share the same time-stamped truth with the help of cryptographic proof and enforced incentives. The collective's contribution is therefore at the service of the group as long as the majority of participants are not malicious nodes [13]. In the healthcare context, the patient is surrounded by a network of cooperative parties, a priori not malicious. Weaknesses Decentralization is at the core of blockchains and has been considered a strength; security and scalability are examined below as weaknesses. W1: Network and data security The primary vow of blockchains is decentralization; in other words, no central third party guarantees the group's trust. As the network grows in terms of nodes, it is subject to security vulnerabilities. According to the Open Web Application Security Project (OWASP), blockchain technology confronts nine of the ten identified web application security risks [14]. Injection, broken authentication, and broken access control are vulnerabilities that involve blockchain users (a fortiori patients, in the healthcare context).
There exist programming-based security issues as well as network-based ones. Malicious, vulnerable, and non-optimized smart contracts are a source of risks and attacks in programmable blockchains. The network-based vulnerabilities are more concerned with the participating nodes. In Proof of Work (PoW) consensus-based blockchains, transaction integrity is liable to connection hijacking attacks such as black races, majority control, or selfish mining [15]. In blockchain-based healthcare applications, the security weakness affects EHRs in general and PHRs in the most critical ways. W2: Storage and scalability limitations From a node perspective, a blockchain participant must store part or all of the exchanged data in the underpinning peer-to-peer network. Storage requirements in such distributed systems, compared to the capabilities of centralized databases, are subject to several challenges. The main concerns are high throughput and scalability, transaction latency, query complexity, and the DCS (Decentralization, Consistency, Scalability) trilemma [16]. The high availability of data, a vulnerable point of peer-to-peer storage systems, reflects the scalability weakness of blockchains. The main challenge remains achieving decentralization, security, and scalability at once. Two alternatives might be considered: the first consists of advancing
consensus-building and data storage mechanisms (on-chain), and the second is to supplement transaction processing and data exchange outside of the blockchain (off-chain) [17]. Opportunities O1: Patient engagement and empowerment. As part of a trustworthy peer-to-peer network, the patient would apply and benefit from self-sovereignty through self-managing personal health data [18]. According to physicians who are familiar with blockchain concepts and technical implications, the main benefits that emerged from the qualitative study [19] were direct, lookup, and patient-mediated health information exchange (HIE). These three key factors would enlarge the role of the patients and would improve medical practices and the care path. A smart health profile based on tokens, wallets, programmable components, oracles, and a genuinely distributed identity might emerge and enhance existing and coming healthcare services [20]. O2: Patient-driven interoperability instead of an institution-centric one. In healthcare, interoperability historically refers to interactions, processes, and data exchange between institutions. Blockchain technology's strengths would enhance patient-focused interoperability where data exchange is mediated by patients. It enables a shift along five axes: on one hand, aggregation, liquidity, and immutability of data; on the other, digital access rules and patient identity [21]. Patient-centric and blockchain-based models transfer data ownership from care providers to patients. Through trusted facilities, healthcare professionals and service providers are in a position to interpret and check EHRs [22]. Fragmented patient data in traditional healthcare systems and the lack of standardization are issues that a blockchain-based EHR storage and integrity management system would overcome with a primary focus on patients [23]. Threats T1: Patient resistance to change The acceptance of blockchain by patients would face legal and practical barriers.
Patients' reluctance to adopt such a distributed technology is related to legal concerns and policy [24]. The healthcare sector is subject to strong constraints in terms of patient medical secrecy, privacy, and PHR integrity. Furthermore, like any technology that promotes contactless healthcare, blockchain is decentralized and suitable for mobile and remote medical services. Such contactless services have taken on unprecedented importance, at the risk of affecting face-to-face healthcare delivery [25]. T2: Organizational and effectiveness obstacles The lack of solid and proven standards in blockchains affects the intention to use and adopt them in the healthcare field. Both perceived standardization and regulatory uncertainties are unfavorably associated with the intention of organizational blockchain adoption; moreover, volatility and the lack of knowledge reinforce that negative effect [26]. Compared to existing solutions, cost-effectiveness is also an organizational metric for patients and healthcare institutions. The shift from proprietary and centralized systems to a distributed model would raise the question of cost support, and the large-scale development of decentralized applications would impact patients and organizations [27].
5 Conclusions Blockchains and related technologies have much to contribute to smart healthcare services. As a node in a peer-to-peer and programmable trust infrastructure, the patient benefits from the blockchain’s characteristics for individuals as well as from the collective gains through the technology. The identified opportunities are patient empowerment and patient-driven interoperability instead of the institution-centric one. Physicians and care providers would cooperate efficiently toward a patient-focused care path. Security and scalability concerns have been considered the main vulnerabilities in the SWOT matrix. These weaknesses explain the obstacles that would face blockchains’ acceptance by patients, and by institutions subject to organizational and effectiveness constraints. To summarize, the present contribution assesses blockchain relevance for smart healthcare through a patient-centered SWOT analysis. Acknowledgment. A warm thank you to Dr. Layth Sliman for his seemingly simple initiative that ignited our doctoral journey.
References
1. Botrugno, C.: Information technologies in healthcare: enhancing or dehumanising doctor-patient interaction? Health (London) 25, 475–493 (2021)
2. Tian, S., Yang, W., Grange, J.M.L., Wang, P., Huang, W., Ye, Z.: Smart healthcare: making medical care more intelligent. Glob. Health J. 3, 63–64 (2019)
3. Liu, K., Tao, D.: The roles of trust, personalization, loss of privacy, and anthropomorphism in public acceptance of smart healthcare services. Comput. Hum. Behav. 127, 107026 (2022)
4. Gayvoronskaya, T., Meinel, C.: Blockchain: Hype or Innovation. Springer International Publishing, Cham (2021). https://doi.org/10.1007/978-3-030-61559-8
5. Cerchione, R., Centobelli, P., Riccio, E., Abbate, S., Oropallo, E.: Blockchain's coming to hospital to digitalize healthcare services: designing a distributed electronic health record ecosystem. Technovation 120, 102480 (2022)
6. Esmaeilzadeh, P.: How does IT identity affect individuals' use behaviors associated with personal health devices (PHDs)? An empirical study. Inf. Manage. 58, 103313 (2021)
7. Tripathi, G., Ahad, M.A., Paiva, S.: S2HS - a blockchain based approach for smart healthcare system. Healthcare 8, 100391 (2020)
8. Khujamatov, H., Akhmedov, N., Amir, L., Ahmad, K.: Blockchain adaptation in healthcare: SWOT analysis. In: Giri, D., Mandal, J.K., Sakurai, K., De, D. (eds.) Proceedings of International Conference on Network Security and Blockchain Technology, pp. 346–355. Springer Nature Singapore (2022). https://doi.org/10.1007/978-981-19-3182-6_28
9. Siyal, A.A., Junejo, A.Z., Zawish, M., Ahmed, K., Khalil, A., Soursou, G.: Applications of blockchain technology in medicine and healthcare: challenges and future perspectives. Cryptography 3, 3 (2019)
10. Hussien, H.M., Yasin, S.M., Udzir, N.I., Ninggal, M.I.H., Salman, S.: Blockchain technology in the healthcare industry: trends and opportunities. J. Ind. Inf. Integr. 22, 100217 (2021)
11. Heister, S., Yuthas, K.: The blockchain and how it can influence conceptions of the self. Technol. Soc. 60, 101218 (2020)
12. Cunningham, J., Ainsworth, J.: Enabling patient control of personal electronic health records through distributed ledger technology. Stud. Health Technol. Inform. 245, 45–48 (2018)
13. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. Decentralized Business Review, 21260 (2008)
14. Poston, H.: Mapping the OWASP top ten to blockchain. Procedia Comput. Sci. 177, 613–617 (2020)
15. Vyas, P., Goundar, S.: Security issues in blockchain from networking and programming perspective. In: The Convergence of Artificial Intelligence and Blockchain Technologies: Challenges and Opportunities, pp. 243–269 (2022)
16. Raikwar, M., Gligoroski, D., Velinov, G.: Trends in development of databases and blockchain. In: 2020 Seventh International Conference on Software Defined Systems (SDS), pp. 177–182. IEEE, Paris, France (2020)
17. Hafid, A., Hafid, A.S., Samih, M.: Scaling blockchains: a comprehensive survey. IEEE Access 8, 125244–125262 (2020)
18. Fatokun, T., Nag, A., Sharma, S.: Towards a blockchain assisted patient owned system for electronic health records. Electronics 10, 580 (2021)
19. Esmaeilzadeh, P.: Benefits and concerns associated with blockchain-based health information exchange (HIE): a qualitative study from physicians' perspectives. BMC Med. Inform. Decis. Mak. 22, 80 (2022)
20. Vian, K., Voto, A., Haynes-Sanstead, K.: A blockchain profile for medicaid applicants and recipients. Inst. Future 8, 1 (2016)
21. Gordon, W.J., Catalini, C.: Blockchain technology for healthcare: facilitating the transition to patient-driven interoperability. Comput. Struct. Biotechnol. J. 16, 224–230 (2018)
22. Zhuang, Y., Sheets, L.R., Chen, Y.-W., Shae, Z.-Y., Tsai, J.J.P., Shyu, C.-R.: A patient-centric health information exchange framework using blockchain technology. IEEE J. Biomed. Health Inform. 24, 2169–2176 (2020)
23. Chelladurai, M.U., Pandian, S., Ramasamy, K.: A blockchain based patient centric electronic health record storage and integrity management for e-Health systems. Health Policy Technol. 10, 100513 (2021)
24. Mamun, Q.: Blockchain technology in the future of healthcare. Smart Health 23, 100223 (2022)
25. Lee, S.M., Lee, D.: Opportunities and challenges for contactless healthcare services in the post-COVID-19 era. Technol. Forecast. Soc. Chang. 167, 120712 (2021)
26. Dehghani, M., William Kennedy, R., Mashatan, A., Rese, A., Karavidas, D.: High interest, low adoption: a mixed-method investigation into the factors influencing organisational adoption of blockchain technology. J. Bus. Res. 149, 393–411 (2022)
27. Zhang, P., Walker, M.A., White, J., Schmidt, D.C., Lenz, G.: Metrics for assessing blockchain-based healthcare decentralized apps. In: 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), pp. 1–4. IEEE, Dalian (2017)
Enhancing the Credit Card Fraud Detection Using Decision Tree and Adaptive Boosting Techniques

K. R. Prasanna Kumar1(B), S. Aravind1, K. Gopinath1, P. Navienkumar1, K. Logeswaran2, and M. Gunasekar1

1 Department of Information Technology, Kongu Engineering College, Erode, Tamil Nadu, India
[email protected]
2 Department of Artificial Intelligence, Kongu Engineering College, Erode, Tamil Nadu, India
Abstract. Recent technological advances in the mobile and e-commerce fields have led to a huge number of online transactions, which in turn has increased the number of fraudulent transactions and caused notable financial losses for individuals and the banking sector. A significant number of fraudulent transactions are made with credit cards, so it is essential to develop a mechanism that ensures the security and integrity of credit card transactions. The main aim of this article is to detect such fraudulent transactions using machine learning algorithms, namely Decision Tree and Adaptive Boosting. Because the dataset is highly imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) is used to balance the data, and the Decision Tree algorithm is used for classification. The Decision Tree algorithm is combined with the Adaptive Boosting technique to improve the quality of binary classification. The results are compared using accuracy, precision and recall.

Keywords: Machine Learning · Synthetic Minority Oversampling Technique · Decision Tree · Adaptive Boosting · Support Vector Machine
1 Introduction

Rapid growth in the telecommunication industry has brought revolutionary changes to individual lifestyles: shopping, ordering food and paying bills can all be done online from a phone or laptop. Since the pandemic, the number of online transactions has increased, allowing users to transfer money from mobile phones linked to their bank accounts, credit cards, debit cards and other payment options such as the Unified Payment Interface (UPI) at any time over the internet. The largest share of fraudulent activity is associated with credit cards, and a notable number of credit card frauds are recorded every year. These growing fraudulent activities create a negative impact on the banking sector and a negative impression among users [15]. This kind of credit card fraud occurs when an unauthorized user steals the details of a credit card holder and uses them for other activities. Attackers use techniques such as intercepting an e-commerce transaction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 358–365, 2023. https://doi.org/10.1007/978-3-031-35501-1_36
through online channels, cloning an existing card, or exploiting a stolen or lost credit card; sharing account details with unauthorized persons can also lead to such crimes [4]. Cybercrime departments receive a growing number of complaints related to these fraudulent activities, and a lack of awareness is behind most cases. Beyond the individual level, there are notable pitfalls on the system side as well. Fraudulent credit card transactions have a severe impact on financial companies; a recent survey reports that the RBI lost a huge amount to credit card fraud in 2021–2022. If this trend continues, the economic losses caused by credit card fraud will exceed 35 billion dollars by 2023. Hence, a solution is needed to reduce such transactions, increase customer confidence and reduce customer complaints. By evaluating each transaction, it is possible to identify fraudulent activity [11]. Machine learning algorithms are a widely used approach to solving many real-time problems by applying different techniques [7]. Most existing machine learning algorithms help address this kind of issue, but a more efficient solution is still needed [13]. Applying machine learning techniques to identify fraudulent transactions improves system accuracy [9]. However, class imbalance exists in credit card fraud datasets, and many banks are unwilling to share their data to address this issue [14]. In this paper, we implement several machine learning algorithms for credit card fraud detection using a real-world dataset. Because the dataset is highly imbalanced, we use the Synthetic Minority Oversampling Technique (SMOTE) to balance it. The ML methods considered in this research are Support Vector Machine, Decision Tree and Adaptive Boosting.
2 Related Works

Ebenezer Esenogho et al. developed a credit card fraud detection method using deep learning. It uses the SMOTE with Edited Nearest Neighbours (SMOTE-ENN) method for handling the imbalanced dataset and a Long Short-Term Memory (LSTM) neural network as the base learner in the adaptive boosting technique [1]. The European cardholders dataset, which contains 284,807 transactions, is used for evaluation, with sensitivity, specificity and AUC (Area Under Curve) as the main performance metrics. The results show that the adaptive boosting approach achieved a sensitivity of 96.8% and a specificity of 99.4%. The approach proposed by Altyeb Altaher Taha et al. for detecting fraud in credit card transactions is an Optimized Light Gradient Boosting Machine (OLGBM), in which a Bayesian-based hyperparameter optimization algorithm is integrated to tune the parameters of the light gradient boosting machine [2]. They used two datasets: the first consists of 284,807 credit card transactions and the second of 94,683 transactions [2]. The performance metrics considered are accuracy and precision; OLGBM obtained an accuracy of 98.4% and a precision of 97.3%, outperforming other methods. Nghia Nguyen et al. proposed an approach for credit card fraud detection that uses CatBoost and a deep neural network, evaluated on the dataset provided by the Vesta Corporation, which includes real-world transactions [3]. The experimental analysis showed that this method performed well, obtaining AUC scores of 0.97 and 0.84.
Fawaz Khaled Alarfaj et al. proposed a new approach to credit card fraud detection based on state-of-the-art machine learning and deep learning algorithms. The European card dataset, the Brazilian dataset and a dataset provided by a commercial bank of China are considered for evaluation [5]. Accuracy is used for evaluation, and the state-of-the-art machine learning and deep learning methods provided an accuracy of 98.9% [5]. Huang Tingfei et al. applied a variational autoencoder (VAE) method instead of a baseline, a generative adversarial network (GAN) or SMOTE, because credit card datasets are highly imbalanced compared to other datasets, and used ML algorithms for fraud detection [6]. Sensitivity, specificity, accuracy and precision are the main performance metrics of that experiment; the accuracy of the VAE-based method is 0.99962.
3 Dataset

The dataset considered here was generated from European cardholders and is freely available on Kaggle. It consists of 284,807 transactions that occurred over two days. The dataset is highly imbalanced: 99.828% of the transactions are valid and 0.172% are invalid. It contains 30 attributes in total. The last column, Class, contains only 0 and 1, where 0 represents a valid transaction and 1 an invalid (fraudulent) transaction.
4 Proposed Method

The proposed work focuses on credit card fraud detection based on the Decision Tree and Adaptive Boosting algorithms. The overall workflow of the proposed method is shown in Fig. 1. The work is divided into three phases:

Phase 4.1: Dataset Pre-Processing.
Phase 4.2: Classification using Decision Tree.
Phase 4.3: Ensemble Decision Tree with Adaptive Boosting.
Phase 4.1: Dataset Pre-Processing
The dataset considered for this work contains around 284,807 transactions, of which 492 are identified as fraud cases. The dataset is thus imbalanced: it contains many more valid transactions (the majority class) than invalid transactions (the minority class) [10]. Such imbalance leads to poor performance when the data is used with any ML model [8]. SMOTE is the most widely used resampling method to balance such datasets. It is an oversampling technique that generates new data from the existing minority class [1]. First, it chooses a random sample from the minority class of the given dataset and searches for the nearest neighbors of the selected sample within the same minority class. After identifying a suitable neighbor, it creates a new synthetic instance between the two samples, rather than a duplicate of the minority sample [1]. It ensures
Fig. 1. Overall workflow of the proposed method
that the count of the minority class equals that of the majority class in the dataset. In this manner, it balances the dataset.
Algorithm 1:
Input: Input data
Output: Balanced dataset
Step 1: Choose a random sample X from the minority class
Step 2: Search for the nearest neighbours of X within the minority class
Step 3: Create a synthetic instance by choosing one of the nearest neighbours, Y, at random
Step 4: Connect X and Y
Step 5: The connecting line segment forms the feature space for the new sample
Step 6: Generate successive synthetic instances as combinations of the two selected samples
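Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative single-sample version, not the authors' implementation; the function name and parameters are our own:

```python
import numpy as np

def smote_sample(minority, k=5, rng=None):
    """Generate one synthetic minority instance, following Algorithm 1."""
    rng = rng or np.random.default_rng(0)
    x = minority[rng.integers(len(minority))]         # Step 1: random minority sample X
    dist = np.linalg.norm(minority - x, axis=1)
    neighbours = minority[np.argsort(dist)[1:k + 1]]  # Step 2: k nearest neighbours of X
    y = neighbours[rng.integers(len(neighbours))]     # Step 3: pick a neighbour Y at random
    lam = rng.random()                                # Steps 4-6: a point on the X-Y segment
    return x + lam * (y - x)
```

Repeating this until the minority class matches the majority class in size balances the dataset; in practice a library implementation such as imbalanced-learn's SMOTE would normally be used instead.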
Phase 4.2: Classification using Decision Tree
The Decision Tree acts as the base learner in Adaptive Boosting for improving fraud detection. It can be used for both classification and regression. It follows a divide-and-conquer strategy, conducting a greedy search to identify suitable split points within the tree. The splitting process is repeated recursively until the majority of the data has been classified into specific classes, i.e. valid and invalid transactions. Selecting the root node is the difficult part of building a decision tree; here, the root node is selected using the Gini index. The Gini index is calculated by subtracting the sum of the squared class probabilities from one:

Gini index = 1 − Σi (Pi)²

where Pi denotes the probability of class i. The Gini index is calculated for every column in the dataset. A Gini index of 0.5 denotes that the elements are uniformly distributed across the classes. The column with the minimum Gini index is considered, a random row value from it is selected, and this is taken as the root node of the tree. The remaining values are compared with the root. The method is applied recursively to each subset until a final node is reached.

Algorithm 2:
Input: Dataset
Output: Classification of the dataset
Step 1: Consider S as the root node
Step 2: Find the best attribute based on the Gini index
Step 3: Divide S into subsets
Step 4: Generate a decision tree node
Step 5: Repeat Steps 2–4 until a final node is reached
Step 6: Output the final node
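As a quick check of the Gini formula, the index for a column of class labels can be computed directly. This is a small illustrative helper of ours, not part of the original work:

```python
from collections import Counter

def gini_index(labels):
    """Gini index = 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_index([0, 0, 0, 0]))  # 0.0: a pure node
print(gini_index([0, 0, 1, 1]))  # 0.5: classes uniformly distributed
```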
Phase 4.3: Ensemble Decision Tree with Adaptive Boosting
Boosting is an ML approach used for creating an accurate model with
the help of a combination of several methods [12]. Adaptive Boosting is an ensemble technique that builds a strong classifier by voting over the weighted predictions of weak learners [1]. It can be integrated with other ML models to improve their performance. Here, Adaptive Boosting uses the Decision Tree as the base learner. The combined classifier after N iterations is computed as

GN(X) = Σ(t=1..N) gt(X)

where gt is the weak learner at iteration t and X is the input vector. The weight of each input sample is changed according to the prediction performance of the previous classifier: higher weights are assigned to incorrectly classified samples and lower weights to correctly classified ones:

New sample weight = old weight × e^α

where the old weight is the weight from the previous classification round and α is the influence of the classifier; the exponent is negative when the sample is classified correctly and positive when it is classified incorrectly. These steps are repeated until the maximum number of iterations is reached.
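One boosting round — the weighted error, the classifier influence α, and the exponential re-weighting described above — can be sketched as follows. The function and the label convention ({-1, +1}) are illustrative, not taken from the paper:

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost re-weighting step for labels in {-1, +1}."""
    err = np.sum(weights * (y_true != y_pred)) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / err)       # influence of this weak learner
    # new weight = old weight * e^{+alpha} if misclassified, e^{-alpha} if correct
    sign = np.where(y_true != y_pred, 1.0, -1.0)
    new_weights = weights * np.exp(alpha * sign)
    return new_weights / new_weights.sum(), alpha
```

Misclassified samples gain weight, so the next decision tree concentrates on them; after N rounds the weak learners are combined by vote.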
5 Results and Discussions

The experimental process was carried out in three phases. In Phase I, the SMOTE method is used to balance the dataset, since credit card datasets are highly imbalanced. In Phase II, the Decision Tree algorithm is implemented for classification. In Phase III, the Decision Tree is used as the base learner for the Adaptive Boosting algorithm.
Fig. 2. Confusion matrix for the Decision Tree

Fig. 3. Confusion matrix for the SVM
Figures 2, 3 and 4 show the confusion matrices for the Decision Tree, the SVM and the Decision Tree with Adaptive Boosting, respectively. The values in the confusion matrices are used to calculate the accuracy, recall and precision given in Table 1. Table 1 reports the performance of the ML algorithms: Decision Tree, SVM and Decision Tree ensembled with Adaptive Boosting. Comparing the Decision Tree
Fig. 4. Decision Tree ensemble
Table 1. Performance evaluation of the proposed methodology.

Algorithm                              Accuracy  Recall  Precision  F1-score
Decision Tree                          96.5%     0.952   0.981      0.965
SVM                                    98.5%     0.985   0.989      0.985
Adaptive Boosting with Decision Tree   99.7%     0.996   0.998      0.997
and SVM, the SVM achieves 2–3% better accuracy and recall than the Decision Tree alone. The Decision Tree was then extended with Adaptive Boosting, which further improved the results: the ensemble performs 3–4% better than the Decision Tree and 1–2% better than the SVM. This shows that the Decision Tree with Adaptive Boosting provides better performance than the other two algorithms.
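The metrics in Table 1 follow directly from the confusion-matrix counts. A small helper of ours (with made-up counts for illustration) shows the computation:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```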
6 Conclusion

The proposed work provides an effective solution to credit card fraud detection using a Decision Tree ensembled with the Adaptive Boosting ML algorithm. Credit card transactions increase day by day, and at the same time a notable number of fraudulent activities are recorded by officials; even though high-level security mechanisms are in place, hackers still find loopholes to perform fraudulent transactions. In recent years, ML algorithms have provided significant solutions to various real-time problems. This paper presents a Decision Tree with Adaptive Boosting to detect fraudulent transactions, trained and tested on the European credit card dataset. The Decision Tree with Adaptive Boosting provides the best overall performance compared with the Decision Tree and SVM algorithms. In future work, this can be extended with other ML algorithms to improve detection, and it can be integrated with real-time transaction systems to prevent fraudulent transactions.
References
1. Esenogho, E., Mienye, I.D., Swart, T.G.: A neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access 10, 16400–16407 (2022)
2. Taha, A.A., Malebary, S.J.: An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. IEEE Access 8, 25579–25587 (2020)
3. Nguyen, N., et al.: A proposed model for card fraud detection based on CatBoost and deep neural network. IEEE Access 10, 96852–96861 (2022)
4. Zhou, H., Sun, G., Sha, F., Wang, L., Juan, H., Gao, Y.: Internet financial fraud detection based on a distributed big data approach with Node2vec. IEEE Access 9, 43378–43386 (2021)
5. Alarfaj, F.K., Malik, I., Khan, H.U., Almusallam, N., Ramzan, M., Ahmed, M.: Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access 10, 39700–39715 (2022)
6. Tingfei, H., Guangquan, C., Kuihua, H.: Using variational auto encoding in credit card fraud detection. IEEE Access 8, 149841–149853 (2020)
7. Dal Pozzolo, A., Boracchi, G., Caelen, O., Alippi, C., Bontempi, G.: Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Trans. Neural Netw. Learn. Syst. 29, 3784–3797 (2017)
8. Makki, S., Assaghir, Z., Taher, Y., Haque, R., Hacid, M.-S., Zeineddine, H.: An experimental study with imbalanced classification approaches for credit card fraud detection. IEEE Access 7, 93010–93022 (2019)
9. Wang, H., Wang, W., Liu, Y., Alidaee, B.: Integrating machine learning algorithms with quantum annealing solvers for online fraud detection. IEEE Access 10, 75908–75917 (2022)
10. Kalid, S.N., Ng, K.-H., Tong, G.-K., Khor, K.-C.: A multiple classifiers system for anomaly detection in credit card data with unbalanced and overlapped classes. IEEE Access 8, 28210–28221 (2020)
11. Lebichot, B., Verhelst, T., Le, Y.-A., He-Guelton, L., Oblé, F., Bontempi, G.: Transfer learning strategies for credit card fraud detection. IEEE Access 9, 114754–114766 (2021)
12. Jiang, C., Song, J., Liu, G., Zheng, L., Luan, W.: Credit card fraud detection: a novel approach using aggregation strategy and feedback mechanism. IEEE Internet Things J. 5, 3637–3647 (2018)
13. Logeswaran, K., Suresh, P., Savitha, S., Prasanna Kumar, K.R.: Optimization of evolutionary algorithm using machine learning techniques for pattern mining in transactional database. In: Handbook of Research on Applications and Implementations of Machine Learning Techniques (2019)
14. Logeswaran, K., et al.: Discovery of potential high utility itemset from uncertain database using multi objective particle swarm optimization algorithm. In: 2022 International Conference on Advanced Computing Technologies and Applications (ICACTA), pp. 1–6 (2022). https://doi.org/10.1109/ICACTA54488.2022.9753159
15. Prasanna Kumar, K.R., Kousalya, K.: Amelioration of task scheduling in cloud computing using crow search algorithm. Neural Comput. Appl. 32(10), 5901–5907 (2019). https://doi.org/10.1007/s00521-019-04067-2
A Manual Approach for Multimedia File Carving

Pallavi Khatri1(B), Animesh Agrawal2, Sumit Sah1, and Aishwarya Sahai1

1 Department of CSE, ITM University, Gwalior, India
[email protected]
2 National Forensic Sciences University, Gandhinagar, India
Abstract. The recovery and examination of digital data has become a significant part of numerous criminal investigations today. Given the ever-expanding number of personal digital gadgets, such as PDAs, tablets and cell phones, we all accumulate, store and generate an enormous amount of data. A portion of this data may be valuable evidence for examination and may be used in court. During the past two decades, significant research has been done on characterizing tools for the analysis of evidence originating from different sources. This work provides a solution for the extraction and carving of data from digitally generated dumps or image files of the evidence. The file carving process proposed in this work is suitable for readers of diverse backgrounds.

Keywords: File carving · File signatures · Memory dump · Memory forensics
1 Introduction

The world is getting progressively interconnected. We find ourselves connected virtually via gadgets, and digital systems are the backbone of corporate and government organisations everywhere. The Internet is, in any case, a network of networks consisting of competing and concurrent technologies with users from various organisations and countries. Digital forensics is becoming increasingly significant with the rise of cybercrime. Understanding the laws and guidelines governing electronic communications, cybercrime and data retention requires the continual adoption of new information, techniques and tools [1]. Digital evidence is everywhere and plays a vital role in any criminal investigation, from trivial crimes to cybercrime, systematic corruption and terrorism. In the absence of file system metadata, a file can be carved from its fragments in a memory image. File carving with this approach focuses on the contents of the file rather than its metadata. It makes retrieval easier than the metadata approach, where recovery becomes difficult if the directory entries of files are missing or some fragments have been deleted [2]. On damaged media, too, metadata-based carving becomes difficult, and this content-based approach to searching for files or other objects proves effective. Extracting files or data from unallocated space is routine practice in any digital investigation.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 366–374, 2023. https://doi.org/10.1007/978-3-031-35501-1_37
This work proposes and develops a technique for manual file carving in the absence of an automated tool. The experiments reconstruct data through file carving. The findings of applying the proposed carving method show that data can be restored from unallocated space as well.
2 File Carving

The file carving technique uses knowledge of a deleted file's internal structure and contents for recovery [3]. Without any dependency on the file system, data can be carved from the raw data available in a dump file. Unallocated space in memory stores no metadata about files or the file system, which makes it difficult to extract data from these locations. File carving can be used in such cases to retrieve files from these spaces based on their contents.

2.1 Magic Numbers
Magic numbers are constants that identify a file type with a unique signature [4]. All file types are identified by their extensions, through which the operating system identifies the files in a system. Application programs use this feature to verify a file's signature (magic number) before handling it [5]. Signatures normally indicate the start of a file (header) and the end of a file (footer).

2.2 File Carving Techniques
A huge amount of data can be retrieved from memory dumps. Any digital investigation involves locating and retrieving data from the image in a forensically sound manner so that it can be produced in a court of law. The most popular file carving methods are:

(a) Header-footer or "maximum file size" carving. Files are retrieved with the help of their headers and footers or their size. For example, a JPEG file is identified by "\xFF\xD8" as the header and "\xFF\xD9" as the footer, and a GIF file has "\x47\x49\x46\x38\x37\x61" as the header and "\x00\x3B" as the footer.

(b) Carving based on file structure. The file's internal structure is used to carve data with this technique. Elements such as the header, footer, identifier strings and file size are used to carve the data.

2.3 Multimedia File Carving
Multimedia files can be carved using their internal formats and their file signatures [6].
File signatures such as the Start of File header (SOF) and End of File footer (EOF) are used in carving, and all the bytes between these markers are used to reconstruct the file.
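The header-footer technique reduces to a byte search. A minimal sketch of ours, using the four-byte JFIF header FFD8FFE0 and the EOI footer FFD9, illustrates the idea:

```python
def carve_jpeg(dump: bytes) -> list[bytes]:
    """Copy every byte run between a JPEG SOI header and the next EOI footer."""
    SOI, EOI = b"\xFF\xD8\xFF\xE0", b"\xFF\xD9"
    carved, pos = [], 0
    while (start := dump.find(SOI, pos)) != -1:
        end = dump.find(EOI, start + len(SOI))
        if end == -1:
            break  # header without a matching footer: stop
        carved.append(dump[start:end + len(EOI)])
        pos = end + len(EOI)
    return carved
```

This naive version assumes files are stored contiguously; fragmented files need the structure-based techniques of Sect. 2.2(b).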
2.4 Fragments
File fragments are pieces of files scattered throughout storage rather than stored in one contiguous location. Periodic deletion of files leads to the creation of fragments, which may be overwritten by newly stored data. A file can be reconstructed from these fragments as long as they have not been overwritten.
Fig. 1. JPEG File Structure
Figure 1 shows a JPEG file stored between the header tag and the footer tag, i.e. the SOI (Start of Image) and EOI (End of Image) markers. If the complete bit stream between SOI and EOI is fetched, the JPEG image file can be carved successfully.
3 Literature Survey

This section discusses past research on file carving and advanced carving techniques, and identifies the gaps where the proposed technique can be used to carve files from a system. The work described in [7] discusses the recovery and repair of corrupted MP4 (MPEG-4 Part 14) video files. Its recovery technique differs from traditional programs such as Scalpel: the tool, named MP4-Karver, automatically extracts frames from video and repairs corrupted videos. In [8], the author proposes a method to recover fragmented media files. The recovery process is complex and demands considerable computation power and time, so the author also discusses optimizations to reduce computation time and improve recovery performance. The research in [9] describes different video recovery techniques used for file carving: Scalpel's method, bi-fragment gap carving, Garfinkel's method, SmartCarving and frame-based recovery are some of the most common. In [10], Pal and Memon discuss the need for improvement in digital file recovery. The paper [11] highlights the classic approach to recovering data from corrupted storage media such as hard drives and pen drives, based on file system metadata. Flash-based devices use non-volatile, electrically erasable programmable memory; in [12] the author focuses on forensic data recovery from flash drives. In [13], the author describes carving as estimating a mapping function between bytes copied from an image of storage media and the recovered file, and describes mapping function generators for the PDF (Portable Document
Format) and Zip file formats. The research work in [14] gives a method for statistical analysis of fragmented data using its binary structure. A limitation of past work is that it targets specific file types: [13] describes mapping function generators only for PDF and Zip files, [7] focuses only on MP4 (MPEG-4 Part 14) carving, and [12] works only on flash-based devices, recovering data only from flash drives. The main research gap is that these works neither carve data from a raw dump nor cover several popular file types. Manual extraction of data is explained in detail in [15]. The concept of a virtual Android phone is used in [16, 17] to explain how file carving can be done and how data can be obtained from apps in the absence of expensive forensic tools.
4 Proposed System

In this work, data is carved from raw disk dump (DD) files. A raw disk dump of any storage, volatile or non-volatile, is taken and analysed manually using SOF and EOF signatures. The proposed work carves four file types, i.e. JPEG (Joint Photographic Experts Group), HTML (HyperText Markup Language), PNG (Portable Network Graphics) and PDF (Portable Document Format), from the memory dump. A key feature of this work is that it recovers deleted files as well as existing ones (Fig. 2).
Fig. 2. Proposed System
4.1 File Formats
Before moving on to file carving, the formats of the various multimedia files are explained in the following paragraphs.

4.1.1 JPEG File Format
JPEG (Joint Photographic Experts Group) is used for image compression. File extensions under this group include .jpeg, .jpg, .jfif and .jpe, of which .jpg is the most popular. JPEG files are identified by unique file markers; Table 1 gives a brief overview. SOI (Start of Image) is the header of the file and EOI (End of Image) is the footer.
Table 1. JPEG Image File Markers
Marker  Name                               Payload        Bytes
SOI     Start of Image                     None           0xFF, 0xD8
SOF0    Start of Frame (Baseline DCT)      Variable size  0xFF, 0xC0
SOF2    Start of Frame (Progressive DCT)   Variable size  0xFF, 0xC2
DHT     Define Huffman Table               Variable size  0xFF, 0xC4
DQT     Define Quantization Table          Variable size  0xFF, 0xDB
DRI     Define Restart Intervals           4 bytes        0xFF, 0xDD
SOS     Start of Scan                      Variable size  0xFF, 0xDA
RSTn    Restart                            None           0xFF, 0xDn (n = 0..7)
APPn    Application-specific               Variable size  0xFF, 0xEn
COM     Comment                            Variable size  0xFF, 0xFE
EOI     End of Image                       None           0xFF, 0xD9
4.1.2 HTML File Format
HTML (HyperText Markup Language) defines and designs the structure of a web page. An HTML document is identified by the <html> tag at the start and the </html> tag at the end of the code. The basic syntax of an HTML file is shown in Fig. 3.
Fig. 3. HTML Syntax
Figure 3 shows the format of an HTML document with its standard tags; every tag used in the code is enclosed within the SOF and EOF of the document, i.e. <html> and </html>.

4.1.3 PNG File Format
PNG (Portable Network Graphics) is an image file format that supports lossless data compression. PNG was developed as an improved, non-patented replacement for the Graphics Interchange Format (GIF). A PNG file starts with an 8-byte signature, as shown in Fig. 4.
Fig. 4. PNG File Format
4.1.4 PDF File Format
The Portable Document Format (PDF) is a file format developed by Adobe in the 1990s to present documents, including text formatting and images, in a manner independent of application software, hardware and operating systems. A PDF file header contains the magic number and the version of the format, and the file ends with a footer containing the startxref keyword followed by an offset to the start of the cross-reference table (which begins with the xref keyword). There may be multiple end-of-file marks within the file, so while carving it must be ensured that the last EOF marker is found in order to retrieve the whole file. Table 2 shows the SOF and EOF details of the file types considered in this study.

Table 2. SOF and EOF of multimedia files

File Type  SOF                       EOF
JPEG       FF D8 FF E0               FF D9
HTML       3C 68 74 6D 6C            3C 2F 68 74 6D 6C 3E
PNG        89 50 4E 47 0D 0A 1A 0A   49 45 4E 44 AE 42 60 82
PDF        25 50 44 46               0A 25 25 45 4F 46 (.%%EOF), 0A 25 25 45 4F 46 0A (.%%EOF.), 0D 0A 25 25 45 4F 46 0D 0A (..%%EOF..), 0D 25 25 45 4F 46 0D (.%%EOF.)
The information in Table 2 can be used for manual extraction of files from a system image. The next section describes the procedure for extracting data from a raw dump image using a hex editor.
5 Manual Extraction/Carving of Files Using a Hex Editor

This section briefly describes the steps to be followed for manually carving files from a raw image file.
(a) Create a raw image using the dd command.
(b) Load the image created in step (a) into the hex editor. Once this is done, the hexadecimal raw data that forms a JPEG file can be seen (Fig. 5).
(c) Search the data for the header and footer of the files to be carved, inspecting the binary data of the dd file.
Fig. 5. JPEG Hex:SOI/Header & Hex: EOI/Footer
(d) Once the header and footer of a file are identified, copy all the bytes between them and save them as a new file to obtain the original file.
Fig. 6. HTML and PNG Header Footer
(e) Similarly, Fig. 6 shows the SOF and EOF of HTML and PNG files, and Fig. 7 those of PDF files, which are used to carve the existing and deleted files from the memory image.
Fig. 7. PDF Hex: Header and Footer
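Steps (c) and (d) can be automated once the signatures are known. The sketch below is an illustrative script, not the paper's tooling: it scans a raw dump for the JPEG header/footer pair from Table 2 and collects every byte run between them.

```python
# Illustrative carver for steps (c)-(d): scan a raw dump for JPEG
# signatures (Table 2) and collect the bytes between header and footer.
JPEG_SOF = bytes.fromhex("FFD8FFE0")   # JPEG/JFIF header
JPEG_EOF = bytes.fromhex("FFD9")       # JPEG footer

def carve_jpegs(dump: bytes) -> list:
    carved, pos = [], 0
    while True:
        start = dump.find(JPEG_SOF, pos)
        if start == -1:
            break                       # no more headers in the dump
        end = dump.find(JPEG_EOF, start + len(JPEG_SOF))
        if end == -1:
            break                       # truncated file: no footer found
        carved.append(dump[start : end + len(JPEG_EOF)])
        pos = end + len(JPEG_EOF)
    return carved

# e.g. dump = open("testdump.dd", "rb").read(); files = carve_jpegs(dump)
```

The dd output loaded in step (b) can be scanned in the same way for the HTML, PNG, and PDF signatures; for PDF, the last EOF marker must be taken rather than the first.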
6 Results and Discussion

We shall now see the results obtained when we actually tried to carve a .dd file. This study used the dump of an old 1 GB pen drive, named testdump.dd, containing a data set of 5 PNG files, 3 HTML files, 6 PDF files, and 6 JPG files, along with some deleted data. After carving the data using the manual extraction method, the files listed in Table 3 were retrieved.
Table 3. Data extracted from image of Pen Drive

Files  No. of files in input dataset  No. of files obtained after carving
JPEG   6                              24
PNG    5                              8
HTML   3                              5
PDF    6                              8 (6 working and 2 corrupted)
The findings listed in Table 3 clearly show that the number of extracted files exceeds the number of files present in the pen drive. This shows that the manual extraction procedure successfully carves data from the unallocated space of the drive as well. Of course, the results are not a 100% match with the original data set: some of the recovered files are corrupted, which indicates missing data or overwritten bits that could not be recovered. The PDFs used in the data set contained some images, and those images were also recovered during extraction. PNG files belonging to old files or software that had been deleted from the drive earlier could also be retrieved.
7 Conclusion

In this work, we presented a manual extraction method for carving files. Although many tools exist for data carving, most of them merely recover the data instead of actually carving it. The proposed technique actually carves files from the image, making it an ideal process for digital forensics. As the principle of forensics says that even the tiniest detail is important, the proposed method helps us carve as much of the file as possible, generating excellent results.
NadERA: A Novel Framework Achieving Reduced Distress Response Time by Leveraging Emotion Recognition from Audio

Harshil Sanghvi, Sachi Chaudhary, and Sapan H. Mankad(B)

CSE Department, Institute of Technology, Nirma University, Ahmedabad, India {19bce238,19bce230,sapanmankad}@nirmauni.ac.in
Abstract. This paper proposes a novel framework for automatically directing the user to an appropriate helpline number in an emergency based on his/her emotional state. For emotion detection, we integrate four benchmark datasets (SAVEE, RAVDESS, TESS, and Crema-D). We further examine the impact of various features on this comprehensive dataset and see the possibility of generalization with the help of diversified data. The highest accuracy achieved by the Convolutional Neural Network (CNN) model is 93.14% using the proposed approach. The results indicate that our emotion recognition model highly depends on the choice of audio features. Finally, we use this prediction to build our single-emergency-number helpline architecture, which predicts the caller's emotions and directly connects them to the desired person for seeking mental help through counselors, protection with the help of police, or a general call center for any other help. This framework reduces the response time and provides a single point of connection.

Keywords: audio · MFCC · CNN · RAVDESS · SAVEE · TESS · Crema-D · emotion detection

1 Introduction
Detecting a person's emotion is important for several reasons, such as health applications, increasing business reach, user feedback, and making human-computer interaction much more efficient. Response time is crucial for emergency hotline services like 911 in the United States and 112 in India, where the goal is to deliver time-critical assistance to callers. According to a study by the US Department of Homeland Security, the typical shooter event at a school lasts 12.5 min, whereas the average response time for law enforcement is 18 min (https://www.creditdonkey.com/average-police-response-time.html). The statistics reveal that officers arrive after the crime has been perpetrated. According to 2022 figures, it takes an average of 35 min for a law enforcement officer to get on the scene after a 911 call [2]. In the event of a
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 375–385, 2023. https://doi.org/10.1007/978-3-031-35501-1_38
lower-priority call, this period is extended even further. For medical emergencies, the average delay from the moment of a 911 call to arrival on site is 7 min; in remote areas, the median duration is more than 14 min [13]. Communication between the caller and the callee plays a major part in the increased response time in such cases. During an emergency, callers are usually under a lot of stress, worry, and strain, which makes it challenging to communicate clearly and quickly with the callee. Aside from that, the fact that many nations have different helpline numbers depending on the emergency type makes this issue even more difficult, since the caller is expected to remember the number suited to the scenario and then ask for assistance. Once the correct number is dialed, the help-seeker, affected by emotions and, as a consequence, weeping, suffering shortness of breath, and so on, must ensure that all of his or her information is accurately logged at the emergency services desk in order to receive the fastest possible assistance. However, since there is a step of communication between the caller and the callee, human interaction always results in a longer response time and leaves room for human error. This research seeks to solve these concerns by presenting a unique framework that uses the intelligence of neural networks. We propose a framework with a single emergency number in which we detect the correct emotion from the caller's side and direct the person to the relevant department. Furthermore, we pick up on two emotions for whose scenarios the reaction time is considerably decreased by automating the complete communication process for callers. The following are the major contributions of this work:
– A detailed literature review of emotion recognition systems from speech is presented.
– We then showcase the process of data pre-processing, where we take four audio datasets and concatenate them to achieve a quality dataset for speech emotion classification, which supports a high-quality feature engineering model.
– A framework is proposed to provide a uni-call system using the speaker's emotions.

The rest of the paper is organized as follows. Section 2 provides a literature review of the task at hand. The framework for reducing distress response time is proposed in Sect. 3. Section 4 describes the data preparation and feature representation methodologies. We show experimental details in Sect. 5 and discuss the results in Sect. 6. Finally, the paper ends with concluding remarks and future directions in Sect. 7.
2 Related Work
Deep Learning has emerged as a versatile tool providing tremendous power and capabilities in diverse areas, from computer vision to natural language processing. Several deep learning models have been used in the field of medicine for leukemia detection [4–6]. Emotion recognition from speech/audio is a booming
field in the current period due to advancements in technology, artificial intelligence, etc. Kanwal et al. [10] presented a clustering-enabled genetic algorithm to select the best-fitting features from three datasets: SAVEE, EMO-DB, and RAVDESS. The support vector machine algorithm was further used to classify the audio emotion and showed effective results. A deep feature-based layered approach outperformed conventional machine learning algorithms, as presented in [15]. Further, Er [8] proposed a novel hybrid technique based on deep and acoustic features (MFCC, zero-crossing rate, and root mean square energy values) to increase the accuracy. Emotion recognition systems suffer from a lack of high-quality input audio data and from noisy environments. To address this issue, Xu et al. [17] proposed a head-fusion framework to improve the robustness and accuracy of the speech emotion recognition system. This model was built using the RAVDESS and IEMOCAP datasets. A transfer learning and autoencoder-based one-dimensional deep convolutional neural network framework for voice-based emotion recognition was presented in [14], obtaining 96% accuracy on the TESS dataset. This paper proposes a neural network-based emotion recognition system from audio. Based on the detected emotion, the user is redirected to the respective channel, and help is sent immediately, whether it be police assistance, a depression helpline, or some other help. Four datasets are merged to bring diversity to the data.
3 Proposed Framework
This section presents the system model for the proposed work to accurately detect the emotion from the caller’s voice and redirect the call to the appropriate channel. Figure 1a shows the system model for the proposed work. The person in distress calls on a single unified helpline number. The audio sample of the caller is processed, and features are extracted from the audio. These features are fed to the CNN-based classifier, which predicts the emotion. Suppose the detected emotion is fear. In that case, the call is automatically redirected to the police control center; else, it is redirected toward the mental health counselor if the emotion is sadness. Now in the scenario of any other emotion being detected, the call is redirected to a general call center. This way, the response time is drastically reduced as the caller need not remember different helpline numbers for different problems; instead, they have to call on a single number for all their problems. As depicted in Fig. 1b, the proposed system architecture consists of four layers: the data preparation layer, dialing layer, intelligence layer, and assistance layer. The data preparation layer consists of a database formed by aggregating multimodal emotional speech databases. Then the data pre-processed in the data preparation layer is forwarded to the intelligence layer, which trains the neural network architecture on the dataset and keeps it ready for predictions on the unseen data. In the intelligence layer, the CNN model receives audio from
Fig. 1. Proposed Framework: (a) System Model; (b) Comprehensive System Architecture
the distressed caller and predicts the emotion in the call. The call is automatically redirected to the appropriate channel based on the predicted emotion. The following is a detailed description of the four-layered architecture. Data Preparation Layer. First, we create a dataset in the data preparation layer by aggregating multimodal emotional speech and song databases. The four datasets that were combined are SAVEE [16], RAVDESS [11], TESS [7] and CREMA-D [3]. After the concatenation of data, several augmentation techniques, such as adding random noise, stretching, and shifting, were used to increase the number of samples. First, random noise with a rate of 0.035 was added to the original audio sample. The second technique involved stretching the audio data by a rate of 0.8. The third technique incorporated was shifting the data, wherein the first shift range was generated by taking a rate equal to 1000 and rolling the data with generated shift range. The last technique involved shifting the pitch of the given audio file by a factor of 0.7. Once the augmented dataset was ready, feature extraction was carried out. All the audio files were loaded for a duration of 2.5 s, with an offset of 0.6. The sampling rate was 22 kHz. Once the feature set was ready, some pre-processing techniques were employed to prepare data for training. First, the target labels were encoded with integers. Secondly, data was split into train, test, and validation with a split ratio of 72:20:8. To scale down the data points to a minimal range, standard scaling was performed on the input values of the dataset. Once the values were scaled down to the range of 0 and 1, the feature set was ready for training the model.
Dialing Layer. This layer comprises the helpline number powered by automated recognition of emotion from audio using deep learning. Here the caller-in-distress calls the helpline number, and the captured audio is transferred to the intelligence layer. Once the audio passes through the intelligence layer, the intelligence layer makes an accurate prediction of the emotion. The successive tiers of the framework address the pipeline's flow.
Fig. 2. 1D CNN Model Architecture
Intelligence Layer. A 1D CNN model [9] has been used for training on the audio files. We conducted several experiments using different feature selection techniques, different numbers of emotions, overlapping of audio frames, and averaging of audio frames. Figure 2 depicts the 1D CNN model architecture for the proposed system. The proposed system model shown in Fig. 1a uses this CNN-based emotion recognition for predicting emotions and further redirecting the calls based on the emotions detected by the system.

Assistance Layer. In this layer, the trained neural network architecture predicts the emotion from the caller's audio and takes action, as mentioned earlier in the framework. In this manner, the proposed framework eliminates the need to remember multiple helpline numbers and automates the entire redirection by leveraging deep learning.
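The exact layer configuration of the intelligence layer's CNN is given only in Fig. 2, so the sketch below is a shape-level illustration rather than the paper's architecture; the layer sizes are invented. It runs a 1-D convolution over a scaled feature vector, applies ReLU, pools globally, and produces a softmax over the seven emotion classes.

```python
import numpy as np

# Minimal NumPy forward pass for a 1D-CNN-style classifier: convolution
# over the feature vector, ReLU, global average pooling, softmax over 7
# emotions. Layer sizes are illustrative, not those of Fig. 2.

def conv1d(x, w, b, stride=1):
    k, n_filters = w.shape
    n_out = (len(x) - k) // stride + 1
    out = np.empty((n_out, n_filters))
    for i in range(n_out):
        out[i] = x[i * stride : i * stride + k] @ w + b
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(1861)                  # one scaled feature vector (Exp-1/2 size)
h = np.maximum(conv1d(x, rng.standard_normal((8, 16)), np.zeros(16)), 0)
p = softmax(h.mean(axis=0) @ rng.standard_normal((16, 7)))   # 7 emotion probabilities
print(p.shape, round(p.sum(), 6))              # (7,) 1.0
```

A real implementation would stack several such convolution blocks with learned weights (e.g. in Keras or PyTorch); the point here is only the input/output shape of the computation.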
4 Methodology

4.1 Dataset Selection
We have used four datasets. The RAVDESS dataset is a multi-actor dataset of audio recordings of different emotions. The dataset comprises 24 actors, each expressing the emotions of sadness, happiness, fear, calm, neutral, surprise, and disgust. This dataset is rich in emotions with different ranges and without gender bias. The SAVEE dataset consists of four male actors showing the same seven emotions as the RAVDESS
dataset. The TESS dataset consists of the same seven emotions as RAVDESS, expressed by two women. The SAVEE dataset is a multi-speaker database comprising audio-video data and text descriptions; it expresses emotions such as happiness, sadness, anger, surprise, fear, disgust, and neutrality. The CREMA-D dataset comprises audio files from 91 actors of varied ages, genders, races, and backgrounds. The emotions it focuses on are sadness, happiness, anger, fear, disgust, and neutrality, at four different speech levels. These datasets provide varied class availability from different genders and varied data for different emotions, and hence help in achieving generalization. The various datasets used in this research have various shortcomings. An emotion's expression depends on the speaker's accent, language, background, etc. A model trained to recognize emotion in the English language will not work equally well to classify emotion in another language, such as Chinese or Indian languages. The RAVDESS dataset suffers from selection bias, as it was formed by speakers belonging to particular regions, like Canada, who have a rigid American accent. Also, these datasets are created by trained speakers, and therefore emotions from natural speakers may vary from the dataset. For this reason, the architecture should be validated before deployment in the real world. Another limitation is that the datasets contain only a few statements spoken with different emotions, which limits the scope of the dataset.

4.2 Data Preparation
The audio files of each dataset, containing different emotions, are appended together. Each audio recording comprises approximately 4 to 5 s of audio. Further, the data is made noise-free using different techniques. Each audio file is then augmented by adding noise with a rate of 0.035, stretching at a rate of 0.8, shifting by a rate of 1000 for both low and high pitches, and finally shifting the pitch at a rate of 0.7. No class balancing was performed. Due to space constraints, we have not included a detailed description of these features. The list of various features used in our proposed work is shown in Table 1, along with the dimension of each feature. Interested readers may follow Librosa [1] and Spafe [12] for more details.
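A minimal NumPy sketch of these augmentations, using the rates quoted above (0.035 noise, 0.8 stretch, 1000-sample shift). The noise-scaling scheme and the interpolation-based stretch are assumptions on my part; in practice librosa's `effects.time_stretch` and `effects.pitch_shift` are the usual tools, so pitch shifting is only indicated in a comment.

```python
import numpy as np

def add_noise(y, rate=0.035):
    # white noise scaled to the signal's peak amplitude (assumed scheme)
    return y + rate * np.amax(np.abs(y)) * np.random.normal(size=y.shape)

def stretch(y, rate=0.8):
    # crude time stretch by linear resampling; rate < 1 slows the audio down
    n_out = int(len(y) / rate)
    return np.interp(np.linspace(0, len(y) - 1, n_out), np.arange(len(y)), y)

def shift(y, shift_max=1000):
    # roll the waveform by a random number of samples in [-shift_max, shift_max)
    return np.roll(y, np.random.randint(-shift_max, shift_max))

# Pitch shifting by 0.7 (semitones) would use e.g.
#   librosa.effects.pitch_shift(y, sr=22050, n_steps=0.7)
```

Applying all four variants to every recording yields the four-fold augmented dataset described in Sect. 5.2.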
5 Experiments
Librosa and Spafe libraries were used to extract different features. All features were extracted with a frame length of 2048 samples and a hop length of 512 samples.

5.1 Train, Test and Validation Criteria
Using a stratified approach, we performed the train and test data split based on an 80:20 ratio. The training data was then split in a 90:10 ratio, where the first part was taken as the training dataset, and the remaining portion was used as
Table 1. Features used during this work

Feature                                                      Dimensions
Zero Crossing Rate                                           108
Short Term Energy                                            1
Entropy of Energy                                            1
RMS                                                          108
Spectral Centroid                                            108
Spectral Flux                                                1
Spectral Rolloff                                             108
Chroma STFT                                                  1296
Mel Frequency Cepstral Coefficients (MFCC)                   1300
Bark Frequency Cepstral Coefficient (BFCC)                   1781
Gamma Tone Frequency Cepstrum Coefficient (GFCC)             1300
Magnitude-based Spectral Root Cepstral Coefficients (MSRCC)  1300
Normalized Gammachirp Cepstral Coefficients (NGCC)           1300
Linear Frequency Cepstral Coefficients (LFCC)                1300
Power Normalized Cepstral Coefficient (PNCC)                 7150
Phase-based Spectral Root Cepstral Coefficients (PSRCC)      1300
Linear Predictive Coefficients (LPC)                         1365
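The 108-frame dimension of the per-frame features in Table 1 follows from the framing scheme: 2.5 s at 22.05 kHz gives 55125 samples, and with centred framing (librosa's default, assumed here) the number of frames is 1 + 55125 // 512 = 108, matching the Zero Crossing Rate, RMS, and spectral rows. A NumPy sketch of a per-frame zero crossing rate computed this way:

```python
import numpy as np

# Per-frame zero crossing rate with centred framing, mirroring (as an
# assumption) librosa's frame count: 1 + n_samples // hop_length.

def zcr_frames(y, frame_length=2048, hop_length=512):
    pad = frame_length // 2
    y = np.pad(y, pad, mode="edge")     # centre frame t on sample t * hop_length
    n_frames = 1 + (len(y) - frame_length) // hop_length
    out = np.empty(n_frames)
    for t in range(n_frames):
        frame = y[t * hop_length : t * hop_length + frame_length]
        out[t] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    return out

y = np.random.randn(int(2.5 * 22050))   # 2.5 s at 22.05 kHz
print(zcr_frames(y).shape)              # (108,)
```

The Chroma STFT dimension in Table 1 is consistent with the same count: 12 chroma bins × 108 frames = 1296.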
Table 2. Configuration for various experiments performed (✓ marks reconstructed from the experiment descriptions in Sect. 5.2)

Feature              Exp-1  Exp-2  Exp-3  Exp-4
ZCR                  ✓      ✓      ✓      ✓
Mean Energy          ✓      ✓      ✓      ✓
Entropy of Energy    ✓      ✓      ✓      ✓
RMSE                 ✓      ✓      ✓      ✓
SPC                  ✓      ✓      ✓      ✓
SPC Flux             ✓      ✓      ✓      ✓
SPC Roll-off         ✓      ✓      ✓      ✓
Chroma STFT          ✓      ✓      ✓      ✓
BFCC                 ✓      ✓      ✓      ✓
GFCC                 ✓      ✓      ✓      ✓
iMFCC                ✓      ✓      –      –
LFCC                 ✓      ✓      ✓      ✓
MFCC                 ✓      ✓      ✓      ✓
MSRCC                ✓      ✓      ✓      ✓
NGCC                 ✓      ✓      ✓      ✓
LPCC                 ✓      ✓      ✓      ✓
PNCC                 ✓      ✓      ✓      ✓
PSRCC                ✓      ✓      ✓      ✓
Number of Emotions   7      7      6      7
Overlapping          ✓      –      –      –
Averaging            ✓      ✓      –      –
Total features size  1861   1861   19827  19827
the validation dataset. Then, this validation data, consisting of audio emotions from all four datasets, was used to assess the accuracy [9] and performance of the model. All the data pre-processing and cleaning steps were performed on the merged dataset before splitting it into train and test datasets.

5.2 Methodology
After merging and distributing audio samples from all the datasets, these audio samples were trimmed or padded to 2.5 s to keep a uniform duration. This was followed by feature extraction. We obtained features from each audio sample’s four (augmented) variations. Once these features were extracted, we appended
them to a list and thus created a new dataset containing features for each sample, with a size four times that of the original dataset. Once this dataset was prepared, we partitioned it into train, test, and validation sets. Then we performed standard scaling on the training data to shift the distribution to zero mean and unit standard deviation. Once the dataset was ready, we converted the target labels into one-hot encoded vectors. This data was fed into a one-dimensional CNN model, and results were obtained. Different experiments were performed using the above features and emotions, as shown in Table 2. The number of emotions used varies across the experiments. Also, overlapping and averaging of audio frames were done in some experiments. In the averaging configuration, the respective values of the coefficients from different frames are averaged: from each frame, a set of features was extracted and then averaged over the audio sample, whereas in the other case no averaging was done. Therefore, the number of coefficients in the averaging configuration is much smaller than in the other case. Similarly, in the overlapping configuration, multiple frames were formed based on the hop length during segmentation; there was no overlapping for a hop length equal to the window size. In a nutshell, we performed two types of settings for these experiments: (i) with all seven emotions present in the database (Experiments 1, 2, and 4 correspond to this approach); (ii) with only six emotions, dropping the samples of the surprise class to avoid the class imbalance problem (Experiment 3 showcases this approach). The rationale for these two approaches was to examine the proposed system's ability to generalize to unknown samples.

Firstly, Experiment 1 used all the features and seven emotions. Overlapping and averaging of audio samples were also performed, and an accuracy of 87.74% was obtained after training the dataset on the CNN model. Experiment 2 was carried out using all the features and emotions; however, the difference is that overlapping of audio files was not carried out, only averaging was performed, and an accuracy of 86.83% was obtained. In Experiment 3, all the features were used except the iMFCC features, and six emotions were considered, dropping the surprise class, as it causes a class imbalance problem; overlapping and averaging of audio samples were also not performed. The accuracy in this experiment was 94.92%. Experiment 4 was carried out on all seven emotion types with all features except iMFCCs; overlapping and averaging were not performed, and the accuracy obtained was 93.14%. It can be observed from the F1-score that the proposed model in Experiment 4 performs adequately well even if one class is imbalanced. Thus, it has the capability to behave well in the presence of unseen samples. Table 3 illustrates the comparison of different performance metrics, such as loss, accuracy, and F1 score, for all experiments.
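The pre-processing chain described above (integer labels, one-hot vectors, a 72:20:8 split, and standard scaling fitted on the training portion) can be sketched as follows; the helper names and the toy data are illustrative, not the paper's code:

```python
import numpy as np

def one_hot(labels):
    # integer-encode labels, then expand to one-hot rows
    classes = sorted(set(labels))
    idx = np.array([classes.index(l) for l in labels])
    return np.eye(len(classes))[idx], classes

def split_72_20_8(n, seed=0):
    # shuffle indices, then carve off 20% test and 8% validation
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_test, n_val = int(0.20 * n), int(0.08 * n)
    return order[n_test + n_val:], order[:n_test], order[n_test:n_test + n_val]

X = np.random.randn(100, 1861)                          # toy 1861-dim feature vectors
y = np.array(["abcdefg"[i % 7] for i in range(100)])    # seven emotion labels
Y, classes = one_hot(y)
train, test, val = split_72_20_8(len(X))
mu, sd = X[train].mean(0), X[train].std(0)              # scaling fitted on train only
X_std = (X - mu) / sd
```

Fitting the scaler on the training portion only, then applying it to test and validation, avoids leaking test statistics into training.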
6 Results and Discussion
The one-dimensional CNN model gave an accuracy of 93.14% on the testing dataset comprising the merged dataset from four audio emotion datasets:
SAVEE, RAVDESS, Crema-D, and TESS. The high performance indicates that the proposed model, built on four differently sourced datasets, is highly robust and can efficiently recognize audio emotions across race, gender, age, and emotion. From the results, we can see that applying frame overlapping while extracting features improves the performance by almost 1% (from 86.83% to 87.74% accuracy). However, from the Experiment 4 results, it is observed that the performance improves drastically, from 87.74% to 93.14%, when overlapping and averaging are not done. We achieved an accuracy of 93.14% for Experiment 4 using all seven emotions. Based on these metrics, Experiment 4 gives the best performance using all seven emotions; Experiment 3 gives 1.78% higher accuracy than Experiment 4, but only using six emotions, where the surprise class has been dropped for better class balancing. As Experiment 4 involves the imbalanced surprise class, its performance should not be judged based on the accuracy measure alone; thus, we also calculated the F1 score for all experiments. Notably, the performance of Experiment 4 based on the F1 score is also on par with Experiment 3. The experiment chosen for the further system model is therefore Experiment 4.

Table 3. Results obtained from various experiments performed on the merged dataset

Metric    Exp-1   Exp-2   Exp-3   Exp-4
Loss      0.6846  0.7563  0.2774  0.3973
Accuracy  87.74%  86.83%  94.92%  93.14%
F1 Score  87.81%  86.98%  94.97%  93.17%
Although the performance metrics favor Experiment 3, we recommend the system in Experiment 4, which has a higher potential to withstand and show robust performance in realistic scenarios.
7 Conclusion and Future Work
This paper presents a novel framework for automatically and seamlessly redirecting distress calls to the corresponding help center based on the emotional state detected in the caller's voice. A one-dimensional CNN model was implemented to experiment on several audio representations derived from the audio's time-domain and frequency-domain characteristics. We carried out different experimental scenarios to examine the impact of frame overlapping and frame averaging while extracting features for detecting emotions from audio. Results obtained from the experiments on these scenarios convey that the system exhibits better performance without averaging and overlapping.
In the future, we plan to extend our work by employing oversampling techniques and mel-spectrogram-based methods. In addition, we wish to add a feature to our framework whereby the caller's location coordinates are automatically sent to the appropriate agency, and to use blockchain-based smart contracts to store these coordinates to enhance the safety of the proposed framework. In parallel, we will conduct more research and analysis to map different emotions to different distress response agencies, e.g., hospitals and fire brigades, to increase the use cases of our novel framework.
References
1. McFee, B., et al.: librosa/librosa: 0.9.2, June 2022
2. Briggs, R.W., Bender, W., Marin, M.: Philadelphia police response times have gotten 4 min longer, about 20% worse, February 2022
3. Cao, H., Cooper, D.G., Keutmann, M.K., Gur, R.C., Nenkova, A., Verma, R.: CREMA-D: crowd-sourced emotional multimodal actors dataset. IEEE Trans. Affect. Comput. 5(4), 377–390 (2014)
4. Das, P.K., A, D.V., Meher, S., Panda, R., Abraham, A.: A systematic review on recent advancements in deep and machine learning based detection and classification of acute lymphoblastic leukemia. IEEE Access 10, 81741–81763 (2022)
5. Das, P.K., Meher, S.: An efficient deep convolutional neural network based detection and classification of acute lymphoblastic leukemia. Expert Syst. Appl. 183, 115311 (2021)
6. Das, P.K., Meher, S.: Transfer learning-based automatic detection of acute lymphocytic leukemia. In: 2021 National Conference on Communications (NCC), pp. 1–6 (2021)
7. Dupuis, K., Pichora-Fuller, M.K.: Toronto Emotional Speech Set (TESS). University of Toronto, Psychology Department (2010)
8. Er, M.B.: A novel approach for classification of speech emotions based on deep and acoustic features. IEEE Access 8, 221640–221653 (2020)
9. Gajjar, P., Shah, P., Sanghvi, H.: E-mixup and Siamese networks for musical key estimation. In: International Conference on Ubiquitous Computing and Intelligent Information Systems, pp. 343–350. Springer (2022)
10. Kanwal, S., Asghar, S.: Speech emotion recognition using clustering-based GA-optimized feature set. IEEE Access 9, 125830–125842 (2021)
11. Livingstone, S.R., Russo, F.A.: The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5), e0196391 (2018)
12. Malek, A., Borzì, S., Nielsen, C.H.: Superkogito/spafe: v0.2.0, July 2022
13. Mell, H.K., Mumma, S.N., Hiestand, B., Carr, B.G., Holland, T., Stopyra, J.: Emergency medical services response times in rural, suburban, and urban areas. JAMA Surg. 152(10), 983–984 (2017)
14. Patel, N., Patel, S., Mankad, S.H.: Impact of autoencoder based compact representation on emotion detection from audio. J. Ambient. Intell. Humaniz. Comput. 13(2), 867–885 (2022)
15. Suganya, S., Charles, E.Y.A.: Speech emotion recognition using deep learning on audio recordings. In: 2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer), vol. 250, pp. 1–6 (2019)
16. Vlasenko, B., Schuller, B., Wendemuth, A., Rigoll, G.: Combining frame and turn-level information for robust recognition of emotions within speech, pp. 2249–2252, January 2007
17. Xu, M., Zhang, F., Zhang, W.: Head fusion: improving the accuracy and robustness of speech emotion recognition on the IEMOCAP and RAVDESS dataset. IEEE Access 9, 74539–74549 (2021)
AI-Based Extraction of Radiologists Gaze Patterns Corresponding to Lung Regions

Ilya Pershin1(B), Bulat Maksudov2,3, Tamerlan Mustafaev1,4,5, and Bulat Ibragimov6

1 Innopolis University, Innopolis, Russia {i.pershin,t.mustafaev}@innopolis.ru
2 DCU Institute of Education, Dublin, Ireland [email protected]
3 Dublin City University, Dublin, Ireland
4 Public Hospital no. 2, Department of Radiology, Marie Oulgaret, Russia
5 University Clinic, Kazan State University, Kazan, Russia
6 University of Copenhagen, Copenhagen, Denmark [email protected]
Abstract. The continuing growth of radiological examinations and the recent extreme workload on radiology departments around the world make it necessary to optimize the radiologist's workflow. Intelligent analysis of the radiologist's gaze patterns is a promising task in various applications of radiologist-AI interaction. In this paper, we propose an approach based on the Transformer deep learning architecture for predicting the current lung anatomical region from a short history of the radiologist's gaze alone, without any image information. For this study, we conducted a series of eye-tracking experiments with practicing radiologists and collected 400 chest X-ray images that were analyzed independently by 4 doctors. From the results, we conclude that it is possible to extract information useful for various radiologist-AI interaction applications based only on a gaze history.

Keywords: Eye-tracking · Artificial intelligence · Gaze processing · Lung fields · Chest X-ray · Image segmentation

1 Introduction
The number of radiological examinations is increasing every year, and the radiologist's workload is increasing with it [27]. Optimization of workflow in healthcare in general, and in radiology in particular, is one of the important tasks in which the role of artificial intelligence is high [20]. Artificial intelligence has been applied to various tasks in the fight against the COVID-19 pandemic,

We thank Khanov A.N. MD, Zinnurov A.R. MD, and Ibragimova D.A. MD for participating in the experiment. This work has been supported by the Russian Science Foundation under grant no. 18-71-10072-P.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 386–393, 2023. https://doi.org/10.1007/978-3-031-35501-1_39
including the development of new drugs, improving the quality of disease diagnosis under limited resources and increased workload, and predicting the course of the disease and the further development of the pandemic [25]. Nevertheless, the introduction of artificial intelligence into the work practice of a radiologist should be verified and well thought out. There is a known problem of bias towards recommendations from artificial intelligence, which correlates with the radiologist's experience [9]. More generally, radiologists have both a visual bias [6] and an a priori information bias [16]. Thus, radiologist-AI interaction is an important and promising area for optimizing workflows, but it requires high-quality and comprehensive research before clinical implementation [18].

The main part of a radiological examination is visual analysis [24]. Consequently, information about a radiologist's eye movements makes it possible to judge the features of their visual perception. Eye-tracking technology is perfectly suitable for integration into a radiologist's workflow, because it does not require body-worn sensors, imposes no restrictions on body position, and is relatively cheap.

An increased workload negatively affects the quality of radiological diagnostics [5]. Assessing the fatigue of a radiologist is not a trivial task because it is influenced by many factors (such as the number of distractions, comfort in the office, organization of work processes, etc.). One particular method that could be used to assess the fatigue of a radiologist is gaze analysis using deep learning [19]. Another problem is related to the assessment of competence and the improvement of the quality of medical education [2]. Experienced radiologists are known to have different gaze patterns than inexperienced radiologists, which leads to more efficient and better reading of X-rays [11].
The process of training a radiologist involves mastering the patterns of visual analysis inherent to experienced radiologists. Demonstrating experts' gaze points helps to improve the quality of student learning, allowing students to adopt the features of expert visual perception in different areas [3,15], including radiology [10]. Errors associated with interrupting the radiologist are also common; they affect the quality of diagnostics and can be mitigated with the help of eye-tracking [26]. In general, people have a hard time remembering the exact areas of an image they have viewed [22]. Previous research found that, according to the way they view CT images, radiologists can be divided into scanners and drillers [7]. Thus, in some cases, important areas of the image are not covered during visual assessment. Eye-tracking combined with an artificial intelligence system could potentially help solve this problem. According to the preliminary results of some recent studies, eye-tracking data could be used to predict the likelihood of diagnostic error based on gaze patterns [23].

This paper investigates the use of the Transformer deep learning architecture for analyzing the gaze movements of radiologists. Based on a public database, we collected a dataset of 400 chest X-ray (CXR) images, where each image was
labeled by 4 practicing radiologists. The workstation used during the labeling process was augmented with an eye-tracking device to provide information about the gaze direction of a radiologist during analysis. The main contributions of this paper are summarized as follows:

– As far as we know, this is a unique CXR-gaze dataset in which each image is labeled by 4 radiologists with information about gaze direction. In total, we collected a database of 1600 radiologists' gaze sessions.
– We explored the possibility of using the Transformer architecture to extract information about the anatomical region of an X-ray based on eye-tracking data.
2 Methodology

2.1 Experiment Setting
The database consisted of 400 chest X-rays randomly selected from the publicly available VinDr-CXR [17] dataset. Each sample in the source database contained an annotation from three radiologists. Out of the 400 X-rays, 168 were from healthy patients and 232 contained one or more pathologies. The key hardware for the experiment was a 10-bit monitor and a Tobii Eye Tracker 4C recording the direction of gaze at a frequency of 90 Hz. The experiment with each radiologist was conducted in a quiet room with artificial lighting. The eye tracker was calibrated before the experiment and after every 100 viewed X-rays. To collect experimental data, we implemented automated logging software, which recorded parameters such as gaze direction, changes in head position, and changes in image brightness and contrast. To improve the quality of the eye-tracking data, the logging system tracked the participant's head position and produced a sound notification if the head position was outside the range recommended by the eye-tracker manufacturer.

2.2 Data Labeling
The chest X-rays from the database were labeled in two steps. In the first step, we applied the contour-aware lung segmentation algorithm [12] and divided the regions of the left and right lungs into three segments of equal height. Thus, for each image, a segmentation of the upper, middle, and lower segments of the right and left lungs was obtained algorithmically. After that, the X-rays and their corresponding segmentation results were sent to a radiologist, who corrected the segmentation results so that the visible area of the lung was highlighted. An example of CXR labeling and of assigning a label to gaze points that fall into a certain segment is shown in Fig. 1.
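Assigning a lung-segment label to each gaze point then reduces to a lookup in the corrected segmentation mask. The sketch below is an illustration, not the authors' code: the integer-coded `segment_map` (0 = "not lung", 1–6 = the six segments) and the pixel-coordinate convention are assumptions.

```python
import numpy as np

def label_gaze_points(gaze_xy, segment_map):
    """gaze_xy: (T, 2) array of pixel coordinates (x, y);
    segment_map: integer label image (0 = not lung, 1..6 = segments)."""
    h, w = segment_map.shape
    labels = np.zeros(len(gaze_xy), dtype=int)
    for t, (x, y) in enumerate(gaze_xy):
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h:
            labels[t] = segment_map[yi, xi]  # row index = y, column = x
        # gaze points outside the image keep label 0 ("not lung")
    return labels

# toy 4x4 map: left half is segment 1, right half is segment 2
seg = np.array([[1, 1, 2, 2]] * 4)
print(label_gaze_points(np.array([[0.2, 1.0], [3.0, 2.0]]), seg))  # [1 2]
```

Gaze samples that land outside the radiologist-corrected lung masks receive the "not lung" class, which appears as a separate row in the results table.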
Fig. 1. (a) Labeling of the visible part of the lungs; (b) labeling of the lung segments; (c) labeling of the radiologist's gaze points.
2.3 Model Architecture and Training Process
To address the problem of classifying gaze points, we used the unchanged architecture of the Transformer neural network from the paper "Attention Is All You Need" by Vaswani et al. [21]. For this task, only the encoder part of the model was used. To map the features of the two-dimensional space to a space with d_model = 240 dimensions, a single-layer MLP was used. The encoder consisted of N = 3 layers, each with multi-headed attention with h = 6 heads. The weighted cross-entropy loss function was used. We used the Adam optimizer and a warm-up over the first 50 iterations up to a learning rate of 5 · 10^-4. Unlike the original paper, we did not use the exponential-decay learning-rate scheduler; instead, we used a cosine scheduler with warm-up. We used Residual Dropout as described in the original paper, with P_drop = 0.2. All hyperparameters are shown in Table 1. The early stopping method was used to prevent overfitting.

Table 1. Model hyperparameters.

Hyperparameter        Value
Hidden dim, d_model   240
Num heads             6
Num layers            3
Learning rate         5 · 10^-4
Dropout, P_drop       0.2
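The warm-up plus cosine schedule described above can be sketched as a simple function of the step index. This is an illustration, not the authors' implementation; the total number of training steps is an assumption, since the paper does not report it.

```python
import math

def lr_at(step, base_lr=5e-4, warmup=50, total_steps=1000):
    """Linear warm-up to base_lr over the first `warmup` iterations,
    then cosine decay towards zero. `total_steps` is a hypothetical value."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))   # small value early in warm-up
print(lr_at(49))  # reaches 5e-4 at the end of warm-up
```

After the warm-up phase the rate decays smoothly, avoiding the abrupt drops of a step or exponential schedule.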
Augmentation techniques were applied to the gaze sequences. For each sequence, a rotation of ±30° and a coordinate shift of ±20% were randomly applied. Gaze sequences were split into subsequences of length 512. Since the sampling frequency of the eye tracker was 90 Hz, a subsequence corresponds to approximately 5.7 s of analysis by a radiologist. Gaze coordinates were scaled to the range [0, 1]. The model was trained on a computer with an NVIDIA GeForce RTX 2080 Ti GPU, an AMD Ryzen Threadripper 1920X CPU, and 62 GB of RAM. The dataset was divided into 3 parts: 80% of the data was assigned to the training set, 10% to the validation set, and 10% to the test set.
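The augmentation and windowing steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the centre of rotation (the image centre) and the order of the transformations are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(points):
    """Random rotation of +/-30 degrees about (0.5, 0.5) and a random
    coordinate shift of +/-20%, for gaze points already scaled to [0, 1]."""
    theta = np.deg2rad(rng.uniform(-30, 30))
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    shift = rng.uniform(-0.2, 0.2, size=2)
    return (points - 0.5) @ rot.T + 0.5 + shift

def split(points, length=512):
    """Split a gaze session into subsequences of `length` samples
    (~5.7 s at the 90 Hz tracker rate)."""
    return [points[i:i + length] for i in range(0, len(points), length)]

session = rng.random((1100, 2))   # toy gaze session in [0, 1]
chunks = split(augment(session))
print([len(c) for c in chunks])   # [512, 512, 76]
```

Note that the last subsequence is shorter than 512; in practice such remainders would be padded or dropped, which the paper does not specify.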
3 Results and Discussion
On the validation set, the average F-score is 0.934; on the test set, it is 0.918. Infectious diseases such as tuberculosis are usually detected more often in the upper fields of the lungs, while the lower fields are more characteristic of infectious diseases such as bacterial pneumonia. This may explain the lower F-score values for the lower lung regions. Table 2 shows the F-score for each lung segment on the test dataset. Figure 2 shows a visualization of label prediction on a random sample from the validation dataset. The class name "ru" is an abbreviation for the "upper right" segment of the lung, "lm" for "left middle", and so on. Please note that radiological notation is used, so the lung shown on the left is the anatomically right lung. We can see that the model is able to extract information about the viewed segment of the lung from gaze patterns. This is confirmed by the obtained metric value under fairly strong augmentation (Sect. 2.3). Even when the model has doubts (for example, in Fig. 2 this is a point with the predicted label "left middle"), these doubts are within reason. In addition, we split each gaze sequence into small subsequences of 5.7 s. This further complicates the task, since the model only handles part of the gaze, not the entire gaze scope. Analysis of the metrics in Table 2 shows that the model has approximately the same classification quality for each class.

Table 2. F-score for each lung segment on the test dataset.

Lung region    F-score
Right up       0.933
Right middle   0.942
Right down     0.907
Left up        0.914
Left middle    0.918
Left down      0.872
Not lung       0.932
Thus, we created a model that can determine the segments of the lung using only a portion of a radiologist's gaze sequence. Despite the apparent simplicity of the problem, this result shows the feasibility of building deep learning models for more complex problems, such as estimating the probability of a radiologist's error. In addition, the obtained result allows the model to be used in various practical tasks: for example, to assess the fatigue of a radiologist [19], to address the problem of radiologist interruptions [26], or to assess the skills of a radiologist [11]. It has previously been shown that adding information about a radiologist's gaze to a deep learning model for finding pathologies on CT and X-ray images can improve the quality of the models [1,13,14]. The most important result is that the model is built only on the basis of the radiologist's gaze information, without the use of any other modalities.
Fig. 2. Pairs of true labels and predictions obtained from the validation dataset. The class name "ru" is an abbreviation for the "upper right" segment of the lung, "lm" for "left middle", and so on.
Deep learning models provide impressive results in the field of medical diagnostics [4,8], but hasty and thoughtless implementation may not bring benefits. Analyzing a radiologist's gaze and building deep learning models based only on the gaze, or using the gaze as one of the modalities, is a step towards effective radiologist-AI interaction.
References 1. Aresta, G., et al.: Automatic lung nodule detection combined with gaze information improves radiologists’ screening performance. IEEE J. Biomed. Health Inform. 24(10), 2894–2901 (2020)
2. Ashraf, H., Sodergren, M.H., Merali, N., Mylonas, G., Singh, H., Darzi, A.: Eye-tracking technology in medical education: a systematic review. Med. Teach. 40(1), 62–69 (2017) 3. Ball, L.J., Litchfield, D.: Interactivity and embodied cues in problem solving, learning and insight: further contributions to a "theory of hints". In: Cowley, S., Vallée-Tourangeau, F. (eds.) Cognition Beyond the Brain, pp. 115–132. Springer, Cham (2017). https://doi.org/10.1007/978-1-4471-5125-8_12 4. Born, J., et al.: On the role of artificial intelligence in medical imaging of COVID-19. Patterns 2(6), 100269 (2021) 5. Bruls, R.J.M., Kwee, R.M.: Workload for radiologists during on-call hours: dramatic increase in the past 15 years. Insights Imaging 11(1), November 2020 6. Chen, J., Littlefair, S., Bourne, R., Reed, W.M.: The effect of visual hindsight bias on radiologist perception. Acad. Radiol. 27(7), 977–984 (2020) 7. Drew, T., Võ, M.L.-H., Olwal, A., Jacobson, F., Seltzer, S.E., Wolfe, J.M.: Scanners and drillers: characterizing expert visual search through volumetric images. J. Vis. 13(10), 3–3 (2013) 8. Esteva, A., et al.: Deep learning-enabled medical computer vision. npj Digital Med. 4(1), January 2021 9. Gaube, S., et al.: Do as AI say: susceptibility in deployment of clinical decision-aids. npj Digital Med. 4(1), February 2021 10. Gegenfurtner, A., Lehtinen, E., Jarodzka, H., Säljö, R.: Effects of eye movement modeling examples on adaptive expertise in medical image diagnosis. Comput. Educ. 113, 212–225 (2017) 11. Harezlak, K., Kasprowski, P.: Application of eye tracking in medicine: a survey, research issues and challenges. Comput. Med. Imaging Graph. 65, 176–190 (2018) 12. Kholiavchenko, M., Sirazitdinov, I., Kubrak, K., Badrutdinova, R., Kuleev, R., Yuan, Y., Vrtovec, T., Ibragimov, B.: Contour-aware multi-label chest X-ray organ segmentation. Int. J. Comput. Assist. Radiol. Surg. 15(3), 425–436 (2020) 13.
Kholiavchenko, M., Pershin, I., Maksudov, B., Mustafaev, T., Yuan, Y., Ibragimov, B.: Gaze-based attention to improve the classification of lung diseases. In: Išgum, I., Colliot, O. (eds.) Medical Imaging 2022: Image Processing. SPIE, April 2022 14. Khosravan, N., Celik, H., Turkbey, B., Jones, E.C., Wood, B., Bagci, U.: A collaborative computer aided diagnosis (C-CAD) system with eye-tracking, sparse attentional model, and deep learning. Med. Image Anal. 51, 101–115 (2019) 15. Leff, D.R., et al.: The impact of expert visual guidance on trainee visual search strategy, visual attention and motor skills. Front. Hum. Neurosci. 9, October 2015 16. Littlefair, S., Brennan, P., Reed, W., Mello-Thoms, C.: Does expectation of abnormality affect the search pattern of radiologists when looking for pulmonary nodules? J. Digit. Imaging 30(1), 55–62 (2016) 17. Nguyen, H.Q., et al.: VinDr-CXR: an open dataset of chest X-rays with radiologist's annotations (2020) 18. Parikh, R.B., Obermeyer, Z., Navathe, A.S.: Regulation of predictive analytics in medicine. Science 363(6429), 810–812 (2019) 19. Pershin, I., Kholiavchenko, M., Maksudov, B., Mustafaev, T., Ibragimov, B.: AI-based analysis of radiologist's eye movements for fatigue estimation: a pilot study on chest X-rays. In: Mello-Thoms, C.R., Taylor-Phillips, S. (eds.) Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment. SPIE, April 2022 20. Ranschaert, E., Topff, L., Pianykh, O.: Optimization of radiology workflow with artificial intelligence. Radiol. Clin. North Am. 59(6), 955–966 (2021)
21. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017) 22. Võ, M.L.-H., Aizenman, A.M., Wolfe, J.M.: You think you know where you looked? You better look again. J. Exp. Psychol. Hum. Perception Performance 42(10), 1477–1481 (2016) 23. Voisin, S., Pinto, F., Morin-Ducote, G., Hudson, K.B., Tourassi, G.D.: Predicting diagnostic error in radiology via eye-tracking and image analytics: preliminary investigation in mammography. Med. Phys. 40(10), 101906 (2013) 24. Waite, S., et al.: Analysis of perceptual expertise in radiology – current knowledge and a new perspective. Front. Hum. Neurosci. 13, June 2019 25. Wang, L., et al.: Artificial intelligence for COVID-19: a systematic review. Front. Med. 8, September 2021 26. Williams, L.H., Drew, T.: Distraction in diagnostic radiology: how is search through volumetric medical images affected by interruptions? Cognitive Res. Principles Implications 2(1), February 2017 27. Winder, M., Owczarek, A.J., Chudek, J., Pilch-Kowalczyk, J., Baron, J.: Are we overdoing it? Changes in diagnostic imaging workload during the years 2010–2020 including the impact of the SARS-CoV-2 pandemic. Healthcare 9(11), 1557 (2021)
Explainable Fuzzy Models for Learning Analytics

Gabriella Casalino(B), Giovanna Castellano, and Gianluca Zaza

University of Bari, Department of Computer Science, Bari, Italy
{gabriella.casalino,giovanna.castellano,gianluca.zaza}@uniba.it
Abstract. Learning Analytics has been widely used to enhance the educational field by employing Artificial Intelligence. However, explanations of the data processing have become mandatory. To this end, we suggest using Neuro-Fuzzy Systems (NFSs), which can provide both precise forecasts and descriptions of the processes that produced the outcomes. The balance between model explainability and accuracy has been studied by reducing the number of relevant features. Click-stream data describing the interactions of students with a Virtual Learning Environment has been analyzed. Results on the OULAD dataset have shown that the NFS model provides effective prediction of student outcomes and can well explain the reasoning behind the prediction using fuzzy rules.

Keywords: Learning Analytics · Neuro-fuzzy systems · Explainability

1 Introduction
The educational field is benefiting from the use of Artificial Intelligence (AI), which is transforming the way of teaching and learning. The digitalization process in the educational field is so important that it has become a priority for different countries [1]. In particular, the term Learning Analytics refers to the use of technological solutions in the educational domain [2,3], such as augmented reality [4], social robots [5], the Internet of Things (IoT) [6,7], and information visualization [8], just to mention a few examples. However, while AI is spreading into everyday life, the need for regulation has emerged, to avoid improper uses of automatic techniques that could affect the data owner or lead to misleading results. In particular, the explainability of these automatic techniques has become mandatory in high-risk contexts, to inform the user about how results have been obtained. The European Union considers the educational field as high-risk(1) since the data are sensitive [9]. In this context, the use of fuzzy logic for explanation purposes becomes critical [10,11]. Indeed, fuzzy logic has already been used to explain the processing

1 White Paper on Artificial Intelligence: https://ec.europa.eu/info/publications/white-paper-artificial-intelligence-european-approach-excellence-and-trust_en (last access: February 21, 2022).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 394–403, 2023. https://doi.org/10.1007/978-3-031-35501-1_40
of automatic methods through fuzzy rules expressed in natural language [12–14]. Besides that, the capability of fuzzy logic to represent vague concepts has made it a suitable methodology for different applications in the educational field. Fuzzy logic has been used for customizing pedagogical resources in e-learning contexts through user profiling [15,16], to evaluate students' academic performance [17–20], to improve coding abilities [21], or to assess students' engagement [22].

In this work, we focus on the use of fuzzy models for predicting students' assessments. In this context, the manual definition of fuzzy rules based on expert knowledge is a complex task. Conversely, the automatic definition of fuzzy rules through learning from available data is feasible. To learn fuzzy models from data, the hybridization of fuzzy inference systems and neural networks has been proposed in the form of neuro-fuzzy systems (NFS), which are hybrid models possessing the ability to learn from real-world observations and whose behavior can be described naturally as a collection of linguistic rules. In particular, we use an NFS to learn accurate fuzzy models from click-stream data collected through the Virtual Learning Environment (VLE) of the Open University. VLEs are online platforms encapsulating online courses, resources, skill assessments, etc. In the last few years, they have attracted growing interest due to their capability to trace students' interaction with learning activities in the form of logs. This information is of great value since it can be used for predicting students' outcomes and monitoring their learning, and it can be a useful aid for teachers, students, tutors, managers, etc. A preliminary attempt to apply NFS models to these data was made in our previous work [23]. This paper extends it by using Gaussian membership functions to design the fuzzy terms, instead of triangular ones.
Moreover, a comparison with standard machine learning models has been performed to evaluate the effectiveness of the predictions given by the NFS model. Finally, a feature selection phase has been introduced to reduce the complexity of the NFS model.

The article is structured as follows: the data are described in Sect. 2. The learning models, namely the neuro-fuzzy system and the black-box models used as baselines, are described in Sect. 3, together with the adopted feature selection method. The results are discussed in Sect. 4. Lastly, Sect. 5 concludes the article and outlines possible future work.
2 Materials
In this work, a subset of the Open University Learning Analytics Dataset (OULAD) has been used [24]. It collects information about students attending the Open University, their courses, assessments, interactions with the Virtual Learning Environment, etc. However, since the goal of our analysis is to define models of students' behavior, based on their interactions with the VLE, to be used to assess and predict the students' performance, a student-oriented dataset has been selected(2). Nine different activities emerged as the most
OULAD: https://zenodo.org/record/4264397#.X60DEkJKj8E.
discriminant ones for the predictive task [24]: quiz, forum, glossary, homepage, out collaboration, out content, resource, subpage, and URL; these refer to the number of times each user accessed these support tools (i.e., the number of clicks). Each sample represents the learning process of a given student for a given subject, in terms of these nine features. A total of 25,819 rows, collected in 2013 and 2014, has been considered. Moreover, only two student outcomes (Pass and Fail) have been considered, since previous works have shown that the original four output classes are not sufficiently representative of the available data.
3 Methods
This work aims to verify the effectiveness of neuro-fuzzy systems in the context of learning analytics. Indeed, while good accuracy is mandatory, the interpretability of the resulting models could lead to higher acceptance from the final stakeholders, who are not technicians. For this purpose, the use of neuro-fuzzy models based on Gaussian fuzzy sets is proposed. The effectiveness of such models is compared with baseline machine learning models which, however, are not explainable. Furthermore, a feature selection method is applied with the aim of reducing the complexity of the neuro-fuzzy model explanations, expressed in terms of IF-THEN rules. The following is a brief description of the adopted methods.

3.1 Neuro-Fuzzy Modeling
A neuro-fuzzy architecture has been used to learn a classification model for predicting student outcomes. The neuro-fuzzy network integrates and encodes a set of IF-THEN fuzzy rules in its structure. We consider zero-order Takagi-Sugeno (TS) rules, whose antecedent is represented by fuzzy sets, while the consequent part is defined by fuzzy singletons. For each output class, the fuzzy model provides degrees of certainty through the inference mechanism. The rule base integrated into the knowledge base is made of K rules of the form:

IF (x1 is Ak1) AND ... AND (xn is Akn) THEN (y1 is bk1) AND ... AND (ym is bkm)

where Aki are fuzzy sets defined over the input variables xi (i = 1, ..., n) and bkj are fuzzy singletons expressing the certainty degree that the output belongs to one of the m classes yj, j = 1, ..., m. Fuzzy sets are defined by Gaussian membership functions:

u_ki = μ_ki(x_i) = exp(−(x_i − c_ki)^2 / σ_ki^2)    (1)

where c_ki and σ_ki are the centers and the widths of the Gaussian functions, respectively. To learn the fuzzy-set parameters and the consequent parameters, we use a neuro-fuzzy network based on the well-known ANFIS (Adaptive-Network-Based Fuzzy Inference System) architecture [25]. As shown in Fig. 1, the ANFIS architecture includes four feed-forward layers:
Fig. 1. Architecture of the neuro-fuzzy network.
– Layer 1 computes the membership degree of input values to fuzzy sets;
– Layer 2 computes the activation strength of each fuzzy rule;
– Layer 3 computes the normalized activation strengths;
– Layer 4 computes the certainty degree for output classes.
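For illustration, the four layers can be written as a single forward pass for a zero-order TS system with the Gaussian memberships of Eq. (1). This is a minimal sketch, not the trained system: product is assumed as the AND operator, and the rule parameters below are toy values, not the ones learned by ANFIS.

```python
import numpy as np

def nfs_forward(x, c, s, b):
    """x: (n,) input; c, s: (K, n) rule centres/widths; b: (K, m) singletons."""
    u = np.exp(-((x - c) ** 2) / (s ** 2))  # layer 1: Gaussian memberships
    w = u.prod(axis=1)                      # layer 2: rule strengths (AND = product)
    wn = w / w.sum()                        # layer 3: normalized strengths
    return wn @ b                           # layer 4: class certainty degrees

# toy system: 2 inputs, 2 rules, 2 output classes
c = np.array([[0.2, 0.2], [0.8, 0.8]])
s = np.ones((2, 2)) * 0.3
b = np.array([[1.0, 0.0], [0.0, 1.0]])
y = nfs_forward(np.array([0.25, 0.15]), c, s, b)
print(y)  # first class dominates, since the input is near the first rule's centre
```

In training, backpropagation adjusts c, s, and b so that the class with the highest certainty degree matches the true label.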
This network is trained using the backpropagation procedure, based on gradient descent optimization.

3.2 Black-Box Models
Four standard machine learning algorithms have been used for comparison purposes. They have been chosen for their effectiveness but, differently from the Neuro-Fuzzy System (NFS), they are not explainable.

– Random Forest (RF): an ensemble method that consists of multiple decision trees;
– Multilayer Perceptron (MP): a class of feedforward artificial neural networks (ANN) that uses a supervised learning technique called backpropagation for training. It can distinguish data that are not linearly separable;
– Multiclass Support Vector Machine (SVC): it uses the same principle as the classical SVM, splitting the multi-class classification problem into several binary classification problems;
– XGBoost (XGB): a gradient boosting decision tree algorithm. It is an ensemble technique where new models are sequentially added until no further improvements can be obtained.

For all these methods, the Python code available in the Scikit-learn library has been used with default parameters(3). A statistical test, based on the univariate ANOVA test for classification, has been used for feature selection. The method uses the F-test to estimate the degree of linear dependency between two random variables, and thus it returns the top-most significant variables.

3 Scikit-learn library: http://scikit-learn.org/stable/.
Table 1. Quantitative evaluation of the NFS model and the black-box models.

(a) Without feature selection.

Model  Accuracy  Precision  Recall  F1-score
NFS    0.73      0.72       0.71    0.71
RF     0.82      0.82       0.80    0.81
MP     0.72      0.72       0.73    0.72
SVC    0.60      0.80       0.50    0.37
XGB    0.80      0.79       0.78    0.78

(b) With feature selection.

Model  Accuracy  Precision  Recall  F1-score
NFS    0.73      0.72       0.71    0.71
RF     0.72      0.72       0.72    0.72
MP     0.74      0.74       0.71    0.71
SVC    0.67      0.76       0.59    0.55
XGB    0.75      0.75       0.72    0.73
The number of features to consider is an input parameter, so it was decided during the experimental phase. To implement feature selection, the Python function available in the Scikit-learn library has been used(4).
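The feature-selection step corresponds to scikit-learn's `SelectKBest` with the `f_classif` score function. The sketch below uses synthetic data for illustration; the real input would be the nine OULAD click-stream features, and the paper ultimately keeps k = 2.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# toy stand-in for the click-stream matrix: 9 features, binary outcome
X = rng.random((200, 9))
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)  # outcome driven by 2 features

# keep the k features with the highest ANOVA F-statistic
selector = SelectKBest(score_func=f_classif, k=2)
X_sel = selector.fit_transform(X, y)
print(X_sel.shape)                             # (200, 2)
print(np.flatnonzero(selector.get_support()))  # indices of the kept features
```

The `k` parameter plays the role of the input parameter mentioned above, tuned during the experimental phase.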
4 Results
A set of experiments has been conducted to verify the effectiveness of the NFS model in correctly predicting students' outcomes, given the fuzzy rules and the corresponding fuzzy terms automatically generated from the data. Standard classification measures, such as accuracy, precision, recall, and F1-score, have been used to quantitatively evaluate the classification performance. Moreover, visual representations of the confusion matrices have been used to better investigate the predictions. A quantitative comparison with black-box models has been carried out to verify the effectiveness of the NFS predictions. Comparisons have been conducted on the original data and after the feature selection phase. Table 1 shows the quantitative evaluation of the NFS model and the black-box models in terms of standard classification measures, without and with feature selection, respectively. The aim is to verify whether the feature selection phase impacts the performance. The results show that the best models are Random Forest and XGBoost without feature selection, achieving accuracy values around 0.80. These are ensemble methods, which combine different classifiers to improve performance while losing explainability. We can also observe that the feature selection phase affects these methods, leading to lower accuracy. On the contrary, MLP and the multiclass SVM improve their performance when feature selection is used, differently from the neuro-fuzzy model, which achieves the same accuracy of 0.73 with and without feature selection. These results are also confirmed by the confusion matrices for each classification model, with and without feature selection, shown in Fig. 2. Thanks to the heatmap visualization, misclassifications can be identified more easily: blue is used for high values, yellow for low values, and shades of blue and yellow for intermediate values.
Overall, we can conclude that ensemble methods outperform the others without feature selection. When feature selection is used, the maximum performance values decrease, and all the algorithms are comparable. 4
4 f_classif: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_classif.html.
Fig. 2. Comparison among the NFS model and the black-box models, with and without feature selection, in terms of confusion matrices.
Table 2. Fuzzy rules generated by the neuro-fuzzy model with the feature selection process.

Premise (IF)                           Consequent (THEN)
Homepage is Low and Quiz is Low        Fail is 1.0, Pass is 0.0
Homepage is Low and Quiz is Medium     Fail is 1.0, Pass is 0.0
Homepage is Low and Quiz is High       Fail is 0.0, Pass is 1.0
Homepage is Medium and Quiz is Low     Fail is 1.0, Pass is 0.0
Homepage is Medium and Quiz is Medium  Fail is 0.0, Pass is 1.0
Homepage is Medium and Quiz is High    Fail is 0.0, Pass is 1.0
Homepage is High and Quiz is Low       Fail is 1.0, Pass is 0.0
Homepage is High and Quiz is Medium    Fail is 0.0, Pass is 1.0
Homepage is High and Quiz is High      Fail is 0.0, Pass is 1.0
In addition, we performed a qualitative evaluation of the NFS model to appreciate its explainability in the context of learning analytics. The number of fuzzy rules, their understandability, and the coverage of the fuzzy sets have been evaluated. We observed that when all features are used, the neuro-fuzzy model is very complex: the fuzzy system consists of 9 variables, each described by 3 fuzzy sets, leading to a total of 19683 fuzzy rules. The feature selection process reduces the number of fuzzy rules, making the model easier to handle and more "readable". Indeed, a total of 9 fuzzy rules is obtained, given only 2 features with 3 fuzzy sets each. In detail, the feature selection process returned Homepage and Quiz as the two most relevant variables. This could be expected, since students who frequently access the course homepage and actively participate in the teaching activities by answering evaluation tests are more likely to succeed. Table 2 shows the fuzzy rules generated by ANFIS with the feature selection process. The antecedents of the fuzzy rules combine the two input variables Homepage and Quiz over all configurations of the three fuzzy terms (low, medium, and high), shown in Fig. 3. The consequents of the fuzzy rules have only two possible output values, corresponding to the linguistic terms Fail and Pass, with possibility values ranging in [0, 1]. So, for example, the exam will be failed with the maximum possibility value of 1.0 if Homepage is low (i.e., the number of accesses to the course homepage is low) and Quiz is low. On the contrary, if Homepage is medium and Quiz is high, the exam will be passed with a possibility value of 1.0, suggesting that practicing helps successful assessments.
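The rule-base sizes discussed above, and the 9 rules of Table 2, can be sketched as follows. This is a crisp lookup over the rule table, a simplification for illustration only: real ANFIS inference aggregates weighted rule firings over the learned membership functions.

```python
from itertools import product

# Rule-base size grows as (number of terms) ** (number of variables).
terms = ["Low", "Medium", "High"]
print(len(terms) ** 9)   # 19683 rules with all 9 input variables
print(len(terms) ** 2)   # 9 rules after selecting Homepage and Quiz

# (Homepage, Quiz) -> (Fail possibility, Pass possibility), from Table 2.
rules = {
    ("Low", "Low"): (1.0, 0.0),       ("Low", "Medium"): (1.0, 0.0),
    ("Low", "High"): (0.0, 1.0),      ("Medium", "Low"): (1.0, 0.0),
    ("Medium", "Medium"): (0.0, 1.0), ("Medium", "High"): (0.0, 1.0),
    ("High", "Low"): (1.0, 0.0),      ("High", "Medium"): (0.0, 1.0),
    ("High", "High"): (0.0, 1.0),
}
assert set(rules) == set(product(terms, repeat=2))  # every configuration covered

fail, passed = rules[("Medium", "High")]
print("Pass" if passed > fail else "Fail")  # Pass
```

The exhaustive coverage check mirrors the grid structure of ANFIS rule generation: one rule per combination of linguistic terms.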
To conclude, the NFS model has proven effective and also explainable when feature selection is applied: in this case, its performance is comparable with that obtained by the other black-box models. Moreover, in the context of learning analytics, the NFS model has been useful to explain why a given result has been obtained. This explanation is crucial for both learners and teachers, who can improve their study or their course, respectively.
Fig. 3. Gaussian membership functions generated by the neuro-fuzzy model with the feature selection process.
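A Gaussian membership function like those in Fig. 3 maps a feature value to a degree of membership in a fuzzy term. The sketch below uses assumed centers and width on a normalized [0, 1] range purely for illustration; ANFIS learns these parameters from the data.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x in the fuzzy set (center, sigma)."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

# Illustrative parameters (assumed, not the learned ones from the paper).
centers = {"Low": 0.0, "Medium": 0.5, "High": 1.0}
sigma = 0.2

def fuzzify(x):
    """Membership degree of a normalized feature value in each term."""
    return {term: round(gaussian_mf(x, c, sigma), 3)
            for term, c in centers.items()}

print(fuzzify(0.5))  # Medium fires fully; Low and High only weakly
```

Because the Gaussians overlap, a single observed value activates several rules at once with different strengths, which is what gives the fuzzy model its gradual behavior.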
5 Conclusions
Given that information about students' engagement with learning platforms is regularly produced, learning analytics is highly relevant. Automatic procedures are needed for this analysis due to the volume and velocity of the data. However, because the ultimate stakeholders are not specialists, interpretable methodologies such as neuro-fuzzy systems are required to explain the results of the computations. Specifically, feature selection determined Homepage and Quiz to be the two most important features for the classification task. This has been done to increase the interpretability of the resulting models. An experimental setting was designed to evaluate the neuro-fuzzy model's quantitative performance and its capacity to explain. Considering the data examined, the experiments showed that the feature selection algorithm significantly reduced the performance of the ensemble methods, unlike the neuro-fuzzy system, whose performance was not affected. However, from the perspective of interpretability, taking into account all nine features in the data is unrealistic, since it results in numerous rules that are difficult for a human expert to understand. The number of rules was significantly reduced thanks to the feature selection strategy (from 19683 to 9). Overall, this research has shown that neuro-fuzzy modeling is a helpful tool for supporting learning analytics, since it achieves performance comparable to commonly used black-box models while also explaining linguistically how its outcomes were reached. Future work will be devoted to better assessing the feature selection task and to studying methodologies to reduce the complexity of neuro-fuzzy models. Acknowledgment. Gianluca Zaza and Giovanna Castellano acknowledge the support of the PNRR project FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI (CUP H97G22000210007) under the NRRP MUR program funded by the NextGenerationEU.
References
1. Sadiku, M.N.O., Musa, S.M., Chukwu, U.C.: Artificial Intelligence in Education. iUniverse (2022)
2. Holmes, W., Tuomi, I.: State of the art and practice in AI in education. Eur. J. Educ. (2022)
3. Zambrano, J.L., Torralbo, J.A.L., Morales, C.R., et al.: Early prediction of student learning performance through data mining: a systematic review. Psicothema (2021)
4. Farella, M., Arrigo, M., Chiazzese, G., Tosto, C., Seta, L., Taibi, D.: Integrating xAPI in AR applications for positive behaviour intervention and support. In: 2021 International Conference on Advanced Learning Technologies (ICALT), pp. 406–408. IEEE (2021)
5. Schicchi, D., Pilato, G.: A social humanoid robot as a playfellow for vocabulary enhancement. In: 2018 Second IEEE International Conference on Robotic Computing (IRC), pp. 205–208. IEEE (2018)
6. Tripathi, G., Ahad, M.A.: IoT in education: an integration of educator community to promote holistic teaching and learning. In: Nayak, J., Abraham, A., Krishna, B.M., Chandra Sekhar, G.T., Das, A.K. (eds.) Soft Computing in Data Analytics. AISC, vol. 758, pp. 675–683. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0514-6_64
7. Ahad, M.A., Tripathi, G., Agarwal, P.: Learning analytics for IOE based educational model using deep learning techniques: architecture, challenges and applications. Smart Learn. Environ. 5(1), 1–16 (2018)
8. Malandrino, D., Guarino, A., Lettieri, N., Zaccagnino: On the visualization of logic: a diagrammatic language based on spatial, graphical and symbolic notations. In: 2019 23rd International Conference Information Visualisation (IV), pp. 7–12. IEEE (2019)
9. Khosravi, H., et al.: Explainable artificial intelligence in education. Comput. Educ. Artif. Intell. 100074 (2022)
10. Alonso Moral, J.M., Castiello, C., Magdalena, L., Mencar, C.: Toward explainable artificial intelligence through fuzzy systems. In: Explainable Fuzzy Systems. SCI, vol. 970, pp. 1–23. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71098-9_1
11. Sohail, S., Alvi, A., Khanum, A.: Interpretable and adaptable early warning learning analytics model. CMC-Comput. Mater. Continua 71(2), 3211–3225 (2022)
12. Solórzano Alcívar, N., Zambrano Loor, R., Carrera Gallego, D.: Natural language to facilitate the analysis of statistical evaluation of educational digital games. In: Salgado Guerrero, J.P., Chicaiza Espinosa, J., Cerrada Lozada, M., Berrezueta-Guzman, S. (eds.) TICEC 2021. CCIS, vol. 1456, pp. 127–141. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-89941-7_10
13. Alonso, J.M., Casalino, G.: Explainable artificial intelligence for human-centric data analysis in virtual learning environments. In: Burgos, D., et al. (eds.) HELMeTO 2019. CCIS, vol. 1091, pp. 125–138. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31284-8_10
14. Kaczmarek-Majer, K., et al.: Plenary: explaining black-box models in natural language through fuzzy linguistic summaries. Inf. Sci. (2022)
15. Ulfa, S., Lasfeto, D.B., Fatawi, I.: Applying fuzzy logic to customize learning materials in e-learning systems. Ubiquitous Learn.: Int. J. 14(2) (2021)
16. Stojanović, J., et al.: Application of distance learning in mathematics through adaptive neuro-fuzzy learning method. Comput. Electr. Eng. 93, 107270 (2021)
17. Dhokare, M., Teje, S., Jambukar, S., Wangikar, V.: Evaluation of academic performance of students using fuzzy logic. In: 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), pp. 1–5. IEEE (2021)
18. Tsiakmaki, M., Kostopoulos, G., Kotsiantis, S., Ragos, O.: Fuzzy-based active learning for predicting student academic performance using AutoML: a step-wise approach. J. Comput. High. Educ. 33(3), 635–667 (2021)
19. Dhankhar, A., Solanki, K., Dalal, S., et al.: Predicting students performance using educational data mining and learning analytics: a systematic literature review. Innov. Data Commun. Technol. Appl. 127–140 (2021)
20. Gabriella, C., Pietro, D., Michela, F., Riccardo, P.: Fuzzy Hoeffding decision trees for learning analytics. In: First Workshop on Online Learning from Uncertain Data Streams 2022. CEUR-WS (2022)
21. Ardimento, P., Bernardi, M.L., Cimitile, M., De Ruvo, G.: Learning analytics to improve coding abilities: a fuzzy-based process mining approach. In: 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–7. IEEE (2019)
22. Nagothu, S.K., Sri, P.B., Koppolu, R.: Smart student participation assessment using fuzzy logic. In: ICoCIST 2021, p. 673 (2021)
23. Casalino, G., Castellano, G., Zaza, G.: Neuro-fuzzy systems for learning analytics. In: Abraham, A., Gandhi, N., Hanne, T., Hong, T.-P., Nogueira Rios, T., Ding, W. (eds.) ISDA 2021. LNNS, vol. 418, pp. 1341–1350. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96308-8_124
24. Casalino, G., Castellano, G., Vessio, G.: Exploiting time in adaptive learning from educational data. In: Agrati, L.S., et al. (eds.) HELMeTO 2020. CCIS, vol. 1344, pp. 3–16. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-67435-9_1
25. Jang, J.-S.R., Sun, C.-T.: Neuro-fuzzy modeling and control. Proc. IEEE 83(3), 378–406 (1995)
DL vs. Traditional ML Algorithms to Recognize Arabic Handwriting Script: A Review
Anis Mezghani1, Mohamed Elleuch2,3(B), and Monji Kherallah3
1 Higher Institute of Industrial Management (ISGIS), University of Sfax, Sfax, Tunisia
2 National School of Computer Science (ENSI), University of Manouba, Manouba, Tunisia
[email protected] 3 Faculty of Sciences, University of Sfax, Sfax, Tunisia
Abstract. Handwritten Arabic script recognition has become a popular area of research, and a survey of the existing techniques has therefore become necessary. This paper is a bibliographic study of the existing recognition systems, in an attempt to motivate researchers to look into these techniques and to develop more advanced ones. It presents a comparative study of certain techniques of handwritten character recognition. First, we show the difference between the different approaches to recognition: deep learning methods vs. holistic and analytic ones. Then, we present a categorization of the main techniques used in the field of handwriting recognition and cite examples of proposed methods. Keywords: Offline Handwriting Recognition · Segmentation · Training and classification · Deep Learning (DL) · Machine Learning (ML)
1 Introduction
Techniques related to information processing are currently undergoing very active development in connection with data processing. They present an increasingly important potential in the field of human-computer interaction. In addition, machine simulation of human reading has been the subject of intensive research in recent years. Handwriting recognition belongs to the wider domain of pattern recognition. It seeks to develop systems that come as close as possible to the human capacity for reading. The difficulty of recognition is partly related to the writing style: the more legible and regular the writing, the easier the recognition. The performance obtained by a recognition system varies greatly with the clarity of the images provided (readability, good image resolution, well-spaced paragraph lines, etc.). Three writing styles can be distinguished, in ascending order of difficulty: printed text, handwriting in capital or block letters, and cursive handwriting. The problem of handwritten character recognition is more complex than that of printed character recognition, due to variations in the shapes and sizes of handwritten characters. Arabic script has many features that make the recognition of handwritten Arabic text an inherently difficult problem [2].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 404–414, 2023. https://doi.org/10.1007/978-3-031-35501-1_41
The main objective of an offline handwriting recognition system is the transcription of handwritten script into a symbolic
representation for use by the computer. A typical recognition system consists of a number of stages: pre-processing, segmentation, feature extraction, training, classification, and post-processing, as presented in Fig. 1. Faced with the complexity of the feature extraction phase, which requires domain expertise (HOG, SIFT, etc.), and with the problems encountered during the segmentation phase, many researchers have adopted a deep learning approach capable of processing a variety of data (sequential or spatial) to recognize characters, words, and even sentences [3–5]. This is due to the greater representational power of deep architectures compared to traditional or shallow architectures. Each layer provides a higher-level representation than the previous one (Fig. 2); we therefore hope to learn an abstract representation of the data, in which the task to be solved becomes easier. In the following, Sect. 2 discusses two approaches to recognition, the analytical and the holistic approach, and shows the main distinctions between them. Sect. 3 presents the main steps of the recognition process, and the most important techniques used in the field of handwriting recognition are presented in Sect. 4. In Sect. 5, we compare some existing works and give a synthesis. We finish in Sect. 6 with a conclusion.
Fig. 1. Diagram of the recognition system
2 Recognition Approaches
According to the nature and size of the considered unit, there are two approaches to offline handwriting recognition: a global (holistic) approach and a local (analytical) approach.
A. Mezghani et al.
Fig. 2. A deep architecture topology
2.1 Holistic Approach
Holistic approaches are known for their simplicity and their similarity to the human reading ability. The holistic approach models words as indivisible entities [6–9]. In this strategy, global features extracted from the entire word, such as loops, ascenders, descenders, up/down profiles, valleys, length, terminal dots, and many others, are used, avoiding the segmentation process and its problems. As the size of the lexicon gets larger, the complexity of the algorithms increases linearly due to the need for a larger search space and a more complex pattern representation.
2.2 Analytic Approach
The analytical approach overcomes the limitations of the holistic approach but requires local interpretation based on segmentation. In this paradigm, words are broken down into collections of simpler subunits such as characters. It is a bottom-up approach which starts from identifying characters and proceeds towards building meaningful text [10]. A recognition process following this approach is based on segmentation and on the identification of the extracted segments. Thanks to the segmentation stage, the analytical approach can generalize recognition to an unlimited vocabulary. However, there is no efficient segmentation method for correctly extracting the characters from a given word image.
3 Recognition Process
3.1 Segmentation
The segmentation stage decomposes the image of a text into entities (words, graphemes, or characters) in order to reduce the complexity of the subsequent processing
modules. Segmentation is a critical phase of the single-word recognition process [1]. Indeed, the separation of lines, words, pseudo-words, characters, and graphemes is difficult and expensive. Moreover, scripts are varied, lines are sometimes tangled, and characters are generally connected to each other (in the case of Arabic, the writing is semi-cursive). According to the literature, the most difficult problem is the segmentation of cursive writing, where the handwriting recognition community has acknowledged Sayre's paradox [11]: "a letter cannot be segmented before having been recognized and cannot be recognized before being segmented". To address this problem, several segmentation algorithms have been proposed [12, 13]. The proposed solutions are based on two different segmentation strategies. a) Explicit segmentation. Explicit segmentation (INSEG: input segmentation) performs an a priori segmentation: it constructs a graph of all candidate cut points of the word, using feature points in the word [14]. This type of segmentation is based directly on a morphological analysis of the text or word, or on the detection of characteristic points such as intersections, inflection points, and loops inside the text or words, to locate prospective segmentation points [15]. Several approaches propose a direct segmentation of the text or word into primitive graphemes, followed by a step combining these characters or graphemes [15, 16]. The advantage of this segmentation is that the information is explicitly located. Its major flaw is that the cut points are chosen independently of the criteria of the letter models. b) Implicit segmentation. Implicit segmentation (OUTSEG: output segmentation) does not perform a prior segmentation at the input of the classifier; instead, classes of letters or graphemes compete at the output of the classifier. Contrary to explicit segmentation, there is no pre-segmentation of the word.
The segmentation is carried out during recognition and is guided by it. The system searches the image for the components or groups of graphemes that correspond to its letter classes. It relies on a recognition engine to validate and classify the segmentation hypotheses (searching for possible segmentation points along the path). In this case, segmentation and recognition are performed jointly, hence the name sometimes used, "integrated segmentation-recognition". This segmentation can be based on a sliding window [17, 18]. The advantage of this segmentation is that the information is located by the letter models and the validation is done by those models. There are no segmentation faults, and Sayre's dilemma is finally bypassed because the letters are known.
3.2 Training
The training phase consists in finding the models most appropriate to the inputs of the problem. The result is a training database that constitutes the reference base of the system. The training step characterizes the shape classes in order to distinguish homogeneous families of shapes. There are three types of training: supervised training, unsupervised training, and reinforcement training.
a) Supervised training. In supervised training, a representative sample of all the shapes to be recognized is provided to the training module. Each shape is labeled by an operator called the professor; this label indicates to the training module the class in which the professor wants the shape to be placed. The parameters describing this partition are stored in a training table, to which the decision module will then refer to classify the shapes presented to it [19]. b) Unsupervised training. Contrary to supervised training, in unsupervised training there is no labeled data. The classes are built automatically, without the professor's intervention, from reference samples and grouping rules. This mode requires a large number of samples and precise, non-contradictory construction rules, but does not always provide a classification corresponding to the user's reality. This type of training is used particularly for isolated handwritten characters, because the classes are known and limited in number [2]. c) Reinforcement training. Reinforcement learning refers to all the methods that allow an agent to choose which action to take, in an autonomous way. Immersed in a given environment, the agent learns by receiving rewards or penalties based on its actions. Through its experience, the agent seeks the optimal decision-making strategy that maximizes the rewards accumulated over time.
3.3 Classification
In the complete process of a pattern recognition system, classification plays an important role in deciding whether a shape belongs to a class. The main idea of classification is to assign an unknown example (a shape) to a predefined class based on a parametric description of the shape [19]. It constitutes the decisive element in a pattern recognition system. The two predominant methods are neural networks and Markov modeling.
From a theoretical and industrial point of view, they have amply demonstrated their effectiveness. There are also various pattern recognition methods based on fuzzy logic theory, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), etc.
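The K-Nearest Neighbors classifier mentioned above can be sketched in a few lines: an unknown shape is assigned the majority label among its k closest training samples. The 2-D feature vectors and labels below are toy values; a real system would use feature vectors extracted from character images.

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """train: list of (feature_vector, label); x: feature vector to classify."""
    # Sort all training samples by Euclidean distance to x (this full scan
    # is the "low classification speed" drawback noted in Sect. 4.1).
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy training set: two hypothetical character classes.
train = [((0.1, 0.2), "alif"), ((0.2, 0.1), "alif"),
         ((0.9, 0.8), "ba"), ((0.8, 0.9), "ba"), ((0.85, 0.85), "ba")]
print(knn_predict(train, (0.15, 0.15)))  # alif
```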
4 Techniques Used for Offline Handwriting Recognition Systems
Today, systems for constrained and restricted vocabularies give important results, with recognition rates approaching 100% for printed or mono-writer documents of good quality. However, recognition systems for Arabic handwriting achieve lower performance than systems for the Latin language [2, 7]. Researchers therefore try to find flexible and intelligent recognition techniques. The useful techniques can be classified into four main classes: statistical methods, connectionist methods, structural methods, and stochastic methods; there are also hybrid methods that use two or more of these techniques together.
4.1 Statistical Methods
This approach is mainly based on mathematical foundations. Its object is to describe the shapes with a simple probabilistic model. In general, two types of methods are distinguished: (1) non-parametric methods, where we seek to define the class boundaries in the representation space, in order to classify an unknown point by a series of simple tests; (2) parametric (Bayesian) methods, where a model of the distribution of each class is given (typically Gaussian), and where we seek the class to which the point has the greatest probability of belonging. Parametric and non-parametric statistical classifiers were used by Abandah et al. in [20] for optical character recognition of Arabic handwritten scripts. They used Quadratic Discriminant Analysis (QDA) and extracted features from the raw main body, the main body's skeleton, and the main body's boundary. Another statistical method is the KNN. This method has the advantage of being easy to implement and provides good results; its main disadvantage is its low classification speed, due to the large number of distances to be calculated. In a statistical model, recognition is represented by a mathematical model whose parameters must be estimated. These models constitute a limitation, since they always remain an approximation of the shape of the classes.
4.2 Structural Methods
In general, structural methods allow the description of complex shapes from basic shapes called features, which are extracted directly from the data present at the input of the system. The main difference between these methods and statistical methods is that these features are elementary shapes and not measures. Another difference is that they introduce the notion of ordering in the description of a shape. The most common methods use the computation of the edit distance between two strings and dynamic programming. These methods are subdivided according to the structure used.
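The edit distance mentioned above for structural methods can be sketched as the classic Levenshtein distance computed by dynamic programming: the minimum number of insertions, deletions, and substitutions turning one symbol string (e.g. a sequence of extracted graphemes) into another.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, O(len(a)*len(b))."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))      # substitution
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

In a structural recognizer, such a distance would compare the symbolic description of an unknown shape against stored class prototypes, with the smallest distance deciding the class.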
A semi-skeletonization technique was used in [21] to reconstruct, for offline Arabic handwriting, a tracing path similar to the online case, in order to compute character characteristics. With an SVM classifier applied in the classification phase, this approach made it possible to achieve very interesting recognition rates in reduced time.
4.3 Stochastic Methods
Unlike the methods described above, the stochastic approach uses a model for recognition that takes into account the high variability of the shapes. The distance commonly used in "dynamic comparison" techniques is replaced by probabilities computed more finely through learning. The shape is considered as a signal observable continuously in time at different locations constituting states of observation. The model describes these states using state-transition probabilities and per-state observation probabilities. The comparison consists in searching, in the graph of states, for the path of highest probability corresponding to the sequence of elements observed in the input string. These methods are robust and reliable thanks to the existence of efficient training algorithms. The most widespread methods in this approach are those using Hidden Markov Models (HMM).
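The search for the highest-probability state path described above is classically done with the Viterbi algorithm. The sketch below uses a toy two-state HMM with made-up probabilities (the state and observation names are hypothetical, not from any cited system).

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observation list."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for o in obs[1:]:
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for reaching s at this step.
            best_prev = max(states, key=lambda p: V[-2][p] * trans_p[p][s])
            V[-1][s] = V[-2][best_prev] * trans_p[best_prev][s] * emit_p[s][o]
            back[-1][s] = best_prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for bp in reversed(back[1:]):   # backtrack from the best final state
        path.append(bp[path[-1]])
    return list(reversed(path))

states = ["stroke", "loop"]
start_p = {"stroke": 0.6, "loop": 0.4}
trans_p = {"stroke": {"stroke": 0.7, "loop": 0.3},
           "loop": {"stroke": 0.4, "loop": 0.6}}
emit_p = {"stroke": {"line": 0.8, "curve": 0.2},
          "loop": {"line": 0.1, "curve": 0.9}}
print(viterbi(["line", "curve", "curve"], states, start_p, trans_p, emit_p))
```

In a handwriting recognizer the observations would be feature vectors from sliding-window frames and the states would belong to letter models, but the dynamic-programming search is the same.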
In addition, semi-cursive Arabic script, in both printed and manuscript form, lends itself naturally to stochastic modeling at all levels of recognition. These models can absorb the noise and the inherent variability of handwriting and avoid the problem of explicit word segmentation. As a result, they are widely used in automatic handwriting recognition [13, 22].
4.4 Connectionist Methods
The connectionist model overcomes the problem of statistical methods by representing recognition in the form of a network of elementary units connected by weighted arcs. It is in these connections that the recognition resides, and it can take more varied forms than a mathematical model. The nodes of this graph are simple automata called formal neurons. The neurons are endowed with an internal state, the activation, through which they influence the other neurons of the network. This activity is propagated through the graph along weighted arcs called synaptic links. In OCR, the primitives extracted from the character image (or the selected entity) constitute the network inputs. The activated output of the network corresponds to the recognized character. The choice of network architecture is a compromise between computational complexity and recognition rate. Moreover, the strength of neural networks resides in their ability to generate decision regions of any shape required by the classification algorithm, at the price of integrating layers of additional cells into the network. Artificial neural networks (ANN) are strongly connected sets of distributed elementary processors operating in parallel, used for training and classification. Currently, the types most used in handwriting recognition systems are multilayer perceptrons (MLP) [15, 23–25], direct-propagation networks, and associative memories or Kohonen 'Self-Organizing Maps' (SOM) [26], which are recurrent ANNs.
Convolutional neural networks have also been used for multi-script recognition, extracting features and learning from the raw input [27]. This eliminates the need for manually defining discriminative features for particular scripts. These networks have been successful in various vision problems such as digit and character recognition. The problem with neural network approaches is that the objective function is non-convex, and their learning algorithms may get stuck in local minima during gradient descent. To overcome these limitations, many studies have successfully proposed replacing these approaches with deep networks [28, 29]. The new concept behind them is the use of hidden variables as observed variables to train each layer of the deep structure independently and greedily.
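The propagation of activations through weighted links described in Sect. 4.4 can be sketched as a single forward pass through a tiny MLP: the inputs stand for primitives extracted from a character image, and the most activated output stands for the recognized class. The weights below are arbitrary illustrative values, not trained ones.

```python
import math

def sigmoid(x):
    """Standard logistic activation of a formal neuron."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then activation."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

features = [0.2, 0.9, 0.4]  # hypothetical primitives from a character image
hidden = layer(features, [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.1]], [0.0, 0.1])
scores = layer(hidden, [[1.2, -0.7], [-1.0, 0.9]], [0.0, 0.0])
print("class", scores.index(max(scores)))  # index of the activated output
```

Training (back-propagation, or the layer-wise greedy pre-training of deep networks) only adjusts the weights; the forward pass itself stays exactly this simple.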
5 Synthesis
From this study we conclude, first, that the major problems in this field stem from the cursiveness of the writing and the sensitivity of certain topological characteristics of Arabic: the variability of inter-character connections, the horizontal and vertical ligatures, and the presence of overlapping. Second, Arabic script and cursive Latin script have many points in common, so it is possible to transfer to the Arabic
system the techniques already proven for Latin. Preprocessing specific to Arabic writing is required (diacritics detection, baseline detection), but the segmentation into graphemes, the feature extraction, and the recognition step can be the same as those used for the recognition of Latin script. The major advantage of the analytical approach is its ability, in theory, to recognize any word, since the basic unit of modeling is the character or sub-character (grapheme) and the number of characters is naturally finite. However, its biggest weakness lies in the segmentation process, which is not always trivial, requires a lot of computation time, and faces great variability inherent in the shape of the segments. The global approach is perfectly viable for recognition with a limited vocabulary, even for degraded words. It generally suffers from a lack of sufficiently discriminating information about words, which can increase the risk of confusion when the size of the lexicon becomes large. Some current approaches propose to take advantage of both methods, reducing the complexity of the holistic approach by applying it to smaller entities (letters), while the analytical approach seeks the sequence of letters contained in the image to be recognized. Some models combine these two levels into one and can thus avoid prior segmentation of the image. In addition, according to this study of the techniques used in offline handwriting recognition, we can say that in recent years research has mainly been directed towards techniques that have proven their performance in the field, such as HMM, SVM, and neural networks. Markov approaches are well adapted to modeling sequential data. HMMs have several advantages in handwriting recognition: they take into account the variability of shapes and the noise that disturbs writing, especially handwriting.
They also allow sequences of variable length to be taken into account. This quality is especially critical in handwriting recognition, where the length of letters and words can vary greatly depending on the writing styles and habits of writers. A major problem in handwriting recognition is the huge variability and distortion of patterns. Elastic models based on local observations and dynamic programming, such as HMMs, are not efficient at absorbing this variability. SVMs are discriminative models that attempt to minimize the training error while maximizing the margin between classes. The main idea is that two classes can be linearly separated in a high-dimensional space. In the case where the points are separable, there are often infinitely many separating hyperplanes. There are not many Arabic handwriting recognition systems based on SVM compared with other techniques such as HMM and ANN. Most classifiers face a major problem which lies in the variability of the feature vector size. In the literature, three approaches are commonly used to manage these dimensionality problems: Genetic Algorithms, Dynamic Programming, and Graph Matching. Recently, deep learning architectures [30] have been used for unsupervised feature learning, such as the Convolutional Neural Network (CNN), the Deep Belief Network (DBN), and the Convolutional DBN. The Convolutional Neural Network, developed by LeCun et al. [31], is a specialized type of neural network which learns good features at each layer of the visual hierarchy via back-propagation (BP). Ranzato et al. [32] achieved improvements in performance when they applied unsupervised pre-training to a CNN. These networks
have been successful in various vision problems such as digit and character recognition [33]. The Deep Belief Network is a multi-layer generative model [34] which learns higher-level feature representations from unlabeled data using unsupervised learning algorithms such as Restricted Boltzmann Machines (RBMs), auto-encoders and sparse coding. These algorithms have only succeeded in learning low-level features, such as "edge" or "stroke" detectors with simple invariances. The CDBN, which is composed of convolutional restricted Boltzmann machines (CRBMs), has been applied in several fields such as vision recognition tasks [35], automatic speech recognition (ASR) and EEG signal classification. Lee et al. [35] demonstrated that the CDBN has good performance. The principal idea is to scale up the algorithm to deal with high-dimensional data.
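To make the "features at each layer" idea concrete, here is a minimal pure-Python sketch of one convolutional stage (valid 2D convolution, ReLU, 2x2 max-pooling). The hand-made vertical-stroke kernel and the toy 5x5 image are illustrative assumptions, not a real trained network.

```python
# One convolutional "layer" by hand: valid 2D convolution, ReLU, 2x2 max-pool.
def conv2d(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(img[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def relu(fmap):
    return [[max(0, v) for v in row] for row in fmap]

def maxpool2(fmap):
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A 5x5 binary image containing a vertical stroke, and a vertical-edge kernel.
img = [[0, 0, 1, 0, 0]] * 5
kernel = [[-1, 1, -1]] * 3   # responds strongly to a one-pixel vertical line
fmap = maxpool2(relu(conv2d(img, kernel)))
print(fmap)
```

In a real CNN the kernel weights are learned by back-propagation rather than written by hand, and many such stages are stacked to form the visual hierarchy described above.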
6 Conclusion

We presented in this work the field of automatic handwriting recognition. The distinction between deep learning and holistic/analytical approaches was presented in the first part of this paper. In the second part, the main steps followed in building a recognition system were described. We also reviewed work done in the field of Arabic handwriting recognition and compared the results. It is very difficult to make a judgment about the success of recognition methods, especially in terms of recognition rates, because of the different databases, constraints and sample spaces. Several improvements can be brought to systems for recognizing Arabic script, mainly through a combination of classifiers and preprocessing specific to the Arabic script.
References

1. Krichen, O., Corbillé, S., Anquetil, É., Girard, N., Fromont, É., Nerdeux, P.: Combination of explicit segmentation with Seq2Seq recognition for fine analysis of children handwriting. Int. J. Document Anal. Recogn. (IJDAR), 1–12 (2022)
2. Mezghani, A., Kallel, F., Kanoun, S., Kherallah, M.: Contribution on character modelling for handwritten Arabic text recognition. In: International Afro-European Conference for Industrial Advancement: AECIA 2016, pp. 370–379 (2016)
3. Elleuch, M., Jraba, S., Kherallah, M.: The effectiveness of transfer learning for Arabic handwriting recognition using deep CNN. J. Inf. Assurance Sec. 16(2) (2021)
4. Al-Saffar, A., Awang, S., Al-Saiagh, W., Al-Khaleefa, A.S., Abed, S.A.: A sequential handwriting recognition model based on a dynamically configurable CRNN. Sensors 21(21), 7306 (2021)
5. Elleuch, M., Kherallah, M.: Convolutional deep learning network for handwritten Arabic script recognition. In: Abraham, A., Shandilya, S.K., Garcia-Hernandez, L., Varela, M.L. (eds.) HIS 2019. AISC, vol. 1179, pp. 103–112. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-49336-3_11
6. Malakar, S., Sahoo, S., Chakraborty, A., Sarkar, R., Nasipuri, M.: Handwritten Arabic and Roman word recognition using holistic approach. Visual Comput., 1–24 (2022)
7. Madhvanath, S., Krpasundar, V., Govindaraju, V.: Syntactic methodology of pruning large lexicons in cursive script recognition. Pattern Recogn. 34(1), 37–46 (2001)
DL vs. Traditional ML Algorithms to Recognize Arabic Handwriting Script
8. Khorsheed, M.S.: Recognising handwritten Arabic manuscripts using a single hidden Markov model. Pattern Recogn. Lett. 24(14), 2235–2242 (2003)
9. Jayech, K., Mahjoub, M.A., Ben Amara, N.E.: Synchronous multi-stream hidden Markov model for offline Arabic handwriting recognition without explicit segmentation. Neurocomputing 214, 958–971 (2016)
10. Sadhu, S., Mukherjee, A., Mukhopadhyay, B.: A comparative review on machine learning based algorithms to convert handwritten document to English characters. In: Applications of Machine Intelligence in Engineering, pp. 21–529 (2022)
11. Sayre, K.M.: Machine recognition of handwritten words: a project report. Pattern Recogn. 5(3), 213–228 (1973)
12. Kundu, S., Paul, S., Bera, S.K., Abraham, A., Sarkar, R.: Text-line extraction from handwritten document images using GAN. Expert Syst. Appl. 140 (2020)
13. Kohli, M., Kumar, S.: Segmentation of handwritten words into characters. Multimedia Tools Appl. 80(14), 22121–22133 (2021). https://doi.org/10.1007/s11042-021-10638-0
14. Bozinovic, R.M., Srihari, S.N.: Off-line cursive script word recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1, 68–83 (1989)
15. Zermi, N., Ramdani, M., Bedda, M.: Arabic handwriting word recognition based on hybrid HMM/ANN approach. Int. J. Soft Comput. 2(1), 5–10 (2007)
16. Touj, S.M., Amara, N.E.B., Amiri, H.: A hybrid approach for off-line Arabic handwriting recognition based on a planar hidden Markov modeling. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 964–968. IEEE (2007)
17. Pechwitz, M., Maergner, V., El Abed, H.: Comparison of two different feature sets for offline recognition of handwritten Arabic words. In: Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft (2006)
18. Noubigh, Z., Mezghani, A., Kherallah, M.: Open vocabulary recognition of offline Arabic handwriting text based on deep learning. In: International Conference on Intelligent Systems Design and Applications: ISDA 2020
19. Mezghani, A., Kherallah, M.: Recognizing handwritten Arabic words using optimized character shape models and new features. In: International Arab Conference on Information Technology: ACIT 2017
20. Abandah, G.A., Younis, K.S., Khedher, M.Z.: Handwritten Arabic character recognition using multiple classifiers based on letter form. In: Proceedings of the 5th International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA), pp. 128–133 (2008)
21. Faouzi, Z., Abdelhamid, D., Chaouki, B.M.: An approach based on structural segmentation for the recognition of Arabic handwriting (2010)
22. Azeem, S.A., Ahmed, H.: Effective technique for the recognition of offline Arabic handwritten words using hidden Markov models. Int. J. Document Anal. Recogn. (IJDAR) 16(4), 399–412 (2013). https://doi.org/10.1007/s10032-013-0201-8
23. Chakraborty, A., De, R., Malakar, S., Schwenker, F., Sarkar, R.: Handwritten digit string recognition using deep autoencoder based segmentation and ResNet based recognition approach. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7737–7742. IEEE (2021)
24. Amin, A.: Recognition of hand-printed characters based on structural description and inductive logic programming. Pattern Recogn. Lett. 24(16), 3187–3196 (2003)
25. Sari, T., Sellami, M.: Cursive Arabic script segmentation and recognition system. Int. J. Comput. Appl. 27(3), 161–168 (2005)
26. Mezghani, N.: Densités de probabilité d'entropie maximale et mémoires associatives pour la reconnaissance en ligne de caractères arabes. Doctoral thesis, INRS, Canada (2005)
27. Noubigh, Z., Mezghani, A., Kherallah, M.: Densely connected layer to improve VGGnet-based CRNN for Arabic handwriting text line recognition. Int. J. Hybrid Intell. Syst. (IJHIS), 1–15 (2021)
28. Elleuch, M., Tagougui, N., Kherallah, M.: Deep learning for feature extraction of Arabic handwritten script. In: International Conference on Computer Analysis of Images and Patterns, pp. 371–382. Springer, Cham (2015)
29. Al-Ayyoub, M., Nuseir, A., Alsmearat, K., Jararweh, Y., Gupta, B.: Deep learning for Arabic NLP: a survey. J. Comput. Sci. 26, 522–531 (2018)
30. Alrobah, N., Albahli, S.: Arabic handwritten recognition using deep learning: a survey. Arabian J. Sci. Eng., 1–21 (2022)
31. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
32. Ranzato, M., Boureau, Y., LeCun, Y.: Sparse feature learning for deep belief networks. In: Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), Canada (2007)
33. Elleuch, M., Tagougui, N., Kherallah, M.: A novel architecture of CNN based on SVM classifier for recognising Arabic handwritten script. Int. J. Intell. Syst. Technol. Appl. 15(4), 323–340 (2016)
34. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
35. Lee, H., Grosse, R., Ranganath, R., Ng, A.Y.: Unsupervised learning of hierarchical representations with convolutional deep belief networks. Commun. ACM 54(10), 95–103 (2011)
Data Virtualization Layer Key Role in Recent Analytical Data Architectures

Montasser Akermi, Mohamed Ali Hadj Taieb, and Mohamed Ben Aouicha

Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
[email protected], {mohamedali.hajtaieb,mohamed.benaouicha}@fss.usf.tn
Abstract. The amount of data, its heterogeneity and the speed at which it is generated are all increasing, and current systems are not able to handle on-demand, real-time data access. In traditional data integration approaches such as ETL, physically loading the data into data stores that use different technologies is becoming costly, time-consuming, inefficient, and a bottleneck. Recently, data virtualization has been used to accelerate the data integration process; it addresses these challenges by delivering a unified, integrated, and holistic view of trusted data, on demand and in real time. This paper provides an overview of traditional data integration and its limits. We discuss data virtualization, its core capabilities and features, how it can complement other data integration approaches, and how it improves traditional data architecture paradigms.

Keywords: Data virtualization · Data integration · Data architecture · Big Data
1 Introduction
New emerging technologies such as smart devices, sensors, augmented and virtual reality, robotics, biometrics, 5G, and blockchain generate a huge amount of real-time data, both structured and unstructured, that flows across different environments and infrastructures. It is estimated that the global data created by 2025 will reach 175 zettabytes [30]. Often, the overwhelming flow of information and distributed data streams surpasses the current technological capacity to manage and analyse data [27]. However, this enormous amount of data creates more opportunities for modern organizations to reduce overhead and gain a competitive advantage [29]. Before it is analyzed by data scientists and data analysts, data has to be integrated. But with the volume, variety, and velocity of data, organizations ended up creating data silos and data swamps [34]. They assumed that data lakes would solve their problems, but these data lakes are becoming data swamps with so much data that it is impossible to analyse and understand [9,18]; this data is referred to as dark data [10]. This occurred because a data lake, just like a traditional data warehouse and data lakehouse [2], involves physically extracting and loading the data. Even with the most recent generation of data architecture paradigm, the data lakehouse, which implements data warehouse features such as ACID transactions and SQL support directly over the data lake, relying on ETL presents new challenges, particularly challenges related to query execution engines [3,20].

Data virtualization enables disparate data sources to appear as a single data store. This allows faster data integration and processing for different practitioners. Instead of moving data to a new location, data is left in its original place, while data quality and ownership are managed by data virtualization [8]. In this year's report, the 2022 Gartner Magic Quadrant for Data Integration Tools, Gartner analysts once again included data virtualization as a key criterion for evaluating data integration vendors. Data virtualization is no longer a differentiator but a "must have" feature [36].

This paper focuses on data virtualization and its approaches to creating modern data architectures. Section 2 addresses the challenges of traditional data integration. Section 3 discusses the concept and features of data virtualization. Section 4 provides an overview of how data virtualization can be used to enhance traditional data architecture paradigms, i.e. the data warehouse and the data lake. Finally, the last section presents the conclusion and future work.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 415–426, 2023. https://doi.org/10.1007/978-3-031-35501-1_42
2 Traditional Data Integration Methods
There are many possible approaches to data integration, but they can be classified into physical integration and virtual integration. The goal of data integration is to offer unified access to a set of disparate data sources [14].
2.1 Extract, Transform, Load (ETL)
The extract, transform, load process is a data integration pattern first coined in the 1970s. It involves: i) extracting the data from the source, ii) transforming the extracted data into the format required by the final data repository, and iii) persisting the transformed data in the final data repository. This is the main integration pattern used in data warehouses [25]. Over the years other processes emerged from ETL, such as ELT (Extract, Load, Transform), which loads the data before making any transformation. It is especially used in data lakes. ELT avoids the up-front transformation that data warehouses demand. This up-front transformation is a blocker: not only does it lead to slower iterations of data modelling, but it also alters the nature of the original data and mutates it. Both ETL and ELT involve copying the data, thus creating more replications. When moving data in bulk, ETL is very effective, since it is easy to understand and widely supported. Most organizations have in-house ETL. However, moving data is not always the best strategy, since data in the new location has to be maintained as well. Another disadvantage is dealing with thousands of ETL
processes synchronized by scripts; they become very complex to modify. And with today's data volume and variety, the ETL process is struggling. Organizations started looking for an alternative data integration approach that supports real-time capability.
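The three ETL steps described above, and the ELT reordering, can be sketched over toy in-memory stores; all names and records below are illustrative.

```python
# Minimal ETL: extract from a source, transform, load into a target store.
source = [{"name": " Alice ", "amount": "10"}, {"name": "Bob", "amount": "5"}]

def extract(src):
    return list(src)                      # read raw records from the source

def transform(records):
    # Normalize to the schema the target repository requires.
    return [{"name": r["name"].strip(), "amount": int(r["amount"])}
            for r in records]

def load(records, target):
    target.extend(records)                # persist into the final repository

warehouse = []
load(transform(extract(source)), warehouse)   # ETL: transform up front

lake = []
load(extract(source), lake)                   # ELT: load the raw data first...
lake_view = transform(lake)                   # ...and transform on demand

print(warehouse[0], lake_view[0])
```

Note that both variants copy the data into a new store, which is exactly the replication the paper contrasts with virtual integration.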
2.2 Enterprise Service Bus (ESB)
The term Enterprise Service Bus was originally coined by analysts from Gartner [21]. An ESB does not involve moving the data; instead, it uses a message bus to facilitate the interactions of applications and services. Applications are connected to the message bus, which allows them to communicate and exchange messages in real time. Applications in an ESB are decoupled; therefore, no application needs to know about, or depend on, other applications [21]. Just like ETL processes, ESB is supported by most organizations. It is suited to operational scenarios but not to analytical use cases, because it cannot integrate data. ETL processes run in batches. They are often scripted, and hard to maintain over time. ESB was introduced to move away from point-to-point integration, like ETL scripts, and to offer real-time interaction between applications. However, it cannot integrate application data to deliver analytical use cases. In the next section, we introduce the concept of data virtualization, which overcomes this challenge by offering real-time data integration.
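A minimal sketch of the message-bus decoupling described above, assuming a toy in-memory bus; no real ESB product API is used, and the topic and message names are invented.

```python
# A toy message bus: applications publish to topics and subscribe to them
# without knowing about each other (decoupling), as on an ESB.
class MessageBus:
    def __init__(self):
        self.subscribers = {}             # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers.get(topic, []):
            callback(message)             # delivered in (near) real time

bus = MessageBus()
received = []
bus.subscribe("orders", received.append)        # a CRM app listens for orders
bus.publish("orders", {"id": 1, "total": 9.5})  # the shop app publishes
print(received)
```

The publisher never references the subscriber, which is the point of the pattern; what the bus does not do is join or aggregate the messages, which is why the text rules it out for analytical integration.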
3 Data Virtualization
Data virtualization is an abstraction layer that integrates data from a wide variety of structured, semi-structured and unstructured sources, whatever the environment, to create a virtual data layer that delivers unified data services in real time [23] to disparate data consumers, e.g. applications, processes, and users (see Fig. 1). Data virtualization is a modern data integration solution. It does not copy the data as most data integration patterns do. Instead, it takes a different approach by providing a view of the integrated data; it keeps the source data exactly where it is [32]. This results in lowered costs, fewer replications, and minimal data latency. Data virtualization can replace traditional data integration to reduce the number of replicated data marts and data warehouses [24]. But it is also highly complementary, since it is a data services layer, which means it can be used between ETL, ESB, and applications, whatever the environment, e.g. public cloud or on-premise. Data consumers query the data virtualization layer, which gets data from various data sources. The data virtualization layer hides where and how data is stored, and whether data needs to be aggregated, joined, or filtered before it is delivered to data consumers. Thus, the location and the implementation of the physical data, and the complexity of accessing the data, are hidden from data consumers [4].

Fig. 1. Data virtualization integrates data from multiple sources and delivers it to different data consumers

The data virtualization layer takes the shape of a single data repository. Because data is not replicated, the data virtualization layer contains only the metadata of each data source, as well as any global instructions, e.g. data governance policies and data security rules. Data virtualization is an abstraction layer; it is therefore highly complementary to ETL, ESB, and other data integration patterns. In the following sections, we discuss how data virtualization complements ESB and ETL processes, and we provide an overview of its core capabilities and features.
3.1 Data Virtualization Complements ETL to Support Cloud-Based Sources
ETL was created to move data to other repositories, e.g. data warehouses. However, it is not easy to move data from cloud-based sources with ETL [22]. Data virtualization can complement ETL processes to i) enable real-time data integration of multiple data sources, ii) connect on-premise with cloud-based data sources without the need to move all the data into a single repository, iii) unify data across a data warehouse and a new on-premise or cloud-based data store, and iv) access data faster than with ETL processes [13].
3.2 Data Virtualization Complements ESB to Add More Sources
Adding new sources to an ESB can be a complex task, especially unstructured data sources, e.g. web pages, flat files, email messages and cloud-based sources. To facilitate this process, data virtualization can unify these disparate sources and provide a single data source that is supported by the ESB [22].
Table 1. Data integration use cases and different patterns

Use case                                       | Data virtualization | ETL | ESB
Moving data between data repositories          |                     |  X  |
Data unification                               |          X          |     |
Real-time reports and insights                 |          X          |     |
Migrating data to the cloud                    |                     |  X  |
Self-service analytics                         |          X          |     |
360 customer view                              |          X          |     |
Data warehouses and data marts                 |                     |  X  |  X
Logical data warehouses and virtual data marts |          X          |     |
Data warehouse offloading                      |          X          |     |
Logical data lakes                             |          X          |  X  |
Table 1 shows which data integration approaches can be applied to different use cases. Data virtualization is not always the best data integration approach for a given problem. Shraideh et al. [33] developed a structured and systematic decision support method that considers fifteen critical success factors [12] to decide on ETL, data virtualization, or a hybrid of both patterns as a suitable data integration approach.
3.3 Faster Data Access and Delivery
Traditional data integration patterns, such as ETL, involve physically moving the data to multiple locations, e.g. data stores, databases, data warehouses, data lakes, data lakehouses, and cloud-based repositories. This process is usually done manually, which creates many replications across the network and makes the data architecture slower, costlier, and more complex [12]. Adding a data virtualization layer to the data architecture enables fast, easy, and agile solutions [17], therefore helping organizations become data-driven. In the case of self-service business intelligence [19], there is no need to physically move the data and aggregate it locally: business users can add multiple data sources, which are then connected to the data virtualization layer through pre-built data connectors, also called adapters. The data is unified and rapidly delivered to the business intelligence system for reports and insights; physically moving data is one of the main reasons for the high latency in traditional data architectures. The development of data services is also faster because of the unified data layer: developers do not need to connect to all the data sources, in different formats, residing in different data repositories.
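As a hedged sketch of the pre-built connectors (adapters) mentioned above, the toy registry below maps a source format to a reader that returns records in one common shape; the adapter names and sources are invented for illustration.

```python
import csv, io, json

# A registry of "adapters": each maps a source format to a reader that
# yields records in one common shape (a list of dicts).
ADAPTERS = {
    "json": lambda text: json.loads(text),
    "csv":  lambda text: list(csv.DictReader(io.StringIO(text))),
}

def read_source(kind, payload):
    return ADAPTERS[kind](payload)    # the caller never sees the format

rows = read_source("csv", "id,city\n1,Sfax\n2,Tunis")
docs = read_source("json", '[{"id": "3", "city": "Sousse"}]')
print(rows + docs)
```

Real data virtualization products ship with adapters for databases, APIs and file systems; the point here is only that consumers see one record shape regardless of the source format.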
3.4 Self-service Analytics
With self-service analytics, business users do not need to ask the data engineering team for assistance when doing analytics; the data engineering team can then focus on improving other architectural matters. However, self-service cannot be easily achieved because i) data is everywhere: in databases, data warehouses, cloud and big data architectures; ii) data integrity is low because there is no single source of truth; iii) data latency is high; and iv) there is no data lineage, which affects data quality and makes it questionable. Data virtualization can overcome these challenges and make self-service available to business users [1,7]. In this way, cost and complexity are reduced, and replications are created only when necessary.
3.5 Data Virtualization Core Capabilities
A data virtualization layer should provide at least the following core capabilities [5,6]:

- Pre-built connectors to quickly connect, explore, and extract the data from any on-premise or cloud-based sources and any data type: structured, semi-structured, and unstructured.
- Self-service data services, where complexity is hidden from data consumers; data sources and data consumers are decoupled, which enables data services to be created easily without the intervention of the data engineering team.
- A single logical data model, maintained by automatic processing of data catalogs that contain metadata, data classes, data clusters, etc.
- Unified data governance, enabling a single entry point for data, metadata management, audit, logging, security, and monitoring. External data management tools are also integrated into the system.
- A unified data layer, the main component of a virtual data layer, which harmonizes, transforms, improves the quality of, and connects data across different data types.
- A universal data publishing mechanism, through unified connected services, to provide users with the requested results of processed data.
- Agility and high performance, where real-time optimizations are performed repeatedly to create a flexible workload [8,12].
3.6 Privacy and Data Protection
Data virtualization enables data privacy by default. It helps organizations comply with the protected-by-design requirement of the GDPR (the European Union's General Data Protection Regulation). Data sources are not required to be predefined in a particular format or accessed in a certain method. Data virtualization supports advanced data protection mechanisms, e.g. anonymizing data, the immutability of data (refusal of signature), and end-to-end encryption of transmitted transactions [6].
3.7 Data Services
These services are critical to the data virtualization layer. Examples of services: a 360 customer view, such as querying all customers who exist in all repositories, or getting the last five years' revenue from all stores. These services are usually business-oriented, which helps business users create reports and insights easily, without waiting for a data engineer to build a pipeline to get the result needed. Operational-oriented services also exist, e.g. changing the address of a customer or updating his email address. A service can be formed of three components: an interface, responsible for handling incoming parameters and outgoing results; the logic, which forms the body of the service and deals with the data preparation specifications; and the source abstraction, which makes the service independent of the data source system. These services may sound easy to implement, but that is not true. They are responsible for most of the data preparation work, such as data transformation, enrichment, joining, federation, synchronization, historicization, etc. Some services might be extremely complex and need access to multiple systems, some might need to move the data, and others might need to run machine learning algorithms. This complexity is also common in ETL processes; with data virtualization, however, it is completely hidden. The performance of a data service depends on the performance of the systems where data is stored and of the other services it calls. However, data services must know the most optimized way to access these systems. Access can be set-oriented, sending a request for a set of data (e.g. get a customer's info and all his invoices), or record-oriented, sending a request for one record, which is relatively simple to process (e.g. get a customer's info). The biggest performance challenge, however, is when a service joins data from multiple systems.
Extracting the data and letting the service execute the joins leads to poor performance, because so much data is extracted from the systems and moved across the network. To overcome this challenge, the data virtualization layer needs to push queries down as much as possible, to use the power of the connected data sources and get back only the requested records, or the result of the request, instead of the complete dataset. Services must be served to data consumers as one integrated system. For example, developing a 360 customer view involves access to data from disparate repositories; however, this is hidden from the consumers. Abstraction is achieved through the data services provided by the data virtualization layer.
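The three service components named above (interface, logic, source abstraction) can be sketched as follows; the customer and invoice data, and all function names, are purely illustrative.

```python
# A data service in three parts: source abstraction, logic, interface.
CUSTOMERS = {1: {"name": "Amal", "email": "amal@example.com"}}   # system A
INVOICES = [{"customer": 1, "total": 120.0},                     # system B
            {"customer": 1, "total": 80.0}]

def fetch_customer(cid):      # source abstraction: hides where data lives
    return CUSTOMERS[cid]

def fetch_invoices(cid):
    return [i for i in INVOICES if i["customer"] == cid]

def customer_360(cid):        # logic: joins and prepares the data
    c = dict(fetch_customer(cid))
    c["invoice_total"] = sum(i["total"] for i in fetch_invoices(cid))
    return c

def handle_request(params):   # interface: parameters in, result out
    return customer_360(int(params["customer_id"]))

print(handle_request({"customer_id": "1"}))
```

The consumer calls one interface and sees one integrated result, even though the customer record and the invoices could live in two different systems.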
3.8 Virtual Tables
Data virtualization makes the development of service logic easier. The main component of data virtualization is the virtual table. A virtual table describes how the data sources need to be transformed. It can be used to define how data, which can come from other virtual tables, needs to be prepared. A virtual table can be accessed through different interfaces; it can be published as REST services, OData (Open Data Protocol) services, JDBC services, etc. (see Fig. 2).
Fig. 2. Data virtualization and virtual tables
Data virtualization hides from virtual table developers the location and the implementation of the real data. To them, it seems as if they are accessing a single logical database instead of many data sources and services, and a single interface instead of many different technologies and interfaces. This makes development faster.
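A minimal sketch of a virtual table as a named transformation over other (virtual) tables, evaluated only when queried; the class and the toy CRM source are assumptions for illustration, not any vendor's API.

```python
# A virtual table: a transformation over sources or other virtual tables.
class VirtualTable:
    def __init__(self, compute):
        self.compute = compute        # evaluated only when queried

    def rows(self):
        return self.compute()

# A "physical" source, left where it is.
crm = [{"id": 1, "country": "TN"}, {"id": 2, "country": "FR"}]

customers = VirtualTable(lambda: [dict(r) for r in crm])
# A derived virtual table defined on top of another virtual table.
tn_customers = VirtualTable(
    lambda: [r for r in customers.rows() if r["country"] == "TN"])

print(tn_customers.rows())
```

Because `tn_customers` is defined against `customers` rather than against the CRM directly, the source can be swapped or relocated without changing the derived definition; a real layer would additionally publish such tables over REST, OData or JDBC interfaces.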
3.9 Query Pushdown
Query pushdown is an optimization technique used by data virtualization. It consists of pushing data processing down, as much as possible, to the data source system, e.g. a data lake, database, or flat file. Data virtualization thereby minimizes network traffic and uses the maximum potential of the connected data sources. For example, a NoSQL database is a scalable and highly optimized database engine [1]; running aggregations on the database is much more efficient than running them on the data virtualization layer.
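Using SQLite as a stand-in source system, the sketch below contrasts extracting every row and aggregating in the layer with pushing the aggregation down to the source; the schema and data are illustrative.

```python
import sqlite3

# Push an aggregation down to the source instead of pulling every row
# into the virtualization layer and aggregating there.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (store TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("A", 10.0), ("A", 5.0), ("B", 7.0)])

# Naive: extract everything, aggregate in the layer (lots of data moved).
rows = con.execute("SELECT store, amount FROM sales").fetchall()
naive = {}
for store, amount in rows:
    naive[store] = naive.get(store, 0) + amount

# Pushdown: only the aggregated result crosses the "network".
pushed = dict(con.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store").fetchall())

print(naive == pushed, pushed)
```

Both paths return the same totals, but the pushed-down query transfers one row per store instead of one row per sale, which is the traffic saving the text describes.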
4 Data Virtualization Architectural Approaches
In a monolithic data architecture, e.g. a data lake or a data warehouse, data is located in a single repository. In contrast, in a distributed data architecture, data is distributed across multiple locations.
4.1 Monolithic Data Architecture Challenges
Monolithic data architectures often fail because of many challenges [5], which can be classified into three categories:

- Architectural challenges, such as source and use case proliferation, the continuous change of the data landscape, the non-scalable approach, and the data landscape outgrowing the data management architecture.
- Technological challenges, such as tightly coupled pipelines, inconsistent tools and various specialized skills, the complexity debt of the data pipeline, and the difficulty of sharing data between public clouds and on-premise.
- Organizational challenges, including the absence of a data culture in the organization and data engineers lacking domain expertise and being siloed from the operational units.

Overcoming many of these challenges is possible by integrating a data virtualization layer to enable distributed data architectures.
4.2 Logical Data Warehouses
Business analysts are experiencing new challenges with the technological revolution, such as big data and cloud-based analytics [28]. New data sources, e.g. data from robots and intelligent devices, social media, and raw data, are not structured to suit traditional data warehouses. The data could be converted, but that would increase its volume, and therefore the cost of maintaining it inside data warehouses. Organizations tend to use alternative solutions, e.g. data lakes, because storing data there is far cheaper than storing it in data warehouses [35]. However, since not all the data is in the data warehouse, analysts cannot generate reports over all the relevant data [16]. The logical data warehouse is a data architecture that extends the traditional or enterprise data warehouse concept by adding an abstraction layer where external sources are integrated without the heavy ETL workload. The core components of a logical data warehouse architecture are i) a layer of real-time connectors to data sources, ii) a unified data layer, and iii) normalized views [6,26]. Some common use cases for logical data warehouses include virtual data marts, integration of multiple data warehouses, and data warehouse offloading.
4.3 Logical Data Lakes - Big Data Virtualization
Just like data lakes, big data virtualization applies the schema-on-read principle. This approach can handle the massive volume and variety of data [11]. As mentioned before, data virtualization allows the data to stay in its original data storage while being available to data consumers such as applications and business users. To find datasets in the systems, a data catalog that presents all the data, including that of physical data lakes, is needed. The data catalog stores metadata about all the data inside the different systems connected to the data lake; thus, the search functionality runs quickly. The data catalog is the main interface of logical data lakes. When a user searches for data, it does not matter where it is physically located: the search is always processed the same way [11].
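A toy sketch of a data catalog searched by metadata, regardless of where each dataset physically lives; the dataset names, locations and tags below are invented for illustration.

```python
# A data catalog: metadata about datasets, wherever they physically live.
CATALOG = [
    {"name": "clickstream", "location": "s3://lake/raw/clicks",
     "tags": ["web", "events"]},
    {"name": "customers", "location": "postgres://crm/customers",
     "tags": ["sales"]},
    {"name": "sensor_logs", "location": "hdfs:///lake/iot",
     "tags": ["events", "iot"]},
]

def search(keyword):
    # The search runs the same way regardless of where data is stored.
    return sorted(d["name"] for d in CATALOG
                  if keyword in d["tags"] or keyword in d["name"])

print(search("events"))
```

The user searches metadata only; resolving a hit to its physical location (object store, relational database, HDFS) is left to the virtualization layer.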
5 Conclusion and Future Work
Data virtualization is a decoupling technology that offers many features which overcome the limits of traditional data integration patterns and enhance analytical data architectures. It increases the efficiency of data operations, minimizes replications, and reduces complexity and cost. Data virtualization enables agile data services, real-time data delivery, and self-service for different users without the intervention of the data engineering team. This shifts the paradigm from data storage to data usage, and from moving data to connecting data, thereby creating Data as a Service (DaaS) [31], where data is not moved from the source system to the target system. Instead, the target system can ask for data on demand and use services offered by the source system to manipulate data. Data virtualization has changed the way data is looked at and how data services are developed. The next information revolution will give more attention to semantics and the meaning of distributed data through data virtualization. Data virtualization offers more opportunities for data management solutions, thus creating new data architectures which will stand out as more heterogeneous data comes in. For future work, we recommend evaluating the performance of the logical data warehouse and the data lakehouse using the same data. We also recommend creating a conceptual model or a reference architecture for data virtualization. Another direction would be a new generation of data virtualization that supports more data management capabilities. Lastly, we have not seen any implementation of data virtualization in the lakehouse architecture; future work should therefore focus on the data lakehouse, and the newly emerging architectures in general, and the role of data virtualization in these architectures.
Data Virtualization Layer Key Role in Recent Analytical Data Architectures
425
References 1. Alagiannis, I., Borovica, R., Branco, M., Idreos, S., Ailamaki, A.: NoDB: efficient query execution on raw data files. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 241–252 (2012) 2. Armbrust, M., Ghodsi, A., Xin, R., Zaharia, M.: Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics. In: Proceedings of CIDR (2021) 3. Behm, A., et al.: Photon: a fast query engine for Lakehouse systems. In: Proceedings of the 2022 International Conference on Management of Data, pp. 2326–2339 (2022) 4. Bogdanov, A., Degtyarev, A., Shchegoleva, N., Khvatov, V.: On the way from virtual computing to virtual data processing. In: CEUR Workshop Proceedings, pp. 25–30 (2020) 5. Bogdanov, A., Degtyarev, A., Shchegoleva, N., Khvatov, V., Korkhov, V.: Evolving principles of big data virtualization. In: Gervasi, O., et al. (eds.) ICCSA 2020. LNCS, vol. 12254, pp. 67–81. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58817-5_6 6. Bogdanov, A., Degtyarev, A., Shchegoleva, N., Korkhov, V., Khvatov, V.: Big data virtualization: why and how? In: CEUR Workshop Proceedings (2679), pp. 11–21 (2020) 7. Chatziantoniou, D., Kantere, V.: DataMingler: a novel approach to data virtualization. In: Proceedings of the 2021 International Conference on Management of Data, pp. 2681–2685 (2021) 8. Earley, S.: Data virtualization and digital agility. IT Professional 18(5), 70–72 (2016) 9. Eryurek, E., Gilad, U., Lakshmanan, V., Kibunguchy-Grant, A., Ashdown, J.: Data governance: the definitive guide. O'Reilly Media, Inc. (2021) 10. Gartner: Definition of dark data – IT glossary. https://www.gartner.com/en/information-technology/glossary/dark-data. Accessed 14 Apr 2022 11. Gorelik, A.: The enterprise big data lake: delivering the promise of big data and data science. O'Reilly Media (2019) 12. 
Gottlieb, M., Shraideh, M., Fuhrmann, I., Böhm, M., Krcmar, H.: Critical success factors for data virtualization: a literature review. ISC Int. J. Inf. Secur. 11(3), 131–137 (2019) 13. Guo, S.S., Yuan, Z.M., Sun, A.B., Yue, Q.: A new ETL approach based on data virtualization. J. Comput. Sci. Technol. 30(2), 311–323 (2015) 14. Doan, A., Halevy, A., Ives, Z.G.: Principles of Data Integration. Morgan Kaufmann (2012) 15. Hilger, J., Wahl, Z.: Graph databases. In: Making Knowledge Management Clickable, pp. 199–208. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-92385-3_13 16. Kukreja, M.: Data engineering with Apache Spark, Delta Lake, and Lakehouse. Packt Publishing Ltd. (2021) 17. Van der Lans, R.F.: Creating an agile data integration platform using data virtualization. R20/Consultancy technical white paper (2014) 18. Van der Lans, R.F.: Architecting the multi-purpose data lake with data virtualization. Denodo (2018) 19. Lennerholt, C., van Laere, J., Söderström, E.: Implementation challenges of self-service business intelligence: a literature review. In: 51st Hawaii International Conference on System Sciences, Hilton Waikoloa Village, Hawaii, USA, 3–6 Jan 2018, vol. 51, pp. 5055–5063. IEEE Computer Society (2018)
426
M. Akermi et al.
20. L'Esteve, R.: Adaptive query execution. In: The Azure Data Lakehouse Toolkit, pp. 327–338. Springer (2022). https://doi.org/10.1007/978-1-4842-8233-5_14 21. Menge, F.: Enterprise service bus. In: Free and Open Source Software Conference, vol. 2, pp. 1–6 (2007) 22. Miller, L.C.: Data Virtualization For Dummies, Denodo Special Edition. John Wiley & Sons, Ltd. (2018) 23. Mousa, A.H., Shiratuddin, N.: Data warehouse and data virtualization comparative study. In: 2015 International Conference on Developments of E-Systems Engineering (DeSE), pp. 369–372. IEEE (2015) 24. Mousa, A.H., Shiratuddin, N., Bakar, M.S.A.: Virtual data mart for measuring organizational achievement using data virtualization technique (KPIVDM). J. Teknologi 68(3), 2932 (2014) 25. Muniswamaiah, M., Agerwala, T., Tappert, C.: Data virtualization for analytics and business intelligence in big data. In: CS & IT Conference Proceedings (2019) 26. Offia, C.E.: Using logical data warehouse in the process of big data integration and big data analytics in organisational sector. Ph.D. thesis, University of the West of Scotland (2021) 27. Oussous, A., Benjelloun, F.Z., Lahcen, A.A., Belfkih, S.: Big data technologies: a survey. J. King Saud Univ.-Comput. Inf. Sci. 30(4), 431–448 (2018) 28. Papadopoulos, T., Balta, M.E.: Climate change and big data analytics: challenges and opportunities. Int. J. Inf. Manage. 63, 102448 (2022) 29. Raguseo, E.: Big data technologies: an empirical investigation on their adoption, benefits and risks for companies. Int. J. Inf. Manage. 38(1), 187–195 (2018) 30. Reinsel, D., Gantz, J., Rydning, J.: The digitization of the world from edge to core. International Data Corporation, Framingham, p. 16 (2018) 31. Sarkar, P.: Data as a service: a framework for providing reusable enterprise data services. John Wiley & Sons (2015) 32. Saito, K., Maita, N., Watanabe, Y., Kobayashi, A.: Data virtualization for data source integration. 
IEICE Technical Report; IEICE Tech. Rep. 116(137), 37–41 (2016) 33. Shraideh, M., Gottlieb, M., Kienegger, H., Böhm, M., Krcmar, H., et al.: Decision support for data virtualization based on fifteen critical success factors: a methodology. In: MWAIS 2019 Proceedings (2019) 34. Skluzacek, T.J.: Automated metadata extraction can make data swamps more navigable, Ph. D. thesis, The University of Chicago (2022) 35. Stein, B., Morrison, A.: The enterprise data lake: better integration and deeper analytics. PwC Technol. Forecast: Rethinking Integr. 1(1–9), 18 (2014) 36. Zaidi, E., Menon, S., Thanaraj, R., Showell, N.: Magic quadrant for data integration tools. Technical report G00758102, Gartner, Inc. (2022)
Extracting Knowledge from Pharmaceutical Package Inserts Cristiano da Silveira Colombo1,2(B)
, Claudine Badue1
, and Elias Oliveira1
1 High Performance Computing Laboratory, Graduate Program in Computer Science, Federal
University of Espírito Santo, Vitória, Brazil [email protected] 2 Federal Institute of Espírito Santo, Cachoeiro de Itapemirim, Brazil
Abstract. This work aims to present and describe the methodology used for automatic Information Extraction from pharmaceutical package inserts based on Named Entity Recognition and the relationships among the recognized entities. For this, an Artificial Intelligence model was used, built from a hybrid approach based on Conditional Random Fields (CRF) and Local Grammar (LG), called CRF + LG. The results obtained were, in F1 measure, 94.85% in the extraction of entities of the ABS category (diseases and their symptoms) and 68.63% in the extraction of entities of the TNG category (drugs). The results showed that the initiative presented in this work could help health professionals make decisions about therapies and drug prescriptions and clear patients' doubts. In addition, the time and human effort needed to perform Information Extraction from pharmaceutical package inserts are considerably reduced. Keywords: CRF + LG · Named Entity Recognition · Pharmaceutical Package Inserts
1 Introduction Pharmaceutical package inserts are documents that contain information about drugs, such as their chemical composition, dosage, adverse reactions, interactions with other drugs, and their use in the treatment of certain diseases, among other information. The package inserts are found on medication packaging and in an electronic version, available on the Electronic Bularium website1 . The purpose of this work is to present and describe the methodology used for the automatic Information Extraction (IE) from pharmaceutical package inserts. For this, an Artificial Intelligence (AI) model was used, built from a hybrid approach based on Conditional Random Fields (CRF) and Local Grammar (LG), called CRF + LG. The results obtained were, in F1 measure, 94.85% for entities in the ABS category (diseases) and 68.63% for entities in the TNG category (drugs). 1 https://consultas.anvisa.gov.br/#/bulario/.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 427–436, 2023. https://doi.org/10.1007/978-3-031-35501-1_43
428
C. da Silveira Colombo et al.
The results showed that the IE task could help health professionals make decisions about therapies and drug prescriptions and clear patients' doubts. In addition, the time and human effort needed to perform the IE of the pharmaceutical package inserts are considerably reduced with the use of CRF + LG. This article is organized as follows: Sect. 2 presents information on pharmaceutical package inserts; Sect. 3 addresses Named Entity Recognition (NER) and the hybrid model CRF + LG; Sect. 4 discusses related works; Sect. 5 describes the methodology used; Sect. 6 describes the experiments performed and their results; Sect. 7 presents the conclusions of this work.
2 Pharmaceutical Package Inserts Pharmaceutical package inserts are documents that accompany medicines in their packaging and present technical information and guidelines for the proper use of the medicines. Besides the printed inserts in the medication packages, it is possible to obtain them through the Electronic Bularium (see Footnote 1), an electronic version of the package inserts available online. The purpose of package inserts is to clarify, inform, and promote the rational use of the drug [16]. However, this function of package inserts can be compromised because their texts do not present adequate information that is easily understood by most drug users [5].
3 Named Entity Recognition and CRF + LG Named Entity Recognition (NER) aims to automatically identify and classify entities such as people, places, and organizations, and is an essential task in Information Extraction. The approaches used in developing NER systems are linguistic, machine learning, or hybrid [14]. In addition to classifying named entities in text into predefined categories, the NER task is also applied in other Natural Language Processing (NLP) tasks. For example, when performing relation extraction, another branch of NLP, locating the target entities in the text is usually the step before relation classification [3]. In this article, the hybrid model CRF + LG [14] was used, which relies on Conditional Random Fields (CRF), a probabilistic framework for segmenting and labeling sequence data proposed by [8]. In this hybrid model, the authors built a Local Grammar (LG) to help the CRF recognize the 10 categories of named entities from HAREM2 . Therefore, CRF + LG combines the labeling obtained by a linear-chain CRF with a classification obtained through the LG. The LGs perform a pre-labeling that captures general evidence of named entities in the texts, and the CRF performs sequential labeling using this pre-labeling. The pre-labeling is sent to the CRF along with the other input features and can be seen as a hint to the CRF. 2 http://www.linguateca.pt/HAREM/.
In this research, the categories ABSTRACCAO (ABS) and THING (TNG) are of interest, indicating, respectively, entities related to the names of diseases (their symptoms and terms related to the patient's health status) and the names of medicines (and chemical or similar substances).
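The pre-labeling mechanism described above can be sketched in a few lines. In this illustrative Python fragment, a small gazetteer stands in for the Local Grammar, and its output is emitted as one extra token feature of the kind a linear-chain CRF implementation would consume; the feature names and the toy dictionary are assumptions for illustration, not part of the original CRF + LG code:

```python
# Sketch: an LG-style pre-labeler whose output becomes a CRF input feature.
# A gazetteer lookup stands in for the Local Grammar used by CRF + LG.
GAZETTEER = {"esogastro": "TNG", "heartburn": "ABS", "medicine": "TNG"}

def pre_label(token):
    # General evidence captured by the "grammar"; O means no evidence found.
    return GAZETTEER.get(token.lower(), "O")

def token_features(tokens, i):
    # Standard CRF token features plus the LG pre-label as a hint.
    return {
        "word": tokens[i].lower(),
        "is_title": tokens[i].istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "pre_label": pre_label(tokens[i]),  # the hint passed to the CRF
    }

sentence = "ESOGASTRO relieves symptoms of heartburn".split()
feats = [token_features(sentence, i) for i in range(len(sentence))]
print([f["pre_label"] for f in feats])  # → ['TNG', 'O', 'O', 'O', 'ABS']
```

In a full pipeline these feature dictionaries would be fed, together with gold labels, to a linear-chain CRF trainer; the CRF then learns how much weight to give the pre-label relative to the other features.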
4 Related Works In this section, works related to the proposal of this research are presented. A neural network composed of a BiLSTM with a sequential CRF layer, in which different word embeddings (classical, contextualized, and medical) were combined as input to the architecture, is presented in [10]. The dataset used was the Spanish Clinical Case Corpus (SPACCC), created from a collection of 1000 clinical cases from SciELO. The neural network recognized chemical substances and drugs, obtaining 90.77% of F1 measure. This result is superior to ours for drug recognition (68.63%). However, our result is encouraging, since a smaller number of documents was used in the training phase compared to the one used by [10]. In future works, we intend to use word embeddings to verify their impact on the results. The use of two CRF algorithms to extract various terms of interest from the contraindication and dosage sections of patient inserts in the Romanian language, and later their use in the creation of decision support applications, was described in [4]. The tools used were Stanford NER and Scikit-learn. The package inserts were structured in sections to extract the important entities in each of them. After the tests, the best results were obtained with the CRF algorithm from the Scikit-learn library. On the information in the contraindication section, Scikit-learn recognized drug entities with just over 20% of F1 measure. In the recognition of disease-related entities, Scikit-learn obtained almost 80% of F1 measure. In the approach presented in our work, the texts of the package inserts were used in full in the training and testing phases, in addition to the clinical case reports in the training phase. The results obtained in the NER task for drugs (68.63%) and diseases (94.85%) surpass the results obtained by [4], demonstrating the potential of the CRF + LG model. 
In [19], a new publicly available Portuguese BERT-based model to support clinical and biomedical NLP tasks, denominated BioBERTpt, is proposed. The NER experiments showed that, compared to out-of-domain contextual word embeddings, BioBERTpt reaches the state of the art on the CLINpt corpus, obtaining 92.6% of F1 measure. Additionally, it has better performance for most entities analyzed on the SemClinBr corpus, obtaining 60.4% of F1 measure. We carried out an experiment to process the CLINpt dataset to evaluate the performance of the CRF + LG model against the results obtained by BioBERTpt. It was not possible to present the results in this work due to differences in the annotation of the CLINpt dataset. One of them is the number of categories, which is 14. Among these categories, Therapeutics is the closest to those used in our work, ABS (diseases and symptoms) and TNG (drugs). In future works, we will make the necessary adjustments to verify the performance of CRF + LG against BioBERTpt in processing the CLINpt dataset. It is also important to remember that the idea of CRF + LG is to improve the performance of NER systems that use the machine learning approach using less training
corpus. The LG can also be used to improve the performance of systems such as BiLSTM-CRF or BERT-based models [20].
5 Methodology In this section, the working methodology of this research is presented. With the help of NLP tools, in particular NER, an AI model was created based on a hybrid approach called CRF + LG. To achieve this objective, the methodology shown in Fig. 1 was used.
Fig. 1. Steps of the work methodology.
Initially, package inserts were downloaded in PDF format from the Electronic Bularium (step 1). The obtained files were converted to plain text format (step 2). Subsequently, the package inserts were pre-processed in order to guarantee their preparation and cleaning, such as the removal of special characters, blank spaces, and excess line breaks (step 3). Considering the number of package inserts obtained and the variety of therapeutic classes, it was decided to select 15 package inserts for the model training phase (step 4). The drug names in these package inserts are: Amoxicillin, Ranitidine Hydrochloride, Esogastro, Gastrium, Label, Iniparet, Laflugi, Omepramix, Pyloripac, Ziprol, Sodium Chloride, Levothyroxine Sodium, Losartan Potassium, Trastuzumab and Botox. In 4 experiments, 10 clinical case reports from SciELO were included in the training phase. These case reports address the following topics: hemodialysis [6], liver failure [11], alcohol intoxication [1], lung injury [13], cardiac arrest [9] and adverse reactions [2, 7, 15, 17, 18]. This strategy was used to investigate the impact of these texts on improving the training and, consequently, the results. An example of a clinical case excerpt used is shown below:
Male patient, physician, 30 years old, measuring 1.73 m in height and weighing approximately 92 kg, started treatment for obesity with an endocrinologist who prescribed sibutramine 10 mg, in a single dose a day, in the morning, in addition to a balanced diet [18]. To carry out the test experiments, 10 package inserts of drugs used to treat diseases of the digestive system were selected (step 4). The package inserts used in training were not used in the tests. The names of the drugs used in the treatment of diseases related to the digestive system, whose package inserts were used in the tests, are: Lanzopept, Novocilin, Ocylin, Omeprazole, Pantozol, Prazol, Pyloripac, Ranitidine, Teutozol, Ziprol, Gelmax and Plasil. Once the package inserts were selected for training and testing, the manual annotation of the package inserts was started in order to carry out the training that creates the model. This annotation was performed by a human expert, using the Etiquet(H)arem3 tool (step 5). For an example of the annotation made in this step, consider the following excerpt from the package insert of the medication Esogastro: ESOGASTRO is indicated for the treatment of acid-peptic disorders and relief of symptoms of heartburn, acid regurgitation and epigastric pain. [...] Do not use medicine without your doctor's knowledge. In this excerpt, the highlighted entities were annotated by the human expert in the respective categories: ESOGASTRO (TNG), acid-peptic disorders (ABS), heartburn (ABS), acid regurgitation (ABS), epigastric pain (ABS) and medicine (TNG). Then, the CRF + LG model was created based on the file containing all the annotations made by the human expert (step 6). Subsequently, the model was used to automatically annotate the package inserts selected for the tests (step 7). All entities classified by the model were counted, and a file was generated with the recognized entities (tokens) and the respective annotation class (step 8). 
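An annotated excerpt such as the one above can be serialized into the token-per-line BIO representation used by CoNLL-style tools. The sketch below is a simplification (whitespace tokenization, hand-picked spans), not the actual Etiquet(H)arem output format:

```python
# Sketch: turn a span-annotated sentence into BIO-tagged tokens,
# the per-token representation consumed by CoNLL-style evaluation.
def to_bio(tokens, entities):
    # entities: list of (start_token, end_token_exclusive, category)
    tags = ["O"] * len(tokens)
    for start, end, cat in entities:
        tags[start] = f"B-{cat}"          # beginning of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{cat}"          # continuation of the entity
    return list(zip(tokens, tags))

tokens = "ESOGASTRO is indicated for relief of epigastric pain".split()
entities = [(0, 1, "TNG"), (6, 8, "ABS")]  # illustrative spans
for tok, tag in to_bio(tokens, entities):
    print(f"{tok}\t{tag}")
```

Here the single-token drug name gets `B-TNG`, while the two-token symptom "epigastric pain" gets `B-ABS` followed by `I-ABS`; everything else is tagged `O`.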
It is important to highlight that in this step, data dictionaries were generated with the entities classified by the model, that is, a dictionary with the entities classified as ABS and another with the TNG entities. To evaluate the number of correct answers in the classifications made by the model, the CoNLL-2002 evaluation script4 was used. It evaluates the classification task and presents the metrics of the experiments. As input, this script receives two files: one with the entities annotated by the model and another with the entities annotated by the human expert. As only the file with the entities classified by the model was obtained, it was necessary to produce the file with the entities classified by the human expert. For this, a copy of the file generated by the model was created, and, in this file, the human expert reviewed the automatically generated classifications, making the necessary adjustments in order to correct the errors. Some examples of adjustments performed by the human expert are listed in Table 1. At the end of this process, the file with the classifications made by the human expert was obtained (step 9). This is a constraint to be overcome, as it requires active 3 http://www.linguateca.pt/poloCoimbra/recursos/etiquetharem.zip. 4 http://www.cnts.ua.ac.be/conll2002/ner/bin/conlleval.txt.
human participation, since it is necessary to manually evaluate the model classifications and adjust them if necessary. With these 2 files, one with the model classifications and the other with the human classifications, both were processed by the CoNLL script, and the results of the experiments were generated (step 10).
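The metrics produced in step 10 can be reproduced in a few lines. The sketch below scores exact entity matches, with each entity represented as a (start, end, category) tuple, the way the CoNLL-2002 script does at the entity level; it is an illustrative simplification of conlleval, not a replacement for it:

```python
# Sketch: exact-match entity precision, recall and F1,
# the metrics reported by the CoNLL-2002 evaluation script.
def prf1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # entities matching in both span and category
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {(0, 1, "TNG"), (6, 8, "ABS"), (10, 11, "ABS")}   # human annotations
pred = {(0, 1, "TNG"), (6, 8, "ABS"), (12, 13, "TNG")}   # model annotations
p, r, f1 = prf1(gold, pred)
print(round(p, 2), round(r, 2), round(f1, 2))  # → 0.67 0.67 0.67
```

Because the criterion is exact match, an entity with the right category but a slightly wrong span counts as both a false positive and a false negative, which is why recall for long drug names tends to suffer.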
6 Experiments and Results In this section, the experiments performed are described and the results obtained are analyzed. Initially, 5 package inserts were manually annotated by a specialist and, later, 10 clinical cases and another 10 package inserts. Annotation by the human expert means that s/he manually classified the entities of interest found in the package inserts. This procedure made it possible to use these documents in the training of the model and to obtain the results that are presented in this section. With the package inserts manually annotated by the human expert, 5 experiments were performed, which are listed in Table 2. The training of experiments 1 and 2 was performed with 5 package inserts: Amoxicillin, Ranitidine Hydrochloride, Esogastro, Gastrium and Label. In the training of experiments 3 and 4, 10 package inserts were used: Amoxicillin, Ranitidine Hydrochloride, Esogastro, Gastrium, Label, Iniparet, Laflugi, Omepramix, Pyloripac and Ziprol. In the training of experiment 5, 15 package inserts were used: Amoxicillin, Ranitidine Hydrochloride, Esogastro, Gastrium, Label, Iniparet, Laflugi, Omepramix, Pyloripac, Ziprol, Botox, Sodium Chloride, Levothyroxine Sodium, Losartan Potassium and Trastuzumab. As can be seen, this experiment included 5 package inserts that are not for the treatment of diseases of the digestive system: Botox (blepharospasm, muscle spasticity and hyperhidrosis), Sodium Chloride (decongestant, vehicle for various injectable drugs, wound cleaning, among others), Levothyroxine Sodium (thyroid hormones), Losartan Potassium (hypertension) and Trastuzumab (treatment of breast cancer and advanced gastric cancer). These package inserts are featured in the Statistical Yearbook of the Pharmaceutical Market – Commemorative Edition 2019/20205 , in the revenue rankings by active ingredient (Table 26) and by the number of commercialized presentations by active ingredient (Table 27). 
They were included in the training of these experiments to verify the impact of these types of inserts on the performance of the model in the testing phase. It is worth mentioning that in experiments 2, 4, and 5, in addition to the package inserts mentioned above, the 10 clinical cases obtained from SciELO were used in the training phases. All experiments used 10 package inserts in the testing phases. In experiments 1 and 2, the package inserts tested were Lanzopept, Novocilin, Ocylin, Omeprazole, Pantozol, Prazol, Pyloripac, Ranitidine, Teutozol, and Ziprol. In experiments 3, 4, and 5, the package inserts tested were Gelmax, Lanzopept, Novocilin, Ocylin, Omeprazole, Pantozol, Plasil, Prazol, Ranitidine and Teutozol. This change is justified for two reasons: first, 5 https://www.gov.br/anvisa/pt-br/assuntos/medicamentos/cmed/informes/anuario-estatistico-2019-versao-final.pdf.
Table 1. Human-made adjustments.

Model Annotations    Human Corrections
aids (TNG)           aids (ABS)
etambutol (ABS)      etambutol (TNG)
isoconazol (ABS)     isoconazol (TNG)

Table 2. Experiments carried out.

#   Training                                 Test
1   5 package inserts                        10 package inserts
2   5 package inserts + 10 clinical cases    10 package inserts
3   10 package inserts                       10 package inserts
4   10 package inserts + 10 clinical cases   10 package inserts
5   15 package inserts + 10 clinical cases   10 package inserts
because of the increase in the number of package inserts used in the training of experiments 3, 4 and 5, the inserts Pyloripac and Ziprol were included in the training of the model and replaced in the tests by the package inserts Gelmax and Plasil; second, to maintain a diversity of documents in training, thus avoiding the repetition of drug leaflets with similar active ingredients. The entities of interest in this work were those related to medicines and diseases. The entities referring to the names of diseases and terms related to the patient's health status were classified in the ABS category. In the TNG category, the entities referring to the names of medicines, chemical substances, and the like were classified. In Table 3, the results obtained in the experiments performed are presented. The results of experiment 2 demonstrate that the inclusion of 10 clinical cases in the training phase contributed to the improvement of the F1 measure of entities classified in the ABS and TNG categories with respect to the results of experiment 1. In the ABS category, the result went from 68.47% to 82.08%. Furthermore, in the TNG category, the result went from 58.99% to 59.14%. In experiments 3 and 4, which were trained with 5 package inserts more than experiments 1 and 2, it is possible to notice a considerable improvement in the results with respect to those obtained previously. The best result for the ABS category was 94.85% (experiment 4), higher than the 82.08% (experiment 2). The same happened with the TNG category: the best result was 66.16% (experiment 4) compared to 59.14% (experiment 2). Increasing the number of package inserts in the training phase improved the results of experiments 3 and 4, which were superior to those obtained in experiments 1 and 2. Finally, in the results of experiment 5, when 5 more package inserts were added in the training phase, it was noticed that the results were superior to those obtained by
Table 3. Results of experiments performed.

#   Category   P       R       F1
1   ABS        94.05   53.82   68.47
    TNG        94.78   42.82   58.99
2   ABS        96.95   71.17   82.08
    TNG        98.10   42.33   59.14
3   ABS        96.87   90.97   93.83
    TNG        97.51   49.28   65.47
4   ABS        98.00   91.90   94.85
    TNG        98.19   49.88   66.16
5   ABS        97.81   90.38   93.95
    TNG        97.63   52.91   68.63
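As a consistency check, each F1 value in Table 3 can be recomputed from the reported precision and recall via F1 = 2PR/(P + R); the small deviations (well under 0.05) come from rounding in the published figures:

```python
# Recompute F1 from the precision/recall pairs reported in Table 3.
def f1(p, r):
    return 2 * p * r / (p + r)

rows = [  # (P, R, reported F1)
    (94.05, 53.82, 68.47), (94.78, 42.82, 58.99),
    (96.95, 71.17, 82.08), (98.10, 42.33, 59.14),
    (96.87, 90.97, 93.83), (97.51, 49.28, 65.47),
    (98.00, 91.90, 94.85), (98.19, 49.88, 66.16),
    (97.81, 90.38, 93.95), (97.63, 52.91, 68.63),
]
for p, r, reported in rows:
    # Every reported F1 agrees with 2PR/(P+R) up to rounding.
    assert abs(f1(p, r) - reported) < 0.05, (p, r, reported)
print("all F1 values consistent")
```

The check also makes the recall bottleneck visible: for TNG, precision stays above 94% in every experiment while recall never exceeds 53%, so F1 for drugs is dominated by missed entities rather than wrong ones.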
experiments 1 and 2. However, concerning experiments 3 and 4, adding 5 more package inserts in training did not improve the result for the ABS category, whose best result remained that of experiment 4 (94.85%). The same did not occur for the TNG category, whose best result was obtained in experiment 5 (68.63%). This suggests two hypotheses to be investigated in future experiments: 1) the addition of new package inserts in the training phase can improve the performance of the model in the recognition of entities of the TNG category; and 2) the addition of new package inserts in the training phase may compromise the model's performance in recognizing entities of the ABS category.
7 Conclusions The main contribution of this work was the creation of an AI model capable of recognizing named entities in pharmaceutical package inserts, with F1 measures of 94.85% for the ABS category (diseases and their symptoms) and 68.63% for the TNG category (medicines). It is important to emphasize that these results derive from a model generated from the human annotation of pharmaceutical package inserts and clinical case reports. This demonstrates the potential of the hybrid CRF + LG model to classify package insert entities, even with smaller sets of documents in the training phase. As the experiments indicated, the automatic recognition of named entities in pharmaceutical package inserts is a viable task that reduces the human manual effort of classifying these entities. After all, manual data annotation is time-consuming and limited in size [3]. Considering that this model would save time, effort, and possibly financial resources that a human annotator would spend classifying the entities present in the package inserts, the initiative of this work is promising in this respect. In future work, we intend to improve the quality of training by increasing the number of package inserts used in this phase. This action aims to investigate the appropriate proportions of package inserts and clinical cases that improve the training of the model [12].
Another line of investigation is to verify whether the use of word embeddings can improve the results obtained [10]. Identifying adverse reactions is another direction for future work, since this action is the step following the proposal presented in this work. Once the drug, disease, and symptom entities have been extracted from the package inserts, it becomes possible to obtain information on adverse reactions and drug interactions. For this, we intend to frame this initiative within the concepts and studies of pharmacovigilance, which is defined as "the science and activities related to the identification, evaluation, understanding and prevention of adverse effects or any problems related to the use of medicines"6 .
References 1. Afonso, G.L., Cardoso, M.G.d.M., Coelho, I.P., Cardoso, B.G.d.M.: Intoxicação alcoólica aguda: complicação rara associada a neurólise do plexo celíaco durante procedimento cirúrgico a céu aberto em paciente com dor oncológica refratária. Relato de caso. Revista Dor 17(2), 145–147 (2016) 2. Agollo, M.C., Miszputen, J., Diament, J.: Hepatotoxicidade induzida por Hypericum perforatum com possível associação a copaíba (Copaifera langsdorffii Desf): relato de caso. Einstein 12(3), 355–357 (2014) 3. Chen, Y., et al.: Named entity recognition from Chinese adverse drug event reports with lexical feature based BiLSTM-CRF and tri-training. J. Biomed. Inf. 96(1) (2019) 4. Chirila, O.S., Chirila, C.B., Stoicu-Tivadar, L.: Named entity recognition for the contraindication and dosing sections of patient information leaflets with CRFClassifier tools. In: Proceedings of the 23rd International Conference on System Theory, Control and Computing (ICSTCC), pp. 866–871 (2019) 5. Dal Pizzoli, T.d.S., et al.: Medicine package inserts from the users' perspective: are they read and understood? Revista Brasileira de Epidemiologia 22(1), 1–12 (2019) 6. Duayer, I.F., et al.: Plaquetopenia relacionada à hemodiálise: relato de caso. Brazilian J. Nephrology, 1–5 (2021) 7. Freitas, D.S., Machado, N., Andrigueti, F.V., Neto, E.T.R., Pinheiro, M.M.: Hanseníase virchowiana associada ao uso de inibidor do fator de necrose tumoral: relato de caso. Rev. Bras. Reumatol. 50(3), 333–339 (2010) 8. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning (ICML 2001), pp. 282–289. ACM (2001) 9. Lopes, L.F., Monteiro, M.W., Berardinelli, L.M., de Oliveira, L.M.P., Silva, G.V.: Parada cardíaca após anestesia geral: relato de caso. Anestesia Analgesia Reanimación 30, 1 (2017) 10. 
López-Úbeda, P., Díaz-Galiano, M.C., Ureña-López, A., Martín-Valdivia, M.T.: Combining word embeddings to extract chemical and drug entities in biomedical literature. BMC Bioinformatics 22(1), 1–17 (2021) 11. Moss, J.L., Brown, B.W., Pai, S., Torp, K.D., Aniskevich, S.: Insuficiência hepática fulminante após transplante simultâneo de rim-pâncreas: um relato de caso. Rev. Bras. Anestesiol. 68(5), 535–538 (2018) 12. de Oliveira, J., Colombo, C. da S., Izo, F., Pirovani, J.P.C., de Oliveira, E.: Using CRF+LG for automated classification of named entities in newspaper texts. In: 2020 XLVI Latin American Computing Conference (CLEI) (2020) 6 https://www.gov.br/anvisa/pt-br/assuntos/fiscalizacao-e-monitoramento/pharmacovigilancia.
13. Pegler, J.R.M., Castro, A.P.B.M., Pastorino, A.C., de Dorna, M.B.D.: Lesão pulmonar aguda relacionada à transfusão associada com infusão de imunoglobulina intravenosa em paciente pediátrico. Einstein 18(5), 1–4 (2020) 14. Pirovani, J.P.C., de Oliveira, E.: CRF+LG: a hybrid approach for the Portuguese named entity recognition. In: Abraham, A., Muhuri, P.K., Muda, A.K., Gandhi, N. (eds.) ISDA 2017. AISC, vol. 736, pp. 102–113. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-76348-4_11 15. dos Santos, F.X., Parolin, A., Lindoso, E.M.S., Santos, F.H.X., de Sousa, L.B.: Hipertensão intracraniana com manifestações oculares associada ao uso de tetraciclina: relato de caso. Arq. Bras. Oftalmol. 68(5), 701–703 (2005) 16. Silva, M., et al.: Estudo da bula de medicamentos: uma análise da situação. Revista de Ciências Farmacêuticas Básica e Aplicada 27(3), 229–236 (2006) 17. Sucar, D.D.: Interação medicamentosa de venlafaxina com captopril. Rev. Bras. Psiquiatr. 22(3), 134–137 (2000) 18. Sucar, D.D., Sougey, E.B., Neto, J.B.: Surto psicótico pela possível interação medicamentosa de sibutramina com finasterida. Rev. Bras. Psiquiatr. 24(1), 30–33 (2002) 19. Schneider, E., et al.: BioBERTpt - a Portuguese neural language model for clinical named entity recognition (2020) 20. Pirovani, J.P.C., Oliveira, E.: Studying the adaptation of Portuguese NER for different textual genres. J. Supercomput. 77(11), 13532–13548 (2021). https://doi.org/10.1007/s11227-021-03801-9
Assessing the Importance of Global Relationships for Source Code Analysis Using Graph Neural Networks

Vitaly Romanov(B) and Vladimir Ivanov

Innopolis University, Innopolis, Russia
{v.romanov,v.ivanov}@innopolis.ru
Abstract. Representing source code as a sequence of tokens does not capture long-distance dependencies and inter-project dependencies. In this study, we analyze the extent to which inter-project (global) relationships can be used in machine learning tasks related to source code analysis. Our findings show that information implicitly stored in inter-project relationships can be used to select the next called function among candidates with an accuracy of 92%. We demonstrate that source code embeddings achieve the best performance on transfer learning tasks when they are computed with graph neural networks in a multitask mode. Keywords: source code · graph neural networks · graph attention networks · relational graph convolutional networks
1
Introduction
Lately, we have seen the rise of approaches that apply advances in Natural Language Processing to machine learning tasks for source code. However, learning algorithms still struggle to process the sequences of actions that are common in source code. Current approaches have proved quite successful in capturing co-occurrence statistics. For source code, such statistics can be extracted from global usage patterns at the package or inter-project level. Our goal is to study how useful this global information can be by evaluating machine learning tasks on graphs.

Source code describes the relationships between source code elements (SCEs), such as modules, functions, classes, variables, etc. The information about how particular SCEs are used in different projects can provide insights into which combinations of SCEs are commonly used together. The particular way an SCE is implemented becomes less important, and one can learn useful insights about the SCE's purpose merely by observing the way it is used.

The package-level and inter-project-level relationships can be represented using a graph. Such a graph can include different types of relationships, including call dependencies, type dependencies, import dependencies, variable usages,

c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 437–447, 2023. https://doi.org/10.1007/978-3-031-35501-1_44
438
V. Romanov and V. Ivanov
inheritance, and others. Because the relationships between SCEs are not constrained to the boundaries of particular functions, files, or even packages, such a graph captures global relationships and is, therefore, a global graph. The global graph representation is an alternative way to represent source code besides other structured representations, such as abstract syntax trees (ASTs). We evaluate the usefulness of this global graph representation by solving different machine learning problems for source code. In this work, we aim at assessing what kind of information can be inferred from high-level inter-project global relationships between SCEs. Moreover, we look into the transferability of the learned distributed representations for nodes in this graph to other problems.

Several papers have investigated what a model can learn from a particular source code representation. The authors of [2] used AST paths to predict the names of functions. In [1], a graph representation that relies on both global relationships and ASTs was used to find bugs. The authors of [6] used an inter-procedural flow graph to find similar functions inside the Linux kernel source code. In [10], information about the AST was used to predict variable names and types within a function. In contrast to these works, we focus on using only information about global relationships. Our contribution is the following:

– We evaluate the usefulness of a global graph on several machine learning tasks, including SCE name prediction, variable name usage prediction, and API call sequence prediction. For our experiments, we use Graph Neural Networks (GNNs);
– We explore the usefulness of a multitask objective using different GNN architectures;
– We demonstrate that the embeddings for SCEs pre-trained on one task can be applied to solve other tasks, such as function call prediction, type usage prediction, and node type prediction.
2
Related Work
Machine learning approaches are being rapidly adopted in many different areas, including source code analysis. The list of problems that have already been addressed includes bug detection [7,12] and API search [11,13]. These problems present a fundamental challenge for source code analysis methods because they require an understanding of the program's purpose. One important aspect of learning problems is the representation of the input data. In the area of NLP, the input is represented as a sequence. Although source code is a sequential representation of a program, its execution nature is far from sequential. For this reason, researchers have explored more suitable representation formats for source code. Here, we consider structural representations such as trees (ASTs) and graphs (dependence, control-flow, data-flow, etc.). One of the possible research questions is which representation formats are useful for solving downstream problems.
Assessing Importance of Global Relationships for Source Code
439
In [10], the authors represented a program in the form of an AST. They addressed the problem of predicting variable names in an obfuscated program. Their approach, based on CRFs, was highly successful, and their method was able to annotate the names of the variables by analyzing only the program structure. This approach provides an insight into the information that is implicitly stored inside a program's AST. In [1], a graph-based representation of C# source code was studied. The authors used both global-level relationships and AST edges. This graph included relationships such as inheritance, definitions, variable usages, and typical data-flow dependencies. The authors used GNNs to address issues in source code analysis, such as variable name suggestion and bug detection. Another approach that adopted a graph representation for source code was presented in [5]. The authors used information from the AST and global relationships for solving the variable name suggestion problem. These two works combine the AST of a function with inter-project information. However, the authors did not study the impact of different relationships on the final result. Some other approaches relied on information from the LLVM compiler to build graph representations [3,4]. However, LLVM-based toolchains are available only for a handful of programming languages and, in general, are less interpretable. In contrast, our goal is to perform the analysis at the level of source code directly, so that a programmer can easily interpret the results. In [8], the authors studied the possibility of pre-training representations for source code using one of the latest language modeling architectures, BERT. Their results showed that when using pre-trained layers, the model quickly adapts to the new problem and provides a significant boost in performance. The authors demonstrated that pre-training with a language model can significantly decrease training times and increase accuracy.
We are interested in a different aspect of pre-training. Given an objective function, the model learns useful features for this objective. However, the same features can be helpful for other objectives as well. Our goal is to investigate the extent to which representations of SCEs learned on one task can be used for solving other tasks.
3
Main Task Description
3.1
Global Graph Description
In this work, we represent the source code with a global graph. We used the Sourcetrail tool (https://www.sourcetrail.com/) to index a collection of Python packages. All packages are interconnected either through inter-package calls and imports, or through the use of the same built-in functions of the Python language. The entire package collection is compiled into a graph with different types of nodes and edges. The counts for node and edge types are given in Table 1. The central aspect of our source code graph is directed typed relationships. Five different edge types are available for Python. Call Edges are present
between two functions if one function calls another function inside its body. Define/Contain Edges are present between modules and the functions or classes that are defined inside these modules, and between classes and class methods. Type Use Edges appear between functions and classes to represent that a specific class (type) was mentioned inside the body of the function. Import Edges appear when a module or a function is imported. Inheritance Edges are present between classes if one inherits from the other. The nodes in our graph also have different types, which are self-explanatory (see Table 1). However, node types were not used when training our models due to the limitations of our computational capacity. An example of a graph compiled using Sourcetrail is shown in Fig. 1.
Fig. 1. Example of global graph constructed from two toy modules.

Table 1. Statistics for node types and edge types in the source code graph

Node Type            Count     Edge Type        Count
Function             221,822   Call             614,621
Class field          83,077
Class                35,798    Type use         239,543
Module               18,097    Import           121,752
Class method         14,953    Inherit          26,525
Non-indexed symbol   853       Define/Contain   431,115
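A global graph with the node and edge types from Table 1 can be represented minimally as typed triples. The concrete encoding below (a dict of node types plus a list of (source, edge type, destination) triples, with dotted names as node identifiers) is our own illustrative choice, not the Sourcetrail export format:

```python
# Minimal typed-graph representation: nodes carry a type label, edges are
# (source, edge_type, destination) triples, mirroring the graph of Table 1.
nodes = {
    "pkg.mod":         "Module",
    "pkg.mod.Foo":     "Class",
    "pkg.mod.Foo.run": "Class method",
    "pkg.mod.helper":  "Function",
}
edges = [
    ("pkg.mod", "Define/Contain", "pkg.mod.Foo"),
    ("pkg.mod.Foo", "Define/Contain", "pkg.mod.Foo.run"),
    ("pkg.mod.Foo.run", "Call", "pkg.mod.helper"),
]

def out_neighbors(node, edge_type):
    """Destinations of typed edges leaving `node` (e.g. the functions it calls)."""
    return [dst for src, et, dst in edges if src == node and et == edge_type]

assert out_neighbors("pkg.mod.Foo.run", "Call") == ["pkg.mod.helper"]
```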
3.2
Description of Learning Objectives
Our goal is to evaluate a GNN model trained on a global source code graph. We do this by training the model on several objectives. The first objective is SCE name prediction, which includes predicting function names, class names, variable names, etc. The premise for this task is that SCEs with similar names have similar usage patterns. The second objective is predicting the names of variables that are used inside a function. Very often, variable names explain the purpose of a function or, at the very least, the topical area to which the function belongs. We expect that functions that implement similar functionality, or belong to the same package, are likely to use similar variable names. The third objective is predicting the next function to be called after the current function. In order to predict the next call, the model should learn common usage patterns for functions, e.g., usage with built-in functions, functions from the same package, and functions from other packages. All three objectives can be treated as link prediction problems. During training, the embeddings of two nodes are passed to the classifier for link prediction. The two nodes specify the source and destination of a directed edge. Link prediction is implemented by concatenating the embeddings of the source and destination of the edge and passing the resulting vector to a binary classifier. We implement this classifier as a simple neural network. We use all three objectives explained above to train several GNN models to see to what extent the information about global relationships is useful for optimizing these objectives. A schematic representation of the training procedure is shown in Fig. 2.
Fig. 2. Schematic representation of the multitask training procedure. The final neural network makes a binary decision whether the edge exists or not.
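The link-prediction step described in Sect. 3.2 (concatenate the source and destination embeddings, pass the result through a small binary classifier) can be sketched as follows. This is a minimal pure-Python sketch; the layer sizes in the toy usage are scaled down from the paper's dimensions, and all weight values are illustrative:

```python
import math
import random

def link_probability(src_emb, dst_emb, w1, b1, w2, b2):
    """Score a directed edge: concatenate the two node embeddings and run a
    one-hidden-layer binary classifier (ReLU hidden layer, sigmoid output)."""
    x = src_emb + dst_emb                                   # concatenation
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]                    # ReLU hidden layer
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))                   # edge-existence probability

# Toy usage: 4-d embeddings -> 8 classifier inputs, one hidden layer of 3 units
random.seed(0)
src = [random.gauss(0, 1) for _ in range(4)]
dst = [random.gauss(0, 1) for _ in range(4)]
w1 = [[random.gauss(0, 0.1) for _ in range(8)] for _ in range(3)]
b1 = [0.0] * 3
w2 = [random.gauss(0, 0.1) for _ in range(3)]
b2 = 0.0
p = link_probability(src, dst, w1, b1, w2, b2)
assert 0.0 < p < 1.0
```

In training, this probability would be compared against a binary label (edge present or sampled negative) with a cross-entropy loss.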
3.3
GNN Models
Graph Neural Networks (GNN) are based on a message-passing mechanism. Each node possesses an internal state. During message-passing, the node sends
its state to its neighbors. The neighbors aggregate the messages from adjacent nodes. Usually, the messages are passed over the entire network for a fixed number of steps. The message-passing steps are treated as network layers. The final node state is passed to the classifier that predicts links. The initial representation of a node is a context-free embedding. Embeddings from the consecutive layers can be viewed as contextual. The size of the context depends on the number of layers. The best context size for solving the link prediction task is subject to exploration. In our experiments, we explore two different GNN architectures. The first is the Graph Attention Network (GAT). This architecture does not support different types of relationships and treats all of them identically. The second architecture is the Relational Graph Convolutional Network (RGCN). We chose these GNN models for our experiments because, at the time, they were considered among the most suitable for processing graphs.
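A single message-passing step of the kind described above can be sketched as follows. This is a deliberately minimal mean-aggregation update; real GAT layers weight neighbors by attention, and RGCN layers apply a separate transform per relation type, both of which are omitted here:

```python
def message_passing_step(states, adjacency, transform):
    """One GNN layer: each node averages its neighbors' state vectors,
    then applies a shared transform followed by a ReLU nonlinearity."""
    def mean(vectors):
        n = len(vectors)
        return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

    new_states = {}
    for node, neighbors in adjacency.items():
        # An isolated node falls back to its own state instead of an empty mean.
        msg = mean([states[n] for n in neighbors]) if neighbors else states[node]
        new_states[node] = [max(0.0, x) for x in transform(msg)]
    return new_states

# Toy graph: node 0 listens to 1 and 2; node 1 listens to 2; node 2 is isolated
states = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}
adjacency = {0: [1, 2], 1: [2], 2: []}
identity = lambda v: v   # stand-in for a learned linear layer
out = message_passing_step(states, adjacency, identity)
assert out[0] == [1.0, 1.5]   # mean of [0, 1] and [2, 2]
assert out[2] == [2.0, 2.0]   # isolated node keeps its own state
```

Stacking k such steps gives each node a view of its k-hop neighborhood, which is why the number of layers determines the context size discussed above.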
4
Experiments for Transfer Learning
One of our goals is to study the transferability of the learned representations for nodes to different tasks. For this reason, we use node embeddings trained with the objectives described in Sect. 3.2 for predicting some properties of these nodes. The list of experiments includes:

1. Predicting SCE name. This task is an exact copy of one of the pre-training objectives. If the pre-training objective was to predict SCE names, this experiment is expected to show nearly identical performance. Moreover, this task allows evaluating whether embeddings trained on predicting next calls are useful for predicting names as well.
2. Predicting variable name usages. The goal of this task is to predict the variable names that were used inside a given function.
3. Predicting next function. In this experiment, the task is to predict the edge between functions A and B that shows that function B was called after function A.
4. Predicting call links. In this experiment, the goal is to predict the existence of call edges between two nodes. Training data is taken from the holdout set, which contains edges that were not used during training.
5. Predicting type usage. The edges for type usage are taken from the holdout edges. These edges show that a specific type (class) was mentioned inside the body of a function.
6. Predicting node types. For this task, a simple node embedding classifier is used. Node types are not used during the training of graph embeddings and instead serve as labels for this task.

The experiments are performed only on nodes that were previously reserved for the test set. The embeddings for nodes are precomputed and remain fixed during the experiments. This ensures that we test what can be extracted from these embeddings instead of using them merely for initialization.
5
Experimental Setup and Results
For implementing the GNN training procedure, we use DeepGraphLibrary (https://www.dgl.ai/). We use standard implementations of the GAT and RGCN networks. For GAT, we use a hidden representation of size 50 with two-head attention, which results in a hidden state of size 100. In all layers, we use LeakyReLU as the activation function. For RGCN, we use a hidden state of size 100. HardTanh is chosen as the activation function. Both network types have three layers in total. The embeddings for SCE names and variable names reside in separate embedding tables. In both cases, the embeddings have size 100. We use a simple neural network as a binary classifier. Since it accepts concatenated vectors as input, its input dimensionality is 200. This network has one hidden layer with 20 units. ReLU is chosen as the activation function for the classifier layers. We trained the GNN models for 60 epochs. The negative sampling procedure is similar to the one used in [9]. During the training of a GNN model, we sample 3 negative edges per positive edge. During the additional experiments, the numbers of positive and negative edges are equal. 5.1
Performance on Main Objective
Our training and evaluation procedure consists of several steps. In the following, we describe the results of training on the main objectives. The three main objectives that we explored are described in Sect. 3.2. Our GNN models struggle with SCE name prediction the most. In our global graph, the only information that a predictor can use is the degree of a node and the types of its relationships. Additionally, it can gain insight into a small neighborhood at a distance of 3 from the current node (based on the number of GNN layers). Accuracy is higher for models trained without the multitask objective, which optimize specifically for name prediction. Variable name usage prediction is a much more relaxed problem. Variable names are highly reused in different contexts. For this objective, we see a consistent improvement as model complexity increases. The outcome is better for models trained on the multitask objective and models that support relationship types. The last objective is to predict which function will be called next. The performance for this objective is the highest, which suggests that the models were able to learn call sequence paths to some degree. The outcome was better for the relational model and improved significantly when trained on the multitask objective. These results demonstrate that the global graph contains information that can be used for learning API call sequences, as well as some information about SCE names and variable usages. The accuracy for these objectives is not very high
(except for the next call predictions), which suggests that global graph relationships can be used for augmenting other representations, but are likely not sufficient on their own for solving practical problems. A summary of these results is shown in Fig. 3.
Fig. 3. Accuracy on the validation set when training main objectives on the global graph. The color of a bar represents the model architecture that was used during training. During multitask, all three objectives are optimized at the same time.
5.2
Transfer Learning Experiments
In this part, we discuss the performance of node embeddings on transfer learning tasks. We conducted a series of experiments and compared the performance of pre-trained embeddings with randomly initialized embeddings for SCEs. When training models for our experiments, we passed the test scores after each epoch through an exponential moving average (of length 10) and recorded only the maximum test scores. The results of these experiments are shown in Table 2. For the first three experiments, we can see that the accuracy is higher when the embeddings are trained on the same objective. This result is challenged only by the multitask objectives. The fact that the multitask objective results in an average increase in accuracy suggests that these tasks are not entirely independent. The problem of predicting SCE names is a hard one, and the result barely changes when a different pre-training objective is used. The results on variable usage prediction suggest that there is an improvement for both GAT and RGCN when the multitask objective is used. The accuracy of predicting the next call grows significantly when the multitask objective and relationship types are used. The results for predicting function calls correlate with the results of next call prediction. The accuracy for predicting type usage varies a lot and does not display any specific improvement for different pre-training objectives. This can be attributed to the small test size and a large variance in the test scores. The results show that the difference between GAT and RGCN is very small when trained with the multitask objective. A side-by-side comparison of the multitask models is shown in Fig. 4.
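The score-tracking procedure above (smooth the per-epoch test scores with an exponential moving average, then keep the maximum) can be sketched as follows. The smoothing constant alpha = 2/(length+1) is a common EMA convention and our own assumption, not necessarily the authors' exact choice:

```python
def smoothed_max(scores, length=10):
    """Pass per-epoch test scores through an exponential moving average of the
    given length and return the maximum smoothed score."""
    alpha = 2.0 / (length + 1)          # assumed EMA smoothing constant
    ema, best = None, float("-inf")
    for s in scores:
        ema = s if ema is None else alpha * s + (1 - alpha) * ema
        best = max(best, ema)
    return best

assert smoothed_max([1.0, 1.0, 1.0]) == 1.0
assert smoothed_max([0.0, 10.0]) < 10.0   # smoothing damps a single spike
```

Reporting the maximum of the smoothed curve, rather than of the raw scores, reduces the influence of one-epoch outliers on the reported numbers.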
Table 2. Results of using pre-trained embeddings on different tasks

Model            SCE name  Var usage  Next call  Call   Type usage  Node type  Average
Random init      52.6      64.81      60.75      52.84  53.38       46.45      55.13
GAT, SCE name    63.6      74.86      82.29      65.04  57.81       65.51      68.18
GAT, Var usage   61.8      75.3       81.7       71.19  67.28       58.01      69.21
GAT, Next call   60.13     72.91      87.82      72.86  69.29       52.6       69.26
RGCN, SCE name   63.27     76.28      86.37      65.07  64.67       81.84      72.91
RGCN, Var usage  63.73     78.33      89.09      73.24  68.72       80.9       75.66
RGCN, Next call  63.02     76.62      92.22      74.54  66.76       77.73      75.14
GAT-MT           62.79     78.17      92.8       80.03  65.51       63.43      73.78
RGCN-MT          64.48     78.88      91.54      74.44  63.9        81.62      75.81
Fig. 4. Accuracy comparison for different types of multitask models. RGCN-5L corresponds to a GNN model with 5 layers
5.3
Comparing Performance of Embeddings from Different Layers
An intuitive question to ask is whether additional GNN layers provide any benefit for learning SCE embeddings. To investigate the impact of the number of layers, we repeated our experiments using embeddings from different layers of the GNN. The results for RGCN with 3 layers are shown in Fig. 5 (left half). We observe that the accuracy on the different experiments improves rapidly and consistently with every layer (except for type usage prediction). Three layers of GNN correspond to two message-passing steps. Therefore, a GNN with three layers can model only short-distance dependencies. Since every layer brings improvements in accuracy, a natural urge is to add more layers. However, our experiments show that adding more layers is not guaranteed to bring better results. In Fig. 5 (right half), the accuracy for embeddings from RGCN with five layers is shown. The average accuracy did not exceed that of the 3-layered model. Moreover, we can observe that the improvements between consecutive message-passing steps became smaller. The fact that all three models shown in Fig. 4 reach similar performance (despite their architectural
Fig. 5. Accuracy for embeddings from different layers of RGCN-3L (left) and RGCN-5L (right) trained on the multitask objective.
differences) suggests that the accuracy for these models in the current setting is close to the maximum. Another explanation for why models with more layers do not achieve better accuracy is related to the general tendency of GNNs to perform better with fewer layers [14].
6
Conclusions and Future Work
In this paper, we assessed the value of global relationships in source code, represented as a graph, for solving several tasks. We also tested the possibility of transferring learned representations of SCEs (classes, functions, variables, etc.) to other tasks. We performed this analysis by first pre-training representations for nodes in the source code graph using several GNN architectures. We used tasks such as SCE name prediction, variable usage prediction (for functions), and next function call prediction as pre-training objectives. We found that different tasks for source code, such as function call prediction, node type prediction, and type (class) usage prediction, are related to the objectives mentioned above. Moreover, the task of predicting next function calls yields SCE embeddings that allow predicting SCE names better than random. This suggests that information about SCEs can be inferred from the global relationships between them.

Acknowledgments. The study was supported by the Russian Science Foundation grant No. 22-21-00493, https://rscf.ru/en/project/22-21-00493/.
References

1. Allamanis, M., Brockschmidt, M., Khademi, M.: Learning to represent programs with graphs. In: International Conference on Learning Representations (ICLR) (2018)
2. Alon, U., Zilberstein, M., Levy, O., Yahav, E.: A general path-based representation for predicting program properties. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 404–419 (2018)
3. Ben-Nun, T., Jakobovits, A.S., Hoefler, T.: Neural code comprehension: a learnable representation of code semantics. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 3585–3597 (2018)
4. Brauckmann, A., Goens, A., Ertel, S., Castrillon, J.: Compiler-based graph representations for deep learning models of code. In: Proceedings of the 29th International Conference on Compiler Construction, pp. 201–211 (2020)
5. Cvitkovic, M., Singh, B., Anandkumar, A.: Deep learning on code with an unbounded vocabulary. In: Machine Learning for Programming (ML4P) Workshop at Federated Logic Conference (FLoC) (2018)
6. DeFreez, D., Thakur, A.V., Rubio-Gonzalez, C.: Path-based function embedding and its application to error-handling specification mining. In: ESEC/FSE 2018 - Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 423–433 (2018)
7. Dinella, E., et al.: Hoppity: learning graph transformations to detect and fix bugs in programs. In: ICLR 2020, pp. 1–17 (2020)
8. Kanade, A., Maniatis, P., Balakrishnan, G., Shi, K.: Pre-trained contextual embedding of source code, pp. 1–22 (2019)
9. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
10. Raychev, V., Vechev, M., Krause, A.: Predicting program properties from "big code". In: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 111–124 (2015)
11. Wan, Y., et al.: Multi-modal attention network learning for semantic source code retrieval. In: Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019), pp. 13–25 (2019)
12. Wang, Y., Gao, F., Wang, L., Wang, K.: Learning a static bug finder from data (2019)
13.
Zhang, J., Wang, X., Zhang, H., Sun, H., Wang, K., Liu, X.: A novel neural source code representation based on abstract syntax tree. In: Proceedings - International Conference on Software Engineering, pp. 783–794 (2019)
14. Zhou, J., et al.: Graph neural networks: a review of methods and applications. AI Open 1, 57–81 (2020)
A Multi-objective Evolution Strategy for Real-Time Task Placement on Heterogeneous Processors

Rahma Lassoued1 and Rania Mzid2(B)

1 ISI, University Tunis-El Manar, 2 Rue Abourraihan Al Bayrouni, Ariana, Tunisia
2 CES Lab ENIS, University of Sfax, F-B.P:w.3, 3038 Sfax, Tunisia
[email protected]
Abstract. This paper deals with the task placement problem for real-time systems on heterogeneous processors. Indeed, the task placement phase must ensure that temporal properties are respected while also making optimal use of the limited resources. To address this issue, we propose in this paper an optimization-based strategy that investigates how tasks are assigned to processors. We suggest a formulation of a multi-objective evolution approach that maximizes the system's extensibility while minimizing its energy consumption. The proposed approach enables designers to explore the search space of all potential task-to-processor assignments and identify schedulable solutions that offer good trade-offs between the two optimization objectives. We first describe the mapping approach and then offer a series of experiments to test the effectiveness of the proposed model. Keywords: Genetic Algorithm · Task Placement Problem · Multi-Objective Optimization · SPEA2 · Real-Time
1
Introduction
A real-time system is any system that must respond to externally supplied input stimuli within a specific deadline [1]. Real-time system development is a difficult task because a failure can be critical for the safety of human beings. Due to their promise of high performance and reliability, distributed architectures are used in a wide variety of real-time applications. The Task Placement Problem (TPP) is a critical concern for improving productivity during the development of Real-Time Distributed Systems (RTDS). Indeed, the TPP focuses on assigning tasks to processors while ensuring that real-time constraints are respected and optimizing system performance. As the number of tasks and/or processors increases, this problem becomes NP-hard. As a result, several genetic approaches for searching for optimal solutions have been proposed. In [2], the authors propose a genetic algorithm for maximizing reliability. In [3],

c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 448–457, 2023. https://doi.org/10.1007/978-3-031-35501-1_45
A MOO Approach for RT Task Placement on Heterogeneous Processors
449
the authors focus on the optimization of communication cost. These studies consider homogeneous processors and only deal with single-objective optimization. For Multi-Objective Optimization (MOO), the authors in [4] propose a method based on the Pareto Archived Evolution Strategy (PAES) for mapping a set of functions to tasks for mono-processor architectures. In this paper, we propose a MOO evolution strategy for the TPP on heterogeneous processors based on the Strength Pareto Evolutionary Algorithm (SPEA2) [5]. The proposed method explores feasible solutions in the mapping search space rather than exhaustively searching across all potential solutions. The proposed model focuses on increasing the system's robustness by maximizing its extensibility and on saving energy by minimizing its energy consumption. The rest of the paper is organized as follows. In Sect. 2, we give an overview of the SPEA2 algorithm. Section 3 describes the proposed approach and details the SPEA2 model for the task placement problem. Results from experiments are presented in Sect. 4. Section 5 concludes the paper and outlines future directions.
2
SPEA2 Algorithm Principles
Evolutionary algorithms (EA) [6] are heuristic-based methods for resolving NP-hard problems. Due to their capacity to address such issues, employing EAs to resolve highly difficult multi-objective optimization problems has grown in prominence in recent years. To handle MOO problems, a number of EA algorithms based on the Pareto approach have been proposed in the literature. These include, for instance, the Pareto Archived Evolution Strategy (PAES) [4], Particle Swarm Optimization (PSO) [3], and the Strength Pareto Evolutionary Algorithm (SPEA2) [5]. The fitness function calculation, the utilization of the archive, and the diversity process differ across these approaches. Figure 1 outlines the different steps of the SPEA2 algorithm. The input for this algorithm includes a set of parameters, such as the population size Np, the archive size NA, the maximal number of generations Tmax, and the specialized parameters Rcross (i.e., crossover rate) and Rmut (i.e., mutation rate) for crossover and mutation, respectively. The SPEA2 method begins by creating an initial population P0 with an empty external archive A0. This archive is an implementation of the elitist EA technique [7], in which a collection of excellent solutions is retained. Only these solutions will be used to construct new generations. Then, using the count and strength dominance metrics, the fitness values are calculated for the population and the archive. In order to preserve the best solutions for the following generation, the Non-Dominated (ND) solutions of the population and archive are then copied to the next archive At+1. In the case where |ND| > NA, a truncation technique based on a nearest-neighbor density estimate is used to reduce the number of ND solutions and fine-tune fitness [5]. Otherwise, the method fills At+1 with dominated individuals from Pt and At.
Once the number of ND solutions equals NA, the SPEA2 algorithm applies binary tournament selection to produce the mating pool. To recombine genetic material and create new offspring (solutions), members
450
R. Lassoued and R. Mzid
Fig. 1. The description of SPEA2 algorithm
of the mating pool are randomly coupled (by applying crossover and mutation operators). The aforementioned process is repeated until the stopping criterion (i.e., the maximal number of generations) is satisfied. The final archive is produced as the SPEA2 output (Fig. 1).
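The archive-update step described above hinges on Pareto dominance. A minimal sketch of the non-dominated filter, assuming all objectives are to be minimized (SPEA2's full machinery with strength values, density estimation, and truncation is omitted):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized):
    a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(solutions):
    """Return the non-dominated subset, as copied into the SPEA2 archive."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Toy bi-objective example, e.g. (energy consumption, negated extensibility)
pts = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = non_dominated(pts)
assert front == [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]   # (3.0, 4.0) is dominated by (2.0, 3.0)
```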
3 MOO Approach for the Task Placement Problem
In this section, we first give an overview of the proposed approach. Then, we detail the SPEA2 model proposed for the task placement problem.

3.1 Approach Overview
The proposed approach is shown in Fig. 2. The task model, which specifies the set of application tasks, and the hardware model, which presents the execution platform on which the tasks will be run, are provided by the designer as entries. We assume in this work that the task model, denoted by τ, is composed of n synchronous, periodic, and independent tasks (i.e., τ = {T1, T2, . . ., Tn}). Each task Ti is characterized by static parameters Ti = (Ci, Pri), where Ci = (ci1, . . ., cim) such that cij is an estimation of the worst-case execution time of the task Ti on the processor Pj, i ∈ {1 . . . n} and j ∈ {1 . . . m}, and Pri is the activation period of the task Ti. The hardware model is composed of m heterogeneous processors (i.e., P = {P1, P2, . . ., Pm}). Each processor Pj is characterized by
A MOO Approach for RT Task Placement on Heterogeneous Processors
its capacitance ζj, its frequency fj, and its voltage vj (i.e., Pj = (ζj, fj, vj)). Each processor has its own memory and runs a Real-Time Operating System (RTOS). These models are the inputs of the generate initial deployment models step, which aims to produce a set of initial feasible deployment models. The deployment model, denoted by D in this work, consists of a set of tuples D = {(P1, ξ1), (P2, ξ2), . . ., (Pk, ξk)}, where k represents the number of used processors (k ≤ m) and ξj represents the subset of tasks allocated to the processor Pj after the placement step. For real-time embedded systems, a deployment model is said to be feasible when the placement of the real-time tasks on the different processors guarantees that the timing requirements of the system are respected. In that context, Liu and Layland [8] developed a necessary and sufficient schedulability test. The feasibility test determines whether a given task set will always meet all deadlines under all release conditions. This test is based on the computation of the processor utilization factor Upj and is defined as follows:

Upj = Σ_{i=1..n} cij / Pri ≤ 0.69    (1)

The models generated by the generate initial deployment models step are used as the SPEA2 algorithm's initial population. Algorithm 1 outlines this step. This algorithm takes the initial population size Np as input and generates P0, which consists of Np feasible chromosomes. For each randomly created chromosome (i.e., a possible deployment model), a feasibility evaluation is performed according to expression (1). This process is repeated until the size of the initial population Np is reached. The produced initial population serves as input to the SPEA2 model, which in turn is executed to generate optimal deployment models.
These models constitute a front that is nearly Pareto optimal rather than a single answer, giving the decision-maker room for trade-space analysis (i.e., an archive of non-dominated solutions).

Algorithm 1. Initial feasible deployment models generation
Require: Np: Integer
Ensure: P0
1: P0 ← ∅, k ← 0, indiv ← "", feasibility ← false
2: while k < Np do
3:   indiv ← GenerateRandomIndiv(n, m)   ▷ n and m represent the number of tasks and processors respectively
4:   feasibility ← IsFeasible(indiv)
5:   if feasibility == True then
6:     P0 ← P0 ∪ {indiv}
7:     k ← k + 1
8:   end if
9: end while
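Algorithm 1 can be sketched in Python, assuming the task periods and worst-case execution times of Table 1 in the evaluation section; `generate_random_indiv` and `is_feasible` stand in for the GenerateRandomIndiv and IsFeasible helpers, with the utilization test of expression (1) and the 0.69 bound:

```python
import random

# Sketch of Algorithm 1: draw random chromosomes and keep the feasible
# ones until the initial population P0 reaches Np individuals.
periods = [10, 8, 20, 20, 20, 30, 35, 60, 55, 60]
wcet = [[1, 1, 1, 1, 2, 4, 1, 1, 1, 1],   # c_i1 on P1
        [2, 2, 4, 3, 4, 5, 2, 3, 2, 2],   # c_i2 on P2
        [4, 3, 5, 4, 6, 6, 3, 4, 4, 3]]   # c_i3 on P3

def generate_random_indiv(n, m):
    return [random.randint(1, m) for _ in range(n)]  # gene i = processor of T_{i+1}

def is_feasible(indiv, m=3):
    # expression (1): Up_j = Σ c_ij / Pr_i over tasks assigned to P_j
    for j in range(1, m + 1):
        up = sum(wcet[j - 1][i] / periods[i]
                 for i, proc in enumerate(indiv) if proc == j)
        if up > 0.69:
            return False
    return True

def initial_population(np_size, n=10, m=3):
    pop = []
    while len(pop) < np_size:  # repeat until Np feasible chromosomes found
        indiv = generate_random_indiv(n, m)
        if is_feasible(indiv):
            pop.append(indiv)
    return pop

random.seed(0)
p0 = initial_population(8)
```

Every chromosome in `p0` then satisfies the schedulability test on all three processors before SPEA2 starts.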
Fig. 2. Overview of the proposed approach
3.2
SPEA2 Model for the Task Placement Problem
In this section, we describe the SPEA2 model proposed in this paper to solve the task placement problem on heterogeneous distributed processors.
Fitness Function
In this paper, we investigate two optimization metrics for the task placement problem: (1) maximization of system extensibility, which is directly related to system robustness and specifies a system's ability to adapt its behavior while maintaining schedulability constraints, and (2) minimization of the overall energy consumed by the system. In order to maximize the system's extensibility, we define a new parameter, denoted SlackCapacity, which represents the remaining capacity of a processor. It is computed as the difference between the maximal capacity of a processor Pj, named CapacityMaxj, and the actual utilization Upj of the processor. The processor utilization is calculated following expression (1). The maximal capacity of a processor is equal to 0.69 when the task model satisfies the RM optimality conditions [8]. This fitness function is given as follows:

F1 = Max(SlackCapacity), where SlackCapacity = Min_{j ∈ {1...m}} (CapacityMaxj − Upj)    (2)
The total energy consumed by a processor Pj, denoted by Ej, is calculated by adding the dynamic energy Edynamic and the static energy Estatic. The first is mostly caused by transistor state changes, whereas the second is primarily caused by leakage currents. Since static energy is frequently overlooked, Ej corresponds only to the dynamic energy and can be expressed according to the model proposed by Winter et al. in [9] as follows:

Ej = fj · ζj · Vj² · cij,  j ∈ {1 . . . m}, i ∈ {1 . . . n}    (3)
where fj, ζj, and Vj are the processor's clock frequency, equivalent capacitance, and supply voltage, respectively, and cij is the worst-case execution time of the task Ti on the processor Pj. The fitness function that represents this second metric is given as follows:

F2 = Min(fj · ζj · Vj² · cij),  j ∈ {1 . . . m}, i ∈ {1 . . . n}    (4)
In this work, the objective is to maximize the slack capacity while minimizing energy consumption. We may say that one solution dominates another if it generates more slack capacity while using less than or the same amount of energy as the rival solution.
Encoding
In this work, a solution is expressed using an Integer Encoding technique that has been used in various previous works [4]. A chromosome is represented as an integer vector with n positions (where n is the number of tasks). Each position (i.e., gene) in this vector represents a task: position i represents the task Ti. The value assigned to this position corresponds to the processor to which the task is assigned (i.e., each gene takes a value in {1, . . ., m}, where m is the number of processors). It is worth noting that each chromosome represents a single individual. A chromosome for the task allocation problem is depicted in Fig. 3. In this example, 6 tasks (n = 6) are mapped to 4 processors (m = 4), such that T1, T2, and T6 are assigned to the processor P2, T3 to P4, T4 to P1, and T5 to P3.
Fig. 3. Chromosome representation of a possible deployment model
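The mapping of Fig. 3 can be reproduced in a few lines; the helper name `decode` is illustrative, not from the paper:

```python
# The Fig. 3 example as an integer-encoded chromosome: 6 tasks (n = 6)
# mapped onto 4 processors (m = 4). Position i holds the processor index
# of task T_{i+1}; decoding recovers the deployment model D = {(Pj, ξj)}.
chromosome = [2, 2, 4, 1, 3, 2]  # T1..T6 -> P2, P2, P4, P1, P3, P2

def decode(chrom):
    deployment = {}
    for i, proc in enumerate(chrom, start=1):
        deployment.setdefault(proc, []).append(f"T{i}")
    return deployment

print(decode(chromosome))
# {2: ['T1', 'T2', 'T6'], 4: ['T3'], 1: ['T4'], 3: ['T5']}
```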
Selection and Recombination Operators
SPEA2 employs binary tournament selection to create the mating pool. This technique finds the fittest candidates of the current generation by randomly selecting two solutions and comparing their fitness: the solution with the lowest fitness value is chosen for the mating pool. To recombine genetic material and produce new solutions, members of the mating pool are randomly coupled. The two main recombination operators we use in this paper are single-point crossover and gene mutation. Figure 4 shows an example of the application of the single-point crossover and gene mutation operators to the task placement problem. In the single-point crossover, a crossing site is randomly determined in both parents' chromosomes. Two new child chromosomes are then created by exchanging all the genes on one side of the crossing site. The crossover is applied based on a pre-defined probability value (i.e., the crossover rate), which indicates whether two randomly selected parents will undergo the crossing. The mutation, in contrast, is applied to a single solution. It is typically used after the parents have been crossed, to introduce new genetic characteristics that would be difficult to obtain with the crossover operator alone. This operator randomly selects a gene from the chromosome based on a pre-defined probability value (i.e., the mutation rate) and replaces it with a random value.
Fig. 4. Application of the crossover and mutation operators of the SPEA2 method for the task placement problem
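A minimal sketch of the two operators for the integer encoding; the parent chromosomes below are illustrative, not taken from Fig. 4:

```python
import random

# Single-point crossover swaps the tail segments of two parents at a
# random crossing site; gene mutation resets one randomly chosen gene
# to a random processor index.
def single_point_crossover(p1, p2, rate=0.9):
    if random.random() >= rate:             # crossover fires with probability Rcross
        return p1[:], p2[:]
    point = random.randint(1, len(p1) - 1)  # crossing site
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def gene_mutation(chrom, m, rate=0.01):
    child = chrom[:]
    if random.random() < rate:              # mutation fires with probability Rmut
        gene = random.randrange(len(child))
        child[gene] = random.randint(1, m)  # task moved to a random processor
    return child

random.seed(1)
parents = ([2, 2, 4, 1, 3, 2], [1, 3, 3, 2, 4, 4])
c1, c2 = single_point_crossover(*parents)
```

Whatever the crossing site, the two children together carry exactly the genes of the two parents, position by position.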
4 Evaluation
In this section, we provide a case study to demonstrate the proposed approach's efficiency in solving the task placement problem on heterogeneous processors and finding the set of non-dominated solutions in the search space. The experiment was conducted on a personal computer equipped with a 2 GHz Intel Core i3 processor, 8 GB of RAM, and Windows 10. The JavaScript programming language was used to implement the proposed model within the Visual Studio framework. Table 1 depicts the task model considered for our evaluation. The model consists of 10 tasks that should be mapped onto the three processors P1, P2, and P3 while respecting the scheduling constraints. As described in Sect. 3.2, each task Ti, i ∈ {1 . . . 10}, is characterized by its period Pri and an estimate
of the task's worst-case execution time cij on the processor Pj, j ∈ {1 . . . 3}. In addition, each processor is characterized by its capacitance, frequency, and voltage. For this evaluation, we assume that P1 = (1,2,2), P2 = (1,4,3), and P3 = (1,6,4). To run our experiment, the EA operational parameters must be defined. We set the population size and the archive size to 8. As recommended by the SPEA2 algorithm, we use binary tournament selection. We also set the crossover rate and the mutation rate to 0.9 and 0.01 respectively.

Table 1. Task model description

Task | Pr | c1 | c2 | c3
T1   | 10 | 1  | 2  | 4
T2   | 8  | 1  | 2  | 3
T3   | 20 | 1  | 4  | 5
T4   | 20 | 1  | 3  | 4
T5   | 20 | 2  | 4  | 6
T6   | 30 | 4  | 5  | 6
T7   | 35 | 1  | 2  | 3
T8   | 60 | 1  | 3  | 4
T9   | 55 | 1  | 2  | 4
T10  | 60 | 1  | 2  | 3
In our experiments, we ran three tests with different numbers of generations to show how the set of non-dominated solutions in the archive evolves from the initial one. The progression of the solutions from the first archive to 120 generations is depicted in Fig. 5. Indeed, the SPEA2 method is designed to keep solutions dispersed along the non-dominated front while moving them toward the Pareto optimal front. Figure 5a presents the initial archive of the first generation. From this figure, we can notice that the displayed solutions are not properly positioned in the search space: they are a mix of non-dominated and dominated solutions. The solutions preserved in the archive after 30 generations are depicted in Fig. 5b. This graph clearly shows that the derived solutions are better than the initial ones. For instance, the highest value found for the slack capacity changes from 0.24 to 0.33, while the minimum value obtained for the energy decreases from 768 to 340. Figure 5d shows the final archive of our experiment after 120 generations. The archived solutions are all non-dominated, with the slack capacity reaching 0.35 and the minimum energy reduced to 112. Table 2 describes the set of solutions obtained from the last test (i.e., 120 generations). Each solution corresponds to a potential deployment model that maximizes slack capacity and minimizes energy consumption while ensuring the system's real-time feasibility. The software designer can select the deployment model that best meets his/her needs from this collection.
Fig. 5. The evolution of the archive

Table 2. Deployment models obtained in the final archive

Solution | P1 | P2 | P3 | Slack | Energy
S1 | T1,T2,T3,T4,T9 | T6,T7,T8,T10 | T5 | 0.35 | 1048
S2 | T1,T3,T4,T5,T7,T8,T9 | T6,T10 | T2 | 0.31 | 604
S3 | T2,T3,T4,T6,T7,T8,T9,T10 | T1,T5 | ∅ | 0.25 | 304
S4 | T1,T2,T4,T5,T8,T9,T10 | T3,T6,T7 | ∅ | 0.26 | 460
S5 | T1,T3,T4,T5,T6,T7,T8,T9,T10 | T2 | ∅ | 0.18 | 176
S6 | T1,T2,T3,T4,T5,T6,T7,T8,T9,T10 | ∅ | ∅ | 0.05 | 112
S7 | T1,T3,T4,T5,T7,T10 | T2,T8,T9 | T6 | 0.34 | 884
S8 | T1,T3,T4,T6,T7,T8,T9,T10 | T5 | T2 | 0.28 | 520
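The Slack and Energy columns of Table 2 can be reproduced from expressions (1) and (3) with the data of Table 1, here for solution S6 (all ten tasks on P1); one assumption made explicit below is that the slack minimum is taken over used processors only:

```python
# Reproducing Table 2's Slack and Energy for S6. Task periods and WCETs
# come from Table 1; the processor tuples (ζ, f, V) from this section.
periods = [10, 8, 20, 20, 20, 30, 35, 60, 55, 60]
wcet = {"P1": [1, 1, 1, 1, 2, 4, 1, 1, 1, 1],
        "P2": [2, 2, 4, 3, 4, 5, 2, 3, 2, 2],
        "P3": [4, 3, 5, 4, 6, 6, 3, 4, 4, 3]}
procs = {"P1": (1, 2, 2), "P2": (1, 4, 3), "P3": (1, 6, 4)}  # (ζ, f, V)

def utilization(proc, tasks):            # expression (1)
    return sum(wcet[proc][i] / periods[i] for i in tasks)

def energy(proc, tasks):                 # expression (3), summed over assigned tasks
    zeta, f, v = procs[proc]
    return f * zeta * v ** 2 * sum(wcet[proc][i] for i in tasks)

s6 = {"P1": list(range(10)), "P2": [], "P3": []}
slack = min(0.69 - utilization(p, ts) for p, ts in s6.items() if ts)
total_energy = sum(energy(p, ts) for p, ts in s6.items())
print(round(slack, 2), total_energy)  # 0.05 112, matching row S6
```

The same computation recovers the other rows, e.g. S1 gives a minimum slack of about 0.35 and an energy of 1048.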
5 Conclusion
We have proposed in this paper an approach for MOO of the task placement problem for real-time systems. Indeed, for distributed systems, task mapping is crucial to the overall system's performance. Based on the well-benchmarked SPEA2 algorithm, the proposed approach generates design solutions that guarantee trade-offs between two optimization metrics: system extensibility and energy consumption.
As future work, we aim to extend the proposed approach by also considering the scheduling issue at each processor. In order to attain energy autonomy, we also intend to address energy-harvesting constraints; the placement model would then be determined by the energy level at each time step.
References
1. Ebrahimian Amiri, J.: A foundation for development of programming languages for real-time systems. The Australian National University (2021)
2. Deo Prakash, V., Anil Kumar, T.: Maximizing reliability of distributed computing system with task allocation using simple genetic algorithm. J. Syst. Archit. 47(6), 549–554 (2001)
3. Mostafa, H.K., Houman, Z., Ghazaleh, J.: A new metaheuristic approach to task assignment problem in distributed systems. In: IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI) (2017)
4. Rahma, B., Laurent, L., Frank, S., Bechir, Z., Mohamed, J.: Architecture exploration of real-time systems based on multi-objective optimization. In: The 20th International Conference on Engineering of Complex Computer Systems (2015)
5. Lesinski, G., Corns, S.: A Pareto based multi-objective evolutionary algorithm approach to military installation rail infrastructure investment. Indus. Syst. Eng. Rev. 7(2), 64–75 (2019)
6. Vikhar, P.A.: Evolutionary algorithms: a critical review and its future prospects. In: 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC), pp. 261–265 (2016). https://doi.org/10.1109/ICGTSPICC.2016.7955308
7. Grosan, C., Oltean, M., Oltean, M.: The role of elitism in multiobjective optimization with evolutionary algorithms. In: Acta Universitatis Apulensis, Mathematics-Informatics, vol. 5 (2003)
8. Liu, C.L., Layland, J.W.: Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM 20(1), 46–61 (1973)
9. Winter, J.A., Albonesi, D.H., Shoemaker, C.A.: Scalable thread scheduling and global power management for heterogeneous many-core architectures. In: The 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 29–39 (2010)
Comprehensive Analysis of Rice Leaf Disease Detection and Classification Models L. Agilandeeswari1(B) and M. Kiruthik Suriyah2(B) 1 SITE, VIT, Vellore 632014, TN, India
[email protected]
2 SITE, VIT, Vellore 632014, TN, India
[email protected]
Abstract. More than 60% of people in India consume rice in their day-to-day life [1]; hence, it is essential to identify rice diseases at an early stage to prevent them from causing further damage and thereby increase the yield. An automatic way of detecting and diagnosing rice diseases is highly required in the agricultural field. Various models that detect paddy disease have been proposed by researchers. We have classified and listed these models based on their architecture, such as CNN (Convolutional Neural Network), ANN (Artificial Neural Network), and ML (Machine Learning). The best model among them is selected mainly based on performance, efficiency, and the number of diseases a model can detect. Further, we discuss the image pre-processing and segmentation techniques used. This article will direct new researchers into this domain. Keywords: Convolutional neural network · Support vector machine · Principal component analysis · Artificial Neural Network · Image Segmentation · Pre-processing
1 Introduction
Rice is a major food crop in Asia, and more than a billion people in the world depend on the cultivation of rice, with about 507.87 million tons consumed worldwide in the period 2021–2022. It is estimated that around 37% of yield is lost to diseases and pests. Hence, identification of disease at an early stage is important to prevent it from spreading and causing further damage. The paddy crop is affected by diseases like false smut, blast, bacterial blight, and brown spot. These are the common types of leaf diseases that affect the paddy crop. These diseases occur on the paddy leaf with different characteristics; hence they can be identified using image classification techniques, and the level of infection can also be determined using classification models. Identifying these diseases at an early stage can help farmers take the necessary actions to prevent spreading. Applying timely treatment to plants can reduce economic losses substantially and improve the yield. Recent improvements in deep convolutional neural networks (DCNN) have improved image detection and classification accuracy, and the introduction of transfer learning has further improved detection accuracy. The efficiency of a model
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 458–469, 2023. https://doi.org/10.1007/978-3-031-35501-1_46
is an important factor: it should be feasible to use in the paddy field and must be easily accessible by farmers (i.e., models that can run on a smartphone). Hence, both accuracy and efficiency are important factors. From recent articles, we observe that researchers have developed neural networks with existing models such as VGG16, VGG19, and ResNet50 as a base. These models perform well but require more computational power. Also, when these models are fused with classifiers like SVM, we can observe a great improvement in their accuracy. The rest of the article is organized as follows: Sect. 2 details the general flow of rice leaf disease detection. Section 3 elaborates on the detection and classification techniques. A comparative analysis is presented in Sect. 4, and Sect. 5 concludes the article.
2 General Architecture
The general architecture of rice leaf disease detection follows a series of steps: image acquisition, pre-processing, segmentation, and classification. The image is center-cropped to a standard 300 × 300 dimension, and the dataset used is divided into two sub-parts for training and validation, as shown in Fig. 1.
Fig. 1. General flow of Rice leaf disease detection and classification
2.1 Image Acquisition
Image acquisition is used to collect images from the cultivation fields. The dataset consists of 4 classes of common rice leaf diseases, Bacterial Blight, Leaf Blast, Brown Spot, and Leaf Blight, and the rest are healthy plant images (Fig. 2).
2.2 Image Pre-processing and Data Augmentation
Using pre-processing techniques, the quality of the acquired image is improved. Next, data augmentation is performed to expand the dataset: in real time, the image is flipped vertically and horizontally [2] and modified by shearing, rotation, and random zooming in and out. This augmentation was
Fig. 2. Sample Dataset Images
performed using the pre-built Keras ImageDataGenerator module [3]. It preserves data integrity and facilitates the extraction of useful information from the data [4]. Figure 3 represents the steps of pre-processing. From the dataset images of rice blast, bacterial blight, red rot, and sheath blight, we can observe that characteristics such as color and shape differ between diseases. These characteristics are used to apply appropriate filters so that the diseases can be classified with greater accuracy.
Fig. 3. Image Pre-processing Stages
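The paper performs these augmentations with Keras's ImageDataGenerator; as a library-free illustration (not the authors' code), the horizontal/vertical flips and a random zoom-in can be written with numpy alone, with a synthetic image standing in for a leaf photo:

```python
import numpy as np

# Flip and zoom augmentations on a synthetic 300x300 RGB image.
rng = np.random.default_rng(42)
img = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)

def flip_h(x):
    return x[:, ::-1]          # mirror left-right

def flip_v(x):
    return x[::-1]             # mirror top-bottom

def random_zoom(x, max_frac=0.2):
    h, w = x.shape[:2]
    f = rng.uniform(0, max_frac)
    dh, dw = int(h * f / 2), int(w * f / 2)
    crop = x[dh:h - dh, dw:w - dw]          # zoom-in = central crop ...
    rows = np.arange(h) * crop.shape[0] // h
    cols = np.arange(w) * crop.shape[1] // w
    return crop[rows][:, cols]              # ... resized back (nearest neighbour)

augmented = [flip_h(img), flip_v(img), random_zoom(img)]
```

Each augmented variant keeps the original 300 × 300 shape, so the training pipeline can consume originals and variants interchangeably.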
2.3 Image Segmentation
The main purpose of segmenting the image is twofold: (i) it not only removes the background noise of the dataset image but also improves the quality of the image, which gives higher accuracy; (ii) it minimizes the amount of data required, thus reducing the running time of the code. To reduce the running time of the program and improve its recognition efficiency, the rice disease images are compressed from 5213 × 3246 to 800 × 600. The most popular image segmentation algorithm is the mean shift algorithm, which is used to produce segments (sub-images) [5].
2.4 Feature Extraction
Feature extraction is done by the network's hidden layers: fully connected layers followed by a SoftMax layer. This results in the classification and detection of the images. Mainly shape feature extraction and color feature extraction are used [5].
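The mean shift segmentation mentioned in Sect. 2.3 rests on mode seeking; a minimal 1-D flat-kernel variant on pixel intensities illustrates how similar pixels collapse onto the same mode (the bandwidth and sample values below are illustrative, not from the paper):

```python
import numpy as np

# Minimal 1-D flat-kernel mean shift: every point repeatedly moves to the
# mean of the data within its bandwidth window, converging on local density
# peaks (modes); pixels sharing a mode belong to the same segment.
def mean_shift_modes(values, bandwidth=20.0, iters=30):
    points = values.astype(float).copy()
    for _ in range(iters):
        for k, p in enumerate(points):
            window = values[np.abs(values - p) <= bandwidth]
            points[k] = window.mean()   # shift towards the local density peak
    return np.unique(points.round())

pixels = np.array([10, 12, 15, 200, 205, 210], dtype=float)
print(mean_shift_modes(pixels))  # two modes: dark cluster and bright cluster
```

Real mean shift segmentation runs the same idea jointly over spatial and color coordinates, but the convergence behaviour is the same.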
3 Detection and Classification Techniques
The popular detection and classification techniques for rice leaf diseases are (i) machine learning based models (Table 1), (ii) ANN based models (Table 2), (iii) CNN based models (Table 3), and (iv) transfer learning models (Table 4).
3.1 Machine Learning Models
The state-of-the-art machine learning models in the literature are listed in Table 1.

Table 1. State-of-the-art Machine Learning models

[6] SVM | Own Dataset | Accuracy: 95% | Advantage: developed in a general way, so it can also be used for disease detection in other plants/crops | Disadvantage: can only detect two types of paddy leaf diseases | Diseases detected: Leaf Smut, Brown Spot, and Bacterial Blight
[7] SVM | Own Dataset | Accuracy: 96.8% | Advantage: replaces the commonly used SoftMax method with SVM, which gives more accuracy than other models | Disadvantage: efficiency is greatly reduced if the image is not segmented | Diseases detected: Rice Blast
[8] Machine learning approaches | Own Dataset | Accuracy: 98.63% | Advantage: detects four common rice diseases from leaf images using ML algorithms | Disadvantage: has problems detecting images with noise and other problems caused by external factors | Diseases detected: Rice blast, red blight, stripe blight, and sheath blight
[9] SVM with Deep CNN | Own dataset with 1080 images | Accuracy: 97.5% | Advantage: implemented with an SVM classifier combined with a DCNN | Disadvantage: trained to detect only one type of disease at a time; for each type it is trained separately | Diseases detected: Rice Blast
Among the above machine learning models, the combined machine learning and CNN model outperforms the others in terms of accuracy, and it classifies four common types of rice leaf disease: rice blast, brown spot, red rot, and bacterial blight.
3.2 ANN Models

Table 2. State-of-the-art ANN Models

[10] Back Propagation NN and PCA | Own Dataset | Accuracy: 95.83% | Advantage: the fusion of BP and PCA offered a 2.5% boost in performance and identification time over novel SVM methods | Disadvantage: not efficient in detecting lesions on leaves with a similar type of morphology | Diseases detected: Rice Blast
3.3 CNN Models
Comparing the several CNN models used for paddy/rice leaf disease detection, the highest accuracy achieved is 99.7%. However, based on criteria such as mobility, performance, and efficiency, the ADSNN-BO [14] model performs well on all of them and is capable of detecting the common diseases that occur in rice. ADSNN-BO is an optimized deep learning model built on the MobileNet platform, making it more efficient and accessible.
Table 3. State-of-the-art CNN Models

[11] 3 convolution layers and one pooling layer with a SoftMax output layer | Own datasets | Accuracy: 95.48% | Advantage: can predict multiple types of rice disease with the same network | Disadvantage: efficiency depends upon the quality and size of the datasets, so the accuracy is not guaranteed | Diseases detected: Rice sheath blight
[3] CNN | Own dataset with 900 images | Accuracy: 97.40% | Advantage: several types of rice leaf diseases identified | Disadvantage: the 10-fold approach used costs more computing time and resources | Diseases detected: Leaf Smut, Brown Spot, Bacterial Blight, and blast
[5] CNN | Rice dataset | Accuracy: 93.3% | Advantage: simple structured network for efficient use on mobile platforms | Disadvantage: due to the reduced number of layers in the network, the accuracy is lower compared with other models | Diseases detected: False Smut, Brown Spot, and Neck Blast
[12] VGG16 | Own dataset | Accuracy: 92.46% | Advantage: the use of transfer learning with fine-tuning of the VGG16 model gives elevated accuracy | Disadvantage: the network is trained on very small-scale datasets and hence its accuracy plateaus at 25 epochs | Diseases detected: Leaf Blast, Brown Spot, and Leaf Blight
[13] CNN | Rice Disease Dataset | Accuracy: 99.61% | Advantage: applies 32 different types of filters in the network, detecting disease with good accuracy | Disadvantage: the validation accuracy is not stable; it varies with the number of epochs | Diseases detected: Leaf Blast
[14] Depth-wise attention-based network (ADSNN-BO) | Rice Diseases Image Dataset | Accuracy: 94.65% | Advantage: built upon the MobileNet structure and suitable for mobile devices | Disadvantage: requires multi-level processing of the image to get the required accuracy | Diseases detected: Rice Blast, Brown Spot, Leaf Smut, Bacterial Blight
[6] CNN | Rice Leaf Dataset | Accuracy: 99.7% | Advantage: with 4 hidden layers and image noise reduction, it attains higher accuracy | Disadvantage: pre-processing of the dataset using the Otsu global method slows down the network | Diseases detected: Rice Hispa and Stem Borer
[2] VGG16 | UCI machine learning database | Accuracy: 97.22% | Advantage: the tuned VGG16 model outperforms the original model in accuracy | Disadvantage: weak at multi-task learning | Diseases detected: Bacterial blight, leaf smut, and brown spot
[15] CNN and Convolutional Auto-Encoder (CAE) | PlantVillage datasets | Accuracy: 98.38% | Advantage: the fusion with a CAE enables the network to automatically identify plant diseases from images | Disadvantage: since the model is designed for low data and computational requirements, its detection accuracy in different environments is not stable | Diseases detected: general detection
[4] Customized CNN | Own dataset | Accuracy: 97.82% | Advantage: suitable for devices with memory restrictions | Disadvantage: performance is not consistent, and the model misclassifies when there is a complex background | Diseases detected: Rice Blast, Brown Spot, Leaf Smut, Bacterial Blight, and Neck Blast
[16] Convolutional Neural Network (CNN) | Multiple dedicated datasets related to rice diseases | Accuracy: 93.75% | Advantage: can detect diseases in rice leaves at a very early stage, which helps to take effective initiatives to sustain the crop | Disadvantage: different variants of deep architecture can provide a higher accuracy percentage, but efficiency is compromised | Diseases detected: Tungro disease
[17] Deep CNN fusing CBAM and ResNet models | Own dataset | Accuracy: 98.36% | Advantage: the fusion of ResNet and CBAM enabled analysis of the image spectrum and gave better results | Disadvantage: the model is limited to a certain range as dispersion occurs; the smaller the range, the better the model prediction | Diseases detected: Brown Spot, Leaf Smut, Red Stripe, Bacterial Blight, and Neck Blast
[18] Deep Convolutional Neural Network | Own dataset with 2906 positive and negative samples | Accuracy: 93.45% | Advantage: rice disease diagnosis can be performed by fusing the detection models and domain knowledge of rice disease | Disadvantage: requires further work to improve its detection accuracy and its efficiency in rice disease detection systems | Diseases detected: Blight, brown spots, and smut
[19] Deep Learning | 1000 images of rice disease | Accuracy: 60% | Advantage: the implemented model can be used to control rice leaf disease | Disadvantage: developed for running on smartphones, but 60% accuracy is not enough to detect the type and stage of the disease | Diseases detected: Brown spots, hispa, blast
[20] DenseNet201 | A total dataset of 240 images | Accuracy: 96.09% | Advantage: the model can also be used to detect other diseases that affect major Indian crops, such as tungro, brown spot, and bacterial blight | Disadvantage: accuracy plateaus at 30 epochs, after which the model does not improve (reaches a saturation point) | Diseases detected: Rice Blast
[21] Convolutional Neural Network (CNN) | Own datasets | Accuracy: 90.0% | Advantage: the trained model performs prediction and classifies images with a small percentage loss | Disadvantage: developed for the Android platform, so its detection accuracy varies and is not stable; can only be used for immediate disease detection | Diseases detected: general disease and pest detection
3.4 Transfer Learning Models

Table 4. State-of-the-art Transfer Learning models

[22] Deep learning model with a transfer learning method | ImageNet dataset | Accuracy: 91.37% | Advantage: uses the transfer learning methodology for classification | Disadvantage: results may not be satisfactory in some conditions, since a transfer learning model is implemented | Diseases detected: Brown spots, hispa, blast, and bacterial blight
4 Comparative Analysis
The state-of-the-art approaches discussed above are compared in terms of accuracy, efficiency, and the number of diseases detected, and the results are recorded in Figs. 4, 5, and 6. From Fig. 4, the hybrid CNN achieves the highest accuracy. From Fig. 5, we infer that a maximum of four diseases are detected by the DCNN and ADSNN-BO approaches. Figure 6 depicts that the ADSNN-BO approach has better efficiency.
Fig. 4. Accuracy of State-of-the-art approaches
Fig. 5. Number of diseases detected
Fig. 6. Efficiency Analysis among State-of-the-art approaches
5 Conclusion
In this paper, we have presented a comprehensive analysis of various rice leaf disease detection models, such as VGG16, VGG19, ResNet50, hybrid CNN models, DNNs, and networks fused with classifiers such as SVM, various clustering algorithms, and principal component analysis. We observed the following issues in these systems: the models are not capable of detecting multiple rice leaf diseases; the detection accuracy is not stable; we could see some cut-off points when testing the models; it is difficult for the models to detect images with a complex background; in real-time detection of disease the accuracy is limited to a certain range; and the models require high computational power, which delays the real-time detection process. These models were compared based on learning and testing accuracy, efficiency, the computational power required to run, and the number of rice diseases identified and classified by the model.
References

1. https://globalagriculturalproductivity.org/a-sustainable-rice-solution-dia/#:~:text=Total%20rice%20consumption%20in%20India,60%20percent%20of%20the%20population
2. Jiang, Z., Dong, Z., Jiang, W., Yang, Y.: Recognition of rice leaf diseases and wheat leaf diseases based on multi-task deep transfer learning. Comput. Electron. Agric. 186, 106184 (2021)
3. Al-Amin, M., Karim, D.Z., Bushra, T.A.: Prediction of rice disease from leaves using deep convolution neural network towards a digital agricultural system. In: 2019 22nd International Conference on Computer and Information Technology (ICCIT), pp. 1–5. IEEE (December 2019)
4. Hossain, S.M., et al.: Rice leaf diseases recognition using convolutional neural networks. In: ADMA 2020, pp. 299–314. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-65390-3_23
5. Rahman, C.R., et al.: Identification and recognition of rice diseases and pests using convolutional neural networks. Biosyst. Eng. 194, 112–120 (2020)
6. Verma, G., Taluja, C., Saxena, A.K.: Vision based detection and classification of disease on rice crops using convolutional neural network. In: 2019 International Conference on Cutting-edge Technologies in Engineering (ICon-CuTE), pp. 1–4. IEEE (November 2019)
7. Feng, S., et al.: A deep convolutional neural network-based wavelength selection method for spectral characteristics of rice blast disease. Comput. Electron. Agric. 199, 107199 (2022)
8. Jiang, F., Lu, Y., Chen, Y., Cai, D., Li, G.: Image recognition of four rice leaf diseases based on deep learning and support vector machine. Comput. Electron. Agric. 179, 105824 (2020)
9. Daniya, T., Vigneshwari, S.: A review on machine learning techniques for rice plant disease detection in agricultural research. System 28(13), 49–62 (2019)
10. Xiao, M., et al.: Rice blast recognition based on principal component analysis and neural network. Comput. Electron. Agric. 154, 482–490 (2018)
11. Lu, Y., Yi, S., Zeng, N., Liu, Y., Zhang, Y.: Identification of rice diseases using deep convolutional neural networks. Neurocomputing 267, 378–384 (2017)
12. Ghosal, S., Sarkar, K.: Rice leaf diseases classification using CNN with transfer learning. In: 2020 IEEE Calcutta Conference (CALCON), pp. 230–236. IEEE (February 2020)
13. Rathore, N.P.S., Prasad, L.: Automatic rice plant disease recognition and identification using convolutional neural network. J. Critical Rev. 7(15), 6076–6086 (2020)
14. Wang, Y., Wang, H., Peng, Z.: Rice diseases detection and classification using attention based neural network and bayesian optimization. Expert Syst. Appl. 178, 114770 (2021)
15. Bedi, P., Gole, P.: Plant disease detection using hybrid model based on convolutional autoencoder and convolutional neural network. Artif. Intell. Agric. 5, 90–101 (2021)
16. Daud, S.M., Jozani, H.J., Arab, F.: A review on predicting outbreak of tungro disease in rice fields based on epidemiological and biophysical factors. Int. J. Innov. Manag. Technol. 4(4), 447 (2013)
17. Hasan, M.J., Mahbub, S., Alom, M.S., Nasim, M.A.: Rice disease identification and classification by integrating support vector machine with deep convolutional neural network. In: 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1–6 (May 2019)
18. Upadhyay, S.K., Kumar, A.: A novel approach for rice plant diseases classification with deep convolutional neural network. Int. J. Inf. Technol. 1–15 (2021). https://doi.org/10.1007/s41870-021-00817-5
19. Andrianto, H., Faizal, A., Armandika, F.: Smartphone application for deep learning-based rice plant disease detection. In: 2020 International Conference on Information Technology Systems and Innovation (ICITSI), pp. 387–392. IEEE (October 2020)
20. Liang, W.-j., et al.: Rice blast disease recognition using a deep convolutional neural network. Scientific Reports 9(1), 1–10 (2019)
21. Mique Jr., E.L., Palaoag, T.D.: Rice pest and disease detection using convolutional neural network. In: Proceedings of the 2018 International Conference on Information Science and System, pp. 147–151 (April 2018)
22. Shrivastava, V.K., et al.: Rice plant disease classification using transfer learning of deep convolution neural network. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 3(6), 631–635 (2019)
Multimodal Analysis of Parkinson's Disease Using Machine Learning Algorithms

C. Saravanan1, Anish Samantaray1, and John Sahaya Rani Alex2(B)

1 School of Computer Science Engineering, Vellore Institute of Technology, Chennai, India
2 Centre for Healthcare Advancement, Innovation and Research, Vellore Institute of Technology, Chennai, India
[email protected]
Abstract. The manifestations of Parkinson's disease (PD) are multifold. One of them is via the motor movements of a person. With a wealth of data available from the Parkinson's Progression Markers Initiative (PPMI), the data can be analyzed and predictive analysis can be performed. The goal of this work is to identify the dominant features, along with the preeminent machine learning (ML) algorithm, for determining whether a person has Parkinson's disease. In this work, we have used the gait data of a person for motor movement along with the patient's diagnostic information. Feature selection using the correlation coefficient is performed on the gait data to improve accuracy, and the selected features are then applied to the ML algorithms. Our experimental results show that LightGBM and Xgboost provided the best result with an accuracy of 91.83%, followed by the Extra Tree classifier and Logistic Regression with an accuracy of 90.81%.

Keywords: Machine Learning · Parkinson's disease · Gait · Longitudinal Analysis · Accuracy
1 Introduction

Parkinson's disease is a neurological disorder that affects both the nervous system and the bodily components under its control. Symptoms emerge gradually; the initial sign could be a slight tremor in just one hand [1]. The loss of dopaminergic neurons in the substantia nigra is one of the main causes of Parkinson's disease (PD). With a typical onset age of 55 and a pronounced age-related increase in incidence, PD is a progressive disease [2]. Parkinson's disease can lead to tremors, bradykinesia, rigid muscles, poor posture, and balance issues. Patients might mumble, speak fast, slur, or pause before speaking. There is a strong relationship between motor movements and Parkinson's disease; therefore, motor movements may be very useful for determining whether a person has Parkinson's disease [1].

Analyzing raw datasets for trends, conclusions, and improvement opportunities is part of data analytics. Health care analytics employs both recent and old data to produce macro and micro insights to support business and patient decision-making [3].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 470–478, 2023. https://doi.org/10.1007/978-3-031-35501-1_47

There is a wealth of information available nowadays about many different topics, and
this information can be really useful for understanding, predicting, and preventing many things in the healthcare sector. The gait system is used to analyze the motor movements of patients: wireless sensors are worn on both wrists and on the back. The sensors contain accelerometers, gyroscopes, and magnetometers. With the help of these sensors, data is collected on various parameters such as sway, walking, arm swing, and axial movement. This data can be analyzed independently and also alongside the features collected from the patients during their longitudinal analysis [4]. This analysis can also be helpful for the prognosis of the patient.
2 Related Work

Anat Mirelman of Tel Aviv University created documentation on the subject, demonstrating that traditional gait data evaluations focused on measuring motor deficits rather than quantitative measures, resulting in a critical change in predicting Parkinson's disease diagnosis [4]. In 2016, Tanya Simuni and others examined the frequency and stability of the subgroup characterization of tremor dominant (TD) versus postural instability gait disorder dominant (PIGD) Parkinson's disease (PD) in de novo patients. In the de novo PD cohort, TD versus PIGD subtype classification varies greatly over the first year, and PD subtype change is unaffected by dopaminergic therapy [5]. In 2018, Ryul Kim and co-authors used the PPMI dataset to investigate whether presynaptic striatal dopamine depletion predicts freezing of gait (FOG) in Parkinson's disease. Baseline DAT uptake of each striatal subregion was used to compare cumulative FOG risk in tertiles using the Kaplan-Meier method. Cox proportional hazard models assessed the predictive power of striatal subregion DAT uptake for FOG development, and the authors concluded that it may provide reliable insight into FOG's nigrostriatal mechanism [6]. In 2018, Gunjan Pahuja et al. developed a novel prediction/identification model by combining biological biomarkers with SBR values of only four brain regions. PD classification and risk estimation models were developed using MLR to demonstrate that the prediction model fits the data well [7]. In 2020, Chesney E. Craig et al. proposed a relationship between posture and gait and degeneration of the cholinergic pedunculopontine nucleus, and investigated whether metrics of microstructural integrity can independently predict future postural instability and gait difficulties and can be combined with other candidate biomarkers to improve prognosis [8]. In 2022, Kelly N. H. Nudelman et al. investigated the genetic origins of dystonia-parkinsonism in patients with DAT-SPECT scans but no dopaminergic abnormalities (SWEDD). For this case-control investigation using data from the Parkinson's Progression Markers Initiative (PPMI) cohort, rare variants present in SWEDD but not HC, pathogenicity known from the literature and genetic databases, and variant conservation and protein-function programmes were prioritized. Genetic variations in parkinsonism-dystonia genes in SWEDD cases can improve clinical diagnosis and therapy [9].
3 Proposed Work

Based on the literature, we found that motor movement data alone is not sufficient to determine whether a patient has Parkinson's disease. So, in this work, appropriate patient diagnostic information from the PPMI data [4] is selected and combined with suitable gait data to predict whether a patient has Parkinson's disease. The selected gait features are listed in Table 1. First, the gait data is preprocessed using the correlation coefficient method: highly correlated columns are removed, and the data is combined with suitable columns from the diagnostic features table provided by PPMI. The columns considered from PPMI's patient diagnostic information are listed in Table 2. The gait data is then joined with the PPMI diagnostic features dataset using an inner join, which resulted in 76 features. Then, we applied seven machine learning algorithms.

Table 1. Gait motor movement data description
Sway (SW): velocity, sway path, centroidal frequency, jerk
Timed Up and Go (TUG): TUG duration, number of steps, average step duration during straight walking, average step duration during turns, step regularity, step symmetry
Walking: walk speed, cadence, average stride time, stride CV, step regularity, step symmetry, jerk
Arm Swing (ASYM): Amplitude_Right_arm, Amplitude_Left_arm, Variability_Right_arm, Variability_Left_arm, Symmetry Right/Left, Jerk Right, Jerk Left, Asymmetry_index
Axial: trunk rotation asymmetry, average amplitude trunk
Multimodal Analysis of Parkinson’s Disease
473
Table 2. Diagnostic feature data description

DFRTREMP: Resting tremor present
DFRTREMA: Resting tremor absent
DFPATREM: Prominent action tremor
DFOTHTRM: Tremor - Other, Specify
DFRIGIDP: Rigidity is present and typical for PD
DFRIGIDA: Rigidity is absent
DFAXRIG: Axial rigidity in excess of distal rigidity
DFUNIRIG: Marked unilateral or asymmetric rigidity
DFTONE: Additional type of increased tone
DFOTHRIG: Rigidity - Other, Specify
DFRIGCM: Rigidity - comment
DFBRADYP: Bradykinesia is present
DFBRADYA: Bradykinesia is absent
DFAKINES: Pure akinesia (without rigidity, tremor)
DFBRPLUS: Bradykinesia not accounted for by rapid movement
DFOTHABR: Akinesia/Bradykinesia - Other, Specify
3.1 Experimental Data

We have downloaded the gait data from the PPMI website [4] with prior permission. The data initially consists of 59 columns and 192 rows; it is preprocessed later. The diagnostic features data has also been collected from PPMI; it describes the various changes and tests recorded during patients' multiple visits to the clinic. Both datasets have been joined using an inner join, and the columns with a correlation coefficient over 0.8 have been removed. The combined dataset has 76 features from 296 patients [4].

3.2 Data Preprocessing

At first, for the gait dataset alone, the many empty values are imputed with the median values of the respective columns. The COHORT column, which indicates whether a person has Parkinson's disease, is label encoded into 0 and 1 for binary classification. Columns with skewness outside the range of −1 to 1 have been log-transformed to bring them into the acceptable range. The ML techniques are then used to model the data.

While combining the two datasets, the important columns mentioned in Table 2, which are part of the diagnostic features dataset, have been segregated into a single file along with the patient number. The gait data and the newly created file have been joined using an inner
join using the common column "Patient no". The combined dataset also has many empty values, which are imputed with the median values of the respective columns. The COHORT column, which indicates whether a person has Parkinson's disease, is label encoded into 0 and 1 for binary classification, as for the earlier dataset. We then check the correlation between columns to reduce the number of dimensions. The correlated column pairs found are shown in Table 3, and every 'Feature 2' in Table 3 has been dropped from the dataset. After this step there are 1291 rows for every column. The inner join also produces many duplicate rows, since various patients visited the hospital several times and their readings did not change; after removing the duplicates, the 296 unique rows remain, and the machine learning algorithms are applied to them. The code is not publicly available and can only be shared on individual request.

Table 3. Correlation coefficients between two columns in the combined dataset

Feature1             Feature2             Correlation coefficient
ASYM_IND_U           ASA_U                0.998483
ASYM_IND_DT          ASA_DT               0.998006
TUG2_STRAIGHT_DUR    TUG1_STRAIGHT_DUR    0.956847
DFBRADYA             DFBRADYP             0.903129
DFBRADYA             DFRIGIDA             0.835618
DFRIGIDP             DFRIGIDA             0.898554
DFRIGIDP             DFBRADYP             0.848042
DFRTREMP             DFRTREMA             0.892891
LA_AMP_DT            LA_AMP_U             0.845896
TUG2_DUR             TUG1_DUR             0.836734
TUG2_DUR             TUG2_STEP_NUM        0.819604
SW_VEL_CL            SW_VEL_OP            0.82475
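The preprocessing pipeline described in Sects. 3.1-3.2 can be sketched with pandas roughly as follows. The thresholds (|r| > 0.8, skewness outside −1 to 1), the COHORT target, and the "Patient no" join key come from the text; the helper names and toy data are illustrative, since the authors' actual code is available only on request.

```python
import numpy as np
import pandas as pd

def preprocess(df, target="COHORT"):
    """Median-impute missing values, log-transform columns whose
    skewness falls outside [-1, 1], and binarize the target."""
    df = df.copy()
    for col in df.columns:
        if col == target:
            continue
        df[col] = df[col].fillna(df[col].median())
        if abs(df[col].skew()) > 1:
            # Shift to non-negative before the log transform.
            df[col] = np.log1p(df[col] - df[col].min())
    df[target] = (df[target] == "PD").astype(int)
    return df

def drop_correlated(features, threshold=0.8):
    """Drop the second feature of every pair whose absolute Pearson
    correlation exceeds the threshold (cf. Table 3); numeric input."""
    corr = features.corr().abs()
    cols = corr.columns
    to_drop = {cols[j]
               for i in range(len(cols))
               for j in range(i + 1, len(cols))
               if corr.iloc[i, j] > threshold}
    return features.drop(columns=sorted(to_drop))

# Toy stand-ins for the gait and diagnostic tables, keyed on "Patient no".
gait = pd.DataFrame({"Patient no": [1, 1, 2, 3],
                     "TUG1_DUR": [10.0, np.nan, 12.0, 9.0],
                     "TUG2_DUR": [10.1, 10.1, 12.2, 9.1],
                     "COHORT": ["PD", "PD", "HC", "PD"]})
diag = pd.DataFrame({"Patient no": [1, 2, 3], "DFRTREMP": [1, 0, 0]})

# Inner join on the shared patient identifier, then drop repeat visits.
combined = (gait.merge(diag, on="Patient no", how="inner")
                .drop_duplicates(subset="Patient no"))
clean = preprocess(combined.drop(columns=["Patient no"]))
reduced = drop_correlated(clean.drop(columns=["COHORT"]))
```

On the toy tables, TUG2_DUR is dropped because it is almost perfectly correlated with TUG1_DUR, mirroring the pairs in Table 3.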
3.3 Machine Learning Algorithms and Hyperparameters

We have used seven ML algorithms for binary classification: Logistic Regression, Support Vector Machine, Random Forest, Extra Tree Classifier, LightGBM, Xgboost, and KNN [x] [10]. The hyperparameters used are the defaults from the sklearn library for Xgboost, SVM, LightGBM, and Random Forest. For KNN, n_neighbors is set to 7, and for Logistic Regression, the maximum number of iterations is changed to 1000.
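Under the stated setup, the sklearn-style models can be assembled as below. LightGBM and Xgboost expose the same fit/score interface (via lightgbm.LGBMClassifier and xgboost.XGBClassifier) and are omitted only to keep this sketch self-contained; the synthetic data merely stands in for the 296-patient, 76-feature combined table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Default hyperparameters everywhere, except n_neighbors=7 for KNN and
# max_iter=1000 for Logistic Regression, as stated in the text.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(),
    "Extra Tree Classifier": ExtraTreesClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=7),
}

# Synthetic stand-in for the combined 296 x 76 dataset.
X, y = make_classification(n_samples=296, n_features=76, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

accuracies = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
              for name, clf in models.items()}
```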
4 Results and Discussion

The results show a significant increase in accuracy after the gait data is combined with the various other details shown in the patient's longitudinal analysis in the diagnostic features
dataset. KNN achieved the highest accuracy of 78.125% using only motor movements. The combined data, on the other hand, yields a maximum accuracy of 91.82%, achieved by the Xgboost and LightGBM classifiers. Table 4 shows the accuracies achieved when the gait motor movements are used alone compared to when they are joined with the other diagnostic features collected over a period of time. Figure 1 depicts the accuracy of the machine learning algorithms with and without the patient's diagnostic information. The precision, recall, and F1 score on the combined data were then calculated and are tabulated in Table 5, with a bar chart depicting the same in Fig. 2. The Extra Tree classifier has the greatest precision, while Xgboost has the highest recall and F1 score. The confusion matrices of the models with maximum accuracy are provided in Figs. 3 and 4.
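The per-model precision, recall, F1 score, and confusion matrices reported in Table 5 and Figs. 3-4 can be computed with sklearn; the labels below are illustrative, not the paper's actual test split.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Illustrative ground-truth and predicted labels (1 = PD, 0 = control).
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted
```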
Fig. 1. Accuracy of the seven machine learning algorithms
Fig. 2. Performance metrics of the seven ML algorithms using Gait data combined with Patient’s diagnostic information
Fig. 3. The confusion matrix of LightGBM algorithm
Fig. 4. The confusion matrix of Xgboost algorithm
Table 4. Machine learning algorithms' accuracy.

Machine learning algorithm   Accuracy in % using gait data   Accuracy in % using gait data combined with diagnostic information
Logistic Regression          64.0625                         90.816
SVM                          62.5                            88.775
Random Forest                75                              89.795
Extra Tree Classifier        73.437                          90.816
LightGBM                     65.625                          91.836
Xgboost                      71.815                          91.836
KNN                          78.125                          81.635
Table 5. Precision, recall, F1 score of selected machine learning algorithms.

ML Algorithm            Precision in %   Recall in %   F1-Score in %
Logistic Regression     92.4             85.9          88.3
SVM                     89.3             83.5          85.7
Random Forest           93               81.6          85
Xgboost                 91.9             88.5          89.9
KNN                     80               74.65         76.4
Extra Tree Classifier   94.1             85            88
LightGBM                93.1             87.5          89.7
5 Conclusion

In this research work, a multimodal analysis of Parkinson's disease has been carried out. Gait data, considered as the motor movement data, has been used to classify Parkinson's disease. We found that this data produced 60 to 78% accuracy across seven ML algorithms. Therefore, the patient's diagnostic information from the PD diagnostic feature table was added: a few columns were chosen based on their qualitative nature and combined with the gait data, which yielded a better accuracy of 81 to 92%. We also observed that the Xgboost and LightGBM algorithms provide the highest accuracies. In the future, the data could be applied to prognosis analysis because of its longitudinal nature.
References

1. Davie, C.A.: A review of Parkinson's disease. Br. Med. Bull. 86(1), 109–127 (2008)
2. Dauer, W., Przedborski, S.: Parkinson's disease: mechanisms and models. Neuron 39(5), 899–909 (2003)
3. Informatics, H.: The Role of Data Analytics in Health Care. University of Pittsburgh (2021)
4. The Parkinson Progression Marker Initiative (PPMI). Prog. Neurobiol. 95(4), 629–635 (2011). https://doi.org/10.1016/j.pneurobio.2011.09.005
5. Simuni, T., Caspell-Garcia, C., et al.: How stable are Parkinson's disease subtypes in de novo patients: analysis of the PPMI cohort? Parkinsonism Relat. Disord. 28, 62–67 (2016)
6. Pahuja, G., Nagabhushan, T.N., Prasad, B.: Early detection of Parkinson's disease by using SPECT imaging and biomarkers (2018)
7. Kim, R., Lee, J., et al.: Presynaptic striatal dopaminergic depletion predicts the later development of freezing of gait in de novo Parkinson's disease: an analysis of the PPMI cohort. 51, 49–54 (2018)
8. Craig, C.E., Jenkinson, N.J., et al.: Pedunculopontine nucleus microstructure predicts postural and gait symptoms in Parkinson's disease. Mov. Disord. 35(7), 1199–1207 (2020)
9. Nudelman, K.N.H., Xiong, Y., et al.: Dystonia-parkinsonism gene variants in individuals with parkinsonism and brain scans without evidence for dopaminergic deficit (SWEDD). MedRxiv (2022)
10. Bonaccorso, G.: Machine Learning Algorithms. Packt Publishing, UK (2017)
Stock Market Price Trend Prediction – A Comprehensive Review

L. Agilandeeswari1(B), R. Srikanth1(B), R. Elamaran1(B), and K. Muralibabu2

1 SITE, VIT, Vellore 632014, Tamil Nadu, India
[email protected], {srikanth.2020, elamaran.2020}@vitstudent.ac.in
2 EEE, GIET University, Gunupur 765022, Odisha, India
Abstract. Stock market price prediction works like an astrologer, forecasting the future value of stocks, i.e., whether a share will bring profit or loss. The significance of such a prediction system is to gain profit on the invested money and to prevent huge financial losses in the share market; many methods are used to make this prediction. In this article, we analyze the different steps in stock market price trend prediction, namely data collection, pre-processing, dimensionality reduction, classification, prediction, and validation. This article provides a thorough analysis of machine learning, deep learning, fuzzy-based, and some hybrid models with the help of performance metrics such as accuracy. It also concludes which model is best among all existing models.

Keywords: Data collection · Pre-processing · Dimensionality reduction · Classification · Prediction · Machine Learning · Deep Learning · Fuzzy based · Hybrid models
1 Introduction

Stock market research is a popular topic for academics in both the financial and technical sectors because it is one of the primary industries to which investors devote their attention. The wise approach to making money is making more and more profit by utilizing one's trading skills and expertise. Unless a person has long-term ambitions, being a day trader and making money daily is the most favored and desirable way of earning in the stock market. However, to do that, the various difficulties and problems that come with intraday trading should be analyzed, which is only possible by forecasting the stock market using a variety of tools and tactics. This helps to maximize intraday trading and enables one to consistently make money. Stock market prediction is a time-series forecasting task that analyzes historical data and predicts the values of future data. In this article, we present an extensive review of the existing stock market prediction models using machine learning, deep learning, fuzzy logic, and hybrid models. According to [1], the generally recognized semi-strong version of market efficiency and the significant quantity of noise contained in the datasets make financial time series prediction a highly difficult task.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 479–491, 2023. https://doi.org/10.1007/978-3-031-35501-1_48
The general architecture of the stock market price prediction models is given in Fig. 1. The first step in stock market prediction is data collection. The collected data may contain noisy and missing values, and the data may span a broad range, so normalization is needed. These operations are performed in the data pre-processing step to cleanse and arrange the data in a proper format. The next stage is dimensionality reduction, where the dimensionality of the data is reduced to avoid redundancy and keep only informative features. After the dimensionality is reduced, the dataset is divided into training and testing sets, and classification and prediction are performed. In the testing and validation stage, the performance of the model is evaluated using various metrics.
Fig. 1. General architecture of stock price trend prediction
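The staged architecture of Fig. 1 maps naturally onto a scikit-learn Pipeline. The stages below (scaling for pre-processing, PCA for dimensionality reduction, logistic regression for classification) are illustrative placeholders, with a held-out score serving as the validation step.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# One estimator per stage of the Fig. 1 architecture.
pipe = Pipeline([
    ("normalize", StandardScaler()),      # pre-processing
    ("reduce", PCA(n_components=5)),      # dimensionality reduction
    ("classify", LogisticRegression()),   # classification/prediction
])

# Synthetic stand-in for a table of historical price features.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

score = pipe.fit(X_tr, y_tr).score(X_te, y_te)  # testing and validation
```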
1.1 Article Organization

Section 2 explains data collection and the different datasets available. The different pre-processing techniques are presented in Sect. 3. Dimensionality reduction techniques are explained in Sect. 4. Section 5 presents a detailed analysis of the various machine learning and deep learning techniques. The validation and testing process is explained in Sect. 6. Section 7 covers the various metrics used for evaluation. Section 8 summarizes the article with directions for further research.
2 Data Collection

In this section, some of the commonly used datasets and their descriptions are given; Table 1 lists the available datasets. Some of the attributes generally taken for prediction and classification are: 1. Date: the current date of the stock movement; 2. Close price: the closing price of the stock; 3. Volume: commonly reported as the number of shares that changed hands during a given day or period; 4. Open price: the opening price of the stock; 5. High price: the highest price of the stock on each day; and 6. Low price: the lowest price of the stock on each day.

Table 1. Datasets
Dataset Name: NIFTY-50 Stock Market Data (2000–2021) [2]
Features: The information is drawn from the NSE (National Stock Exchange) of India's price history and trading volume information for the 50 equities that make up the NIFTY 50 index. Each stock's price and trading values are broken out into separate CSV files, and a metadata file including some macro information about the stock itself completes the day-level pricing and trading values for all datasets. The data is available from 1 January 2000 through 30 April 2021.
Link: https://www.kaggle.com/datasets/rohanrao/nifty50-stock-market-data

Dataset Name: S&P 500 stock data [3]
Features: The top 500 publicly traded US stocks are included in the free-float, capitalization-weighted S&P 500 index.
Link: https://datahub.io/core/s-and-p-500-companies

Dataset Name: BSE Sensex Dataset [4]
Features: The Bombay Stock Exchange's free-float market-weighted index of 30 reputable and financially stable companies is called the BSE SENSEX. The 30 firms that make up the index, which are among the biggest and most liquid stocks, comprise a cross-section of the Indian economy's main industrial sectors.
Link: https://www.kaggle.com/datasets/ravisane1/5-year-bse-sensex-dataset

Dataset Name: Chinese stock data (from 2005–2022)
Features: This dataset includes all actual Chinese stock data with adjusted stock prices from 2005 through 2022 (adjusted for stock splits and dividends). To save memory, a .pkl file was produced. This dataset can be used for developing and back-testing quantitative trading methods on the Chinese stock market; any other private or professional use is likewise permissible without limitations.
Link: https://www.kaggle.com/datasets/franciscofeng/chinese-stock-data-from-20052022

Dataset Name: NIKKEI 225 Index Prophet [5]
Features: Nikkei provides the most comprehensive and detailed market information in Japan, from business and economic news to corporate and industrial data.
Link: https://www.kaggle.com/code/stpeteishii/nikkei-225-index-prophet/data

Dataset Name: Shanghai stock index [6]
Features: The Shanghai Stock Exchange's (SSE) market data offerings include Level-1 and Level-2 market data in real time. The data may only be redistributed or used by authorised vendors; eligible organizations in non-Mainland China regions could apply for a license to China Investment Information Services Limited (CIIS).
Link: https://www.kaggle.com/code/syavia/shanghai-stock-index-stock-forecast

Dataset Name: NSE (National Stock Exchange) [7]
Features: Using the NSE, one can trade in segments such as equities, mutual funds, initial public offerings, and various schemes.
Link: https://www.kaggle.com/datasets/atulanandjha/national-stock-exchange-time-series
3 Pre-processing

The collected datasets contain several noisy values. For better classification and prediction, the data should be clean and in a proper format; reducing the number of dimensions in the training dataset is another important task. The main pre-processing techniques are data cleaning and data transformation.
3.1 Data Cleaning

There may be a lot of useless information and unnecessary gaps in the data; data cleaning is required to solve this problem [8].

Missing Data: This is the problem of gaps in the data, and it can be dealt with in the following ways. (i) Ignore the tuples: this method only works when a tuple has numerous missing values and the dataset is rather sizable. (ii) Fill the missing values: there are several ways to do this; for instance, the values can be filled manually with the mean or with whichever values are most suitable.

Noisy Data: Noisy data is unintelligible data that cannot be understood by machines. It might be produced as a result of poor data gathering, incorrect data entry, etc. One option for handling it is the binning technique: to smooth data, this procedure is applied to sorted data. The entire set of data is separated into equal-sized segments, and each segment is dealt with independently; all the data in a segment can be replaced by its mean or by boundary values.
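A minimal, library-free sketch of the two remedies above, assuming numeric data (the function names are ours, not from the reviewed papers):

```python
import statistics

def fill_missing(values):
    """Fill gaps (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def bin_smooth_by_mean(values, bin_size):
    """Smooth sorted data by replacing each equal-sized bin with its mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        smoothed.extend([statistics.mean(chunk)] * len(chunk))
    return smoothed
```

For example, bin_smooth_by_mean([4, 8, 9, 15, 21, 21, 24, 25, 26], 3) replaces each bin of three values with its mean, yielding [7, 7, 7, 19, 19, 19, 25, 25, 25].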
3.2 Data Transformation

This method is used to change the data into formats that are appropriate for the mining process [9]. It entails the following:

Normalization: scaling the data values into a given range (e.g., −1.0 to 1.0 or 0.0 to 1.0).

Attribute Selection: with the help of the provided set of attributes, new attributes are created to aid the mining process.

Discretization: the raw values of numerical attributes are replaced by interval levels or conceptual levels.

Generation of Concept Hierarchies: attributes are transferred from a lower level of the hierarchy to a higher level; as an illustration, the attribute "city" can be generalized to "country."
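Normalization and discretization, the two transformations most relevant to price data, might be sketched as follows (function names ours):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def discretize(value, boundaries, labels):
    """Map a raw numeric value to a conceptual label.

    `boundaries` are the inclusive upper edges of all but the last
    interval, so len(labels) == len(boundaries) + 1."""
    for bound, label in zip(boundaries, labels):
        if value <= bound:
            return label
    return labels[-1]
```

For example, min_max_normalize([10, 20, 30]) gives [0.0, 0.5, 1.0], and discretize(45, [18, 65], ["low", "mid", "high"]) maps the raw value 45 to the conceptual level "mid".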
4 Dimensionality Reduction

Data size is decreased using encoding techniques, which can be either lossy or lossless. If the original data can be recovered after compression, the reduction is called lossless; otherwise, it is lossy. Wavelet transforms and principal component analysis (PCA) are the two most effective dimensionality reduction techniques.

Wavelet Transform: Using the wavelet transform (WT) [10], it is possible to evaluate data in time-frequency space while lowering noise and keeping the key elements of the original signals. WT has developed into a highly efficient instrument for data and signal processing over the last 20 years.
Principal Component Analysis: A statistical technique, principal component analysis (PCA) [11] employs an orthogonal transformation to change a set of correlated variables into a set of uncorrelated ones. PCA is the most often used tool in machine learning for predictive models and exploratory data analysis.

Correlation Between Features: The most typical method eliminates features that have a significant correlation with other features.

Statistical Tests: Another option is to choose the features using statistical tests, which examine each feature's association with the output variable separately [12].

Recursive Feature Elimination: The algorithm trains the model with all features in the dataset, calculates the model's performance, and then eliminates one feature at a time, stopping when the performance improvement is insignificant. This process is sometimes referred to as backward elimination [13].
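With scikit-learn, PCA and recursive feature elimination look as follows on synthetic data; the estimator choice and the component/feature counts are arbitrary illustrations, not values from the reviewed papers.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# PCA: orthogonal transformation to uncorrelated components.
X_pca = PCA(n_components=3).fit_transform(X)

# Recursive feature elimination: repeatedly drop the weakest feature.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X, y)
X_rfe = X[:, rfe.support_]
```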
5 Classification and Prediction

5.1 Machine Learning Based

As summarized in Table 2, the authors of [14] compared wavelet-based soft computing models combined with autoregressive models against baseline models (simple regression, ARMA, and ARIMA), examining Wavelet-ARMA, Wavelet-Denoised-ARMA, Wavelet-ARIMA, and Wavelet-Denoised-ARIMA forecast models. The Wavelet-Denoised-ARIMA and Wavelet-ARIMA forecast models show a significant reduction in inaccuracy of 0.67% and 0.38%, respectively, and Wavelet-ARIMA forecasting reduces the error value by 80% when daily closing prices are predicted. In [15], the authors stated that the open price of the stock market can be accurately predicted using a novel neural network approach refined via particle swarm optimization; it has been tested on real data from JSPL and HDFC. In [16], the authors used two different categories of technical indicators to forecast how the Nikkei 225 index would move the next day. Applying these two types of input variables, they tested the performance of the GA-ANN hybrid model by comparing the predictions with actual data and adjusting the weights and biases of the ANN model using the GA algorithm. The results of the trials showed that Type 2 input variables can perform better, with a direction-prediction accuracy of 81.27%. The hybrid unscented Kalman filter and DE (DEUKF) is a novel learning paradigm detailed in [17] to estimate the next-day trend for four stock datasets: the Bombay Stock Exchange (BSE), IBM, Oracle, and Reliance Industries Limited (RIL). According to multiple computational findings, it outperforms the other three learning strategies in terms of many technical indicators, such as the least mean squared error and the lowest mean absolute percentage error of roughly 2% (Fig. 2).
Stock Market Price Trend Prediction
Table 2. Machine learning-based systems

| Ref | Methodology or techniques | Advantages | Issues | Dataset | Metrics |
|---|---|---|---|---|---|
| [14] | Wavelet analysis, autoregressive forecasting models | Soft computing techniques combined with autoregressive models provide more accurate prediction results | Computationally intensive | Bombay Stock Exchange data, BSE 100 S&P index | Performance 80% |
| [16] | ANN and GA algorithm | Type 2 input variables may perform better; their accuracy in predicting direction is 81.27% | No learning and automation | Nikkei 225 index, the Tokyo Stock Exchange's most popular market index | Performance 81.27% |
Fig. 2. Performance Metrics Analysis for Machine Learning based Stock Price Predictions
5.2 Deep Learning Based

Since stock movements depend on a variety of factors, predicting future market trends is a difficult task. According to [14], there is a connection between news stories and stock prices, and stock price changes may be attributed to news events.
L. Agilandeeswari et al.
Deep learning model construction involves several stages. Because this process takes so long, building a deep learning system is slow and time-consuming, as summarized in Table 3.

Table 3. Deep learning based systems

| Ref | Methodology or techniques | Advantages | Issues | Dataset | Metrics |
|---|---|---|---|---|---|
| [18] | Long short-term memory (LSTM) | LSTM networks can effectively extract meaningful information from noisy financial time series data | The only criterion for a stock being traded is that price information is available for feature generation; mean and standard deviation must be derived from the training set alone to prevent look-ahead bias | S&P 500 from Thomson Reuters, December 1989 to September 2015 | Accuracy is 98.21% |
| [19] | Convolutional neural network (CNN) and LSTM | PCA significantly improved the training efficiency of the LSTM model by 36.8% | Unlike other NNs, the LSTM is a variant of the conventional RNN with time steps, memory, and gate architecture | 2 years of data from the Chinese stock market | Accuracy is 97.61% |
Table 4. Fuzzy based systems

| Ref | Methodology or techniques | Advantages | Issues | Dataset | Metrics |
|---|---|---|---|---|---|
| [20] | Neuro-fuzzy logic | Automatically learns fuzzy rules and membership functions from imprecise data | Noise cannot be eliminated | BSE Sensex stocks from 2009 to 2012 | Accuracy is 96.42% |
| [21] | Fuzzy logic and neural networks | Deals with the inherent imprecision of human data with linguistic records; easy communication between domain experts and the system's designer | Responds only to issues the rule base has information on; depends on the current inference logic rules since the system is not resistant to topological changes | Bombay Stock Exchange dataset | Accuracy is 95.31% |
Table 5. Hybrid systems

| Ref | Methodology or techniques | Advantages | Issues | Dataset | Metrics |
|---|---|---|---|---|---|
| [22] | Hybridization of sentiment analysis and clustering | Achieves somewhat higher accuracy than the previously existing hybrid approach | Noisy data from social media | NSE (National Stock Exchange) | Simple Moving Average (SMA) positive |
| [15] | Hybridization of Particle Swarm Optimization (PSO) and Adaline Neural Network (ANN) | Allows load matching to be supplemented by allocating overall optimization tasks to many working swarms; works well for financial time series data | Cannot overcome the linear separability problem | HDFC and JSPL stocks from 2012 to 2014 | Mean Absolute Percentage Error (MAPE) 1.1%, Accuracy 98.9% |
5.3 Fuzzy Based

Stock market forecasting is a significant topic in finance, and artificial neural networks have been applied to it extensively over the last ten years. The study in [20] also addressed forecasting the stock index value, which reflects the index's daily direction of change. Some networks are limited in their ability to learn data patterns, or their performance can be unexpected and inconsistent because of the intricate financial data they handle. Considering the low degree of both long-term and short-term error in the modelling, it can be said that "ANFIS" is capable of predicting the behavior of stock prices, as in Table 4.

5.4 Hybrid Approach
Fig. 3. Comparative Analysis of overall methods in terms of performance metrics
From Table 5, we infer that optimization, namely PSO, attains better results, and the overall analysis shows that deep learning produces better results than the other approaches. From Fig. 3, we deduce that deep learning-based methods predict with slightly higher accuracy than the other models.
6 Validation and Testing

The effectiveness of a statistical prediction model can be evaluated in a variety of ways. Metrics measure how closely forecasts match actual outcomes. Some of the metrics used for prediction are given in this section.
6.1 Metrics Used for Prediction

• Accuracy: Predictive accuracy is gauged by the difference between observed and forecasted values. Accuracy = (total count of accurate predictions) / (count of all predictions).

• Precision: Precision measures how many of the predicted positive outcomes are correct (true positives):

Precision = TP / (TP + FP) (1)

• Root mean squared error (RMSE): the square root of the mean of the squared errors. The widely used RMSE is regarded as a good all-purpose error measure for numerical forecasts:

RMSE = sqrt( Σᵢ (Pᵢ − Oᵢ)² / n ) (2)

• Mean squared error (MSE): the mean squared error (MSE), or mean squared deviation (MSD), of an estimator is the average of the squares of the errors, i.e., the average squared difference between the estimated values X̂ᵢ and the actual values Xᵢ:

MSE = (1/n) Σᵢ (Xᵢ − X̂ᵢ)² (3)
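The four measures can be written directly from their definitions; the toy labels and series below are illustrative:

```python
import math

def accuracy(y_true, y_pred):
    """Share of predictions that match the observed labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): how many predicted positives are correct."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp)

def mse(observed, predicted):
    """Mean of the squared forecast errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Square root of the MSE, in the units of the original series."""
    return math.sqrt(mse(observed, predicted))

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
acc = accuracy(y_true, y_pred)        # 4 of 6 predictions are correct
obs, fcst = [10.0, 12.0, 11.0], [9.0, 12.0, 13.0]
err = rmse(obs, fcst)
```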
7 Challenges and Issues

In this article, we have identified some existing challenges that need to be addressed.
• Dealing with noisy data is a common challenge. In a dataset, noisy data typically refers to useless information, inaccurate records, or duplicate observations. A record that is an outlier may or may not be noise; it is necessary to decide whether an outlier should be regarded as noisy data or can be removed from the dataset.
• The lack of data points in real-world data is another frequent problem. Most existing models cannot fully deal with missing values in the data.
• To supply the model with enough data for prediction, databases must be merged and various types of data combined.
• Features are often highly correlated with one another.
• Another restriction is that a model must become more efficient at analysing enormous amounts of data in real time, with longer texts and more characteristics. Even when pre-processing methods are used to reduce the amount of data, important information can be lost.
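Two of the challenges above (missing values and outliers) admit simple baselines; this is a minimal sketch with illustrative data and thresholds, not a prescription:

```python
import numpy as np

def impute_column_mean(X):
    """Replace NaNs in each column with that column's mean over the
    observed entries (a simple baseline for missing data)."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_means[nan_cols]
    return X

def flag_outliers(x, z=3.0):
    """Mark points more than `z` standard deviations from the mean;
    whether a flagged point is noise or signal is a modelling decision."""
    mu, sigma = x.mean(), x.std()
    return np.abs(x - mu) > z * sigma

X = np.array([[1.0, 10.0],
              [np.nan, 12.0],
              [3.0, np.nan]])
X_filled = impute_column_mean(X)                       # NaNs become 2.0 and 11.0
flags = flag_outliers(np.array([1.0, 1.1, 0.9, 9.0]), z=1.5)
```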
8 Conclusion and Future Directions

This paper presented a comprehensive review of the approaches used for stock market prediction, namely machine learning, deep learning, fuzzy, and hybrid systems. The stages of stock market price prediction were presented in detail, and the challenges and issues of the different models were identified. To increase accuracy and performance further, researchers should investigate hybrids of deep learning models with more hidden layers and memory cells, which can raise performance to the next level and achieve satisfying accuracy. This survey helps new researchers gain a better view of this field. To further enhance automated prediction of stock market behaviour, the hybrid use of time series analysis, text mining, and sentiment analysis may be investigated.
References
1. Sirignano, J., Cont, R.: Universal features of price formation in financial markets: perspectives from deep learning (2018)
2. Rohit, V., Choure, P., Singh, P.: Neural networks through stock market data prediction. In: International Conference of Electronics, Communication, and Aerospace Technology (ICECA), vol. 2, pp. 514–519. IEEE (2017)
3. Chang, S.V., et al.: A review of stock market prediction with artificial neural network (ANN). In: IEEE International Conference on Control System, Computing and Engineering, pp. 477–482. IEEE (2013)
4. Gandhmal, D.P., Kumar, K.: Systematic analysis and review of stock market prediction techniques. Comput. Sci. Rev. 34, 100190 (2019)
5. Harahap, L.A., Lipikom, R., Kitamoto, A.: Nikkei stock market price index prediction using machine learning. J. Phys. Conf. Ser. 1566(1), 012043. IOP Publishing (2020)
6. Pang, X., Zhou, Y., Wang, P., Lin, W., Chang, V.: An innovative neural network approach for stock market prediction. J. Supercomput. 76(3), 2098–2118 (2020)
7. Hiransha, M., Gopalakrishnan, E.A., Menon, V.K., Soman, K.P.: NSE stock market prediction using deep-learning models. Procedia Comput. Sci. 132, 1351–1362 (2018)
8. Lanbouri, Z., Achchab, S.: Stock market prediction on high frequency data using long-short term memory. Procedia Comput. Sci. 175, 603–608 (2020)
9. Kumar Chandar, S., Sumathi, M., Sivanandam, S.N.: Prediction of stock market price using hybrid of wavelet transform and artificial neural network. Ind. J. Sci. Technol. 9(8), 1–5 (2016)
10. Hsieh, T.-J., Hsiao, H.-F., Yeh, W.-C.: Forecasting stock markets using wavelet transforms and recurrent neural networks: an integrated system based on artificial bee colony algorithm. Appl. Soft Comput. 11 (2011)
11. Waqar, M., et al.: Prediction of stock market by principal component analysis. In: 13th International Conference on Computational Intelligence and Security (CIS), pp. 599–602. IEEE (2017)
12. De Bondt, W.F., Thaler, R.: Does the stock market overreact? J. Fin. 40(3), 793–805 (1985)
13. Xu, Y., Li, Z., Luo, L.: A study on feature selection for trend prediction of stock trading price. In: International Conference on Computational and Information Sciences, pp. 579–582. IEEE (2013)
14. Singh, S., Parmar, K.S., Kumar, J.: Soft computing model coupled with statistical models to estimate future of the stock market. Neural Comput. Appl. 33, 7629–7647 (2021)
15. Senapati, M.R., Das, S., Mishra, S.: A novel model for stock price prediction using hybrid neural network. J. Inst. Eng. (India) 99, 555–563 (2018)
16. Qiu, M., Song, Y.: Predicting the direction of stock market index movement using an optimized artificial neural network model. PLoS ONE 11(5), e0155133 (2016)
17. Weng, B.: Application of machine learning techniques for stock market prediction (2017)
18. Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 270(2), 654–669 (2018)
19. Shen, J., Shafiq, M.O.: Short-term stock market price trend prediction using a comprehensive deep learning system. J. Big Data 66 (2020)
20. Bhama, P.K., Prabhune, S.S.: Comparative analysis of recent neuro fuzzy systems for stock market prediction. In: 3rd International Conference on Computing Methodologies and Communication (ICCMC), pp. 512–519. IEEE (2019)
21. Kumar, P.H., Prasanth, K.B., Nirmala, T., Basavaraj, S.: Neuro fuzzy based techniques for predicting stock trends. Int. J. Comput. Sci. Issues (IJCSI) (2012)
22. Rajput, V., Bobde, S.S.: Stock market prediction using a hybrid approach. In: International Conference on Computing, Communication and Automation (ICCCA), pp. 82–86. IEEE (2016)
Visceral Leishmaniasis Detection Using Deep Learning Techniques and Multiple Color Space Bands

Armando Luz Borges1, Clésio de Araújo Gonçalves2,3, Viviane Barbosa Leal Dias1, Emille Andrade Sousa5, Carlos Henrique Nery Costa4,5, and Romuere Rodrigues Veloso e Silva1,2,5(B)

1 Information Systems - CSHNB/UFPI, Picos, Piauí, Brazil
[email protected]
2 Electrical Engineering - PPGEE/UFPI, Picos, Piauí, Brazil
3 Informatics Department - IFSertão-PE, Ouricuri, Pernambuco, Brazil
4 Department of Community Medicine - UFPI, Teresina, Piauí, Brazil
5 Center for Intelligence on Emerging and Neglected Tropical Diseases (CIENTD), Teresina, Piauí, Brazil
Abstract. Leishmaniasis is a group of commonly neglected parasitic diseases typical of tropical and subtropical countries, and its most serious form is Visceral Leishmaniasis (VL). Every year, 700,000 to 1 million cases are recorded worldwide, leading to the death of 26,000 to 65,000 people. The diagnosis, performed through parasitological examination, is tiring and error-prone; at the same time, it is a step with great potential for automation. Therefore, this work aims to develop an automatic system based on computer vision capable of diagnosing patients infected with VL through medical images. We compared the results obtained in this study with related works and observed that the methodology implemented here is superior and more efficient, reaching an Accuracy of 99%. In this way, we demonstrated that deep learning models trained with images of the patient's bone marrow biological material can help specialists accurately and safely diagnose patients with VL.
Keywords: Visceral Leishmaniasis · Space color · Deep Learning

1 Introduction
Leishmaniasis is a group of commonly neglected infectious diseases, typical of tropical and subtropical countries, caused by a protozoan of the genus Leishmania. According to Alvar et al. [1], the disease can be transmitted to humans through the bite of female sandflies on the skin of the vertebrate host. Leishmaniasis is responsible for 700,000 to 1 million cases per year and causes 26,000 to 65,000 deaths annually [2]. According to Jameson et al. [3], the syndromes generated by Leishmaniasis belong to three main forms of the disease: Visceral Leishmaniasis (VL), Cutaneous Leishmaniasis, and Mucosal Leishmaniasis. According to the World Health Organization (WHO), more than 90% of global VL cases occur in six countries: India, Bangladesh, Sudan, South Sudan, Ethiopia, and Brazil [1]. VL is the most severe form of the disease [4]. Its evolution is chronic and attacks the internal organs, which can compromise the immune system. Its main symptoms are long-lasting fever, weakness, weight loss, anemia, hepatosplenomegaly (enlargement of the liver and spleen), and a sharp drop in the patient's blood platelet count [5]. It can be detected through microscopy of material aspirated from the bone marrow. This process is called parasitological examination and is currently the safest method of identifying the disease, searching for amastigote forms of the parasite [6]. Manually evaluating exams can be tiring and laborious for healthcare professionals [7]. This is due to the complexity and the large volume of generated images, which make this work repetitive and difficult to analyze. However, because of its high degree of repeatability, this diagnostic step has great potential for automation. Thus, in recent years, computer systems have emerged that use images from exams to aid in diagnosing diseases in various areas of medicine. Such systems use Computer Vision (CV) concepts and techniques to extract important information from the images and find patterns that help diagnose a given disease [8]. Therefore, given the context of the problem, the objective of this work is to develop an automatic system, based on CV and Deep Learning, capable of analyzing images captured from the parasitological examination of patients and extracting information to assist in the diagnosis of VL.

Supported by Fundação de Amparo à Pesquisa do Piauí.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 492–502, 2023. https://doi.org/10.1007/978-3-031-35501-1_49
2 Related Works
This section presents and discusses works aimed at automatically detecting VL through images. We present the main objectives, the database used, the techniques applied, the results obtained, and the limitations of each study. Nogueira and Teófilo [9] present an automated method of identifying cells and parasites in microscopic images of patients diagnosed with Leishmania. The method consists of splitting the image's color channels, then segmenting each channel using Otsu thresholding to extract attributes for training a Support Vector Machine (SVM). The authors use a base of 794 images, obtaining an Accuracy of 94.96% in identifying parasites. Although the results are significant, the work uses only classical computer vision methods. Neves et al. [10] present a method for annotating and enumerating parasites and macrophages in fluorescent images infected by Leishmania. The database comprises 44 images, and the identification of amastigotes resulted in an Accuracy of 81.55%, Recall of 87.62%, and F1-Score of 84.48%. However, segmentation faults can occur when parasites are too close together.
Gorriz et al. [11] use the U-Net architecture to identify Leishmania parasites in microscopic images and classify them as amastigotes, promastigotes, or adherent parasites. The database used has 45 images, divided into 37 for training and 8 for testing. The F1-Score obtained was 77.7%. The work presents a significant result, although the database has few samples. The work by Isaza-Jaimes et al. [12] presented a computational method for the automatic detection of Leishmania. The database used was an open dataset with 45 images. The approach was able to recognize approximately 80% of the parasites. It is worth mentioning that the authors did not investigate DL techniques. Coelho et al. [13] propose a microscopic image segmentation method that automatically calculates the infection rate caused by amastigotes. The segmentation was based on mathematical morphology together with a CV system. The Accuracy obtained in this work was 95%. State-of-the-art research has some gaps. Few studies use pre-processing techniques, and only the work of Gorriz et al. [11] uses data augmentation in conjunction with DL techniques. It is important to point out that no work uses transfer learning. With this work, we propose to fill these gaps. The contribution of this research is to use DL techniques, such as CNNs, to classify images of microscopic slides and perform the automatic detection of VL in humans.
3 Methodology
In this section, we present the proposed methodology and the concepts and techniques used to carry it out. Our work uses two techniques for image classification: pre-trained CNNs with the default input size of each network, and the same CNNs with the input resized to 75×75. We also evaluate using different color bands as input to the CNNs. In this sense, each approach consists of five steps: i) image acquisition; ii) application of pre-processing techniques; iii) application of data augmentation; iv) feature extraction and image classification through CNNs; and v) definition of evaluation criteria. Figure 1 illustrates these steps.
3.1 Image Acquisition
We acquired a set of microscopic images in collaboration with the Center for Intelligence in Emerging and Neglected Tropical Diseases (CIENTD, http://www.ciaten.org.br) and the Institute of Tropical Diseases "Natan Portella" (Teresina, Brazil). The dataset is composed of 78 positive images and 72 negative images. Experts from both institutions labeled the entire dataset. In addition, they also provided us with binary masks that mark the regions containing amastigotes. The acquisition process can be seen in the first step of Fig. 1. During acquisition, the specialists captured images of the slides using a digital camera attached to the ocular structure of an Olympus microscope at 100x magnification. Each image represents a slide, with dimensions varying between 768 × 949 and 3000 × 4000 pixels.
Fig. 1. Diagram representing the steps of the proposed methodology.
3.2 Pre-processing
The pre-processing techniques were applied to improve the detection and classification of VL. We used the positive masks acquired in the image acquisition step to calculate the centroid of each region of interest and used its coordinates to obtain windows of size 75×75 around the amastigotes. This process resulted in 559 amastigote images. We generated the same number of patches from the negative images to obtain a balanced training dataset. In this way, our dataset was composed of 1118 images. The original images are in RGB format. In this color space, the structures in the image's background were very prominent, often larger than the amastigote itself. As this could directly interfere with the learning of the networks, we converted the images to a color space that highlights the amastigotes: the entire set of generated patches was converted to the LAB color format. The pre-processing step in Fig. 1 illustrates this process.
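The centroid-and-window step can be sketched as follows; the mask, image sizes, and function names are illustrative (the RGB-to-LAB conversion, available for instance via skimage.color.rgb2lab, is omitted here):

```python
import numpy as np

def centroid(mask):
    """Centroid (row, col) of the nonzero pixels of a binary mask."""
    rows, cols = np.nonzero(mask)
    return int(rows.mean()), int(cols.mean())

def crop_patch(image, center, size=75):
    """Cut a size x size window around `center`, shifting the window
    inward when it would fall outside the image borders."""
    r, c = center
    half = size // 2
    r0 = min(max(r - half, 0), image.shape[0] - size)
    c0 = min(max(c - half, 0), image.shape[1] - size)
    return image[r0:r0 + size, c0:c0 + size]

image = np.zeros((200, 300, 3), dtype=np.uint8)   # stand-in RGB slide image
mask = np.zeros((200, 300), dtype=np.uint8)
mask[90:110, 140:160] = 1                         # one annotated amastigote region
patch = crop_patch(image, centroid(mask), size=75)
```

Negative patches would be sampled the same way from images without annotations, giving the balanced dataset described above.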
3.3 Data Augmentation
Our methodology also applied data augmentation techniques to improve training. The techniques used in this work were horizontal and vertical random flips, random zoom on a scale of 0% to 10%, and random contrast adjusted by a factor of 0.2. We chose these techniques to avoid de-characterizing the images. The use of random flips is due to the fusiform shape of the amastigotes and the way they are positioned in the image: a flip changes the axes of this shape, generating a new image that guides the network to analyze different angles. We used zoom because of the irregular nature of the dataset, where the images have significantly varied dimensions; zooming can produce a magnifying effect on objects of interest. Random contrast generates images with slight variations in contrast, allowing the network to adapt to possible images with color variations.
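The flip and contrast operations can be sketched in NumPy as below (zoom is omitted since it needs interpolation); the probabilities and data are illustrative, and in practice a framework's built-in augmentation layers would be used:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_flip(img):
    """Random horizontal (axis 1) and vertical (axis 0) flips."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    if rng.random() < 0.5:
        img = img[::-1, :]
    return img

def random_contrast(img, factor=0.2):
    """Scale distance from the mean intensity by a factor drawn
    uniformly in [1 - factor, 1 + factor], then clip to [0, 1]."""
    scale = 1.0 + rng.uniform(-factor, factor)
    mean = img.mean()
    return np.clip(mean + (img - mean) * scale, 0.0, 1.0)

def augment(img):
    return random_contrast(random_flip(img))

img = rng.random((75, 75, 3))   # stand-in for one 75x75 patch
out = augment(img)
```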
3.4 Extraction and Classification of Image Attributes
In this work, we used CNNs to extract attributes from the images and classify them for the presence of VL. We chose these networks after a literature review indicated that they are the state-of-the-art computer vision technique for medical image classification. We used four pre-trained CNN models: DenseNet201 (DN201) [14], VGG16 [15], ResNet152V2 (RN152V2) [16], and InceptionResNetV2 (IRNV2) [17]. They were selected because they are among the most effective and accurate models for classifying images on ImageNet [18]. Transfer learning was also applied in the convolutional layers of the CNNs. This technique adds to these layers the attributes obtained when solving a similar problem [9]: the feature detectors learned on the similar problem are reused in the network, making the training process more efficient. Typically, this technique is used when the amount of data is too small to train the network from scratch. In this context, only the fully connected layer is trained from scratch to correctly classify the attributes extracted by the previous layers [19]. When training the networks, we freeze the convolutional layers to prevent their feature extractors from being changed during training on the dataset that represents the problem at hand, keeping the features obtained in the transfer learning step. The last step of classification is fine-tuning. This technique consists of unfreezing the convolutional layers and training the entire model on the dataset mentioned above. It improves network performance because the feature extractors adapt to the dataset's patterns, increasing generalization power.
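The freeze-then-fine-tune scheme can be illustrated framework-agnostically: freezing a layer simply means excluding its parameters from the gradient update. The parameter names, gradients, and learning rates below are illustrative, not the paper's actual values:

```python
import numpy as np

def sgd_step(params, grads, frozen, lr):
    """One gradient step; parameters whose name is in `frozen` keep the
    weights obtained from the pre-trained backbone."""
    return {name: p if name in frozen else p - lr * grads[name]
            for name, p in params.items()}

params = {"backbone": np.ones(3), "head": np.zeros(3)}
grads = {"backbone": np.full(3, 0.5), "head": np.full(3, 0.5)}

# Phase 1: backbone frozen, only the classification head is trained.
params = sgd_step(params, grads, frozen={"backbone"}, lr=0.1)
# Phase 2 (fine-tuning): everything unfrozen, smaller learning rate.
params = sgd_step(params, grads, frozen=set(), lr=0.01)
```

In a framework like Keras, the same effect is obtained by toggling the layers' trainable flag and recompiling with a lower learning rate.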
The parameters used to train the network directly impact its performance. In this work, 40 training epochs were used for the classification layer alone, with a learning rate of 0.000001. A low learning rate is justified because it allows greater precision in the weight adjustments, helping gradient descent reach a global minimum. In the fine-tuning stage, an even lower learning rate of 0.0000001 was used for ten epochs, defined as the ratio between the above learning rate and ten.
3.5 Performance Evaluation
The results were validated using the Loss function and five metrics: Accuracy, Precision, Recall, F1-Score, and the Kappa (κ) index [20]. These metrics are based on the confusion matrix, which is built from the numbers of true positives (correct predictions of the positive class), false positives (incorrect predictions of the positive class), true negatives (correct predictions of the negative class), and false negatives (incorrect predictions of the negative class). The positive class represents the set of images with the presence of VL, and the negative class the images without the disease.
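These scores follow directly from the confusion-matrix counts; this is a minimal sketch with illustrative labels (the κ index uses the chance-agreement term p_e):

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and the derived scores used here
    (positive = image with VL, negative = image without)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    n = tp + tn + fp + fn
    acc = (tp + tn) / n
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    # Cohen's kappa: observed agreement corrected by chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (acc - p_e) / (1 - p_e)
    return {"accuracy": acc, "precision": prec, "recall": rec,
            "f1": f1, "kappa": kappa}

m = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```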
4 Results and Discussions
In this work, we use two input sizes in the CNNs. In the first approach, we use the size of the cutouts (75×75) as the input dimensions. In the second approach, we use the default input size of each CNN, ranging from 224×224 to 299×299 across architectures. The results are shown in Table 1.

Table 1. Results using different models of CNNs and different network input sizes.

Input size: 75×75
| Model | Loss | Accuracy | Precision | Recall | F1-Score | κ index |
|---|---|---|---|---|---|---|
| DN201 | 0.068 | 0.987 | 0.995 | 0.979 | 0.987 | 0.973 |
| VGG16 | 0.085 | 0.976 | 0.989 | 0.963 | 0.976 | 0.952 |
| RN152V2 | 0.071 | 0.976 | 0.984 | 0.968 | 0.976 | 0.952 |
| IRNV2 | 0.257 | 0.936 | 0.966 | 0.904 | 0.934 | 0.872 |

Input size: default for each architecture
| Model | Loss | Accuracy | Precision | Recall | F1-Score | κ index |
|---|---|---|---|---|---|---|
| DN201 | 0.054 | 0.992 | 1.000 | 0.984 | 0.992 | 0.984 |
| VGG16 | 0.056 | 0.987 | 0.989 | 0.984 | 0.987 | 0.973 |
| RN152V2 | 0.105 | 0.979 | 0.984 | 0.973 | 0.979 | 0.957 |
| IRNV2 | 0.120 | 0.976 | 0.979 | 0.973 | 0.976 | 0.952 |
As seen in Table 1, the results of the second approach were superior to those of the first. Since the only difference between the two approaches was the input dimensionality of the networks, we conclude that resizing the inputs, as a way of enlarging the features and improving their identification, boosted the networks to better results. We also performed tests with the second approach using different color spaces. Table 2 shows that the LAB color space brought the best results, compared with the LUV, RGB, and HSV color spaces.

Table 2. Results using different models of CNNs with the standard input size and different color spaces.

LAB
| Model | Loss | Accuracy | Precision | Recall | F1-Score | κ index |
|---|---|---|---|---|---|---|
| DN201 | 0.054 | 0.992 | 1.000 | 0.984 | 0.992 | 0.984 |
| VGG16 | 0.056 | 0.987 | 0.989 | 0.984 | 0.987 | 0.973 |
| RN152V2 | 0.105 | 0.979 | 0.984 | 0.973 | 0.979 | 0.957 |
| IRNV2 | 0.120 | 0.976 | 0.979 | 0.973 | 0.976 | 0.952 |

LUV
| Model | Loss | Accuracy | Precision | Recall | F1-Score | κ index |
|---|---|---|---|---|---|---|
| DN201 | 0.050 | 0.984 | 0.995 | 0.973 | 0.984 | 0.968 |
| VGG16 | 0.074 | 0.981 | 0.989 | 0.973 | 0.981 | 0.963 |
| RN152V2 | 0.070 | 0.987 | 0.989 | 0.984 | 0.987 | 0.973 |
| IRNV2 | 0.084 | 0.979 | 0.984 | 0.973 | 0.979 | 0.957 |

RGB
| Model | Loss | Accuracy | Precision | Recall | F1-Score | κ index |
|---|---|---|---|---|---|---|
| DN201 | 0.055 | 0.987 | 0.995 | 0.979 | 0.987 | 0.973 |
| VGG16 | 0.076 | 0.987 | 0.984 | 0.989 | 0.987 | 0.973 |
| RN152V2 | 0.070 | 0.987 | 0.989 | 0.984 | 0.987 | 0.973 |
| IRNV2 | 0.116 | 0.976 | 0.984 | 0.968 | 0.976 | 0.952 |

HSV
| Model | Loss | Accuracy | Precision | Recall | F1-Score | κ index |
|---|---|---|---|---|---|---|
| DN201 | 0.157 | 0.952 | 0.972 | 0.931 | 0.951 | 0.904 |
| VGG16 | 0.192 | 0.957 | 0.989 | 0.926 | 0.956 | 0.915 |
| RN152V2 | 0.168 | 0.952 | 0.972 | 0.931 | 0.951 | 0.904 |
| IRNV2 | 0.198 | 0.928 | 0.960 | 0.894 | 0.926 | 0.856 |
Regarding the CNN architectures, the DenseNet201 model proved superior in both approaches. In the best approach, this model reached an Accuracy of 0.992, Precision of 1.000, Recall of 0.984, F1-Score of 0.992, and κ index of 0.984. This model also reached one of the lowest Loss values. This is due to the architectural nature of this type of model: the attribute maps from all previous layers are used as inputs to each layer, and each layer's maps are used as inputs in all subsequent layers, alleviating the vanishing gradient problem [14].
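The dense connectivity described here can be sketched with array concatenation; the toy layers below are illustrative stand-ins for DenseNet's convolutional layers:

```python
import numpy as np

def dense_block(x, layers):
    """Dense connectivity: each layer receives the concatenation of the
    block input and every earlier layer's output as its input."""
    features = [x]
    for layer in layers:
        out = layer(np.concatenate(features, axis=-1))
        features.append(out)
    return np.concatenate(features, axis=-1)

def make_layer(seed, growth=2):
    """Toy layer: linear map + ReLU producing `growth` new channels."""
    rng = np.random.default_rng(seed)
    def layer(inp):
        w = rng.normal(size=(inp.shape[-1], growth))
        return np.maximum(inp @ w, 0.0)
    return layer

x = np.ones((4, 3))                           # 4 samples, 3 input channels
out = dense_block(x, [make_layer(i) for i in range(3)])
# channels grow as 3 -> 3+2 -> 3+2+2 inside the block; the final output
# concatenates the input and all three layer outputs: 3+2+2+2 channels.
```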
The confusion matrix shows the relationship between predicted and actual data, as seen in Fig. 2.
Fig. 2. Confusion matrix generated with the results of the DN201 model.
One of the ways to verify that the network is learning the patterns of the attributes of each image is by calculating the Grad-CAM. Thus, applying this technique generates a heat map, indicating the areas that are being considered to classify the images [21]. In this sense, the network is learning correctly, and its results can be considered reliable if the activation zones correspond to the object of interest in the images. Therefore, to validate the results in this work, the Grad-CAM of the last convolutional layer of DenseNet201 was generated, which can be observed in Fig. 3.
Fig. 3. Grad-CAM images of the last convolutional layer of the DN201 network. Images (a) and (b) show the regions considered by the network to perform the classification. Both images were correctly predicted.
As presented in Fig. 3, it is possible to observe two areas, a light one in shades of yellow and a darker one in shades of purple. The lighter areas represent the parts disregarded by the network for image classification. The dark part represents the activation regions of the network. That is, the darker it is, the
more that region is taken into account by the network. In this sense, in Fig. 3(a), it is possible to notice that the activation regions correspond to the regions that support the presence of amastigotes. With this, it is possible to affirm that the network correctly generalized the problem, thus making the results in Table 1 reliable. Table 3. Comparison table with related works. Both images were correctly predicted. Methods
Number of Images Accuracy Precision Recall F1-Score κ index
Nogeira & Te´ ofilo [22] 794 Neves et al. [10] 44 Gorriz et al. [11] 45 Isaza et al. [12] 45 Coelho et al. [13] -
0.949 0.950
0.815 0.757 0.787 -
0.876 0.823 -
0.845 0.777 -
-
Proposed
0.992
1.000
0.984
0.992
0.984
150
Table 3 compares the proposed method with related works. It is possible to observe that this work outperformed the related ones in all scenarios. In addition, the work of Nogueira & Teófilo [22], despite using more images, obtained results below those of the proposed method. In this sense, it is possible to conclude that CNNs take greater advantage of the available data compared to the technique used by those authors.
5 Conclusion and Future Works
Using CNNs to classify medical images from the parasitological examination of VL is an efficient technique to aid in diagnosing this disease. In addition, the use of patches contributes a greater number of images, as multiple patches can be generated from a single image. This approach, combined with CNNs and computer vision techniques, yielded results superior to related works, showing that our proposal is more effective in all scenarios. In future work, we intend to implement the proposed methodology in a complete system that scans all slide fields, automatically diagnosing VL in humans. We also intend to segment the parasite regions and evaluate the proposed method on a larger database, resulting in an even more realistic scenario.
Visceral Leishmaniasis Detection Using Deep Learning Techniques
501
References
1. Alvar, J., et al.: Leishmaniasis worldwide and global estimates of its incidence. PloS One 7 (2012)
2. Harigua-Souiai, E., Oualha, R., Souiai, O., Abdeljaoued-Tej, I., Guizani, I.: Applied machine learning toward drug discovery enhancement: leishmaniases as a case study. Bioinform. Biol. Insights 16, 11779322221090348 (2022)
3. Jameson, J.L., Fauci, A.S., Kasper, D.L., Hauser, S.L., Longo, D.L., Loscalzo, J.: Harrison's Manual of Medicine, vol. 20. McGraw-Hill Education, New York (2020)
4. Farahi, M., Rabbani, H., Talebi, A., Sarrafzadeh, O., Ensafi, S.: Automatic segmentation of leishmania parasite in microscopic images using a modified CV level set method (2015)
5. Kumar, R., Nylén, S.: Immunobiology of visceral leishmaniasis. Front. Immunol. 3, 251 (2012)
6. Silva, J., et al.: Bone marrow parasite burden among patients with new world kala-azar is associated with disease severity. Am. J. Trop. Med. Hyg. 90 (2014)
7. Rodrigues Veloso e Silva, R., Henrique Duarte de Araujo, F., Moreno Rodrigues dos Santos, L., Melo Souza Veras, R., Nelsizeuma Sombra de Medeiros, F.: Optic disc detection in retinal images using algorithms committee with weighted voting. IEEE Latin America Trans. 14(5), 2446–2454 (2016)
8. e Silva, R.R.V., de Araujo, F.H.D., dos Santos, L.M.R., Veras, R.M.S., de Medeiros, F.N.S.: IEEE Latin America Trans. 14(5), 2446–2454 (2016)
9. Cook, A.: Using transfer learning to classify images with Keras (2017). https://alexisbcook.github.io/2017/using-transfer-learning-to-classify-images-with-keras/
10. Neves, J.C., Castro, H., Tomás, A., Coimbra, M., Proença, H.: Detection and separation of overlapping cells based on contour concavity for leishmania images. Cytometry Part A 85(6), 491–500 (2014)
11. Górriz, M., Aparicio, A., Raventós, B., Vilaplana, V., Sayrol, E., López-Codina, D.: Leishmaniasis parasite segmentation and classification using deep learning. In: Perales, F.J., Kittler, J. (eds.) AMDO 2018. LNCS, vol. 10945, pp. 53–62. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94544-6_6
12. Isaza-Jaimes, A., et al.: A computational approach for leishmania genus protozoa detection in bone marrow samples from patients with visceral leishmaniasis (2020)
13. Coelho, G., et al.: Microscopic image segmentation to quantification of leishmania infection in macrophages. Fronteiras: J. Soc. Technol. Environ. Sci. 9(1), 488–498 (2020)
14. Huang, G., Liu, Z., Weinberger, K.Q.: Densely connected convolutional networks. CoRR, abs/1608.06993 (2016)
15. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
16. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. CoRR, abs/1603.05027 (2016)
17. Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR, abs/1602.07261 (2016)
18. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
19. Xiang, Y., Wang, J., Hong, Q.-Q., Teku, R., Wang, S.-H., Zhang, Y.-D.: Transfer learning for medical images analyses: a survey. Neurocomputing 489, 230–254 (2022)
502
A. L. Borges et al.
20. Fleiss, J.L., Levin, B., Paik, M.C.: Statistical Methods for Rates and Proportions. John Wiley & Sons (2013)
21. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. abs/1610.02391
22. Nogueira, P.A., Teófilo, L.F.: A probabilistic approach to organic component detection in leishmania infected microscopy images. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H. (eds.) AIAI 2012. IAICT, vol. 381, pp. 1–10. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33409-2_1
On the Use of Reinforcement Learning for Real-Time System Design and Refactoring

Bakhta Haouari1,2,3(B), Rania Mzid3,4, and Olfa Mosbahi1

1 LISI Lab INSAT, University of Carthage, Centre Urbain Nord BP 676, Tunis, Tunisia
[email protected], [email protected]
2 Tunisia Polytechnic School, University of Carthage, Tunis, Tunisia
3 ISI, University Tunis-El Manar, 2 Rue Abourraihan Al Bayrouni, Ariana, Tunisia
[email protected]
4 CES Lab ENIS, University of Sfax, B.P:w.3, Sfax, Tunisia
Abstract. One crucial issue during the design of real-time embedded systems is the deployment of tasks on distributed processors. Indeed, due to the constrained resources of these systems and their real-time aspect, the allocation model must be valid and optimal. This design phase is generally time-consuming and difficult, especially when the design decisions need to be frequently updated (i.e., refactoring). To address this problem, we provide in this paper an optimization model based on the reinforcement learning (RL) approach. The proposed model produces the optimal deployment model, ensuring that timing properties are respected while minimizing the number of active processors. After a refactoring request, the generation of the new solution in the proposed RL model is largely based on the initial one. Indeed, only a minor part of the solution has to be updated, which is beneficial since it results in a shorter generation time. The efficiency of the proposed model is demonstrated through a case study and performance evaluation.

Keywords: Real-Time · Design · Refactoring · Reinforcement learning · Q-learning

1
Introduction
Today, embedded distributed systems are widely used in an increasing number of applications. These systems are frequently real-time as they must perform certain tasks in a specific amount of time (i.e., deadline); failure to meet the timing requirements can be critical for the safety of human beings [5]. Designing a Real-Time Embedded Distributed System (RTEDS) requires appropriate methods to ensure that task deadlines are met. If such guarantees cannot be supplied, the system is said to be unfeasible, and its implementation will most likely miss the deadline. In that context, adopting a correct-by-construction c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 503–512, 2023. https://doi.org/10.1007/978-3-031-35501-1_50
504
B. Haouari et al.
design (i.e., a design that ensures the respect of timing properties) [9] may greatly increase development productivity while significantly lowering development costs. One challenge during the development of RTEDS is defining the appropriate deployment model. Indeed, the designer must determine the optimal mapping that will result in a valid model. This mapping consists of allocating the application tasks to distributed processors while ensuring the respect of timing constraints and reducing the number of active processors. Finding the optimal and feasible solution becomes increasingly difficult as the number of tasks and processors grows. To address this problem, the authors in [8] propose two models for the placement problem using the Mixed Integer Linear Programming (MILP) technique and genetic algorithms (GA) [4]. In [10], the authors use MILP for the optimization of task and message allocation in a distributed system with the aim of meeting end-to-end deadlines and minimizing latencies. In [3], the authors deal with the same problem and propose a method based on GA to minimize the communication cost. Despite their importance, these works do not address the refactoring issue, as the MILP and GA formulations must be re-executed from scratch after changes to the application properties. RL has been successfully applied to sequential decision-making problems such as the resource allocation problem [7] or the placement problem [2]. In this paper, we propose a decision maker for RTEDS that uses RL to solve the task placement problem. The need to update the application properties (i.e., task and/or processor properties such as the periods of the tasks, their deadlines, etc.) after the first generation of the optimal placement (initial placement) is a common situation at the design level. In this case, the designer must re-execute the decision maker in order to find a new optimal deployment (refactoring placement).
The originality of the proposed model is the reuse of the initially created solution to generate the new deployment model after the application properties have been updated. This is beneficial since it reduces the time required to generate the new model after a refactoring request, especially when application attributes are updated many times during the design process, resulting in a shorter generation time. The rest of the paper is organized as follows. Section 2 formalizes the models involved in this work and introduces some real-time verification preliminaries. We present the proposed RL model in Sect. 3. A case study is treated in Sect. 4. Section 5 presents experimental results. In Sect. 6, we summarize our work and discuss future directions.
2
System Formalization
The task placement problem involves three different types of models: the task model, the platform model, and the deployment model. It is assumed in this work that the task model, which we denote by τ, is composed of n synchronous, periodic, and independent tasks (i.e., τ = {T1, T2, ..., Tn}). Each task Ti is characterized by static parameters Ti = (Ci, Pi, Di), where Ci is an estimation of its worst-case execution time, Pi is the activation period of the task Ti, and Di is the deadline that represents the time limit in which the task Ti must
RL Model for Real-Time System Design and Refactoring
505
complete its execution. We assume in this work that Di = Pi. The platform model, which we denote by P, represents the execution platform of the system. We assume that this model is composed of m homogeneous processors (i.e., P = {P1, P2, ..., Pm}). Each processor has its own memory and runs a Real-Time Operating System (RTOS). The task placement step produces a deployment model that we denote by D in this work. The deployment model consists of a set of tuples D = {(P1, ξ1), (P2, ξ2), ..., (Pk, ξk)}, where k represents the number of active (or used) processors such that k ≤ m, and ξj represents the subset of tasks allocated to the processor Pj after the placement step. For real-time embedded systems, the deployment model must be feasible. Feasibility means that the placement of the real-time tasks on the different processors must guarantee the respect of the timing requirements of the system. In that context, Liu and Layland [6] developed a necessary and sufficient schedulability test when the task model satisfies the RM optimality conditions. The feasibility test determines whether a given task set will always meet all deadlines under all release conditions. This test is based on the computation of the processor utilization factor Up and is defined as follows:

Up = Σ_{i=1}^{n} Ci/Pi ≤ 0.69    (1)

3
Proposed Model
In this paper, we provide a model based on Q-learning [1] for finding the optimal solution to a placement problem with m processors and n tasks in a distributed real-time system. Q-learning is ideally suited to problems in which, as is the case here, only a portion of the environment is observed. To adapt Q-learning to the task placement problem for real-time distributed systems, the key RL elements are redefined as follows:

– State: At time step t, a state St is represented by the collection of tasks already placed on the processors with respect to their deadlines (i.e., Dt, the deployment model at time step t), as well as the list Lt of tasks that have not yet been placed. Figure 1 sketches an example of the progression of the placement problem states throughout time. These depicted states are sequential and closely related. For example, at state St, the best task from the list of unplaced tasks is selected, and its placement on a processor results in the transition from state St to state St+1. The same process is applied to St+1, resulting in St+2, and so on until a terminal state is reached (all tasks are mapped).
– Action space: Represents the set of possible actions that the agent can take when it encounters a state St following a policy π. In our case, the action is to find the best processor for a given task from the list of unplaced tasks, following an ε-greedy policy [1].
– Reward: It is a crucial element in RL. The agent is reward-motivated throughout the process, and it learns how to maximize the reward through trial-and-error experiences in the environment. Hence, the following points must be considered:
506
B. Haouari et al.
Fig. 1. The evolution of placement problem states in time
• The agent should receive a high positive reward for an optimal placement because this behavior is highly desired.
• The agent should be penalized for a prohibited placement (if it tries to place a task on a full processor).
• The agent should get a slight negative reward for not making an optimal placement. Slight negative because we would prefer that our agent make a sub-optimal placement rather than a prohibited one.

Because our goal is to reduce the number of active processors, a new variable Ej indicating the processor state is required. In fact, when the agent selects a new processor Pj which is in sleep mode, Ej equals −1; otherwise, Ej equals 0. The agent is thus penalized by the negative value of Ej if it tries to wake up a new processor. The goal is to prioritize task placement on currently active processors, with sleeping processors being used only when all active processors are completely utilized. Another case to note is when the agent tries a prohibited placement; in this case, Ej is subjected to a large negative penalty, and so Ej is equal to −m (the maximum number of processors). The total reward R obtained at the end of a task placement is given by

R = Ej − Uj^t + Ui    (2)

where
• Uj^t is the available utilization of processor Pj at time step t
• Ui is the required utilization of a task Ti (i.e., Ui = Ci/Pi)
• Ej reflects the processor state such that

Ej = −1 if Pj is in sleep mode; Ej = −m if Uj^t − Ui ≤ 0; Ej = 0 otherwise

– Epoch: The decision epoch matches the placement of all the tasks; it operates in sequential steps. At each step, the agent observes the environment state and searches for the optimal pair (i.e., (processor*, task*)).
– Q-table: The Q-table is a matrix whose rows represent all possible states and whose columns refer to all actions. It is first initialized to 0, and then
Algorithm 1: Task Placement with RL
Input: α = 0.5; γ = 0.9; ε = 0.9; decrease factor = 0.001;
       Lt: list of unplaced tasks; P: list of processors
Output: D*: optimal deployment model
Notations: Q: the Q-table;
           Dt: the deployment model at time step t (i.e., list of (processor, tasks) pairs);
           S: the state at time step t (i.e., S ← (Lt, Dt))

Generation of Q;                         /* only performed for the initial placement */
Initialization of Q(S, a) to 0, for all S and a;
while Q(S, a) still moving do
    reset S;
    while Lt is not empty do
        a ← Select Placement(ε, Lt, Dt);
        Take action a, then compute R;   /* compute R using expression (2) */
        Observe Q(S, a);
        Q(S, a) ← Q(S, a) + α [R + γ max_a' Q(S', a') − Q(S, a)];   /* Bellman's equation [1] */
        Update Q(S, a);
        Remove(Ti, Lt);                  /* remove Ti from Lt */
        Dt+1 ← Update(Dt);               /* place Ti on Pj */
        S ← S';
    end
    ε ← max(ε − decrease factor, 0);
end
return D*;
values (denoted as Q(S, a)) are updated at each step of an epoch. The generation of the Q-table that defines all the possible states for a given application is only performed for the initial deployment. Indeed, when refactoring of the deployment model is required due to an update of the application's real-time attributes, the already generated Q-table is reused. Algorithm 1 describes the proposed task placement solution based on the RL approach. The proposed algorithm has some initialization parameters, such as α, which represents the learning rate (to moderate the speed of learning and the update of Q-values, we assume α = 0.5), and γ, which represents the discount factor that quantifies the importance given to future rewards (in our approach we consider that future task placements are important, and thus we attribute a sufficiently large value, γ = 0.9). The list of unplaced tasks (Lt) and the list of processors (P) are also given as inputs. Because the ε-greedy strategy is used as a policy in this model, the ε value must also be defined. Indeed, we initially attribute a high value, ε = 0.9, to favor exploration of the environment in the first steps. The ε value is decreased at each epoch by the decrease factor. This degradation can be explained by the fact that the agent's learning grows from one epoch to the
Algorithm 2: Select Placement
Input: ε: the epsilon value;
       Dt: the deployment model of the current state;
       Lt: list of unplaced tasks in the current state
Output: a: placement to be done
Notations: Q: the Q-table

nb ← random(0, 1);                 /* generate a uniform number between 0 and 1 */
if nb ≤ ε then
    Ti ← random task from Lt;
    Pj ← random processor from Dt;
    a ← (Pj, Ti);
else
    Select a from Q where Q(S, a) = max_a Q(S, a);
end
return a;
next, and new states are visited as the process runs, so we progressively prefer exploitation over exploration. The output of this algorithm is an optimal deployment model, which corresponds to the best placement of the input tasks on the given processors. In order to hold the values of all potential state-action pairs, Algorithm 1 first constructs the Q-table Q (based on Lt and P). This generation occurs only on the first deployment of a given application. The entire reward that the agent will receive by executing action a on state S is Q(S, a). The Q-table is then reset to zero (Q(S, a) = 0 ∀ S and a in Q). The algorithm then iterates over tasks and processors, searching for the pair (processor*, task*) yielding the optimal reward. The agent investigates the environment's states and uses an ε-greedy technique to assign unplaced tasks to a particular processor (one per iteration) while updating the corresponding Q-value (i.e., Q(S, a)). For action selection (i.e., Select Placement), an ε-greedy policy is used. Algorithm 2 describes the action selection approach. The agent starts the exploration by randomly assigning an unplaced task to a processor for each state, then travels to the next state S' as a result of its action and sets S' as its current state. The epoch terminates when the agent reaches a terminal state, and a new one begins. In the action selection process, determining the ε value is critical. In fact, an ε of zero means that the agent will never explore new states (i.e., it always chooses the action with the best Q-value), whereas an ε of one forces the agent to only explore (i.e., choose only random actions). As a result, a well-studied value of ε is necessary to capture a trade-off between exploration and exploitation. At each epoch, the agent interacts with the environment (by performing random or Q-table-based actions) and updates its knowledge. The entirety of the Q-table states is attained and the Q-values are computed after a sufficient number of epoch iterations. At this point, the agent halts its exploration and uses its knowledge (Q-values) to choose the best action (with the best reward) at each state. This process ends when the final state is reached and the ideal solution (with the highest cumulative long-term benefit) is found. In Algorithm 1, we have considered a sequential approach in
Table 1. Task model tabular description of the considered case study

Task      | Description                                                      | Ci (ms) | Pi (ms) | Ui
SENSOR 1  | This task reads the car's speed                                  | 1       | 10      | 0.1
SENSOR 2  | This task provides information about the temperature of the car | 2       | 10      | 0.2
SENSOR 3  | This task reads the GPS position of the car                      | 3.2     | 40      | 0.08
DISPLAY 1 | This task displays a summary of the sensing data                 | 2       | 12      | 0.16
DISPLAY 2 | This task displays the map of the current car location           | 1       | 6       | 0.16
which we try to assign unplaced tasks to processors while taking into account the processor states and minimizing their number. However, depending on the task order, this sequential technique may result in sub-optimal solutions. The process is therefore repeated many times using the ε-greedy strategy given in Algorithm 2, and the optimal solution is the best among all the stored ones.
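The reward of expression (2) and the ε-greedy selection of Algorithm 2 can be condensed into a short Python sketch (the priority between the sleep-mode and prohibited cases of Ej, and all variable names, are our assumptions):

```python
import random

M = 3  # number of processors in the platform model (m in the paper)

def reward(u_proc: float, u_task: float, sleeping: bool) -> float:
    """Reward of expression (2): R = Ej - Uj^t + Ui.  Ej penalises a
    prohibited placement (-m) or waking a sleeping processor (-1)."""
    if u_proc - u_task <= 0:      # not enough utilization left: prohibited
        e_j = -M
    elif sleeping:                # processor must be woken up
        e_j = -1
    else:
        e_j = 0
    return e_j - u_proc + u_task

def select_placement(q_row, actions, epsilon: float):
    """epsilon-greedy policy of Algorithm 2: explore with probability
    epsilon, otherwise exploit the best-known Q-value."""
    if random.random() <= epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_row[a])
```

With epsilon = 0 the agent is purely greedy; during training, epsilon starts at 0.9 and decays by the decrease factor after each epoch, as in Algorithm 1.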
4
Case Study
In this section, we consider a case study dealing with a system embedded in a car that displays various data to the driver. The specified system's functionalities are ensured by five tasks: three sensing tasks and two display ones. The platform model consists of three homogeneous processors that we denote by P1, P2, and P3. The application tasks are assumed to be independent and periodic, with a worst-case execution time Ci and a period Pi equal to the deadline Di. Table 1 gives a tabular description of the task model for the considered case study. The designer's goal is to find the optimal initial placement for the embedded car application. To build the deployment model, the designer executes the proposed RL model implemented in Python 3 with the NumPy library. The environment in this case study comprises the set of five tasks Lt (i.e., n = 5), which must be mapped to three initially unused processors P = {P1, P2, P3} (m = 3). The use of Algorithm 1 leads to the development of all possible states, the creation of the Q-table, and the initialization of the Q-values to zero. Following that, the process of selecting actions and transitioning between states begins until a large number of Q-values have been modified (the agent reaches the majority of states). As a result, the Q-table is ready for use by the agent, which must select the state-action pair (S, a) with the highest Q(S, a) value at each transition until all tasks are assigned to the specified set of processors. Table 2 shows the deployment model resulting from applying the described process to our case study.
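As a side check, the Ui column of Table 1 and the single-processor infeasibility follow directly from Ui = Ci/Pi and the bound of expression (1); a small sketch with the Table 1 values:

```python
# (Ci, Pi) in ms, copied from Table 1
tasks = {
    "SENSOR_1": (1, 10),
    "SENSOR_2": (2, 10),
    "SENSOR_3": (3.2, 40),
    "DISPLAY_1": (2, 12),
    "DISPLAY_2": (1, 6),
}

utilization = {name: c / p for name, (c, p) in tasks.items()}
total = sum(utilization.values())

# Each value matches the Ui column of Table 1 (0.16 is rounded), and the
# total (~0.71) exceeds the 0.69 bound of expression (1), so no single
# processor can host all five tasks.
print(round(total, 2), total <= 0.69)  # 0.71 False
```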
Table 2. Optimal deployment models for the embedded car application: initial and refactoring placement

Initial placement:
Processor | Assigned tasks                           | Processor utilization
P1        | SENSOR 2, DISPLAY 1, DISPLAY 2, SENSOR 1 | 0.52
P2        | SENSOR 3                                 | 0.08
P3        | -                                        | 0

Refactoring placement:
Processor | Assigned tasks                           | Processor utilization
P1        | DISPLAY 1, SENSOR 3, SENSOR 1            | 0.68
P2        | SENSOR 2                                 | 0.4
P3        | DISPLAY 2                                | 0.32
The embedded car application designer notices that the frequency with which the system senses and provides information to the driver is too low, and hence decides to update the task model by halving the periods of all the tasks. The optimal deployment model must be regenerated in this scenario, and a refactoring placement is required. The new deployment model, as shown in Table 2, differs from the initial one in that it requires three processors instead of two to provide a feasible deployment. Indeed, the agent begins by assigning the task DISPLAY 1 to processor P1 rather than SENSOR 2, then assigns the tasks SENSOR 3 and SENSOR 1. As we can see, the placements on processor P1 are completely different from the previous scenario (see Table 2). The agent strives to maximize its reward by efficiently exploiting the available processor utilization, which explains this result. In fact, with the proposed RL model, the agent was able to reach the maximum capacity load of processor P1. Following the same logic, the agent maps SENSOR 2 to processor P2 before being compelled to map DISPLAY 2 to processor P3 due to the capacity load of processor P2.
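The need for a third processor can also be verified arithmetically: halving every period doubles every Ui, and the doubled total exceeds what two processors can host under the 0.69 bound. A back-of-the-envelope sketch (Table 1 values, with the DISPLAY utilizations kept exact):

```python
# Ui values from Table 1 (DISPLAY tasks kept exact: 2/12 and 1/6)
base_utilizations = (0.1, 0.2, 0.08, 2 / 12, 1 / 6)

# Halving every period Pi doubles every Ui = Ci / Pi
halved = [2 * u for u in base_utilizations]

total = sum(halved)            # ~1.43
two_processors = 2 * 0.69      # 1.38: what two processors can host
print(total > two_processors)  # True -> a third processor is required
```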
5
Experimental Results
In order to show the effectiveness of the proposed RL model for the placement problem and to demonstrate the benefits collected from the use of Q-learning, particularly when design revisions are required (i.e., refactoring), we undertook a set of experiments. We generated two random systems with random task and processor sets, increasing the number of tasks/processors from system 1 to system 2. We also define a new parameter called Tgeneration. This parameter refers to the time required to generate the optimal allocation model after a set of designer parameter adjustments:

Tgeneration = Tinitial + rf × Trefactoring

where Tinitial denotes the time required to generate the initial allocation model for a given system, Trefactoring denotes the time required to generate the allocation model following a refactoring request (i.e., refactoring placement), and rf denotes the number of times the designer updates the application properties.
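The metric reduces to a one-line computation; the sketch below contrasts the RL model, which reuses the Q-table, with a from-scratch method whose refactoring cost equals its initial cost (the timing values are illustrative, not measurements from the paper):

```python
def t_generation(t_initial: float, t_refactoring: float, rf: int) -> float:
    """Tgeneration = Tinitial + rf * Trefactoring."""
    return t_initial + rf * t_refactoring

# Illustrative (not measured) values: the RL model reuses the Q-table, so
# Trefactoring is small; a from-scratch method pays the full cost each time.
rl = t_generation(t_initial=10.0, t_refactoring=2.0, rf=5)      # 20.0
scratch = t_generation(t_initial=8.0, t_refactoring=8.0, rf=5)  # 48.0
print(rl < scratch)  # True
```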
Fig. 2 shows an evaluation of the generation time for the two considered systems. Indeed, we randomly increase the number of refactoring requests and compute Tgeneration for each system. We also compare the obtained results with the MILP-based method proposed in [10]. It is clear that the proposed model leads to shorter generation times, especially as the number of application updates increases. This gain is more important when the number of tasks and processors increases (i.e., system 2).
Fig. 2. Evaluation of the generation time for randomly generated systems
6
Conclusion
We have proposed in this paper an RL-based approach for the placement of RTEDS. Two algorithms are thus provided for the placement problem based on RL. In the absence of a complete environmental model, Q-learning is a good candidate for task placement in RTEDS, particularly in cases where the designer does not have a clear idea about the design of the application. The proposed method can be extended to deal with applications with a huge number of parameters through the adoption of deep RL [1]. We then expect to evaluate the scalability of the proposed method on an industrial example. We are also interested in introducing an RL-based optimal scheduler that takes reconfiguration constraints into account for RTEDS.
References 1. Barto, A.G.: Reinforcement learning: an introduction. In: Sutton, R.S. (ed.) SIAM Review, vol. 63, issue 2, p. 423 (2021) 2. Caviglione, L., Gaggero, M., Paolucci, M., Ronco, R.: Deep reinforcement learning for multi-objective placement of virtual machines in cloud datacenters. Soft. Comput. 25(19), 12569–12588 (2021)
3. Kashani, M.H., Zarrabi, H., Javadzadeh, G.: A new metaheuristic approach to task assignment problem in distributed systems. In: 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), pp. 0673–0677. IEEE (2017)
4. Kumar, M., Husain, D., Upreti, N., Gupta, D., et al.: Genetic algorithm: review and application (2010)
5. Lakhdhar, W., Mzid, R., Khalgui, M., Treves, N.: MILP-based approach for optimal implementation of reconfigurable real-time systems. In: International Conference on Software Engineering and Applications, vol. 2, pp. 330–335. SCITEPRESS (2016)
6. Liu, C.L., Layland, J.W.: Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM (JACM) 20(1), 46–61 (1973)
7. Liu, N., et al.: A hierarchical framework of cloud resource allocation and power management using deep reinforcement learning. In: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 372–382. IEEE (2017)
8. Mehiaoui, A., Wozniak, E., Babau, J.P., Tucci-Piergiovanni, S., Mraidha, C.: Optimizing the deployment of tree-shaped functional graphs of real-time system on distributed architectures. Autom. Softw. Eng. 26(1), 1–57 (2019)
9. Saxena, P., Menezes, N., Cocchini, P., Kirkpatrick, D.A.: The scaling challenge: can correct-by-construction design help? In: Proceedings of the 2003 International Symposium on Physical Design, pp. 51–58 (2003)
10. Zhu, Q., Zeng, H., Zheng, W., Natale, M.D., Sangiovanni-Vincentelli, A.: Optimization of task allocation and priority assignment in hard real-time distributed systems. ACM Trans. Embedded Comput. Syst. (TECS) 11(4), 1–30 (2013)
Fully Automatic LPR Method Using Haar Cascade for Real Mercosur License Plates

Cyro M. G. Sabóia(B), Adriell G. Marques, Luís Fabrício de Freitas Souza, Solon Alves Peixoto, Matheus A. dos Santos, Antônio Carlos da Silva Barros, Paulo A. L. Rego, and Pedro Pedrosa Rebouças Filho

Laboratório de Processamento de Imagens, Sinais e Computação Aplicada, Instituto Federal do Ceará, Universidade Federal do Ceará, Fortaleza, Brazil
[email protected], [email protected]
http://lapisco.ifce.edu.br

Abstract. The growing increase in traffic and road monitoring technologies brings new challenges and possibilities for using character detection and recognition technologies to improve traffic management and road safety. This work proposes a comparative study between a model trained with synthetic license plate images and a model trained with real license plate images. A new LPR-UFC database is also presented, to be made available upon request, containing 2,686 images of vehicles with Mercosur plates [1]. Together with the perspective adjustment, the model obtained a gain of 97.00% accuracy for detection and 88.48% accuracy for recognition.

Keywords: License Plate Detection · License Plate Recognition · Perspective Adjustment · Haar Cascade · Tesseract

1
Introduction
With the advancement of traffic technologies, intelligent systems for the detection and recognition of traffic images become more and more important to help the management of highways, improving the collection of information and the mapping of traffic conditions [2]. Technologies for the detection and recognition of license plates, whether of roads or vehicles, are still the object of relevant studies in computer vision, for applications such as road safety, parking, tolls, stolen vehicle tracking, border control, speed enforcement, etc. [3–6]. The maturation of License Plate Recognition (LPR) techniques in computer vision has allowed advances in the detection of license plates and the recognition of their digits in images provided by video cameras [7], whether embedded in official vehicles or at specific points on highways, allowing the development of more specific real-time applications [8]. One of the known computer vision techniques for object detection is the Haar Cascade [9], which consists of a cascade of stages that label a specific region of the image as positive or negative: positive images contain the object to be found, and negative images do not [10]. c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 513–522, 2023. https://doi.org/10.1007/978-3-031-35501-1_51
514
C. M. G. Sabóia et al.
Fig. 1. It presents the steps of the model of this study. Stage 1 - Dataset Input Imageries, Stage 2 - Trained Haar Cascade Model, Stage 3 - License Plate Segmentation, Stage 4 - License Plate Perspective Adjustment, Stage 5 - License Plate Pre-processing, Stage 6 - License Plate Recognition, Stage 7 - Recognition Text Result, Stage 8 - Heuristic Algorithm, Stage 9 - License Plate Text Result After Heuristic Algorithm.
For character recognition, the computer vision technique Optical Character Recognition (OCR) is applied to the images of the detected license plates. OCR retrieves the image objects in the form of attributes and expresses them as textual information [11]. Given the increasing demand for LPR applications and their challenges, this work proposes an approach for automatic plate detection with the Haar cascade, followed by character recognition using Tesseract OCR [12–14]. A new LPR-UFC database containing 2,686 images will also be made available to the community upon request. This work seeks to broaden the discussion of the Haar Cascade as a computer vision tool for plate detection, of Tesseract OCR for character recognition, and of a new database containing only Brazilian plates in the new Mercosur model. The main contributions of this study address different relevant contexts:

– Mercosur license plate model detection in real imageries.
– Character recognition of the Mercosur license plate model in real imageries.
– A comparative study between a model trained with images of synthetic license plates and a model trained with images of real license plates.
– Detection and recognition of Mercosur license plates at adjusted perspective angles using Tesseract OCR.
– Increase of the LPR-UFC dataset with 1,586 more real images of vehicles with the new Mercosur license plate model.
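Stage 8 of Fig. 1 applies a heuristic to the raw OCR text; a minimal sketch of such a post-correction, assuming the Brazilian Mercosur layout LLLNLNN (e.g., ABC1D23) and a hand-picked table of common OCR confusions (the paper does not specify its exact rules):

```python
# Expected character class per position: L = letter, N = digit
LAYOUT = "LLLNLNN"
TO_LETTER = {"0": "O", "1": "I", "5": "S", "8": "B", "2": "Z"}
TO_DIGIT = {"O": "0", "I": "1", "S": "5", "B": "8", "Z": "2"}

def fix_plate(raw: str) -> str:
    """Coerce each OCR character to the class required by the layout."""
    chars = []
    for ch, cls in zip(raw.upper(), LAYOUT):
        if cls == "L" and ch.isdigit():
            ch = TO_LETTER.get(ch, ch)
        elif cls == "N" and ch.isalpha():
            ch = TO_DIGIT.get(ch, ch)
        chars.append(ch)
    return "".join(chars)

print(fix_plate("A8C1D2O"))  # -> "ABC1D20"
```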
2 Related Works
Fully Automatic LPR Method Using Haar Cascade

Cyro M. G. Sabóia et al. [15] performed an LPR study on synthetically generated license plates. As the Mercosur plate model had been released only recently, no database with real images was yet publicly available. From a dataset of synthetically generated license plate images, the study proposed using a Haar Cascade, trained on a set of synthetically generated positive images and negative images that did not contain license plates, to perform LPR on Brazilian license plates in the new Mercosur format. The study achieved satisfactory results for the license plate detection problem. The work of Luís Fabrício de F. Souza et al. [8] applied the model trained on synthetic plates by [15] to a dataset of real LPR-UFC images. In that research, perspective adjustment techniques were used, which produced significant gains in character recognition. The study achieved 90.00% in license plate detection and 83.01% in character recognition. Al Awaimri et al. [16] developed an Omani license plate detection and recognition system using different methodologies, such as optical character recognition, convolutional neural networks, and deep neural networks. The authors carried out an analytical study to understand which algorithms are more suitable and, finally, a practical study on real plates. The work achieved 71.5% accuracy in license plate extraction and 96.00% to 99.00% accuracy in character recognition. The deep-learning detection approach proposed by [17] is divided into image acquisition, license plate detection, and optical character recognition. The architecture used is MobileNet V1 based on the Single Shot Detector (SSD), allowing the processing of images with accuracy above 95.00%. Considering the different plate models existing in the world, [18] proposed a precise approach applicable to license plates from other countries, using tiny YOLOv3 for license plate detection, YOLOv3-SPP for unified character recognition, and a layout detection algorithm to extract the correct character sequence from the LP number. According to the authors, the results surpassed previous research works.
3 Materials
This section provides information about the dataset used in this article. As a contribution of this work, the LPR-UFC dataset provided by [8] was increased by 1.586 images, now totaling 2.686 images of vehicles with the new Mercosur license plate model. One hundred images were separated for testing, and the rest of the dataset was resized to 7 different sizes, as shown in Table 2, totaling 17.962 positive images and 35.658 negative images. The images of the new LPR-UFC dataset were collected in the city of Fortaleza, Brazil, from different angles and cameras (models: Intelbras and Hikvision), configured with 1920 × 1080 px resolution, 8192 Kbps bit rate, H.265 encoding and 25 fps refresh rate. The dataset is available upon request by email ([email protected]), subject to citation of this study.
4 Proposed Methodology
Fig. 2. Proposed methodology flowchart.
This section presents the methodology used for the LPR process in this study. In the work presented by [15], a Haar Cascade model trained with synthetic images is applied to a dataset of synthetic images. In [8], the same model trained on synthetic images is applied to a set of real license plate images. The method proposed in this work trains the Haar Cascade model with real images and applies it to a dataset of real license plate images. The methodology is divided into nine stages: Stage 1 - Database with real license plate images in the new Mercosur format, Stage 2 - Haar Cascade model training, Stage 3 - Segmentation, Stage 4 - Perspective adjustment, Stage 5 - Pre-processing, Stage 6 - Optical Character Recognition (OCR), Stage 7 - Text recognition result, Stage 8 - Heuristic algorithm, Stage 9 - Resulting text. Stage 1 - Database with real images of license plates in the new Mercosur format - In this step, a database is built containing vehicles with plates in the new Mercosur format (Fig. 3), with both frontal and rear perspectives. The database has 2.686 images: 2.586 used in training and 100 separated for testing.
Fig. 3. Real dataset images sample from different vehicles.
Stage 2 - Haar Cascade model training - Model training was done with the 2.586 images separated for training, using the Cascade Trainer GUI tool (Version 3.3.1) [19]. The plate images used as positives were cut from the original images containing the entire vehicle. The cropped license plate images were then resized to seven different sizes according to Table 2, forming the base of positive images, which contain the object to be detected. The same dataset used in [15] was used for the negative images. Stage 3 - Segmentation - In this step, the 100 images of vehicles with Mercosur plates separated for testing are passed through the trained model, resulting in new images containing only the cutout of the plate (Fig. 4).
Fig. 4. Cropped license plate sample in different license plates.
Stage 4 - Perspective Adjustment - After the segmentation stage, the cropped license plate image is passed to the perspective adjustment algorithm, which corrects the skew, leaving the plate correctly aligned (Fig. 5).
Fig. 5. Perspective Adjustment sample in different license plates. Compare with Fig. 4
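The perspective adjustment of Stage 4 amounts to estimating a homography from the four detected plate corners to an axis-aligned rectangle. The NumPy-only sketch below (corner coordinates are hypothetical; in practice OpenCV's `getPerspectiveTransform`/`warpPerspective` performs the equivalent computation) shows the underlying linear system.

```python
# Minimal sketch of perspective adjustment: solve for the 3x3 homography H
# that maps four (hypothetical) skewed plate corners to a straight rectangle.
import numpy as np

def homography(src, dst):
    """Estimate the perspective transform mapping src corners to dst corners."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two equations per correspondence, with h33 fixed to 1.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, p):
    """Apply the homography to a single 2D point."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([x / w, y / w])

# Hypothetical skewed corners mapped onto a 400 x 130 axis-aligned plate.
src = [(10, 20), (310, 40), (300, 140), (15, 120)]
dst = [(0, 0), (400, 0), (400, 130), (0, 130)]
H = homography(src, dst)
```

Warping every pixel of the crop through `H` yields the aligned plate of Fig. 5.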
Stage 5 - Preprocessing - In the pre-processing stage, the license plate images go through a series of operations provided by the OpenCV library: conversion to grayscale, filtering for noise smoothing, binarization, and morphological operations to close gaps between digits (Fig. 6).
Fig. 6. Pre-processing: Smoothed, binarized image and morphological operations applied in different license plates.
Stage 6 - Optical Character Recognition (OCR) - The result of the image pre-processing is sent to the optical character recognition tool Tesseract-OCR, in its Python version, Pytesseract. Stage 7 - Textual recognition result - In this step, Tesseract returns the recognized text of the image (Fig. 7). It is important to understand that the
sequence of characters in the new Mercosur format is composed of three letters, one digit, one letter, and two more digits, totaling seven characters per plate. Note that Tesseract can confuse letters with digits and vice versa.
Fig. 7. Character recognition result in different license plates.
Stage 8 - Heuristic algorithm - The license plate format in the new Mercosur model follows a standard of seven characters in the sequence: letter, letter, letter, number, letter, number, number. Character recognition can sometimes return a letter where a number is expected, such as the letter O in place of the number 0. To optimize the results, a heuristic algorithm replaces the character whenever a letter appears where a digit is expected, and vice versa. The algorithm logic can be found in Table 1.

Table 1. Replacing Heuristic.
Letter: Q  D  O  I  Z  A  S  G  T  B
Digit:  0  0  0  1  2  4  5  6  7  8
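The substitutions of Table 1 combined with the Mercosur pattern (letter, letter, letter, digit, letter, digit, digit) can be applied in a few lines. This is an illustrative sketch of such a heuristic, not the authors' implementation; function and constant names are ours.

```python
# Sketch of the Stage 8 heuristic: per-position substitutions from Table 1.
LETTER_TO_DIGIT = {"Q": "0", "D": "0", "O": "0", "I": "1", "Z": "2",
                   "A": "4", "S": "5", "G": "6", "T": "7", "B": "8"}
DIGIT_TO_LETTER = {v: k for k, v in
                   {"O": "0", "I": "1", "Z": "2", "A": "4",
                    "S": "5", "G": "6", "T": "7", "B": "8"}.items()}
PATTERN = "LLLNLNN"  # Mercosur layout: letter/letter/letter/digit/letter/digit/digit

def fix_plate(text):
    """Swap confusable characters so the OCR output matches the Mercosur pattern."""
    text = text.upper()
    if len(text) != 7:
        return text  # leave malformed OCR output untouched
    out = []
    for ch, kind in zip(text, PATTERN):
        if kind == "N" and ch.isalpha():
            out.append(LETTER_TO_DIGIT.get(ch, ch))   # letter in a digit slot
        elif kind == "L" and ch.isdigit():
            out.append(DIGIT_TO_LETTER.get(ch, ch))   # digit in a letter slot
        else:
            out.append(ch)
    return "".join(out)
```

For example, `fix_plate("A8C1D23")` replaces the 8 in the second (letter) position with B.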
Stage 9 - Result text - As a result of the heuristic algorithm, a new resulting text is reported, correcting flaws and optimizing the recognition results (Fig. 8).
Fig. 8. Heuristic result in different license plates.
5 Results and Discussion
In this section, we describe the results and discussion of the experiments carried out in the study. In the work proposed by [15], a model using the Haar Cascade
method was trained using synthetically generated images. In the work presented by [8], this same model trained with synthetic images was applied in tests on real images. The method proposed in this work trains a new model with real images and applies it to the same test base used by the previous models. We address the results in two subsections: License Plate Detection and License Plate Recognition. The work was developed and tested on a computer with the following configuration: Intel Core i7 processor at 2.9 GHz, 8 GB RAM, Ubuntu 16.04 LTS operating system.

5.1 License Plate Haar-Cascade Training
OpenCV provides tools for training and for augmenting samples from the positive image pool. In this work, we used the Cascade Trainer GUI tool (Version 3.3.1). The training consisted of 16 stages, a sample width of 40, and a sample height of 13. For detection, we used scaleFactor 1.1, minNeighbors 1, minSize (1, 1), and maxSize (300, 300). Table 2 defines the number of positive and negative images used in training. For training the model, a dataset of real images was created, as described in Sect. 3 (Materials). The 2.586 images used in training were resized to seven different sizes, resulting in 17.962 positive images (Table 2). For the negative images, the same dataset with 35.658 negative images described in [15] was used.

5.2 License Plate Detection
For a better understanding, it is important to highlight that the detection model trained in this experiment was applied only to the real test base of [8]. In tests performed without perspective adjustment on the real database, the model trained in this work achieved 91.00% accuracy, a gain of 1.00% for the model trained on the real base over the results of [8] on the same real test database. In tests performed with perspective adjustment, the model trained in this work achieved 97.00% accuracy, a gain of 7.00% over [8] on the real test database. The gain was higher here because the real database has a greater number of inclined plates; the synthetic database, precisely because it is created by a computer, has a more homogeneous alignment and fewer obstacles to detection. Both results can be seen in Table 3.

5.3 License Plate Recognition
In Table 4 we present the results of character recognition. For a better understanding, it is important to note that recognition was applied only to the same test base used in [8]. Both the results with and without perspective adjustment showed a gain over the results presented by [8].
For character recognition without perspective adjustment, we obtained 86.05% against 83.01%, an increase of 3.04%. For character recognition with perspective adjustment, the result was 88.48% against 83.13%, an increase of 5.35%, both compared to the results of [8].

Table 2. License Plate Haar-Cascade Training.
Positives: 17.962   Negatives: 35.658
Image sizes: 60 × 16, 80 × 22, 100 × 27, 120 × 32, 140 × 38, 160 × 44, 180 × 49
Table 3. License plate detection models and respective results.

No Perspective Adjustment:
Method                          | Train             | Test              | ACC (%)
Cyro M. G. Sabóia [15]          | Synthetic Dataset | Synthetic Dataset | 83.82%
Luís Fabrício de F. Souza [8]   | Synthetic Dataset | Real Dataset      | 90.00%
Proposed Method                 | Real Dataset      | Real Dataset      | 91.00%

With Perspective Adjustment:
Method                          | Train             | Test              | ACC (%)
Cyro M. G. Sabóia [15]          | Synthetic Dataset | Synthetic Dataset | 90.00%
Luís Fabrício de F. Souza [8]   | Synthetic Dataset | Real Dataset      | 90.00%
Proposed Method                 | Real Dataset      | Real Dataset      | 97.00%
Table 4. Character recognition models comparison using Tesseract OCR - real and synthetic images.

No Perspective Adjustment:
Method                          | Experiments       | ACC
Cyro M. G. Sabóia [15]          | Synthetic Dataset | 95.72%
Luís Fabrício de F. Souza [8]   | Real Dataset      | 83.01%
Proposed Method                 | Real Dataset      | 86.05%

With Perspective Adjustment:
Method                          | Experiments       | ACC
Cyro M. G. Sabóia [15]          | Synthetic Dataset | 95.72%
Luís Fabrício de F. Souza [8]   | Real Dataset      | 83.13%
Proposed Method                 | Real Dataset      | 88.48%
Compared to the methods in the literature, [15] and [8], the proposed model brought superior and satisfactory results. Relative to the methods found for the same problem, the perspective adjustment step made the license plate detection process more robust, helping to identify characters in real images. Performance is measured by accuracy, based on the number of license plates identified and the accuracy in identifying the characters. Regarding processing time, the proposed model obtained an average time of 22 hundredths of a second per test, bringing robustness to the process.
6 Conclusion and Future Work
In this study, a complete method for license plate detection and character recognition was proposed on an unprecedented database of images of vehicles with
license plates in the new Mercosur format (Fig. 1). The images are from natural environments, taken by cameras fixed on busy avenues in Fortaleza, Brazil. The studied model presented satisfactory results consistent with the state of the art, with 91.00% and 97.00% detection accuracy without and with perspective adjustment, respectively. For character recognition, the results were 86.05% and 88.48% accuracy without and with perspective adjustment, respectively. In both cases there were improvements in the results, showing that the Haar Cascade captures the specifics of synthetic and real images, such as texture, shading, and contrast. For future work, we will use different classification and detection methods, including the YOLO network, to identify both the old Brazilian license plate model and the new Mercosur license plate model.

Acknowledgement. The authors would like to thank the Ceará State Foundation for the Support of Scientific and Technological Development (FUNCAP) for the financial support (grant #6945087/2019).
References

1. Silvano, G., et al.: Synthetic image generation for training deep learning-based automated license plate recognition systems on the Brazilian Mercosur standard. Des. Autom. Embed. Syst. 25(2), 113–133 (2021)
2. Zhang, C., Tai, Y., Li, Q., Jiang, T., Mao, W., Dong, H.: License plate recognition system based on OpenCV. In: Jain, L.C., Kountchev, R., Shi, J. (eds.) 3D Imaging Technologies—Multi-dimensional Signal Processing and Deep Learning, pp. 251–256. Springer, Singapore (2021)
3. Bensouilah, M., Zennir, M.N., Taffar, M.: An ALPR system-based deep networks for the detection and recognition. In: ICPRAM, pp. 204–211 (2021)
4. Vi, G.V., Faudzi, A.A.B.M.: A study on different techniques in ALPR system: the systems performance analysis. In: Nasir, A.F.A.B., et al. (eds.) Recent Trends in Mechatronics Towards Industry 4.0, pp. 617–627. Springer, Singapore (2022). https://doi.org/10.1007/978-981-33-4597-3_56
5. Balia, R., Barra, S., Carta, S., Fenu, G., Podda, A.S., Sansoni, N.: A deep learning solution for integrated traffic control through automatic license plate recognition. In: Gervasi, O., et al. (eds.) ICCSA 2021. LNCS, vol. 12951, pp. 211–226. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86970-0_16
6. Sharma, P., Gupta, S., Singh, P., Shejul, K., Reddy, D.: Automatic number plate recognition and parking management. In: 2022 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), pp. 1–8. IEEE (2022)
7. Srikanth, P., Kumar, A.: Automatic vehicle number plate detection and recognition systems: survey and implementation. In: Autonomous and Connected Heavy Vehicle Technology, pp. 125–139. Elsevier (2022)
8. Souza, L.F.de F.: New approach to the detection and recognition of Brazilian Mercosur plates using Haar Cascade and Tesseract OCR in real images. J. Inf. Assur. Secur. 17, 144–153 (2022)
9. Takiddin, A., Shaqfeh, M., Boyaci, O., Serpedin, E., Stotland, M.: Gauging facial abnormality using Haar-cascade object detector. In: 2022 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1448–1451. IEEE (2022)
10. Yaddanapudi, S.D., et al.: Collection of plastic bottles by reverse vending machine using object detection technique. Mater. Today Proc. (2021)
11. Thorat, C., Bhat, A., Sawant, P., Bartakke, I., Shirsath, S.: A detailed review on text extraction using optical character recognition. In: Fong, S., Dey, N., Joshi, A. (eds.) ICT Analysis and Applications, pp. 719–728. Springer, Singapore (2022)
12. Niluckshini, M., Firdhous, M.: Automatic number plate detection using Haar-cascade algorithm proposed for the Sri Lankan context. In: 2022 2nd International Conference on Advanced Research in Computing (ICARC), pp. 248–253. IEEE (2022)
13. Thumthong, W., Meesud, P., Jarupunphol, P.: Automatic detection and recognition of Thai vehicle license plates from CCTV images. In: 2021 13th International Conference on Information Technology and Electrical Engineering (ICITEE), pp. 143–146. IEEE (2021)
14. Dome, S., Sathe, A.P.: Optical character recognition using Tesseract and classification. In: 2021 International Conference on Emerging Smart Computing and Informatics (ESCI), pp. 153–158 (2021)
15. Sabóia, C.M.G., Filho, P.P.R.: Brazilian Mercosur license plate detection and recognition using Haar Cascade and Tesseract OCR on synthetic imagery. In: Abraham, A., Gandhi, N., Hanne, T., Hong, T.-P., Nogueira Rios, T., Ding, W. (eds.) ISDA 2021. LNNS, vol. 418, pp. 849–858. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96308-8_79
16. Al Awaimri, M., Fageeri, S., Moyaid, A., Thron, C., Alhasanat, A.: Automatic number plate recognition system for Oman. In: Alloghani, M., Thron, C., Subair, S. (eds.) Artificial Intelligence for Data Science in Theory and Practice. SCI, vol. 1006, pp. 155–178. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-92245-0_8
17. Awalgaonkar, N., Bartakke, P., Chaugule, R.: Automatic license plate recognition system using SSD. In: 2021 International Symposium of Asian Control Association on Intelligent Robotics and Industrial Automation (IRIA), pp. 394–399 (2021)
18. Henry, C., Ahn, S.Y., Lee, S.-W.: Multinational license plate recognition using generalized character sequence detection. IEEE Access 8, 185–199 (2020)
19. Phase, T.: Building custom Haar-cascade classifier for face detection. Int. J. Eng. Tech. Res. 8(01) (2020)
Dynamic Job Shop Scheduling in an Industrial Assembly Environment Using Various Reinforcement Learning Techniques David Heik(B) , Fouad Bahrpeyma, and Dirk Reichelt University of Applied Sciences Dresden, Faculty of Informatics/Mathematics, 01069 Dresden, Germany [email protected], [email protected], [email protected]
Abstract. The high volatility and dynamics within global value networks have recently led to a noticeable shortening of product and technology cycles. To realize effective and efficient production, a dynamic regulation system is required. Currently, this is mostly accomplished statically via a Manufacturing Execution System, which decides for whole lots and usually cannot react to uncertainties such as the failure of an operation, variations in operation times, or variations in the quality of the raw material. In this paper, we incorporate Reinforcement Learning to minimize makespan in the assembly line of our Industrial IoT Test Bed (at HTW Dresden), in the presence of multiple machines supporting the same operations as well as uncertain operation times. While multiple machines supporting the same operations improve the system's reliability, they pose a challenging scheduling problem. Additionally, uncertainty in operation times adds complexity to planning, which is largely neglected in traditional scheduling approaches. As a means of optimizing the scheduling problem under these conditions, we have implemented and compared four reinforcement learning methods: Deep Q-Networks, REINFORCE, Advantage Actor Critic, and Proximal Policy Optimization. According to our results, PPO achieved greater accuracy and convergence speed than the other approaches while minimizing the total makespan. Keywords: Artificial Intelligence · Reinforcement Learning · Manufacturing Systems · Smart Production Systems · Industrial IoT Test Bed
1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 523–533, 2023. https://doi.org/10.1007/978-3-031-35501-1_52

A primary challenge in the development of automated industrial manufacturing systems is the design of the control mechanisms, which are primarily concerned with scheduling. The purpose of scheduling, as a fundamental problem in production systems, is to determine the most efficient allocation of tasks based on the
Fig. 1. Industrial IoT Test Bed at HTW Dresden
available resources and the production cycle [6]. Traditionally, in the development of production systems, it is assumed that the tasks to be processed by the machines remain constant over time. However, real-world applications face uncertain operating times that may result from a variety of factors, such as machine malfunction, maintenance or jamming. In order to achieve effective and efficient planning, an appropriate control mechanism must be in place that can handle such uncertainties; if these unforeseen events are not accounted for, the entire system may come to a halt. As part of an industrial laboratory for smart manufacturing at HTW Dresden, we work on a wide range of topics, in particular the development of control mechanisms to optimize the production process and the corresponding performance. Our fully automated IIoT Test Bed consists of a CNC milling machine, a warehouse and, at its core, 13 assembly stations (see Fig. 1), some of which can perform similar operations. This redundancy gives the physical system the potential capability to handle issues such as bottlenecks, maintenance or failures while maintaining overall performance at a certain level. In practice, however, the existing control software limits this potential because the previously mentioned issues were not taken into account. Furthermore, production scheduling is now a dynamic planning problem in which decisions must be made in real time and the system must be able to react to sudden events. The problem therefore no longer corresponds to offline planning, in which all of the prerequisites for optimization are available in advance. For a satisfactory decision to be made when any of the mentioned challenging scenarios arise during the production process, it is imperative to consider all possible scenarios before they occur. The problem is NP-hard; at present, no polynomial-time algorithm is known that can solve it [6].
This paper proposes the use of Reinforcement Learning (RL) to develop an appropriate scheduling method for our IIoT Test Bed, which in contrast to traditional scheduling methods, is capable of dealing with dynamicity and uncertainty in manufacturing environments. Specifically, this paper examines the presence of uncertain operation times in machines whose corresponding
operation can also be performed by another machine (which provides the same functionality).
2 Literature Review
The relevant literature consists of various types of methodologies for optimizing scheduling processes, from exact solutions such as the methods mentioned in [18] to advanced methods such as RL [21]. Traditionally, rule-based approaches have been the most popular scheduling methods, where rules are initially defined for assigning jobs to certain machines in efficient schedules of time slots. However, rule-based methods often fail to be comprehensive and flexible enough for dynamic systems. Algorithms such as branch-and-bound provide exact procedures but can only be applied to small-scale problems [19]. As a result of the need for algorithms offering efficiency in time, computational requirements and engineering cycle, the community became interested in meta-heuristic approaches. As nature-inspired algorithms, meta-heuristics offer near-optimal solutions and have become increasingly popular, since in most cases a near-optimal solution can be used rather than a global optimum when the effort required to obtain the latter is not feasible. For instance, Liu et al. [10] proposed a hybrid algorithm combining PSO and a genetic algorithm (GA), and applied it to a machine tool production scheduling problem. Their results showed that the hybrid approach provides better solution quality and convergence rate compared to PSO, GA and the simulated annealing algorithm. A hybrid PSO was furthermore used in [8] for production scheduling in cellular manufacturing systems. The literature also includes many examples of meta-heuristics combined to further enhance results. In this regard, Sels et al. [15] presented a GA and a scatter search procedure for shop scheduling, while Wang and Tang [17] incorporated an adaptive GA for solving the minimum makespan problem.
Besides, with the aim of dealing with unseen situations, the community started using supervised learning approaches, such as neural networks (NNs), which provide a generalization ability to estimate the appropriate solution for an unseen situation. Yu and Liang [20] presented a hybrid approach using NNs and GA to solve a restrictive scheduling problem. In this work, GA was used to optimize the sequence and a NN was used to optimize the operation start times for a fixed sequence. In the scheduling literature, a class of machine learning methods known as RL has recently received increased attention due to its ability to handle uncertainty. RL incorporates intelligent agents that learn via trial and error through interactions with the environment, mainly in simulated environments before physical deployment. In [2,3], the authors proposed the use of Deep Q-Networks (DQN) for highly dynamic scheduling environments, focusing on minimizing the job rejection rate. To solve large scheduling problems, the authors in [13] used PSO and RL in parallel. Zhou et al. [22] incorporated a DQN method for minimizing the maximum completion time in dynamic scheduling problems in smart manufacturing. Kardos et al. [7] developed a Q-learning based scheduling algorithm to solve the dynamic job shop scheduling problem effectively. A double loop
DQN, with an exploration loop and an exploitation loop, was proposed by Luo et al. [11] to solve the job shop scheduling problem with random job arrivals. Wang et al. [16] designed a dual Q-learning method for the assembly job shop scheduling problem with uncertain assembly times, minimizing the total weighted earliness penalty together with the completion time and cost. A review of the literature on scheduling approaches, as well as the technical considerations in their implementation for specific cases, reveals that little attention has been paid to the presence of multiple machines supporting the same operations while dealing with varying operating cycles. This paper addresses this problem, focusing on our Industrial IoT Test Bed production architecture, via the use of a method specifically developed to deal with uncertainty: reinforcement learning.
3 Methodology
Fig. 2. Workplan-V1
Fig. 3. Environment architecture-V1

Table 1. Operating times-V1
Station   | Operation | Duration
Station 1 | A         | 10 s
Station 2 | B         | 12 s
Station 3 | B         | 14 s
Station 4 | C         | 10 s

Fig. 4. Workplan-V2
Fig. 5. Environment architecture-V2

Table 2. Operating times-V2
Station   | Operation | Duration
Station 1 | A         | 10 s
Station 2 | B         | 12 s
Station 3 | B         | 14 s
Station 4 | C         | 16 s
Station 5 | C         | 14 s
Station 6 | D         | 10 s
A simplified structure of our IIoT Test Bed (see Fig. 1) is illustrated in Fig. 3, which represents an abstraction of the manufacturing line that we use to provide demonstrations. The goal of this paper is to incorporate RL to develop a control mechanism for scheduling the production process for this architecture, while considering redundant support of operations by multiple machines as well as the presence of uncertainty in the machines' operation times. The simplest form of the problem with redundant capabilities and uncertain operation times is depicted in Fig. 2, where there are only two similar workstations in the workplan. This
Table 3. Complexity.
Index | Parallel operations | Uncertainty (margin ±) | Carriers | No. of poss. initial situations        | No. of poss. decisions | Experiments required
SI1   | 1 | 1 | 4 | (3^4) · C(24,4) = 860.706          | 16   | 13.771.296
SI2   | 1 | 3 | 4 | (7^4) · C(24,4) = 25.513.026       | 16   | 408.208.416
SI3   | 1 | 1 | 6 | (3^4) · C(24,6) = 10.902.276       | 64   | 697.745.664
SI4   | 1 | 3 | 6 | (7^4) · C(24,6) = 323.164.996      | 64   | 20.682.559.744
SI5   | 2 | 1 | 4 | (3^6) · C(24,4) = 7.746.354        | 256  | 1.983.066.624
SI6   | 2 | 3 | 4 | (7^6) · C(24,4) = 1.250.138.274    | 256  | 320.035.398.144
SI7   | 2 | 1 | 6 | (3^6) · C(24,6) = 98.120.484       | 4096 | 401.901.502.464
SI8   | 2 | 3 | 6 | (7^6) · C(24,6) = 15.835.084.804   | 4096 | 64.860.507.357.184
workplan contains a total of 3 operations with uncertain operation times. The workplan begins at station 1 (slot 1) with operation A (placing a bottom shell - red), followed by operation B (inserting a PCB - green). Operation B can be performed by station 2 (slot 7) or by station 3 (slot 13). The assembly of the product is completed at station 4 (slot 19) with operation C (placing the top shell - yellow). Our Industrial IoT Test Bed was used to record the production times, as shown in Table 1. An extended (more realistic) version of this problem arises when two parallel operations are defined. Assume that operation B (the insertion of a PCB with glue - green) is offered by station 2 (slot 7) or station 3 (slot 10), and operation C (the activation of the glue in a pass-through oven - blue) is offered by station 4 (slot 13) or station 5 (slot 16), as shown in Fig. 5. As in the previous example, the production process starts with operation A (placing a bottom shell - red) at station 1 (slot 1) and is completed at station 6 (slot 19) with operation D (the placement of the top shell - yellow). The corresponding workplan for the extended problem is shown in Fig. 4, and the production times are given in Table 2. As a discrete manufacturing system, our simulation considers discretized time intervals, each corresponding to one second. Per time step, a carrier can be transported one slot further, wait for its predecessors to move on, or wait until the station finishes its operation. Growth in the number of products and in the number of stations supporting the same operations leads to an exponential expansion of the possible decision space (combinations of decisions), as shown in Eq. (1):

NumberOfPossibleDecisions = (2^NumberParallelOperations)^NumberCarriers    (1)
The number of possible initial situations per problem (SI) depends on:
– the amount of variation in operation times (a margin of ±3 s results in 7 possible operation times),
– how many stations exist in the environment,
– how the carriers are initially distributed on the conveyor belt (calculated using the binomial coefficient, where n = 24 is the number of slots and k = 4 or 6 is the number of carriers).
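The counts in Table 3 follow directly from these rules; a small sketch (the function names are ours, and `stations` is the number of stations in the environment: 4 in V1, 6 in V2) reproduces them with `math.comb`.

```python
# Reproduces the counts of Table 3 from Eq. (1) and the binomial coefficient.
import math

def possible_decisions(parallel_operations, carriers):
    """Eq. (1): each carrier faces one binary choice per parallel operation."""
    return (2 ** parallel_operations) ** carriers

def initial_situations(stations, margin, carriers, slots=24):
    """Operation-time combinations times carrier placements on the belt."""
    times_per_station = 2 * margin + 1   # e.g. margin 3 -> 7 possible times
    return times_per_station ** stations * math.comb(slots, carriers)

# SI1: environment V1 (4 stations, 1 parallel operation), margin ±1, 4 carriers.
si1_experiments = initial_situations(4, 1, 4) * possible_decisions(1, 4)
```

Multiplying initial situations by possible decisions yields the "experiments required" column.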
The corresponding numbers for this problem are shown in Table 3. In this paper, we consider uncertainty as variations in the operating cycle. For example, for an uncertainty value of 3, the operation D from Table 2
corresponds to the time range in seconds: 10 ± 3 = [7; 13]. Our aim, therefore, is to incorporate RL approaches such as DQN, REINFORCE, Advantage Actor Critic (A2C), and Proximal Policy Optimization (PPO) to develop decision policies that minimize the overall completion time while dealing with the uncertainties and exploiting the redundant capabilities. As a value-iteration-based method, DQN does not directly optimize for reward but instead learns a function approximator to predict Q-values that satisfy the recursive Bellman equation [12]. By maximizing the Q-values, optimal actions are identified. The Q-value approximator is usually updated in an offline manner by storing transitions in a buffer and sampling them randomly. With PPO, the expected reward is directly optimized by estimating the gradient of the policy from the agent's trajectory. In PPO [14], the policy is typically updated in an online process in which the agent's most recent transitions are used to update the policy. Actor Critic (AC) approaches use a combination of value-iteration and policy gradient methods, wherein the actor is a policy gradient method used for action selection and the critic, as a value-iteration-based approach, is responsible for supervising the actor. An advantage function is usually used alongside AC (hence Advantage Actor Critic) to stabilize training by reducing variance [9]. As a policy gradient-based method, REINFORCE makes actions that resulted in higher observed returns more likely to be chosen again. In REINFORCE, the policy is directly optimized by estimating the weights of the optimal policy with stochastic gradient ascent [1]. A major limitation of REINFORCE is that it has no general asymptotic property and does not always converge to a (local) maximum. Based on the requesting station and the current state of the conveyor, a decision is made by the agent for each carrier and redundant operation.
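As an illustration of the value-iteration view described above (this is not code from the paper, and the numbers are made up), the one-step Bellman target that DQN's Q-value approximator regresses toward can be written as:

```python
# Illustrative one-step TD target for DQN: r + gamma * max_a' Q(s', a'),
# with zero bootstrap at terminal states. Gamma matches Table 5.
GAMMA = 0.99

def bellman_target(reward, q_next, done, gamma=GAMMA):
    """Compute the Bellman target for a single stored transition."""
    return reward + gamma * (0.0 if done else max(q_next))

# Non-terminal transition with reward 1.0 and next-state Q-values [0.5, 2.0]:
target = bellman_target(1.0, [0.5, 2.0], done=False)  # 1.0 + 0.99 * 2.0 = 2.98
```

In DQN, the network's prediction for the taken action is pushed toward this target over randomly sampled replay-buffer minibatches.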
The request can be understood as follows: should the operation be performed here or at the neighboring station? The state representation always includes all slots. The value of a single slot is 0 if there is no carrier, or the one-hot coded value of the next operation needed for the workpiece. We have developed a score (see Eq. (2)) for evaluating the performance, and the reward is calculated after the production of all workpieces is completed.

$$\text{Score} = \frac{\text{Makespan}_{\text{Worst}} - \text{Makespan}_{\text{Agent}}}{\text{Makespan}_{\text{Worst}} - \text{Makespan}_{\text{Best}}}; \qquad \text{Reward} = \text{Score}^3 \tag{2}$$
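Equation (2) can be evaluated directly; a minimal sketch (the makespan values below are illustrative, not taken from the experiments):

```python
def score(makespan_agent, makespan_worst, makespan_best):
    """Normalized score per Eq. (2): 1 at the best makespan, 0 at the worst."""
    return (makespan_worst - makespan_agent) / (makespan_worst - makespan_best)

def reward(makespan_agent, makespan_worst, makespan_best):
    """Cubing the score sharpens the reward toward near-optimal schedules."""
    return score(makespan_agent, makespan_worst, makespan_best) ** 3

# Illustrative makespans (seconds): the agent lands halfway between best and worst.
s = score(75.0, 100.0, 50.0)   # -> 0.5
r = reward(75.0, 100.0, 50.0)  # -> 0.125
```

The cubic shaping means a schedule halfway between worst and best earns only 12.5% of the maximum reward, concentrating the learning signal on near-optimal policies.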
To make the results reproducible, the implementation code and data for this paper are provided; see [4,5]. The settings and hyperparameters used are shown in Tables 4 and 5.
4
Experiments
The experimental results are presented in Figs. 6, 7, 8 and 9, each of which shows an individual boxplot of the performance of one RL technique. In these figures, the black scale (left Y-axis) reflects the normalized accuracy. Each diagram is divided into two sections (version 1 on the left, version 2 on the right).
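The boxplot construction used in these figures (whiskers limited to 1.5 times the interquartile range, values beyond the whiskers marked as outliers) can be sketched as follows; the quartile rule here is Tukey's hinges, which may differ slightly from the plotting library actually used:

```python
def whisker_limits(values):
    """Return (lower, upper) whisker limits at Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower_half = xs[:half]
    upper_half = xs[half + (n % 2):]

    def median(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2

    q1, q3 = median(lower_half), median(upper_half)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Illustrative sample with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
lo, hi = whisker_limits(data)                       # -> (-4.5, 15.5)
outliers = [x for x in data if x < lo or x > hi]    # -> [30]
```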
Table 4. General settings

Parameter          | Environment V1 | Environment V2
Input Dimensions   | 3 * 24 = 72    | (4 * 24) + 6 = 110
Hidden1 Dimensions | 64             | 256
Hidden2 Dimensions | 32             | 64
Hidden3 Dimensions | 16             | 16
Action Dimensions  | 2              | 2
Min. episodes      | 200            | 200
Max. episodes      | 200000         | 200000
Table 5. Hyperparameters (values apply to both Env. V1 and Env. V2)

Hyperparameter    | DQN  | REINFORCE | A2C   | PPO
Discount (γ)      | 0.99 | 0.99      | 0.99  | 0.99
Learning rate     | 0.01 | 0.0003    | 0.001 | 0.0003
GAE parameter (λ) | -    | -         | -     | 0.95
Minibatch size    | 128  | -         | -     | 12
Horizon           | -    | -         | -     | 24
Policy clip       | -    | -         | -     | 0.2
Num. epochs       | -    | -         | -     | 4
Fig. 6. DQN
The simulation index (SI), as described in Table 3, is plotted on the X-axis. Each scenario contains a subset of 4 early stopping strategies (from light grey to black, identified in the legend area), each represented by a box that illustrates the evaluation results from multiple executions of the algorithms with the same parameters. We limited the length of the whiskers to a maximum of 1.5 times the interquartile range; values lying outside the whiskers are marked with an X-symbol. The orange scale (right Y-axis) is logarithmic and illustrates in the same diagram how many episodes were needed on average until the model converged (represented by orange circles). With the exception of REINFORCE, shown in Fig. 7, it was possible to train at least one model with a performance
Fig. 7. REINFORCE
Fig. 8. A2C
Fig. 9. PPO
over 80% for each scenario (SI). Convergence had to be achieved within a maximum of 200,000 episodes, after which training was terminated. However, if the median values or even the outliers of the trained models are taken into account, it can be observed that PPO performs poorly in the SI2, SI5 and SI6 scenarios. In scenario SI5, the model even consistently makes the worst possible decisions. DQN (Fig. 6) also has convergence problems in combination with some early stopping strategies, see SI6 and SI8. For the results obtained from DQN (Fig. 6), it is clear that the variation in accuracy becomes particularly large whenever there is a high variation in operation times, as shown in SI2 in comparison with SI1, SI4 in comparison with SI3, SI6 in comparison with SI5, as well as SI8 in compar-
Table 6. Best and fastest converging models

SI  | best alg. | best e.s.m. | av. score (best) | av. no. of ep. (best) | fastest alg. | fastest e.s.m. | av. score (fastest) | av. no. of ep. (fastest) | av. score (Δ) | av. no. of ep. (Δ)
SI1 | PPO | 0.79–100 | 0.934 | 303   | DQN | 0.79–100 | 0.917 | 277  | –0.017 | –26
SI2 | DQN | 0.89–25  | 0.857 | 22754 | DQN | 0.92–10  | 0.813 | 853  | –0.044 | –21901
SI3 | PPO | 0.85–50  | 0.959 | 314   | PPO | 0.79–100 | 0.954 | 207  | –0.005 | –107
SI4 | PPO | 0.89–25  | 0.888 | 5607  | PPO | 0.92–10  | 0.866 | 506  | –0.022 | –5101
SI5 | DQN | 0.85–50  | 0.936 | 615   | DQN | 0.89–25  | 0.934 | 536  | –0.002 | –79
SI6 | A2C | 0.89–25  | 0.897 | 21564 | DQN | 0.92–10  | 0.863 | 1635 | –0.034 | –19929
SI7 | PPO | 0.89–25  | 0.970 | 305   | PPO | 0.92–10  | 0.961 | 232  | –0.009 | –73
SI8 | PPO | 0.89–25  | 0.926 | 1643  | PPO | 0.79–100 | 0.912 | 972  | –0.014 | –671
ison with SI7. This behavior also appears in a similar form in Figs. 7, 8 and 9, where higher variation in the operating times likewise leads to higher variation in the accuracy. Furthermore, Figs. 8 (A2C) and 6 (DQN) show that, in general, better decision accuracy is achieved when the operation time varies less. As shown in Table 6, DQN performs best in terms of convergence speed in the cases with only 4 carriers (SI1, SI2, SI5 and SI6). When it comes to the accuracy of the results, DQN also outperforms the other algorithms in the experiments SI2 and SI5. Remarkably, in all situations where 6 products are to be produced (SI3, SI4, SI7 and SI8), PPO outperformed DQN, A2C and REINFORCE in terms of the average score, duration and number of episodes required for learning. The reason behind this is that PPO is a policy gradient method. The motivation of Schulman et al. [14], who presented this technique, was to develop an algorithm with the data efficiency and reliable performance of Trust Region Policy Optimization (TRPO), but using only first-order optimization. The term "proximal policy" reflects that the method measures, during learning, how strongly the current policy has shifted compared to the previous one. Upon review of the results, we found that PPO proved more effective than DQN, A2C and REINFORCE on complex problems, i.e., the 6-carrier case in our study. However, due to the way the PPO algorithm works, it became clear that it is not suitable for every problem (especially SI5 and SI6). To improve robustness, an appropriate mechanism needs to be incorporated that allows the model to recover from over-rewarded episodes.
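The "proximal" constraint can be made concrete with the clipped surrogate objective from Schulman et al. [14], L = min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the probability ratio between the current and the previous policy and A is the advantage. The ratios and advantages below are illustrative numbers, not values from the experiments.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """One term of PPO's objective: caps the incentive to move the policy
    far from its predecessor (the policy clip of 0.2 from Table 5)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

# A large policy shift (ratio 1.5) with positive advantage is clipped to
# 1.2 * A, so the gradient incentive to push the policy further vanishes.
capped = clipped_surrogate(1.5, 1.0)    # -> 1.2
# A small shift stays unclipped.
uncapped = clipped_surrogate(1.1, 1.0)  # -> 1.1
```

Taking the minimum makes the objective a pessimistic bound: for negative advantages the unclipped term is kept whenever it is lower, so the policy is never rewarded for large shifts in either direction.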
5
Conclusion
With the recent advances in the field of artificial intelligence in connection with automated industrial manufacturing, we also see an immense potential for dynamic and automated resource allocation at our real model factory at the HTW, the Industrial IoT Test Bed. Our central research objective in this paper was to bring reinforcement learning into practice in our Industrial IoT Test Bed. The methodology used in this work had the purpose of minimizing the total completion time for a given order. In this regard, we investigated four RL-methods
(DQN, REINFORCE, A2C, and PPO) to implement production scheduling under uncertainty in our model factory. A comparison of the performance of the used techniques in terms of convergence speed and accuracy indicates that PPO is the recommended approach for scheduling production lines with a high-dimensional problem space. On the other hand, if models must be generated as quickly as possible, DQN gave the best results for problems characterized by a less complicated production scheduling problem space. In the end, our experiments demonstrated that RL is capable of solving complicated problems where traditional approaches fail, such as when dealing with uncertain operation times and redundant machine capabilities. Please see the supplementary material at [5]. In future work, there is potential to enhance the reward in order to make the algorithm more generalizable and effective for more complicated problems. A remaining problem to work on in the future is to include all the stations of the IIoT Test Bed and to consider various workplans for various products, which increases the overall complexity and needs to be addressed in real-world scenarios. Acknowledgement. This work has been supported and funded by the ESF (European Social Fund) as part of the REACT research group "Wandlungsfähige Produktionsumgebungen" (WaPro, application number: 100602780). REACT-EU: Funded as part of the EU reaction to the COVID-19 pandemic.
References

1. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017)
2. Bahrpeyma, F., Haghighi, H., Zakerolhosseini, A.: An adaptive RL based approach for dynamic resource provisioning in cloud virtualized data centers. Computing 97(12), 1209–1234 (2015)
3. Bahrpeyma, F., Zakerolhoseini, A., Haghighi, H.: Using IDS fitted Q to develop a real-time adaptive controller for dynamic resource provisioning in cloud's virtualized environment. Appl. Soft Comput. 26, 285–298 (2015)
4. Heik, D.: Discrete manufacturing simulation environment. Zenodo (2022)
5. Heik, D.: Results of the experiments within the discrete manufacturing simulation environment. Zenodo (2022)
6. Jong, W.R., Chen, H.T., Lin, Y.H., Chen, Y.W., Li, T.C.: The multi-layered job-shop automatic scheduling system of mould manufacturing for industry 3.5. Comput. Indus. Eng. 149, 106797 (2020)
7. Kardos, C., Laflamme, C., Gallina, V., Sihn, W.: Dynamic scheduling in a job-shop production system with reinforcement learning. Procedia CIRP 97, 104–109 (2021)
8. Khalid, Q.S., et al.: Hybrid particle swarm algorithm for products' scheduling problem in cellular manufacturing system. Symmetry 11(6), 729 (2019)
9. Lee, B.: Roll control of underwater vehicle based reinforcement learning using advantage actor-critic. J. Korea Instit. Military Sci. Technol. 24(1), 123–132 (2021)
10. Liu, L.L., Hu, R.S., Hu, X.P., Zhao, G.P., Wang, S.: A hybrid PSO-GA algorithm for job shop scheduling in machine tool production. Int. J. Prod. Res. 53(19), 5755–5781 (2015)
11. Luo, B., Wang, S., Yang, B., Yi, L.: An improved deep reinforcement learning approach for the dynamic job shop scheduling problem with random job arrivals. J. Phys. Conf. Ser. 1848, 012029 (2021)
12. O'Donoghue, B., Osband, I., Munos, R., Mnih, V.: The uncertainty Bellman equation and exploration. In: International Conference on Machine Learning, pp. 3836–3845 (2018)
13. Pradhan, A., Bisoy, S.K., Kautish, S., Jasser, M.B., Mohamed, A.W.: Intelligent decision-making of load balancing using deep reinforcement learning and parallel PSO in cloud environment. IEEE Access 10, 76939–76952 (2022)
14. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
15. Sels, V., Craeymeersch, K., Vanhoucke, M.: A hybrid single and dual population search procedure for the job shop scheduling problem. Eur. J. Oper. Res. 215(3), 512–523 (2011)
16. Wang, H., Sarker, B.R., Li, J., Li, J.: Adaptive scheduling for assembly job shop with uncertain assembly times based on dual Q-learning. Int. J. Prod. Res. 59(19), 5867–5883 (2021)
17. Wang, L., Tang, D.: An improved adaptive genetic algorithm based on hormone modulation mechanism for job-shop scheduling problem. Expert Syst. Appl. (2011)
18. Werner, F., Burtseva, L., Sotskov, Y.: Exact and heuristic scheduling algorithms. MDPI – Multidisciplinary Digital Publishing Institute (2020)
19. Xiao, Y., Zheng, Y., Yu, Y., Zhang, L., Lin, X., Li, B.: A branch and bound algorithm for a parallel machine scheduling problem in green manufacturing industry considering time cost and power consumption. J. Clean. Prod. 320, 128867 (2021)
20. Yu, H., Liang, W.: Neural network and genetic algorithm-based hybrid approach to expanded job-shop scheduling. Comput. Indus. Eng. 39(3), 337–356 (2001)
21. Zhang, Z., Zheng, L., Weng, M.X.: Dynamic parallel machine scheduling with mean weighted tardiness objective by Q-learning. Int. J. Adv. Manuf. Technol. 34(9), 968–980 (2007)
22. Zhou, L., Zhang, L., Horn, B.K.: Deep reinforcement learning-based dynamic scheduling in smart manufacturing. Procedia CIRP 93, 383–388 (2020)
Gated Recurrent Unit and Long Short-Term Memory Based Hybrid Intrusion Detection System

M. OmaMageswari, Vijayakumar Peroumal(B), Ritama Ghosh, and Diyali Goswami

School of Electronics Engineering, Vellore Institute of Technology, Chennai, India
[email protected]
Abstract. Cyber-attacks have increased in recent years; however, the classic Network Intrusion Detection System based on feature selection by filtering has significant disadvantages that make it difficult to stop new attacks promptly. An anomaly-based hybrid deep learning system for detecting network intrusions is built using neural networks such as the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM). A supervised machine learning feature selection method based on Extreme Gradient Boosting with Shapley Additive Explanations (SHAP) values has been used to select the optimal number of features. Optimization techniques such as Adam and Root Mean Square Propagation (RMSprop) are used to further enhance the performance of the deep learning classifier. Finally, the mechanism is examined using metrics including precision, accuracy, recall, and F1-score. The model is tested on two benchmark datasets, CICIDS2017 and UNSW-NB15. This research aids in identifying the optimal algorithm for predicting future cyber-attacks, especially in the vulnerable public healthcare industry.

Keywords: Deep Learning · Intrusion Detection System · Machine Learning
1 Introduction
Malicious software is evolving at a rapid pace, posing a significant challenge for the building of intrusion detection systems (IDS). Malicious activities have evolved, and the most difficult task is identifying unknown and disguised malware, since malware developers employ various evasion tactics for information concealment to avoid detection by an IDS. Businesses have rushed to digitize their processes and business operations since the advent of the pandemic. While technological advancements have helped businesses scale their operations, they have also raised the danger of essential data theft. A cyber attack might target any firm that holds sensitive data or depends on real-time processing. Covid-19 has amplified the surge in cyberattacks, which was already on the rise. With remote working becoming the standard across the world, a heavy reliance on technology was unavoidable. Furthermore, rising 5G adoption, device interconnectivity, new processes and procedures, updated employee profiles, and less-controlled work settings have all resulted in an increase in vulnerabilities, and the detection of these is a very
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 534–544, 2023. https://doi.org/10.1007/978-3-031-35501-1_53
challenging one. Nowadays, machine learning and deep learning play a vital role in efficiently detecting malicious traffic. Kaur et al. [1] present a novel approach to danger detection. They proposed a D-Sign-based solution using an LSTM network, with raw data handled through two layers of LSTM. The LSTM outperformed previous work, giving an accuracy of 99.08% on the CICIDS and 99.14% on the NSL-KDD dataset. Patel et al. [2] discuss the advent of online communication and data transfer and its effect on data security. The study examines the need to construct a new detection mechanism based on intelligent approaches such as machine learning to identify known and unexpected dangers. The basic working of intrusion detection and its types are discussed in detail. Finally, the need for autonomic computing, machine learning, and deep learning to create a new intrusion detection system is established. Mohammed Maithew et al. [3] use a deep neural network method to identify unknown attack packages utilizing a sophisticated intrusion detection system with high network performance. The neurons use forward propagation in a feed-forward neural classifier, together with back propagation using a loss function and an optimizer. In this model, attack detection is done in two ways: binary classification and multiclass classification. In terms of performance, the suggested system has demonstrated promising results with a high precision of 99.98%. Mahhizaruvi et al. [4] put forward an IDS with the help of an Enhanced Multi-Relational Fuzzy Tree (EMRFT). The EMRFT utilized a genetic algorithm for optimization. First, it generates a classifier using the KDD99 training dataset. Second, it classifies the test dataset using the generated classification model. The ability and precision of the model are tested using 10-fold cross-validation.
The binary EMRFT model produced an accuracy of 98.27%, and the detection time required by the model is around 7 seconds. In order to classify traffic as normal or abnormal and also to detect the type of attack, the proposed method uses deep learning with optimization techniques. The main contributions of this work are as follows:
• Proposed a Recurrent Neural Network-based Intrusion Detection System using the two benchmark datasets CICIDS2017 and UNSW-NB15.
• Implemented the feature-optimized Extreme Gradient Boosting classifier algorithm with the help of SHAP values.
• Built GRU and LSTM models for binary and multi-target classification.
• Implemented enhanced Adam and RMSprop optimization techniques for the Intrusion Detection System.
• Compared the performance of the different ML models using metrics such as accuracy, precision, recall and F1-score.
The paper is structured as follows: Section 2 reviews the related literature. Section 3 elaborates on the Hybrid Intrusion Detection System. Section 4 summarizes the metrics and the experimental findings, and the study closes with conclusions and recommendations for future work.
536
OmaMageswari. M et al.
2 Literature Survey
This section reviews existing methods for intrusion detection using deep learning and machine learning techniques, and discusses their performance. The authors Jin Kim et al. [5] proposed a better method than Mahhizaruvi et al.: an Artificial Intelligence (AI) intrusion detection system based on a deep neural network. A deep neural network (DNN) with four levels and 100 hidden units was used to create the learning model. The Rectified Linear Unit function was employed as the activation function for the hidden layers, and the Adam optimizer, which uses adaptive back-propagation, was also used. Finally, the DNN model's detection efficacy was determined by evaluating its accuracy, detection rate, and false alarm rate. The results show that the accuracy and detection rate are both extremely high, averaging 99%. Firdausi et al. [6] use different ways to identify and generate signatures for web-based attacks. A comprehensive approach to automated behaviour-based malware detection utilizing machine learning techniques is discussed in this research. Four distinct data sets were used, each of which was exposed to five different classifiers: k-Nearest Neighbor, Naive Bayes, SVM, J48 decision tree, and MPNN. The outcomes of the binary categorization tests were statistically assessed. Based on the analysis of the tests and experimental findings of all 5 classifiers, the J48 decision tree had the best overall performance, with a recall of 95.9%, a false positive rate of 2.4%, a precision of 97.3%, and an accuracy of 96.8%. Smita Ranveer et al. [7] provide a solution that employs a hybrid technique for detecting malware based on a Support Vector Machine classifier, allowing the malware detection system's full potential to be realised while maintaining high accuracy and minimal false alarms.
The TPR for the opcode-based static strategy was 0.95, while the TPR for the behaviour-based approach was 0.93, lower than the hybrid approach. For the same set, the opcode-based static technique gave a false alarm rate of 0.08, while the behaviour-based approach yielded 0.3, higher than the hybrid approach. Das et al. [8] use Rough Set Theory (RST) to reduce the dimension of the collected packets, and the selected features are then passed through an SVM. When packets are gathered from the network, RST is used to pre-process the data and reduce its dimensionality. The selected features are then fed into the model for learning and testing. RST lowered the number of features from 41 to 29. This RST-SVM approach produces a more accurate result than either the full feature set or entropy-based selection; the results show an increase in accuracy. Sujitha et al. [9] work on the benchmark IDS dataset KDD'99. Only 16 of the 41 features provided in the dataset are utilized. The methodology is a combination of fuzzy logic further improved by a genetic algorithm. For IDS, the genetic algorithm is employed to evolve new rules; using these rules, normal network traffic is distinguished from unusual traffic. Rules in the genetic rule set are of the "if-then" variety. Similarly, an attempt was also made by the authors Halim et al. [10] to solve the feature selection issue in intrusion detection with the help of a genetic algorithm-based feature selection method. Three benchmark network traffic datasets were utilized to test the work, and three different classifiers were used to assess the obtained features. Lastly, a comparison of common feature selection methodologies with the
proposed system showed that using GbFS, the accuracy of the findings improved, resulting in a highest accuracy of 99.80%. Gao et al. [11] suggest an adaptive ensemble learning method to solve current problems in the intrusion detection field. The NSL-KDD Test+ dataset has been utilized for their research. First, CART is used to classify the sample data, and a DNN is employed as a base classifier. To combine the advantages of the various classifiers, an adaptive voting algorithm is implemented. The dataset is not exhaustive, and there exists an imbalance between the number of data points for the different categories. It has been found that each classification algorithm has its own set of benefits. The results suggest that the proposed model is an effective solution for intrusion detection, with the best attack classification rate. The suggested voting system has an accuracy of 85.2%, a precision of 86.5%, a recall of 85.2%, and an F1 of 84.9%. Maseer et al. [12] reviewed the learning methods ANN, Decision Tree, KNN, Naïve Bayes, Random Forest, SVM, CNN, K-means clustering, EM clustering, and SOM. The four most popular IDS datasets, KDD'99, NSL-KDD, UNSW NB-15, and CICIDS2017, were analysed. The authors tabulated the results obtained from implementing all the algorithms; the different columns showed the performance of each model on the four benchmark datasets. The performance metrics used for evaluation were accuracy, precision, recall, and F1-score, and the training time, memory usage, and testing time were also calculated. The experimental results show that no single ML algorithm can successfully detect all kinds of attacks. Future work must focus on feature selection and on developing new DL methods for constructing IDS.
3 Hybrid Intrusion Detection System
This work presents a Recurrent Neural Network-based Intrusion Detection System using the two benchmark datasets. The types of classification are as follows. Binary classification is a task in which a given set of data is categorized into two labels to detect whether the network activity is normal or abnormal; using the network features, the proposed IDS classifies traffic into 'normal traffic' and 'abnormal traffic'. Multi-target classification is the task of categorizing items into several classes, where each input has exactly one output class. The goal is not only to detect whether the network activity is normal but also to identify the type of attack; using the network features, the proposed IDS classifies traffic into 'normal traffic' and different attack types such as 'DoS', 'DDoS', 'PortScan', etc.
3.1 Hybrid Intrusion Detection System for Binary Classification
The first step is dataset collection, followed by splitting the datasets into training and testing sets in a ratio of 70:30. The next step is preprocessing of the input data, which mainly deals with the various data inputs and converts them into acceptable data structures. Feature selection using XGBoost with SHAP values gave a set of 20 features that were further used for model building. An LSTM-based deep learning model with the Adam optimizer was used for binary classification, as shown in Fig. 1. The performance of the final model was evaluated on the testing data.
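The feature-selection step can be sketched as ranking features by mean absolute SHAP value and keeping the top k (k = 20 in the paper, from an XGBoost model). The matrix and feature names below are hypothetical stand-ins for the SHAP values a tree explainer would produce on the real datasets.

```python
def top_k_features(shap_values, feature_names, k):
    """Rank features by mean |SHAP| across samples and keep the k strongest."""
    n_samples = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance, key=importance.get, reverse=True)[:k]

# Hypothetical SHAP matrix: 3 samples x 4 features (illustrative values only).
shap_matrix = [
    [0.9, -0.1, 0.0,  0.3],
    [0.8,  0.2, 0.1, -0.4],
    [1.1, -0.1, 0.0,  0.5],
]
names = ["flow_duration", "pkt_len_mean", "flag_count", "dst_port"]
selected = top_k_features(shap_matrix, names, k=2)
# -> ["flow_duration", "dst_port"]
```

In practice the matrix would come from a library such as `shap` applied to the trained XGBoost classifier; the ranking-and-truncation step shown here is the same either way.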
Dataset Information: CICIDS2017: The Canadian Institute for Cybersecurity at the University of New Brunswick generated this dataset in 2017. CICIDS2017's goal was intrusion detection, and it includes many attack scenarios. There are 1.04 million data points in all; around 40% belong to the attack category and 60% to the normal category. There are a total of 77 characteristics and one target column called 'Label'. UNSW-NB15 was developed in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) using a commercial penetration tool. This tool can produce a combination of modern synthetic normal activities and modern attack behaviour from network data. It is a network intrusion dataset with 49 characteristics that contains around 2,540,044 records. For binary classification, the label column is categorized into attack (55.1%) and benign traffic (44.1%). Training data is used to train a machine learning model to anticipate the outcome the model is designed to predict. For training purposes in the CICIDS2017 dataset, around 700,000 data points (70%) have been utilized; in the case of the UNSW NB-15 dataset, 1.7 million data points (70%) have been used. In the preprocessing stage, the datasets contain many missing values and other symbolic features which cannot be processed by the deep learning classifiers; as a result, they are removed from the datasets. Since the target label is categorical, it is converted into numerical values using a Label Encoder, with the classes tagged with integers from 0 to n−1. Highly correlated columns are eliminated in order to avoid unnecessary computation. Additionally, Min-Max scaling is used for normalization of the feature columns, which rescales all the features to [0, 1] and enhances the model's efficiency and training stability. Relabeling of the target feature is performed in the CICIDS2017 dataset to address the high class-imbalance issue.
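The Min-Max scaling applied above rescales each feature column to [0, 1]; a minimal sketch (the column values are illustrative):

```python
def min_max_scale(column):
    """Rescale a feature column to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant column: map everything to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

scaled = min_max_scale([10.0, 20.0, 30.0])  # -> [0.0, 0.5, 1.0]
```

Note that in a real pipeline the minimum and maximum should be computed on the training split only and then reused on the test split, to avoid leaking test statistics into training.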
Fig. 1. Hybrid LSTM-based intrusion detection for binary classification
Proposed Intrusion Detection System for Multi-target Classification: The first step includes collecting the datasets CICIDS2017 and UNSW NB-15. Preprocessing then deals with the numerous data inputs and transforms them into suitable data structures. Feature selection using XGBoost with SHAP values yields a collection of 20 features that are then utilized for model construction. For multi-target classification, a GRU-based deep learning model with the Adam optimizer was utilized, as shown in Fig. 2. The final model's performance was assessed using testing data.
Fig. 2. Proposed hybrid GRU based intrusion detection for multi-target classification.
Dataset Information: The CICIDS2017 dataset comprises benign and up-to-date common attacks, and it closely reflects authentic real-world data. Benign, DoS, DDoS, and PortScan are examples of the traffic types. There are 1.04 million data points in all, with a total of 77 characteristics and one target column called 'Label'. UNSW-NB15 was developed in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) using a commercial penetration tool. This tool can produce a combination of modern synthetic normal activities and modern attack behaviours from network data. It is a network intrusion dataset with 49 characteristics that contains around 2,540,044 records. For binary classification, the label column is categorized into attack and benign traffic; Fuzzers, Analysis, etc., are among the various attack categories used for multiclass classification. For training purposes in the CICIDS2017 dataset, around 700,000 data points (70%) have been utilized; in the case of the UNSW NB-15 dataset, 1.7 million data points (70%) have been used. In the preprocessing stage, the datasets contain a
large number of null values and other symbolic information that the classifiers are unable to analyze; as a result, these are removed from the datasets. Because the target label is categorical, a Label Encoder is used to transform it into numerical values, with the classes labelled from 0 to n−1. In order to avoid needless computation, highly correlated columns are removed. In addition, Min-Max scaling is employed to normalize the feature columns, rescaling all of the features into the same range; this improves the model's efficiency and training consistency. To address the high class-imbalance issue, the target feature is relabeled in the CICIDS2017 dataset.
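The label encoding described above can be sketched as a minimal analogue of scikit-learn's LabelEncoder, mapping class names to integers 0..n−1 in sorted order (the class names below are illustrative):

```python
def label_encode(labels):
    """Map categorical labels to integers 0..n-1, in sorted order of class name."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[c] for c in labels], classes

encoded, classes = label_encode(["DoS", "BENIGN", "PortScan", "DoS"])
# classes -> ["BENIGN", "DoS", "PortScan"]; encoded -> [1, 0, 2, 1]
```

For the multi-target models these integer labels would additionally be one-hot encoded before being fed to a softmax output layer.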
4 Results and Discussions
The work is conducted on an HP Pavilion running the Windows 11 operating system with an Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz (1.80 GHz) and 8 GB of RAM. The models are built on the Keras Python framework, which provides a complete framework to create any type of neural network. Additionally, the Python libraries NumPy, pandas, sklearn, seaborn, TensorFlow, xgboost, and keras_tuner are used. Two benchmark datasets, CICIDS2017 and UNSW NB-15, are used. Each model is run for 15 epochs. First, the accuracy, recall, precision, and F1-score are observed at the end of training. Then the model is evaluated on the testing dataset to obtain the testing accuracy, precision, recall, and F1-score.
4.1 Model Building for Binary Classification
Table 1 compares the DL models (LSTM and GRU) with different optimizers (Adam and RMSprop) on the CICIDS 2017 dataset. The models were built and compared using different evaluation metrics. The best testing accuracy was found for LSTM with the Adam optimizer: an accuracy of 99%, precision of 100%, recall of 59.53%, and F1-score of 74.25%.

Table 1. Model Comparison for Binary Classification using the CICIDS2017 dataset

Model | Optimizer | Train Acc. | Train Prec. | Train Recall | Train F1 | Test Acc. | Test Prec. | Test Recall | Test F1
LSTM | Adam    | 99.06% | 100% | 59.72% | 74.42% | 99%    | 100% | 59.53% | 74.25%
LSTM | RMSprop | 81.84% | 100% | 62.74% | 76.55% | 57.03% | 100% | 47.31% | 63.74%
GRU  | Adam    | 98.76% | 100% | 59.93% | 75.86% | 98.42% | 100% | 61.43% | 75.75%
GRU  | RMSprop | 89.93% | 100% | 62.49% | 76.47% | 84.92% | 100% | 61.60% | 75.87%
Table 2. Model Comparison for Binary Classification using the UNSW NB-15 dataset

Model | Optimizer | Train Acc. | Train Prec. | Train Recall | Train F1 | Test Acc. | Test Prec. | Test Recall | Test F1
LSTM | Adam    | 98.75% | 98.84% | 13.14% | 22.74% | 98.82% | 98.74% | 12.59% | 21.89%
LSTM | RMSprop | 98.39% | 98.80% | 12.98% | 19.10% | 97.64% | 97.25% | 10.60% | 18.74%
GRU  | Adam    | 94.83% | 100%   | 54.86% | 70.42% | 95.37% | 100%   | 54.30% | 69.99%
GRU  | RMSprop | 94.72% | 100%   | 54.95% | 70.48% | 94.43% | 100%   | 54.17% | 69.86%

Table 2 shows the LSTM- and GRU-based models with different optimizers (Adam and RMSprop) on the UNSW NB-15 dataset. The models were built using the 20 best features and compared using different
evaluation metrics. The best testing accuracy was found for LSTM with the Adam optimizer: an accuracy of 98.82%, precision of 98.74%, recall of 12.59%, and F1-score of 21.89%.

Table 3. Model Comparison for Multiclass Classification using the CICIDS 2017 dataset

Model | Optimizer | Train Acc. | Train Prec. | Train Recall | Train F1 | Test Acc. | Test Prec. | Test Recall | Test F1
LSTM | Adam    | 82.66% | 100% | 95.64% | 97.94% | 82.51% | 100% | 95.65% | 97.74%
LSTM | RMSprop | 61.45% | 100% | 86.57% | 92.69% | 61.40% | 100% | 86.62% | 92.72%
GRU  | Adam    | 91.93% | 100% | 99.84% | 99.92% | 91.95% | 100% | 99.96% | 99.98%
GRU  | RMSprop | 91.74% | 100% | 99.83% | 99.92% | 91.79% | 100% | 99.92% | 99.96%
Table 3 reports the LSTM and GRU models trained with different optimizers (Adam and RMSprop) on the CICIDS 2017 dataset. The models were built for multi-target classification and compared using different evaluation metrics: accuracy, precision, recall, and F1 score. The best testing accuracy was obtained by GRU with the Adam optimizer: 91.95% accuracy, 100% precision, 99.96% recall, and a 99.98% F1 score.

Table 4. Model Comparison for Multiclass Classification using the UNSW NB-15 dataset

| Model | Optimizer | Train Accuracy | Train Precision | Train Recall | Train F1_score | Test Accuracy | Test Precision | Test Recall | Test F1_score |
|-------|-----------|----------------|-----------------|--------------|----------------|---------------|----------------|-------------|---------------|
| LSTM  | Adam      | 92.84%         | 100%            | 100%         | 100%           | 95.45%        | 100%           | 100%        | 100%          |
| LSTM  | RMSprop   | 92.68%         | 98.3%           | 98.46%       | 99.02%         | 93.82%        | 98.3%          | 98.46%      | 99.02%        |
| GRU   | Adam      | 95.87%         | 100%            | 100%         | 100%           | 95.75%        | 100%           | 100%        | 100%          |
| GRU   | RMSprop   | 94.72%         | 100%            | 100%         | 100%           | 94.43%        | 100%           | 100%        | 100%          |
Table 4 reports the LSTM- and GRU-based models trained with different optimizers (Adam and RMSprop) on the UNSW NB-15 dataset. The models were

542

OmaMageswari. M et al.

built using the 20 best features and compared using different evaluation metrics. The best testing accuracy was obtained by GRU with the Adam optimizer: 95.75% accuracy, 100% precision, 100% recall, and a 100% F1 score.

4.2 Performance Comparison Between the Two Datasets

Figure 3 compares the best-performing models for binary classification on the CICIDS2017 and UNSW NB-15 datasets.
Fig. 3. Best Model Performance Comparison for Binary Classification
Fig. 4. Best Model Performance Comparison for Multi-target Classification
Figure 4 presents the performance metrics used to select the best models for multi-target classification on the CICIDS2017 and UNSW NB-15 datasets.
5 Conclusion and Future Work

In this paper, various RNN-based intrusion detection system models have been proposed and evaluated on two benchmark NIDS datasets, UNSW NB-15 and CICIDS2017. The LSTM model with the Adam optimizer is observed to outperform all the other neural networks for binary classification of network traffic. It achieves 99% accuracy, 100% precision, 59.53% recall, and a 74.25% F1-score on the CICIDS2017 dataset. On the UNSW NB-15 dataset, it achieves 98.82% accuracy, 98.74% precision, 12.59% recall, and a 21.89% F1-score. For multi-target classification, a GRU model with the Adam optimizer has been proposed. Its performance on the CICIDS2017 dataset is 91.95% accuracy, 100% precision, 99.96% recall, and a 99.98% F1-score; on the UNSW NB-15 dataset, it reaches 95.75% accuracy, 100% precision, 100% recall, and a 100% F1-score. The models can be further improved by stacking LSTM and GRU layers in order to observe whether the detection rate increases. Additionally, ensemble techniques can be combined with the deep learning models to create more powerful intrusion detection systems. Future work could also make this hybrid IDS more compatible with embedded systems and IoT so that it can be used in real time.
Conflict of Interest. The authors declare that there is no conflict of interest in this paper.
Trace Clustering Based on Activity Profile for Process Discovery in Education

Wiem Hachicha1,2(B), Leila Ghorbel1, Ronan Champagnat2, and Corinne Amel Zayani1

1 MIRACL Laboratory, Sfax University, Tunis Road Km 10 BP. 242, Sfax 3021, Tunisia
[email protected], [email protected], [email protected]
2 L3i Laboratory, La Rochelle University, Avenue Michel Crépeau, La Rochelle 17042, France
[email protected]
Abstract. The basic objective of process mining is to discover, monitor, and improve process models by extracting knowledge from event logs using different techniques. In order to enhance the quality of process models, several works in the literature used trace clustering, where a trace represents a sequence of events of the same process instance (user). In this research paper, we attempt to discover process models in the educational domain so as to refine learning resource recommendation. For this reason, we propose to apply trace clustering to an event log extracted from the learning platform Moodle. Indeed, to the best of our knowledge, trace clustering has not yet been applied in the educational domain. From this perspective, we performed several experiments with various clustering algorithms to retain the best-performing one. Subsequently, we applied a process discovery algorithm, namely the heuristic miner. The results revealed that the quality of the discovered process models is better once trace clustering is considered.
Keywords: Process mining · Trace clustering · Process discovery · Activity profile

1 Introduction
The central target of process mining is to discover, monitor, and improve real processes by extracting knowledge from event logs readily available in current information systems [1]. Process mining techniques allow the discovery of process models from event logs and their analysis in order to determine their performance [2,3]. Process mining is applied in several areas, including information retrieval in digital libraries, healthcare, social media, etc.

Our research work focuses mainly on the educational domain, where process mining techniques are applied to educational data in order to enhance learning resource recommendation [4,5]. In our previous work [6,7], we elaborated an architecture for applying process mining in order to recommend process models (learning paths) to learners.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 545–554, 2023. https://doi.org/10.1007/978-3-031-35501-1_54

546

W. Hachicha et al.

We applied discovery algorithms to extract process models of learning paths based on event logs, learning results, and learners' profiles extracted from the Moodle platform. We found that the heuristic miner is the best discovery algorithm for extracting a process model of learning paths. However, we noticed that the results can be considerably improved. For this reason, we carried out an in-depth study, which confirmed that there are methods that make it possible to improve process discovery. Trace clustering constitutes one of these methods, where a trace represents a sequence of events of the same process instance (user). It is used in several works [8–13] based on three techniques: syntax-based similarity, model-based similarity, and feature vector-based similarity. These techniques take as input data extracted from the event log, such as the activity profile [10,12] and the position profile [13].

In this work, we attempt to improve the architecture proposed in [6] by applying trace clustering so as to enhance the quality of the discovered process models. Indeed, to the best of our knowledge, trace clustering has not yet been applied to the educational domain. We performed several experiments with various clustering algorithms to retain the best-performing one. Afterwards, we applied a process discovery algorithm, the heuristic miner. The results demonstrated that applying trace clustering improves the quality of the discovered process models.

This paper is organized as follows. We start by introducing the state of the art in Sect. 2. Next, in Sect. 3, we give an overview of our approach to trace clustering in the educational domain. In Sect. 4, we introduce and detail the experiments we have conducted.
Finally, we draw conclusions and outline perspectives for future work in Sect. 5.
2 State of the Art

In this section, we first introduce process mining. Second, we report relevant research papers about trace clustering. Third, we discuss the activity profile. Finally, we compare the different works on trace clustering.

2.1 Process Mining
The starting point for process mining is an event log. An event log contains a set of events recorded during the execution of a process. Each event refers to an activity and is associated with a particular case, also called a process instance. A sequence of events of the same process instance is a trace [14]. Table 1 depicts a fragment of a possible event log corresponding to an e-learning process. Each row indicates an executed event, which contains information such as the timestamp and the activity name (e.g., course viewed and quiz attempt viewed). Note that events are grouped per case in this table. Case 1 consists of three events. The first event of Case 1 corresponds to the execution of Activity1 on January 14th, 2021.
Trace Clustering Based on Activity Profile
547
Table 1. A Fragment of Some Event Log

| CaseID | Activity  | Timestamp       |
|--------|-----------|-----------------|
| 1      | Activity1 | 14/01/21, 11:08 |
| 1      | Activity2 | 14/01/21, 16:00 |
| 1      | Activity1 | 27/01/21, 10:04 |
| 2      | Activity1 | 06/01/21, 09:03 |
| 2      | Activity2 | 06/01/21, 16:22 |
| 2      | Activity3 | 07/02/21, 11:47 |
| 3      | Activity1 | 03/03/21, 09:00 |
| 3      | Activity1 | 08/03/21, 17:05 |
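Traces are obtained by grouping the events of Table 1 by case identifier and ordering each group by timestamp. A minimal sketch of this step (generic code, not the authors' implementation):

```python
from collections import defaultdict
from datetime import datetime

# Events from Table 1: (case_id, activity, timestamp)
events = [
    (1, "Activity1", "14/01/21, 11:08"), (1, "Activity2", "14/01/21, 16:00"),
    (1, "Activity1", "27/01/21, 10:04"), (2, "Activity1", "06/01/21, 09:03"),
    (2, "Activity2", "06/01/21, 16:22"), (2, "Activity3", "07/02/21, 11:47"),
    (3, "Activity1", "03/03/21, 09:00"), (3, "Activity1", "08/03/21, 17:05"),
]

def extract_traces(events):
    """Group events per case and sort each group chronologically."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.strptime(ts, "%d/%m/%y, %H:%M"), activity))
    return {case: [act for _, act in sorted(evs)] for case, evs in by_case.items()}

traces = extract_traces(events)
print(traces[1])  # ['Activity1', 'Activity2', 'Activity1']
```

The resulting sequences match the traces ⟨a, b, a⟩, ⟨a, b, c⟩, and ⟨a, a⟩ shown later in Table 2.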
Process discovery algorithms aim to discover process models from event logs; examples include the inductive miner, fuzzy miner, heuristic miner, and alpha miner. There are various metrics for measuring the quality of process models [15], namely fitness, precision, and generalization. Fitness determines how well the model allows the behaviors present in the event log. Precision corresponds to the rate of activities in the event logs compared to the total activities observed in the process model. Generalization measures the ability of the model to generalize the behavior present in the event log.

2.2 Trace Clustering
Trace clustering identifies homogeneous sets of traces in a heterogeneous event log and allows the discovery of many simpler process models. In order to cluster traces, three trace clustering techniques are used:

• Syntax-based similarity: traces are compared to each other by applying string distance metrics, such as the Levenshtein distance. A trace can be edited into another one by adding, deleting, and substituting events [8,9].
• Model-based similarity: the similarity between traces relies on the quality of the models discovered from those traces. A trace is more similar to a cluster of traces if a model of better quality can be discovered from the cluster [16].
• Feature vector-based similarity: each trace is transformed into a vector of features based on defined characteristics, such as the frequency of activities. For instance, the set of vectors represents the activity profile. To estimate the similarity between traces, numerous distance metrics are used to compare these feature vectors.

Authors in [10] used the same approach as [11] to create the activity profile as input for the clustering algorithm. However, they focused on the fitness, precision, and simplicity of a model generated from clustered data using Incremental Trace Clustering.
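The syntax-based technique above can be illustrated with a standard Levenshtein (edit) distance computed over traces rather than strings. This is a generic textbook sketch, not the implementation of the cited works:

```python
def levenshtein(trace_a, trace_b):
    """Minimum number of event insertions, deletions, and substitutions
    needed to turn trace_a into trace_b (dynamic programming)."""
    m, n = len(trace_a), len(trace_b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                       # delete all remaining events
    for j in range(n + 1):
        dist[0][j] = j                       # insert all remaining events
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if trace_a[i - 1] == trace_b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n]

print(levenshtein(["a", "b", "a"], ["a", "b", "c"]))  # 1 (substitute a -> c)
```

Traces with a small edit distance, such as ⟨a, b, a⟩ and ⟨a, b, c⟩ above, would end up in the same cluster under this similarity notion.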
In [12], the trace clustering method was used to split the event log into similar sets of sub-logs. Then, the researchers predicted the missing data by calculating the similarity between each missing trace and the sub-logs and by considering the number of traces. In [13], the position profile was introduced, a triplet that considers the occurrence of an activity at a position together with its frequency. The authors measured the distance between different segments of the event log by computing the probability distribution of observing activities at specific positions. Authors in [11] used the activity profile to cluster traces based on the combination of the Euclidean distance as a distance measure and Self-Organizing Maps (SOM).

2.3 Activity Profile
Based on the information found in the event log, we can obtain various profiles, such as the activity profile. The latter can be defined as the aggregation of vectors, each of which contains a set of measures on the events composing a trace [11]. Each measure represents the frequency of an activity (its number of occurrences) in the trace of a user. These vectors can be used to cluster the traces. In Table 2, each case is denoted by a sequence of activities, also referred to as a trace. We can find all traces by filtering the event log on case identifiers, which corresponds to CaseID in our example. For simplicity, the activity names have been transformed into single-letter labels: "a" denotes "Activity1", "b" denotes "Activity2", and "c" denotes "Activity3".

Table 2. Another Representation of the Event Log Plotted in Table 1

| CaseID | Trace     |
|--------|-----------|
| 1      | < a, b, a > |
| 2      | < a, b, c > |
| 3      | < a, a >    |
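The step from traces (Table 2) to activity-profile vectors (Table 3) is a simple frequency count per trace. A minimal sketch:

```python
from collections import Counter

# Traces from Table 2, with single-letter activity labels
traces = {1: ["a", "b", "a"], 2: ["a", "b", "c"], 3: ["a", "a"]}
activities = ["a", "b", "c"]  # the activities observed in the log

def activity_profile(traces, activities):
    """One frequency vector per case: how often each activity occurs in the trace."""
    return {case: [Counter(trace)[act] for act in activities]
            for case, trace in traces.items()}

profile = activity_profile(traces, activities)
print(profile)  # {1: [2, 1, 0], 2: [1, 1, 1], 3: [2, 0, 0]}
```

The output reproduces the three rows of Table 3.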
Table 3 shows the activity profile generated from the event log exhibited in Table 1. For example, the second row of Table 3 contains the trace vector v = < 1, 1, 1 >, which means that each of the activities {a, b, c} occurs once.
Table 3. Activity Profile from Table 1

| CaseID | a | b | c |
|--------|---|---|---|
| 1      | 2 | 1 | 0 |
| 2      | 1 | 1 | 1 |
| 3      | 2 | 0 | 0 |

2.4 Synthesis
As previously mentioned, we aim to apply trace clustering so as to enhance the quality of the discovered process models. We identified in the literature three techniques for trace clustering. In this paper, we are basically interested in the feature vector-based similarity technique (cf. Sect. 2.2). In this regard, we performed a comparative study of the works using this technique, as displayed in Table 4. The comparison rests upon the clustering algorithm used, the type of profile, the objectives of the research work, and the domain of the case study. We noticed that these works used different clustering algorithms (Incremental Clustering, Self-Organizing Map, Hierarchical) based on the activity profile [10–12] or the position profile [13] in various domains such as healthcare and manufacturing. The main objective of these works is to improve the quality of the discovered models or to complete missing traces in order to obtain a complete log. In this work, we attempt to extract the activity profile and apply trace clustering in the domain of education.
3 Clustering of Learner Traces for Process Discovery

In our previous work, we proposed an architecture that uses process mining for learning resource recommendation [6]. This architecture rests on four layers: source, client, recommendation, and process mining. The latter allows the discovery of process models based on event logs, learning results, and learners' profiles extracted from the Moodle platform. In this research work, we propose to extend the process mining layer with a trace clustering step to enhance the quality of the process models. The chief purpose of this step is to find homogeneous groups of learners (process instances). As depicted in Fig. 1, we first filtered the event log by instances to ensure that only learners' activity logs were kept in the event log. Second, we anonymized the learners: we converted the learners' names into IDs to maintain their anonymity and to respect the principles of ethics. Third, we created the activity profile, which is used to split the event log. Notably, the activity profile
Table 4. Comparison of Related Works

| Work | Clustering Algorithm   | Type of profile  | Objectives | Domain |
|------|------------------------|------------------|------------|--------|
| [10] | Incremental Clustering | Activity profile | Refine the quality of a discovered model by increasing fitness and precision and reducing complexity | Procedure of reviewing articles for publication |
| [12] | Self-Organizing Map    | Activity profile | Complete missing traces to get a complete log | Healthcare |
| [13] | Hierarchical           | Position profile | Improve the characterisation of event logs in preparation for process mining | Manufacturing |
| [11] | Self-Organizing Map    | Activity profile | Enhance process mining results | Healthcare |
(see Sect. 2.3) is an N×M matrix, where N represents the number of learners and M the number of activities. Each row in this matrix corresponds to a trace vector composed of activity frequencies. Finally, we applied the clustering algorithm to split the traces based on the activity profile. The results of trace clustering serve as the input of the process discovery algorithm.
Fig. 1. The integration of Trace clustering based on activity profile for process model discovery
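The preprocessing steps above (filtering out non-learner entries, anonymizing names into IDs, and building the N×M activity-profile matrix) can be sketched as follows; the learner names and activities are invented for illustration:

```python
from collections import Counter

# Invented raw Moodle-style events: (learner_name, activity)
raw_events = [
    ("alice", "course viewed"), ("alice", "quiz attempt viewed"),
    ("bob", "course viewed"), ("bob", "course viewed"),
    ("carol", "quiz attempt viewed"),
]

# Anonymization step: map learner names to numeric IDs
ids = {name: i for i, name in enumerate(sorted({n for n, _ in raw_events}))}
events = [(ids[name], act) for name, act in raw_events]

# Activity-profile step: N x M matrix (N learners, M activities)
activities = sorted({act for _, act in events})
counts = {lid: Counter() for lid in ids.values()}
for lid, act in events:
    counts[lid][act] += 1
matrix = [[counts[lid][act] for act in activities] for lid in sorted(counts)]
print(matrix)  # one frequency row per (anonymized) learner
```

Each row of `matrix` is a trace vector of activity frequencies, ready to be fed to a clustering algorithm.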
4 Experimental Results

In order to assess the discovered process models, we extracted the event logs of learners who studied the course "Introduction to human-machine interfaces (HMI)" created on the Moodle platform at the University of La Rochelle (France). A pedagogical scenario was established allowing learners to achieve the final goal. These event logs include 42,438 events of 100 students (i.e., 100 traces) who followed the course over one semester. Each event corresponds to an activity performed by a learner. Additionally, in order to extract homogeneous clusters of traces, we created the activity profile and then applied the clustering algorithm. In our case, the activity profile is a 100 × 28 matrix, where 100 represents the number of learners and 28 the number of activities. We calculated the number of occurrences of each activity in each trace. We performed several experiments with various clustering algorithms, namely DBSCAN [17], Agglomerative Clustering [18], the Gaussian Mixture Model (GMM)1, and k-means [19]. To discover a process model for each cluster, we applied the heuristic miner process discovery algorithm, as it performed best in our previous work [6]. Subsequently, the Fitness (F), Precision (P), and Generalization (G) of the discovered models were measured. Table 5 presents the results of the measured metrics after applying the clustering algorithms k-means, DBSCAN, Agglomerative Clustering, and the Gaussian Mixture Model, respectively. The clusters are denoted C0, C1, C2, C3, and C4. The average of each measured metric is abbreviated Avg. We fixed the number of clusters to 5 after an experimental investigation. Afterwards, we calculated the average of each metric (Fitness, Precision, and Generalization) in order to identify the best-performing clustering algorithm.
In order to assess the discovered process models, we extracted the event logs of learners who studied the course “Introduction to human-machine interfaces (HMI)” created on the Moodle platform at the University of La Rochelle (France). A pedagogical scenario has been established allowing learners to achieve the final goal. These event logs include 42,438 events of 100 students (i.e. 100 traces) that learned a course over one semester. Each event corresponds to an activity performed by a learner. Additionally, in order to extract homogeneous traces’ clusters, we created the activity profile, then we applied the clustering algorithm. In our case particularly, the activity profile is a 100 × 28 matrix, where 100 represents the number of learners and 28 represents the number of activities. We calculated the number of occurrences of each activity in each trace. Basically, we performed several experiments with various clustering algorithms such as DBSCAN [17], Agglomerative Clustering [18], Gaussian Mixture Model (GMM)1 , and k-means [19]. To discover process model for each cluster, we applied the heuristic miner process discovery algorithm as it performed better in our previous work [6]. Subsequently, the Fitness (F), Precision (P) and Generalization (G) of discovered models were measured. Notably, Table 5 portrays the results of measured metrics after applying the clustering algorithms k-means, DBSCAN, Agglomerative Clustering, and Gaussian Mixture Model respectively. The clusters are expressed in terms of C0, C1, C2, C3, and C4. The average of each measured metric is abbreviated by Avg. We fixed the number of clusters to 5 after an experimental investigation. Afterwards, we calculated the average of each metric (Fitness, Precision and Generalization) in order to identify the most performing clustering algorithm. 
The results revealed that the discovered process models based on clusters generated by the Gaussian Mixture Model is the best compared to the other algorithms. We obtained 0.9837 for the Fitness value, which implies that the model allows the behaviors present in the event log in a better way. In addition, we recorded the highest value in terms of Precision, which indicates that the rate of activities in the event logs is 0.3489 compared to the total of activities detected in the process model. Moreover, we recorded 0.7692 for the Generalization value, which confirms the ability of the model to generalize the behavior present in the event log. It is noteworthy that in order to better confirm the usefulness and feasibility of trace clustering, we enacted a comparison between the discovered process models without applying trace clustering and with applying trace clustering. The results of evaluation which are outlined in Table 6 prove that trace clustering enhances the quality of process models. In fact, the value of Precision increased 1
c “User Guide.” Gaussian mixture models. Web. 2007 - 2022. scikit-learn developers.
552
W. Hachicha et al.
Table 5. Evaluation metrics obtained on process models by applying different clustering algorithms k-means F P
G
DBSCAN F P
G
Agglomerative F P G
GMM F
P
G
C0
0.9922 0.1951 0.7718 0.9887 0.1441 0.7852 0.9892 0.2037 0.8019 0.9862
0.2393
0.8429
C1
0.9847 0.2218 0.8235 0.9841 0.2479 0.8554 0.9796 0.2960 0.7447 0.9799
0.1972
0.7597
C2
0.9652 0.1624 0.6668 0.9865 0.1909 0.6172 0.9923 0.1485 0.7963 0.9652
0.1624
0.6668
C3
0.9838 0.2884 0.7663 0.9332 1
0.1459
0.7607
C4
0.9923 0.1486 0.7938 0.9802 0.1586 0.6003 0.9909 0.1623 0.7553 0.9949
1
0.8162
0.7836 0.9652 0.1624 0.6668 0.9923
Avg 0.9836 0.2023 0.7644 0.9745 0.3483 0.7283 0.9834 0.1945 0.7530 0.9837 0.3489 0.7692
Table 6. Comparison between the discovered process models without applying trace clustering and with applying trace clustering Process models
F
P
G
without applying trace clustering (our previous work [6])
0.9903 0.1596
with applying trace clustering
0.9837 0.3489 0.7692
0.7611
from 0.1596 to 0.3489 and the value of Generalization relatively rose. However, the Fitness values slightly decreased. All obtained results revealed that trace clustering has a positive impact on the discovered process models in the educational domain.
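The Avg row of Table 5 is the arithmetic mean of the five cluster rows. For example, for the GMM columns:

```python
# GMM per-cluster metrics from Table 5 (clusters C0..C4)
gmm_fitness = [0.9862, 0.9799, 0.9652, 0.9923, 0.9949]
gmm_precision = [0.2393, 0.1972, 0.1624, 0.1459, 1.0]
gmm_generalization = [0.8429, 0.7597, 0.6668, 0.7607, 0.8162]

def avg(xs):
    return sum(xs) / len(xs)

print(round(avg(gmm_fitness), 4))  # 0.9837, the reported best average fitness
```

The precision and generalization averages come out at about 0.3490 and 0.7693, agreeing with the reported 0.3489 and 0.7692 up to rounding.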
5 Conclusion

In this paper, we presented the state of the art on process mining, trace clustering, and the activity profile. We elaborated an approach for process discovery based on trace clustering in the educational domain to enhance the quality of process models. Moreover, we created the activity profile, which is used to split the learners' traces. Then, we performed several experiments with various clustering algorithms. In order to assess the quality of the process models, we calculated the Fitness, Precision, and Generalization. The generated results corroborate the effectiveness of applying trace clustering based on the activity profile for the discovery of process models and prove that it can be used in the educational domain. Nevertheless, these results can be taken further and enhanced. In this respect, in future works, we aspire to enhance the activity profile by incorporating additional measures beyond the frequency of activities, such as frequent sub-sequences [20].

Acknowledgment. This work was financially supported by the PHC Utique program of the French Ministry of Foreign Affairs and Ministry of Higher Education and Research and the Tunisian Ministry of Higher Education and Scientific Research under CMCU project number 22G1403.
References

1. Van Der Aalst, W.: Process mining: overview and opportunities. ACM Trans. Manage. Inf. Syst. (TMIS) 3(2), 1–17 (2012)
2. Bey, A., Champagnat, R.: Analyzing student programming paths using clustering and process mining. In: Cukurova, M., Rummel, N., Gillet, D., McLaren, B.M., Uhomoibhi, J. (eds.) Proceedings of the 14th International Conference on Computer Supported Education, CSEDU 2022, Online Streaming, April 22–24, 2022, Volume 2, pp. 76–84. SCITEPRESS (2022)
3. Leblay, J., Rabah, M., Champagnat, R., Nowakowski, S.: Process-based assistance method for learner academic achievement. In: International Association for Development of the Information Society (2018)
4. Zayani, C.A., Ghorbel, L., Amous, I., Mezghanni, M., Péninou, A., Sèdes, F.: Profile reliability to improve recommendation in social-learning context. Online Inf. Rev. 44(2), 433–454 (2018)
5. Troudi, A., Ghorbel, L., Amel Zayani, C., Jamoussi, S., Amous, I.: MDER: multi-dimensional event recommendation in social media context. Comput. J. 64(3), 369–382 (2021)
6. Hachicha, W., Ghorbel, L., Champagnat, R., Zayani, C.A., Amous, I.: Using process mining for learning resource recommendation: a Moodle case study. Procedia Comput. Sci. 192, 853–862 (2021)
7. Hachicha, W., Champagnat, R., Ghorbel, L., Zayani, C.A.: Process models enhancement with trace clustering. In: Proceedings of the 30th International Conference on Computers in Education. Asia-Pacific Society for Computers in Education (2022)
8. Bose, R.J.C., Van der Aalst, W.M.: Context aware trace clustering: towards improving process mining results. In: Proceedings of the 2009 SIAM International Conference on Data Mining, pp. 401–412 (2009)
9. Di Francescomarino, C., Dumas, M., Maggi, F.M., Teinemaa, I.: Clustering-based predictive process monitoring. IEEE Trans. Serv. Comput. 12(6), 896–909 (2016)
10. Faizan, M., Zuhairi, M.F., Ismail, S.: Process discovery enhancement with trace clustering and profiling. Ann. Emerging Technol. Comput. (AETiC) 5(4), 1–13 (2021)
11. Song, M., Günther, C.W., van der Aalst, W.M.P.: Trace clustering in process mining. In: Ardagna, D., Mecella, M., Yang, J. (eds.) BPM 2008. LNBIP, vol. 17, pp. 109–120. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00328-8_11
12. Xu, J., Liu, J.: A profile clustering based event logs repairing approach for process mining. IEEE Access 7, 17872–17881 (2019)
13. Ceravolo, P., Damiani, E., Torabi, M., Barbon, S.: Toward a new generation of log pre-processing methods for process mining. In: Carmona, J., Engels, G., Kumar, A. (eds.) BPM 2017. LNBIP, vol. 297, pp. 55–70. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65015-9_4
14. Van Der Aalst, W.: Process Mining: Data Science in Action, pp. 1–477. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49851-4
15. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: On the role of fitness, precision, generalization and simplicity in process discovery. In: Meersman, R., et al. (eds.) OTM 2012. LNCS, vol. 7565, pp. 305–322. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33606-5_19
16. De Weerdt, J., Vanden Broucke, S., Vanthienen, J., Baesens, B.: Active trace clustering for improved process discovery. IEEE Trans. Knowl. Data Eng. 25(12), 2708–2720 (2013)
17. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol. 96, pp. 226–231 (1996)
18. Müllner, D.: Modern hierarchical, agglomerative clustering algorithms. arXiv preprint arXiv:1109.2378 (2011)
19. McCool, M., Robison, A.D., Reinders, J.: Chapter 11 - k-means clustering. In: McCool, M., Robison, A.D., Reinders, J. (eds.) Structured Parallel Programming, pp. 279–289. Morgan Kaufmann, Boston (2012)
20. Trabelsi, M., Suire, C., Morcos, J., Champagnat, R.: A new methodology to bring out typical users interactions in digital libraries. In: 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 11–20. IEEE (2021)
IoT Based Early Flood Detection and Avoidance System

Banu Priya Prathaban1(B), Suresh Kumar R2, and Jenath M3

1 Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sakunthala R&D Institute of Science and Technology, Chennai, India
[email protected]
2 Department of ECE, Vel Tech Multi tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, India
[email protected]
3 Department of ECE, Sri Sairam Engineering College, Chennai, India
[email protected]

Abstract. The Early Flood Detection and Avoidance system is a smart system that constantly monitors environmental indicators so that the necessary steps can be taken to reduce flood damage. Property damage and loss of life are the major issues connected with this natural disaster. In present systems, there is a lack of efficient devices to trigger a flood alert. The prevailing solutions are costly, flimsy, and wired, and therefore not appropriate for outdoor environments. In the current practice, a person has to check the water level manually, which is time consuming. Therefore, in this paper, the proposed framework has a Wi-Fi connection, so the gathered information can be accessed from any place using IoT without any problem. This IoT-based model can be monitored remotely. This work shows the possibility of providing an alert framework to overcome flood hazards. It also supports organizations such as fire and administration departments that assist the general public during natural disasters. It is critical to develop a flood control framework as a component to diminish flood risk. Giving fast feedback on the occurrence of a flood is essential for alerting occupants to take early action, such as evacuating quickly to a safer and higher place. The purpose of flood warning is to detect and forecast threatening flood events so that the public can be alerted in advance. Flood warnings are highly adaptable where protection through large-scale, hard defenses is not attractive. Sensing and GSM modules together provide better insight regarding the occurrence of a flood. Here, the warning framework monitor suggests closing the dams depending on the situation.

Keywords: Alert · avoidance · detection · flood · IoT · ubidots
1 Introduction Internet of Things (IoT) denotes to the interconnection of, internet-connected devices which might be able to acquire and switch facts over a wireless community without human intervention [1]. It is an arrangement of composed figuring gadgets, mechanical and virtual machines, contraptions, creatures or individuals which contains a specific identity and the possibility to move information inside an organization lacking © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 555–563, 2023. https://doi.org/10.1007/978-3-031-35501-1_55
556
B. P. Prathaban et al.
the essential human communication [2]. Floods are a natural phenomenon that attracts global interest and are the result of massive environmental degradation and loss of life. Floods are the result of heavy rainfall, structural failure and a large number of human characteristics [3]. Floods depend on rainfall values and prices, topology, geology, land use, and previous humidity. Floods can also occur in streams if the flow rate exceeds the channel’s capacity, particularly where it is bended or in streams. Floods regularly unleash destruction on homes and organizations when situated on common stream fields [4]. While stream flood damage could be shed from a detachment from streams and various streams, people commonly lived and worked in streams considering the way that the land would overall be level and ready and because streams gave straightforward turn of events and permission to trade and Indus [5]. Flash floods are linked to a variety of rainstorms, all of which are capable of dumping large volumes of rain over a specific area, making detection difficult. Other severe weather events occurring at the same time might also trigger a flash flood hazard [6]. Satellites, lightning observation systems, and radar are the key methods for controlling severe rainfall related with freshet. Humans have attempted, but failed, to manage and prevent the harmful impacts of flooding rivers for a long time. Artificial flood walls are being created, the riverbed is being dredged to make it deeper, and the river flow is being straightened. All of these techniques of management can be effective, however they frequently have negative repercussions for the river’s environment [7]. Two organizations, the International and National Environment Organizations, collaborate to safeguard the environment. 
Their task is challenging because rivers are increasingly prone to flash floods, which occur when rain or melting ice and snow cause a river's level to rise very quickly, frequently much faster than in previous years [8]. Major floods generate headlines across the world due to their impact on human lives, measured in terms of loss of life, property damage, and disruption of daily life. Flooding is one of the biggest such problems, and it occurs most often in densely populated areas. This may be due to climate change producing higher rainfall, which puts many cities at additional risk of flooding. Floods occur when heavy rainfall is prolonged over a small area and the water system cannot cope with the increased volume [9]. Natural disasters strike around the world, and they can completely disrupt human wellbeing and a nation's economy. The economy and development of any nation depend on agriculture, and flood-warning programmes have been exceptionally useful in protecting the lives of people and animals [10]. Flooding will remain a problem in developing countries unless a trusted management system is established well ahead of time; due to lack of awareness, resources and an appropriate approach, the problem has not been resolved there as it has in developed countries [11]. The threat this poses to developing countries emphasizes the urgent need for faster, technologically sustainable, environmentally friendly and socially accepted countermeasures, planned and implemented by the public according to their real needs and means [12]. This work builds on the implementation of a middleware called VirtualCOM, which allows an application server to exchange information with distant sensors attached to
IoT Based Early Flood Detection and Avoidance System
a GPRS data unit (GDU). While VirtualCOM is in use, a GDU acts as if it were a cable connecting the remote sensors to the application server. The database server is a web-based system that runs online applications written in PHP and Java and manages databases using MySQL. Users may examine real-time water conditions as well as water-level predictions directly through a web browser or a WAP application. The system demonstrated the use of today's sensors in monitoring real-time water parameters remotely [13]. One effect of rapid urbanization on climate change is the temperature increase caused by buildings and urban activity, which directly affects the worldwide precipitation distribution. It is evident that the distribution of monsoon precipitation is significantly influenced by various weather systems, for example the Arctic Oscillation, the Siberian High and the Western Pacific Subtropical High, as well as by the complex Asian topography (the Tibetan Plateau) [14]. Flooding is the most devastating natural disaster Malaysia has ever seen. There are 189 river basins throughout Malaysia, including Sabah and Sarawak, with the main channels flowing directly toward the South China Sea, and 85 of them are prone to flooding on a regular basis (89 of the river basins are in Peninsular Malaysia, 78 in Sabah and 22 in Sarawak). The estimated flood-prone area covers around 29,800 km2, or 9% of Malaysia's total land area, and affects around 4.82 million people, about 22% of the country's total population [15]. The present conventional flood detection system lacks an efficient device to trigger flood alerts. Moreover, the department of irrigation and drainage cannot predict when a flood will happen, whether late at night or during the day. Existing flood-detection products are expensive, fragile and wired, which is unsuitable outdoors. Therefore, the present systems are not well suited for remote monitoring.
Hence, a person has to go and check the water level manually, which is ultimately time consuming. The presented work uses a model consisting of an ultrasound sensor that conveys the water level. The proposed system is IoT based and can be monitored remotely. A long-range ultrasound sensor, able to measure the water level up to 15–20 m, is used in this system. The rest of this work is organized as follows: Sect. 2 describes the methodology and implementation of the system, Sect. 3 summarizes the results and the corresponding discussion, and Sect. 4 concludes the work.
2 Methodology

The main idea of the proposed approach is to integrate Android and IoT platforms to realize a system that is dependable and easy to access at the same time. Flood disaster prevention using a wireless sensor network is a proficient framework that monitors various natural factors to anticipate a flood, so that people can be alerted in time and flood damage minimized. Natural disasters can be devastating, causing property damage and loss of life. The framework uses several natural parameters to recognize a flood in order to eliminate or reduce its impact. Because the system includes a Wi-Fi network, the data it collects may be accessed from anywhere via IoT. The framework monitors numerous parameters, such as humidity, temperature, water level, and flow rate, to identify a flood.
The framework consists of several sensors that collect data for the distinct parameters of interest. Variations in humidity and temperature are detected by a DHT11 temperature-and-humidity sensor; this module, with its temperature-sensing and resistive humidity-sensing components, is central to flood detection in the proposed system. A reliable float sensor with open and closed circuit states records the rise and fall of the water level, and a flow sensor tracks water flowing toward a certain threshold level. An ultrasonic sensor (HC-SR04), which determines the distance between the sensor and an object via ultrasonic waves, is also present in the proposed model. Each of the sensors is connected to a NodeMCU, which processes and stores the information. The framework has a Wi-Fi feature, which is useful for accessing the system and its data over IoT. NodeMCU is a low-cost, open-source IoT platform; it provides Wi-Fi network support, takes up little space because of its reduced size, and uses little power. Long-range ultrasound is used to test the water level and can measure up to 15–20 m. These sensors do not require much electricity, are easy to install, and are relatively inexpensive. A level sensor is used to rate the level within a specified distance; level sensors come in two types, digital and analog.
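To illustrate the level computation behind the ultrasonic reading, a pulse's round-trip echo time can be converted into a distance to the water surface and then into a water level, given the sensor's mounting height above the riverbed. This is a minimal sketch; the mounting height and echo time below are illustrative assumptions, not measured values from the deployed system.

```python
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at roughly 20 degrees C

def echo_to_distance_m(echo_time_s: float) -> float:
    """Distance from sensor to water surface; the pulse travels out and back."""
    return SPEED_OF_SOUND_M_S * echo_time_s / 2.0

def water_level_m(mount_height_m: float, echo_time_s: float) -> float:
    """Water level above the riverbed, given the sensor's mounting height."""
    return mount_height_m - echo_to_distance_m(echo_time_s)
```

For example, a sensor mounted 20 m above the riverbed that measures a 0.1 s echo sees a surface 17.15 m away, i.e. a water level of about 2.85 m.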
Fig. 1. Architecture of the Proposed System
The level sensor is a device intended to monitor, store, and measure liquid levels. When the liquid level is detected, the sensor converts the sensed information into an electrical signal. With a GPS sensor, the exact location is found. GPS and IoT combine to create a complete, usable set of connected data: IoT monitors objects and hardware to give real-time information about device performance, while
GPS provides the physical link to the equipment or object. The whole system needs power and internet connectivity, which are provided by a battery and Wi-Fi respectively. IoT applications have varying requirements for range, data transfer, power efficiency and device cost. Data is uploaded to Ubidots with the help of the Ubidots library and API, after which the data is displayed. The Ubidots design stack was built to provide clients with a secure, white-glove experience: devices connected through HTTP/MQTT/TCP/UDP protocols get a simple and safe channel to transmit and retrieve data to and from the cloud service on a continuous basis. Figure 1 depicts the architecture of the IoT-based flood detection and avoidance system, where all the sensors, namely the GPS location sensor, water level sensor and long-range ultrasound sensor, together with the battery power supply, are connected to the NodeMCU microcontroller. With the help of the GPS location sensor, the exact location is displayed on the dashboard.
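A minimal sketch of the upload step over HTTP, assuming the commonly documented Ubidots v1.6 REST endpoint; the device label, token, and variable name are illustrative placeholders, not values from the deployed system. The sketch only builds the request, so it can be inspected without network access; the actual POST is shown in a comment.

```python
import json

UBIDOTS_BASE = "https://industrial.api.ubidots.com/api/v1.6/devices"

def build_ubidots_request(device_label: str, token: str, readings: dict):
    """Return (url, headers, body) for a Ubidots HTTP POST.
    `readings` maps variable labels to numeric values."""
    url = f"{UBIDOTS_BASE}/{device_label}/"
    headers = {"X-Auth-Token": token, "Content-Type": "application/json"}
    body = json.dumps(readings)
    return url, headers, body

# To actually send (requires the `requests` package and a valid token):
# import requests
# url, headers, body = build_ubidots_request("flood-node", "MY-TOKEN",
#                                            {"water-level": 2.85})
# requests.post(url, headers=headers, data=body, timeout=10)
```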
3 Results and Discussion

The deployment follows the proposed approach: two sensors are installed at two distinct heights. The first is mounted at a lower level, the typical water level at the onset of a potential flood. When the water reaches this level, the water sensor is triggered, the data is sent to the microcontroller, and a warning is sent through Ubidots to the inhabitants, advising them to be cautious and prepared. If the water keeps rising and reaches the upper sensor, the situation is now considered dangerous, and an alarm is displayed and delivered to the occupants and the responsible authorities. Figure 2 depicts the placement of sensors in the proposed system and Fig. 3 shows the hardware setup.
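The two-level alert scheme described above can be sketched as a simple threshold function; the numeric thresholds here are placeholders for illustration, not the deployed sensor heights.

```python
def flood_status(level_m: float, warning_m: float = 2.0, danger_m: float = 3.0) -> str:
    """Map a water level to the two-sensor alert scheme: below the lower
    sensor is normal, between the sensors is a warning, above the upper
    sensor is a red alert."""
    if level_m >= danger_m:
        return "RED ALERT"
    if level_m >= warning_m:
        return "WARNING"
    return "NO FLOOD"
```

In the deployed system, the same decision would run on the NodeMCU and drive which message is pushed to the Ubidots dashboard.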
Fig. 2. Implementation of Sensors in Proposed System
In Figs. 2 and 3, the power unit connected to the NodeMCU is used to switch the whole system on and off. When the power is on, the GPS sensor and DHT11 sensor gather data, which is displayed on the Ubidots dashboard, while the ultrasound sensor measures the water level. The Arduino IDE, an open-source programming environment primarily used for writing code and uploading it to the board, is used for implementation. Sublime Text, a cross-platform source code editor with an application programming interface (API), available as shareware, was also used; it natively supports a variety of programming and markup languages, and users can extend it via plugins, which are frequently built and maintained by the community under free licences. Fritzing is an open-source initiative to provide hobbyist CAD software for the design of electronics, with the goal of assisting
Fig. 3. Hardware Setup of Proposed System
Fig. 4. Data displaying on Ubidot
designers and makers who are ready to move from experimenting with a prototype to building a more permanent circuit. The real-time data displayed on Ubidots is shown in Fig. 4. When the water level is below the critical level, the Ubidots output indicates no flood, as depicted in Fig. 5. When the water level is above the critical level, the Ubidots output is a red alert indication, and
Fig. 5. No Flood Indication
Fig. 6. Red Alert Indication
it sends a message to the nearby dams to close them immediately to avoid further flood overflow, as depicted in Fig. 6.
4 Conclusion

The project succeeds in engaging users and keeping them informed of all the information needed during a disaster. The proposed system can be used by the department of irrigation and drainage, by flood analysts, and by public authorities; the department will receive data and alerts from the device and will also control the central processing of the information. The department can therefore obtain data about rising water and anticipate when a flood may occur. In addition, there is considerable room for development: for example, several years of accumulated cloud data could be fed to artificial intelligence or machine learning, and rain and flood forecasting algorithms could be developed so that authorities and users can take the necessary precautionary measures early. Disasters, as the name implies, wreak havoc on lives and property indiscriminately throughout the world. Developing countries face greater damage than developed countries and do not have the capacity to deal with the consequences of these disasters. Floods are not easy to predict, but the presented system attempts to detect floods and warn nearby people.
References

1. Udo, E., Isong, E.: Flood monitoring and detection system using wireless sensor network. ResearchGate 10(5), 767–782 (2014)
2. Mallisetty, J.B., Chandrasekhar, V.: Internet of Things based real time flood monitoring and alert management system 11, 34–39 (2012)
3. Becker, R.: A future of more extreme floods, brought to you by climate change (2017)
4. Sunkpho, J., Oottamakorn, C.: Real-time flood monitoring and warning system. Songklanakarin J. Sci. Technol. 33, 227–235 (2011)
5. Yen, Y.L., Lawal, B., Ajit, S.: Effect of climate change on seasonal monsoon in Asia and its impact on the variability of monsoon rainfall in Southeast Asia. Geosci. Front. 6(6), 817–823 (2014)
6. Department of Irrigation and Drainage Malaysia: Flood phenomenon, flood mitigations publication. Ministry of Natural Resources and Environment (2011)
7. Yan, J., Fang, Z., Zhou, Y.: Study on scheme optimization of urban flood disaster prevention and reduction. In: International Conference on Intelligent and Advanced Systems, Kuala Lumpur, 25–28 Nov, pp. 971–976 (2017)
8. Pratim, P., Mukherjee, M.: Internet of Things for disaster management: state-of-the-art and prospects (2014)
9. Choubin, B., Khalighi-Sigaroodi, S., Malekian, A., Ahmad, S., Attarod, P.: Drought forecasting in a semi-arid watershed using climate signals: a neuro-fuzzy modeling approach. J. Mt. Sci. 11(6), 1593–1605 (2014). https://doi.org/10.1007/s11629-014-3020-6
10. Choubin, B., Khalighi-Sigaroodi, S., Malekian, A.: Multiple linear regression, multi-layer perceptron network and adaptive neuro-fuzzy inference system for forecasting precipitation based on large-scale climate signals. Hydrol. Sci. J. 61, 1001–1009 (2016)
11. Dineva, A., Várkonyi-Kóczy, A.R., Tar, J.K.: Fuzzy expert system for automatic wavelet shrinkage procedure selection for noise suppression. In: Proceedings of the 2014 IEEE 18th International Conference on Intelligent Engineering Systems (INES), Tihany, Hungary, 3–5 July 2014, pp. 163–168 (2014)
12. Hashi, A.O., Hashim, S.Z.M., Anwar, T., Ahmed, A.: A robust hybrid model based on Kalman-SVM for bus arrival time prediction. In: Saeed, F., Mohammed, F., Gazem, N. (eds.) Emerging Trends in Intelligent Computing and Informatics: Data Science, Intelligent Information Systems and Smart Computing, pp. 511–519. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33582-3_48
13. Tiwari, M.K., Chatterjee, C.: Development of an accurate and reliable hourly flood forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. J. Hydrol. 394, 458–470 (2010)
14. Mosavi, A., Chau, K.-W.: Review: flood prediction using machine learning models. Water 2018, 1–41 (2018)
15. Hameed, S.S., et al.: Filter-wrapper combination and embedded feature selection for gene expression data. Int. J. Adv. Soft Comput. Appl. 10(1), 90–105 (2018)
16. Sajedi-Hosseini, F., Malekian, A., Choubin, B., Rahmati, O., Cipullo, S., Coulon, F., Pradhan, B.: A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Sci. Total Environ. 644, 954–962 (2018)
Hybrid Model to Detect Pneumothorax Using Double U-Net with Segmental Approach P. Akshaya and Sangeetha Jamal(B) Rajagiri School of Engineering and Technology, Cochin, India [email protected]
Abstract. A collapsed lung is commonly referred to as a "pneumothorax". Pneumothorax is detected using a chest X-ray image of the patient. AI identification algorithms are very helpful in the field of medicine for clearly diagnosing disease. The model designed here applies these AI methods to segment X-ray images and to predict pneumothorax areas on radiographs. The regions of pneumothorax in chest X-ray pictures are detected using a deep learning neural network model. Here we use a dual U-Net method to segment and find areas of pneumothorax and to determine whether an image contains a pneumothorax or not, producing an X-ray mask projection showing the presence or absence of pneumothorax. This Deep Convolutional Neural Network (DCNN) model detects pneumothorax with a Dice coefficient of 0.062133 after 10 epochs. This model would be useful for medical experts to identify pneumothorax in chest X-rays quickly. Keywords: Pneumothorax Detection · Artificial Intelligence · Deep Learning · Dual U-Net · Segmentation
1 Introduction

Artificial intelligence has reached almost every industry. AI can assist non-radiologists in making more dependable diagnoses and in prioritizing the interpretation of X-ray images [1]. For instance, systems designed with artificial intelligence can identify a pneumothorax in a chest radiograph. The task is to detect pneumothorax masks in chest X-rays [3]. This model is beneficial for diagnosing pneumothorax. Solving this problem demands deep learning methods suited to unstructured data, such as images and audio; in this case, we have unstructured image data in the form of various X-ray images [2]. Image segmentation is used to classify pixel regions in an image. There are two different types of segmentation: semantic segmentation and instance segmentation. • Semantic Segmentation: this technique assigns the same label (colour) to all pixels of the same class, for example separating people from the background. • Instance Segmentation: in this form, each corresponding object instance is segmented as a distinct piece [3]. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 716, pp. 564–573, 2023. https://doi.org/10.1007/978-3-031-35501-1_56
Hybrid Model to Detect Pneumothorax
Semantic segmentation uses the U-shaped network (U-Net) to segment pixel regions in an image. The U-Net model was designed specifically for biomedical image segmentation; its architecture is frequently depicted as an encoder network followed by a decoder network [6].
2 Proposed Method

2.1 System Design

The Society for Imaging Informatics in Medicine and American College of Radiology (SIIM-ACR) Pneumothorax dataset is used by the system to detect pneumothorax. This dataset contains X-ray images in the Digital Imaging and Communications in Medicine (DICOM) image format. During the data preprocessing stage, the DICOM pictures are converted into PNG (Portable Network Graphics) format. Next, the mask regions are identified in the dataset [4]. Then, a deep learning model (Double U-Net) is created to detect pneumothorax from X-rays (Fig. 1).
Fig. 1. System Design
2.2 Dataset The dataset contains DICOM images and its information as Comma-Separated Values (CSV) files, which will be for both training and testing [5]. In the CSV file holds the
P. Akshaya and S. Jamal
equivalent Run-Length Encoded (RLE) masks together with the corresponding X-ray IDs (Fig. 3). The data consists of DICOM images (Fig. 2) along with picture IDs and run-length encoded (RLE) masks as annotations. Several of the images show positive evidence of pneumothorax.
Fig. 2. DICOM images
This is expressed in the annotations as binary masks. Where pneumothorax is present, the mask holds values covering that region; otherwise the annotation is −1, an empty mask.
Fig. 3. Information about data
2.3 Data Preprocessing

The X-ray images are in .dcm format, which must be converted to .png format to train the model. In addition, appropriate masks must be created for each image [5]. Conversion from DCM to PNG: the DICOM files have to be converted to PNG format (Fig. 4).
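A sketch of the pixel rescaling such a conversion needs: raw DICOM pixel data is often 12- or 16-bit and must be mapped to the 8-bit range before saving as PNG. The normalization function below is self-contained; the pydicom/Pillow calls in the comment are the usual route for the file I/O and assume those packages are available.

```python
import numpy as np

def to_uint8(pixels: np.ndarray) -> np.ndarray:
    """Rescale a raw DICOM pixel array to 0-255 for PNG export."""
    pixels = pixels.astype(np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        # Constant image: return all zeros rather than divide by zero.
        return np.zeros(pixels.shape, dtype=np.uint8)
    return ((pixels - lo) / (hi - lo) * 255.0).round().astype(np.uint8)

# With pydicom and Pillow installed, one file could be converted as:
# import pydicom
# from PIL import Image
# ds = pydicom.dcmread("image.dcm")
# Image.fromarray(to_uint8(ds.pixel_array)).save("image.png")
```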
Fig. 4. PNG images
Mask Creation: the data contains run-length encoded (RLE) masks. In RLE, runs of identical values in a sequence are stored as a single value and a count instead of the original run [10]. Typical candidates are simple graphics such as icons, line drawings, and animations. For example, given the input "aaabccc", an RLE function should return "a3b1c3". Example: input "aaabccc"; actual RLE "a3b1c3"; estimated RLE "a3b1c3". The files are all in the .png format. Ground-truth masks must be constructed for each image in the training dataset, so the RLE pixel masks have to be converted into .png images. In Fig. 5, the white region denotes the pneumothorax area (mask), and the black region denotes the absence of pneumothorax.
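The character-level encoding from the example above can be sketched as follows. Note this is the illustrative string form only; the dataset's mask annotations apply the same idea to pixel runs, stored as numeric start/length pairs, rather than to characters.

```python
def rle_encode(s: str) -> str:
    """Encode a string as value+count pairs: 'aaabccc' -> 'a3b1c3'."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append(f"{s[i]}{j - i}")
        i = j
    return "".join(out)

def rle_decode(encoded: str) -> str:
    """Inverse of rle_encode: 'a3b1c3' -> 'aaabccc'."""
    out, i = [], 0
    while i < len(encoded):
        ch = encoded[i]
        i += 1
        num = ""
        while i < len(encoded) and encoded[i].isdigit():
            num += encoded[i]
            i += 1
        out.append(ch * int(num))
    return "".join(out)
```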
Fig. 5. Mask Creation
Since it is difficult to identify pneumothorax on the actual X-ray, its presence is made explicit by producing a mask image [12]. 2.4 Deep Learning Models Vanilla U-Net: U-Net is a convolutional network used for image segmentation. It is a fully convolutional, U-shaped network with an encoding path and a decoding path. A 512×512 image takes little time to segment on a modern GPU [8].
Fig. 6. U-Net Architecture for pneumothorax detection
The U-Net (Fig. 6) is trained with new weights. The segmentation method builds convolutional blocks for feature mapping from the input X-ray images [14]. Double U-Net: the Double U-Net model uses two U-Nets to capture the semantic segmentation. The Double U-Net encoder is a VGG19, followed by a decoder subnetwork (Fig. 7). The modified U-Net (Network 1) passes the input X-ray image through the network and generates the predicted masks (output 1). The second modified U-Net (Network 2) takes the input X-ray image and the produced masks (output 1) as inputs and produces an additional mask (output 2). The two masks (output 1 and output 2) are concatenated to generate the final mask. When radiologists are busy, they may overlook subtle and narrow pneumothoraces, but the segmentation approach performs better at detecting these [13] (Table 1).
Fig. 7. Double U-Net
Table 1. Parameter settings for the Double U-Net architecture

Layer (Type)                   Output Shape            Parameters
block1_conv1 (Conv2D)          (None, 64, 256, 256)    1792
block1_conv2 (Conv2D)          (None, 64, 256, 256)    36928
block1_pool (MaxPooling2D)     (None, 64, 128, 128)    0
block2_conv1 (Conv2D)          (None, 128, 128, 128)   73856
block2_pool (MaxPooling2D)     (None, 128, 64, 64)     0
conv2d (Conv2D)                (None, 64, 1, 1)        16448
3 Results

3.1 Mask Detection

The model recognizes the pneumothorax region on radiographs [11]. In Fig. 8 and Fig. 9 we can see the areas where the pneumothorax (mask) region exists in the input image. The parts of the X-ray containing pneumothorax are indicated by the red shade.
Fig. 8. Mask Detection for small region
Fig. 9. Mask Detection for large region
3.2 Evaluation Metric

Dice Coefficient: the Dice coefficient corresponds to the F1 score. It is twice the overlap region divided by the total number of pixels in the two images, and it measures the similarity, on a scale of 0 to 1, between the mask of the input X-ray image and the predicted output mask; here it indicates the positive correlation between the actual mask area and the predicted mask area [9].

Dice coefficient = (2 × intersection) / (total number of pixels in input and output masks)

Total number of pixels in input and output masks = |y_true| + |y_pred|

where the intersection is the overlapping area of the actual mask and the predicted mask, y_true is the actual mask and y_pred is the predicted mask.

Loss Metric: the loss function is chosen according to the dataset and the problem to be solved; Dice loss and binary cross-entropy are a good combination when training segmentation models:

Loss = 1.0 − Dice coefficient

Our model achieved a Dice coefficient of 0.062133 in 10 epochs. Analyzing the per-image Dice coefficients, the samples with the highest scores are those whose masks contain many pixels. Analysis of the available dataset shows that 25% of patients have pneumothorax, whereas 75% are healthy [7].
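The metric above can be written directly over binary mask arrays. A minimal NumPy sketch follows; the small epsilon is an assumption added here to keep the quotient well defined when both masks are empty.

```python
import numpy as np

def dice_coefficient(y_true: np.ndarray, y_pred: np.ndarray,
                     eps: float = 1e-7) -> float:
    """2*|A & B| / (|A| + |B|) over binary masks."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    return (2.0 * intersection + eps) / (y_true.sum() + y_pred.sum() + eps)

def dice_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Loss = 1.0 - Dice coefficient, as in the text."""
    return 1.0 - dice_coefficient(y_true, y_pred)
```

A perfect prediction gives a coefficient of 1.0 (loss 0.0); disjoint masks give a coefficient near 0.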
Fig. 10. Testing – Pneumothorax detection
Since there is a chance of class imbalance, we added more weight to the foreground pixels and less to the background pixels (Fig. 10). We managed to train our model for only 10 epochs on account of time constraints, so we were not able to explore further remedies for the data imbalance; the time complexity was also large. However, higher performance could be achieved by training the model for 300–1000 epochs on a more powerful GPU machine [15]. An advantage of the segmentation model is that the training data has a significantly higher number of patches than the training image count; a disadvantage is that, due to overlapping patches and the need to run the network independently for each patch, it is quite slow.
4 Conclusion

Semantic segmentation of pneumothorax X-rays can be tackled using the method described. Due to time and space complexity, the required accuracy was not yet achieved; better results can be obtained with a more powerful GPU machine. We were unable to continue training this model for more epochs because of the lack of reliable computational resources, and we expect better predictions if it is trained for more epochs.
Fig. 11. Dice coefficient graph
The graph in Fig. 11 shows the Dice coefficient values for the model we created. The approach reduces the time and effort of clinical practice and patient care and can easily be adapted to different diagnoses. Furthermore, different kinds of images, such as ultrasound, MRI, and CT scans, can be used in the model to identify diseases in future. The model can be further developed for better accuracy by adding different techniques: the classification model can be trained using images along with their associated class labels, and different models can be combined in the architecture to enhance performance. Pneumothorax can thus be identified in chest X-ray images using a variety of semantic image segmentation models, such as U-Net, DeepLab, etc. These models will be applied in additional research to improve the segmentation accuracy of pneumothoraces. The capacity of these deep learning models to automatically segment pneumothorax on CXR images will benefit the health sector by enabling early diagnosis of the illness, and it might help medical professionals make crucial treatment decisions.
References 1. Folke, T., Yang, S.C-H., Anderson, S., Shafto, P.: Explainable AI for medical imaging. In: Explaining pneumothorax diagnoses with Bayesian Teaching. Proc. SPIE 11746, 3660–3671 (2021)
2. Kim, M., Kim, J.S., Lee, C., Kang, B.-K.: Detection of pneumoperitoneum in the abdominal radiograph images using artificial neural networks. ejro.2020.100316, eCollection, 82–115 (2021)
3. Chan, Y.-H., Zeng, Y.-Z.: Effective pneumothorax detection for chest X-ray images using support vector machines. IEEE Trans. Med. Imaging 30(3), 733–746 (2020)
4. Lindsey, T., Lee, R., Grisell, R., Vega, S., Veazey, S.: Automated pneumothorax diagnosis using deep neural networks. Am. J. Emerg. Med. 35(9), 1285–1290 (2020)
5. Haritsa, V.K.T., Raju, N., Kishore Rajendra, G.K., Rao, P.: Pneumothorax detection and classification on chest radiographs using artificial intelligence. Appl. Sci. 10(11), 3777 (2021)
6. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W., Frangi, A. (eds.) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science, vol. 9351. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
7. Lou, A., Guan, S., Loew, M.: DC-UNet: pneumothorax detection in chest radiographs: optimizing artificial intelligence system for accuracy and confounding bias reduction using in-image annotations in algorithm training. J. Electron. Imaging, 99–111 (2021)
8. Tolkachev, A., Sirazitdinov, I., Kholiavchenko, M., Mustafaev, T., Ibragimov, B.: Deep learning for diagnosis and segmentation of pneumothorax: the results on the Kaggle competition and validation against radiologists. IEEE J. Biomed. Health Inform. 25, 1660–1672 (2021)
9. Angeline, R., Vani, R.: ResNet: a convolutional neural network for detecting and diagnosing coronavirus pneumonia. In: IOP Conference Series: Materials Science and Engineering, vol. 1084, p. 012011 (2021)
10. Kundu, R., Das, R., Geem, Z.W., Han, G.-T., Sarkar, R.: Pneumonia detection in chest X-ray images using an ensemble of deep learning models. IEEE Trans. Med. Imaging, 66–134 (2021)
11. Philbrick, K.A., Weston, A.D., Akkus, Z.: RIL-Contour: a medical imaging dataset annotation tool for and with deep learning. J. Digit. Imaging 32, 571–581 (2019)
12. Wang, Z., Qinami, K., Karakozis, I.C.: Towards fairness in visual recognition: effective strategies for bias mitigation. arXiv:1911.11834 [cs] (2019)
13. Wang, A., Narayanan, A., Russakovsky, O.: ViBE: a tool for measuring and mitigating bias in image datasets. arXiv:2004.07999 (2020)
14. Tolkachev, A., Sirazitdinov, I., Kholiavchenko, M., Mustafaev, T., Ibragimov, B.: Deep learning for diagnosis and segmentation of pneumothorax: the results on the Kaggle competition and validation against radiologists. IEEE J. Biomed. Health Inform. 25, 1660–1672 (2021)
15. Wang, H., Gu, H., Qin, P., Wang, J.: CheXLocNet: automatic localization of pneumothorax in chest radiographs using deep convolutional neural networks. PLoS ONE 15, e0242013 (2020)
An Approach to Identify DeepFakes Using Deep Learning Sai Siddhu Gedela, Nagamani Yanda(B) , Hymavathi Kusumanchi, Suvarna Daki, Keerthika Challa, and Pavan Gurrala GMR Institute of Technology, GMR Nagar, Rajam 532127, India [email protected]
Abstract. Deepfakes are manipulated images and videos created by performing a face swap using various artificial intelligence tools. They create the illusion that someone said something they did not say, or is someone they are not. This misuse of deepfake technology leads to severe consequences: creating political tension, faking terrorism events, and damaging the image and dignity of people are among its negative impacts. It is difficult for the naked human eye to detect the results of this technology. Thus, in this work, a deep learning-based method that can efficiently identify deepfake videos is developed. The proposed method uses a pretrained ResNext convolutional neural network to extract features from the frames sampled from the input video. The extracted features are then used to train a Long Short Term Memory (LSTM) network to determine whether the input video has been manipulated. To make the model perform better on real-time data, it is trained with a balanced dataset, the Deepfake Detection Challenge dataset. The proposed model predicts the output with good accuracy and can thus counter the threats and dangers that deepfake technology poses to society. Keywords: Deepfakes · Deep learning · Artificial Intelligence (AI) · Long Short Term Memory (LSTM) · Res-Next Convolutional Neural Network
1 Introduction
In the present world of rapidly growing social media platforms, deepfakes are considered a major threat arising from artificial intelligence. The rapid increase in the use of current technology and the fast-growing reach of various social media platforms have made creating and sharing videos easier than ever before. Techniques like GANs and autoencoders, which combine deep learning and computer vision, are used to develop very realistic fake images and videos called deepfakes. This technology efficiently learns how a certain
An Approach to Identify DeepFakes Using Deep Learning
575
face looks like in different angles and transposes the target face onto a source image or video. It is very much advanced and has a wide variety of applications in both video games and cinema industry for improving visual effects. However, there are various real-life scenarios where these face swapped deepfakes are used to create political tension, false terrorism events and blackmail common people. Creation of these deepfakes and circulating them on social media reduces people’s trust on what they see. Some real-life examples reveal a deepfake video clip of former United States president Barack Obama where he was seen insulting Donald Trump. Another instance of this technology is when two male artists and their advertising company had created a deepfake video of Facebook CEO Mark Zuckerberg in which he says things that he never said and it was uploaded on Instagram. The deepfakes creation process is crucial for identifying fake videos realistic Fig. 1 represents creation of deepfakes and sample real images and their deepfakes. It is very important to spot the difference between the deepfake and an original video. Deepfakes are created using tools like FaceApp and Face Swap, which use pre-trained neural networks like generative adversarial network or auto encoders for these deepfakes creation. Therefore, it is very clear that this Deepfake technology is a serious safety concern and the necessary to create the techniques which are able to detect and counteract it is needed.
Fig. 1. (a) Creation of deepfakes; (b) a few real images and their deepfakes
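The autoencoder-based face swap mentioned above is commonly implemented with one shared encoder and one decoder per identity; a toy sketch of that standard scheme (the paper only names the technique, so the architecture below is purely illustrative) looks like:

```python
import torch
import torch.nn as nn

# Classic autoencoder face-swap scheme: a shared encoder learns a common
# face representation; decoder_a reconstructs person A, decoder_b person B.
# After training, encoding a frame of A and decoding with decoder_b
# transposes B's identity onto A's pose and expression.
enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
dec_a = nn.Sequential(nn.Linear(256, 3 * 64 * 64), nn.Sigmoid())
dec_b = nn.Sequential(nn.Linear(256, 3 * 64 * 64), nn.Sigmoid())

face_a = torch.rand(1, 3, 64, 64)            # a frame of person A
swapped = dec_b(enc(face_a)).view(1, 3, 64, 64)  # would show B's face (after training)
```

Real deepfake tools use convolutional encoders/decoders and much larger images; the linear layers here only make the shared-encoder, two-decoder structure explicit.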
S. S. Gedela et al.

The proposed method uses a Long Short Term Memory (LSTM) network to process the video frames and a pretrained ResNext CNN to extract the frame-level features. The pretrained ResNext model extracts the various frame-level features, and all of these extracted features are then used to train a Long Short Term Memory based RNN that classifies the input video as a deepfake or a real one. To handle real-life scenarios and to let the model perform well on real-time data, the proposed method is trained with the Deepfake Detection Challenge dataset.
2 Related Works
The Expectation-Maximization algorithm serves as the foundation of one method, trained to find and extract the Convolutional Traces (CT) left on images during their formation. These Convolutional Traces have strong discriminative power and produce successful results; more work can be done on well-known real-image forensics datasets such as DRESDEN, UCID, and VISION [1]. Another author applies a facial-landmark detector to extract face images from the films, which are then fed into a multilayer perceptron. By accounting for additional factors, such as the race of the subjects, tactics could be devised to improve the model's effectiveness [2]. A large 3-dimensional convolutional neural network that learns spatio-temporal information from a series of video frames has been proposed in recent research; the suggested model can identify many forms of facial re-enactment, including those produced by tools such as Face2Face and NeuralTextures [3]. In another study, the author put forward a Deep Learning method for identifying fake images that relies on the movement of the mouth and teeth; hybrid classification, combining different classifier combinations, is intended to make the results more reliable and precise [4]. A further model addresses face-manipulated video detection from a set perspective: the author proposes the set convolutional neural network (SCNN), a novel architecture built on sets. The network merges features between video frames in addition to learning feature maps of the facially altered images, and it can use multiple input video frames simultaneously [5]. To identify deepfake videos, another author proposed an architecture that uses spatiotemporal features: an LSTM layer follows a CNN model wrapped in a time-distributed layer.
The output of the LSTM layer is then fed into a convolution layer, reframing deepfake detection as a binary classification problem [6]. Another author suggests that integrity verification tracking major alterations in eye-blinking patterns can be used to spot deepfakes: when eye movements were repeated within a brief period of time, the proposed DeepVision model verifies the irregularity based on the period, number of repetitions, and elapsed blinking time [7]. To identify both deepfakes and audio spoofs, another work combines bidirectional recurrent structures, entropy-based optimization methods, and convolutional latent models; a recurrent framework processes the data from the audio and video recordings to detect deepfakes in their spatial and temporal manifestations [8]. A further approach relies on CNNs trained to identify possible motion
discrepancies in the organization of a video sequence for deepfake detection: the video frames are first analyzed to estimate the optical flow fields, and a square box of 300 × 300 pixels encompassing the subject's face is then cropped [9]. The primary objective of another method is to determine whether genuine or false photographs are shared on WhatsApp; it extracts image content-based attributes, temporal aspects, and social-context features based on the individuals who shared the photographs [10]. A technique is proposed to detect different types of deepfake images using three common traces generated by residual noise, warping artifacts, and blur effects; a generalized detection method covers three deepfake techniques, namely face swap, puppet-master, and attribute change [11]. Two further approaches are considered: using the entire facial region as input, and selecting only specific facial regions; Long-term Recurrent Convolutional Networks (LRCN) are proposed to capture temporal dependencies in human eye blinking [12]. A facial forgery detection method based on SegNet segments images into patches and identifies the features of each patch with convolutional neural networks (CNNs); based on the identified features, the categories of the patches are determined [13]. The borders of the blending mask, subsequently classified as real or artificial, are identified by a 3-dimensional morphable model (3DMM); a combination of a face identification network and a context recognition network is involved [14]. In another work, a video is first divided into segments and selected frames of each segment are fed into two different streams; the model can be improved using better SRM filters [15]. An effective solution to the growing danger of deepfakes is to detect synthetic content in portrait videos.
The proposed approach can be extended to detect videos of various durations [16]. A two-stream method is proposed by analyzing compressed deepfake videos at the frame and temporal levels, since video compression introduces a lot of redundant information [17]. A multi-scale texture difference model, named MTD-Net, is proposed for more robust face forgery detection; the model is further trained on unknown face manipulations [18]. Another effective method detects forged faces and simultaneously locates the manipulated regions: the segmentation map on which the method depends delivers the image's high-level semantic information, and the method produces better results even on unseen face manipulation techniques [19]. To track potential texture traces in the image generation process, a multi-scale self-texture attention generative network (MSTA-Net) is proposed, where the input to the classifier is the merged image of a generated trace image and the original map [20]. Other related works include [21–30].
3 Methodology
The present method for deepfake detection is developed using a combination of convolutional neural networks and recurrent neural networks. The proposed model takes a video as input from the user and determines whether
the video is real or has been subjected to any kind of manipulation. Initially, the input video undergoes several preprocessing stages, and the output of preprocessing is a face-cropped video that is sent to a pretrained ResNext CNN model for feature extraction. A Long Short Term Memory (LSTM) network is then developed to classify the input video as real or deepfake. The present model is developed through five stages: dataset gathering, pre-processing, dataset splitting, model training, and hyperparameter tuning. All five steps are implemented sequentially, one after the other, and each is elaborated as follows:

3.1 Dataset Gathering
The input data for this study comes from an open dataset, meaning anyone can use it. The dataset used to develop the proposed method is the DFDC (Deepfake Detection Challenge) dataset, obtained from Kaggle; it was the most comprehensive dataset publicly available for training our model. The entire dataset is over 470 GB and consists of more than fifteen thousand videos. It is made available as a single huge file that is divided into 50 smaller files of 10 GB each. The present work is developed and trained using one of these smaller files, which contains a total of 802 files of type mp4, json, and csv. It contains two folders, train and test, each holding 400 videos. The train folder also contains a metadata.json listing the filename and label of each video.

3.2 Pre-processing
Pre-processing is the initial step in detecting deepfakes. It is required to ensure that the dataset is consistent and contains only meaningful data, and it reduces the workload of the steps that follow. Figure 2 shows the overall flow of the pre-processing stage. First, the input video is converted into frames. Next, the face in each frame is detected and cropped. Finally, all of the face-cropped frames are combined into a new video. This step is repeated until all videos are face-cropped, producing a new dataset with face-only data; frames of the input video that contain no face are ignored. A 10-second video at thirty frames per second contains 300 frames, and processing all of them requires substantial computational power, so for efficient classification of deepfakes only a sequential subset of frames is sent as input to the Long Short Term Memory network. The newly created face-cropped videos produced by the pre-processing stage are saved at a resolution of 112 × 112 and a frame rate of 30 frames per second.
Fig. 2. Overall flow of pre-processing stage
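The labels used downstream come from the metadata.json described in Sect. 3.1, which maps each video filename to its label. A minimal loader, assuming the standard DFDC layout in which each entry carries a "label" of "REAL" or "FAKE", might look like:

```python
import json
from collections import Counter

def load_labels(metadata_path):
    """Map each video filename to a binary label: 1 = fake, 0 = real.
    Assumes the DFDC metadata.json layout:
    {"abc.mp4": {"label": "FAKE", ...}, ...}"""
    with open(metadata_path) as f:
        meta = json.load(f)
    return {name: int(info["label"] == "FAKE") for name, info in meta.items()}

# e.g. check the real/fake balance before training:
# labels = load_labels("train/metadata.json")
# print(Counter(labels.values()))
```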
Dataset Splitting. The face-cropped videos produced by the pre-processing stage are split into train and test sets at this stage, using a typical ratio of 70% train videos to 30% test videos.

3.3 Model Training
The present method for deepfake detection combines convolutional neural networks and a recurrent neural network. Features from the face-cropped videos are extracted using a pretrained CNN model, ResNext, and these extracted features are sent as input to an LSTM network for efficient classification of deepfakes.

ResNext CNN. A pretrained CNN model is used to extract frame-level features from the input video instead of writing the feature extractor from scratch. For greater performance on deeper neural networks, an optimized residual network is used in the present work: a resnext50_32x4d model, which consists of 50 layers with 32 × 4 dimensions. The model is fine-tuned by adding extra layers, and a proper learning rate is selected so that the gradient descent of the model converges. The last pooling layer of the ResNext model produces a 2048-dimensional feature vector, which is sent as sequential input to the LSTM network.

LSTM Network. The 2048-dimensional feature vector obtained from ResNext is sent as input to the LSTM. A single LSTM layer with 2048 hidden units and a dropout probability of 0.4 is used in the proposed method. The LSTM performs temporal analysis by processing the frames sequentially: each frame at time t is compared with a frame at time t − n. A ReLU activation function is included in the model, and an average pooling layer
is also used, which gives the target output size. A sequential layer processes the frames in order, and finally a SoftMax layer is used to obtain the confidence of the prediction.

Fig. 3. Proposed Model Architecture

3.4 Hyperparameter Tuning
To achieve an efficient and accurate model, suitable hyperparameters must be chosen. The Adam optimizer is used on the model parameters to enable an adaptive learning rate; a learning rate and weight decay of 0.00001 are used to reach a better global minimum of the gradient descent. Batch training is used to make proper use of the available computational power, and the cross-entropy loss is used since the current problem is a classification problem. An overview of the entire proposed model architecture is visualized in Fig. 3.
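Putting the pieces together, the LSTM head and training configuration described above (hidden size 2048, dropout 0.4, Adam with learning rate and weight decay of 1e-5, cross-entropy loss) can be sketched in PyTorch. The ResNext feature extractor is omitted here, and details the paper does not state (exact layer ordering, batch size) are assumptions:

```python
import torch
import torch.nn as nn

class DeepfakeClassifier(nn.Module):
    """Consumes per-frame 2048-d ResNext features and outputs real/fake
    logits. Sizes follow the paper's text; layer ordering is an assumption."""
    def __init__(self, feat_dim=2048, hidden=2048, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=1, batch_first=True)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.4)
        self.pool = nn.AdaptiveAvgPool1d(1)   # average pooling over time
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, feats):                 # feats: (batch, frames, 2048)
        out, _ = self.lstm(feats)             # (batch, frames, hidden)
        out = self.pool(out.transpose(1, 2)).squeeze(-1)  # (batch, hidden)
        return self.fc(self.dropout(self.relu(out)))  # logits; softmax at inference

model = DeepfakeClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-5)
criterion = nn.CrossEntropyLoss()

# one batch training step (random tensors stand in for extracted features)
feats = torch.randn(4, 20, 2048)              # 4 videos, 20 frames each
labels = torch.randint(0, 2, (4,))
loss = criterion(model(feats), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

At inference time, `torch.softmax(model(feats), dim=1)` gives the confidence of prediction produced by the SoftMax layer described above.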
4 Results
A total of 800 videos from the Deepfake Detection Challenge dataset were used to implement the proposed model. The dataset was split into 70% train videos and 30% test videos; after splitting, the train set consists of 636 videos and the test set of 160 videos. The train videos comprise 319 real and 317 fake videos, whereas the test videos contain 79 real and 81 fake videos. While training the model, the different losses decreased over the training period, as the following graphs illustrate, indicating that the current model's metrics were accurate.
Fig. 4. GUI taking a real video and predicting it as original, and taking a fake video and predicting it as deepfake
A summary of the prediction results of the proposed model is given by a confusion matrix. It reports the numbers of correct and wrong predictions made by the model, gives an overview of the errors made by the classifier, and shows where the proposed model is confused when making predictions. Overall, the confusion matrix is used to evaluate the proposed model; the figure below demonstrates the confusion matrix obtained after implementing it. The proposed model identifies the face in the input video and classifies the video even when the lighting conditions are a bit dark, as shown in Fig. 4. Unlike earlier algorithms that could not distinguish between a Black person and a gorilla, the current model gives good classification accuracy even when an altered video of a Black person is given as input. Also, unlike various programs that identify photographs modified with Photoshop, the present model identifies whether a video is manipulated by dividing it into frames and extracting frame-level features from them.
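The confusion-matrix bookkeeping described above reduces to counting the four outcomes of binary classification; a small self-contained sketch (the labels below are illustrative, not the paper's reported results):

```python
def confusion_matrix(y_true, y_pred):
    """2x2 matrix for binary deepfake classification:
    rows = actual class (real=0, fake=1), columns = predicted class."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def accuracy(m):
    """Fraction of predictions on the matrix diagonal."""
    correct = m[0][0] + m[1][1]
    total = sum(sum(row) for row in m)
    return correct / total

# illustrative predictions only
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
m = confusion_matrix(y_true, y_pred)
print(m, accuracy(m))   # [[2, 1], [1, 2]] 0.666...
```

Off-diagonal cells correspond directly to the model's two error types: real videos flagged as fake (m[0][1]) and fakes missed as real (m[1][0]).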
5 Conclusion and Future Scope
Due to the rapid advance of technology, deepfakes are becoming a major threat to everyone from ordinary people to popular celebrities. Improper use of deepfake technology is creating panic among people and has even led to loss of life. Thus, to identify whether a given video is original or a deepfake, a deep learning model combining a CNN and an RNN is developed in the current work. Initially, the input video is preprocessed and sent to a pretrained
ResNext model for feature extraction. The output of the ResNext CNN model is sent as input to the LSTM network for efficient classification. The model is evaluated using a confusion matrix, and the results show that it efficiently distinguishes a real video from a manipulated one. This improvement can help stop fake videos from spreading across social media and other platforms and thereby prevent panic among people. The proposed model focuses only on detecting face-swapped deepfakes; it can be extended to identify full-body deepfakes, and accurate identification of deepfake videos containing multiple persons is also future work.
References

1. Guarnera, L., Giudice, O., Battiato, S.: Fighting deepfake by exposing the convolutional traces on images. IEEE Access 8, 165085–165098 (2020)
2. Kolagati, S., Priyadharshini, T., Rajam, V.M.A.: Exposing deepfakes using a deep multilayer perceptron-convolutional neural network model. Int. J. Inf. Manage. Data Insights 2(1), 100054 (2022)
3. Nguyen, X.H., Tran, T.S., Nguyen, K.D., Truong, D.T.: Learning spatio-temporal features to detect manipulated facial videos created by the deepfake techniques. Forensic Sci. Int. Digit. Invest. 36, 301108 (2021)
4. Elhassan, A., Al-Fawa'reh, M., Jafar, M.T., Ababneh, M., Jafar, S.T.: DFT-MF: enhanced deepfake detection using mouth movement and transfer learning. SoftwareX 19, 101115 (2022)
5. Xu, Z., et al.: Detecting facial manipulated videos based on set convolutional neural networks. J. Vis. Commun. Image Represent. 77, 103119 (2021)
6. Singh, A., Saimbhi, A.S., Singh, N., Mittal, M.: DeepFake video detection: a time-distributed approach. SN Comput. Sci. 1(4), 1–8 (2020)
7. Jung, T., Kim, S., Kim, K.: DeepVision: deepfakes detection using human eye blinking pattern. IEEE Access 8, 83144–83154 (2020)
8. Chintha, A., et al.: Recurrent convolutional structures for audio spoof and video deepfake detection. IEEE J. Select. Top. Sig. Process. 14(5), 1024–1037 (2020)
9. Caldelli, R., Galteri, L., Amerini, I., Del Bimbo, A.: Optical flow based CNN for detection of unlearnt deepfake manipulations. Pattern Recogn. Lett. 146, 31–37 (2021)
10. Kaur, M., Daryani, P., Varshney, M., Kaushal, R.: Detection of fake images on WhatsApp using socio-temporal features. Soc. Netw. Anal. Min. 12(1), 1–13 (2022)
11. Kang, J., Ji, S.K., Lee, S., Jang, D., Hou, J.U.: Detection enhancement for various deepfake types based on residual noise and manipulation traces. IEEE Access 10, 69031–69040 (2022)
12. Tolosana, R., Romero-Tapiador, S., Vera-Rodriguez, R., Gonzalez-Sosa, E., Fierrez, J.: DeepFakes detection across generations: analysis of facial regions, fusion, and performance evaluation. Eng. Appl. Artif. Intell. 110, 104673 (2022)
13. Yu, C.-M., Chen, K.-C., Chang, C.-T., Ti, Y.-W.: SegNet: a network for detecting deepfake facial videos. Multimedia Syst. 28, 793–814 (2022). https://doi.org/10.1007/s00530-021-00876-5
14. Nirkin, Y., Wolf, L., Keller, Y., Hassner, T.: Deepfake detection based on discrepancies between faces and their context. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021)
15. Han, B., Han, X., Zhang, H., Li, J., Cao, X.: Fighting fake news: two stream network for deepfake detection via learnable SRM. IEEE Trans. Biomet. Behavior Identity Sci. 3(3), 320–331 (2021)
16. Ciftci, U.A., Demir, I., Yin, L.: FakeCatcher: detection of synthetic portrait videos using biological signals. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020)
17. Hu, J., Liao, X., Wang, W., Qin, Z.: Detecting compressed deepfake videos in social networks using frame-temporality two-stream convolutional network. IEEE Trans. Circuits Syst. Video Technol. 32(3), 1089–1102 (2021)
18. Yang, J., Li, A., Xiao, S., Lu, W., Gao, X.: MTD-Net: learning to detect deepfakes images by multi-scale texture difference. IEEE Trans. Inf. Forensics Secur. 16, 4234–4245 (2021)
19. Kong, C., Chen, B., Li, H., Wang, S., Rocha, A., Kwong, S.: Detect and locate: exposing face manipulation by semantic- and noise-level telltales. IEEE Trans. Inf. Forensics Secur. 17, 1741–1756 (2022)
20. Yang, J., Xiao, S., Li, A., Lu, W., Gao, X., Li, Y.: MSTA-Net: forgery detection by generating manipulation trace based on multi-scale self-texture attention. IEEE Transactions on Circuits and Systems for Video Technology (2021)
21. Miao, C., Tan, Z., Chu, Q., Yu, N., Guo, G.: Hierarchical frequency-assisted interactive networks for face manipulation detection. IEEE Transactions on Information Forensics and Security (2022)
22. Zhang, L., Qiao, T., Xu, M., Zheng, N., Xie, S.: Unsupervised learning-based framework for deepfake video detection. IEEE Transactions on Multimedia (2022)
23. Pan, D., Sun, L., Wang, R., Zhang, X., Sinnott, R.O.: Deepfake detection through deep learning. In: 2020 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), pp. 134–143. IEEE (2020)
24. Srikanth, P., Behera, C.K.: A machine learning framework for Covid detection using cough sounds. In: 2022 International Conference on Engineering & MIS (ICEMIS), pp. 1–5. IEEE (2022)
25. Srikanth, P., Behera, C.K.: An empirical study and assessment of minority oversampling with dynamic ensemble selection on COVID-19 utilizing blood sample. In: 2022 International Conference on Engineering & MIS (ICEMIS), pp. 1–7. IEEE (2022)
26. Srikanth, P.: An efficient approach for clustering and classification for fraud detection using bankruptcy data in IoT environment. Int. J. Inf. Technol. 13(6), 2497–2503 (2021)
27. Srikanth, P., Rajasekhar, N.: A novel cluster analysis for gene-miRNA interactions documents using improved similarity measure. In: 2016 International Conference on Engineering & MIS (ICEMIS), pp. 1–7. IEEE (2016)
28. Srikanth, P., Deverapalli, D.: A critical study of classification algorithms using diabetes diagnosis. In: 2016 IEEE 6th International Conference on Advanced Computing (IACC), pp. 245–249. IEEE (2016)
29. Srikanth, P., Anusha, C., Devarapalli, D.: A computational intelligence technique for effective medical diagnosis using decision tree algorithm. i-Manager's J. Comput. Sci. 3(1), 21 (2015)
30. Srikanth, P.: Clustering algorithm of novel distribution function for dimensionality reduction using big data of omics: health, clinical and biology research information. In: 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1–6. IEEE (2016)