Lecture Notes in Networks and Systems 418
Ajith Abraham · Niketa Gandhi · Thomas Hanne · Tzung-Pei Hong · Tatiane Nogueira Rios · Weiping Ding Editors
Intelligent Systems Design and Applications 21st International Conference on Intelligent Systems Design and Applications (ISDA 2021) Held During December 13–15, 2021
Lecture Notes in Networks and Systems Volume 418
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas— UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
More information about this series at https://link.springer.com/bookseries/15179
Ajith Abraham · Niketa Gandhi · Thomas Hanne · Tzung-Pei Hong · Tatiane Nogueira Rios · Weiping Ding
Editors
Intelligent Systems Design and Applications 21st International Conference on Intelligent Systems Design and Applications (ISDA 2021) Held During December 13–15, 2021
Editors
Ajith Abraham, Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi, Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Thomas Hanne, Institut für Wirtschaftsinformatik, Fachhochschule Nordwestschweiz, Olten, Switzerland
Tzung-Pei Hong, Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
Tatiane Nogueira Rios, Federal University of Bahia, Ondina, Brazil
Weiping Ding, Nantong University, Nantong Shi, Jiangsu, China
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-030-96307-1 ISBN 978-3-030-96308-8 (eBook) https://doi.org/10.1007/978-3-030-96308-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Welcome to the 21st International Conference on Intelligent Systems Design and Applications (ISDA’21), held online on the World Wide Web. ISDA’21 is hosted and sponsored by the Machine Intelligence Research Labs (MIR Labs), USA. ISDA’21 brings together researchers, engineers, developers and practitioners from academia and industry working in all interdisciplinary areas of computational intelligence and system engineering to share their experience and to exchange and cross-fertilize their ideas. The aim of ISDA’21 is to serve as a forum for the dissemination of state-of-the-art research, development and implementations of intelligent systems, intelligent technologies and useful applications in these two fields. ISDA’21 received submissions from 34 countries; each paper was reviewed by at least five reviewers, and based on the outcome of the review process, 132 papers were accepted for inclusion in the conference proceedings (36% acceptance rate). First, we would like to thank all the authors for submitting their papers to the conference and for their presentations and discussions during the conference. Our thanks go to the program committee members and reviewers, who carried out the most difficult work by carefully evaluating the submitted papers. Our special thanks to the following plenary speakers for their exciting plenary talks:
• Yukio Ohsawa, The University of Tokyo, Japan
• Juergen Branke, University of Warwick, UK
• Cengiz Toklu, Beykent University, Turkey
• Günther Raidl, Technische Universität Wien, Austria
• Kalyanmoy Deb, Michigan State University, USA
• Oscar Cordon, University of Granada, Spain
• Andries Engelbrecht, University of Stellenbosch, South Africa
• Antônio de Padua Braga, Federal University of Minas Gerais, Brazil
• Frédéric Guinand, Le Havre Normandy University, France
• Marco Dorigo, Université Libre de Bruxelles, Belgium
We express our sincere thanks to the organizing committee chairs for helping us to formulate a rich technical program. Enjoy reading the articles!
Ajith Abraham
Thomas Hanne
Weiping Ding
General Chairs
Tzung-Pei Hong
Tatiane Nogueira Rios
Program Chairs
ISDA 2021—Organization
General Chairs
Ajith Abraham, Machine Intelligence Research Labs (MIR Labs), USA
Thomas Hanne, University of Applied Sciences and Arts Northwestern Switzerland, Switzerland
Weiping Ding, Nantong University, China
Program Chairs
Tzung-Pei Hong, National University of Kaohsiung, Taiwan
Tatiane Nogueira Rios, Universidade Federal da Bahia, Brazil
Publication Chairs
Niketa Gandhi, Machine Intelligence Research Labs, USA
Kun Ma, University of Jinan, China
Special Session Chair
Gabriella Casalino, University of Bari Aldo Moro, Italy
Publicity Chairs
Aswathy S. U., Jyothi Engineering College, Kerala, India
Pooja Manghirmalani Mishra, University of Mumbai, Maharashtra, India
Mahendra Kanojia, MVLU College, Maharashtra, India
Anu Bajaj, Machine Intelligence Research Labs (MIR Labs), Washington, USA
International Publicity Team
Mabrouka Salmi, National School of Statistics and Applied Economics (ENSSEA), Kolea, Tipaza, Algeria
Phoebe E. Knight, UTeM, Malaysia
Marco A. C. Simões, Bahia State University, Brazil
Hsiu-Wei Chiu, National University of Kaohsiung, Taiwan
Serena Gandhi, Media Specialist, CA, USA
International Program Committee Abdul Syukor Mohamad Jaya Alfonso Guarino Alzira Mota André Borges Guimarães Serra e Santos Angelo Genovese Anoop Sreekumar R. S. Antonella Falini Antonella Guzzo Anu Bajaj Arun B Mathews Aswathy S. U. Bay Vo Biju C. V. Blerina Spahiu Carlos Pereira Christian Veenhuis Claudio Savaglio Daniele Schicchi Devi Priya Rangasamy Dìpanwita Thakur E. M. Roopa Devi Ela Pustulka Elizabeth Goldbarg Fabio Scotti Fariba Goodarzian Federico Divina Gabriella Casalino Gautami Tripathi
Universiti Teknikal Malaysia Melaka, Malaysia University of Foggia, Italy Polytechnic Institute of Porto, School of Engineering, Portugal Polytechnic Institute of Porto, Portugal Università degli Studi di Milano, Italy Manonmaniam Sundaranar University, India University of Bari Aldo Moro, Italy University of Calabria, Italy Machine Intelligence Research Labs, USA MTHSS Pathanamthitta, India Jyothi Engineering College, India Ho Chi Minh City University of Technology (HUTECH), Vietnam Jyothi Engineering College, India Università degli Studi di Milano Bicocca, Italy ISEC, Portugal Technische Universität Berlin, Germany Institute for High-Performance Computing and Networking (ICAR-CNR), Italy Institute for Educational Technology, National Research Council of Italy, Italy Kongu Engineering College, India Banasthali University, India Kongu Engineering College, India FHNW Olten, Switzerland Federal University of Rio Grande do Norte, Brazil Università degli Studi di Milano, Italy Machine Intelligence Research Labs, USA Pablo de Olavide University, Spain University of Bari Aldo Moro, Italy Jamia Hamdard, India
Gianluca Zaza Gonglin Yuan Hudson Geovane de Medeiros Isabel S. Jesus Islame Felipe da Costa Fernandes Ivo André Soares Pereira János Botzheim João Ferreira José Everardo Bessa Maia José Raúl Romero K. Ramesh Kaushik Das Sharma Kavita Jain Kingsley Okoye Kosisochukwu Judith Madukwe Leandro Coelho M. Rubaiyat Hossain Mondal Mahendra Kanojia Manisha Satish Divate Mariella Farella Mohd Abdul Ahad Murilo Oliveira Machado Nidhi Sindhwani Niketa Gandhi Nuno Miguel Gomes Bettencourt Nurul Azma Zakaria Oscar Castillo Pasquale Ardimento Patrick Hung Paulo Moura Oliveira Pietro Picerno Pooja Manghirmalani Mishra Priya P. Sajan Rafael Barbudo Lunar Reeta Devi
University of Bari Aldo Moro, Italy Victoria University of Wellington, New Zealand Federal University of Rio Grande do Norte, Brazil Institute of Engineering of Porto, Portugal Federal University of Rio Grande do Norte (UFRN), Brazil University Fernando Pessoa, Portugal Eötvös Loránd University, Hungary Instituto Universitário de Lisboa, Portugal State University of Ceará, Brazil University of Cordoba, Spain Hindustan Institute of Technology and Science, India University of Calcutta, India University of Mumbai, Maharashtra, India Tecnologico de Monterrey, Mexico Victoria University of Wellington, New Zealand Pontifícia Universidade Católica do Parana, Brazil Bangladesh University of Engineering and Technology, Bangladesh Sheth L.U.J. and Sir M.V. College, India Usha Pravin Gandhi College of Arts, Science and Commerce, Maharashtra, India University of Palermo, Italy Jamia Hamdard, New Delhi, India Universidade Federal do Mato Grosso do Sul, Brazil Amity University, Noida, India Machine Intelligence Research Labs, USA Polytechnic Institute of Porto (ISEP/IPP), Portugal Universiti Teknikal Malaysia Melaka, Malaysia Tijuana Institute of Technology, Mexico Università degli studi di Bari Aldo Moro, Italy Ontario Tech University, Canada University of Trás-os-Montes and Alto Douro, Portugal Università Telematica “e-Campus”, Italy University of Mumbai, India C-DAC, Kerala, India University of Córdoba, Spain Kurukshetra University, India
Rohit Anand Ruggero Donida Labati Sabri Pllana Sidemar Fideles Cezario Sílvia Maria Diniz Monteiro Maia Sindhu P. M. Subodh Deolekar Sulaima Lebbe Abdul Haleem Susana Cláudia Nicola de Araújo Syariffanor Hisham Thatiana C. Navarro Diniz Thiago Soares Marques Thomas Hanne Tzung-Pei Hong Wen-Yang Lin Wenbin Pei Yibao Zhang Youssef Ghanou Zurina Saaya
DSEU, G.B. Pant Okhla-1 Campus, New Delhi, India Università degli Studi di Milano, Italy Center for Smart Computing Continuum, Forschung Burgenland, Austria Federal University of Rio Grande do Norte, Brazil Federal University of Rio Grande do Norte, Brazil Nagindas Khandwala College, India Welingkar Institute of Management Development and Research, India South Eastern University of Sri Lanka, Sri Lanka Instituto Superior de Engenharia do Porto (ISEP), Portugal Universiti Teknikal Malaysia Melaka, Malaysia Federal Rural University of the Semi-arid Region, Brazil Federal University of Rio Grande do Norte, Brazil University of Applied Sciences and Arts Northwestern Switzerland, Switzerland National University of Kaohsiung, Taiwan National University of Kaohsiung, Taiwan Lanzhou University, China Lanzhou University, China Moulay Ismail University of Meknes, Morocco Universiti Teknikal Malaysia Melaka, Malaysia
Contents
Open-Ended Automatic Programming Through Combinatorial Evolution . . . 1
Sebastian Fix, Thomas Probst, Oliver Ruggli, Thomas Hanne, and Patrik Christen
Deep Face Mask Detection: Prevention and Mitigation of COVID-19 . . . 13
Sahar Dammak, Hazar Mliki, and Emna Fendri
Extracting Emotion and Sentiment Quotient of Viral Information Over Twitter . . . 23
Pawan Kumar, Reiben Eappen Reji, and Vikram Singh
Maintaining Scalability in Blockchain . . . 34
Anova Ajay Pandey, Terrance Frederick Fernandez, Rohit Bansal, and Amit Kumar Tyagi
Thoracic Disease Chest Radiographic Image Dataset: A Comprehensive Review . . . 46
Priyanka Malhotra, Sheifali Gupta, Atef Zaguia, and Deepika Koundal
Batch Normalization and Dropout Regularization in Training Deep Neural Networks with Label Noise . . . 57
Andrzej Rusiecki
Intelligent Software Engineering: The Significance of Artificial Intelligence Techniques in Enhancing Software Development Lifecycle Processes . . . 67
Vaishnavi Kulkarni, Anurag Kolhe, and Jay Kulkarni
Honey Bee Queen Presence Detection from Audio Field Recordings Using Summarized Spectrogram and Convolutional Neural Networks . . . 83
Agnieszka Orlowska, Dominique Fourer, Jean-Paul Gavini, and Dominique Cassou-Ribehart
Formal Verification Techniques: A Comparative Analysis for Critical System Design . . . 93
Rahul Karmakar
Investigating Drug Peddling in Nigeria Using a Machine Learning Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Oluwafemi Samson Balogun, Sunday Adewale Olaleye, Mazhar Moshin, Keijo Haataja, Xiao-Zhi Gao, and Pekka Toivanen Selective Information Control and Layer-Wise Partial Collective Compression for Multi-Layered Neural Networks . . . . . . . . . . . . . . . . . 121 Ryotaro Kamimura Semantic Representation Driven by a Musculoskeletal Ontology for Bone Tumors Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 Mayssa Bensalah, Atef Boujelben, Yosr Hentati, Mouna Baklouti, and Mohamed Abid Centrifugal Pump Fault Diagnosis Using Discriminative Factor-Based Features Selection and K-Nearest Neighbors . . . . . . . . . . . . . . . . . . . . . 145 Zahoor Ahmad, Md. Junayed Hasan, and Jong-Myon Kim Transfer Learning with 2D Vibration Images for Fault Diagnosis of Bearings Under Variable Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 Zahoor Ahmad, Md Junayed Hasan, and Jong-Myon Kim Performance Evaluation of Microservices Featuring Different Implementation Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Leandro Costa and António Nestor Ribeiro Lifetime Optimization of Sensor Networks with Mobile Sink and Solar Energy Supply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 Mehdi Achour and Amine Boufaied Counting Vehicle by Axes with High-Precision in Brazilian Roads with Deep Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188 Adson M. Santos, Carmelo J. A. Bastos-Filho, and Alexandre M. A. Maciel Imbalanced Learning for Robust Moving Object Classification in Video Surveillance Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 Rania Rebai Boukhriss, Ikram Chaabane, Radhouane Guermazi, Emna Fendri, and Mohamed Hammami Mining Frequently Traveled Routes During COVID-19 . . . . . . . . . . . . . 210 George Obaido, Kehinde Aruleba, Oluwaseun Alexander Dada, and Ibomoiye Domor Mienye
Analysis of Performance Improvement for Speaker Verification by Combining Feature Vectors of LPC Spectral Envelope, MFCC and pLPC Pole Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 Haruki Shigeta, Kodai Komatsu, Shun Oyabu, Kazuya Matsuo, and Shuichi Kurogi CASTA: Clinical Assessment System for Tuberculosis Analysis . . . . . . . 231 Ramisetty Kavya, Jonathan Samuel, Gunjan Parihar, Y. Suba Joyce, Y. Bakthasingh Lazarus, Subhrakanta Panda, and Jabez Christopher Bearing Fault Classification of Induction Motor Using Statistical Features and Machine Learning Algorithms . . . . . . . . . . . . . . . . . . . . . 243 Rafia Nishat Toma and Jong-myon Kim Evidential Spammers and Group Spammers Detection . . . . . . . . . . . . . 255 Malika Ben Khalifa, Zied Elouedi, and Eric Lefèvre NLP for Product Safety Risk Assessment . . . . . . . . . . . . . . . . . . . . . . . . 266 Michael Hellwig, Steffen Finck, Thomas Mootz, Andreas Ehe, and Florian Rein Augmented Reality for Fire Evacuation Research: An A’WOT Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 El Mostafa Bourhim Optimization of Artificial Neural Network: A Bat Algorithm-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286 Tarun Kumar Gupta and Khalid Raza ResD Hybrid Model Based on Resnet18 and Densenet121 for Early Alzheimer Disease Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296 Modupe Odusami, Rytis Maskeliūnas, Robertas Damaševičius, and Sanjay Misra Quantum Ordering Points to Identify the Clustering Structure and Application to Emergency Transportation . . . . . . . . . . . . . . . . . . . . 306 Habiba Drias, Yassine Drias, Lydia Sonia Bendimerad, Naila Aziza Houacine, Djaafar Zouache, and Ilyes Khennak Patterns for Improving Business Processes: Defined Pattern Categorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 Nesrine Missaoui and Sonia Ayachi Ghannouchi SAX-Preprocessing Technique for Characters Recognition Using Gyroscope Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326 Mariem Taktak and Slim Triki Lower Limb Movement Recognition Using EMG Signals . . . . . . . . . . . 336 Sali Issa and Abdel Rohman Khaled
A Model of Compactness-Homogeneity for Territorial Design . . . . . . . . 346 María Beatriz Bernábe-Loranca, Rogelio González-Velázquez, Carlos Guillen Galván, and Erika Granillo-Martínez Automated Cattle Classification and Counting Using Hybridized Mask R-CNN and YOLOv3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 358 R. Devi Priya, V. Devisurya, N. Anitha, N. Kalaivaani, P. Keerthana, and E. Adarsh Kumar UTextNet: A UNet Based Arbitrary Shaped Scene Text Detector . . . . . 368 Veronica Naosekpam, Sushant Aggarwal, and Nilkanta Sahu VSim-AV: A Virtual Simulation Platform for Autonomous Vehicles . . . 379 Leila Haj Meftah and Rafik Braham Image Segmentation Using Matrix-Variate Lindley Distributions . . . . . 389 Zitouni Mouna and Tounsi Mariem Improving Speech Emotion Recognition System Using Spectral and Prosodic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399 Adil Chakhtouna, Sara Sekkate, and Abdellah Adib Spare Parts Sales Forecasting for Mining Equipment: Methods Analysis and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410 Egor Nikitin, Alexey Kashevnik, and Nikolay Shilov Data-Centric Approach to Hepatitis C Virus Severity Prediction . . . . . . 421 Aniket Sharma, Ashok Arora, Anuj Gupta, and Pramod Kumar Singh Automatic Crack Detection with Calculus of Variations . . . . . . . . . . . . 432 Erika Pellegrino and Tania Stathaki Deep Squeeze and Excitation-Densely Connected Convolutional Network with cGAN for Alzheimer’s Disease Early Detection . . . . . . . . 441 Rahma Kadri, Mohamed Tmar, Bassem Bouaziz, and Faiez Gargouri Recognition of Person Using ECG Signals Based on Single Heartbeat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452 Sihem Hamza and Yassine Ben Ayed Semantic Segmentation of Dog’s Femur and Acetabulum Bones with Deep Transfer Learning in X-Ray Images . . . . . . . . . . . . . . . . . . . . . . . 461 D. E. Moreira da Silva, Vitor Filipe, Pedro Franco-Gonçalo, Bruno Colaço, Sofia Alves-Pimenta, Mário Ginja, and Lio Gonçalves Automatic Microservices Identification from Association Rules of Business Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476 Malak Saidi, Mohamed Daoud, Anis Tissaoui, Abdelouahed Sabri, Djamal Benslimane, and Sami Faiz
Toward a Configurable Thing Composition Language for the SIoT . . . 488 Soura Boulaares, Salma Sassi, Djamal Benslimane, Zakaria Maamar, and Sami Faiz Comparison of Different Processing Methods of Joint Coordinates Features for Gesture Recognition with a RNN in the MSRC-12 . . . . . . 498 Júlia Schubert Peixoto, Anselmo Rafael Cukla, Daniel Welfer, and Daniel Fernando Tello Gamarra Summary Generation Using Natural Language Processing Techniques and Cosine Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508 Sayantan Pal, Maiga Chang, and Maria Fernandez Iriarte An Approach for Constructing a Simulation Model for Dynamic Analysis of Information Security System . . . . . . . . . . . . . . . . . . . . . . . . 518 Ivan Gaidarski and Pavlin Kutinchev Enhanced Prediction of Chronic Kidney Disease Using Feature Selection and Boosted Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527 Ibomoiye Domor Mienye, George Obaido, Kehinde Aruleba, and Oluwaseun Alexander Dada An Adaptive-Backstepping Digital Twin-Based Approach for Bearing Crack Size Identification Using Acoustic Emission Signals . . . . . . . . . . . 538 Farzin Piltan and Jong-Myon Kim Implementation-Oriented Feature Selection in UNSW-NB15 Intrusion Detection Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548 Mohammed M. Alani Augmented Reality SDK’s: A Comparative Study . . . . . . . . . . . . . . . . . 559 El Mostafa Bourhim and Aziz Akhiate Hybrid Neural Network for Hyperspectral Satellite Image Classification (HNN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 Maissa Hamouda and Med Salim Bouhlel Implementation of the Business Process Model and Notation in the Modelling of Patient’s Clinical Workflow in Oncology . . . . . . . . . . . . . . 576 Nassim Bout, Rachid Khazaz, Ali Azougaghe, Mohamed El-Hfid, Mounia Abik, and Hicham Belhadaoui Mobile Cloud Computing: Issues, Applications and Scope in COVID-19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587 Hariket Sukesh Kumar Sheth and Amit Kumar Tyagi
Designing a Humanitarian Supply Chain for Pre and Post Disaster Planning with Transshipment and Considering Perishability of Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601 Faeze Haghgoo, Ali Navaei, Amir Aghsami, Fariborz Jolai, and Ajith Abraham Innovative Learning Technologies as Support to Clinical Reasoning in Medical Sciences: The Case of the “FEDERICO II” University . . . . . 613 Oscar Tamburis, Fabrizio L. Ricci, Fabrizio Consorti, Fabrizio Pecoraro, and Daniela Luzi Convolutional Neural Networks (CNN) Model for Mobile Brand Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624 Hamidah Jantan and Puteri Ika Shazereen Ibrahim How Knowledge-Driven Class Generalization Affects Classical Machine Learning Algorithms for Mono-label Supervised Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637 Houcemeddine Turki, Mohamed Ali Hadj Taieb, and Mohamed Ben Aouicha Deep Residual Network for Autonomous Vehicles Obstacle Avoidance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647 Leila Haj Meftah and Rafik Braham Modeling Travelers Behavior Using FSQCA . . . . . . . . . . . . . . . . . . . . . 657 Oumayma Labti and Ez-zohra Belkadi AHP Approach for Selecting Adequate Big Data Analytics Platform . . . 667 Naima EL Haoud and Oumaima Hali Combining Bert Representation and POS Tagger for Arabic Word Sense Disambiguation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676 Rakia Saidi and Fethi Jarray Detection of Lung Cancer from CT Images Using Image Processing . . . 686 S. Lilly Sheeba and L. Gethsia Judin An Overview of IoT-Based Architecture Model for Smart Home Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696 Odamboy Djumanazarov, Antti Väänänen, Keijo Haataja, and Pekka Toivanen Real Time Tracking of Traffic Signs for Autonomous Driving Using Monocular Camera Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707 Sneha Hegde, K and Srividhya Kannan
Metaheuristic Methods for Water Distribution Network Considering Routing Decision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723 Ahmad Hakimi, Reza Mahdizadeh, Hossein Shokri Garjan, Amir Khiabani, and Ajith Abraham Prediction of Moroccan Stock Price Based on Machine Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735 Abdelhadi Ifleh and Mounime El Kabbouri R-DCNN Based Automatic Recognition of Indian Sign Language . . . . . 742 S. Subhashini, S. Revathi, and S. Shanthini VReason Grasp: An Ordered Grasp Based on Physical Intuition in Stacking Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754 Xiang Ji, Qiushu Chen, Tao Xiong, Tianyu Xiong, and Huiliang Shang Comparative Evaluation of Genetic Operators in Cartesian Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765 Abdul Manazir and Khalid Raza Prediction of Credibility of Football Player Rating Using Data Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775 Manaswita Datta and Bhawana Rudra DDoS Attack Detection on IoT Devices Using Machine Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787 Sunil Kumar, Rohit Kumar Sahu, and Bhawana Rudra Functionality and Architecture for a Platform for Independent Learners: KEPLAIR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795 Stefano Ferilli, Domenico Redavid, Davide Di Pierro, and Liza Loop Aircraft Conflict Resolution Using Convolutional Neural Network on Trajectory Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 806 Md Siddiqur Rahman, Laurent Lapasset, and Josiane Mothe Evaluation of Techniques for Predicting a Build Up of a Seizure . . . . . 816 Abir Hadriche, Ichrak ElBehy, Amira Hajjej, and Nawel Jmail A Real-Time Stereoscopic Images Rectification and Matching Algorithm Based on Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828 Elmehdi Adil, Mohammed Mikou, and Ahmed Mouhsen Named Entities as a Metadata Resource for Indexing and Searching Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838 Flavio Izo, Elias Oliveira, and Claudine Badue Brazilian Mercosur License Plate Detection and Recognition Using Haar Cascade and Tesseract OCR on Synthetic Imagery . . . . . . . . . . . 849 Cyro M. G. Sabóia and Pedro Pedrosa Rebouças Filho
Designing Scalable Intrusion Detection Systems with Stacking Based Ensemble Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859 A. Sujan Reddy, S. Akashdeep, S. Sowmya Kamath, and Bhawana Rudra Retrofitting Stormwater Harvest System in Dispersing Reliable Water Supply in a Climate-Smart City . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 870 Bwija Mukome, Muhammed Seyam, and Oseni Amoo Predicting and Analysis the Bitcoin Price Using Various Forecasting Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 879 E. M. Roopa Devi, R. Shanthakumari, R. Rajdevi, S. Dineshkumar, A. Dinesh, and M. Keerthana Improved Sentence Similarity Measurement in the Medical Field Based on Syntactico-Semantic Knowledge . . . . . . . . . . . . . . . . . . . . . . . 890 Wafa Wali and Bilel Gargouri Analysis of the Brazilian Artisanal Cheese Market from the Perspective of Social Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 900 Thallys da Silva Nogueira, Vitor Agostinho Mouro, Kennya Beatriz Siqueira, and Priscila V. Z. C. Goliatt PONY: Predicting an Object’s Next_Location Using YOLO . . . . . . . . . 910 Aris E. Ignacio and Julian Antonio S. Laspoña Role of Machine Learning in Authorship Attribution with Select Stylometric Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 920 Sumit Gupta, Tapas Kumar Patra, and Chitrita Chaudhuri COVID Detection Using Chest X-Ray and Transfer Learning . . . . . . . . 933 Saksham Jain, Nidhi Sindhwani, Rohit Anand, and Ramani Kannan ECFAR: A Rule-Based Collaborative Filtering System Dealing with Evidential Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 944 Nassim Bahri, Mohamed Anis Bach Tobji, and Boutheina Ben Yaghlane Enhancing Photography Management Through Automatically Extracted Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 956 Pedro Carvalho, Diogo Freitas, Tiago Machado, and Paula Viana A Machine Learning Framework for House Price Estimation . . . . . . . . 965 Adebayosoye Awonaike, Seyed Ali Ghorashi, and Rawad Hammaad A Dedicated Temporal Erasable-Itemset Mining Algorithm . . . . . . . . . . 977 Tzung-Pei Hong, Hao Chang, Shu-Min Li, and Yu-Chuan Tsai Denoising Hyperspectral Imageries with Split-Bregman Iteration Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 986 Satwinder Kaur, Bhawna Goyal, and Ayush Dogra
iWAD: An Improved Wormhole Attack Detection System for Wireless Sensor Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1002 Virendra Dani, Radha Bhonde, and Ayesha Mandloi Twitter People’s Opinions Analysis During Covid-19 Quarantine Using Machine Learning and Deep Learning Models . . . . . . . . . . . . . . 1013 Wafa Alotaibi, Faye Alomary, and Raouia Mokni Estimation and Aggregation Method of Open Data Sources for Road Accident Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1025 Sergey Savosin and Nikolay Teslya A Hybrid Approach for an Interpretable and Explainable Intrusion Detection System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1035 Tiago Dias, Nuno Oliveira, Norberto Sousa, Isabel Praça, and Orlando Sousa An IoT Based Home Automation System VIA Hotspot . . . . . . . . . . . . . 1046 Raihan Uddin Genomic Variant Annotation: A Comprehensive Review of Tools and Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057 Prajna Hebbar and S. Kamath Sowmya Age Estimation and Gender Recognition Using Biometric Modality . . . 1068 Amal Abbes, Randa Boukhris, and Yassine Ben Ayed Towards a Historical Ontology for Arabic Language: Investigation and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1078 Rim Laatar, Ahlem Rhayem, Chafik Aloulou, and Lamia Hadrich Belguith Optimized Evidential AIRS with Feature Selection and Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1088 Rihab Abdelkhalek and Zied Elouedi Predicting the Movement Intention and Controlling the Grip of a Myoelectrical Active Prosthetic Arm . . . . . . . . . . . . . . . . . . . . . . . . 1098 Jonatan Dellagostin, Anselmo Cukla, Fábio Bisogno, Raul Sales, Lucas Strapazzon, and Gregório Salvador An Evolutionary Approach for Critical Node Detection in Hypergraphs. A Case Study of an Inflation Economic Network . . . . 1110 Noémi Gaskó, Mihai Suciu, Rodica Ioana Lung, and Tamás Képes A Modified Technique Based on GOMASHIO Method for Mobile Nodes Localization in a WSN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1118 Omnia Mezghani Attitude Prediction of In-service Teachers Towards Blended Learning Using Machine Learning During COVID-19 Pandemic . . . . . . . . . . . . . 1129 Pooja Manghirmalani Mishra, Rabiya Saboowala, and Niketa Gandhi
Driver Behavior Analysis: Abnormal Driving Detection Using MLP Classifier Applied to Outdoor Camera Images . . . . . . . . . . . . . . . . . . . . 1142 Wictor Gomes de Oliveira, Pedro Pedrosa Rebouças Filho, and Elias Teodoro da Silva Junior Supporting Reusability in the Scrum Process . . . . . . . . . . . . . . . . . . . . . 1153 Oumaima Bhiri, Khaoula Sayeb, and Sonia Ayachi Ghannouchi Arabic Automatic Essay Scoring Systems: An Overview Study . . . . . . . 1164 Rim Aroua Machhout, Chiraz Ben Othmane Zribi, and Saoussen Mathlouthi Bouzid Energy-Efficient Khalimsky-Based Routing Approach for K-Hop Clustered Wireless Multimedia Sensor Networks (WMSNs) . . . . . . . . . . 1177 Mahmoud Mezghani Academic Venue Recommendation Based on Refined Cross Domain . . . 1188 Abir Zawali and Imen Boukhris Typology of Data Inputs Imperfection in Collective Memory Model . . . 1198 Haithem Kharfia, Fatma Ghorbel, and Bilel Gargouri How Latest Computer Science Research Copes with COVID-19? . . . . . 1207 Leila Bayoudhi, Najla Sassi, and Wassim Jaziri Using Machine Learning Approaches to Identify Exercise Activities from a Triple-Synchronous Biomedical Sensor . . . . . . . . . . . . . . . . . . . . 1216 Yohan Mahajan, Jahnavi Pinnamraju, John L. Burns, Judy W. Gichoya, and Saptarshi Purkayastha Intelligent Image Captioning Approach with Novel Ensembled Recurrent Neural Network Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1227 L. Agilandeeswari, Kuhoo Sharma, and Saurabh Srivastava Analysis of Six Different GP-Tree Neighborhood Structures . . . . . . . . . 1237 Souhir Elleuch and Bassem Jarboui Ensemble Learning for Data-Driven Diagnosis of Polycystic Ovary Syndrome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1250 Subrato Bharati, Prajoy Podder, M. Rubaiyat Hossain Mondal, V. B. Surya Prasath, and Niketa Gandhi Tree Species Detection Using MobileNet – An Approach . . . . . . . . . . . . 1260 L. Agilandeeswari, Aayush Jha, and Dhruv Gupta Dimensional Reduction Methods Comparison for Clustering Results of Indonesian Language Text Documents . . . . . . . . . . . . . . . . . . . . . . . . 1271 Siti Inayah Rizki Hasanah, Muhammad Ihsan Jambak, and Danny M. Saputra
Gun Model Classification Based on Fired Cartridge Case Head Images with Siamese Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1281 Sérgio Valentim, Tiago Fonseca, João Ferreira, Tomás Brandão, Ricardo Ribeiro, and Stefan Nae Image-based Android Malware Detection Models using Static and Dynamic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1292 Hemant Rathore, B. Raja Narasimhan, Sanjay K. Sahay, and Mohit Sewak A Fuzzy Logic Based Optimal Network System for the Delivery of Medical Goods via Drones and Land Transport in Remote Areas . . . . . 1306 Shio Gai Quek, Ganeshsree Selvachandran, Rohana Sham, Ching Sin Siau, Mohd Hanif Mohd Ramli, and Noorsiah Ahmad The Menu Planning Problem: A Systematic Literature Review . . . . . . . 1313 Dorra Kallel, Ines Kanoun, and Diala Dhouib Comparative Study on Deep Learning Methods for Apple Ripeness Estimation on Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1325 Raja Hamza and Mohamed Chtourou Neuro-Fuzzy Systems for Learning Analytics . . . . . . . . . . . . . . . . . . . . . 1341 Gabriella Casalino, Giovanna Castellano, and Gianluca Zaza Prediction of COVID-19 Active Cases Using Polynomial Regression and ARIMA Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1351 Neji Neily, Boulbaba Ben Ammar, and Habib M. Kammoun Land Use/Land Cover Classification Using Machine Learning and Deep Learning Algorithms for EuroSAT Dataset – A Review . . . . . . . . 1363 Agilandeeswari Loganathan, Suri Koushmitha, and Yerru Nanda Krishna Arun A Dynamic Rain Detecting Car Wiper . . . . . . . . . . . . . . . . . . . . . . . . . . 1375 Andebotum Roland, John Wejin, Sanjay Misra, Mayank Mohan Sharma, Robertas Damaševičius, and Rytis Maskeliūnas Crude Oil Price Prediction Using Particle Swarm Optimization and Classification Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1384 Emmanuel Abidemi Adeniyi, Babatunde Gbadamosi, Joseph Bamidele Awotunde, Sanjay Misra, Mayank Mohan Sharma, and Jonathan Oluranti An Efficient Thyroid Disease Detection Using Voting Based Ensemble Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1395 L. Agilandeeswari, Ishita Khatri, Jagruta Advani, and Syed Mohammad Nihal
A Cross-Entropy Based Feature Selection Method for Binary Valued Data Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1406 Zhipeng Wang and Qiuming Zhu Effective Music Suggestion Using Facial Recognition . . . . . . . . . . . . . . . 1417 A. P. Ponselvakumar, S. Anandamurugan, K. Lokeshwaran, Suganneshan, Zubair, and Gokula Kannan A Survey on SLA Management Using Blockchain Based Smart Contracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1425 Nawel Hamdi, Chiraz El Hog, Raoudha Ben Djemaa, and Layth Sliman Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1435
Open-Ended Automatic Programming Through Combinatorial Evolution
Sebastian Fix, Thomas Probst, Oliver Ruggli, Thomas Hanne, and Patrik Christen
FHNW University of Applied Sciences and Arts Northwestern Switzerland, 4600 Olten, Switzerland
[email protected]
Abstract. Combinatorial evolution – the creation of new things through the combination of existing things – can be a powerful way to evolve rather than design technical objects such as electronic circuits. Intriguingly, this seems to be an ongoing and thus open-ended process creating novelty with increasing complexity. Here, we employ combinatorial evolution in software development. While current approaches such as genetic programming are efficient in solving particular problems, they all converge towards a solution and do not create anything new afterwards. The combinatorial evolution of complex systems such as languages and technology is considered open-ended. Therefore, open-ended automatic programming might be possible through combinatorial evolution. We implemented a computer program simulating combinatorial evolution of code blocks stored in a database to make them available for combining. Automatic programming in the sense of algorithm-based code generation is achieved by evaluating regular expressions. We found that reserved keywords of a programming language are suitable for defining the basic code blocks at the beginning of the simulation. We also found that placeholders can be used to combine code blocks and that code complexity can be described in terms of the importance to the programming language. As in a previous combinatorial evolution simulation of electronic circuits, complexity increased from simple keywords and special characters to more complex variable declarations, class definitions, methods, and classes containing methods and variable declarations. Combinatorial evolution, therefore, seems to be a promising approach for open-ended automatic programming.
Keywords: Automatic programming · Combinatorial evolution · Open-endedness
· Combinatorial evolution ·
Introduction
Genetic algorithms and evolutionary computation in general are widely used for solving optimisation problems [6]. Such algorithms follow the paradigm of biological evolution. They consist of a collection of virtual organisms, where every organism represents a possible solution to a given problem. Some fitness c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 1–12, 2022. https://doi.org/10.1007/978-3-030-96308-8_1
measure is then calculated for each organism in an iterative process, and improved solutions are sought by applying random mutations and crossovers to them. In contrast to such evolutionary computation, combinatorial evolution, as proposed by W. Brian Arthur [1,2], makes no modifications to the organisms themselves. New solutions are formed through the combination of existing components, which then form new solutions in later iterations with the goal of satisfying certain needs. The more useful a combination is, the higher is its need rating. Combining existing components to construct new components can be observed in the evolution of technology [1,2]. For instance, the invention of radar was only possible through combining simpler electronic parts fulfilling functions like amplification and wave generation [3]. In order to investigate combinatorial evolution, Arthur and Polak [3] created a simple computer simulation, where electronic circuits were evolved in a combinatorial manner. Their simulation started by randomly combining primitive elementary logic gates and then used these simpler combinations for more complicated combinations in later iterations. Over time, a small number of simple building blocks was transformed into many complicated ones, where some of them might be useful for future applications. It was concluded that combinatorial evolution allows building some kind of library of building blocks for the creation of future and more complicated building blocks. Intriguingly, combinatorial evolution is a key ingredient to achieve open-ended evolution [27,29], that is, the ongoing creation of novelty [4,26]. This contrasts classical computational approaches where the aim is to converge towards a solution as fast as possible. Computational approaches according to open-ended evolution are therefore not more efficient, but they are more creative since they generate ongoing novelty. Here we want to explore whether combinatorial evolution could also be applied to software development, more specifically to automatic programming, to eventually make it open-ended [7]. An early idea of automatic programming was to implement high-level programming languages that are more human-readable, resulting in compilers, which produce low-level programs – down to machine code – from human-readable syntax [8]. However, human input in some form was still needed, and the programming task was simply transferred to a higher level. Furthermore, the software solution is limited by the programmer's capabilities and creativity. Language therefore remains a barrier between programmers and computers. A way around this barrier would be to let the computer do the programming (also occasionally denoted as metaprogramming [10]), which might even lead to better programs. Koza [18] addressed this issue through genetic programming, where populations of computer programs are generated by a computer using genetic algorithms. The problem space consists of programs that try to solve (or approximately solve) problems. It has been demonstrated that random mutations and crossovers in source code can effectively contribute to creating new sophisticated programs [24]. Therefore, it seems possible to define a programming task and let a computer do the programming. However, looking at the process of software development, programming seems more comparable to technological than to biological evolution. Existing libraries or algorithms are often integrated into new software
without the necessity of modifying them. Therefore, an automatic programming approach that creates new computer programs by means of combinatorial evolution might be an interesting alternative to genetic programming. Also, due to open-endedness, combinatorial evolution holds the promise to be more creative generating ongoing novelty. In the present study we investigate ways to define a programming task for automatic programming through combinatorial evolution including the evaluation of the generated code with a need rating. Our research question is whether it is possible to generate computer programs of increasing complexity using automatic programming through combinatorial evolution. Specifically, we ask what kind of basic code blocks are needed at the beginning? How are these code blocks implemented to allow them to combine? How can code complexity be measured?
2 Automatic Programming
Since the development of computers, it has been a challenge to optimise and adapt program code to access the potential performance of a computer. While the computational power of computers has been steadily increasing in recent years, program code is still limited by the ability of programmers to create efficient and functioning code. Programming languages have also evolved over the past decades. The development of programming languages has sought to provide programmers with abstractions at higher levels. However, this also led to limitations, especially regarding performance and creativity. It is thus intriguing to shift the programming to the computer itself. Most of the programming is currently done by human programmers, which often leads to a time-intensive and error-prone process of software development. The idea that computers automatically create software programs has been a long-standing goal [5] with the potential to streamline and improve software development. Automatic programming was first considered in the 1940s describing the automation of a manual process in general and with the goal to maximise efficiency [22]. Later, automatic programming was considered a type of computer programming in which code is generated using tools that allow developers to write code at a higher level of abstraction [22]. There are two main types of automatic programming: application generators and generative programming. Cleaveland [9] describes the development of application generators as the use of high-level programming models or templates to translate certain components into low-level source code. Generative programming, on the other hand, assists developers in writing programs. This can be achieved, e.g. by providing standard libraries as a form of reusable code [10]. In generative programming it is crucial to have a domain model, which consists of three main parts: a problem space, a solution space, and a configuration knowledge mapping that connects them [11]. The problem space includes the features and concepts used by application engineers to express their needs. These can be textual or graphical programming languages, interactive wizards, or graphical user interfaces. The solution space consists of elementary components with a maximum of combinability and
a minimum of redundancy. The configuration knowledge mapping presents a form of generator that translates the objects from the problem space to build components in the solution space [10]. Most recently, automatic programming has shifted towards higher-level programming languages incorporating even more abstraction [21]. While these kinds of automatic programming heavily depend on human interaction and thus the capabilities and creativity of programmers, genetic programming can be regarded as an attempt to reduce this dependency and shift the focus to automation done by the computer itself. Koza [18] describes genetic programming as a type of programming in which programs are regarded as genes that can be evolved using genetic algorithms [15,16]. It aims to improve the performance of a program to perform a predefined task. According to Becker et al. [5], a genetic algorithm takes, as an input, a set of instructions or actions that are regarded as genes. A random set of these instructions is then selected to form an initial sequence of DNA. The whole genome is then executed as a program and the results are scored in terms of how well the program solves a predefined task. Afterwards, the top scorers are used to create offspring, which are rated again until the desired program is produced. To find new solutions, evolutionary techniques such as crossover, mutation, and replication are used [23]. Crossover children are created by picking two parents and switching certain components. Another technique is mutation, which uses only one individual parent and randomly modifies its parts to create a new child. Sometimes parents with great fitness will be transferred to the next iteration without any mutation or crossover because they might do well in later steps as well.
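To make the contrast with combinatorial evolution explicit, this generate-evaluate-reproduce cycle can be condensed into a short Java sketch. It is only an illustration of the loop described above; the string representation of programs, the fitness function, and the one-point crossover are simplifying assumptions and not part of the cited works.

import java.util.*;
import java.util.stream.*;

// Illustrative sketch of a genetic programming loop (hypothetical representation).
class GeneticProgrammingSketch {
    static List<String> evolve(List<String> population, int generations,
                               java.util.function.ToDoubleFunction<String> fitness, Random rng) {
        for (int g = 0; g < generations; g++) {
            // Score every candidate program and keep the fitter half as parents (replication).
            List<String> parents = population.stream()
                    .sorted(Comparator.comparingDouble(fitness).reversed())
                    .limit(Math.max(2, population.size() / 2))
                    .collect(Collectors.toList());
            List<String> offspring = new ArrayList<>(parents);
            // Refill the population by modifying parents (crossover; mutation omitted for brevity).
            while (offspring.size() < population.size()) {
                String a = parents.get(rng.nextInt(parents.size()));
                String b = parents.get(rng.nextInt(parents.size()));
                offspring.add(crossover(a, b, rng));
            }
            population = offspring;
        }
        return population;
    }

    // Naive one-point crossover on the textual program representation.
    static String crossover(String a, String b, Random rng) {
        int cut = rng.nextInt(Math.max(1, Math.min(a.length(), b.length())));
        return a.substring(0, cut) + b.substring(cut);
    }
}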
3 Combinatorial Evolution
With combinatorial evolution, new solutions build on combinations of previously discovered solutions. Every evolution starts with some primitive, existing building blocks and uses them to build combinations. Those combinations are then stored in an active repertoire. If the output satisfies a need better than an earlier solution, it replaces the old one and will be used as the building block in later iterations. Building blocks are thus not modified; they are combined to create new building blocks. The result is a library of functionalities that may be useful for a solution in the future [1,2]. As Ogburn [20] suggested, the more equipment there is within a material culture, the greater the number of inventions. This is known as Ogburn's Claim. It can therefore be inferred that the number and diversity of developed components as well as their technological development matter, because next-generation components build upon the technological level of the previous, existing components. To investigate this, Arthur and Polak [3] created a simple computer simulation to ‘discover’ new electronic circuits. In their simulation, they used a predefined list of truth tables of basic logic functions such as full adders or n-bit adders. Every randomly created combination represented a potential satisfaction of a need, which was then tested against this list. If the truth table
of a newly created circuit matched one from the predefined list, it is added to the active repertoire as it fulfils the pre-specified functionality. Sometimes, it also replaces one that was found earlier if it uses fewer parts and therefore would cost less. New technologies in the real world are not usually found by randomly combining existing ones, nor do they exist in a pre-specified list to be compared against. Nevertheless, their needs are generally clearly visible in economics and current technologies [3]. Combinatorial evolution is in general an important element of evolutionary systems. Stefan Thurner and his colleagues developed a general model of evolutionary dynamics in which the combination of existing entities to create new entities plays a central role [27–29]. They were able to validate this model using world trade data [17], therefore underlining the importance of evolutionary dynamics in economic modelling in general and combinatorial interactions in particular. The model shows punctuated equilibria that are typical for open-ended evolutionary systems [27–29].
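The core loop of the Arthur and Polak experiment described above can be condensed into a short sketch: existing building blocks are picked at random, combined without being modified, tested against a list of needed behaviours (truth tables), and added to the repertoire if they satisfy a need, possibly replacing a more expensive earlier solution. The Block interface and the combine function below are hypothetical illustrations, not the original simulation code.

import java.util.*;
import java.util.function.BiFunction;

// Minimal sketch of combinatorial evolution against a predefined list of needs.
class CombinatorialEvolutionSketch {
    interface Block { String behaviour(); int parts(); } // e.g. a circuit's truth table and its part count

    static void evolve(List<Block> repertoire, Map<String, Block> satisfied, Set<String> needs,
                       int iterations, Random rng, BiFunction<Block, Block, Block> combine) {
        for (int i = 0; i < iterations; i++) {
            Block a = repertoire.get(rng.nextInt(repertoire.size()));
            Block b = repertoire.get(rng.nextInt(repertoire.size()));
            Block candidate = combine.apply(a, b);      // blocks are combined, never modified
            String behaviour = candidate.behaviour();
            if (!needs.contains(behaviour)) continue;   // the combination satisfies no pre-specified need
            Block previous = satisfied.get(behaviour);
            if (previous == null || candidate.parts() < previous.parts()) {
                satisfied.put(behaviour, candidate);    // a cheaper solution replaces an earlier one
                repertoire.add(candidate);              // and becomes a building block itself
            }
        }
    }
}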
4 Code Complexity
Genetic algorithms have already been used for automatic programming; however, a large number of iterations is required to significantly increase code complexity in order to solve more complex problems [14]. It therefore seems beneficial to use combinatorial evolution, in which complexity seems to increase in fewer steps and thus in less time. Code complexity has been measured in this context with different approaches. The cyclomatic complexity of a piece of code is the number of linearly independent paths within it [12]. For instance, if the code contains no control flow elements (conditionals), the complexity would be 1, since there would be only a single path through the code [30]. If the code has one single-condition IF statement, the complexity would be 2 because there would be two paths through the code – one where the IF statement evaluates to TRUE and another one where it evaluates to FALSE [30]. Two nested single-condition IFs (or one IF with two conditions) would produce a complexity of 3 [19,30]. According to Garg [13], cyclomatic complexity is one of the most used and renowned software metrics together with other proposed and researched metrics, such as the number of lines of code and the Halstead measure. Although cyclomatic complexity is very popular, it is difficult to calculate for object-oriented code [25].
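As a concrete illustration of these counts, the following hypothetical Java methods have cyclomatic complexities of 1, 2, and 3, respectively: each decision point adds one linearly independent path.

// Illustrative methods annotated with their cyclomatic complexity.
class ComplexityExamples {
    int straightLine(int x) {           // complexity 1: no conditionals, a single path
        return x * 2 + 1;
    }

    int oneIf(int x) {                  // complexity 2: the IF either evaluates to TRUE or to FALSE
        if (x > 0) { return x; }
        return -x;
    }

    int nestedIfs(int x, int y) {       // complexity 3: two nested single-condition IFs
        if (x > 0) {
            if (y > 0) { return x + y; }
        }
        return 0;
    }
}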
5 Methods
5.1 Development Setup and Environment
We used the programming language Java though other programming languages would have been feasible as well. The development environment was installed on VirtualBox – an open source virtualisation environment from Oracle. Oracle Java SE Development Kit 11 was used with Apache Maven as build automation
tool. To map the existing code to a database, Hibernate ORM was used. It allows mapping object-oriented Java code to a relational database. Furthermore, code versioning with GitHub was used.
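The mapping of code blocks to the database via Hibernate can be pictured with a minimal entity class. The entity, field, and table names below are assumptions for illustration only; the paper does not specify the actual schema.

import javax.persistence.*;

// Hypothetical Hibernate entity storing one code block of the repository.
@Entity
@Table(name = "code_block")
public class CodeBlockEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(columnDefinition = "TEXT")
    private String code;             // the generated Java source fragment

    private int classificationValue; // 0 = nonsense, higher values mark more relevant structures

    protected CodeBlockEntity() { }  // no-argument constructor required by Hibernate

    public CodeBlockEntity(String code, int classificationValue) {
        this.code = code;
        this.classificationValue = classificationValue;
    }

    public Long getId() { return id; }
    public String getCode() { return code; }
    public int getClassificationValue() { return classificationValue; }
}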
5.2 Simulation
Simulations are initialised by adding some basic code building blocks into a repository. The first simulation iteration then starts by randomly selecting code blocks from this repository. Selected blocks are then combined into a new code block, which subsequently gets analysed for its usefulness and complexity. Based on this analysis, the code block is assigned a value. Nonsense code, which is the most common result when randomly combining keywords of a programming language, is assigned a value of 0 and not used any further. Only code blocks with a value greater than 0 are added to the repository and consequently have a chance of being selected in a later iteration.
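Put together, the iteration described above can be sketched as a simple loop. The method names selectAndCombine and classify are placeholders for the steps detailed in Sects. 5.4 and 5.5, not the authors' actual implementation.

import java.util.*;

// Sketch of the overall simulation loop: select, combine, classify, store.
class SimulationSketch {
    private final List<String> repository = new ArrayList<>();
    private final Random rng = new Random();

    void run(int iterations) {
        for (int i = 0; i < iterations; i++) {
            String candidate = selectAndCombine(repository, rng); // Sect. 5.4
            int value = classify(candidate);                      // Sect. 5.5, regex-based rating
            if (value > 0) {
                repository.add(candidate);                        // only useful blocks are kept and reused
            }
        }
    }

    String selectAndCombine(List<String> repo, Random rng) { /* see Sect. 5.4 */ return ""; }
    int classify(String code) { /* see Sect. 5.5 */ return 0; }
}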
5.3 Code Building Blocks
Preliminary experiments in which code snippets with placeholders were predefined showed that this approach would limit the creativity and complexity of the automatic programming solution to what the predefined snippets allow. The simulation would only create program logic that is already given by the basic set of code blocks. To overcome this limitation, we defined basic code building blocks according to keywords and special characters of the Java programming language, e.g. the keywords int, for, class, and String as well as the special characters &, =, ;, and {. Additionally, we defined three extra code blocks: first, PLACEHOLDER, which marks where other code blocks can be combined and integrated – this is particularly important for nesting certain code elements, such as methods that must be nested into a class construct to be valid Java code; second, NAME, which names something, e.g. classes, methods, and variables; and third, the special keyword main used in the main method definition.
5.4 Selecting and Combining Code Blocks
During the selection process, new source code is generated based on combinations of existing code blocks from the repository. The chance that a particular code block is selected depends on its classification value (see next section). In a first step, a helper function defines a random value of how many code blocks are taken into consideration in the current iteration. There is a minimum of two code blocks required to generate a new code block. The maximum number can be predefined in the program. Arthur and Polak [3] combined up to 12 building blocks. To reduce the number of iterations needed for receiving valid Java code, a maximum of eight blocks turned out to be a good limit. After randomly defining the number of code blocks to be combined, the weighted random selection of code blocks
based on their classification value follows. Instead of simply chaining all selected code blocks together, there is also the possibility to nest them into a placeholder if one is available. A random function decides whether a code block is nested into the placeholder or simply appended to the whole code block. This procedure is important because program code usually exhibits such nested structures.
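A possible Python sketch of this selection-and-combination step is given below; it is an illustration under the assumptions stated in the text (two to eight blocks, value-weighted selection, random nesting into a PLACEHOLDER) and not code from the authors' Java implementation.

import random

def select_and_combine(repository, max_blocks=8):
    """Pick 2..max_blocks blocks weighted by classification value, then either
    chain them or nest them into an available PLACEHOLDER."""
    blocks, weights = zip(*repository)                     # repository holds (code, value) pairs
    k = random.randint(2, max_blocks)
    chosen = random.choices(blocks, weights=weights, k=k)
    combined = chosen[0]
    for block in chosen[1:]:
        if "PLACEHOLDER" in combined and random.random() < 0.5:
            combined = combined.replace("PLACEHOLDER", block, 1)   # nest into placeholder
        else:
            combined = combined + " " + block                      # simply chain
    return combined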
5.5 Code Analysis and Building Block Classification
After the selection and combination process, the newly generated source code is passed into the classification function where it is analysed. The classification process is required to weight the different code blocks according to their relevance in the Java programming language and to see whether the code evolved with respect to complexity. This is achieved with regular expression patterns, which allow identifying relevant Java code structures such as classes and methods that can be weighted with predefined classification values. Basic structures such as variable declarations are assigned a value of 1. More elaborate structures such as classes have a value of 2, and even more complicated structures such as methods have a value of 3. If a structure contains several of these substructures, their classification values are added. An important structure in many programming languages is the declaration of a variable. With the following regular expression, declarations of the value types boolean, byte, char, double, float, int, long, and short are detected:
( PLACEHOLDER (?! PLACEHOLDER ))? ( boolean | byte | char | double | float | int | long | short ) NAME ; ( PLACEHOLDER (?! PLACEHOLDER ))?
Other important elements are brackets. For example, they are used in methods and classes to delimit the body; the syntax is given by the programming language. Placeholders inside brackets are important because they allow new code to be injected into existing code blocks in future combinations. We therefore created the following regular expression:
^(\{ PLACEHOLDER \}|\( PLACEHOLDER \)) $
As already shown in the simple simulation with electronic circuits [3], one needs a minimal complexity of the initial building blocks to be able to generate useful and more complex future combinations. Classes and methods are essential to build anything complex in Java. Therefore, regular expressions were implemented to identify valid classes and methods. Valid means that the element is closed and successfully compiles. Variable declarations and methods are allowed to be nested in the class structure. The following regular expression to detect classes was developed:
( protected | private | public ) class NAME \{ (( boolean | byte | char | double | float | int | long | short ) NAME ; |( protected | private | public ) void NAME \( (( boolean | byte | char | double | float | int | long | short ) NAME )?\) \{ (( boolean | byte | char | double | float | int | long | short ) NAME ;
| PLACEHOLDER (?! PLACEHOLDER ))*\} | PLACEHOLDER (?! PLACEHOLDER ))*\} $
A valid method needs to be correctly closed and can contain either a placeholder or a variable declaration. The following regular expression to detect methods was developed: ( PLACEHOLDER (?! PLACEHOLDER ))? ( protected | private | public ) void NAME \( (( boolean | byte | char | double | float | int | long | short ) NAME )?\) \{ (( boolean | byte | char | double | float | int | long | short ) NAME ; | PLACEHOLDER (?! PLACEHOLDER ))*\} ( PLACEHOLDER (?! PLACEHOLDER ))?
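In Python terms, the weighting scheme of this subsection might look as follows. The patterns here are deliberately simplified stand-ins for the full Java regular expressions shown above, so the sketch is illustrative only.

import re

PATTERNS = [
    (re.compile(r"\b(boolean|byte|char|double|float|int|long|short)\s+NAME\s*;"), 1),  # variable declaration
    (re.compile(r"\b(protected|private|public)\s+class\s+NAME\b"), 2),                  # class
    (re.compile(r"\b(protected|private|public)\s+void\s+NAME\s*\("), 3),                # method
]

def classify(code_block: str) -> int:
    """Sum the weights of all recognised substructures; 0 means nonsense code."""
    return sum(weight * len(pattern.findall(code_block)) for pattern, weight in PATTERNS)

# Code block 169 from Table 2: class (2) + method (3) + declaration (1) = 6
print(classify("public class NAME { public void NAME ( ) { PLACEHOLDER } short NAME ; }"))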
5.6 Regular Expression Validation
In some preliminary experiments, we automatically compiled source code files of newly combined code blocks to check whether they are valid. However, this process is too time consuming to allow large numbers of iterations. An iteration required one to three seconds compilation time. As combinatorial evolution relies on rather large numbers of iterations, we instead used regular expressions to check whether newly combined code blocks compile and are thus valid. Java allows compiling regular expression into a pattern object, which can then be used to match it with a string containing the code to be tested. It turned out to be a much faster alternative to the actual compilation of source code files.
6
Results
Using Java keywords for the initial basic code blocks, we found the first useful combinations of code blocks within 100’000 iterations in a simulation of 1.6 billion iterations, which took approximately 5 h on a desktop computer. These code blocks mainly consisted of combinations of three basic building blocks classified with a value of 1. Table 1 shows some examples that were found in the simulation. Such combinations are typically assigned a small classification value due to their simplicity, keeping in mind that only code blocks that are assigned values greater than 0 are added to the code block repository for later combinations. It did not take long for the combinatorial evolution simulation to find the first combinations that consisted of previously found code blocks, as illustrated in Table 2. For example, code block 45 – which consists of blocks 42 and 44 – was found only 308 iterations later. Though it took some time to find a Java method in code block 168, only a small number of iterations later, many subsequent code blocks followed with higher classification values. Code blocks 169 and 170 characterise Java classes that contain methods and declarations of variables. It took considerably longer to jump to the next higher classification value of 3 than to jump from value 1 to 2. More than 1 · 10⁹ iterations were required to evolve a method with a placeholder in it, classified with a value of 3. From
Table 1. Examples of newly generated code blocks within the first 100’000 iterations of a combinatorial evolution simulation. Class refers to the classification value representing how useful the code block is in programming.

Iteration  Block  New Code Block         Class
4’647      25     short NAME ;           1
16’394     30     public void NAME       1
22’729     34     boolean NAME ;         1
50’419     42     protected class NAME   1
58’595     44     { PLACEHOLDER }        1
93’722     55     public class NAME      1
Table 2. Examples of newly generated code blocks within a wide range of iterations of a combinatorial evolution simulation of 1.6 billion iterations. Class refers to the classification value representing how useful the code block is in programming.

Iteration   Block  New Code Block                                                                 Class
58’903      45     protected class NAME { PLACEHOLDER }                                           2
112’609     61     public class NAME { PLACEHOLDER }                                              2
> 1 · 10⁹   168    public void NAME ( ) { PLACEHOLDER }                                           3
> 1 · 10⁹   169    public class NAME { public void NAME ( ) { PLACEHOLDER } short NAME ; }        6
> 1 · 10⁹   170    protected class NAME { boolean NAME ; public void NAME ( ) { PLACEHOLDER } }   6
there it only took a few iterations to jump to classification values of 4, 5, and even 6. Combinations of a method with a variable declaration were assigned a classification value of 4, combinations with a class were assigned a classification value of 5, and combining all three resulted in the assignment of a classification value of 6.
7
Discussion and Conclusion
In the present paper, we investigated whether it is possible to generate computer programs of increasing complexity using automatic programming through combinatorial evolution, since this would make it an open-ended process. Specifically, we wanted to know what kind of basic code blocks are needed at the beginning of a simulation, how these code blocks have to be implemented to allow them to be combined, and how code complexity can be measured. To start the first iteration of the combinatorial evolution simulation, we needed to define code blocks that exist in the Java programming language. As initial code blocks we defined reserved keywords of the Java programming language that are used to define classes and methods, initialise variables, and so on, together with some special characters used in the language. Placeholders within code blocks are used to allow combining code blocks and thus source code. Newly generated code blocks are assigned a classification value according to their structure, which represents code complexity. The combinatorial evolution simulation generated code blocks including classes, methods, variables, and combinations thereof. It therefore generated code of increasing complexity. Regarding the measurement of complexity, different approaches, e.g. determining the number of lines of code and McCabe’s cyclomatic complexity [19], were taken into consideration, but the code blocks obtained after nearly 2 billion iterations were, in our opinion, still too short for these complexity measures to be meaningful. Two factors explain why we did not use McCabe’s cyclomatic complexity [19]. First, the simulation did not generate the required main method within a reasonable number of iterations, so there was no starting point. Second, we decided not to assign the decision code blocks a value greater than 0 among the initial code blocks. Without any of these code blocks, the complexity would always be evaluated as 1. We conclude that the combinatorial evolution simulation clearly shows how Java code can be automatically created using combinatorial evolution. Simple keywords and special characters were successfully combined into more complex structures like variable declarations or methods, and in later iterations they were even combined into more sophisticated results such as classes consisting of methods and variable declarations. We also conclude that open-ended automatic programming could be achieved through combinatorial evolution, indicating an intriguing approach when creativity is important. The limitations of complexity that were reached show that further research is required. Similar observations for genetic programming [14] suggest that more advanced evolutionary operators could be useful. Moreover, when starting with more elaborate code blocks, or when such blocks are reached during earlier combinatorial evolution, the goal of automatic programming might come much closer. Therefore, forthcoming research may also study the concept with much increased computational power and distributed computing.
References 1. Arthur, W.B.: The Nature of Technology: What it is and How it Evolves. Free Press, New York (2009) 2. Arthur, W.B.: How we became modern. In: Sim, S., Seet, B. (eds.) Sydney Brenner’s 10-on-10: The Chronicles of Evolution. Wildtype Books (2018) 3. Arthur, W.B., Polak, W.: The evolution of technology within a simple computer model. Complexity 11(5), 23–31 (2006) 4. Banzhaf, W., et al.: Defining and simulating open-ended novelty: requirements, guidelines, and challenges. Theory Biosci. 135(3), 131–161 (2016). https://doi. org/10.1007/s12064-016-0229-7 5. Becker, K., Gottschlich, J.: AI programmer: autonomously creating software programs using genetic algorithms. arXiv (2017) 6. B¨ ack, T.: Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York (1996) 7. Christen, P.: Modelling and implementing open-ended evolutionary systems. In: The Fourth Workshop on Open-Ended Evolution (OEE4) (2021) 8. Chun, W.H.K.: On software, or the persistence of visual knowledge. Grey Room 18, 26–51 (2005) 9. Cleaveland, J.: Building application generators. IEEE Softw. 5(4), 25–33 (1988) 10. Czarnecki, K., Eisenecker, U.: Generative Programming: Methods, Tools, and Applications. ACM Press/Addison-Wesley Publishing Co., New York (2000) 11. Czarnecki, K.: Perspectives on generative programming. In: SFB 501 “Development of Large Systems with Generic Methods” (2003) 12. Ebert, C., Cain, J., Antoniol, G., Counsell, S., Laplante, P.: Cyclomatic complexity. IEEE Softw. 33(6), 27–29 (2016) 13. Garg, A.: An approach for improving the concept of Cyclomatic Complexity for Object-Oriented Programming. arXiv (2014) 14. Harter, A.T.: Advanced techniques for improving canonical genetic programming. Missouri University of Science and Technology (2019) 15. Holland, J.H.: Genetic algorithms. Sci. Am. 267(1), 66–72 (1992) 16. Holland, J.H.: Signals and Boundaries: Building Blocks for Complex Adaptive Systems. The MIT Press, Cambridge (2012) 17. Klimek, P., Hausmann, R., Thurner, S.: Empirical confirmation of creative destruction from world trade data. PLoS ONE 7(6), e38924 (2012) 18. Koza, J.R.: Genetic programming as a means for programming computers by natural selection. Stat. Comput. 4(2), 87–112 (1994) 19. McCabe, T.J.: A complexity measure. IEEE Trans. Softw. Eng. SE–2(4), 308–320 (1976) 20. Ogburn, W.F.: Social Change: With Respect to Culture and Original Nature. B. W. Huebsch, New York (1922) 21. O’Neill, M., Spector, L.: Automatic programming: the open issue? Genet. Program. Evolvable Mach. 21(1), 251–262 (2019). https://doi.org/10.1007/s10710019-09364-2 22. Parnas, D.L.: Software aspects of strategic defense systems. ACM SIGSOFT Softw. Eng. Notes 10(5), 15–23 (1985) 23. Pillay, N., Chalmers, C.K.A.: A hybrid approach to automatic programming for the object-oriented programming paradigm. In: Proceedings of the 2007 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries, SAICSIT ’07, New York, NY, USA, pp. 116–124. Association for Computing Machinery (2007)
24. Poli, R., Langdon, W.B., McPhee, N.F., Koza, J.R.: Genetic programming: an introductory tutorial and a survey of techniques and applications. University of Essex, UK, Tech. Rep. CES-475, pp. 927–1028 (2007) 25. Sarwar, M.M.S., Shahzad, S., Ahmad, I.: Cyclomatic complexity: the nesting problem. In: Eighth International Conference on Digital Information Management (ICDIM 2013), pp. 274–279. IEEE (2013) 26. Taylor, T.: Evolutionary innovations and where to find them: routes to open-ended evolution in natural and artificial systems. Artif. Life 25(2), 207–224 (2019) 27. Thurner, S.: A simple general model of evolutionary dynamics. In: MeyerOrtmanns, H., Thurner, S. (eds.) Principles of Evolution: From the Planck Epoch to Complex Multicellular Life. The Frontiers Collection, pp. 119–144. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18137-5 4 28. Thurner, S.: The creative destruction of evolution. In: Sim, S., Seet, B. (eds.) Sydney Brenner’s 10-on-10: The Chronicles of Evolution. Wildtype Books (2018) 29. Thurner, S., Hanel, R., Klimek, P.: Introduction to the Theory of Complex Systems. Oxford University Press, New York (2018) 30. Wikipedia contributors: Cyclomatic complexity — Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Cyclomatic complexity& oldid=1054490449 (2021). Accessed 19 Nov 2021
Deep Face Mask Detection: Prevention and Mitigation of COVID-19 Sahar Dammak1(B) , Hazar Mliki2 , and Emna Fendri3 1 2
MIRACL-FSEG, Faculty of Economics and Management of Sfax, Sfax, Tunisia [email protected] MIRACL-ENET’COM, National School of Electronics and Telecommunications of Sfax, Sfax, Tunisia 3 MIRACL-FS, Faculty of Sciences of Sfax, Sfax, Tunisia [email protected]
Abstract. Recently, the outbreak of Coronavirus Disease 2019 (COVID-19) has spread rapidly across the world. In order to safely live with the virus while effectively reducing its spread, the use of face masks has become ubiquitous. Indeed, several countries enforced compulsory face mask policies in public areas. Therefore, it is important to provide automatic solutions for masked/not-masked face detection. In this work, we proposed a face mask detection method based on deep convolutional neural networks (CNNs) in an uncontrolled environment. In fact, the proposed method aims to locate not-masked faces in a video stream. Therefore, we performed a face detection based on the combination of multi-scale CNN feature maps. Then, we classified each face as masked face or not-masked face. The main contribution of the proposed method is to reduce confusion between detected object classes by introducing a two-step face mask detection process. The experimental study was conducted on the multi-constrained public dataset “Face Mask Dataset” and the Simulated Masked Face Dataset (SMFD). The achieved results reveal the performance of our face mask detection method in an uncontrolled environment. Keywords: COVID-19 · Face mask detection · Face detection · CNN · Data augmentation
1
Introduction
Coronavirus disease 2019 (COVID-19), which was recognized in December 2019 in Wuhan, Hubei province of China [6], may cause acute respiratory illness and even fatal acute respiratory distress syndrome (ARDS). Therefore, the World Health Organization (WHO) declared on 11 March 2020 that COVID-19 had become a global pandemic [21]. This pandemic is entering a new phase in which the world must learn to live with this virus. Hence, wearing a face mask has become mandatory, especially in public and crowded places, to prevent the spread of respiratory infections and subsequently the transmission of COVID-19
epidemic. In order to ensure efficient control of wearing masks in public places, the automatic detection of not-masked faces is becoming a requirement. The face mask detection task can be considered as a special case of occluded face detection. In the literature, despite the high efficiency of recent face detection methods [1,22], occluded face detection is a challenging task due to the large appearance variation affected by various real-world occlusions. In this context, Su et al. [20] proposed a CNN based method for occluded face detection. In fact, they used a fully convolutional networks (FCN) model for the pixel-wise classification and the bounding box regression. The light-weighted neural network PVANet is used as the backbone. In addition, authors applied a long short term memory (LSTM) architecture to enhance the contextual information of the feature maps. Then, the Non-Maximum Suppression (NMS) algorithm is applied on the generated face regions to get final results. Li et al. [10] used CNN model to distinguish occluded faces from original faces. The authors employed three different convolutional neural networks to test on four features: skin regions, luminosity on different surfaces, luminosity around edges and colours around edges. Then, they performed fusion on decisions out of individual image features classifications using majority voting. Jin et al. [8] used the VGG16 pre-trained model as the backbone network to extract multi-scale features. Then, the authors combined these multi-scale features with the feature-enhanced network SG-net. Finally, the classification and border regression are performed based on fusion features. Recently, several study were proposed to deal with the face mask detection. In this context, Jiang et al. [7] proposed a face mask detection method named RetinaMask detector which is an extension of RetinaFace [3]. In fact, RetinaMask consists in detecting objects and classifying them into three classes: background, face with mask and face without mask. It is composed of a backbone, a neck and heads. For the backbone, authors used the pre-trained CNN ResNet to extract feature maps from input images. Regarding the neck, they enhanced and refined the original feature maps by extracting high-level semantic information and then fused it with the previous layers’ feature maps. Finally, heads stand to identify the corresponding object classes. In [9], the authors proposed a framework based on the Multi-Task Cascaded Convolutional Neural Network (MTCNN) for the face detection to identify the faces and their corresponding facial landmarks. These facial images are then processed by a neoteric classifier based on MobileNetV2 architecture to identify masked regions. Shay et al. [19] introduced an approach which consists of three key components. The first component applies a deep learning architecture that integrates deep residual learning (ResNet-50) with Feature Pyramid Network (FPN) to detect the existence of human subjects in the videos. The second component introduces Multi-Task Convolutional Neural Networks (MT-CNN) to detect and extract human faces from these videos. Finally, they build and train a CNN classifier to distinguish masked and unmasked human face subjects in the third component. As for [14], authors proposed a two-stage network based on a light feature extraction. The first stage filters non-faces using a CNN based on a Rapidly
Reduced Block (R2B) and an Efficiently Extracted Block (E2B). Then, the second stage classifies the facial regions into masked/not-masked face using a slim CNN architecture which emphasizes shallow layers and narrow channels. Loey et al. [11] introduced a hybrid model using deep and classical machine learning for face mask detection. They used deep transfer learning (ResNet-50) for feature extraction and combined it with classical machine learning algorithms (K-Nearest Neighbors (k-NN), Linear Regression and Logistic Regression) whose individual decisions are merged to produce the final decision. In this paper, we introduce a new face mask detection method based on deep convolutional neural networks (CNNs) in an uncontrolled environment. The proposed method aims to detect faces and classify them into masked or not-masked faces. Our method involves two steps: the first step is dedicated to the face detection and the second step to the face mask classification. Furthermore, we included data augmentation to deal with the problem of data scarcity and studied the effects of different augmentations on the efficiency of the proposed CNN architecture. The main contribution of this paper is to introduce a two-step face mask detection process to reduce confusion between detected object classes and improve the classification performance. The remainder of this paper is organized as follows. Section 2 introduces and details the proposed method. The experiments and results are illustrated in Sect. 3. Finally, Sect. 4 provides the conclusion and some perspectives.
2
Proposed Method for Face Mask Detection
The proposed method is based on deep convolutional neural networks (CNNs) in an uncontrolled environment. It involves two steps: face detection and face mask classification as illustrated in Fig. 1.
Fig. 1. Proposed method for face mask detection.
2.1
Face Detection
In this step, we performed the face detection by applying the face detector proposed in our previous work [12] which is able to handle the face appearance variation affected by occlusion, pose variation, low resolution, scale variation,
illumination variation among others. In fact, the proposed face detection method introduces a new CNN architecture that extends the Faster R-CNN [15] architecture by combining both global and local features at multiple scales. The face detection step is illustrated in Fig. 2. For the features extraction, we extracted feature maps using the pre-trained ResNet model [5]. Both of the ROI generation and ROI classification steps share the full-image convolution features of the ResNet model. As for the ROI generation, we extracted ROIs from the last convolutional layer of the pre-trained ResNet model by using a sliding window and an anchors based algorithm. Then, we applied the ROI Pooling layer to extract the feature maps of the generated ROIs from different convolution layers, including different levels and scales features. Finally, we concatenated the obtained features maps to enhance the ROI feature vectors in order to perform the final ROI classification (face/non-face).
Fig. 2. Face detection step.
2.2
Face Mask Classification
For the face mask classification step, we rely on a CNN-based model to classify the detected faces into masked or not-masked faces. Data Augmentation. We opted for data augmentation as a pre-processing step to achieve considerable performance. In fact, data augmentation is a strategy for increasing the amount and quality of training data so that stronger deep learning models can be built with them [17]. Therefore, we used data augmentation to force the model to learn a robust face representation with different modifications and more difficult situations in the training dataset. In fact, we applied various data augmentation methods based on geometric transformations such as flipping, rotation, rescaling, shifting and zooming. Each image is horizontally flipped, rotated by a random angle between (−30, 30), and randomly zoomed in or out within the range of 0.3. Furthermore, the width-shift-range randomly shifts the pixels horizontally either to the left or to the right, while the height-shift-range randomly shifts the pixels vertically either to the top or to the bottom. Figure 3 represents some samples of augmented images.
Fig. 3. Samples of augmented images.
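A sketch of this augmentation pipeline, assuming a Keras ImageDataGenerator back-end, is shown below; the shift ranges, rescaling factor, target size and directory path are illustrative assumptions, since the paper does not report them.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Geometric augmentations described above: horizontal flip, rotation within
# (-30, 30) degrees, zoom within 0.3, and random horizontal/vertical shifts.
augmenter = ImageDataGenerator(
    horizontal_flip=True,
    rotation_range=30,
    zoom_range=0.3,
    width_shift_range=0.1,   # assumed value
    height_shift_range=0.1,  # assumed value
    rescale=1.0 / 255,       # assumed normalisation
)

train_generator = augmenter.flow_from_directory(
    "face_mask_dataset/train",   # hypothetical path
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",    # masked / not-masked
)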
CNN Architecture. We proposed to adapt the pre-trained VGG-16 [18] to the face mask classification task since it has shown good results on the ImageNet challenge. In fact, we fine-tuned the VGG-16 by replacing the fully connected (FC) layers with two fully connected layers of 1024 units, and we used a softmax activation layer of 2 units for masked face/not-masked face. In addition, we used Dropout with a ratio of 0.5 after each FC layer during training to overcome over-fitting and improve performance. Moreover, the weights of the first five convolutional layers were frozen, while the rest of the layers were trained for a total of 100 epochs. The proposed model was optimized with stochastic gradient descent (SGD), which is considered an effective standard optimization algorithm for machine learning classification models [16]. The face mask classification step is illustrated in Fig. 4.
Fig. 4. Face mask classification step.
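The fine-tuning described above could look roughly as follows in Keras; the frozen-layer indexing and the SGD learning rate are assumptions rather than values reported by the authors.

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers[:5]:        # approximates freezing the first five convolutional layers
    layer.trainable = False

x = Flatten()(base.output)
x = Dense(1024, activation="relu")(x)    # first FC layer of 1024 units
x = Dropout(0.5)(x)
x = Dense(1024, activation="relu")(x)    # second FC layer of 1024 units
x = Dropout(0.5)(x)
out = Dense(2, activation="softmax")(x)  # masked face / not-masked face

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer=SGD(learning_rate=1e-3),   # assumed learning rate
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_generator, epochs=100)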
3
Experimental Study
In this study, the performance of the face mask detection method is evaluated. In the next sections, we present the datasets, the evaluation metrics and the experimental results.
3.1 Dataset Description
We have carried out our experiments on the “Face Mask Dataset” [2] and the Simulated Masked Face Dataset (SMFD) [13].
The “Face Mask Dataset” contains 7959 images where the faces are annotated as masked/not-masked. It is a collection of images from the Wider Face [23] and MAsked FAces (MAFA) datasets [4]. We used this dataset for the training and testing phases. – The Wider Face dataset is a reference benchmark for face detection which contains 393,703 labeled human faces from 32,203 images collected from the Internet based on 61 event classes. The Wider Face dataset includes many constraints such as small scale, illumination, occlusion, background clutter and extreme poses. – The MAFA dataset is a challenging dataset for detecting occluded faces. It contains 30,811 images and 34,806 masked faces collected from the Internet. The dataset includes color images with different resolutions, different illuminations, and different background complexity. In this dataset, some faces are masked by hands, other accessories, or surgical masks. The SMFD dataset consists of 1,376 images, 690 of simulated masked faces and 686 of unmasked faces. It contains portrait images of male and female faces with a variety of poses and sizes. We used this dataset for the testing phase.
3.2 Experimental Results
We investigate the performance of the proposed method for face mask detection through quantitative and qualitative experiments. The quantitative experiments seek to compare the performance of the proposed method with the recent face mask detection methods from the state of the art. The qualitative experiments aim to show some results of our face mask detection method on the “Face Mask Dataset”. Quantitative Experiments. In order to evaluate the performance of the proposed method for face mask detection, we conducted three series of experiments. The first aimed to validate the use of data augmentation. The second series compared the performance of the proposed method with the most recent face mask detection methods in the state of the art on the “Face Mask Dataset”. As for the third series, it seeks to prove the independence of the proposed method from the training data. The evaluation metrics used in the experiments are the standard accuracy, recall and precision. First Series of Experiments. In this series, we evaluated the proposed method's performance with and without using data augmentation. Table 1 shows the evaluation of the data augmentation in terms of recall and precision rates on the “Face Mask Dataset”.
Table 1. Evaluation of data augmentation on the “Face Mask Dataset”.

Methods                      Not-masked face          Masked face
                             Precision   Recall       Precision   Recall
Without data augmentation    97.68%      95.93%       92.38%      95.57%
With data augmentation       97.80%      97.27%       94.77%      95.77%
Through this first series of experiments, we note that the use of data augmentation helps to achieve a gain of 0.12% and 1.34% in terms of precision and recall rates, respectively, for the not-masked face class. As for the masked face class, we achieved an improvement equal to 2.39% and 0.20% in terms of precision and recall rates, respectively. The obtained results prove the importance of using data augmentation to generate more face images with misalignment for the training of the proposed face mask detection method. Second Series of Experiments. In these experiments, we compare the performance of the proposed face mask detection method with recent works [2,7,9] in terms of recall and precision rates on the “Face Mask Dataset”. Unlike the methods of [7] and [2], which perform the final classification on three classes (masked face, not-masked face, background), we conducted the final classification only on two classes (masked face, not-masked face). Table 2 displays this comparative study. Table 2. Comparison of the proposed face mask detection method with the state of the art on the “Face Mask Dataset” in terms of recall and precision rates.
Methods             Not-masked face          Masked face
                    Precision   Recall       Precision   Recall
Baseline [2]        89.60%      85.30%       91.90%      88.60%
Jiang et al. [7]    91.90%      96.30%       93.40%      94.50%
Joshi et al. [9]    94.50%      86.38%       84.39%      80.92%
Proposed method     97.80%      97.27%       94.77%      95.77%
Referring to Table 2, the obtained results outperform those recorded by [2,7,9] thanks to the use of the two-step face mask detection process, which reduces the confusion between classes and thus improves the classification performance. Moreover, the use of data augmentation to generate a more invariant representation for the training of the proposed model significantly improves the face mask classification rates. Third Series of Experiments. For a deeper evaluation of the proposed method, we conducted more experiments on the SMFD dataset. Compared to the “Face Mask Dataset”, the SMFD dataset exhibits fewer constraints. The proposed method records an accuracy rate of 100%. The obtained results underline not only the accuracy of
Fig. 5. Sample face mask detection results using the proposed method on the “Face Mask Dataset”.
the generated face mask detection model, but also its independence from the training dataset. For this reason, we compared our results with recent face mask detection methods [11,14] in terms of accuracy. In fact, the achieved results are better than those of [11], which records an accuracy of 99.49%. Such a result is justified by the fact that, unlike [11], which used an SVM classifier, we resorted to the use of the softmax layer during the classification step. Compared to [14], which records an accuracy of 99.72% on the same dataset, we achieved a gain of 0.28% thanks to the use of data augmentation to generate a more invariant representation for the training and to obtain a robust model that handles the face appearance variation in an uncontrolled environment. Qualitative Experiments. Figure 5 represents some qualitative results of our face mask detection method on the “Face Mask Dataset”, where the red and green boxes refer to the not-masked and masked face classes, respectively. We
observe that the proposed method can efficiently detect masked and not-masked faces across challenging constraints (i.e. occlusion, facial expressions, pose and illumination variation, low-resolution conditions). As shown in Fig. 5, we succeeded in dealing with multi-face images: (a) without mask, (b) confusing face without mask, (c) with mask and (d) with and without mask. More qualitative results are available on these links: video1 and video2.
4
Conclusion and Future Work
Due to the outbreak of COVID-19 and the trend of wearing face masks in public areas, we introduced a new face mask detection method based on deep convolutional neural networks (CNNs) in an uncontrolled environment. In fact, the proposed method detects faces and checks whether or not they are wearing a mask in a video stream. The proposed method involves two steps: face detection and face mask classification. We used data augmentation to generate more face images with misalignment for the training of the proposed model, which significantly improves the face mask classification rate. We evaluated our method on the challenging “Face Mask Dataset” and the Simulated Masked Face Dataset (SMFD). The experimental study showed that using a two-step face mask detection process reduces confusion between classes and improves the classification performance. The obtained results provide new perspectives to enhance face recognition accuracy on masked faces.
References 1. Chen, H., Chen, Y., Tian, X., Jiang, R.: A cascade face spoofing detector based on face anti-spoofing R-CNN and improved Retinex LBP. IEEE Access 7, 170116– 170133 (2019) 2. Daniell, C.: Detect faces and determine whether people are wearing mask (2020). https://github.com/AIZOOTech/FaceMaskDetection 3. Deng, J., Guo, J., Zhou, Y., Yu, J., Kotsia, I., Zafeiriou, S.: RetinaFace: single-stage dense face localisation in the wild. arXiv preprint arXiv:1905.00641 (2019) 4. Ge, S., Li, J., Ye, Q., Luo, Z.: Detecting masked faces in the wild with LLECNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2682–2690 (2017) 5. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 6. Huang, C., et al.: Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. lancet 395(10223), 497–506 (2020) 7. Jiang, M., Fan, X.: RetinaMask: a face mask detector. arXiv preprint arXiv:2005.03950 (2020) 8. Jin, Q., Mu, C., Tian, L., Ran, F.: A region generation based model for occluded face detection. Procedia Comput. Sci. 174, 454–462 (2020) 9. Joshi, A.S., Joshi, S.S., Kanahasabai, G., Kapil, R., Gupta, S.: Deep learning framework to detect face masks from video footage. In: 2020 12th International Conference on Computational Intelligence and Communication Networks (CICN), pp. 435–440. IEEE (2020)
10. Li, H., Alghowinem, S., Caldwell, S., Gedeon, T.: Interpretation of occluded face detection using convolutional neural network. In: 2019 IEEE 23rd International Conference on Intelligent Engineering Systems (INES), pp. 000165–000170. IEEE (2019) 11. Loey, M., Manogaran, G., Taha, M.H.N., Khalifa, N.E.M.: A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 167, 108288 (2021) 12. Mliki, H., Dammak, S., Fendri, E.: An improved multi-scale face detection using convolutional neural network. Signal Image Video Process. 14(7), 1345–1353 (2020). https://doi.org/10.1007/s11760-020-01680-w 13. Prajnasb: observations (2020). https://github.com/prajnasb/observations 14. Putro, M.D., Nguyen, D.-L., Jo, K.-H.: Real-time multi-view face mask detector on edge device for supporting service robots in the COVID-19 pandemic. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawi´ nski, B. (eds.) ACIIDS 2021. LNCS (LNAI), vol. 12672, pp. 507–517. Springer, Cham (2021). https://doi.org/10.1007/ 978-3-030-73280-6 40 15. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015) 16. Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) 17. Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 1–48 (2019) 18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 19. Snyder, S.E., Husari, G.: Thor: a deep learning approach for face mask detection to prevent the COVID-19 pandemic. In: SoutheastCon 2021, pp. 1–8. IEEE (2021) 20. Su, Y., Wan, X., Guo, Z.: Robust face detector with fully convolutional networks. In: Lai, J.-H., et al. (eds.) PRCV 2018. LNCS, vol. 11258, pp. 207–218. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03338-5 18 21. WHO: World Health Organization: Who characterizes COVID-19 as a pandemic (2020). https://www.who.int/dg/speeches/detail/who-director-general-sopening-remarks-at-the-mediabriefing-on-covid-19-11-march-2020 22. Wu, W., Yin, Y., Wang, X., Xu, D.: Face detection with different scales based on faster R-CNN. IEEE Trans. Cybern. 49(11), 4017–4028 (2018) 23. Yang, S., Luo, P., Loy, C.C., Tang, X.: WIDER FACE: a face detection benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5525–5533 (2016)
Extracting Emotion and Sentiment Quotient of Viral Information Over Twitter Pawan Kumar1 , Reiben Eappen Reji1 , and Vikram Singh2(B) 1 National Institute of Technology, Surathkal, India
{pawan.181ee133,reubeneappenreji.181ee136}@nitk.edu.in 2 National Institute of Technology, Kurukshetra, India [email protected]
Abstract. On social media platforms, viral or trending information is consumed for many kinds of decision-making, as it harnesses the information flux. In line with this, millions of real-time users consume the data co-located with these viral topics. The sentiment and co-located emotions they encompass could therefore be utilized for analysis and decision support. Traditionally, sentiment tools offer limited insights and lack the extraction of emotional impact. In these settings, the estimation of the emotion quotient becomes a multifaceted task. The proposed novel algorithm aims (i) to extract the sentiment and co-located emotion quotient of viral information and (ii) to provide utilities for a comprehensive comparison of co-occurring viral information and for sentiment analysis over Twitter data. The emotions and micro-sentiments reveal several valuable insights about a viral topic and assist in decision support. A use-case analysis over real-time extracted data asserts significant insights, as the generated sentiments and emotional effects reveal correlations caused by viral/trending information. The algorithm also delivers an efficient, robust, and adaptable solution for sentiment analysis. Keywords: Big Data · Emotion quotient · Sentiment analysis · Twitter
1 Introduction The traditional social media platforms, e.g., Twitter, Facebook, etc. cater to the global users and list their personal information and media. The heterogeneous user data is often utilized for deriving common sentiments or trending information. The trending or viral information primarily harnesses the global content shared co-related to a particular topic and hash tag keywords [1]. A naive user or new user usually refers to this trending or viral list of information to see the most occurring or contributory piece of information [2, 3]. In this process, a user simply refers to the viral information and explores the related term over the Twitter API, without cognitive awareness of the emotional effect of viral information. A piece of viral information may have a list of information that may trigger the emotional effect on the user and lead to emotional splits or swings on the choice of information. Though social media platforms offer limited or no functions or aspect-related views on the API for the generic user. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 23–33, 2022. https://doi.org/10.1007/978-3-030-96308-8_3
Typically, the designed algorithm for the sentiment level (SL) and emotion quotients (EQ) statistics could serve several pivotal objectives, as asserted by the experimental analysis also [7, 8]. The sentiment and EQ statistics generated could be utilized in several application areas: decision-making, advertising, public administrations, etc. Though, generating these statistics for real-time published data from the twitter data is a complex and multifaceted computing task [13, 14]. 1.1 Motivation and Research Questions The sentiment analysis is a complex computing task, mainly due to the semantic correlation that exists between the user-generated data and targeted sentiment level and created emotional quotients [14]. The task becomes multifaceted, primarily, due to estimation of ‘emotion effect’ within each sentiment level. In these settings, a real-time strategy for the generation of insights, as deliverables is the need-of-hour. We have formalized research questions (RQs) steer the proposed system design, as: RQI: What are the key twitter data elements/features to extract the SL and EQs? RQII: How to estimate the SLs and EQs and co-located overlap on both estimates? RQIII: What SL and EQ statistics asserted for spectrum of application domains? The designed RQs assist in conducting overall work and validate its feasibility for analytics and just-in-time decision-making over real-time published twitter data. The key contribution is a robust and adaptive algorithm for estimation of both SL and EQ on real-time basis, for an interactive data play. Other contributions, as: (i)
A portable and adaptive UI, to assist on generates the real-time statistics (emotion and sentiment polarities) for an emotion value ‘as query’ or viral information. (ii) The strategy outlines pivotal features of text-based sentiment and emotion analysis on social media, e.g. subjectivity, statement polarity, emotions expressed, etc. (iii) The experimental assessment asserts the overall accuracy upto 89% and 90%, respectively for sentiment and EQs estimation. The overall performance achieved is at significant-level in the view of real-time soft data analysis challenges. The paper is organized as: Sect. 2 lists the relevant research efforts to the sentiment and emotional statistics. Section 3 elaborates the conceptual schema and internals of the designed strategy, with formulas and working example. Section 4 showcased the performance evaluation of the algorithm. Paper is concluded at the last.
2 Related Work In recent years, developing novel algorithms for sentiment and threaded emotional analytics estimation on soft data, particularly at micro-level on viral or trending information is area of interest. The located research areas fall under two heads:
2.1 Sentiment Analysis Over Soft Data (Reviews/Posts/Viral Information) In recent years, the research efforts made on the accurate estimation of sentiment statistics, with a listed core task (i) an automatic identification of relevant and text with opinion or documents [15–20], (ii) preparation of sentiment and threaded sentiment analysis. Existing strategies and methods employed mainly rule-based and statistical machine learning approaches for these inherent tasks, e.g. opinion mining and sentiment analysis [22, 23]. A comprehensive survey is presented in [23] with two broad set of strategies (opinion mining and sentiment analysis). Whereas, Turney [31] asserts that an unsupervised algorithm, could be more suitable for the lexicon-based determination of sentiment phrases using function of opinion words over the word/sentences or document corpus, same is supported in [4]. Another work in [5] highlighted the use of SentiWordNet as lexicon-based sentiment classifier over document and text segregation, as it may contains opinion strength for each term [22]. A prototype in [9], used the SentiWordNet Lexicon to classify reviews and [6] build a dictionary enabled sentiment classification over reviews with embedded adjectives. Further, several Naïve-byes and SVM for sentiment analysis of movie reviews supported by inherent features, unigrams, bigrams, etc. [23, 25]. In the recent potential work, a subsequence kernel-based voted perceptron prototype is created, and it is observed that the increase in the number of false positives is strongly co-related with the number of true positives. The designed models reveal its resiliency over intermediate star rating reviews classification, though five-star rating reviews is not utilized while training the model. Similar model is used for the sentiment analysis over the microblog posts using two phases: first phase involves partition of subjective and objective documents based on created and further for the generation of sentiment statistics (as positive and negative) in the second phase [10, 11]. 2.2 Emotion Quotients (EQs) Over Soft Data The accurate detection of inherent ‘Emotion’ over a text data using natural language processing and text analytics to discover people’s feelings located subarea of research work. The usage of it could be tracking of disasters and social media monitoring. Tracking user’s opinions and inherent emotional quotients using posted soft data reveals interesting insights, e.g. tracking and analyzing Twitter data for election campaigns [6, 21, 22]. There are several research studies asserts that sentiment topics and emotion topic/terms delivers promising outcomes for the generation of both polarities, such as for the tracking and monitoring ‘earthquake disasters’ using ‘Weibo’ a Chinese social media content is used to see the sentiments generated and sensitization. In this, the proposed framework detected disasters related sentiment over massive data from a micro-blogging stream and to filter the negative messages to derive co-located event discovery in a post-disaster situation [23, 24, 37]. The emergence of spectrum of social media platforms justified the need of social analytics for decision-making, similar fashion, tracking sentiment on news entities, to detect socio-politics issues, here sentiment-spike generated over large no. of entities and domains [25–29]. Similarly, in [30, 31] a system to tracking health trends using micro
blogs data for the identification of province of several health related sentiment and colocated emotions. An open platform for crowd-sourced labeling is adapted for the public social media content. The accurate and real-time estimation of sentiment and co-located EQs is the key challenge, as scalability of soft data is unprecedented.
3 Proposed Strategy The traditional social media platform, e.g. Twitter, Facebook, etc. caters global user’s personal intents using posted media, this user data is pre-processed to acquir generic SL and EQs of a viral information. As, a user refers to viral information manually to explores via twitter API, without cognitive awareness of the emotional effects. Though, social media platforms offer limited functions to for the generic analysis and further, to assists a naive user to understand its sentiment impact and further EQs. 3.1 Conceptual Framework A novel strategy for the real-time generation of emotional quotients of viral/trending information on twitter is designed. Figure 1 illustrates the internal computing blocks and their interactions for the intended objectives. The proposed framework begins with a traditional data collection over twitter API. The data extraction is driven by the user inputs, e.g. keyword/hashtags, number of tweets, and duration. The retrieved tweets from the API, are now to be stored in a temporary storage for later text-processing and feature extraction.
Fig. 1. Conceptual framework of proposed strategy
The local Twitter data storage is also connected to the computing block ‘text pre-processing’: each extracted tweet must go through local text processing and is then supplied to the ‘feature extraction’ block. Furthermore, a small computing thread is kept within the ‘feature extraction’ block for the estimation of the ‘sentiment score (SC)’ and ‘emotion quotient (EQ)’ of co-located Twitter data objects.
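A possible shape of the text pre-processing block is sketched below in Python; the exact cleaning rules of the prototype are not specified in the paper, so the regular expressions here are assumptions.

import re

def preprocess_tweet(text):
    """Basic cleaning before tokenisation: drop URLs and mentions, keep hashtag words."""
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove user mentions
    text = re.sub(r"#", "", text)               # keep the hashtag word, drop the '#'
    text = re.sub(r"\s+", " ", text).strip().lower()
    return text.split()                          # simple whitespace tokenisation

print(preprocess_tweet("Loving the #Tokyo2020 opening! https://t.co/xyz @user"))
# ['loving', 'the', 'tokyo2020', 'opening!']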
3.2 Estimation of Sentiments and Co-located Emotional Quotients The aim of the designed system is to generate the sentiment and emotion quotients. A prospective user (e.g., a naive user, decision-maker, Govt. official, etc.) submits a data request over the UI using keywords, the number of tweets of interest, and the name of an emotion. In the pre-processing stage, each extracted tweet is divided into tokens with an estimated probability (T_prob = Happy, Sad, etc.). The sentiment analysis is conducted in Python using the NLTK library [12]. The probability score (WGT_Prob) is weighted to account for the smaller number of negative tweets; additionally, tokens below a threshold count are truncated, since they are not significant and often contribute little. This is estimated via the entropy H_10, formalized in Eq. 1:

H_10(token) = − Σ_{s ∈ sentiment} p(s | token) · log_10 p(s | token)    (1)
The measures for positive, neutral and negative emojis are also found. Finally, given that a tweet is composed of several words, the different features are aggregated (summed) for each word to obtain an overall tweet value (T_v), normalized by the tweet length (T_ct). The positive score (s+) and negative score (s−) of each tweet are determined as the average of the respective word scores using Eq. 2 and Eq. 3, and the overall Sentiment Score (SC) is estimated from them for locating the topic proportion:

s+ = ( Σ_{i ∈ t} pos_score_i ) / n    (2)

s− = ( Σ_{i ∈ t} neg_score_i ) / n    (3)

SC = s+ − s−    (4)

To extract emotion from a tweet, the topical words (bigrams) are taken from the tweet content, based on ‘item response theory’ [12], and further categorized using unsupervised features. The proposed algorithm is based on ‘topic proportion’, which helps to identify sentiment terms related to a topic in a sentiment lexicon. The algorithm treats a tweet object as a mixture of one or more topics, with a lexicon approach and associated polarity scores. The lexicon-based estimates of positive and negative sentiment are formalized in Eq. 5 and Eq. 6, respectively:

P(w | +) = M_w / N+    (5)

P(w | −) = M_w / N−    (6)

Here, M_w is the set of messages containing the lexical token ‘w’, and N+ and N− denote the messages coded as positive and negative, respectively, for each message ‘m’. The log-likelihood ratio is then estimated as:

S_m = Σ_{i=1}^{n} log ( P(w_i | −) / P(w_i | +) )    (7)

Here, w denotes a lexical unit of the dictionary and n is the number of words and collocations from the dictionary that occur in the message m.
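A minimal Python sketch of the scoring in Eqs. 2-4 is given below; the token-level score lists are assumed inputs (in the prototype they come from the NLTK-based analysis), and the function name is illustrative rather than taken from the system.

def sentiment_score(pos_scores, neg_scores):
    """Eqs. 2-4: average the positive and negative token scores of a tweet,
    then take their difference as the overall sentiment score SC."""
    n = len(pos_scores)                 # number of scored tokens in the tweet
    s_pos = sum(pos_scores) / n         # Eq. 2
    s_neg = sum(neg_scores) / n         # Eq. 3
    return s_pos - s_neg                # Eq. 4

# Illustrative token-level scores for one tweet
print(sentiment_score([0.6, 0.1, 0.4], [0.0, 0.3, 0.1]))   # 0.2333...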
3.3 System Use-Cases of Sentiment and Emotional Analytics The designed system is plugged with an interactive user interfaces (UIs) for spectrum of users. The UIs steers data plays over statistics in decision-making and analytics. The first use-case is ‘basic search on social media data for a sentiment and emotion value’, illustrated in Fig. 2, for the user search and browsing. A user input ‘keyword/hashtag/emotion name’, prepares SL and EQs statistics over relevant data. Here, Part 1 illustrates both information with scores ‘%’ values and tweet list and Part 2, features to explore new dimension are given, to compare pair of virility and further deeper view of these statistics and values and related tweets.
Fig. 2. UI for 1st use-case (SL & EQs explorations)
Fig. 3. UI for polarity comparison
The second use-case is robust exploration, as shown in Fig. 3, here matching polarity of viral information easily adapted for the analytics. Further, third use-case is a capacity, to deliver a matching list of trending/viral information’s for and matching viral the system for the exploration within the tweets data for an input, ‘emotion quotient’. The viral/trending topics may be extracted with the presence of the same emotional quotient values. The fourth use-case is an inherent capability for systematic comparative view between more than one user inputs (trending and viral information) and its detailed view on the emotion quotients (EQs) and further exploration on the generated tweet text, also illustrated in Fig. 4 (Fig. 5).
Fig. 4. UI for sentiment and EQ polarity
Fig. 5. Estimated subjectivity and polarity
4 Performance Assessment and Evaluation 4.1 Data Settings The experimental setup includes software used Jypter notebook, Visual Studio and PyCharm. Twitter API is used for the real-time data extraction, at instance to be extracted 1 to 800 tweets for comparing 1 to 1600 tweets for analytics. There are several libraries, e.g. tweepy, re, text2emotion, textblob, pandas, numpy, matplotlib, sys, csv and Tkinter (for user interface). The hardware components includes, 2 PCs with specification as: AMD Ryzen 5 2500U processer with Radeon Gfx 2.00 GHz, RAM 8 GB and another with processor of Intel Core i5(8250U)CPU @ 1.60 GHz, Intel UHD graphics 620, 12 GB RAM). The user interface (UI) is designed using Python library Tkinter and statistics are using Python library Matplotlib for extracted tweet objects for a user request. The realtime extraction of tweets objects using API and further cleaned and stored into Pandas Data Frame. The no. of tweets for extraction is related to a user input, The experiments are conducted using the several input values, 1 to 1600 numbers of tweets extracted on real-time basis on the prototype. 4.2 Performance on Sentiments and Emotion Quotients (EQs) Estimation The performance evaluation of the designed system for the real-time data processing to estimate the underlying statistics, specific to system’s feasibility and its viability for justin-time decision making and analysis. The Query Length (QL), No of tweet (ToI), and Performance (processing time) are employed for the purpose. Here, QL as the dimension of the keywords/hashtasg, i.e. No. of characters in the user input keywords or hashtags. For a user input ‘Tokyo2020’, QL is 9. Similarly, ToI value directs system to extract at least these many recent tweets from API on in real-time basis, e.g. ‘Tokyo2020’ with ToI value 20 fetches recent 20 tweets at the time. Figure 6 illustrates the overall processing time, as ‘total time required for the preparation of both statistics on real-time’, as for each topic relevant tweets are extracted, based on priority based preference. A generic estimation of processing time usually increased with the higher no. of tweets, as higher no. of tweets harness increased coverage of SL and EQs. Figure 7 depicts the overall processing time for the user submitted query or selected viral topics on real-time data. The different size of user input, as query length (QL) is adapted for the assessment, with an aim to highlight the fact that as QL increased to a level affects the overall computational time. In a generic settings, to different viral/trending topics appearing, as to ensure the variable QL and observed the processing time patterns for at least recent 1000 tweets on each viral or trending topics.
Fig. 6. Overall time viral or trending topics analysis
Fig. 7. Overall QRT as ‘QL’
4.3 Accuracy on Sentiment and Emotional Quotient Generation

The accuracy of detecting the correct sentiment levels (SLs) is evaluated, where a sentiment level is formalized as 'the quantum of a sentiment contributed to the viral information, w.r.t. user contributed data', and each user-specified input is pivotal for the analysis. The performance statistics for the accuracy of the specified sentiment levels are listed in Table 1. A comparison over 500 ToIs yields an overall accuracy of 85% on the prediction of sentiment levels (positive, negative and neutral at 89%, 88.58% and 77.62%, respectively).

Table 1. Accuracy on estimation of each specified sentiment level
Actual \ Predicted | Positive | Negative | Neutral
Positive (181) | 143 | 14 | 24
Negative (177) | 54 | 86 | 37
Neutral (139) | 37 | 8 | 94
Total predicted | 234 | 108 | 155
Table 2 lists the overall accuracy statistics for the generation of emotion quotients (EQs), where an EQ is formalized as 'the emotion influx created by the viral information across the different fundamental emotions'. The designed algorithm predicts EQs with an overall accuracy of 87%, and with 89%, 85.88%, 85.66%, 89.88% and 88% for the emotions Happy, Sad, Fear, Surprise and Anger, respectively.
Table 2. Accuracy on estimation of each specified emotion
Actual \ Predicted | Happy | Sad | Fear | Surprise | Anger
Happy (33) | 22 | 5 | 2 | 2 | 2
Sad (40) | 15 | 6 | 8 | 6 | 5
Fear (18) | 6 | 2 | 7 | 0 | 3
Surprise (15) | 5 | 5 | 1 | 4 | 0
Anger (24) | 8 | 6 | 4 | 0 | 6
Total predicted | 56 | 24 | 22 | 12 | 16
4.4 Overall Retrieval Performance

The performance of the designed sentiment and EQ estimation on traditional retrieval metrics is a key indicator of its feasibility for decision making and related functions. The traditional metrics precision, recall and F-measure are adopted for the evaluation of system performance. Precision is used in its fundamental notion as a measure of 'precisely matched results to the user input', and recall as a measure of 'closely relevant results to the user query'; the F-measure is the harmonic mean of precision and recall. Table 3 lists these indicators when experimenting with varying degrees of user input (query length and ToI).

Table 3. Traditional retrieval metrics for sentiment level and emotion quotient estimation
Sentiment type | Precision | Recall | F-measure
Positive | 0.611 | 0.389 | 0.475
Negative | 0.796 | 0.224 | 0.351
Neutral | 0.606 | 0.240 | 0.344

Emotion type | Precision | Recall | F-measure
Happy | 0.393 | 0.259 | 0.312
Sad | 0.250 | 0.113 | 0.156
Fear | 0.318 | 0.108 | 0.161
Surprise | 0.333 | 0.048 | 0.084
Anger | 0.375 | 0.063 | 0.107
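For reference, per-class precision, recall and F-measure can be derived from a confusion matrix as sketched below; the matrix shown is the sentiment-level matrix of Table 1, and the computation (with the F-measure as the harmonic mean of precision and recall) is the standard definition rather than anything specific to the proposed system. Note that the recall values in Table 3 follow the retrieval-oriented notion given in the text and therefore need not coincide with the confusion-matrix recall computed here.

```python
# Sketch: per-class precision, recall and F-measure from a confusion matrix.
import numpy as np

labels = ["Positive", "Negative", "Neutral"]
# Rows = actual class, columns = predicted class (counts from Table 1).
cm = np.array([
    [143, 14, 24],
    [ 54, 86, 37],
    [ 37,  8, 94],
])

for i, name in enumerate(labels):
    tp = cm[i, i]
    precision = tp / cm[:, i].sum()   # correct predictions / all predictions of this class
    recall = tp / cm[i, :].sum()      # correct predictions / all actual cases of this class
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```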
5 Conclusion

Viral information over social media platforms triggers significant changes in emotion flux and may affect several decision-making scenarios. This paper proposed a novel
algorithm for the estimation of the sentiment and co-located emotion quotient of viral information in real time. Mining of textual content and emoticons is largely adopted for this purpose, together with an adaptive UI for the end user. The corpus of tweets and related fields is classified into the respective emotions using a lexicon-based approach and evaluated into statistics. The feasibility of these statistics is demonstrated in four use-cases, which also outline the key features of social media data for this purpose. Embedding a more interactive user interface is one future direction for the current algorithm. An intent model for estimating user interest and its correlation with currently trending or viral information on social media platforms is another tentative direction.
Maintaining Scalability in Blockchain
Anova Ajay Pandey1, Terrance Frederick Fernandez2, Rohit Bansal3, and Amit Kumar Tyagi1,4(B)
1 School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
[email protected]
2 Department of Computer Science and Engineering, Saveetha School of Engineering, Saveetha
Institute of Medical and Technical Sciences (SIMATS), Chennai, TN, India [email protected] 3 Department of Management Studies, Vaish College of Engineering, Rohtak, India 4 Centre for Advanced Data Science, Vellore Institute of Technology, Chennai, Tamilnadu, India
Abstract. Cryptocurrencies like Bitcoin and Litecoin, and even meme coins like Dogecoin, have developed fast. Blockchain, the technology that underpins these digital currencies, has drawn considerable interest from the academic community and the business community, including on Twitter. Aside from security and privacy, the blockchain has several other advantages. The performance of blockchain networks is measured by the average time it takes for a transaction to be validated and stored in each peer node so that it cannot be reversed or revoked. Even though this is referred to as "throughput", it should not be confused with the number of transactions processed at any given time. To put it another way, the ability of a blockchain network to handle more transactions and more nodes is called scalability. Yet scalability remains problematic as the platform reaches a larger base of users and a heavier transaction load. We will focus on scalability issues and ways to maintain scalability efficiently. We will also go through the research challenges and future work for blockchain. Researchers working on blockchain have therefore aimed to let the network's throughput grow, at least sub-linearly, as the size of the network increases; the resulting schemes are mostly referred to as scale-out blockchains. Sharding, the Lightning Network, Ethereum Plasma and Matic can all be considered scale-out solutions to the problem of blockchain scalability. Keywords: Blockchain · Ethereum · Bitcoin · Scalability · Decentralisation · Sharding · DApps
1 Introduction

Cryptocurrencies with blockchain as their underlying technology have gained prominence in recent years. Some emerging and significant industries are applying the technology in many areas, for example IoT and smart cities. Blockchain has many advantages, such as decentralization, security, anonymity and democracy. The increased attention has generated a lot of on-chain activity; however, as seen in the past few months, fees have become costly. The confirmation time of a transaction increases as Ethereum approaches its
limit of ~15 transactions per second (tps). Fees that are too high make it more difficult for people to use Ether (ETH) for simple payments or to run decentralized apps (DApps), some of which can only operate well if fees are kept low. As the chain grows in size, problems with decentralization also arise: as the hardware requirements for running an Ethereum full node rise, the network's ability to spread will be hampered and adoption will be more difficult. Few people will try to set up a node if it takes too long or requires expensive equipment. It is worth noting that ETH fees are metered in units called gas and paid in ETH. A simple payment requires less gas than a complex smart contract, because the latter requires a lot of computation; the higher the gas price offered, the more likely a transaction is to be confirmed quickly. The transparency of the blockchain comes from all transactions being publicly accessible. Transactions are tracked using anonymous public addresses, and the nodes' identities are hidden from the real world. Because the network's nodes make all of the decisions, the system is transparent and it is easy to find the sources of any errors. Automated transaction generation, data storage and decision making are all possible with smart contracts. Scalability is the main reason why blockchain is not yet used as a generic platform for a variety of applications and services. With a maximum throughput of 14 transactions per second, Ethereum outperforms Bitcoin, the first blockchain-based cryptocurrency, which can only handle about 3–4 transactions per second on average. Consistency is preserved by distributing the execution of consensus algorithms across multiple nodes. Integrity, confidentiality and authorization are the three pillars upon which the security of a blockchain system is built [1–3]. The trilemma of maintaining decentralisation, scalability and security (refer to Fig. 1) remains, but maintaining scalability is the main focus of this paper.
Fig. 1. The scalability trilemma
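As a rough, illustrative calculation of the gas mechanism described above (not taken from the paper), the cost of a transaction is the gas it consumes multiplied by the offered gas price; the figures below are hypothetical.

```python
# Sketch: illustrative Ethereum fee arithmetic.
GWEI_PER_ETH = 1e9

def tx_fee_eth(gas_used: int, gas_price_gwei: float) -> float:
    """Fee in ETH = gas consumed * gas price."""
    return gas_used * gas_price_gwei / GWEI_PER_ETH

# A plain ETH transfer consumes 21,000 gas; a contract call consumes more.
print(tx_fee_eth(21_000, 100))    # simple payment at 100 gwei  -> 0.0021 ETH
print(tx_fee_eth(200_000, 100))   # smart-contract interaction  -> 0.02 ETH
```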
Organization of Work • Section 2 discusses Literature Survey. • Section 3 discusses the methodology to be used for a more Scalable blockchain. • Section 4 discusses proposed solutions like incentivization and punishments for secure and decentralized chains. • Section 5 discusses the Results for various Scaling solutions. • Section 6 discusses the Conclusion and Future work for scalability and other techniques discussed in this paper and References.
2 Literature Survey

We have three main aspects of scalability: throughput, storage and networking. Many start-ups are coming up with solutions to this issue, like Polygon (Matic), Cardano and others. In the trilemma mentioned earlier, we can choose at most two of the three properties. The same applies to throughput, storage and networking: if we focus only on improving scalability, we need to compromise on the other two, and different combinations can be used as an application requires. Several technologies and solutions for scalable blockchain systems already exist.

Current Scalability Issues
One by one, we will discuss throughput, storage and networking.

Throughput
Throughput refers to the number of items moving through a system or process. The number of transactions per block and the time between blocks determine the throughput of a blockchain system. The throughput of the Bitcoin blockchain, for example, is about seven transactions per second; contrast this with the current VISA system, which processes an average of 2000 transactions per second. The block interval of the Bitcoin blockchain is 10 min, and the block size is limited to one megabyte. As a result, we must devise precise schemes to boost throughput.

Storage
The various devices that users employ generate large amounts of data when blockchain is applied to real-world business situations. In the current blockchain system, a node must store the complete transaction history back to the genesis block. Using blockchain in real-world settings is therefore challenging, because nodes have limited storage and processing power. The safekeeping of such a large amount of data on a distributed ledger is a matter of concern and should be investigated.

Networking
This factor also affects the scalability of blockchain systems [4–7]. Each node in the current blockchain system has a limited amount of resources. Due to the need for network bandwidth, the current mode of transmission cannot be scaled up to handle many transactions. Block propagation delays are exacerbated when nodes are informed of a transaction update twice. As a result, finding a better way to transmit data is critical.
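A back-of-the-envelope way to see where the throughput figures quoted above come from is to divide the block capacity by the average transaction size and the block interval; the sketch below assumes an average Bitcoin transaction of roughly 250 bytes, which is an illustrative figure rather than a value from this paper.

```python
# Sketch: theoretical throughput of a blockchain from block size and interval.
def throughput_tps(block_size_bytes: int, avg_tx_bytes: int, block_interval_s: float) -> float:
    txs_per_block = block_size_bytes / avg_tx_bytes
    return txs_per_block / block_interval_s

# Bitcoin-like parameters: 1 MB blocks every 10 minutes, ~250-byte transactions.
print(round(throughput_tps(1_000_000, 250, 600), 1))   # ~6.7 tps
# Doubling the block size (or halving transaction size, as SegWit effectively does)
# roughly doubles the theoretical throughput.
print(round(throughput_tps(2_000_000, 250, 600), 1))   # ~13.3 tps
```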
3 Methodology

All three aspects can be broken down into two categories: throughput, which concerns the number of transactions per block and the time between blocks, and storage, which refers to data storage. The technologies discussed below are ways to increase and understand scalability.

Increasing the Block Size
It is possible to increase the block size to boost throughput, since a larger block carries more transactions; however, some nodes must then work harder to process and confirm those transactions.

Reducing the Size of Transactions
Reducing the size of each transaction increases the number of transactions that fit in a block. Digital signatures, which are used to verify the authenticity of transactions, account for roughly 60–70% of transaction data. SegWit segregates the digital signatures from the rest of the transaction data and pushes them to the end of the blocks. As a result, the effective transaction size is reduced and each block contains a larger number of transactions; in other words, the amount of useful data in a single block increases.

Reducing the Number of Transactions Processed by Nodes
Decoupling control and management from execution via off-chain transactions is another solution.

Off-chain Transactions
Off-chain transactions are those that take place on a cryptocurrency network outside of the blockchain. There is growing interest in off-chain transactions, particularly among prominent participants, due to their low or zero cost. Transactions that require multiple signatures can be processed more quickly by creating off-chain micropayment channels. The Lightning Network and Duplex Micropayment Channels are examples of off-chain approaches whose settlement is still anchored on the blockchain; they differ in many ways. The Lightning Network sends a small amount of data to the blockchain every time a micropayment channel is updated, whereas Duplex Micropayment Channels rely on incremental updates of the initially committed funds.

Sharding
Sharding is a technique for partitioning an extensive data set and is a well-known notion in software engineering. Every shard holds a fraction of the total transactions. Creating new chains, referred to as 'shards', reduces network congestion and increases the number of transactions processed per second (refer to Fig. 3). Beyond flexibility, this is significant because each shard runs the consensus algorithm over a smaller set of transactions, which helps the sharded nodes reach agreement. The throughput of sharded blockchain frameworks increases linearly as more shards are added. Examples of sharding blockchain frameworks include Plasma and Polkadot.
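A minimal sketch of the sharding idea described above: transactions are deterministically assigned to one of several shards (for example by hashing the sender's account), so each shard's nodes only validate a fraction of the total load. The assignment rule below is a common textbook choice, not the scheme used by Plasma, Polkadot or Matic.

```python
# Sketch: deterministic assignment of transactions to shards.
import hashlib

NUM_SHARDS = 4

def shard_of(account: str) -> int:
    """Map an account to a shard by hashing its address."""
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

txs = [("0xAlice", "0xBob", 5), ("0xCarol", "0xDave", 2), ("0xErin", "0xFrank", 9)]
shards = {i: [] for i in range(NUM_SHARDS)}
for sender, receiver, amount in txs:
    shards[shard_of(sender)].append((sender, receiver, amount))

for shard_id, batch in shards.items():
    print(f"shard {shard_id}: {len(batch)} transaction(s)")  # each shard validates only its batch
```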
The Matic Network currently employs Plasma. Using a block producer layer to generate blocks solves the problem of poor transaction throughput in the Matic Network; the framework can produce blocks at a fast rate thanks to the block producers. Decentralization is ensured through PoS checkpoints submitted to the main Ethereum chain. Matic is thus able to handle 2^16 transactions on a single side chain [7–9].

Decoupling the Management/Control from Execution
Meeting the quality and administration requirements of services and applications by decoupling management/control from the execution of smart contracts as code can be achieved through virtualization. Unlike most existing DLT frameworks, which do not distinguish between different services and applications, such a DLT explicitly considers the QoS requirements of different services. In particular, services and applications are grouped into multiple classes consistent with their QoS requirements, including confirmation latency, throughput, cost, security, privacy, and so on. This is a paradigm shift from the current blockchain-oriented DLT frameworks to next-generation service-oriented DLT frameworks.

Enabling Technologies Related to the Block Time Frame
Transaction serialization means that the elected leader nodes validate transactions and create new blocks. To limit collisions in leader elections, the leader nodes are elected periodically. In conventional blockchain frameworks, every leader election produces a single new block. To lessen the impact of the block time on throughput, slow leader election and fast transaction serialization must be decoupled. Numerous technologies have adopted this idea, and they fall into three categories according to their leader election mechanisms.
Fixed Leaders: Hyperledger Fabric assigns a selected group of leader nodes that run the PBFT consensus protocol to validate transactions and agree on new blocks. Micro-blocks containing transactions are created by the elected leader at a fast rate; between two key blocks, the elected leader can produce several micro-blocks.
Collective Leaders: to reduce the confirmation time of the blockchain framework, the leader election is changed into a committee election. A group of leaders is elected to validate transactions and confirm blocks while preserving the framework's decentralization. Byzantine consensus algorithms enable fast transaction confirmation; others delegate the validation of the transactions. Within the committee, members' voting power corresponds to the number of their consensus-group shares. In this manner, the committee members in ByzCoin are progressively changed: when a node discovers a PoW solution, a reconfiguration event is triggered; the committee then decides whether to add the new member and, if it is added, the oldest member is removed from the committee.

Technologies for Data Storage
Distributed systems and existing data storage are combined to increase capacity, typically using Distributed Hash Tables (DHTs). An off-chain DHT
is used to store the raw data, while the blockchain keeps only references to the information. The references are SHA-256 hashes. DHT and IPFS will be integrated with the blockchain as part of a larger plan to address the capacity challenge. Off-chain storage solutions can hold a lot of data, but they are not as durable as on-chain storage. Blockchain nodes cannot directly process the off-chain data, and off-chain storage solutions complicate transaction verification [10–15]: while verifying transactions, blockchain nodes may have to request information about the transactions from the off-chain storage systems.

Technologies for Data Transmission
Sending all the information about every transaction that occurred increases the demand on the network's bandwidth resources; here we examine a few technologies and their proposed answers to this issue.
RINA: Cardano (which works on an Ethereum-based framework) adopts the Recursive InterNetwork Architecture, a new technology to disseminate transaction data. RINA provides a secure and programmable environment to propagate data efficiently.
Fiber: the fast block relay network for the Bitcoin blockchain. There are currently six Fiber nodes, distributed strategically around the globe. Miners can connect to Fiber nodes to both send and receive blocks in a hub-and-spoke model, reducing the amount of data the blockchain network has to handle. It may be possible to reduce the amount of information disseminated by exchanging information only once. In general, the aim is to take advantage of the fact that blockchain nodes all have access to the same transaction data. Xtreme Thinblocks are similar to compact blocks: to transmit transaction hashes, a Bloom filter is employed as an additional mathematical strategy. Nodes can use Bloom filters to decide whether missing transactions in other nodes' memory pools should be requested. When a block is generated, the missing transactions are transmitted in addition to the block header and the hashes of the transactions.
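The off-chain storage pattern described above (raw data in a DHT or IPFS, only a SHA-256 reference on chain) can be sketched as follows; the two in-memory dictionaries stand in for the DHT and the ledger and are purely illustrative.

```python
# Sketch: keep raw data off-chain, store only its SHA-256 reference on-chain.
import hashlib

off_chain_store = {}   # stands in for a DHT / IPFS node
on_chain_refs = []     # stands in for references recorded in blocks

def store_off_chain(data: bytes) -> str:
    ref = hashlib.sha256(data).hexdigest()
    off_chain_store[ref] = data
    on_chain_refs.append(ref)          # only the 32-byte hash goes on-chain
    return ref

def verify(ref: str) -> bool:
    """A node fetches the off-chain data and checks it against the on-chain hash."""
    data = off_chain_store.get(ref)
    return data is not None and hashlib.sha256(data).hexdigest() == ref

ref = store_off_chain(b"large sensor payload ...")
print(ref, verify(ref))
```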
4 Proposed Solutions and Techniques to Further Improve Scalability

A viable strategy for improving throughput is to decouple leader election from transaction serialization. Blocks are produced rapidly by the elected leaders using Byzantine consensus protocols, under the assumption that at most one third of the nodes are faulty while the remainder execute correctly. Some blockchain frameworks select leaders based on computation-intensive PoW, which is not an energy-efficient approach and consumes plenty of power.

Incentive and Punishment Mechanisms
Nodes are self-interested; therefore, incentive mechanisms are essential to motivate nodes to contribute their efforts to validating the data. Miners verify the data and execute the transactions. Mining is the process of creating new bitcoin by solving a computational puzzle; it is essential for maintaining the ledger of transactions on which
a cryptocurrency like Bitcoin is predicated. Miners have become extremely sophisticated in recent years, utilizing complex hardware to accelerate mining tasks. Transaction fees and currency issuance are the two standard incentive and punishment mechanisms. Considering the Bitcoin blockchain framework, when a miner successfully produces a block, it earns 6.25 new bitcoins. The division of the issued currency and transaction fees among these leaders must be planned carefully. To prevent double-spending attacks and punish malicious leaders, punishment mechanisms must be adopted. Confirmation time is often used as a strategy: incentives may be released only after the confirmation time has passed, and if any invalid or double-spending transaction is detected, no incentives are given or transferred. This mechanism is important, as a reasonable amount of deposit acts as a reward for honest miners and as a punishment, or a loss, for malicious attackers. Setting the incentive too high may lead to centralization, because it would be unreasonable for a single leader to form all new blocks; hence, an appropriate design of the incentives will be both beneficial and safe.

Consensus and Verification
The speed of consensus is also a factor in how well a blockchain system works. The difficulty of a PoW block only rises as the number of transactions increases, implying that it will take longer and require more resources to process a transaction; in addition, PoW is unsustainable. Scaling solutions are needed that address the issue without increasing block sizes or introducing other measures that may interfere with the technology's capacity for decentralization and high levels of security. Layer 1 blockchain solutions reinforce the base protocols, such as Bitcoin's PoW, by changing the way they process data [15–20]. As its consensus algorithm, Ethereum is adopting proof of stake (PoS); this new validation method allows for faster transactions and better utilization of energy. Another layer 1 scaling solution discussed above is sharding, which breaks authenticating and validating transactions into smaller pieces, so that a wider range of nodes can participate in the peer-to-peer (P2P) network. All of these speed up block execution. Blockchains can be scaled in more ways than just layer 1 solutions: layer 2 scaling solutions require an additional protocol built on top of blockchains such as Ethereum and Bitcoin, while the core decentralization and security features of the underlying blockchain are maintained. Ethereum 2.0 refers to a PoS-based system with support for sharding and other scalability features that is intended to replace the current Ethereum network. The scalability of Ethereum will improve due to these changes, allowing it to compete with other leading blockchains, and those who put money into the platform can earn rewards by staking their coins in return for their validation efforts.
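A simplified sketch of the incentive and punishment idea discussed above: the block reward plus transaction fees is held back until the confirmation period passes, and is forfeited if an invalid or double-spending transaction is detected. The numbers and the settlement logic are illustrative assumptions, not the rules of any specific chain.

```python
# Sketch: deferred miner reward with a penalty for invalid blocks.
BLOCK_SUBSIDY = 6.25   # new coins per block (Bitcoin's current subsidy)

def settle_reward(tx_fees: float, confirmations: int,
                  required_confirmations: int, block_is_valid: bool) -> float:
    """Release subsidy + fees only after enough confirmations; withhold otherwise."""
    if not block_is_valid:
        return 0.0                         # punishment: no reward, deposit lost
    if confirmations < required_confirmations:
        return 0.0                         # reward still withheld
    return BLOCK_SUBSIDY + tx_fees

print(settle_reward(0.35, confirmations=6, required_confirmations=6, block_is_valid=True))   # 6.6
print(settle_reward(0.35, confirmations=6, required_confirmations=6, block_is_valid=False))  # 0.0
```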
5 Results for Improving Scaling in Blockchain

As a separate blockchain linked to the Ethereum mainnet, a Plasma chain can use fraud proofs to arbitrate disputes, much like Optimistic rollups. These chains are often called 'child' chains because they are essentially smaller copies of the Ethereum mainnet. Child chains built on Merkle trees can be stacked on top of each other almost without limit, allowing a constant bandwidth flow from the parent chains (including the mainnet). Each child chain has its own mechanism for verifying the validity of blocks against the parent chain. Ethereum's Plasma layer 2 solution employs these 'child' or 'secondary' blockchains to help confirm the chain. In many ways, Plasma chains are similar to Polkadot's smart contracts; to offload the work and improve scalability, they are organized in a hierarchical structure.

Scaling Blockchain
It is possible to scale the base layer by offloading or sharing work with a second level. For most transactions, layer 1 serves as the primary consensus layer. Layer 2 is built on top of layer 1 using only smart contracts and does not require any changes to layer 1. The base layer can handle about 15 transactions per second, but layer 2 scaling can increase that to 2000 to 4000 per second. Ethereum developers are currently working on Ethereum 2.0, which uses proof of stake and sharding to improve transaction throughput on the base layer. To handle more transactions in the future, we will need layer 2 scaling. It is clear from Fig. 2 that scaling blockchain-based systems is possible.
Fig. 2. Division of technologies related to maintaining and improving scalability
Fig. 3. Sharding example
Scalability cannot be sacrificed for the sake of security and decentralization. Increasing the capabilities of layer 2 scaling, like transaction speed and throughput, improves off-chain capabilities and is a great way to save on gas fees. Channels are one of the most widely discussed scaling solutions, and some are payment-specific. Channels allow users to perform multiple transactions off-chain while submitting only two transactions to the base layer. State channels and their subtype, payment channels, are the most popular, but they do not allow open participation: users need to be known, and multi-sig contracts must be in place to secure their funds. The Lightning Network makes extensive use of payment channels. Joseph Poon and Vitalik Buterin proposed Plasma, an Ethereum development framework. With smart contracts and Merkle trees, Plasma places no upper limit on the number of child chains created. By offloading transactions from the main chain, transactions can be transferred quickly and cheaply between child chains; however, Plasma requires a longer wait time before funds can be withdrawn, and neither Plasma nor channels can be used to scale general-purpose smart contracts. Matic makes use of Plasma side-chains and a Proof-of-Stake network to deliver Ethereum transactions that are scalable, quick and secure; a modified version of the Plasma framework is employed. 'Side-chains' are blockchains with their own consensus models and block parameters that remain compatible with Ethereum, so side-chain contracts and Ethereum-based contracts can be deployed to the Ethereum virtual machine. A side chain can take many forms; XDY is just one of many.

Rollups
A cryptographic proof, a SNARK (succinct non-interactive argument of knowledge), is created and submitted to the base layer to maintain and improve scaling. Rollups handle all transaction state and execution off the main chain; the Ethereum chain is used mainly for storing and exchanging data. Optimistic and ZK rollups are the two main types of rollups. Optimistic rollups are slower, while ZK rollups are faster and more effective. Optimistic rollups provide a straightforward path for general-purpose smart contracts to move to layer 2. The virtual machines run by Optimistic rollups are called 'OVM', or Optimistic Virtual Machines, because they mirror the execution of smart contracts normally carried out on Ethereum. This makes general smart contracts easier and faster to handle and preserves their composability, which is highly relevant in decentralized finance, where significant smart-contract arrangements were previously struggling on this point. Optimism, which is getting closer and closer to its mainnet launch, is probably the most prominent project built on optimistic rollups.
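A minimal sketch of the rollup idea just described: many transactions are executed off-chain and only a single compact commitment (here a Merkle root) is posted to the base layer. Real rollups additionally post compressed transaction data and a fraud or validity proof; this sketch only illustrates the batching and commitment step.

```python
# Sketch: batch transactions off-chain and commit a single Merkle root on-chain.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                       # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

batch = [f"tx{i}: A->B {i} wei".encode() for i in range(1000)]
commitment = merkle_root(batch)
print(len(batch), "off-chain transactions ->", commitment.hex(), "(one on-chain commitment)")
```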
When ZK rollups are included, decentralized exchanges based on layer 2 rollups broaden further; there is also zkSync, which makes it possible to use crypto payments more flexibly. Ethereum 2.0 also has the potential to enhance adaptability: scalability of the data layer is all that rollups require, and Ethereum 2.0 phase 1, which deals with data sharding, will give them an incredible boost. Regardless of the availability of a variety of layer 2 setups, the Ethereum community appears unified in its approach of scaling through rollups, and Ethereum 2.0 phase 1 data sharding has also been confirmed; a recent post by Vitalik Buterin outlines a rollup-centric Ethereum roadmap. There are many ways to scale decentralized finance so that it is more accessible to the general public. Finally, researchers are encouraged to refer to articles [20–28] to learn about emerging technologies and the role and importance of blockchain in these emerging technologies.
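To complement the rollup sketch above, the state and payment channels discussed earlier in this section can be illustrated with a toy channel object: opening and closing are the only on-chain transactions, while an arbitrary number of balance updates happen off-chain. This is an illustrative sketch, not the Lightning Network protocol.

```python
# Sketch: a toy payment channel -- two on-chain transactions bracket many off-chain updates.
class PaymentChannel:
    def __init__(self, deposit_a: float, deposit_b: float):
        self.balances = {"A": deposit_a, "B": deposit_b}
        self.on_chain_txs = 1            # funding (open) transaction
        self.off_chain_updates = 0

    def pay(self, sender: str, receiver: str, amount: float) -> None:
        assert self.balances[sender] >= amount, "insufficient channel balance"
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.off_chain_updates += 1      # signed off-chain, nothing hits the base layer

    def close(self) -> dict:
        self.on_chain_txs += 1           # settlement (close) transaction
        return self.balances

channel = PaymentChannel(deposit_a=1.0, deposit_b=1.0)
for _ in range(500):
    channel.pay("A", "B", 0.001)
print(channel.close(), channel.on_chain_txs, "on-chain txs,",
      channel.off_chain_updates, "off-chain updates")
```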
6 Conclusion and Future Work

The various benefits of blockchain technology will, without a doubt, attract organizations and businesses worldwide to invest more than they do now. The technology is still in its initial phase, and as one of the most recent technologies it will take much longer to gain recognition, which requires patience. The rise of Ethereum opened up various possibilities for blockchain: it enabled smart contracts, making it possible to support much more complex use cases than Bitcoin and to build and execute computer programs on the blockchain. The advantages of blockchain are hard to ignore, and the technology will help various industries, because the verification of each piece of information that goes into and through these blockchain systems will prevent many adversities. This will continue to be a topic of discussion in the near future as blockchain technology is adopted and used in a wide range of applications. The performance of Blockchain 3.0 networks has improved dramatically, but they have yet to be widely adopted. When looking for a high-performance blockchain platform that can handle thousands of transactions per second, we should first make sure that the intended use case is compatible with it.

Acknowledgment. The authors want to thank the Centre for Advanced Data Science and the School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, for providing their kind support to complete this research work on time.
References 1. Tschorsch, F., Scheuermann, B.: Bitcoin and beyond: a technical survey on decentralized digital currencies. IEEE Commun. Surv. Tutor. 18(3), 2084–2123 (2016) 2. Sompolinsky, Y., Zohar, A.: Accelerating bitcoin’s transaction processing fast money grows on trees, not chains. International Association for Cryptologic Research (2013) 3. Eyal, I., Gencer, A.E., Sirer, E.G., van Renesse, R.: Bitcoin-NG: a scalable blockchain protocol. In: USENIX the Advanced Computing Systems Association (2016) 4. Creighton, R.: Domus Tower Blockchain. Domus Tower Inc. (DRAFT) (2016)
5. Bellare, M., Rogaway, P.: Random oracles are practical: a paradigm for designing efficient protocols. In: Proceedings of the 1st ACM Conference on Computer and Communications Security (1993) 6. Bitcoin community. Bitcoin source, March 2015. https://github.com/bitcoin/bitcoin 7. Eyal, I., Birman, K., van Renesse, R.: Cache serializability: reducing inconsistency in edge transactions. In: 35th IEEE International Conference on Distributed Computing Systems, ICDCS (2015) 8. Bitcoin community, Protocol rules, September 2013. https://en.bitcoin.it/wiki/Protocol_rules 9. Vukoli´c, M.: The quest for scalable blockchain fabric: proof-of-work vs. BFT replication. In: Camenisch, J., Kesdo˘gan, D. (eds.) iNetSec 2015. LNCS, vol. 9591, pp. 112–125. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39028-4_9 10. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. Consulted 1(2012), 28 (2008) 11. Blake, I.F., Seroussi, G., Smart, N.P.: Advances in Elliptic Curve Cryptography, vol. 317. Cambridge University Press, Cambridge (2005) 12. Miller, A., Jansen, R.: Shadow, bitcoin: scalable simulation via direct execution of multithreaded applications. IACR Cryptology ePrint Archive (2015) 13. Miller, A., LaViola, J.J., Jr.: Anonymous byzantine consensus from moderately-hard puzzles: a model for Bitcoin (2009). https://socrates1024.s3.amazonaws.com/consensus.pdf 14. Sompolinsky, Y., Zohar, A.: Secure high-rate transaction processing in Bitcoin. In: Böhme, R., Okamoto, T. (eds.) FC 2015. LNCS, vol. 8975, pp. 507–527. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-47854-7_32 15. Tschorsch, F., Scheuermann, B.: Bitcoin and beyond: a technical survey on decentralized digital currencies. IEEE Commun. Surv. Tutor. 18(3), 2084–2123 (2016) 16. Wood, G.: Ethereum: a secure decentralized transaction ledger. http://gavwood.com/paper. pdf 17. Kreuter, B., Mood, B., Shelat, A., Butler, K.: PCF: a portable circuit format for scalable two-party secure computation. In: Security (2013) 18. Buterin, V.: Ethereum: a next-generation smart contract and decentralized application platform (2013). http://ethereum.org/ethereum.html 19. Dwork, C., Naor, M.: Pricing via processing or combatting junk mail. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 139–147. Springer, Heidelberg (1993). https://doi.org/ 10.1007/3-540-48071-4_10 20. Kumari, S., Tyagi, A.K., Aswathy S.U.: The future of edge computing with blockchain technology: possibility of threats, opportunities and challenges. In: Recent Trends in Blockchain for Information Systems Security and Privacy, CRC Press (2021) 21. Varsha, R., Nair, S.M., Tyagi, A.K., Aswathy, S.U., RadhaKrishnan, R.: The future with advanced analytics: a sequential analysis of the disruptive technology’s scope. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, T.P. (eds.) Hybrid Intelligent Systems. HIS 2020. Advances in Intelligent Systems and Computing, vol. 1375. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-73050-5_56 22. Tyagi, A.K., Nair, M.M., Niladhuri, S., Abraham, A.: Security, privacy research issues in various computing platforms: a survey and the road ahead. J. Inf. Assur. Secur. 15(1), 1–16 (2020) 23. Madhav, A.V.S., Tyagi, A.K.: The world with future technologies (Post-COVID-19): open issues, challenges, and the road ahead. In: Tyagi, A.K., Abraham, A., Kaklauskas, A. (eds.) Intelligent Interactive Multimedia Systems for e-Healthcare Applications. Springer, Singapore. 
https://doi.org/10.1007/978-981-16-6542-4_22 24. Tyagi, A.K.: Analysis of security and privacy aspects of blockchain technologies from smart era’ perspective: the challenges and a way forward. In Recent Trends in Blockchain for Information Systems Security and Privacy, CRC Press (2021)
25. Tyagi, A.K., Rekha, G., Kumari, S.: Applications of blockchain technologies in digital forensic and threat hunting. In: Recent Trends in Blockchain for Information Systems Security and Privacy, CRC Press (2021) 26. Tibrewal, I., Srivastava, M., Tyagi, A.K.: Blockchain technology for securing cyberinfrastructure and internet of things networks. In: Tyagi, A.K., Abraham, A., Kaklauskas, A. (eds.) Intelligent Interactive Multimedia Systems for e-Healthcare Applications. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-6542-4_1 27. Tyagi, A.K., Aswathy, S.U., Aghila, G., Sreenath, N.: AARIN: affordable, accurate, reliable and innovative mechanism to protect a medical cyber-physical system using blockchain technology. IJIN 2, 175–183 (2021) 28. Mishra, S., Tyagi, A.K.: Intrusion detection in internet of things (IoTs) based applications using blockchain technolgy. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 123–128 (2019). https://doi.org/10. 1109/I-SMAC47947.2019.9032557
Thoracic Disease Chest Radiographic Image Dataset: A Comprehensive Review Priyanka Malhotra1 , Sheifali Gupta1 , Atef Zaguia2(B) , and Deepika Koundal3 1 Chitkara University Institute of Engineering and Technology, Chitkara University,
Punjab, India 2 Computer Sciences Department, College of CIT, Taif University, P.O. Box 11099,
Taif 21944, Saudi Arabia [email protected] 3 Department of Virtualization, School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
Abstract. A computer-aided diagnosis (CAD) system uses an algorithm to analyze a medical image and interpret the abnormality in the image. CAD provides assistance to radiologists/doctors in assessing and categorizing the pathology in images. A broad range of algorithms, such as image processing and Artificial Neural Network (ANN) based Machine Learning (ML) and Deep Learning (DL), have been employed in this field. Systems based on ML and DL require a huge amount of data for training the model. Researchers work to collect and develop medical image databases for training ML and DL based models, and the datasets are released publicly to foster research and collaboration in the field of medical image processing. This paper discusses the various chest radiographic imaging datasets and the challenges involved in reading them. A detailed analysis of different publicly available chest radiographic datasets for thoracic pathologies is provided. The paper also discusses the various pitfalls and challenges involved in the datasets. It is hoped that using these datasets for training deep learning algorithms will lead to advancement in the field of CAD based thoracic disease diagnosis and hence help to improve the health care system. Keywords: Chest radiograph dataset · Thoracic diseases · Deep learning
1 Introduction

Disorders of the thorax area in human beings can lead to many serious health problems. The thorax is the area of the human body containing organs such as the lungs, heart, esophagus, chest wall, diaphragm, trachea and pleural cavity; Fig. 1 shows the internal parts of the thorax, or chest. A disorder affecting the organs of the thorax is called a thoracic disease. It includes conditions of the lungs, airways, heart, esophagus, chest wall, diaphragm and other thorax organs. The different thoracic diseases include asthma, chronic obstructive pulmonary disease (COPD), chronic bronchitis, emphysema, cystic
fibrosis, pneumonia, COVID-19, tuberculosis, pulmonary edema, lung cancer, acute respiratory distress syndrome (ARDS), interstitial lung disease (ILD), pneumothorax, mesothelioma, pulmonary embolism (PE) and pulmonary hypertension.
Fig. 1. Thorax organs [1]
The well-timed diagnosis of thoracic diseases is crucial for medication and prognosis. Different modalities are available for imaging primary chest/thoracic diseases, which include chest radiographic imaging (CXR), computed tomography (CT), perfusion lung scanning, positron emission tomography (PET) and magnetic resonance imaging (MRI). Chest radiographic imaging is at present the widely accepted method for screening and diagnosis of chest diseases. The key contributions of this work are:
• The paper provides a detailed analysis of different publicly released chest radiographic image datasets for thoracic pathologies.
• The paper also discusses the various pitfalls and challenges involved in the datasets.
This paper is organized as follows: Sect. 2 explains chest radiograph imaging types, advantages and challenges involved in reading CXR images. Section 3 explores the various available chest radiograph image datasets. Section 4 discusses pitfalls and challenges involved in the various datasets. Section 5 concludes the paper.
2 Chest Radiograph Imaging

A chest radiograph is a photographic image depicting the internal composition of the chest; it is produced by passing X-rays through the chest, which are absorbed in different amounts by the different structures in the chest. The different types of CXRs include the posterior–anterior (PA), anterior–posterior (AP) and lateral views. PA is a frontal chest projection in which the X-rays travel from the posterior to the anterior part of the chest before striking the X-ray film; in the AP view the patient stands with their back against the film and the X-rays traverse from anterior to posterior; the lateral image is taken by placing one shoulder on the plate and raising the hands over the head. Figure 2 shows a few thoracic abnormalities on CXR images. Chest radiographic imaging is preferred and advantageous because [2]: it is a painless procedure; it is a non-invasive test, as no break is created in the skin; there is no radiation left
Fig. 2. Images from NIH chestXray8 with different abnormalities [10]
back in the body after the procedure; and this imaging technique produces images of internal body parts such as the lungs, airways, heart, blood vessels, ribs and spine. CXR images contain different objects such as bones, fluids, tissues and air. In CXR images, air is represented by black, bones and other solid structures by white, and tissues or fluids by gray. Reading these CXRs is difficult for radiologists. Chest radiographic images contain a huge amount of information regarding the patient's health, and the correct interpretation of this information is a challenging task because [3]: a large number of variable objects are present; the patterns of different types of thoracic diseases have highly diverse appearances, sizes and locations on chest X-ray images; areas of the chest X-ray containing lesions may be small compared to the complete chest X-ray; and a varying posture of the patient while capturing the X-ray image can create distortion or misalignment. The reading of chest radiographs must be done by expert radiologists, and there is a shortage of expert radiologists [4]. A computer-aided diagnosis (CAD) system provides assistance to radiologists/doctors in assessing and categorizing the chest pathology in CXR images. A CAD system is trained to work on radiographic images using a repository of data. In the medical domain, a digital image dataset repository should contain a large dataset with reliable annotations for designing CAD systems with high accuracy.
3 Chest Radiograph Image Databases

A digital image database is important for research in digital imaging, image processing techniques and computer-aided diagnosis of diseases. The accumulation of an appropriate volume of data for the construction of a dataset is crucial for any image classification system based on machine learning or deep learning. The basic requirements of any clinical digital image database include: the presence of a large number of images, images of good quality for diagnosis, a ground truth established by experts, and images with different details of the abnormality involved. The establishment of an appropriate dataset requires time, expertise in the specific domain to select the required information, and sufficient infrastructure for capturing and storing data [5]. The collection of data is laborious and challenging. A number of researchers have worked to develop digital image databases of chest radiographic images, which are utilized to train deep learning/machine learning based algorithms. The various publicly available chest X-ray datasets used for designing deep learning algorithms are discussed next.
JSRT Dataset - The Japanese Society of Radiological Technology (JSRT) [6, 7] released a dataset containing chest radiographic images from 14 different medical centers located in Japan and the US. The database contains 247 PA chest images, with 154 images containing a nodule; of these 154 nodules, 100 are malignant and the others benign. Table 1 presents the specifications of the CXR images in the JSRT dataset. The database includes a broad range of lung nodules with different levels of subtlety.

Table 1. JSRT dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
247 | PA | 154 | 93 | 247 | PNG | 2048 × 2048 | 1 (Lung nodule)
Montgomery County Dataset - The dataset acquired X-ray images from the Department of Health, Montgomery County, USA. The dataset [8] contains 138 PA images, of which 58 images indicate tuberculosis and the rest are normal. The lung areas in the images were manually segmented (see Table 2).

Table 2. Montgomery county dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
138 | Frontal, PA | 58 | 80 | 138 | PNG | 4020 × 4892 | 1 (Tuberculosis)
Shenzhen Dataset - The chest X-ray images for this dataset were collected from Shenzhen Hospital, China [8]. The dataset contains 662 frontal CXR images, of which 336 cases are TB and 326 normal. The images exist in JPEG format, with an image size of approximately 3K × 3K pixels. The dataset contains a text file with information regarding the age, gender and lung abnormality of the patient (Table 3).

Table 3. Shenzhen dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
662 | PA | 326 | 336 | 662 | JPEG | 3K × 3K | 1 (Tuberculosis)
Open-I Indiana - The dataset contains frontal and lateral chest X-ray images collected from Indiana University [9]. The dataset has 8,121 images with annotations for 10 important diseases such as cardiomegaly, pulmonary edema, calcification and pleural effusion. The dataset contains both frontal and lateral projections (Table 4).

Table 4. Open-I Indiana dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
8,121 | PA, lateral | 4034 | 3087 | 4000 | DICOM | variable | 10
NIH ChestXray8 - The NIH Clinical Center [10] released this chest X-ray dataset containing 108,948 images with eight different thoracic pathologies, whose keywords were identified using an NLP technique (Table 5).

Table 5. NIH ChestXray8 dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
108948 | PA, AP | 24636 | 84312 | 32717 | PNG | 1024 × 1024 | 8
NIH ChestXray14 - The NIH ChestXray14 dataset [11] is an extension of the ChestXray8 dataset and contains 112,120 chest X-ray images. It includes 14 pathologies: atelectasis, consolidation, infiltration, pneumothorax, edema, emphysema, fibrosis, effusion, pneumonia, pleural thickening, cardiomegaly, nodule, mass and hernia. The pathology labels were created using NLP, with the disease class text-mined from the reports. Disease localization information is also present in the dataset (Table 6).

Table 6. NIH ChestXray14 dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
112120 | PA, AP | 51708 | 60412 | 30805 | PNG | 1024 × 1024 | 14
CheXpert - The dataset [12] contains chest X-ray images of around 65,240 patients of Stanford Hospital depicting 14 different pathologies. The pathologies were detected automatically as well as annotated by expert radiologists, with 500 images annotated by 8 board-certified radiologists (see Table 7). The data is available in DICOM format, storing the image together with associated patient metadata such as sex, age and patient id.

Table 7. CheXpert dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
224316 | PA, AP | 171014 | 16627 | 65240 | DICOM | 1024 × 1024 | 14
Padchest - The dataset [14] includes 160,868 images obtained from Hospital San Juan, Spain, which were interpreted by radiologists. The CXR images contain six different positional views and cover the complete spectrum of thoracic pathologies. Out of the 160,868 images, 39,039 were manually labeled and 121,829 were automatically labeled (Table 8).

Table 8. Padchest dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
160868 | PA, AP, lateral | 79836 | 81032 | 69882 | DICOM | variable | 27
MIMIC-CXR - The Medical Information Mart for Intensive Care dataset [15] contains 473,084 CXR images with 13 different pathologies, corresponding to radiographic studies performed at a medical center in Boston (see Table 9).

Table 9. MIMIC-CXR dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
473,064 | PA, AP, lateral | 272,927 | 200,137 | 63,478 | DICOM | variable | 13
SIIM-ACR - The SIIM-ACR chest X-ray dataset [16], released on Kaggle, contains CXR images with and without pneumothorax. Around 22% of the images [17] contain pneumothorax (see Table 10). The pneumothorax regions in the images were segmented using an NLP algorithm.

Table 10. SIIM-ACR dataset specifications
No. of images | Type of images | Images with abnormality | Normal images | No. of patients scanned | Image format | Image size | No. of labels
12089 | PA, AP | 2660 | 9429 | – | DICOM | 1024 × 1024 | 1 (pneumothorax)
ChestX-Det10 - The ChestX-Det10 dataset [18] contains instance-level annotations on CXR images. The dataset is a subset of the ChestXray14 dataset with box annotations by board-certified radiologists for 10 different categories of disease. The dataset specifications are given in Table 11.

Table 11. ChestX-det10 dataset specifications
No. of images | Type of images | Images with pathology | Normal images | No. of patients scanned | Image format | Image size | No. of labels
3,543 | PA, AP, lateral | 2779 | 764 | 69,882 | PNG | 1024 × 1024 | 10
VinDr-CXR - The VinDr-CXR dataset [19] contains 18,000 PA view CXR images with both the localization and the classification information of thoracic diseases. The images are divided into a training set of 15,000 scans and a test set of 3,000 scans. Each image in the training set was labeled by 3 radiologists, while the annotation of each test image was obtained from a team of 5 radiologists (Table 12).

Table 12. VinDr-CXR dataset specifications
No. of images | Type of images | Images with pathology | Normal images | No. of patients scanned | Image format | Image size | No. of labels
18,000 | PA, AP | 7394 | 10606 | – | PNG | 2788 × 2788, 2748 × 2494 | 27
4 Discussion and Challenges

The different chest X-ray datasets discussed in Sect. 3 have been employed for designing AI models for computer-aided detection of various thoracic diseases. The ChestXray8 dataset was employed in [20] for training an Inception V3 CNN model for the classification of pneumonia in pediatric patients. The authors in [10] employed AlexNet, GoogleNet, VGG, ResNet50 and DenseNet121 models for disease classification on the ChestXray8 dataset. A ResNet-50 architecture was implemented by [21], which created class activation maps for disease conditions in the ChestXray14 dataset.

The preparation of a medical image dataset for research and technology requires a lot of effort. Despite the effort involved in preparing these datasets, certain challenges remain in the analysis of the medical images they contain:

Data Bias: In a medical image dataset, the diseased cases are far fewer than the healthy cases in the given population of images, so the dataset suffers from strong data bias. The ChestXray14 dataset [11] has 46.1% diseased cases overall, but only 284 Hernia cases out of 112,120 images. The SIIM-ACR dataset has a very strong bias, with only 22% of cases showing pneumothorax out of 12,089 images in total. (A short sketch after Table 13 illustrates how such prevalence figures can be computed from a released label file.)

Erroneous Labels: The disease labels in these datasets are usually text mined using natural language processing (NLP) techniques. The extracted labels may contain errors owing to the performance of the algorithm employed. There may also be information bias, as the only source of ground truth is the available reports, which may sometimes contain incomplete descriptions. Table 13 summarizes the techniques employed for extracting labels from the datasets discussed. In some cases, to reduce information bias, the NLP-extracted labels are also verified by certified radiologists.

Table 13. Techniques employed for extracting labels and suitability of datasets for training deep learning models
Dataset | Year of release | No. of pathologies | No. of images | Pathology labeled by radiologist | Pathology labeled by NLP algorithm | Suitability for training DL model
JSRT | 2000 | 1 | 247 | ✓ | |
Montgomery | 2014 | 1 | 138 | ✓ | |
Shenzhen | 2014 | 1 | 662 | ✓ | |
Indiana | 2016 | 10 | 8121 | ✓ | |
ChestXray8 | 2017 | 8 | 108,948 | ✓ | ✓ | ✓
ChestXray14 | 2017 | 14 | 112,120 | | ✓ | ✓
CheXpert | 2019 | 14 | 224,316 | | ✓ | ✓
MIMIC-CXR | 2019 | 14 | 377,110 | | ✓ | ✓
Padchest | 2019 | 193 | 160,868 | | ✓ | ✓
SIIM-ACR | 2019 | 2 | 12,089 | | ✓ | ✓
ChestX-Det10 | 2020 | 10 | 3,543 | ✓ | ✓ |
VinDr-CXR | 2020 | 28 | 18,000 | ✓ | ✓ | ✓
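A minimal sketch of how such prevalence figures can be derived from a released label file is shown below; the file name Data_Entry_2017.csv and the pipe-separated "Finding Labels" column are assumptions based on the commonly distributed ChestXray14 release and may differ for other datasets.

```python
import pandas as pd

# Hypothetical label file: the ChestXray14 release commonly ships a CSV with one
# row per image and a pipe-separated "Finding Labels" column.
df = pd.read_csv("Data_Entry_2017.csv")

labels = df["Finding Labels"].str.split("|").explode()  # one label per row
counts = labels.value_counts()
total = len(df)

print("Normal (No Finding): {:.1f}%".format(100 * counts.get("No Finding", 0) / total))
for pathology, n in counts.drop("No Finding", errors="ignore").items():
    print(f"{pathology}: {n} images ({100 * n / total:.1f}%)")
```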
Quality and Size of Images: For a chest disease to be identified by any artificial intelligence algorithm, both the quality of the images and their size are important. In chest X-ray images, each organ or part of the chest is represented by different intensity levels, and a radiologist or a deep learning algorithm should be able to recognize the relevant changes well. There are cases in a dataset where some diseases cannot be identified easily; in the ChestXray14 dataset, for example, it is really difficult to recognize any retrocardiac pathology, as the heart area appears purely white in these images [22]. In a convolutional neural network, the dimension of the input layer determines the size of the input images. The default image size at the input of Inception-v3 [23] is 299 × 299, while it is 224 × 224 × 3 for ResNet50. Similarly, different CNNs have different input image size requirements, so the images from the dataset need to be pre-processed before being fed to these models (a brief resizing sketch is given at the end of this section).

Localization and Severity of Disease: The dataset labels do not specify the severity of the medical entities involved in the images. Not all the datasets provide information regarding the localization of diseases; ChestXray14, CheXpert, ChestX-Det10 and VinDr-CXR provide the localization information of the disease. Such information helps in prognosis and in deciding the dose of medicine [24].

Size of Dataset and Suitability for Training Deep Learning Model: The performance of a deep learning algorithm employed for computerized recognition of diseases depends on the number of available images. As shown in Table 13, the JSRT, Montgomery, Shenzhen and Indiana datasets are small in size and are not suitable for training a deep learning model.

Presence of Multiple Diseases: In a chest X-ray, the appearance of one disease may be accompanied by some other disease(s); in several cases, for example, pulmonary tuberculosis may be accompanied by pneumonia or another disease. A disease detection algorithm may face difficulty in accurately predicting multiple diseases and the underlying effect of one disease on another.
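As a brief illustration of the resizing step mentioned above, the following sketch loads a CXR PNG and prepares it for a three-channel CNN input; the grayscale-to-RGB replication and the simple [0, 1] intensity scaling are illustrative assumptions, not a prescribed preprocessing pipeline.

```python
import numpy as np
from PIL import Image

def load_cxr(path, target_size=(224, 224)):
    """Load a grayscale chest X-ray (e.g. a 1024x1024 PNG from ChestXray14),
    resize it to a CNN's expected input size and replicate the single channel
    so that it matches the three-channel input of models such as ResNet50."""
    img = Image.open(path).convert("L")           # force grayscale
    img = img.resize(target_size)                 # e.g. 224x224 or 299x299
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return np.stack([arr] * 3, axis=-1)           # shape (H, W, 3)

# e.g. 299x299 for Inception-v3, 224x224 for ResNet50:
# batch = np.stack([load_cxr(p, (299, 299)) for p in image_paths])
```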
5 Conclusion

One of the most important threats to human lives is chest disease. Chest diseases can be identified by chest X-ray imaging. Chest X-ray reading is a complex reasoning problem which requires careful observation as well as knowledge of anatomical principles and pathology. The chest X-ray datasets discussed in this paper are publicly available to promote research in the field of medical image analysis and computer-aided detection of disease. An in-depth review of all the parameters and specifications of the medical CXR datasets is out of the scope of this paper; instead, the emphasis is laid on briefly discussing the important chest radiography datasets. This will help future researchers in finding suitable datasets for designing deep learning models for thoracic/chest diseases.
References
1. https://clinicalgate.com/the-lungs-and-chest-wall/
2. https://www.nhlbi.nih.gov/health-topics/chest-x-ray
3. Doi, K., MacMahon, H., Katsuragawa, S., Nishikawa, R.M., Jiang, Y.: Computer-aided diagnosis in radiology: potential and pitfalls. Eur. J. Radiol. 31(2), 97–109 (1999)
4. Ng, K.L., Yazer, J., Abdolell, M., Brown, P.: National survey to identify subspecialties at risk for physician shortages in Canadian academic radiology departments. Can. Assoc. Radiol. J. 61(5), 252–257 (2010)
5. Malhotra, P., Gupta, S., Koundal, D.: Computer aided diagnosis of pneumonia from chest radiographs. J. Comput. Theor. Nanosci. 16(10), 4202–4213 (2019)
6. Shiraishi, J., et al.: Database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists' detection of pulmonary nodules. Am. J. Roentgenol. 174, 71–80 (2000)
7. https://www.kaggle.com/raddar/nodules-in-chest-xrays-jsrt
8. Jaeger, S., Candemir, S., Antani, S., Wáng, Y.X.J., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant. Imaging Med. Surg. 4(6), 475 (2014)
9. Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. J. Am. Med. Inform. Assoc. 23(2), 304–310 (2016)
10. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2097–2106 (2017)
11. https://www.kaggle.com/nih-chest-xrays/data
12. Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 590–597, July 2019
13. http://stanfordmlgroup.github.io/competitions/chexpert/
14. Bustos, A., Pertusa, A., Salinas, J.M., de la Iglesia-Vayá, M.: PadChest: a large chest X-ray image dataset with multi-label annotated reports. Med. Image Anal. 66, 101797 (2020)
15. Rubin, J., Sanghavi, D., Zhao, C., Lee, K., Qadir, A., Xu-Wilson, M.: Large scale automated reading of frontal and lateral chest X-rays using dual convolutional neural networks. arXiv preprint arXiv:1804.07839 (2018)
16. https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation
17. Abedalla, A., Abdullah, M., Al-Ayyoub, M., Benkhelifa, E.: The 2ST-UNet for pneumothorax segmentation in chest X-Rays using ResNet34 as a backbone for U-Net. arXiv preprint arXiv:2009.02805 (2020)
18. Liu, J., Lian, J., Yu, Y.: ChestX-det10: chest X-ray dataset on detection of thoracic abnormalities. arXiv preprint arXiv:2006.10550 (2020)
19. Nguyen, H.Q., et al.: VinDr-CXR: an open dataset of chest X-rays with radiologist's annotations. arXiv preprint arXiv:2012.15029 (2020)
20. Kermany, D.S., et al.: Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172(5), 1122–1131 (2018)
21. Baltruschat, I.M., Nickisch, H., Grass, M., Knopp, T., Saalbach, A.: Comparison of deep learning approaches for multi-label chest X-ray classification. Sci. Rep. 9(1), 1–10 (2019)
22. Oakden-Rayner, L.: Exploring large-scale public medical image datasets. Acad. Radiol. 27(1), 106–112 (2020)
23. Demir, A., Yilmaz, F., Kose, O.: Early detection of skin cancer using deep learning architectures: Resnet-101 and inception-v3. In: 2019 Medical Technologies Congress (TIPTEKNO), pp. 1–4. IEEE, October 2019
24. Qin, C., Yao, D., Shi, Y., Song, Z.: Computer-aided detection in chest radiography based on artificial intelligence: a survey. Biomed. Eng. Online 17(1), 1–23 (2018)
Batch Normalization and Dropout Regularization in Training Deep Neural Networks with Label Noise

Andrzej Rusiecki(B)

Department of Computer Engineering, Wroclaw University of Science and Technology, Wybrzeże Wyspiańskiego 27, Wroclaw, Poland
[email protected]

Abstract. Availability of large annotated datasets in computer vision, speech understanding, or natural language processing is one of the main reasons for deep neural networks popularity. Unfortunately, such data can suffer from label noise, introduced by incorrectly labelled patterns. Since neural networks, as data-driven approaches, are strongly dependent on the quality of training data, the results of building deep neural structures with noisy examples can be unreliable. In this paper, we present preliminary experimental results on how two regularization techniques, namely dropout and batch normalization, influence vulnerability to incorrect labels. On the popular MNIST and CIFAR-10 datasets we demonstrate that the combination of these two approaches can be considered as a tool to improve network robustness to mislabelled training examples.

Keywords: Neural networks · Deep learning · Batch normalization · Dropout · Label noise

1 Introduction
Nowadays, deep neural networks are often considered first-choice tools for many real-life problems. This is mainly because deep neural architectures, trained on large datasets, can achieve impressive performance in tasks such as computer vision, speech recognition, or natural language processing [2, 29]. If sufficient annotated data collections are available, such structures can be trained to represent high-level abstractions [3, 26]. A potential threat in obtaining such data-driven models is the quality of their training sets. If the available patterns are incorrectly labelled, the resulting performance may be poor and the induced classification rules incorrect. In fact, even shallow models can be at most as reliable as their training sets [14, 25]. Unfortunately, deep networks, having millions of parameters (hence, degrees of freedom), are potentially more vulnerable to overfitting erroneous data. Large annotated datasets can be prepared by data mining algorithms, search engines, or, most often, by labelling data manually [12]. If we consider different human annotators and problems with no precise distinction between classes, it is clearly evident that large datasets suffer from label noise.
Our previous work [24] demonstrated that the popular dropout regularization may help to make deep network training less prone to label noise. In this paper, we present preliminary experimental results on how another regularization technique, namely batch normalization applied in combination with dropout, may influence robustness to incorrect labels. Our findings suggest that properly parametrized regularization can be considered as a tool that potentially improves classification accuracy in the presence of noisy labels.
2 Learning from Noisy Labels, Standard Dropout and Batch Normalization
In this paper, we consider the problem of training networks on data with uncertain and potentially erroneous labels. This is the simplest and probably the most common type of training data perturbation for classification tasks. It is worth noticing that for regression (or similar continuous-output tasks), more often performed by shallow networks, one may also consider training in the presence of outliers (observations distant from the majority of data) [9].

2.1 Learning with Noisy Labels
As learning with label noise is an important issue for training deep neural models, several methods to deal with this problem have been proposed. They may be divided into two basic currents. The first one focuses on efforts to clean the training data by removing or correcting noisy patterns. In this case, the models are image-conditional [31, 32], or the label noise is considered conditionally independent from the input [21, 28]. In the second group of approaches, the training process itself is designed to be less prone to noisy data, and the methods aim to learn directly from noisy labels [8, 13, 20, 23, 30], or modified and corrected loss functions are applied [7, 22, 25].

2.2 Dropout
To overcome the problem of overfitting in training deep neural networks, several regularization techniques have been proposed. The name dropout refers to a simple approach, introduced in [10], where neurons or connections are randomly omitted in order to improve network generalization ability and avoid overfitting. The original method of so-called standard dropout was described in more detail and applied in [16, 27], and a good review of dropout-inspired approaches can be found in [5]. If dropout is applied to a network layer during the training phase, its output can be written as:

$$\mathbf{y} = f(\mathbf{W}\mathbf{x}) \circ \mathbf{m}, \quad m_i \sim \mathrm{Bernoulli}(p_d), \qquad (1)$$

where $\mathbf{x}$ is the layer input, $f(\cdot)$ is the activation function, and $\mathbf{W}$ is the matrix of layer weights. The elements of the layer dropout mask $\mathbf{m}$ follow the Bernoulli distribution and are equal to 1 with a given probability $p_d$, describing the dropout rate. The layer dropout mask deactivates some neurons in a given step. Once the network is trained, all the neurons are taken into account, so to compensate for the larger network size, the layer output is scaled in the simulation phase as:

$$\mathbf{y} = p_d\, f(\mathbf{W}\mathbf{x}). \qquad (2)$$

One may consider dropout as a technique of averaging over many different neural network models that partially share parameters [27].
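The following minimal NumPy sketch illustrates Eqs. (1) and (2); the tanh activation and the layer shapes are illustrative assumptions, and, as in the text, p_d denotes the probability of keeping a unit.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(x, W, p_d, training, f=np.tanh):
    """Standard dropout of Eqs. (1)-(2): units are kept with probability p_d
    during training; at test time the output is scaled by p_d instead."""
    y = f(W @ x)                                   # f(Wx)
    if training:
        m = rng.binomial(1, p_d, size=y.shape)     # m_i ~ Bernoulli(p_d)
        return y * m                               # Eq. (1): y = f(Wx) ∘ m
    return p_d * y                                 # Eq. (2): y = p_d f(Wx)

# illustrative shapes only
x = rng.standard_normal(8)
W = rng.standard_normal((4, 8))
print(dropout_layer(x, W, p_d=0.85, training=True))
```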
2.3 Batch Normalization
One of the most popular techniques applied recently in deep learning schemes is undoubtedly so-called batch normalization, introduced in [11]. Proposed to eliminate the network's internal covariate shift and speed up training, it can also act as a regularizer [6, 19]. In this approach, normalization is performed in each network layer for each mini-batch, and can be considered a building block of the model architecture. When using batch normalization, one can expect the network training process to be less sensitive to careful tuning of its hyperparameters. Moreover, relatively high learning rates can be successfully applied [6, 11], which can significantly reduce training time. As the regularization effect of batch normalization reduces overfitting, a batch-normalized network could present good performance when dropout is reduced or even removed [11]. This is why we have decided to examine its ability to deal with incorrectly labelled training data. Applying batch normalization to layer inputs starts with normalizing based on the mini-batch B mean and variance:

$$\hat{x}_i \leftarrow \frac{x_i - \mu_B}{\sqrt{\sigma_B^2}}, \qquad (3)$$

where $\mu_B$ is the mini-batch mean and $\sigma_B^2$ its variance. Additionally, scaling and shifting is performed as:

$$y_i \leftarrow \gamma \hat{x}_i + \beta, \qquad (4)$$
where $\gamma$ and $\beta$ are learnable parameters that restore the representation power of the network [11], potentially distorted by the previous step. Dropout and batch normalization can indeed be considered strong tools to prevent neural networks from overfitting. However, it is also known that, combined together, these two approaches may not yield acceptable performance, or may even worsen the final training results [18]. For training sets with contaminated labels this is not necessarily true, as our preliminary experiments, described in the following sections, revealed.
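A small NumPy sketch of Eqs. (3) and (4) is given below; the eps term added under the square root is a common numerical-stability detail and an assumption here, not spelled out in the text.

```python
import numpy as np

def batch_norm(x_batch, gamma, beta, eps=1e-5):
    """Mini-batch normalization of Eqs. (3)-(4): normalize each feature with the
    mini-batch mean and variance, then scale and shift with learnable gamma, beta."""
    mu = x_batch.mean(axis=0)                      # mini-batch mean
    var = x_batch.var(axis=0)                      # mini-batch variance
    x_hat = (x_batch - mu) / np.sqrt(var + eps)    # Eq. (3)
    return gamma * x_hat + beta                    # Eq. (4)

# illustrative mini-batch of 128 examples with 10 features
x = np.random.randn(128, 10)
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
```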
3 Experimental Results
To test the influence of dropout and batch normalization on training deep networks with incorrectly labelled data, two datasets were chosen, namely MNIST (Modified National Institute of Standards and Technology) [17] and CIFAR-10 (Canadian Institute For Advanced Research) [15]. These well-known and relatively small image classification datasets have been widely used before, also to test robustness against label noise [7, 24]. Thanks to their moderate size, these sets are well suited to a testing strategy based on averaging results, because one can use well-performing deep architectures with a reasonable number of parameters.
3.1 Label Noise
The simplest, basic uniform noise model was considered in our experiments. We introduced the noise into the training sets by flipping labels with a preset probability μ. The procedure was as follows: for each training pattern, belonging to one of C classes, its label was flipped with probability μ and uniformly sampled from the set of all available incorrect labels (so the label could be correct with probability 1 − μ). Hence, the noisy training data available to the learner were $\{(x_i, \hat{y}_i),\ i = 1, \ldots, N\}$, where:

$$\hat{y}_i = \begin{cases} y_i & \text{with probability } 1 - \mu \\ k,\ k \in [C],\ k \neq y_i & \text{with probability } \frac{\mu}{C-1} \end{cases} \qquad (5)$$

In Eq. 5, k is an incorrect label uniformly drawn from the set $k \in [C],\ k \neq y_i$, while $y_i$ denotes the true label. In our experiments, the noise level was varied in the range from μ = 0 up to μ = 0.6 (i.e., we could expect up to approximately 60% of training patterns to have incorrect labels).
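The label-flipping procedure of Eq. (5) can be sketched as follows; the random generator seed and the commented usage line are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_labels(y, num_classes, mu):
    """Uniform label noise of Eq. (5): with probability mu, replace the true label
    by one drawn uniformly from the remaining C-1 incorrect labels."""
    y_noisy = np.asarray(y).copy()
    flip = rng.random(len(y_noisy)) < mu
    for i in np.where(flip)[0]:
        wrong = [c for c in range(num_classes) if c != y_noisy[i]]
        y_noisy[i] = rng.choice(wrong)
    return y_noisy

# e.g. corrupt roughly 30% of MNIST training labels:
# y_train_noisy = flip_labels(y_train, num_classes=10, mu=0.3)
```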
3.2 Network Architectures
The experiments were conducted on deep convolutional neural networks (CNN) with the architectures described in [7], used previously to test training with noisy labels [7, 24]. The details of the CNN structures and a basic description of the datasets used in this study are gathered in Table 1. Note that the dropout and batch normalization layers were optional and applied only for certain tests. Other training parameters are described in the following paragraphs.

Dropout Rate and Batch Normalization. We tested several combinations of network architectures. The baseline in our study was a network without dropout or batch norm layers. The main tests were performed for combinations of batch normalization and several dropout rates, varied in the range pd = 0.5 up to pd = 0.85. The probability pd was set equal for each network layer. The standard dropout version defined by Eqs. 1 and 2, and batch normalization described by Eqs. 3 and 4, inserted into the CNN architecture as in Table 1, were examined.
Table 1. Network architectures and dataset characteristics

MNIST dataset: Input 28 × 28, 10 classes, 60k/10k training/test
Network architecture: Convolutional layer → batch normalization → max pooling → dropout → fully connected 1024 neurons → batch normalization → dropout → fully connected 1024 neurons → batch normalization → dropout → softmax

CIFAR-10 dataset: Input 32 × 32 × 3, 10 classes, 50k/10k training/test
Network architecture: Convolutional layer → batch normalization → convolutional layer → batch normalization → max pooling → dropout → convolutional layer → batch normalization → convolutional layer → batch normalization → max pooling → dropout → fully connected 512 neurons → dropout → softmax
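For illustration, the MNIST architecture of Table 1 could be assembled in Keras roughly as follows; the number of filters and the kernel size are assumptions (the table lists only the layer order), and note that the Keras Dropout argument is the drop probability, i.e. 1 − p_d for a keep rate p_d.

```python
import tensorflow as tf

def build_mnist_cnn(p_d=0.5, use_batch_norm=True, use_dropout=True):
    """Sketch of the MNIST network of Table 1 with optional regularization layers."""
    layers = [tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                                     input_shape=(28, 28, 1))]   # filter count assumed
    if use_batch_norm:
        layers.append(tf.keras.layers.BatchNormalization())
    layers.append(tf.keras.layers.MaxPooling2D())
    if use_dropout:
        layers.append(tf.keras.layers.Dropout(1.0 - p_d))        # drop prob = 1 - p_d
    layers.append(tf.keras.layers.Flatten())
    for _ in range(2):                                           # two FC blocks of 1024
        layers.append(tf.keras.layers.Dense(1024, activation="relu"))
        if use_batch_norm:
            layers.append(tf.keras.layers.BatchNormalization())
        if use_dropout:
            layers.append(tf.keras.layers.Dropout(1.0 - p_d))
    layers.append(tf.keras.layers.Dense(10, activation="softmax"))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```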
Testing Environment and Algorithm Parameters. All the simulations were run on a GTX 1080Ti GPU to speed up network training. The CNNs used for our tests were implemented in the Tensorflow 2.3.0 environment [1] under Python 3.7. As the training algorithm, the popular Adam [4] was chosen, with its parameters set as follows: learning rate lr = 0.001, β1 = 0.9, β2 = 0.999, batch size 128. Test accuracies of CNNs trained for 200 epochs were averaged over 6 runs of simulations. The choice of loss function used in the training algorithm may influence robustness to label noise [25], so in our simulations we decided to use a non-modified error measure, which in the case of classification is the categorical cross-entropy loss defined as:

$$E_{CC} = -\frac{1}{B} \sum_{i=1}^{B} \sum_{c=1}^{C} p_{ic} \log(y_{ic}), \qquad (6)$$
where $p_{ic}$ is a binary indicator of whether the ith training pattern belongs to the cth class, and B is the batch size.
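A direct NumPy transcription of Eq. (6) is given below; the eps guard against log(0) is an implementation detail, not part of the definition.

```python
import numpy as np

def categorical_cross_entropy(p, y, eps=1e-12):
    """Eq. (6): mean cross-entropy over a mini-batch. p holds the one-hot targets
    (p_ic = 1 iff pattern i belongs to class c), y the predicted class probabilities."""
    B = p.shape[0]
    return -np.sum(p * np.log(y + eps)) / B

# illustrative mini-batch of 2 patterns and 3 classes
p = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(categorical_cross_entropy(p, y))
```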
3.3 Simulation Results
The general results of the conducted experiments are presented in Figs. 1, 2, 3 and 4. The averaged test accuracies obtained for several combinations of CNN architectures, trained on clean and artificially contaminated data, need to be analyzed in view of our previous findings. The shapes of the resulting curves for networks with combined dropout and mini-batch normalization in Figs. 1 and 3 are similar to those described in [24] for structures with dropout only: accuracy rises until it reaches the optimal dropout rate and then starts decreasing. This is more noticeable for higher label noise levels. In particular, the MNIST results for clean training data are not very sensitive to changing pd. For this simple task, one needs pd = 0.95 to lower the performance, while the optimal range is about 0.35–0.45. This phenomenon may be considered a result of a properly chosen (probably close to optimal) network architecture. If the network structure is well suited to the problem, it does not overfit and match erroneous labels. When we consider data with high levels of noisy labels, one may notice that choosing the rate in its optimal range may dramatically improve classification accuracy.

Fig. 1. Averaged test accuracies for several levels of dropout rate for MNIST dataset: noise level is varied in range µ = 0.05–0.95, batch normalization included.

Fig. 2. Averaged test accuracies for noise levels in the range µ = 0.0–0.6, tested for networks with different combinations of regularization methods. Dropout rate chosen as optimal for MNIST dataset and minimal used in our tests.

For CIFAR-10 (Fig. 3) we may clearly identify the optimal rate at about 0.35–0.45. Because the problem is more sophisticated and this CNN has a more complex structure, the differences in accuracy obtained for the tested dropout rates are more distinct. In addition, a certain shift between the optimal rate for clean and for contaminated data can be observed. Moreover, even for a clean training set without mislabelled examples, using both regularization techniques can improve classification results.

Fig. 3. Averaged test accuracies for several levels of dropout rate for CIFAR-10 dataset: noise level is varied in range µ = 0.05–0.75, batch normalization included.

Fig. 4. Averaged test accuracies for noise levels in the range µ = 0.0–0.6, tested for networks with different combinations of regularization methods. Dropout rate chosen as optimal for CIFAR-10 dataset and minimal used in our tests.

Figures 2 and 4 present the performance of networks without additional regularization and with combinations of the considered methods as a function of the label noise level. In these figures, the best-performing dropout rate for each dataset is compared with the minimal one (pd = 0.05) in networks with and without batch normalization, and in networks not containing dropout layers. The main observations may be formulated as follows:

1. Network architectures containing dropout with the optimal rate and batch normalization outperform the other approaches in each case. This is particularly noticeable for CIFAR-10, where such a combination acts best even for clean data without mislabelled examples.
2. The worst-performing networks are those without additional regularization and, for some noise probabilities on MNIST, also those with batch normalization only.
3. Low dropout rates or batch normalization alone only slightly improved network performance, especially for lower noise levels (CIFAR-10).
4. Dropout with a carefully chosen, problem-dependent rate is still definitely better than the combination of batch normalization and dropout with a small pd = 0.05.

Taking into account the phenomena described above, we may conclude that batch normalization cannot be considered a substitute for dropout when training with potentially noisy labels is performed. On the other hand, using the combination of dropout with a properly chosen rate and batch normalization not only does not worsen the results, but allows one to obtain better classification accuracy, even for clean training data.
4 Conclusions
In this paper, we presented preliminary experimental results showing the impact of dropout regularization and batch normalization on deep network training in the presence of incorrectly labelled data. As expected, properly applied dropout can indeed improve classification accuracy, even for training sets containing a high percentage of erroneous labels. Moreover, combining this approach with simple batch normalization does not spoil the results, and architectures using both regularization methods may obtain higher classification accuracy when trained on datasets containing label noise. This is partly in contradiction with earlier findings that advise against combining these approaches. Our future efforts should be directed towards conducting more experiments on larger datasets, more sophisticated network structures and other learning algorithms, with parameters tuned separately in each case. Based on the preliminary results, we believe that this could help in formulating some rules on how to combine regularization approaches in order to achieve better robustness to label noise. Moreover, algorithms helping in automatic tuning of regularization parameters could potentially be proposed.
References
1. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). Software available from tensorflow.org
2. Erhan, D., et al.: Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 560–625 (2010)
3. Bengio, Y., et al.: Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems, vol. 19, pp. 153–160. MIT Press (2007)
4. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
5. Labach, A., Salehinejad, H., Valaee, S.: Survey of dropout methods for deep neural networks. arXiv preprint arXiv:1904.13310 (2019)
6. Garbin, C., Zhu, X., Marques, O.: Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimed. Tools Appl. 79, 12777–12815 (2020)
7. Ghosh, A., Kumar, H., Sastry, P.S.: Robust loss functions under label noise for deep neural networks. arXiv:1712.09482v1 (2017)
8. Guan, M.Y., Gulshan, V., Dai, A.M., Hinton, G.E.: Who said what: modeling individual labelers improves classification. arXiv:1703.08774 (2017)
9. Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust Statistics: The Approach Based on Influence Functions (Wiley Series in Probability and Statistics). Wiley-Interscience, New York (2005). Revised edn.
10. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
11. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. Proc. Mach. Learn. Res. 37(2015), 448–456 (2015)
12. Jindal, I., Nokleby, M., Chen, X.: Learning deep networks from noisy labels with dropout regularization. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 967–972. IEEE (2016)
13. Joulin, A., van der Maaten, L., Jabri, A., Vasilache, N.: Learning visual features from large weakly supervised data. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 67–84. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_5
14. Kordos, M., Rusiecki, A.: Reducing noise impact on MLP training. Soft Comput. 20(1), 49–65 (2015). https://doi.org/10.1007/s00500-015-1690-9
15. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report (2009)
16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105 (2012)
17. LeCun, Y., Cortes, C.: MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/
18. Li, X., Chen, S., Hu, X., Yang, J.: Understanding the disharmony between dropout and batch normalization by variance shift. arXiv:1801.05134 (2018)
19. Luo, P., Wang, X., Shao, W., Peng, Z.: Towards understanding regularization in batch normalization. arXiv:1809.00846 (2019)
20. Misra, I., Lawrence Z.C., Mitchell, M., Girshick, R.: Seeing through the human reporting bias: visual classifiers from noisy human-centric labels. In: Computer Vision and Pattern Recognition (CVPR) (2016)
21. Natarajan, N., Inderjit, S.D., Ravikumar, P.K., Tewari, A.: Learning with noisy labels. In: Advances in Neural Information Processing Systems (NIPS) (2013)
22. Patrini, G., Rozza, A., Menon, A., Nock, R., Qu, L.: Making neural networks robust to label noise: a loss correction approach. In: Computer Vision and Pattern Recognition (2017)
23. Reed, S., Lee, H., Anguelov, D., Szegedy, C., Erhan, D., Rabinovich, A.: Training deep neural networks on noisy labels with boot-strapping. arXiv preprint arXiv:1412.6596 (2014)
24. Rusiecki, A.: Standard dropout as remedy for training deep neural networks with label noise. In: Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J. (eds.) DepCoS-RELCOMEX 2020. AISC, vol. 1173, pp. 534–542. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-48256-5_52
25. Rusiecki, A.: Trimmed categorical cross-entropy for deep learning with label noise. Electron. Lett. 55(6), 319–320 (2019)
26. Salakhutdinov, R., Hinton, G.E.: Semantic hashing. In: Proceedings of the 2007 Workshop on Information Retrieval and Applications of Graphical Models (SIGIR 2007), Amsterdam. Elsevier (2007)
27. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
28. Sukhbaatar, S., Bruna, J., Paluri, M., Bourdev, L., Fergus, R.: Training convolutional networks with noisy labels. arXiv preprint arXiv:1406.2080 (2014)
29. Vahdat, A.: Toward robustness against label noise in training deep discriminative neural networks. In: Neural Information Processing Systems (NIPS) (2017)
30. Van Horn, G., et al.: Building a bird recognition app and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection. In: Computer Vision and Pattern Recognition (CVPR) (2015)
31. Veit, A., Alldrin, N., Chechik, G., Krasin, I., Gupta, A., Belongie, S.: Learning from noisy large-scale datasets with minimal supervision. In: Computer Vision and Pattern Recognition (CVPR) (2017)
32. Xiao, T., Xia, T., Yang, Y., Huang, C., Wang, X.: Learning from massive noisy labeled data for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2691–2699 (2015)
Intelligent Software Engineering: The Significance of Artificial Intelligence Techniques in Enhancing Software Development Lifecycle Processes

Vaishnavi Kulkarni1(B), Anurag Kolhe2, and Jay Kulkarni1

1 Pune, India
2 Akola, India
Abstract. In every sphere of technology nowadays, the world has been moving away from manual procedures towards more intelligent systems that minimize human error and intervention, and software engineering is no exception. This paper is a study on the amalgamation of artificial intelligence with software engineering. Software Development Lifecycle is the foundation of this paper, and each phase of it – Requirements Engineering, Design and Architecture, Development and Implementation, and Testing – serves as a building block. This work elucidates the various techniques of intelligent computing that have been applied to these stages of software engineering, as well as the scope for some of these techniques to solve existing challenges and optimize SDLC processes. This paper demonstrates in-depth, comprehensive research into the current state, advantages, limitations and future scope of artificial intelligence in the domain of software engineering. It is significant for its contributions to the field of intelligent software engineering by providing industry-oriented, practical applications of techniques like natural language processing, meta programming, automated data structuring, self-healing testing etc. This paper expounds upon some open issues and inadequacies of software engineering tools today, and proposes ways in which intelligent applications could present solutions to these challenges.

Keywords: Intelligent computing · Artificial intelligence · Software engineering · Requirements · Design · Development · Testing · SDLC · Automation · Software development · Intelligent software engineering · NLP · Neural network · Genetic algorithm · Machine learning · KBS · CASE
1 Introduction

In the field of software engineering, Software Development Life Cycle (SDLC) is a framework which development teams use to divide the process of software development from planning to maintenance into smaller and more systematic steps or subprocesses, to enhance design, product management and project management. On the other hand, Artificial Intelligence (AI) is a field which aims to imitate the problem-solving and
decision-making capabilities of the human mind through machines. The software systems being developed are becoming much more complex with respect to the number of requirements they need to address and the nature of problems encountered. There is a need to automate the SDLC processes to ensure timely deliveries of projects in a cost-effective manner while maintaining quality. This is where AI can play a very significant role. AI is well suited to assist in solving complex software engineering tasks, as AI techniques are designed with the goal of replicating intelligent behavior. AI is making the process of designing, building and testing software faster and driving down project costs. AI algorithms can be used to appreciably enhance everything from project planning to quality testing, thereby increasing the productivity and efficiency of the processes in SDLC.

Recently, a substantial amount of research has been conducted in the field of intelligent software engineering. In 2020, Mirko Perkusich et al. carried out a rigorous literature review in the domain of applying intelligent techniques to Agile Software Development [1]. From a sample set of 93 unique studies, they inferred that the most popular techniques include search-based solutions and machine learning, and that such techniques are employed primarily to meet goals like approximating efforts, management of requirements, and decision making. They conclude that this is still a fledgling area of research and is rich with potential.

In 2019, [2] focused on the field of education (as opposed to industry) – the authors seek to facilitate automated and intelligent teaching and learning of software engineering and object-oriented programming, by improving existing tools which identify design smells in a software system and using them to foster stronger development skills. Such improved tools, which intelligently identify the flaws in design that have the potential to cause issues in maintenance of the software later on, are relevant to making the initial phases of SDLC more intelligent.

Danny Weyns, in his 2019 work on software engineering of self-adaptive systems, tackled the issue of conditional uncertainties causing interruptions in software systems [3]. Self-adaptive systems should ideally reconfigure themselves to adjust to external, environmental changes in operation, available resources, user goals, etc. His work studies the evolution of self-adaptation over six waves, such as automating tasks, architecture-based adaptation, runtime models, etc. Self-adaptive models for software engineering would prove highly effective – for example, software tools that can intelligently modify the requirements specification and design of an application based on changes in user needs, or testing tools which automate testing tasks, are the need of the hour.
2 Software Development Lifecycle (SDLC)

In this paper we use the nomenclature, processes and terminology as defined by the IEEE 12207 [4] standard of software engineering for software lifecycle processes. This standard establishes a framework for software lifecycle processes and defines processes, activities and tasks for the development and operation of software products.

This paper is organized as follows: in this section i.e., Sect. 3, we present how AI techniques can be leveraged in some of the processes as defined in the 12207 standard, namely four processes - Requirements Engineering, Design and Architecture, Development and Implementation, and Testing. Each of the aforementioned four stages is
divided into three parts – the role played by that stage in SDLC, the need for artificial intelligence in its domain, and a review of the myriad methods like natural language processing, knowledge-based systems, neural networks, etc. which have been employed to make these software engineering processes intelligent. These techniques attempt to automate or semi-automate the tasks to generate optimal or semi-optimal solutions. In the next section i.e., Sect. 4, we highlight some open problems faced in the implementation of AI techniques in SDLC processes and challenges going ahead. In Sect. 5, we discuss the future scope for these AI techniques in software engineering.

2.1 Requirements Engineering and Planning

Software Requirements Specification (SRS) in SDLC. Requirements Engineering (RE) is one of the primary phases of the software development lifecycle. It precedes the phases of design and architecture, coding, and testing. A 'requirement' is defined by the Institute of Electrical and Electronics Engineers (IEEE) as "a condition or capability needed by a user to solve a problem or achieve an objective, which must be met by a system to satisfy a contract, standard, specification, or other formally imposed document" [5]. SRS is a concrete step which follows the more abstract initial step of gauging feasibility and planning. It is a fundamental part of software development which is based on extensive requirement analysis. The needs of the end-users are clearly defined, following which the functionalities which the software must implement in order to fulfill these prerequisites are outlined. Based on the problem statement, requirement specification can be done with three models [6]:

• Object Model – describes the objects in the system in a static context.
• Dynamic Model – describes the interactions of objects in the system with one another.
• Functional Model – describes the flow of data through the system.

Thus, an SRS document is created which describes how the software is expected to perform and what it must do to meet the stakeholders' expectations.

The Importance of RE in Software Engineering. RE is of paramount importance in SDLC. As it is part of the preliminary phase, errors and miscalculations at this stage can cascade to the latter phases and lead to major issues at the design and coding phases. A comprehensive SRS lays a strong foundation on which the entire software project is built. It ensures consistency across the board amongst different teams like development, testing, dev-ops, quality assurance, maintenance, etc. It helps ensure that development does not diverge from the path leading towards the fulfilment of the requirements of the clients, and that the testing phase is carried out while keeping the end-users in mind. Defining requirements early on precludes redundancy, rework and excessive overhead expenditure in later stages.

The Need for Intelligent Requirements Engineering. Poorly structured, abstruse and ambiguous SRS can be the cause of future failure or higher cost of maintenance. It can lead to deployment of software with unnecessary features or that which lacks the functionalities needed. However, as software requirements grow vaster and more complex as
the scope widens, intelligent approaches to RE for complicated systems can be highly beneficial [7]. Such approaches can help avoid the pitfalls associated with an entirely manual technique and can improve the meticulousness of requirements specification. Since the 80's, methods for the automation of requirements specification using intelligent tools have been researched – Balzer et al. proposed that a well-defined specification called for neither an entirely manual nor automated approach but a combination of the two; a computer-based tool reliant on context provided with partial descriptions could reliably complete them and construct a precise specification [8].

Intelligent Requirements Engineering Techniques

Natural Language Processing (NLP). NLP encompasses analysis of text (in natural language) which is lexical, semantic, syntactic, contextual, phonetic, morphological, discourse-oriented and pragmatic [9]. Natural language has certain drawbacks. Unlike programming languages, which attempt to reduce extraneous text and maintain an entirely logical and clear-cut format, natural language can be vague, inadequate, unfinished and unintelligible. This may lead to imprecise RE that does not reconcile the understanding of the developer with the demands of clients/stakeholders.

Fabiano Dalpiaz et al. proposed a tool-based methodology for reducing ambiguity and identifying missing requirements by combining NLP and information visualization techniques [10]. User requirements were taken in a fixed format (as a [viewpoint], I want [requirement], so that [purpose]) [11]. They employed a tool – Visual Narrator – to extricate the objects and their interactions with one another from this format as input. With their Web 2.0 software, they focused on identifying different viewpoints for requirements specification (such as user, developer, administrator, etc.) and built a structure for pinpointing incompleteness and ambiguity, as well as an algorithm for detecting synonyms using semantic NLP methods. They integrated information visualization that outputs a Venn diagram to give a graphical representation of the aforementioned viewpoints and ambiguities.

Knowledge-Based Systems (KBS). Existing expert knowledge bases can be highly useful for producing design and requirement models intelligently which satisfy the user's needs. Knowledge bases store information about the requirements, inputs and output functionalities of various systems [12]. This data is used to optimize the requirements specification process. In the NL-OOPS (Natural Language Object-Oriented Production System) for performing object-oriented requirements analysis proposed by Garigliano et al. [13], LOLITA (Large-scale Object-based Linguistic Interactor, Translator and Analyzer) was used to scrutinize requirements specification documents. The pre-existing knowledge base of this NL processing system was used in conjunction with the analyzed documents to yield requirement models [7]. READS is a hypertext system designed by T. J. Smith which makes use of KBS in the form of a relational database server. READS facilitates requirements discovery, analysis, decomposition, allocation, traceability and reporting, all integral components of RE [14].
Artificial Neural Networks (ANN). Inspired by a simplified representation of neural connections in a brain, an artificial neural network is a pool of nodes or 'neurons' with an input and an output, which form a network through interlinked 'connections' [15]. ANNs are trained by processing examples as inputs, which makes their output grow increasingly more aligned with the intended result. ANNs can be instrumental in building intelligent RE processes. Neumann's proposed method for categorization of requirements based on risk combines Pattern Recognition in Principal Component Analysis techniques with ANN [16]. The ANN is instrumental in determining which inputs have an above-average number of software components that pose a large risk.

A Use Case for Intelligent Requirements Engineering

Problem Statement. A language barrier between the different parties involved in the development of a software system (developers, users, stakeholders, etc.) can lead to unclear requirements specifications. If the needs of the various parties are lost in translation, RE cannot be efficiently and accurately carried out. There is a necessity for intelligent software tools which can facilitate translation of documents like SRS into different regional languages while retaining the salient points, and which can also take inputs of requirements in different regional languages and output an SRS in a common language (such as English).

Proposed Solution. NLP can be used to develop a tool to satisfy this need. The tool makes possible the translation of SRS from English to a pre-defined list of regional languages. It also takes requirements as an input in a regional language, processes them and outputs the corresponding requirement in English. The user selects which setting they need through the user interface (UI). The list of languages is also available for selection in the UI. POS tagging (parts-of-speech tagging) is done on the input document. Nouns are identified as the objects, while verbs, adjectives and adverbs denote the actions between them. A translation API (like Google Translate) is used to convert these simplified object-based relationships into the language of choice, which are then given as the output document (Fig. 1; a brief sketch of the tagging step is given after the figure).
Fig. 1. Software tool architecture
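As a rough sketch of the tagging stage of the proposed tool, the snippet below extracts candidate objects and actions from a requirement sentence; NLTK, the example sentence and the placeholder translate function are illustrative assumptions, not components prescribed by the tool's design.

```python
import nltk
# one-time setup (uncomment on first run):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

def extract_objects_and_actions(requirement_text):
    """POS-tag a requirement sentence: nouns become candidate objects,
    verbs/adjectives/adverbs become candidate actions or qualifiers."""
    tokens = nltk.word_tokenize(requirement_text)
    tagged = nltk.pos_tag(tokens)
    objects = [w for w, tag in tagged if tag.startswith("NN")]
    actions = [w for w, tag in tagged if tag.startswith(("VB", "JJ", "RB"))]
    return objects, actions

def translate(text, target_language):
    # Placeholder for a translation API call (e.g. a cloud translation service);
    # the concrete client library and credentials are outside the scope of this sketch.
    raise NotImplementedError

objs, acts = extract_objects_and_actions(
    "The administrator shall generate a monthly usage report for each registered user.")
print(objs, acts)
```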
2.2 Software Design and Architecture

Software Design in SDLC. Once the user requirements are properly documented in the SRS, the software reaches the Design phase. This is the first phase where we move from the problem domain to the solution domain. Architects, developers and stakeholders create a blueprint of the architecture from the documents generated by RE, which serves as the foundation of future steps like Development, Testing and Maintenance. This blueprint consists of the software modules, functions, objects and the overall interaction of the code satisfying all the required functionalities. The designs developed here hold tremendous importance: they must fulfil all the user requirements mentioned in the SRS and must be created with a lot of foresight, envisioning all the possible scenarios, their outcomes, vulnerabilities, etc. Implementation of the software is dependent on the designs created; hence, once the designs are finalized, they cannot be changed at a later point in time. The quality of the architecture design is most commonly assessed on factors like modularity, modifiability, reusability, understandability, complexity, portability and trackability. Modularity is usually connected with coupling and cohesion, where the software architect usually aims for a design that consists of highly cohesive and loosely coupled components.

Software designs, depending upon the project, could be produced in the form of images, sketches, HTML screen designs, flow charts, etc. One such methodology which has become very popular in the IT industry due to its object-oriented approach is UML (Unified Modeling Language). A literature survey conducted by the authors in [17] suggests that in 68.7% of cases UML is used in the Design/Modeling phases. UML is a modelling language used for specifying and documenting the artifacts of the system. It visualizes the software with the help of various diagrams which can be classified into two categories:

• Structural – used to capture the static and structural aspects of a system.
• Behavioral – used to capture dynamic features of a system.

In this section, different AI techniques are targeted towards UML diagrams only.

The Need for AI in Software Design Processes. Software design is a time-consuming process. The system architect needs to have a thorough understanding of all the user and business requirements from the SRS document. On top of this, a lot of effort is invested in the process of creating, arranging and labelling the various UML diagrams. Many computer-aided software engineering (CASE) tools like Smart Draw, Visual UML, Rational Rose, GD Pro, etc. are available for the designing purpose, but they have their limitations. It is also quite a cumbersome process to familiarize oneself with these tools and create the necessary diagrams from their oft-cluttered interfaces. Moreover, poor human analysis of requirements or a lack of expertise in using any CASE tool may not result in a sturdy architectural design. This calls for a more automated approach to remove human intervention from this phase as much as possible, making this process easier and much more reliable. This is where AI comes into the picture.
Intelligent Software Design Techniques. According to the surveys conducted by the authors in [17, 18], not all of the UML diagrams are used frequently in industry software – amongst all the types, only Class, Sequence, Use-case, Activity and Deployment diagrams are used most commonly. Since the only material we have preceding this phase is the SRS document, in which the requirements are captured in natural human language, different implementations of the NLP technique are used to generate UML diagrams.

In [19], using SharpNLP, the author has proposed a web application designed in the C#.Net framework, called UML Generator, to generate Class and Use case diagrams. In this application, the user first uploads a file or enters the text. POS tagging is then performed, and nouns are further classified from these tag values to identify classes for the class diagram and actors for the use case diagram. An XML-based rule approach is used to filter out unwanted nouns. Following these steps, the initial results are presented on the website and the user is given the freedom to modify the things which he/she finds unfitting. Once the required modification is done, the output is run through a machine learning model in Weka using Logistic and SMO classifiers to recognize the relationships between use cases and classes. The final solution is presented on the website with the option to download and generate diagrams as a Visual Studio modelling solution.

In accordance with the NLP methods elaborated on in [20], the authors in [21] have implemented a module named LESSA (Language Engineering System for Semantic Analysis) which generates Use Case diagrams in two phases – Information Extraction and Diagram Generation. In Information Extraction, the text is first tokenized into lexical tokens, and a rule-based approach is applied to perform syntactic analysis and POS tagging on these tokens. From the parts-of-speech analysis, it then extracts actors, actions and objects for the use case. In the next phase, Diagram Generation, LESSA creates a diagram in a 4-step process from the actors and actions identified in the first phase.

In [22], Activity and Sequence diagrams are generated using NLP, where Grammatical Knowledge Patterns (GKP) and Frames are used. GKP is the combination of parts of speech, paralinguistic patterns, text structure, etc. Frames are a concise way of storing and representing knowledge in an object-oriented manner. Initially, GKP identification is done, where the authors manually studied various requirements documents and carried out lexical and syntactic analysis using the Stanford POS tagger. After learning these patterns, an automated algorithm is designed to identify them and classify statements into one or more leaf categories (as proposed in the paper) depending on the GKP they contain. For each leaf category, a frame structure is defined with keys as the semantics of the statement and their values as the parser dependency tags. With the information stored in these frames, activity and sequence diagrams are generated following some additional rules.

There are many limitations of the CASE tools available to draw UMLs, e.g., they are difficult to learn and they generate diagrams in specific file formats only. There is also no mechanism for extracting UML from images, which is of vital importance as it can help novice developers to learn from the experience of expert developers whose UMLs are already present on the internet. The authors in [23] propose a Convolutional Neural Network (CNN) to classify UML diagram images. CNNs have been used with and without regularization techniques. In the CNN without regularization there are four convolutional layers, along with two activation functions (ReLU and Sigmoid) and four pooling layers, where max-pooling is used. In the CNN with regularization, five pooling
layers are used, along with L2 regularization with different weight functions, keeping other things the same.

Moving away from the generation of UML diagrams, we come to the refactoring of existing UMLs. This is also important, as developers spend a lot of effort on the refactoring of code, which can be mitigated by refactoring the designs from which the code is written. Refactoring leads to better coupling and cohesion. In [24], AI techniques used for refactoring single- and multiple-view UML models are discussed. Multiple-view models combine different structural and behavioral UMLs into a unified view which can reveal some hidden aspects of each diagram. The authors have used search-based algorithms like Hill Climbing, Late-Acceptance Hill Climbing and Simulated Annealing to refactor single-view UML models. These are evaluated using different metrics, like size complexity for the Sequence diagram, where a reduced number of communications between classes leads to better clarity and elimination of overheads.

2.3 Software Development and Implementation

Software Development in SDLC. In this phase, the software developers develop and implement the entire system using the specifications from the design phase. The developers need to adhere to pre-defined programming guidelines specified by their organization. Programming tools such as interpreters, compilers, debuggers and integrated development environments (IDEs) expedite the development process. The culmination of this phase is marked by the finished development of the product.

The Need for AI in Software Development Processes. Software products are evolutionary by nature. Requirements keep changing frequently, which results in delays in product deliveries. This incites a demand for design by experimentation, which makes the role of AI quite significant. AI can assist developers through automated programming environments, which can be implemented via code generation, code refactoring and code reusability. AI is redefining the way developers work. A recent report by Deloitte [25] states that AI-powered development tools could increase developer productivity by up to 10 times. These AI tools are enabling professionals to write resilient code, automate some of their work, catch bugs relatively early and improve code reusability. The next part will enumerate some of the ways in which artificial intelligence tools and techniques have successfully been used in the software development and implementation phase.

Intelligent Software Development Techniques

Autonomous Software Code Generation (ASCG). ASCG is an agent-oriented approach used in the automated generation of code [26]. This approach utilizes an artificial agent, the Software Developer Agent (SDA), which supplants the traditional role of a human developer by executing development tasks independently. It begins dealing with the development by analyzing the requirements specification, which is provided in the form of a physical configuration of the software under development. The SDA has the capability to synthesize this information and query its internal knowledge so that it can make decisions regarding how to design software according to the system logic. The system logic is comprised of interconnected blocks which exchange information and data.
Fig. 2. Autonomous software generation
this results in the generation of the final software code. Figure 2 illustrates the mechanism of ASCG in detail.

Software Reusability. Software reusability is an important aspect of software development which enables developers to reuse software components, thereby lowering software production and maintenance costs, ensuring faster delivery of systems and increasing software quality. Programmers usually incorporate components from a library into their software systems. [27] proposes a Machine Learning based solution to promote software reuse by utilizing explanation-based learning to generalize program abstractions. It develops a system called LASR (Learning Apprentice for Software Reuse) which builds knowledge by forming interconnections between abstract data type theories. In [28], a new classification-based system for reuse is presented. In software construction, subsumption and closeness help to model function composition and function modification. The system employs the above-stated concepts to facilitate searching for reusable components and to assist developers in analyzing the value of reusing certain modules. The Knowledge-Based Software Reuse Environment (KBSRE) [29] enables users to search for partly matched components from a reusable library, to understand lifecycle data about modules and to disintegrate the components under specific constraints. The Programmer's Apprentice system by Waters [30], which provides a knowledge-based editor (KBEmacs), and the Intelligent Design Aid are based on the refinement-based development paradigm.

Language Feature. This is a technique which follows the principle of late binding [31]. It makes the data structures extremely flexible to use. In the method prototypes, data structures are not finalized, which means that no changes are required in the underlying logic upon changing the type of the data structure. This enables quick creation of method prototypes, which results in code that can be easily modified later. Object-Oriented Programming (OOP) encapsulates data and functions in an object, which has
been found to be beneficial in systems where code, data structures and requirements are constantly evolving.

Meta Programming. Meta-programming is developed via Natural Language Processing, a subfield of AI [31]. It is a programming paradigm in which a computer program can consider other programs as data. Programs can be created to read, synthesize, analyze and transform other programs, or even change themselves while running. LISP, a functional programming language, has support for this technique, using automated parser generators and interpreters to generate machine-executable programs.

Automated Data Structuring. In Automated Data Structuring [31], high-level data structures are converted to an implementation structure. In this method, comprehensive changes in the codebase are performed through a program update manager rather than through a manual text editor or an Application Programming Interface. This ensures greater quality control and manageability of code.

Software Refactoring. Software refactoring is the process of enhancing the codebase by making changes which do not affect the underlying functionality of the code. Refactoring keeps the code free from bugs, eliminates potential errors and keeps the code clean. [32] discusses the implementation of refactoring in IntelliJ IDEA. [33] presents the development of an automated refactoring tool which can be used to extract and propagate Java expressions. In [34], the authors present an interactive search-based approach wherein the developers evaluate the refactoring possibilities suggested by a Genetic Algorithm (GA), and later an Artificial Neural Network (ANN) uses these training examples to rank the refactoring solutions proposed by the algorithm. The system to refactor is fed as the input to the ANN. The ANN generates the comprehensive list of potential refactoring solutions and finally ranks the top-most solutions that would enhance the code quality. Figure 3 illustrates the mechanism of the ANN-based approach. Modern IDEs incorporate advanced features such as auto-text completion, instantaneous code inspections and auto-indentation of code to boost developer productivity.
Fig. 3. Software refactoring using ANN
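To make the ranking step of such an interactive search-based approach concrete, the sketch below trains a small neural network on developer ratings of candidate refactorings and uses it to order new candidates. The metric names, ratings and model settings are illustrative assumptions and are not taken from [34].

# Hypothetical sketch: rank candidate refactorings with a small neural network,
# in the spirit of the interactive GA + ANN approach described above.
# Feature names, ratings and model settings are illustrative, not from [34].
import numpy as np
from sklearn.neural_network import MLPRegressor

# Each candidate refactoring is described by simple design metrics,
# e.g. change in coupling, change in cohesion, number of moved methods.
X_train = np.array([
    [-0.30, 0.20, 2],   # candidate the developer rated as good
    [0.10, -0.05, 5],   # candidate the developer rated as poor
    [-0.10, 0.05, 1],
])
y_train = np.array([0.9, 0.2, 0.6])  # developer ratings collected interactively

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

# Score and rank new candidates; the best-scored ones are shown to the developer first.
candidates = np.array([[-0.25, 0.15, 3], [0.05, 0.00, 4]])
scores = model.predict(candidates)
ranking = np.argsort(scores)[::-1]
print(ranking, scores)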
2.4 Software Testing

Software Testing in SDLC. Software testing is an important part of the SDLC that ensures customer satisfaction with the delivered application. It is the main way of validating the delivered product against user-defined requirements. Due to the increased complexity of the software products developed recently, traditional manual testing is becoming a much more expensive and time-consuming task. Hence, manual testing can no longer provide results well within the delivery time constraints, especially because of the widely adopted Agile model, where continuous integration and continuous delivery (CI/CD) are followed and new features are developed and deployed frequently. This calls for expeditious testing approaches.

The Need for AI in Software Testing. In recent years, in order to reduce the time spent on the tedious task of manual testing, the use of automation tools like Selenium has become a norm in industry, and the results are quite fruitful. But these tools also have their limitations, and thus a fair bit of human intervention is still required to manage the testing scripts. Because of this, a better way to automate the testing phase is a dire need. This is where Artificial Intelligence comes into play. The use of AI in software testing is still in its embryonic stages compared to other evolved areas like voice-assisted control or autonomous driving cars. But recently, much work has been done to increase the autonomy in testing, relieving the QA team of the monotonous job of complete manual testing and instead using their expertise in other areas.

AI-driven Testing Approaches. According to [35], there are four key AI-driven testing approaches.

Visual Testing. Of late, the onus is on testers to validate the UI layer of an application due to different configurations, screen sizes, and portability. A lot of minor details like color, resolution, fonts, etc. need to be validated. The UI layer undergoes constant change due to dynamic client requirements. Visual AI techniques [36, 37] capture an image of a screen, break it up into various visual components, and compare it with the visual components of an older snapshot of the captured screen using AI. Computer Vision techniques like OCR (optical character recognition) can be used to achieve this task. In OCR, all the visual elements are converted into text. It analyzes the shape of characters, context to other characters, words, etc. for converting pixels into text with higher accuracy. Existing tools – Applitools, Percy by BrowserStack.

Differential Testing. With the CI/CD pipeline being ubiquitous in IT industries, every now and then new features are introduced and regression testing becomes a hefty task. Testers have to ensure that no existing functionalities fail due to the addition of a new feature. They also have to identify potential security vulnerabilities which may be caused by the addition of new code. AI-powered tools can now execute this task with ease. They connect to the source repository and can create a basic line of unit testing automatically, thus helping developers avoid the test creation step. Whenever a pull request is raised for a new functionality, these tools can perform a code impact analysis, determining what the most recent code changes are and thus running only the most relevant part of the regression test suite.
Existing tools – DiffBlue, Launchable, Google OSS-Fuzz, Facebook's Infer.

Declarative Testing. This type of testing focuses on eliminating error-prone and repetitive actions through smart automation. Tools that leverage ML and AI are formidable due to their application of Robotic Process Automation (RPA), Natural Language Processing (NLP), Model-based Test Automation (MBTA), and Autonomous Testing methods (AT). Tools mimic the application under test and automatically go through the model flows to create test automation scenarios. The benefits of using AI here are: no coding skills are required, faster test automation creation, and faster maintenance of test automation scenarios. Existing tools – Functionize, UIPath, Tricentis.

Self-healing Testing. One of the major drawbacks of automation tools is that the scripts which are written need to be constantly updated. Tools like Selenium, which heavily rely upon XPath to locate an object in the DOM, will fail if there is a change in the XPath value. Similarly, if any class name or object id changes, then the associated test cases will fail. AI can thus be used to self-heal these written scripts. ML algorithms are used to learn the website/application under test and hence can identify the element locators from each screen. Existing tools – Mabl, Testim, Perfecto.

Machine Learning in Intelligent Software Testing. Machine Learning approaches can be classified into three main categories:
• Supervised Learning – here the algorithm learns the mapping function from the input to the output to find the correct answer based on the data set.
• Unsupervised Learning – this is used to discover hidden patterns in unlabeled data. It organizes the data into groups of clusters.
• Reinforcement Learning – here the algorithm constantly adapts to its environment based on feedback.

In [38], a literature survey was carried out showing that AI and ML methods were mostly used to tackle black-box testing. The findings suggested that the most popular AI algorithms were clustering (an unsupervised learning method), ANN (used in supervised and reinforcement learning) and the Genetic Algorithm (GA) (used in reinforcement learning). Fuzzing and regression testing were the most common testing types performed by AI. Another literature survey [39] found that testing activities like test case generation, test oracle generation, test execution and test data generation have the most potential to be automated and improved by AI. Also, techniques like ANN, Computer Vision (methods like NFS, SIFT, FAST), Q-learning and Bayesian Networks were implemented the most for tackling the above problems. In [40], a GA is used for breeding software test cases. Depending on the scenario, the fitness function can either be tweaked to conduct a focused search that provides a large number of localized tests or loosened up to provide more random behavior. In [41],
Hybrid Genetic Algorithms (HBA) are also used to automatically test GUIs. An Intelligent Search Agent (ISA) is used for optimal test sequence generation and an Intelligent Test Case Optimization Agent (ITOA) is used for optimal test case generation. In domains such as avionics and oil and gas, it is common practice to derive and execute test cases manually from given requirements. In [42], as an application of NLP, the authors proposed the automatic derivation of manually executable test cases from use cases using restricted natural languages. In [43], as another application of NLP, the authors proposed a tool called Litmus for the generation of test cases from functional requirements. No conditions are imposed on the sentences, and the tool analyzes each sentence of the functional requirements and generates multiple test cases in a 5-step process. Today, in the software industry, Selenium is a widely used automation tool for the testing of web-based applications. It has its limitations, as mentioned above. In [44], the authors try to minimize those limitations by creating a framework combining Selenium and a machine learning algorithm. It consists of 3 steps. First, the Selenium WebDriver and a request library are used to send an HTTP request to the relevant webpage. Then, in the second step, the data is scraped and processed with the help of a web-scraping Python library called BeautifulSoup. In the last step, an SVM model is used to recognize the pattern of web elements corresponding to the 'search box' in each webpage. The SVM model is trained using the HTML structure of each website. Due to this, the SVM model is able to detect any change to the web elements.
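The following sketch illustrates the three-step pipeline just described. The element features, training labels and page URL are placeholders, and a local Chrome driver is assumed to be available; the actual framework in [44] trains the SVM on the HTML structure of each target website.

# Rough sketch of the three-step Selenium + BeautifulSoup + SVM pipeline described
# above. The features, labels and URL are placeholders; a local Chrome driver is
# assumed to be installed.
from bs4 import BeautifulSoup
from selenium import webdriver
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

def element_features(tag):
    # Simple hand-crafted features describing an <input> element.
    return {
        "type": tag.get("type", ""),
        "has_placeholder": int(tag.has_attr("placeholder")),
        "name_contains_search": int("search" in (tag.get("name") or "").lower()),
    }

# Step 1: load the page with Selenium and send the request.
driver = webdriver.Chrome()
driver.get("https://example.com")
html = driver.page_source
driver.quit()

# Step 2: scrape candidate elements with BeautifulSoup.
soup = BeautifulSoup(html, "html.parser")
candidates = soup.find_all("input")

# Step 3: an SVM (trained here on toy examples) predicts which element is the search box.
vectorizer = DictVectorizer()
train_feats = [
    {"type": "text", "has_placeholder": 1, "name_contains_search": 1},
    {"type": "password", "has_placeholder": 0, "name_contains_search": 0},
]
train_labels = [1, 0]  # 1 = search box, 0 = other element
clf = SVC().fit(vectorizer.fit_transform(train_feats), train_labels)

feats = [element_features(t) for t in candidates]
predictions = clf.predict(vectorizer.transform(feats)) if feats else []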
3 Open Problems and Challenges

The use of AI techniques has its limitations and is not a panacea for addressing Software Engineering problems, even though these techniques can help lessen the impact of quite a number of issues. Some of the open problems we found are:
• Requirements gathered in natural language contain a lot of ambiguities, and NLP algorithms for regional languages have not been a point of focus for major research.
• ML algorithms give better accuracies on the test data when a lot of relevant training data to train the model is readily available. Training data may be specific to a software project, which may result in poor accuracy on other test data.
• ML models require the tweaking of many hyperparameters. If an ML model is designed to tackle multiple tasks, then changing the parameters to improve one task may result in performance degradation of the others.
• Due to the limitations mentioned in the above point, there is currently no single AI model or tool which can help with the automation of multiple SDLC phases.
• More advanced AI techniques can be used to enhance the automation of the Architecture and Design phase. Generated diagrams lack accuracy and still require human expertise to fix the discrepancies present in them.
• Search-Based Software Testing (SBST) [45] can be extended to test non-functional requirements. Also, search-based data generation for testing currently focuses on optimizing only a single objective (such as branch coverage), which can be somewhat ineffective as real-world data is more complex and not very orderly. Hence, it should be able to generate multi-objective test data.
4 Future Work

As evident from the open problems, future work in this field includes building an AI-powered system that could handle the automation of multiple software lifecycle phases. A complete system could be designed that takes requirements specified by the user in a multi-lingual natural language as an input. The output should contain relevant design diagrams, testing suites, etc. that users could modify according to their needs. An intelligent system should ensure that this human intervention is also minimal. AI-enhanced software development tools are an example of how AI can empower, rather than replace, human workers. Apart from this, the use of ML in codeless automation testing to address the limitations of automation testing tools is also a major area for research.
5 Conclusion

Despite all the open challenges, from all the findings presented in this paper, we can conclude that AI techniques can be extensively incorporated by industries in the Information Technology domain into the Software Engineering process. They can revolutionize the burdensome tasks of SDLC phases, making the entire process autonomous and free of roadblocks. We surveyed the need for using AI and how its different approaches can be integrated to automate SDLC phases like Requirement Engineering, Architecture Design, Development and Implementation, and Testing. The use of AI techniques can prove to be very lucrative for software industries. These methods can be employed by industries to revamp their current methodologies, which require a lot of human intervention, to save time, money and effort.
References 1. Perkusich, M., et al.: Intelligent software engineering in the context of agile software development: a systematic literature review. Inf. Softw. Technol. 119, 106241 (2020) 2. Silva, V.J.S., Dorça, F.A.: An automatic and intelligent approach for supporting teaching and learning of software engineering considering design smells in object-oriented programming. In: 2019 IEEE 19th International Conference on Advanced Learning Technologies (ICALT), vol. 2161. IEEE (2019) 3. Cheng, B.H.C., et al.: Software Engineering for Self-Adaptive Systems: A Research Roadmap. In: Cheng, B.H.C., de Lemos, R., Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for Self-Adaptive Systems. LNCS, vol. 5525, pp. 1–26. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02161-9_1 4. IEEE 12207-2-2020 - ISO/IEC/IEEE International Standard - Systems and software engineering–Software life cycle processes–Part 2: Relation and mapping between ISO/IEC/IEEE 12207:2017 (2020) 5. Institute of Electrical and Electronic Engineers, IEEE Standard Glossary of Software Engineering Terminology (IEEE Standard 610.12-1990). Institute of Electrical and Electronics Engineers, New York (1990) 6. Chakraborty, A., Baowaly, M.K., Arefin, A., Bahar, A.N.: The role of requirement engineering in software development life cycle. J. Emerg. Trends Comput. Inf. Sci. 3(5), 1 (2012)
7. Batarseh, F.A., Yang, R.: Data Democracy: At the Nexus of Artificial Intelligence, Software Development, and Knowledge Engineering. Academic Press, Cambridge (2020) 8. Balzer, R., Goldman, N., Wile, D.: Informality in program specifications. IEEE Trans. Softw. Eng. SE-4(2), 94–103 (1977) 9. Zhao, L., et al.: Natural language processing (NLP) for requirements engineering: a systematic mapping study. arXiv preprint arXiv:2004.01099 (2020) 10. Dalpiaz, F., van der Schalk, I., Lucassen, G.: Pinpointing ambiguity and incompleteness in requirements engineering via information visualization and NLP. In: Kamsties, E., Horkoff, J., Dalpiaz, F. (eds.) REFSQ 2018. LNCS, vol. 10753, pp. 119–135. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77243-1_8 11. Robeer, M., Lucassen, G., Van der Werf, J.M., Dalpiaz, F., Brinkkemper, S.: Automated extraction of conceptual models from user stories via NLP. In: Proceedings of the International Requirements Engineering Conference (2016) 12. Ammar, H.H., Abdelmoez, W., Hamdi, M.S.: Software engineering using artificial intelligence techniques: current state and open problems. In: Proceedings of the First Taibah University International Conference on Computing and Information Technology (ICCIT 2012), Al-Madinah Al-Munawwarah, Saudi Arabia, vol. 52 (2012) 13. Garigliano, R., Mich, L.: NL-OOPS: a requirements analysis tool based on natural language processing. Conf. Data Mining 3, 1182–1190 (2002) 14. Smith, T.J.: READS: a requirements engineering tool. In: Proceedings of the IEEE International Symposium on Requirements Engineering. IEEE (1993) 15. Zell, A.: Simulation Neuronaler Netze (Simulation with Neuronal Networks). Wissenschaftsverlag, Oldenbourg (2003) 16. Neumann, D.E.: An enhanced neural network technique for software risk analysis. IEEE Trans. Software Eng. 28(9), 904–912 (2002) 17. Koc, H., Erdo˘gan, A., Barjakly, Y., Peker, S.: UML diagrams in software engineering research: a systematic literature review. Proceedings. 74, 13 (2021). https://doi.org/10.3390/proceedin gs2021074013 18. Waykar, Y.: A study of importance of UML diagrams: with special reference to very largesized projects (2013) 19. Narawita, C.R., Vidanage, K.: UML generator – use case and class diagram generation from text requirements. Int. J. Adv. ICT Emerg. Regions (ICTER) 10, 1 (2018) 20. Bajwa, I.S., Choudhary, M.A.: Natural language processing based automated system for UML diagrams generation (2006) 21. Bajwa, I., Hyder, S.: UCD-generator - a LESSA application for use case design. In: 2007 International Conference on Information and Emerging Technologies, ICIET, pp. 1–5 (2007). https://doi.org/10.1109/ICIET.2007.4381333 22. Sharma, R., Gulia, S., Biswas, K.K.: Automated generation of activity and sequence diagrams from natural language requirements. In: 2014 9th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE), pp. 1–9 (2014) 23. Gosala, B., Chowdhuri, S.R., Singh, J., Gupta, M., Mishra, A.: Automatic classification of UML class diagrams using deep learning technique: convolutional neural network. Appl. Sci. 11(9), 4267 (2021) 24. Baqais, A., Alshayeb, M.: Automatic refactoring of single and multiple-view UML models using artificial intelligence algorithms (2016) 25. Schatsky, D., Bumb, S.: AI is helping to make better software, 22 January 2020. https://www2.deloitte.com/us/en/insights/focus/signals-for-strategists/ai-assisted-sof tware-development.html. Accessed 2 Sept 2021 26. 
Carlos, C.I.: Software programmed by artificial agents: toward an autonomous development process for code generation. In: IEEE International Conference on Systems, Man, and Cybernetics, pp. 3294–3299 (2013)
27. Hill, W.L.: Machine learning for software reuse (1987) 28. Prasad, A., Park, E.K.: Reuse system: an artificial intelligence-based approach. J. Syst. Softw. 27(3), 207–221 (1994) 29. Wang, P., Shiva, S.: A knowledge-based software reuse environment for program development. IEEE (1994) 30. Waters, R.: The programmer’s apprentice: knowledge-based program editing. IEEE Trans. Softw. Eng. 8(1), 1e12 (1982) 31. Shankari, K.H., Thirumalaiselvi, R.: A survey on using artificial intelligence techniques in the software development process. Int. J. Eng. Res. Appl. 4(12), 24–33 (2014) 32. Jemerov, D.: Implementing refactorings in IntellJ IDEA (2008) 33. Mahmood, J., Reddy, Y.R.: Automated refactorings in Java: using IntelliJ IDEA to extract and propagate constants (2014) 34. Le Goues, C., Yoo, S. (eds.): SSBSE 2014. LNCS, vol. 8636. Springer, Cham (2014). https:// doi.org/10.1007/978-3-319-09940-8 35. AI in Software Testing. Testing Xperts, 16 March 2021. https://www.testingxperts.com/blog/ AI-in-Software-Testing. Accessed 27 Aug 2021 36. Yanovskiy, D.: Automated visual testing for mobile and web applications. Perfecto, Perforce, 27 May 2020. https://www.perfecto.io/blog/automated-visual-testing. Accessed 25 Aug 2021 37. Battat, M., Schiemann, D.: Why visual AI beats pixel and DOM Diffs for web app testing. InfoQ, 23 January 2020. https://www.infoq.com/articles/visual-ai-web-app-testing/. Accessed 29 Aug 2021 38. Lima, R., et al.: Artificial intelligence applied to software testing: a literature review. In: 2020 15th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–6 (2020) 39. Trudova, A., et al.: Artificial intelligence in software test automation: a systematic literature review. In: ENASE (2020) 40. Tandon, A., Malik, P.: Breeding software test cases with genetic algorithms (2013) 41. Rauf, A., Alanazi, M.N.: Using artificial intelligence to automatically test GUI. In: 2014 9th International Conference on Computer Science & Education, pp. 3–5 (2014) 42. Zhang, M., Yue, T., Ali, S., Zhang, H., Wu, J.: A systematic approach to automatically derive test cases from use cases specified in restricted natural languages. In: Amyot, D., Fonseca i Casas, P., Mussbacher, G. (eds.) SAM 2014. LNCS, vol. 8769, pp. 142–157. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11743-0_10 43. Dwarakanath, A., Sengupta, S.: Litmus: generation of test cases from functional requirements in natural language. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds.) NLDB 2012. LNCS, vol. 7337, pp. 58–69. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-64231178-9_6 44. Nguyen, D.P., Maag, S.: Codeless web testing using Selenium and machine learning. In: ICSOFT 2020: 15th International Conference on Software Technologies, July 2020, Online, France, pp. 51–60 (2020). https://doi.org/10.5220/0009885400510060, (hal-02909787) 45. Harman, M., et al.: Achievements, open problems and challenges for search based software testing. In: 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), pp. 1–12 (2015). 016/11/21
Honey Bee Queen Presence Detection from Audio Field Recordings Using Summarized Spectrogram and Convolutional Neural Networks Agnieszka Orlowska1 , Dominique Fourer1(B) , Jean-Paul Gavini2 , and Dominique Cassou-Ribehart2 1
IBISC (EA 4526), Univ. Évry/Paris-Saclay, Évry-Courcouronnes, France [email protected] 2 Starling Partners Company, Paris, France
Abstract. The present work proposes a simple supervised method based on a downsampled time-frequency representation of the input audio signal for detecting the presence of the queen in a beehive from noisy field recordings. Our proposed technique computes a "summarized-spectrogram" of the signal that is used as the input of a deep convolutional neural network. This approach has the advantage of reducing the dimension of the input layer and the computational cost while obtaining better classification results with the same deep neural architecture. Our comparative evaluation based on a cross-validation beehive-independent methodology shows a maximal accuracy of 96% using the proposed approach applied on the evaluation dataset. This corresponds to a significant improvement of the prediction accuracy in comparison to several state-of-the-art approaches reported by the literature. Baseline methods such as MFCC, constant-Q transform and classical STFT combined with a CNN fail to generalize the prediction of the queen presence in an unknown beehive and obtain a maximal accuracy of 55% in our experiments. Keywords: Honey bee queen detection · Audio classification · Time-frequency analysis · Convolutional neural networks
1 Introduction
Smart beekeeping is an emerging and promising research field which aims at providing computational solutions for aiding the monitoring of bee colonies. It is known [12,13] that bees produce specific sounds when exposed to stressors such as failing queens, predators, and airborne toxicants. However, experienced beekeepers are not always able to explain the exact causes of the sound changes without a hive inspection. Nonetheless, hive inspections disrupt the life cycle of bee colonies and can involve additional stress factors for the bees [2,3]. With this in mind, several recent studies propose to analyze the audio signature of a
beehive through a machine learning approach [3,10] in order to develop systems for automatically discriminating the different health states of a beehive. For example, in [7,8], the authors propose a method which combines the Short-Time Fourier Transform (STFT) of the analyzed audio recording with convolutional neural networks (CNN) to discriminate bee sounds from the chirping of crickets and ambient noise. This approach outperforms classical machine learning methods such as k-nearest neighbors, support vector machines or random forests for classifying audio samples recorded by microphones deployed above landing pads of Langstroth beehives [1]. The detection of the queen presence appears to be one of the most important tasks for smart beekeeping and is addressed in [4] with a complete beehive machine-learning-based audio monitoring system. More recently, in [2,11], the authors investigate Music Information Retrieval (MIR)-inspired approaches based on mel-frequency cepstral coefficients (MFCC), and on the spectral parameters of sinusoidal signal components, as input features of a supervised classification method based on a CNN to predict the presence of the queen in a beehive from recorded audio signals. In spite of promising results reported in the literature, a further evaluation of the state-of-the-art approaches reveals overfitting problems using the trained models for detecting the queen presence when applied to distinct beehives, as presented for example in [10] through a beehive-independent classification experiment. This lack of generality of the trained model can be critical in real-world application scenarios because the trained models cannot efficiently be applied to another arbitrarily chosen beehive without a new beehive-specific training of the model using annotated examples. Thus, the present work introduces a very simple but efficient transformation technique which improves the results of a CNN-based audio classification method in beehive-independent configurations. We compute a "summarized" time-frequency representation through a specific downsampling technique which experimentally reveals a better generalization of the trained model based on a convolutional neural network architecture. Our technique can arbitrarily reduce the dimension of the input features provided to the CNN to obtain the best trade-off between the model accuracy and the computational cost. This paper is organized as follows. In Sect. 2, we present the framework of the problem addressed in this study with a description of the experimental materials. In Sect. 3, we present the proposed approach and introduce our supervised technique based on the summarized spectrogram for automatically predicting the queen presence in a beehive from audio recordings. In Sect. 4, we comparatively assess our new proposed method against several state-of-the-art approaches, in terms of prediction accuracy, with a consideration for the dimension of the computed audio features. Finally, this paper is concluded by a discussion including future work directions in Sect. 5.
2 Framework
Fig. 1. Illustration of the overall proposed approach.
2.1 Problem Formulation and Notations
We address the problem of predicting the state of a beehive using an audio signal x resulting from a field recording of a monitored beehive. The overall approach is based on a supervised machine learning approach depicted in Fig. 1, in which relevant audio features are first computed from x before being processed by a classification method. At the training step, training examples xtrain and labels ytrain are used to fit the model parameters of the classification method. At the testing step, the trained model is used to predict from x the state of the beehive associated to a label ŷ (y being the unknown ground truth). This work aims at proposing the best processing pipeline allowing an accurate prediction of the beehive health state, and focuses on the signal transformation and feature extraction steps.

2.2 Materials
We use the publicly available dataset introduced by Nolasco and Benetos in [10] during the Open Source Beehive (OSBH) project and the NU-Hive project1. The dataset contains annotated audio samples acquired from six distinct beehives. The present work focuses on the audio signals which were annotated as "bee", corresponding to sounds emitted by the beehive. Hence, the "no bee" annotated signals correspond to external noises and are simply not investigated in our study. At a pre-processing step, each audio recording is resampled at a rate of Fs = 22.05 kHz as in [10] and is transformed to a single-channel signal by averaging samples from the available channels. Each recording is then split into one-second-long homogeneous time series (associated to the same annotation label). As a result, we obtain a dataset of 17,295 distinct individuals, where 8,444 are labeled as "queen" (y = 1) and 8,851 are labeled as "no queen" (y = 0). An overview of the investigated dataset with the considered labels for each beehive is presented in Table 1.
1 https://zenodo.org/record/1321278.
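As a rough illustration of this pre-processing step, the following Python sketch resamples a recording to 22.05 kHz, mixes it down to a single channel and splits it into one-second-long segments; the file name is a placeholder.

# Hedged sketch of the pre-processing described above: resample to 22.05 kHz,
# mix down to one channel, and split into one-second-long segments.
# The file name is a placeholder.
import numpy as np
import librosa

FS = 22050
y, _ = librosa.load("hive_recording.wav", sr=FS, mono=True)

# Keep only complete one-second segments (each row is one "individual").
n_segments = len(y) // FS
segments = np.reshape(y[: n_segments * FS], (n_segments, FS))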
Table 1. Description of the dataset content investigated in the present study. Each individual corresponds to a one-second-long audio signal sampled at Fs = 22.05 kHz.

Beehive name   Queen   No queen   Total
CF001              0         16      16
CF003          3,700          0   3,700
CJ001              0        802     802
GH001          1,401          0   1,401
Hive1          2,687      1,476   4,163
Hive3            656      6,557   7,213
Total          8,444      8,851  17,295

3 Proposed Method

3.1 Time-Frequency Representation Computation
The Short-Time Fourier Transform (STFT) is a popular technique designed for computing time-frequency representations of real-world signals. The STFT appears in a large number and variety of signal processing methods which involve non-stationary multicomponent signals that can efficiently be disentangled using a Fourier transform combined with a sliding analysis window [6]. Given a discrete-time finite-length signal x[n], with time index n ∈ {0, 1, ..., N − 1}, and an analysis window h, the discrete STFT of x can be computed as:

F_x^h[n, m] = \sum_{k=-\infty}^{+\infty} x[k]\, h[n-k]^{*}\, e^{-j \frac{2\pi m k}{M}}    (1)

with z^{*} the complex conjugate of z and j^2 = −1. Here, m ∈ {0, 1, ..., M − 1} corresponds to an arbitrary frequency bin associated to the frequency f = m Fs / M expressed in Hz, for m ≤ M/2. The spectrogram is defined as the squared modulus of the STFT, |F_x^h[n, m]|^2 [5]. In practice, it is fractioned along the time axis by considering an integer hop size Δn > 1 with a possible overlap between adjacent frames. As a result, we obtain an M × L matrix, with M the arbitrary number of computed frequency bins and L = ⌊N/Δn⌋ (⌊·⌋ being the floor function) the resulting number of time indices, such that L ≤ N when Δn > 1.
3.2 Summarizing Process
Fig. 2. Classical- and summarized-spectrogram time-frequency representation comparison of a one-second-long beehive audio recording.
The main problem occurring when an STFT is used as the input of a neural network is the high number of input coefficients, which can lead to a high memory consumption and a heavy computation cost during the training step. Hence, we propose a simple dimension reduction method of the spectrogram which aims at preserving the relevant information present in the time-frequency plane without modifying the original time-frequency resolution related to the analysis window. To this end, we use a summary process on the computed spectrogram |F_x^h[n, m]|^2 which consists of two steps. First, the positive frequency axis (m ∈ [0, M/2]) is partitioned into a finite number B of equally spaced frequency bands such that B < M/2. Second, at each time index, the information of each frequency band is summarized into a unique coefficient by applying a summary aggregating function denoted g() along the frequency axis (the best choice for g is discussed later). The summarized-spectrogram SF_x^h, with a reduced dimension of B × L, is computed as:

SF_x^h[n, b] = g\left( |F_x^h[n, m_b]|^2 \right) \quad \forall m_b \in \left[ b \tfrac{M}{2B},\, (b+1) \tfrac{M}{2B} - 1 \right]    (2)

with b ∈ [0, B − 1] the new frequency bin. We illustrate in Fig. 2 the result obtained by computing the summarized-spectrogram of two audio signals corresponding to beehive recordings respectively labeled as "queen" and "no queen", using the arithmetic mean as the g function.
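An illustrative implementation of the summarizing step of Eq. (2) is given below; how leftover frequency bins at the band edges are handled is not specified above, so dropping them is an assumption of this sketch.

# Illustrative implementation of the band-summarizing step of Eq. (2).
import numpy as np

def summarized_spectrogram(S, B=27, g=np.mean):
    # S has shape (n_freq, L). Leftover bins that do not fill a complete band
    # are simply dropped here, which is an assumption of this sketch.
    rows = (S.shape[0] // B) * B
    bands = S[:rows].reshape(B, rows // B, S.shape[1])
    return g(bands, axis=1)   # shape (B, L), e.g. roughly (27, 44) for a one-second signal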
3.3 2D Convolutional Neural Network
CNN is a natural choice for analyzing a time-frequency representation that can also be considered as an image. To predict the label corresponding to the state
of a beehive from an audio signal, the resulting summarized-spectrogram is processed by a deep neural network architecture inspired from [2], using 2 additional convolutional layers. It consists of 6 convolutional blocks, each with a 3 × 3 kernel size, followed by batch normalization, 2 × 2 max-pooling and 25% dropout layers. The output is connected to 3 fully connected (FC) layers, including 2 dropout layers of respectively 25% and 50%, followed by a softmax activation function to compute the predicted label ŷ (rounded to the closest integer, 0 or 1). Convolutional and FC layers both use a LeakyReLU activation function defined as LeakyReLU(x) = max(αx, x), with α = 0.1.
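A hedged Keras sketch of an architecture in this spirit is given below. The filter counts and dense-layer widths are not stated in the text and are placeholders; only the block structure (six convolutional blocks, three fully connected layers with two dropout layers, LeakyReLU activations and a softmax output) follows the description above.

# Hedged Keras sketch of the CNN described above; filter counts and dense-layer
# widths are assumptions, the block structure follows the text.
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(27, 44, 1), n_filters=(16, 16, 32, 32, 64, 64)):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for f in n_filters:                       # 6 convolutional blocks
        x = layers.Conv2D(f, (3, 3), padding="same")(x)
        x = layers.LeakyReLU(0.1)(x)          # alpha = 0.1
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D((2, 2), padding="same")(x)
        x = layers.Dropout(0.25)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64)(x)                   # FC layer 1
    x = layers.LeakyReLU(0.1)(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Dense(32)(x)                   # FC layer 2
    x = layers.LeakyReLU(0.1)(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(2, activation="softmax")(x)   # FC layer 3
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model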
4 Numerical Experiments

4.1 Experimental Protocol
We propose here two distinct experiments for comparatively assessing our new proposed method described in Sect. 3 with several other state-of-the-art approaches for predicting the queen presence.
Experiment 1: We merge the 6 available beehives and then we apply a random split to obtain 70% of the individuals for training and 30% for testing.
Experiment 2: We use a 4-fold cross-validation methodology where the beehives are independent. To this end, the folds have been manually created to assign each beehive to a unique fold. An exception is made for the testing folds 1 and 2, which contain two beehives, since CF001, CF003, CJ001 and GH001 only contain individuals from the same annotation label. The proposed partitioning of the whole dataset in Experiment 2 is detailed in Table 2.

Table 2. Description of the partitioned dataset investigated in Experiment 2.

Fold     Training set                               Testing set
Fold 1   CJ001 + GH001 + Hive3 + Hive1              CF001 + CF003
Fold 2   CF001 + CF003 + Hive3 + Hive1              CJ001 + GH001
Fold 3   CJ001 + GH001 + Hive3 + CF001 + CF003      Hive1
Fold 4   CJ001 + GH001 + Hive1 + CF001 + CF003      Hive3

            Fold 1   Fold 2   Fold 3   Fold 4
queen         3700     1401     2687      656
no queen        16      802     1476     6557
Total         3716     2203     4163     7213
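A minimal sketch of the beehive-independent partitioning of Table 2 is shown below, assuming each one-second individual is stored together with the name of its source beehive; this bookkeeping format is an assumption of the sketch.

# Sketch of the beehive-independent split of Experiment 2 (Table 2): each fold keeps
# whole beehives out of the training set. Each sample is assumed to be stored as a
# (features, label, hive_name) tuple, which is a bookkeeping assumption.
TEST_HIVES = {
    1: {"CF001", "CF003"},
    2: {"CJ001", "GH001"},
    3: {"Hive1"},
    4: {"Hive3"},
}

def split_fold(samples, fold):
    test_hives = TEST_HIVES[fold]
    train = [s for s in samples if s[2] not in test_hives]
    test = [s for s in samples if s[2] in test_hives]
    return train, test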
4.2 Implementation Details
The investigated methods have been implemented in Python using, when needed, the following libraries: Librosa is used for audio processing and feature extraction, Keras with TensorFlow are used for the implementation and the use of the
proposed CNN architecture, and scikit-learn [9] is used for computing the evaluation metrics. The training of our CNN was configured for a constant number of 50 epochs with a batch size of 145. The numerical computation was performed using an Intel(R) Xeon(R) W-2133 CPU @ 3.60 GHz CPU with 32 GB of RAM and a NVIDIA GTX 1080 TI GPU. The Python code used in this paper is freely available at https://github.com/agniorlowska/beequeen prediction for the sake of reproducible research.
Fig. 3. Average F-measure for different summary function g and B value configurations.
4.3 Hyperparameters Tuning and Data Augmentation
To define the best value of B with the best summary function g(), we evaluated several configurations by considering the beehive-independent Experiment 2 protocol. According to the results presented in Fig. 3, we chose B = 27 and the mean function, which obtained the best results. We also tried to apply the summarizing process separately on the real and the imaginary parts of the STFT before computing the spectrogram; however, this provides very poor results in each configuration. To improve the performance of the trained model, we used a data augmentation (DA) technique which artificially increases the number of training individuals by 50% through the addition of white Gaussian noise to existing ones. The variance of the noise signal has been defined to obtain a resulting signal-to-noise ratio (SNR) equal to 30 dB. Due to the increase in computation time, we only applied DA to the best resulting method presented in Tables 3 and 4. Our simulations show that DA does not significantly improve the results obtained with the MFCC and CQT-based methods, which are poorer than with the STFT.
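The noise-addition step can be sketched as follows; the scaling simply matches the stated 30 dB signal-to-noise ratio, and the random generator seed is an arbitrary choice.

# Sketch of the data-augmentation step: add white Gaussian noise to a signal so
# that the resulting SNR is 30 dB, roughly as described above.
import numpy as np

def add_white_noise(x, snr_db=30.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)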
4.4 Comparative Results
Our proposed method is compared to several existing approaches introduced in [10,11]. The Mel Frequency Cepstral Coefficients (MFCCs)+CNN method is a popular approach proposed in [11], where the number of computed MFCCs is set to 20. The constant-Q transform (CQT)+CNN was also investigated as a baseline method, since the CQT can be viewed as a modified version of the discrete STFT with a varying frequency resolution. The so-called Q-factor corresponds to Q = f/Δf, where Δf is the varying frequency resolution (difference between two frequency bins). All the investigated signal representations use exactly the same CNN architecture, for which the dimension of the input layer is adapted. The classification results are obtained with our proposed method based on the mean summary function for a number of frequency bands B = 27, computed from a STFT or CQT with M = 1025 and with an overlap of 50% (Δn = 512) between adjacent frames, to obtain an input features matrix of dimension 27 × 44. The results of the two experiments, respectively expressed in terms of Precision, Recall, F-score and Accuracy metrics, are reported in Tables 3 and 4. According to Table 3, all the compared methods are almost equivalent since they obtain excellent classification results with an almost perfect accuracy of 1. These results are comparable with those reported in the literature and can be explained by the fact that all the available beehives are merged in the same training set. The beehive-independent results presented in Table 4 are very different. Now, the best results in Experiment 2 are only obtained with our proposed method (denoted mean-STFT), which uses the summarized-spectrogram combined with a CNN and obtains an average F-score of 0.75. The use of data augmentation improves the results and leads to a maximum accuracy of 0.96.

Table 3. Comparison of the classification results in Experiment 1 (random split).
Method               Features   Label      Precision   Recall   F-score   Accuracy
MFCCS+CNN [11]       20 × 44    Queen      1.00        0.99     0.99      0.99
                                No queen   0.99        1.00     0.99
STFT+CNN             513 × 44   Queen      1.00        0.93     0.97      0.97
                                No queen   0.94        1.00     0.97
CQT+CNN [11]         513 × 44   Queen      0.96        0.93     0.95      0.95
                                No queen   0.92        1.00     0.95
Mean-CQT+CNN         27 × 44    Queen      0.98        1.00     0.99      0.99
                                No queen   0.99        0.98     0.98
Mean-STFT+CNN        27 × 44    Queen      0.99        1.00     1.00      1.00
                                No queen   1.00        0.99     1.00
Mean-STFT+CNN+DA     27 × 44    Queen      0.99        1.00     1.00      1.00
                                No queen   1.00        0.99     1.00
Table 4. Comparison of the classification results in Experiment 2 (4-fold hive-independent cross-validation). Method
Features
Label
Precision Recall F - score Accuracy
MFCCs+CNN [11]
20 × 44
Queen
0.36
0.44
0.40
No queen 0.22
0.16
0.19
STFT+CNN CQT+CNN Mean-CQT+CNN Mean-STFT+CNN
513 × 44 Queen
0.76
0.66
0.20
0.33
513 × 44 Queen 27 × 44 27 × 44
Mean-STFT+CNN+DA 27 × 44
0.77
No queen 0.33 0.10
0.07
0.08
No queen 0.32
0.41
0.36
0.25
0.11
0.16
No queen 0.41
Queen
0.65
0.50
0.71
0.86
0.78
No queen 0.81
Queen
0.64
0.71
0.96
0.99
0.96
No queen 0.99
0.94
0.96
Queen
0.31 0.55 0.25 0.38 0.75 0.96
5 Conclusion
We have introduced and evaluated a new downsampling method for improving the prediction of the presence of a queen bee from audio recordings using a deep CNN. Despite its simplicity, the summarized-spectrogram is more efficient than other perception-motivated representations such as MFCC or CQT when they are used as input features for the queen presence detection problem. Hence, we have obtained a maximal resulting accuracy of 96% in a beehive-independent split configuration, which is very promising. This result paves the way for future real-world applications of smart beehive monitoring techniques based on embedded systems. Future work consists in a further investigation including more data provided by monitored beehives. Moreover, we plan a further investigation of the relevant information conveyed by the summarized-spectrogram when used for providing audio features, in order to design new audio classification methods.
References 1. Teffera, A., Selassie Sahile, G.: On-farm evaluation of bee space of Langstroth beehive. Livestock Res. Rural Dev. 23, Article #207 (2011). http://www.lrrd.org/ lrrd23/10/teff23207.htm 2. Cecchi, S., Terenzi, A., Orcioni, S., Piazza, F.: Analysis of the sound emitted by honey bees in a beehive. In: Audio Engineering Society Convention, vol. 147 (2019) 3. Cecchi, S., Terenzi, A., Orcioni, S., Riolo, P., Ruschioni, S., Isidoro, N.: A preliminary study of sounds emitted by honey bees in a beehive. In: Audio Engineering Society Convention, vol. 144, Milan, Italy (2018)
4. Cejrowski, T., Szyma´ nski, J., Mora, H., Gil, D.: Detection of the Bee Queen presence using sound analysis. In: Nguyen, N.T., Hoang, D.H., Hong, T.-P., Pham, H., Trawi´ nski, B. (eds.) ACIIDS 2018. LNCS (LNAI), vol. 10752, pp. 297–306. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75420-8 28 5. Flandrin, P.: Time-Frequency/Time-Scale Analysis. Academic Press, San Diego (1998) 6. Fourer, D., et al.: The ASTRES toolbox for mode extraction of non-stationary multicomponent signals. In: Proceedings of EUSIPCO 2017, pp. 1170–1174 (2017) 7. Kulyukin, V., Mukherjee, S., Amlathe, P.: Toward audio beehive monitoring: deep learning vs. standard machine learning in classifying beehive audio samples. Appl. Sci. 8(9), 1573 (2018). https://doi.org/10.3390/app8091573 8. Kulyukin, V.A., Mukherjee, S., Burkatovskaya, Y.B., et al.: Classification of audio samples by convolutional networks in audio beehive monitoring. Tomsk State Univ. J. Control Comput. Sci. 45, 68–75 (2018) 9. McFee, B., et al.: LibROSA: audio and music signal analysis in python. In: Proceedings of the 14th Python in Science Conference, vol. 8, pp. 18–25 (2015) 10. Nolasco, I., Benetos, E.: To bee or not to bee: investigating machine learning approaches for beehive sound recognition. In: Proceedings of DCASE (2018) 11. Nolasco, I., Terenzi, A., Cecchi, S., Orcioni, S., Bear, H.L., Benetos, E.: Audiobased identification of beehive states. In: Proceedings of IEEE ICASSP, pp. 8256– 8260 (2019) 12. Papachristoforou, A., Sueur, J., Rortais, A., Angelopoulos, S., Thrasyvoulou, A., Arnold, G.: High frequency sounds produced by Cyprian honeybees Apis mellifera Cypria when confronting their predator, the oriental hornet Vespa orientalis. Apidologie 39(4), 468–474 (2008). https://doi.org/10.1051/apido:2008027 13. Wenner, A.M.: Sound production during the waggle dance of the honey bee. Anim. Behav. 10(1–2), 79–95 (1962). https://doi.org/10.1016/0003-3472(62)90135-5
Formal Verification Techniques: A Comparative Analysis for Critical System Design Rahul Karmakar(B) Department of Computer Science, The University of Burdwan, Burdwan, India [email protected] Abstract. Formal methods are used to verify software systems. The system requirements are modeled using specification languages, and the models are validated by their tool supports. Formal methods provide consistency between software development phases and also enable the early verification of a system. They provide the blueprint of software. Formal modeling is a rigorous practice in industries. It is used to design systems that are safety-, commercial-, and mission-critical. This paper presents an in-depth elaboration of some formal verification techniques like Z, B, and Event-B. It also presents a taxonomy of the formal methods in critical system design. Lastly, the specifications are compared with the help of a case study. Keywords: Formal methods · Z notation · B language · Event-B · RODIN · Traffic controller · Safety-critical system
1 Introduction
Formal methods design the blueprint of software like other traditional engineering disciplines do. Through formal methods, logic and mathematical formulas are applied in programming. Mathematical models are used to verify the correct behavior of complex systems. Z notation [1] is a powerful formal method to specify the behavior of a system. The proper pronunciation of Z is ZED. Z is a model-based specification language that has evolved over the last 20 years. It is often called Z notation because of its mathematical nature. The formal methods communities often use Z. The Z schema is built with predicate logic, and Z specifications are written in the form of schemas, except for sets. Z is a model-based notation used for hardware and software modeling. A model is constructed incrementally: we start with an abstract model, or the minimum requirements, and through refinement steps the model becomes concrete. B [2] specifications are used to model and develop software systems. The B formal modeling language was developed by Abrial, especially for refining program code. It is used in the development of computer software. The basis of the language is discrete mathematics. The same language is used for specification, design, and programming. There are efficient tools for B. The system is designed using abstract machine notations. The abstract machine of an initial model is refined by another, concrete machine. Event-B is the offspring
of the B language and is useful for system design. B and Event-B rely on the correct-by-construction principle [3]. The specification languages have their own tools for model verification. Event-B is a formal verification technique that allows the formalization of a system with refinements. It has a static part of the model called the context. A context contains sets, constants, and axioms. The dynamic properties are designed using the machine. The state of the machine is defined using variables. The variables are constrained with invariant properties. An event signifies the state changes in the model. An event is guarded with some event parameters and then performs certain actions [4]. Event-B is widely used in the modeling and verification of controllers [5] and many other applications like smart irrigation [6]. The paper is organized as follows: a case study on a four-way crossing traffic controller using Z, B, and Event-B is done in Sect. 2. In Sects. 3.1, 3.2, 3.3 and 3.4, a critical analysis is done on different parameters. A critical discussion is presented in Sect. 3.5. The conclusive remarks are given in Sect. 4.
2 Comparative Analysis Using a Case Study
We consider a simple four-way crossing where an automatic controller is installed for the smooth management of road traffic. The system requirements are critical. Figure 1(a) represents the block diagram of the system. The proposed system has two types of roads, main and side, for traffic movement. Signals have three lights (red, green, and yellow), and a timer synchronizes the traffic movement. Figure 1(b) represents the traffic lights of the system.
Fig. 1. (a) The 4-way signal (b) Traffic lights on the 4-way road
2.1 The System Using Z Notation
The Z specification of the system can be designed using the following schemas. Three sets are defined. The TRAFFIC LIGHT set is for the three types of light: green, yellow, and red. The TIMER set is defined, which is either on or off. The movement of the traffic is defined in the TRAFFIC MOVEMENT set. The traffic movement can be allowed either in the NorthSouth (ns) or EastWest (ew) direction. The timer is introduced in the system, and the traffic movement is controlled by the traffic light and the timer together. The timer count is long (tlong) for counting the green and red signals and short (tshort) for the yellow signal. Figure 2 represents the Timer Count (TCount) schema for safety. It ensures that the traffic is never allowed
from the opposite direction at the same time. Figure 3 represents the Green Timer Count (GTCount) schema that inherits the Timer Count schema and specifies the traffic movement in the NorthSouth (ns) direction. Two more schemas are designed for Yellow Timer Count (YTCount) and Red Timer Count (RTCount).

Sets
TRAFFIC LIGHT ::= green | yellow | red
TIMER ::= on | off
TRAFFIC MOVEMENT ::= allow | ready | stop
Fig. 2. Safety Z schema
Fig. 3. North-South traffic movement

2.2 System Design Using B
The system requirements can be modeled using B notations. We define an abstract machine in which two sets, Light and Direction, are defined with the axioms represented in Fig. 4. The traffic movement can be in either the North South or the East West direction. The light can be changed from Red to Yellow, Yellow to Green, and Green to Red. In Fig. 5, the refined machine "Simple Change Light" is shown for Red to Yellow. In the same way, we can design the other light changes. The machines can be verified using any B tool support like ProB or AtelierB.
Fig. 4. The initial model
Fig. 5. The refined machine
2.3 System Design Using Event-B
Initial Model: Traffic Moves According to the Lights. The model starts with a simple assumption that the traffic movement takes place according to the lights. We categorize the roads into Main Road and Side Road types. The lights are Main Green, Main Red, Main Yellow, and the same for the Side Road.

Table 1. Events of the initial model

Traffic Move Allow:
Traffic Movement MainTo Side WHERE
  grd1 : Current Road = Main Road
  grd2 : Current Traffic Light = Main Green
THEN
  act1 : Current Traffic Move := Allow
END

Traffic Movement SideTo Main WHERE
  grd1 : Current Road = Side Road
  grd2 : Current Traffic Light = Side Green
THEN
  act1 : Current Traffic Move := Allow
END

Traffic Move Stop:
Traffic Movement Stop Main WHERE
  grd1 : Current Road = Main Road
  grd2 : Current Traffic Light = Main Yellow
THEN
  act1 : Current Traffic Move := Stop
END

Traffic Movement Stop Side WHERE
  grd1 : Current Road = Side Road
  grd2 : Current Traffic Light = Side Yellow
THEN
  act1 : Current Traffic Move := Stop
END
These are all sets, and their conditions are defined as axioms, like Main Road not equal to Side Road or Main Green not equal to Main Red, etc. These are all the static parts, or the context, of the initial model. The initial model has 9 invariant properties and two events. The Traffic Move Allow and Traffic Move Stop events of the initial model are shown in Table 1.

First Refinement: The Timer is Introduced. We refine the initial model. Here we introduce the timer. It is a sensor that is synchronized with the lights defined in the initial model. The timer is declared as a set that is either set or reset according to the lights. The machine has eight events that refine the previous machine's events. In this refinement, the timer will synchronize the lights. The roads (Main or Side) and the lights (Green, Red, Yellow) are controlled by the timer, keeping all the safety rules in mind. The events are shown in Table 2. The model is verified using the RODIN tool to generate proof obligations.
Table 2. Events of the refined model

Traffic Move Allow:
Start Timer MainTo Side REFINES Traffic Movement MainTo Side WHERE
  grd1 : Light = Main Green
THEN
  act1 : Current Timer := Set
END

Start Timer SideTo Main REFINES Traffic Movement SideTo Main WHERE
  grd1 : Light = Side Green
THEN
  act1 : Current Timer := Set
END

Traffic Move Stop:
Stop Timer MainTo Side REFINES Traffic Movement Stop Main WHERE
  grd1 : Light = Main Yellow
THEN
  act1 : Current Timer := Reset
END

Stop Timer SideTo Main REFINES Traffic Movement Stop Side WHERE
  grd1 : Light = Side Yellow
THEN
  act1 : Current Timer := Reset
END
3 Critical Analysis on Different Parameters
We try to formalize the requirements using Z, B, and Event-B specification techniques and compare them with respect to certain properties. The invariant property is used to represent the safety, liveness, and progress properties in Z, B, and Event-B. Sections 3.1, 3.2, 3.3 and 3.4 are used to compare the techniques.

3.1 Graphical Notation Support/Animation Support
Atomicity decomposition refers to the explicit ordering of the states. Atomicity decomposition between the states is not supported by Z and B, whereas Event-B supports atomicity decomposition between events. We find a few works where UML diagrams are formalized and translated into Z, B, and Event-B [7–17]. The atomicity decomposition is represented in Fig. 6.
Fig. 6. Atomicity decomposition
There is a sequence of execution between the events; for example, event Q shall execute before event R. The bold line represents the refinement, and dotted lines are used
for refinement skip. These graphical structures are then converted into Event-B. These graphical notations help to represent the explicit ordering of events. Different plugins, like UML-B, ProB and AnimB, are developed by researchers for animation support for B and Event-B.

3.2 Verification of the Model
Automatic code generation helps to verify a system model at the early phases. In comparison to Z and B, Event-B has been used for automatic code generation in languages like C, C++, Java, and Python [18–21]. The B and Z methods design the software of the system, whereas using Event-B the whole system can be modeled. Figure 7 represents the traffic control system with the environment and the controller. In the case of the Z and B methods, we only focus on the controller, or the software, of the system.
Fig. 7. Model decomposition
3.3 Temporal Property
The safety property of a system is defined by the invariants in Z, B, and Event-B. An existence property is also called a good property; it says that a state P1 will eventually hold always. A progress property is defined as: state S1 must always be eventually followed by state S2. In the case of a persistence property, the state P must eventually hold forever. All these temporal properties could not be supported by these formal methods. The temporal property of the logic model is incorporated in Event-B [22]. Temporal properties like liveness and fairness can be handled effectively by symbolic model checkers, but we could not use them directly in the following formal methods. The traffic controller discussed in Sect. 4 can be modeled using Linear Temporal Logic (LTL), where safety, liveness, and progress properties can be defined using the formulas given below [23,24].

Safety property: □(Red → ¬Green) (LTL)
Liveness property: □(red → (red U (yellow ∧ (yellow U green)))) (LTL)
Progress property: □(request → ◇response) (LTL)

Mainly safety properties are defined using invariant rules in Z, B, and Event-B.
3.4 Flexibility
The refinement approach helps to design a model in a step-by-step manner. Z, B, and Event-B support refinements. Event-B has the facility to refine the event parameters, which strengthens the model; in the case of Z and B, the parameters cannot be refined. The static properties, such as constants and sets, and the dynamic properties, such as variables, are declared together in the case of Z and B.
Fig. 8. Model decomposition
We use a schema for Z and an abstract machine for B, whereas in Event-B we declare the static part in the context and the dynamic part in the machine separately. Unlike Z and B, Event-B has the facility of model decomposition. Model decomposition refers to dividing a model into sub-models. Large and complex systems can be handled effectively using model decomposition, and the sub-models can be refined separately. In the case of distributed and concurrent system design, we share the events and variables. In Fig. 8, we present the shared event and variable. We have three events P, Q, and R. P and Q share variable A. The events Q and R share B. There are two submachines M1 and M2, and they share Y. Large and complex systems are easily handled using Event-B. A system like a traffic controller needs 100% accuracy, and Event-B has the advantage of incorporating the environment with the software.
3.5 Discussion
From the case study, we can summarize some of the important technical aspects of these verification techniques, as represented in Table 3. One needs in-depth knowledge of discrete mathematics and the ability to formalize the requirements; this is why formal methods are not very popular in industry. We have categorically investigated some of the formal modeling techniques with their applications in critical system design and compared them on certain parameters. The Z and B methods are used to design and verify the software, while Event-B performs system-level modeling. B and Event-B can be combined for system modeling. For example, the whole system can be designed
Table 3. Comparison of different parameters
Important parameter | Z Notation | B Method | Event-B
Safety-critical requirements | Invariant property | Invariant property | Invariant property
Operation | Conditions | Precondition with predicate | Events with guard
Refinements | Supported | Weaken the model | Strengthen the model
Proof obligation | Auto-generated | Most of them are automatically generated | Simpler than B; generated automatically and interactively
Function parameters | Weaken the model | Weaken the model | Refined and edited
Modeling | Static and dynamic components are declared in schema | Static and dynamic properties are declared in abstract machine | Static properties are declared in context and dynamic properties in machine
Programming construct | Supported | Supported | Not supported
Fairness assumption | Not supported | Not supported | Not supported
Key components | Schema calculus | Abstract machine with preconditions and post-conditions | Context, machine, events with guards and actions
Temporal property | Not supported | Not supported | Not supported
Software design | Yes | Yes | Yes
Model validation | Schema proof | Proof obligation | Proof obligation
Tool used | CZT | ProB | RODIN
Animated tool/plugin support | No | Yes | Yes
using Event-B, and then the software part can automatically be generated using B or Z [25]. This approach would help a consistent design of a critical system. Z, B, and Event-B define the safety and other properties using the invariant properties. From the case study, we find that the Z notation is a very powerful formal method for software verification. Z has the flexibility to design and verify a complex system with the schema design. On the other hand, B is a very useful formal method to verify software by automatic code generation. Event-B helps to design the whole system efficiently.
4 Conclusion
The study helps to understand the evolution of formal methods and their use in critical system verification over the years. We get a roadmap of the past, present, and future of formal modeling. One needs in-depth knowledge of discrete mathematics and the ability to formalize the requirements. The industry use of B is nearly 30% and gradually increasing. Event-B is not only practiced at the university research level but has also gained exposure in industry in recent years. We have categorically investigated some of the formal specification languages with their applications in critical system design and compared them on different parameters. We performed a case study to explore the syntactical differences among the techniques. Different qualitative and quantitative parameters are also compared. Finally, this paper presents systematic knowledge about the present state-of-the-art formal methods.
Acknowledgements. I sincerely thank the Department of Computer Science and Engineering, University of Calcutta, India, and the Department of Computer Science, The University of Burdwan, India, for their assistance in pursuing my research work.
References 1. Jacky, J.: The Way of Z: Practical Programming with Formal Methods. Cambridge University Press, Cambridge (1996) 2. Abrial, J.R.: The B-Book: Assigning Programs to Meanings by J. R. Abrial. Cambridge University Press, Cambridge (1726) 3. Abrial, J.-R.: Modeling in Event-B: System and Software Engineering, 1st edn. Cambridge University Press, Cambridge (2010) 4. Karmakar, R., Sarkar, B.B., Chaki, N.: System modeling using Event-B: an insight. In: SSRN Scholarly Paper ID 3511455, Social Science Research Network, Rochester, NY, December 2019 5. Karmakar, R., Sarkar, B.B., Chaki, N.: Event-B based formal modeling of a controller: a case study. In: Bhattacharjee, D., Kole, D.K., Dey, N., Basu, S., Plewczynski, D. (eds.) Proceedings of International Conference on Frontiers in Computing and Systems. AISC, vol. 1255, pp. 649–658. Springer, Singapore (2021). https:// doi.org/10.1007/978-981-15-7834-2 60 6. Karmakar, R., Sarkar, B.B.: A prototype modeling of smart irrigation system using Event-B. SN Comput. Sci. 2(1), 1–9 (2021). https://doi.org/10.1007/s42979-02000412-8 7. Butler, M.: Decomposition structures for Event-B. In: Leuschel, M., Wehrheim, H. (eds.) IFM 2009. LNCS, vol. 5423, pp. 20–38. Springer, Heidelberg (2009). https:// doi.org/10.1007/978-3-642-00255-7 2 8. Salehi Fathabadi, A., Butler, M.: Applying Event-B atomicity decomposition to a multi media protocol. In: de Boer, F.S., Bonsangue, M.M., Hallerstede, S., Leuschel, M. (eds.) FMCO 2009. LNCS, vol. 6286, pp. 89–104. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17071-3 5 9. Fathabadi, A.S., Butler, M., Rezazadeh, A.: Language and tool support for event refinement structures in Event-B. Formal Aspects Comput. 27(3), 499–523 (2015)
10. Said, M.Y., Butler, M., Snook, C.: A method of refinement in UML-B. Softw. Syst. Model. 14(4), 1557–1580 (2013). https://doi.org/10.1007/s10270-013-0391-z 11. Hvannberg, E.: Combining UML and Z in a Software Process (2001) 12. Sengupta, S., Bhattacharya, S.: Formalization of UML use case diagram-a Z notation based approach. In: 2006 International Conference on Computing & Informatics, pp. 1–6, Kuala Lumpur, Malaysia. IEEE, June 2006 13. Dupuy, S., Ledru, Y., Chabre-Peccoud, M.: An overview of RoZ?: a tool for integrating UML and Z specifications. In: Wangler, B., Bergman, L. (eds.) CAiSE 2000. LNCS, vol. 1789, pp. 417–430. Springer, Heidelberg (2000). https://doi.org/ 10.1007/3-540-45140-4 28 14. Younes, A.B., Ayed, L.J.B.: Using UML activity diagrams and Event B for distributed and parallel applications. In: 31st Annual International Computer Software and Applications Conference, vol. 1, (COMPSAC 2007), pp. 163–170, Beijing, China. IEEE, July 2007. ISSN: 0730-3157 15. Snook, C., Butler, M.: UML-B: formal modelling and design aided by UML. ACM Trans. Softw. Eng. Methodol. 15(1), 92–122 (2006) 16. Karmakar, R., Sarkar, B.B., Chaki, N.: Event ordering using graphical notation for Event-B models. In: Saeed, K., Dvorsk´ y, J. (eds.) CISIM 2020. LNCS, vol. 12133, pp. 377–389. Springer, Cham (2020). https://doi.org/10.1007/978-3-03047679-3 32 17. Halder, A., Karmakar, R.: Mapping UML activity diagrams into Z notation. In: Innovative Data Communication Technologies and Application, Proceedings ICIDCA-2021, Lecture Notes on Data Engineering and Communications Technologies. Springer, Cham (2022). ISSN: 2367-4512 18. M´ery, D., Singh, N.K.: Automatic code generation from event-B models. In: Proceedings of the Second Symposium on Information and Communication Technology - SoICT 2011, p. 179, Hanoi, Vietnam, 2011. ACM Press (2011) 19. Steve, W.: Automatic generation of C from Event-B. In: Workshop on Integration of Model-Based Formal Methods and Tools (2009) 20. Rivera, V., Cata˜ no, N., Wahls, T., Rueda, C.: Code generation for Event-B. Int. J. Softw. Tools Technol. Transfer 19(1), 31–52 (2015). https://doi.org/10.1007/ s10009-015-0381-2 21. Karmakar, R.: A framework for component mapping between Event-B and Python. In: Advances in Data and Information Sciences, Proceedings of RACCCS 2021, Lecture Notes in Networks and Systems. Springer, Cham (2022). https://doi.org/ 10.1007/978-981-16-7952-0 13 22. Hoang, T.S., Abrial, J.-R.: Reasoning about liveness properties in Event-B. In: Qin, S., Qiu, Z. (eds.) ICFEM 2011. LNCS, vol. 6991, pp. 456–471. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24559-6 31 23. Guha, S., Nag, A., Karmakar, R.: Formal verification of safety-critical systems: a case-study in airbag system design. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds.) ISDA 2020. AISC, vol. 1351, pp. 107–116. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71187-0 10 24. Karmakar, R.: Symbolic model checking: a comprehensive review for critical system design. In: Tiwari, S., Trivedi, M.C., Kolhe, M.L., Mishra, K., Singh, B.K. (eds.) Advances in Data and Information Sciences. LNNS, vol. 318. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-5689-7 62 25. Abrial, J.-R.: On B and Event-B: principles, success and challenges. In: Butler, M., Raschke, A., Hoang, T.S., Reichl, K. (eds.) ABZ 2018. LNCS, vol. 10817, pp. 31–35. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91271-4 3
Investigating Drug Peddling in Nigeria Using a Machine Learning Approach Oluwafemi Samson Balogun1(B) , Sunday Adewale Olaleye2 , Mazhar Moshin1 , Keijo Haataja1 , Xiao-Zhi Gao1 , and Pekka Toivanen1 1 School of Computing, University of Eastern, Kuopio, Finland {samson.balogun,mazhar.moshin,keijo.haataja,xiao-zhi.gao, pekka.toivanen}@uef.fi 2 School of Business, JAMK University of Applied Sciences, Rajakatu 35, 40100 Jyväskylä, Finland [email protected]
Abstract. The problem persists despite the heavy consequences of jail and the death penalty imposed on those found guilty of selling drugs. This situation creates an alarm among academics and managers who work in practice. Hotbeds for drug dealers are on the rise, and their financial impact on the global economy is moving forward. Drug peddlers distract society by warping peace, justice, and order and threaten the Sustainable Development Goals (SDGs). This study evaluated secondary data containing qualities and suspect drug groups to determine whether machine learning techniques could predict the suspect drug group. We developed a prediction model using nine machine learning algorithms: AdaBoost (AB), naïve Bayes (NB), logistic regression (LR), K nearest neighbour (KNN), random forest (RF), decision tree (DT), neural network (NN), CN2 and support vector machine (SVM). The study utilized Orange data mining software to obtain a more accurate perspective on the data. Predictive accuracy was determined using 5-fold stratified cross-validation. Friedman’s test was conducted, and the results showed that the performance of each algorithm was significantly different. Also, the study compared the models and compiled the results. The results reveal that the random forest has the highest accuracy compared to the others. This prediction model implies that high accuracy can help the government make informed decisions by accurately identifying and classifying suspected individuals and offenders. This prediction will help the government refer suspected individuals and offenders who meet the qualifications for specific drug law sections, like prosecution or rehabilitation, to those sections with a faster and more accurate rate. The outcome of this study should be helpful to law enforcement agents, analysts, and other drug practitioners who may find machine learning tools dependable to detect and classify drug offenders. Keywords: Drug group · Classification · Machine learning · Offenders · Drug peddling
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 103–120, 2022. https://doi.org/10.1007/978-3-030-96308-8_10
1 Introduction Drug peddling is a de novo social vice that has eaten deep into the nation's economy and society. It is interwoven with drug abuse, cultism, prostitution, kidnapping, and generally unruly attitudes and behavior. Drug peddling is the act of selling stolen or recreational drugs illegally. The United Nations Office on Drugs and Crime, in its World Drug Report, shows that drug use killed about half a million people in 2019 and projects that population growth from now until 2030 will change the risk of drug use in high-income countries by −1%, in middle-income countries by +10%, and in low-income countries, the highest, by +43%. These statistics show that low-income countries are the most vulnerable in the future. Although drug peddling is a global issue, developing countries, especially in Africa, have gradually become the epicenter of drug peddling. Abject poverty seems to be behind the scenes of this criminal act. The dark web is a growing market route beyond the conventional route of drug peddling. Drug peddling can emanate from an ambition for quick success without conventional labor, connected to ill-gotten wealth. Some youths found themselves in drug peddling through their initiation. The drug baron is a high-ranking boss who deals in drugs illegally and builds a network of several illegal drug workers. Despite the severe punishments imposed on guilty drug peddlers, such as imprisonment and death sentences meant to serve as deterrents to others, the pervasiveness of drug peddling persists, and this causes concern for academia and practicing managers. The number of hot spots for drug peddlers is increasing, and the ripple effects on health and the global economy are advancing. Drug peddlers are disrupting society's peace, law, and order, and they constitute a threat to the Sustainable Development Goals (SDGs). Drug peddling is like a cobweb with a dense network of criminals. One of the demand and supply factors that affect drug peddling is trends, a comparative example being the ongoing COVID-19 pandemic. Tastes and preferences also impact the misdemeanor of drug peddling. Earlier studies have examined illegal drug peddling [1, 2] and illicit drug peddling [2–4]. There is a thin line between illicit and illegal drug peddling: illegal acts go against the law, while illicit acts go against social norms, but neither is acceptable to society. Drug peddling has also been examined through country case studies [5–7]. Drug peddling is similar in different countries, but the routes of operation, networks, and gender impacts vary based on cultural differences, laws, and policies. Extant research emphasized juvenile drug peddling [8], itinerant drug peddling [9, 10], drug peddling and prostitution [11, 12], drug peddling and trafficking [2, 13, 14], and drug peddling and smuggling [15]. The previously mentioned literature contributed to the body of knowledge in different ways, but the early detection of drug peddling is under-researched. This study fills the existing gaps in the literature on drug peddling in Africa's developing countries. This study aimed to build a predictive model for the early detection of drug peddling in Nigeria. The data used were collected from the National Drug Law Enforcement Agency (NDLEA) in the West-Central region of Nigeria, and nine classification algorithms were employed through Orange, a data mining software, to analyze the data and compare the models. The results show that random forest has the highest precision.
This study combined empirical data and used data analysis techniques related to machine learning to expand the existing studies on drug peddling. The remainder of this paper is organized as follows. The third section describes the methodology employed in clear terms, while the fourth section presents
the results. Section five discusses the results, and Section six concludes with the study’s implications. This study aims to discover which possible and ideal criteria may be used to predict the drug group suspects’ choice or preference using a machine learning approach. By implication, our study will assist in classifying suspected individuals and offenders, ensuring that the drug law sections that match a particular crime are referred to more rapidly and correctly for suitable actions such as prosecution or rehabilitation.
2 Background and Related Works 2.1 Drug Peddling in Nigeria Unsustainable drug use in Nigeria is now a concern. “There appeared to be indications of illogical drug use,” an informal poll of doctors, pharmacists, and other healthcare workers in Lagos, Ondo, and Ogun in western Nigeria. Some patients admit to buying drugs from unregistered sources and without legal prescriptions, such as doctors [16]. Adegbami and Uche’s [17] study found that a lack of excellent governance has contributed to the fact that the youth of Nigeria are competent, and visionary leadership has acquired names such as bunkers, hooligans, kidnappers, hostage-takers, web-switches, drug-consumers, and prostitutes. Similarly, Adebayo [18] points out that while youths are supposed to be the driving force for development in Nigeria, they become volatile because their energies are being misdirected due to a lack of greater engagement to which they may channel their energy. The recent attraction of Rivers State in Nigeria, to oil and gas exploration has attracted people from various walks for job hunting due to the nation-wide job scarcity, creating a violent environment for crime [19]. Dike [3] highlighted that illegal drug markets and consumption in Nigeria had risen significantly in the early 1980s, with both the consumption and abuse affecting the welfare and dignity of the population adversely, which has brought about several problematic side effects and bad dreams for the nation and society. The supplies for the initiation of the cultic members include poisoning wine produced with hallucinogenic drugs such as Indian hemp and cocaine, as stated by Ayatse [20]. These drugs will intoxicate cult members and make them bold, and, under the influence of the drugs, they will be able to kill or destroy their mates or lecturers. Okatachi’s [21] study focused on 144 people in Kano and Lagos, Nigeria, for drug usage predisposing factors. The study reveals that children from separated homes tend to be involved in drug abuse than children from stable homes. Also, the comparison shows that children with a background of low socioeconomic status are more likely to abuse drugs than children from high socioeconomic status families. Further, children from polygamous settings are more likely to abuse drugs than children from monogamous families. Agba and Ushie [22], Ejere and Tende [23], Egwu [24], and Saridakis and Spengler [25] concluded that the unemployed failure to start their own small companies is to blame for their resorting to selling illicit drugs to help them cover expenses.
2.2 Drug Trafficking and Use Drug consumption is on the rise globally [26]; hence, drug trafficking is likewise increasing. Combined with the consequences of the COVID-19 epidemic, activities such as the consumption of illicit or narcotic drugs, such as cannabis Sativa, other psychotropic substances, and trafficking worsened their effects [26]. Beyond this worsened situation, research has found that those who fall in the age range of 18 to 25 years reported being the heaviest users of illicit drugs [26–28]. Globally, 269 million people will use drugs in 2018, up to nearly 30% since 2009. It is believed that almost half of all drug-related crimes worldwide are caused by cannabis (marijuana) [26]. Additionally, Cannabis is widely used for medical, recreational, and other purposes worldwide, especially in developed countries. This use has also been reported despite an increase in novel psychoactive substances (NPS) [26]. These substances or compounds include hallucinogens, LSD, and stimulants, such as amphetamines [29]. According to UN data, growing numbers of people cultivated, manufactured, and seized cannabis in herbs and resided in Nigeria between 2008 and 2018 [27, 28]. This subjugation represents records of cannabis trafficking (to and fro) reported having been stopped and confiscated within the UNODC Member States, with a slight decrease in 2018 compared to the previous year [30]. Furthermore, in synthetic and plant-based forms, the seizure of NPS decreased significantly in 2018, with kratom and khat being the most prevalent [31]. An average of more than 500 net promoter scores (NPS) was reported between 2013 and 2018 globally, which is noteworthy [31]. Comparatively, younger people in Africa tend to peddle trades and drug trafficking than older people [28]. In terms of gender, global reports indicate that males are more likely than females within age brackets 15–64 to use drugs such as cannabis near-daily or daily [13, 26] and that males are more likely than females to consume drugs such as cocaine [26]. Cannabis use and trafficking are the most widespread trafficking and illicit drug common among the Nigerian irregular migrant returnees, with the highest frequency of use and trafficking among those under 25 [2]. According to studies [13, 26, 27], males are more active in drug peddling or trafficking than their female counterparts, and these findings are consistent with those findings. The dominance could be linked to males being perceived as more logical, physically active, and aggressive, possessing superior talents, and under tremendous financial pressure to provide for their families [13]. Drug peddlers are forced to engage in this activity because they need to collect income from unauthorized migration to other countries. People who do not sell illicit substances, on the other hand, are frequently individuals who purchase them to use for personal reasons such as irritation, stress, or trauma [2]. Drug trafficking in several nations such as Brazil and the United States [32] is a major concern for young people [32, 33]. Vulnerable children and teenagers are commonly enticed into this sophisticated and risky enterprise. Peer pressure is classified as one of the primary reasons for most who do so [34]. Consider, for example, the study conducted on Iran secondary school students, which shows various reasons for using drugs. These reasons include psychological stress, peer pressure, and self-evaluation [35]. 
In the United States, using data from leisure facilities in less-income metropolitan regions is paramount. Black and Ricardo [36] discovered that 12% of males African American recruited from these centers were involved in drug trafficking. Similarly, in
the jailed group studied by Sheley [37], 48% of those who sold drugs never used them, but 25% engaged in selling and using them. Existing longitudinal study, there was a more robust trajectory from initiating drug trafficking to advancing drug usage than in another longitudinal cohort study [38]. When Vale and Kennedy [39] conducted their research, they discovered series of situations that led to the arrest of young people that attempts to smuggle illegal narcotics into the United Kingdom. Except for the association between personal drug use and drug trafficking, information on the correlation between drug trafficking and personal drug use is scarce [40, 41]. However, not all users are traffickers. In the combined sample of leisure centers in Washington, D.C., and schools shows 9% exclusively sold drugs, 8% only used drugs, and 4% both sold and used drugs, according to the findings [42]. 2.3 The Impact of Hard Drugs on Society According to research, there is a high percentage of joblessness and there is an argument that the young people in Nigeria, leaves the country for diaspora to be involved in drug use and trafficking. Some were introduced to smuggling, employed by organizations that help move narcotics throughout Nigeria and out of the country [43]. Some irregular migrants have little choice but to use drugs, become prostitutes, or beg on the streets to get by. Other individuals suffer trauma from drug abuse due to mental illness [44], while others become convicted of a crime and are deported to their home country afterward. Nigerian migrants, especially the young, did not have any particular purpose in mind for traveling, others have traveled because of their desire for a better life, and some have been emotionally traumatized because of their daily frustration. After migration, some of them have been victims of drugs due to past trauma [45, 46]. Local drug markets are attributed to illegal drug trade where illicit narcotics such as amphetamines, cocaine, heroin, cannabis, and methamphetamines are bought and sold. The UN World Drug Report published in June 2007 [47] mentioned that world drug production, consumption and trafficking have remained steady worldwide and have increased considerably in West Africa in. Drug abuse has tampered with the economy, productivity, home life, health, traffic, and safety of the nation. Drug trafficking often results in drug abuse and other criminal activities, such as armed burglary, kidnapping, arson, homicide, robbery, militancy, insurgency, terrorism, and political thuggery. This shows that these things are, in fact, after-effects of drug trafficking [48–50]. Violent crime, political instability, vast sums of riches for criminals, and weaponry proliferation are all related to the trafficking of narcotics, as Goodwin [47]. Early pregnancy, greater criminality, and herpes infection are risks associated with drug use and consumption [51, 52]. It has been established that regular, early, and severe usage of cannabis can damage young adults’ cognitive and mental health as they age [53]. Daley [54] suggested that drug use imposes economic, emotional, and relationship strains and affects the growth of a fetus in pregnant women. Drug A child or teenager is in danger of trafficking exposure to extreme drug abuse, violence, and death, according to McLennan et al. [55]. 
Dropping out of school, being involved in gangs, seeing violence, and easy access to drugs, alcohol, and guns are primarily due to the flow of illegal narcotics, alcohol, and firearms in the community.
3 Material and Methods
3.1 Data
This study used records of 262 people that the National Drug Law Enforcement Agency (NDLEA) detained in the central part of Nigeria for drug-related offenses. The data are publicly available for the purpose of research, and there are no traces of the suspects' identities in the public data. Table 1 provides the specifications of the dataset. The report includes data on drug trafficking in central Nigeria.
3.2 Preprocessing
The data utilized in this study were drawn from the NDLEA branch. The data were stored in Microsoft Excel file format, and the variables or features were categorical with values 0 and 1. The dependent variable was the peddling group, labeled with name, description, and type. Table 2 shows the detailed information of the data.
Table 1. The raw data summary
Subject | Drug peddling
Source | NDLEA Kwara state command
Country | Nigeria
Sample size | 262
Drug group | 119 drug peddlers, 143 non-drug peddlers
Sex | 236 male, 26 female
Age | Between 3 and 63
Type of exhibit | 245 cannabis sativa, 17 psychotropic substance
Weight of exhibit | Between 0 and 121 kg
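As a concrete illustration of the preprocessing step described in Sect. 3.2, the sketch below loads a spreadsheet with the variables of Table 2 and applies the 0/1 encodings; the file name and column labels are hypothetical placeholders, not the authors' actual file.

import pandas as pd

# Hypothetical file and column names mirroring Table 2 (assumptions only).
df = pd.read_excel("ndlea_suspects.xlsx")

# Encode the categorical features with the 0/1 scheme described in Table 2.
df["Drug group"] = df["Drug group"].map({"Drug peddler": 0, "Non-drug peddler": 1})
df["Sex"] = df["Sex"].map({"Male": 0, "Female": 1})
df["Type of exhibit"] = df["Type of exhibit"].map(
    {"Cannabis sativa": 0, "Psychotropic substance": 1}
)

X = df[["Sex", "Age", "Type of exhibit", "Weight of exhibit"]]
y = df["Drug group"]
print(X.shape, y.value_counts())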
3.3 Classification Algorithms
Classification algorithms were employed to classify targets with unknown class labels. Data mining tools make use of classification algorithms as an integral part of their operations; such methods have been used to classify plant diseases, distinguish cancer tumors, sort email spam, identify loan defaults, analyze sentiments, and more. In this study, six of the classification algorithms are described as follows:
Logistic Regression: This algorithm is mainly used to estimate discrete values (for example, as simple as "yes" or "no") using a set of independent variables. It uses a
logit function to forecast discrete values of a given event. The variable classification ensures that the sample values are expected to appear when all independent variables are maximized [56]. Decision Tree: This algorithm can be used for both categorical and continuous dependent variables. The algorithm is based on identifying the distinguishing characteristics that form two or more homogeneous groups. Statistically, the model resembles stratification [57]. Neural Networks: This typifies an algorithm that is modeled after the human brain. It is useful in different classification and regression techniques. The neural network algorithm groups unlabeled data based on the similarities learned from the data and produces labeled data [58]. Random Forest: The algorithm comprises many independent decision trees, with the final classification or prediction being determined by selecting the class that receives the largest number of votes. The concept is a large collection of uncorrelated models (trees) operating as a committee that beats any of its individual constituent models [59]. K Nearest Neighbor: The algorithm is appropriate for classification, regression, and measuring similarity. The algorithm classifies new cases using a majority vote of its k neighbors [60]. A distance function, Euclidean, Manhattan, Minkowski, or Hamming, can be used to group and measure all distinct k-neighborhood cases. The best option for categorical dependent (target) variables is the Hamming distance. AdaBoost: A supervised machine-learning algorithm that combines multiple weak classifiers into a single strengthened classifier (model). The AdaBoost classifier often outperforms individual classifier results [61].
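For reference, a minimal scikit-learn sketch of how such a set of classifiers could be instantiated is shown below; it is an illustrative assumption rather than the Orange workflow the authors used, hyperparameters are library defaults, and CN2 (a rule learner available in Orange) has no direct scikit-learn equivalent, so it is omitted.

from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Illustrative model zoo; settings are defaults, not the paper's configuration.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}
print(sorted(models))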
3.4 Key Performance Indicator
Data mining models evaluated the quality of their results using the following performance indicators. The area under the ROC curve (AUC) captures the quality of predictions regardless of the chosen classification threshold; an AUC close to one is preferred. The accuracy of classifying a dataset into one of the predefined classes is referred to as classification accuracy (CA). Recall (sensitivity) asks: if the outcome is positive, how frequently does the model predict it? It is the ratio of true positives to the sum of true positives and false negatives. Specificity is how often the model can correctly predict adverse outcomes; it is the percentage of actual negatives that were classified as negative. F1, the harmonic mean of sensitivity and precision, is a score (or F measure) that strikes the balance between precision and sensitivity.
Precision is the percentage of correct positive predictions relative to the total number of positive predictions. Log loss and the root mean square error (RMSE) gauge the level of classification error by how far the predictions deviate from the actual data; lower values better reflect the data, and according to this measure the AdaBoost model's log loss reflected the actual data better than the other models.
Table 2. Variables of the raw data summary
Name | Category | Role | Type | Value
Drug group | Drug group of the subject | Target | Categorical | Drug peddler 0, Non-drug peddler 1
Sex | Sex of the subject | Independent | Categorical | Male 0, Female 1
Age | Age of the subject | Independent | Continuous | Between 3 and 63
Type of exhibit | Substance | Independent | Continuous | Cannabis sativa 0, Psychotropic substance 1
Weight of exhibit | Weight | Independent | Continuous | Between 0 and 121 kg
4 Results and Analysis 4.1 Descriptive Statistics The table below shows the frequency and percentages of the variables of the participants.
Table 3. Demographic profiles of the participants
Attributes | Scale | Frequency | Percentage (%)
Age | 15–25 | 133 | 50.8
Age | 26–35 | 83 | 31.7
Age | 36–45 | 29 | 11.1
Age | Above 45 | 17 | 6.5
Gender | Male | 236 | 90.1
Gender | Female | 26 | 9.9
Weight of exhibit | 0–5 | 247 | 94.3
Weight of exhibit | 6–10 | 6 | 2.3
Weight of exhibit | Above 10 | 9 | 3.4
Type of exhibit | Cannabis Sativa | 245 | 93.5
Type of exhibit | Psychotropic substance | 17 | 6.5
Drug group | Peddler | 119 | 45.4
Drug group | Non-peddler | 143 | 54.6
Table 3 presents the demographic profiles of the participants. The demographic breakdown shows that most participants are within the ages of 15–25 years, most are male, the weight of the exhibit is mostly within 0–5 kg, cannabis Sativa is the type of exhibit most participants indulge in, and non-peddlers slightly outnumber peddlers in the drug group.
4.2 Correlation Analysis
The table below shows the linear relationships between the variables of the participants. In addition, data mining models were adopted to classify the drug group, as presented in the following results.
Table 4. Correlation among the various feature attributes (correlation coefficient, with the p-value in parentheses; diagonal entries equal 1)
Type of exhibit vs Age: −0.107 (0.084); vs Weight of exhibit: 0.011 (0.858); vs Gender: −0.036 (0.563); vs Drug group: 0.085 (0.172)
Age vs Weight of exhibit: 0.104 (0.092); vs Gender: −0.041 (0.513); vs Drug group: 0.017 (0.785)
Weight of exhibit vs Gender: −0.027 (0.663); vs Drug group: −0.262 (0.000)
Gender vs Drug group: −0.064 (0.301)
Table 4 shows the relationships between the various feature attributes that were chosen for this study. In general, there is little association between the feature attributes, and there is no substantial linear relationship between them.
4.3 Application of Machine Learning Model
Nine machine learning models were utilized, with 75% of the data used for training and the remainder for testing in each model. The accuracy of the prediction model was assessed using 5-fold stratified cross-validation. For the categorization of the outcome, seven models were effective in terms of age, gender, exhibit type, and exhibit weight.
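A minimal sketch of the 5-fold stratified cross-validation described here is given below, using scikit-learn instead of the Orange workflow actually employed by the authors; the placeholder data stand in for the NDLEA features and drug-group target, so the printed numbers are not the study's results.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Placeholder data with four features (age, gender, exhibit type, exhibit weight).
X, y = make_classification(n_samples=262, n_features=4, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["roc_auc", "accuracy", "f1", "precision", "recall"]
scores = cross_validate(RandomForestClassifier(random_state=0), X, y,
                        cv=cv, scoring=scoring)
for metric in scoring:
    vals = scores[f"test_{metric}"]
    print(f"{metric}: {vals.mean():.3f}")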
Table 5. Machine learning results using age, gender, exhibit type, and exhibit weight for classification of the outcome.
Model | AUC | CA | F1 | Precision | Recall | Specificity | Sensitivity
KNN | 0.873 | 0.802 | 0.802 | 0.758 | 0.812 | 0.909 | 0.672
Tree | 0.949 | 0.954 | 0.954 | 0.954 | 0.954 | 0.972 | 0.933
SVM | 0.870 | 0.637 | 0.589 | 0.689 | 0.637 | 0.937 | 0.277
Random Forest | 0.956 | 0.958 | 0.958 | 0.959 | 0.958 | 0.979 | 0.933
Neural Network | 0.896 | 0.771 | 0.758 | 0.812 | 0.771 | 0.965 | 0.538
Naïve Bayes | 0.933 | 0.897 | 0.897 | 0.902 | 0.897 | 0.860 | 0.941
Logistic regression | 0.964 | 0.927 | 0.927 | 0.930 | 0.927 | 0.972 | 0.872
CN2 | 0.932 | 0.882 | 0.880 | 0.887 | 0.882 | 0.951 | 0.798
AdaBoost | 0.938 | 0.927 | 0.927 | 0.928 | 0.927 | 0.951 | 0.899
Table 5 shows that Random Forest has the highest classification precision and SVM has the lowest. Logistic regression yielded the highest AUC score, while SVM yielded the lowest for the same metric. All models could predict the correct class, with Random Forest having the highest classification accuracy. Random Forest has the highest and NB the lowest specificity, while Naïve Bayes has the highest and SVM the lowest sensitivity.
4.4 Classification and Model Evaluation
Table 5 lists the various computed parameters for the performance of the algorithms. The precision, recall, and accuracy of the algorithms are shown in the table. The accuracy of all the algorithms is impressive on this dataset, as some of the scores are beyond 90%, and these scores were beyond the acceptable threshold of 70%. However, the performances of these algorithms are not equal, although the variation is slight. Concerning the various measures, RF yielded the best performance (A = 95.8%), followed by the DT (A = 95.4%) and AD and LR (A = 92.7%). This score shows that RF is the best-performing algorithm for predicting the drug group of the participants in the dataset used in the study. The performance of each algorithm was investigated. In this study, the recall, precision, and F1-scores of the RF exhibited the same predictive strength for the drug group of the participants: approximately 96% of the data instances were identified by the RF algorithm, approximately 96% of the identified data were correctly predicted, and the F1 score for RF was approximately 96%. At the individual classifier level, the performance of DT was found to be better than that of the other algorithms used. The recall scores for DT, KNN, SVM, NN, NB, LR, CN2, and AdaBoost were approximately 95%, 81%, 64%, 77%, 89%, 93%, 88%, and 93% for predicting the drug groups of the participants. These scores show that the DT algorithm recognized 95% of the instances of the test data, whereas only 81%, 64%, 77%, 89%, 93%, 88%, and 93% of the instances of the test data were recognized by
the KNN, SVM, NN, NB, LR, CN2, and Adaboost algorithms. Of these recognized instances of the data, the accurate predictions of DT, KNN, SVM, NN, NB, LR, CN2, and Adaboost for the drug group were approximately 95%, 81%, 64%, 77%, 89%, 93%, 88%, and 93%, respectively. The precision score for SVM was below the acceptable threshold of 70%.
Fig. 1. ROC and AUC for all the algorithms used for drug peddlers in this study
Fig. 2. ROC and AUC for all the algorithms used for nondrug peddlers in this study
As shown in Figs. 1 and 2, the receiver operating characteristic (ROC) curves for the various machine learning algorithms (RF, NB, DT, SVM, KNN, NN, LR, CN2, and AdaBoost) are depicted. The ROC curve was used to determine the
overall effectiveness of the predictive algorithms. The area under the ROC curve is referred to as the AROC. The AROC assesses the discriminatory strength of the prediction model, indicating the capability of the algorithms to identify the drug groups of participants; the larger the area above the reference line, the better the predictive model. The "ideal" point lies at the top-left corner of the plot, with a false positive rate of zero (0) and a true positive rate of one (1); however, achieving such an AROC score is exceedingly unlikely. The acceptable thresholds for the AROC are 0.90–1.0 = excellent, 0.80–0.90 = good, 0.70–0.80 = fair, 0.60–0.70 = poor and 0.50–0.60 = fail [62]. All the machine learning algorithms tested in this study for drug peddlers have values in the poor range (0.60–0.70), and all of the machine learning algorithms tested for nondrug peddlers have values in the good range (0.80–0.90). These results reflect how successfully the algorithms differentiate between the different groups in the dataset. In addition, the prediction accuracies in Table 6 were tested with the Friedman test for statistical significance between the nine algorithms investigated. The findings were Friedman chi-squared = 28.7, df = 6, p-value = 0.00, suggesting that the algorithms' performances generally differ (see Table 6). The Friedman test is a nonparametric statistic that compares matched group distributions [63]. The effect size for the Friedman test was calculated using Kendall's W [64] to measure the prediction accuracy of the nine algorithms on drug offenders. Our Kendall's W effect size was 0.53, indicating a considerable effect size among the anticipated average accuracies of the classification methods. Kendall's W is evaluated using Cohen's guidelines: 0.1 to 0.3 (small effect), 0.3 to 0.5 (medium effect), and 0.50 and above (large effect) [64] (see Table 7). A Wilcoxon rank-sum test was used for pairwise comparisons of the accuracies; each paired comparison has a p-value of 0.05, indicating that the two accuracy measures yielded substantially different results (see Table 8).
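A brief sketch of how such a comparison could be computed with SciPy is given below; the per-fold accuracy values are made-up placeholders, not the study's numbers, and Kendall's W is derived from the Friedman statistic under the usual formula W = chi2 / (n(k − 1)).

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical per-fold accuracies (rows = folds, columns = algorithms).
acc = np.array([
    [0.80, 0.95, 0.64, 0.96, 0.77, 0.90, 0.93, 0.88, 0.93],
    [0.81, 0.96, 0.63, 0.95, 0.78, 0.89, 0.92, 0.88, 0.92],
    [0.79, 0.95, 0.65, 0.96, 0.76, 0.90, 0.93, 0.87, 0.93],
    [0.82, 0.96, 0.62, 0.97, 0.78, 0.91, 0.94, 0.89, 0.94],
    [0.80, 0.94, 0.64, 0.96, 0.77, 0.90, 0.93, 0.88, 0.93],
])

stat, p = friedmanchisquare(*acc.T)          # compare the matched accuracy distributions
n_folds, n_algs = acc.shape
kendall_w = stat / (n_folds * (n_algs - 1))  # effect size for the Friedman test
print(f"chi2={stat:.2f}, p={p:.4f}, Kendall's W={kendall_w:.2f}")

# Example pairwise follow-up between two algorithms (paired Wilcoxon test).
print(wilcoxon(acc[:, 1], acc[:, 3]))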
5 Discussion Using the multiple criteria of prediction accuracy, the model may be trusted to reliably identify drug crime suspects. Adequate information on suspects, based on past data relating to drug peddling, can aid in the classification of suspects into the appropriate drug category. A machine learning technique known as classification was used to predict the drug group of suspects in this study. Secondary data containing attributes and drug groups of suspects were examined to predict the drug group of suspects. Classification algorithms can be divided into four types: linear methods, non-linear methods, rules and trees, and ensembles of trees. Linear approaches are the most used approaches [65]. Applying a classification algorithm to a problem domain without first performing a spot check to determine whether the algorithm is suitable for the task is not recommended. The nine classifiers chosen for the experiments in this study were from the non-linear approaches and ensembles of the tree categories of classification. Non-linear methods outperform linear methods in terms of predictive accuracy, which indicates that they are better suited for predicting drug-related suspects. As shown in Table 5, the RF approach, an ensemble tree method, had the highest predictive accuracy, with a 96% prediction
accuracy. This accuracy was followed closely by another tree-based approach, the DT, which had a prediction accuracy of approximately 95% based on the data. Compared to the other classifiers studied, non-linear approaches such as the SVM had the lowest predicted accuracy (64%). In this study, we have used numerous prediction accuracy criteria, and the algorithm can be relied on to identify drug crime suspects accurately. When Balogun et al. [66] tested two linear methods (logistic regression and discriminant analysis) for similar studies, they found that the overall predictive performance of both models was high; however, logistic regression had a higher value (95.4%) and discriminant analysis had lower values (59.4% and 71.8%). Furthermore, Balogun, Akingbade, and Akinrefon [67] demonstrated that the model correctly detected 95.4% of the grouped occurrences initially, with positive and negative predictive values of 92.44% and 97.90%, which is far better than the study carried out by Balogun, Oyejola, and Akingbade [68]. Other researchers have also applied the machine learning approaches utilized in this work to various inquiries. Comparison of this study's findings with those of other research is necessary for clarity and understanding. A study by Balogun et al. [69] examined the treatment results of tuberculosis patients using five machine learning algorithms; MLP (testing) proved to be the most significant predictor of treatment outcomes for patients with tuberculosis. Eight machine learning techniques for identifying malaria from symptoms were compared by Okagbue et al. [70]; the results show that AdaBoost outperformed logistic regression in terms of classification accuracy. Five machine learning algorithms were also evaluated by Adua et al. [71] for the early identification of type II diabetes; the naïve Bayes classifier was shown to be the most accurate in that investigation. Finally, the Friedman test findings showed that the anticipated accuracies of the nine classification algorithms differed considerably from one another at a general level. Following the pairwise tests, it was discovered which algorithm pairings were responsible for the disparities in predicted accuracy.
6 Conclusion and Implications
This study presented a machine learning method for identifying the suspect's drug group. The most important thing is to make sure that suspects are adequately separated into two categories: drug peddlers and non-drug peddlers. Machine learning methods based on RF, NB, DT, SVM, KNN, NN, LR, CN2, and AdaBoost were used to build nine models, and training, testing, and performance assessment were carried out for each of them. A 95% accuracy rate was found for RF, which outperformed the other methods: decision tree (DT), k-nearest neighbor (KNN), support vector machine (SVM), neural network (NN), naïve Bayes (NB), logistic regression (LR), CN2, and AdaBoost. The 5-fold stratified cross-validation was used to verify accuracy. According to the findings of this study, classifying drug offenders using data mining and machine learning is made more accessible. Given this prediction model's high accuracy level, a precise classification of suspects and offenders can assist the government in making informed judgments about which drug law sections to refer to for prosecution or rehabilitation more quickly and correctly. The categorization model
could be enhanced by including more socio-demographic data of suspects, such as educational status, marital status, and religion. Finally, this study should serve as a guide for law enforcement, analysts, and others in their efforts to identify and categorize drug offenders using machine learning.
Appendix
Table 6. Friedman test
Accuracy: n = 10, Statistic = 28.7, Df = 6, p-value = 0.00
T and 0 otherwise
M: a large number (M = 1 × 10^16)
E_i: initial battery capacity of node i
E_t,ij: energy required for node i to send a bit to node j
E_th: energy required to receive a bit
E_s: capture (sensing) energy consumed per second
E_r,i: supply of solar energy per second at node i
We focused on constraint (3) since it describes the use of solar energy. It says that at each node, the total energy consumed in transmission, reception and sensing is less than the sum of the battery capacity plus the renewable energy received during its lifetime. We obtained the following linear program: Max T = z1 + z2 + . . . + z|L| Subject to:
Σ_{j∈N(i)} y_ij^(l) − Σ_{k:i∈N(k)} y_ki^(l) = z_l · d_i
Σ_{l=1}^{|L|} ( Σ_{j∈N(i)} C_ij^(l) · y_ij^(l) + Σ_{k:i∈N(k)} γ · y_ki^(l) ) ≤ E_i + E_s · T    (4)
for l ∈ L̂, i ∈ N, j ∈ N, k ∈ N
y_ij^(l) ≥ 0, z_l ≥ 0
Where E_s is the solar energy supplied to each node per time unit. Constraint (4) means that, for each node, the energy consumed in transmission and reception during the network lifetime is less than the sum of the initial energy and the solar energy received. This linear program and the mobile sink linear program are solved using LPsolve for the purpose of comparison. In the rest of this work we present the obtained results.
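For illustration, the small sketch below sets up a toy instance of this lifetime-maximization LP with SciPy instead of LPsolve; the three-node topology, energy parameters, and single sink position are assumptions for the example, not the instances solved in the paper.

from scipy.optimize import linprog

# Toy instance: sink node 0, sensor nodes 1 and 2, one sink position.
# Variables: x = [y_10, y_20, y_21, T]; each node generates d_i = 1 bit per time unit.
E_t, E_r, E_s = 1.0, 0.5, 0.1      # transmit/receive cost per bit, solar supply per time unit
E1 = E2 = 100.0                    # initial battery capacities

c = [0.0, 0.0, 0.0, -1.0]          # maximize T  <=>  minimize -T

# Flow conservation at nodes 1 and 2: outgoing - incoming = T * d_i.
A_eq = [[1.0, 0.0, -1.0, -1.0],    # node 1: y_10 - y_21 - T = 0
        [0.0, 1.0,  1.0, -1.0]]    # node 2: y_20 + y_21 - T = 0
b_eq = [0.0, 0.0]

# Energy constraint (4) rearranged: consumption - E_s * T <= E_i.
A_ub = [[E_t, 0.0, E_r, -E_s],     # node 1 transmits y_10 and receives y_21
        [0.0, E_t, E_t, -E_s]]     # node 2 transmits y_20 and y_21
b_ub = [E1, E2]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
print("Maximum lifetime T =", res.x[-1])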
5 Numerical Experiments and Results The proposed linear program was solved by the linear programming solver LPsolve [4]. Many numerical experiments are described in the following table (Table 2).
Table 2. Values of network lifetime according to the number of nodes and the number of sink positions before and after using the solar energy (solar energy rate equal to 10 mW/time unit).
Number of nodes N | Number of sink positions L | Network lifetime (time units) before using solar energy | Network lifetime (time units) after using solar energy
4 | 1 | 40 | 50
4 | 2 | 67.7 | 102.3
5 | 1 | 28.16 | 32.78
5 | 2 | 42.48 | 53.94
5 | 3 | 42.48 | 53.94
5 | 4 | 43.68 | 54.60
12 | 1 | 8.26 | 8.62
12 | 2 | 12.75 | 13.62
12 | 3 | 12.75 | 13.62
It is clear that the network lifetime increases when the number of stop positions increases. This is because, as already mentioned, the nodes close to the sink do not quickly deplete their initial energy, since these nodes change regularly. The network lifetime
increases more after the use of solar energy. This is clear in all cases without exception. In the case of 12 nodes, network lifetime peaked with two sink positions and adding another position has no influence. We plotted comparative curves, one based on the number of nodes for the same number of sink positions and another based on the number of sink positions for a fixed number of nodes.
Fig. 1. Network lifetime depending on the number of nodes for a fixed number of sink positions (|L| = 2)
The plotted curve in Fig. 1 shows that the network lifetime clearly decreases when the number of nodes increases. This is because the number of transmissions in the network increases as the number of nodes increases, and therefore the nodes consume more energy to relay data. Figure 2 shows that the network lifetime is higher when the number of sink positions increases, which is the purpose of using a mobile sink. We note that once the number of sink positions reaches 2, the network lifetime increases only slowly. So we acknowledge that for each network there is a specific number of sink positions that yields the maximal improvement of the lifetime of that network.
Fig. 2. Network lifetime according to the number of sink positions for a fixed number of nodes (|L| = 5)
The network lifetime also depends on the solar energy rate used. Table 3 shows the different values of network lifetime according to the number of nodes and the number of sink positions before and after using a lower solar energy rate (1 mW/time unit).
Table 3. Values of network lifetime according to the number of nodes and the number of sink positions before and after using the solar energy (solar energy rate equal to 1 mW/time unit).
Number of nodes | Number of sink positions | Network lifetime (time units) before using solar energy | Network lifetime (time units) after using solar energy
4 | 1 | 40 | 40.81
4 | 2 | 67.7 | 70.08
5 | 1 | 28.16 | 28.57
5 | 2 | 42.48 | 43.4
12 | 1 | 8.26 | 8.298
12 | 2 | 12.75 | 12.83
Even with a lower solar energy rate, we obtained a small improvement. The main purpose of this experiment is to demonstrate that the more powerful the solar cells attached to the sensor nodes, the more the network lifetime increases. To enrich our results, we made further experiments to see whether the number of sink nodes or the number of positions the sink can take is more important. These experiments are applied to networks with a number of nodes ranging from 3 to 20. In what follows, we give the results for a network with 16 nodes. Table 4 contains the results of experiments applied to a network of 16 nodes. Despite the increase in the number of nodes in the network, the mobility technique remains effective.
We started with fixed sink nodes for the first three experiments; each time we increase their number, we get a better lifetime. This is expected since the traffic follows other paths to the new sink nodes, and the nodes constituting the main path conserve energy.
Table 4. Values of the lifetime of a network with 16 nodes.
Experiment number | Nodes number | Sinks number | Sinks description | Network lifetime (time units)
1 | 16 | 1 | Fixed in 1 | 7.69
2 | 16 | 2 | Fixed in 1 and 7 | 9.85
3 | 16 | 3 | Fixed in 1, 3 and 7 | 16
4 | 16 | 1 | Mobile in 1 and 7 | 11.2
5 | 16 | 1 | Mobile in 1, 3 and 7 | 11.3
6 | 16 | 2 | Mobile in 1 and 3 then 6 and 7 | 20.8
It should be noted that choosing the sink location is not arbitrary. For example, node 7 is present in almost all experiments; indeed, node 3 has a large number of child nodes (more than node 2), so when the sink is positioned at one of these child nodes, transmissions to node 3 are reduced and its energy is better preserved. In the fourth experiment, we used a sink movable between two positions (the same as those of the second experiment) to see whether it is better to use two fixed sinks or one mobile sink. Experience shows that the fixed sinks are better, which is also expected because they segment the network and shorten the paths required by the traffic, while in the case of one mobile sink the traffic follows a long path since the network contains only one sink at a time. Experiment number 5 is a proof of what was said. In the sixth experiment we used two sinks, each one mobile between two positions. With two mobile sinks we obtained a lifetime better than with three fixed sinks (experiment number 3), which is considered not only a network lifetime optimization but also a network cost optimization. In Fig. 3, the network lifetime increases when the number of base stations increases. This is clear in the case of mobile base stations and also in the case of fixed base stations. It is also clear that the network lifetime in the first case is better than in the second case, which proves the efficiency of the mobility technique. This experiment was done for a 20-node network model (N = 20).
Fig. 3. Network lifetime using mobile base stations versus fixed base stations ( N = 20)
6 Compromise Between Network Cost and Network Lifetime
Just as the lifetime of a network is a very important factor for any WSN, we must never forget another important factor, which is the network cost. Exploiting solar energy involves the use of solar cells at the sensor nodes, which makes the total cost of the network more expensive. To make a compromise between cost and network lifetime, we combined nodes capable of receiving solar energy, preferably those close to the sink, with other ordinary nodes. We define the following compromise ratio:
Network cost ($) / network lifetime (time units)
Since we aim to minimize the cost and maximize the lifetime, lower values of this ratio are better. To calculate the total cost of the network, it is assumed that an ordinary node costs $1 and a node equipped with solar cells costs $1.5. In Table 5, all network cost/lifetime ratios after the improvement are smaller than before the improvement. This implies that we compromised between the cost of the network and the lifetime, even if the network lifetime has decreased, since not all the sensor nodes are able to exploit solar energy. The proposed improvement minimized the total cost of the network for an acceptable lifetime. This compromise also gives the possibility to improve, as needed, one of the two important factors: network cost or lifetime.
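As a small worked example under the stated cost assumptions (the node mix and lifetime below are illustrative, not taken from the experiments):

# Hypothetical mix: 12 nodes, 4 of them equipped with solar cells.
ordinary_nodes, solar_nodes = 8, 4
network_cost = ordinary_nodes * 1.0 + solar_nodes * 1.5   # = 14.0 $
network_lifetime = 10.0                                    # illustrative, in time units
compromise_ratio = network_cost / network_lifetime
print(compromise_ratio)                                    # 1.4 $ per time unit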
Table 5. Comparison of network cost/lifetime.
Number of nodes | Number of sink positions | Network cost/lifetime before improvement | Network cost/lifetime after improvement
4 | 1 | 0.12 | 0.09
4 | 2 | 0.058 | 0.073
5 | 1 | 0.228 | 0.167
5 | 2 | 0.139 | 0.135
12 | 1 | 2.088 | 1.45
12 | 2 | 1.321 | 1.018
Figure 4 shows the network cost to lifetime ratio for a 16-node network. It is clear that using mobile sinks is better than using fixed sinks because we obtain lower ratio values.
Fig. 4. Network cost to lifetime ratio for a 16 nodes network.
7 Conclusion The main contribution of this paper is the use of a mobile base station to balance the energy consumption between nodes and to exploit the solar energy that is often available, as sensor networks are usually deployed in open fields. We first implemented the mobile sink model, and then we modified this model to take into account the use of solar energy. Finally, we implemented the resulting model to compare the results with those found before the modification. The results show that the network lifetime increases markedly after the use of solar energy.
References 1. Tavli, B. Kayaalp, M., Ceylan, O., Bagci, I.E.: Data processing and communication strategies for lifetime optimization in wireless sensor networks. Computer Engineering Department, TOBB University of Economics and Technology, Ankara, Turkey (2010)
2. Popa, L., Rostamizadeh, A., Karp, R.M., Papadimitriou, C.: Balancing traffic load in wireless networks with curveball routing. In: MobiHoc, September 2007 3. Abu-baker, A., Huang, H., Johnson, E., Misra, S., Asorey-Cacheda, R., Balakrishnan, M.: Maximizing α-lifetime of wireless sensor networks with solar energy sources. In: The 2010 Military Communications Conference, New Mexico State University (2010) 4. Lp_solve Reference Guide. http://lpsolve.sourceforge.net/5.5/, 01 Jan 2021 5. Wang, Z.M., Basagni, S., Melachrinoudis, E., Petrioli, C.: Exploiting sink mobility for maximizing sensor network lifetime. In: 38th International Conference on System Science, Hawaii (2005) 6. Gatzianas, M., Georgiadis, L.: A distributed algorithm for maximum lifetime routing in sensor networks with mobile sink. IEEE Trans. Wirel. Commun. (2008) 7. Papadimitriou, I., Georgiadis, L.: Maximum lifetime routing to mobile sink in wireless sensor networks. In: The 13th IEEE SoftCom (2005) 8. Yan, Y., Xia, Y.: Maximising the lifetime of wireless sensor networks with mobile sink in delay-tolerant applications. Computer and Information Science and Engineering Department, University of Florida, September 2010 9. Li, W., Cassandras, C.G.: A minimum-power wireless sensor network self-deployment scheme. Boston University, Department of Manufacturing Engineering and Center for Information and Systems Engineering 10. Prommak, C., Modhirun, S.: Minimizing energy consumption in wireless sensor networks using multi-hop relay stations, Suranaree University of Technology, Department of Telecommunication Engineering, October 2011 11. Cardei, M., Wu, J., Lu, M., Pervaiz, M.O.: Maximum network lifetime in wireless sensor networks with adjustable sensing ranges. Department of Computer Science and Engineering, Florida Atlantic University, September 2005 12. Voigt, T., Ritter, H., Schiller, J.: Utilizing solar power in wireless sensor networks. Berlin University, Germany, October 2003
Counting Vehicle by Axes with High-Precision in Brazilian Roads with Deep Learning Methods Adson M. Santos(B) , Carmelo J. A. Bastos-Filho, and Alexandre M. A. Maciel Polytechnic School of Pernambuco, University of Pernambuco, Recife 52.720-001, Brazil {ams2,carmelofilho,amam}@ecomp.poli.br
Abstract. Traffic surveys provide insight into vehicle flow generation and distribution, which is essential for forecasting the future needs for the circulation of goods. The National Department of Transport Infrastructure (DNIT) maintains the National Traffic Counting Plan (PNCT) in Brazil. The main objective of the PNCT is to assess the current traffic flow on federal highways to define public policies. However, DNIT still carries out quantitative classificatory surveys that are not automated or with invasive equipment. This work applied YOLOv3 for object detection and Deep SORT for multiple object tracking to perform the classificatory counting fully automated and non-invasive. Based on the results obtained in two hours of real videos used in traffic counting surveys, we obtained a positive predictive value above 89.85% and recall above 87.32% in the vehicle classification count in four scenarios using the DNIT table with 14 classes. So YOLOv3 and Deep SORT allow us to perform traffic analysis to automate classificatory traffic surveys, improving DNIT’s agility and generating savings for transport organizations.
1 Introduction
Road traffic studies are essential to predict the user’s future traffic needs. This knowledge is crucial to evaluate the pavement solutions adopted and create the basis for planning the road. Among the traffic studies instruments, we have volumetric surveys that allow the quantitative measurement of vehicles globally or by class [4]. The Brazilian National Department of Transport Infrastructure (DNIT) maintains the National Traffic Counting Plan (PNCT) [5]. This plan conducts quantitative surveys on specific sections of Brazilian federal highways. To collect information, DNIT uses invasive and non-invasive solutions (without incorporating them in the pavement). Therefore, DNIT also conducts no-automated classificatory vehicle surveys or automated surveys with invasive equipment. It is necessary to seek a solution that allows automated, non-invasive, and low-cost classificatory surveys. This paper show a method using the system proposed for Santos et al. [10] applied to a more complex problem of counting vehicles by axles according to a c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 188–198, 2022. https://doi.org/10.1007/978-3-030-96308-8_17
real PNCT-DNIT table with 14 classes. This system is composed of the detector YOLOv3 [9], and the tracker Deep SORT [12] to count a single class of vehicle. We tested this system in a multi-class counting problem in real traffic on four Brazilian roads with 30 min for each video. We organize this paper as follows. In Sect. 2, we present related works of vehicle counting. In Sect. 3, we introduce our proposal. Section 4 describes the experimental methodology. In Sect. 5, we show the analysis and discussion of the results. In Sect. 6, we give our conclusions.
2 Related Works
Vehicle counting depends simultaneously on detection and tracking. We can divide the detection models into detection based on motion and appearance [11]. In Appearance-based counting, CNN’s are improving the classification of vehicles. Hasnat et al. [6] show vehicle classification systems integrated a camera with a different type of sensor that reached 99% accuracy, but it is only for five classes. In this case, Hasnat places all heavy vehicles with more than two axles in the same class. In another work, Adu-Gyamfi et al. [2] presented a vehicle classifier in 7 classes. However, they simplify the 13 classes of the FHWA’s vehicle classification scheme, putting several heavy vehicles in only three classes, with that he got rates higher than 89% of Recall. Furthermore, Chen et al. [3] presented work using YOLOv3 to classify trucks and trailers based in the National Heavy Vehicle Regulator of Australia. They separated the heavy vehicles into six different categories and 26 individual classes. The mean average precision reached was 84% in individual classes and 96% in categories classes. However, this work is focused only on classification, and it did not aim to track objects for counting. We need to track the object during a scene (set of frames) to execute the vehicle count. In this regard, Abdelwahab [1] proposed an efficient approach to vehicle counting employing Regions with Convolutional Neural Networks (RCNN) and the KLT tracker that achieved an accuracy of 96.44%. The R-CNN suggest regions and classifying these region proposals with feature extracted from them. The vehicles are counted every “n” frames to decrease the time complexity in the tracker. The author used “n” equal to 15 in the experiments. Furthermore, Santos et al. [10] achieved better results than Abdelwahab [1] with up to 99.15% accuracy in public datasets with YOLOv3 and Deep SORT. However, Abdelwahab and Santos et al. solve only a single class problem.
3 The Proposed Approach
In this work, we validate YOLOv3 and Deep SORT in the vehicle classificatory counting oriented to the number of axles. In this context, Santos et al. [10] show that the YOLO Confidence Score influences the requirement to detect a vehicle. So we assessed this impact on the vehicle classificatory count.
In the tracking step, we observed that the Deep SORT use features to compare objects between frames. However, the Deep SORT is sensitive to the hyperparameter “number of frames” for association and confirmation of a track, called Association K (aK ) [10], which is also used in this article for fine-tuning the vehicle classificatory count. Therefore, we propose to evaluate a system that uses Deep SORT with YOLOv3 to perform automatic vehicle counting in videos by class using the DNIT table with videos used in real traffic surveys. In this sense, building a dataset with images of the Brazilian highways using cameras located at different points. From that, we performed a data augmentation that resulted in 8295 images for training (80%) and validation (20%). After training YOLOv3 with this training dataset, the YOLOv3 and Deep SORT count precision were evaluated by varying the Association K and YOLO Score parameters. We evaluate and analyze the values of the hyperparameters YOLO Score and Association K in the DNIT test dataset with four videos 30 min each using DNIT Precision metric. After that, we select the values of these hyperparameters with the best precision results. Selecting these hyperparameters, we fine-tuned the model to improve precision. Finally, we evaluated the solution’s performance with the optimal parameters in the DNIT test dataset in detail. Figure 1 shows the stepby-step process.
Fig. 1. Method to evaluate the proposed model of classificatory vehicle count.
4 Materials and Methods
In the experiment, we use two hyperparameters: the YOLO Confidence Score in the detection step and the Deep SORT frame association number aK to confirm a vehicle’s track. In this case, we assessed the YOLO Confidence Score with values from 0.5 to 0.7. In the tracking step, we assessed the Deep SORT with 5 to 8 frames values to confirm a track and, consequently, the count. Furthermore, we deployed the YOLOv3 implementation with the weights trained in the DNIT dataset. As well, we select the vehicle classes from the DNIT Table. Besides that, we use the default Deep SORT [12] by varying only the frame association hyperparameter to confirm a tracking.
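The interaction of these two hyperparameters can be illustrated with a small sketch of the counting logic: a detection is kept only if its confidence reaches the YOLO Score threshold, and a track contributes to the count only after it has been associated for aK consecutive frames. The sketch below is a simplified, hypothetical Python illustration; it simulates detector and tracker outputs with plain dictionaries instead of calling YOLOv3 or Deep SORT.

```python
# Hypothetical sketch of the counting logic: detections below the YOLO score threshold
# are discarded, and a track is counted once after ASSOCIATION_K consecutive frames.
from collections import defaultdict

YOLO_SCORE = 0.7    # detection confidence threshold (assessed from 0.5 to 0.7)
ASSOCIATION_K = 7   # frames needed to confirm a track (assessed from 5 to 8)

def count_by_class(frames):
    """frames: list of frames; each frame is a list of dicts
    {'track_id': int, 'class': str, 'score': float} produced by the tracker."""
    hits = defaultdict(int)   # consecutive associations per track id
    counted = set()           # track ids already counted
    counts = defaultdict(int) # per-class vehicle count
    for frame in frames:
        seen = set()
        for det in frame:
            if det['score'] < YOLO_SCORE:
                continue
            tid = det['track_id']
            seen.add(tid)
            hits[tid] += 1
            if hits[tid] >= ASSOCIATION_K and tid not in counted:
                counted.add(tid)
                counts[det['class']] += 1
        # reset tracks that were not associated in this frame
        for tid in list(hits):
            if tid not in seen:
                hits[tid] = 0
    return dict(counts)

# Tiny demo: one vehicle detected with high confidence for 8 consecutive frames.
frames = [[{'track_id': 1, 'class': 'E (6-axles cargo)', 'score': 0.82}] for _ in range(8)]
print(count_by_class(frames))  # {'E (6-axles cargo)': 1}
```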
The training performance of the YOLO model is assessed from the average loss value (avg). Once the average loss no longer decreases after many iterations, the training can be stopped. The average loss (error) should be as low as possible; as a general rule, once it falls below 0.06 avg, the training can be interrupted [7]. In our experiment, we stopped training at 26,000 batches with 0.058 avg. The PNCT-DNIT table with 14 classes can be seen in Fig. 2.
Fig. 2. The table of PNCT of DNIT.
4.1 Deployed Metrics
The results were evaluated with some metrics used in multiple object tracking (MOT), with updates for the counting problem [10]. As long as the vehicle is counted, there is no problem if the object is tracked for 100% or only 50% of its life cycle [10]. The metrics used in this work were:
– DNIT precision: the counting error computed over all classes (Eq. (2)), which compares the estimated count of each class with its empirical count (True No., or ground truth), subtracted from 100% (Eq. (1)). Here EstimatedClass_i is the output of the proposed model for class i and TrueNoClass_i is the ground-truth number of vehicles for class i. DNIT uses this metric in the classificatory volumetric count; a short computational sketch of these equations follows the metric list below.

\[ \mathrm{DNIT\ precision}(\%) = 100 - \mathrm{Error}(\%) \tag{1} \]

\[ \mathrm{Error}(\%) = \frac{\sum_{i=1}^{14} \left| \mathit{EstimatedClass}_i - \mathit{TrueNoClass}_i \right|}{\sum_{i=1}^{14} \mathit{TrueNoClass}_i} \times 100 \tag{2} \]
– Lost Tracked: is the number of trajectories lost. That is, the target is not tracked in its useful life. It is a particular case of False Negative (FN). – ID Switches Double Count (ID SW Double): number of times the reported identity of a true path (Ground truth) changes. Therefore, this means a double count if two IDs (tracks) are selected for the same object. It is a particular case of False Positive (FP). – ID Switches Lost Track (ID SW Lost): number of times the reported identity of a true path (Ground truth) changes. A count loss if the same ID (track) is applied to two objects at different times. It is a particular case of False Negative (FN). – Positive Predictive Value (PPV): returns the percentage of objects classified as positive that was actually positive [8], also known as just precision. – True Positive Rate (Recall): returns the percentage of True Positive (TP) objects classified as positive by the proposed model [8].
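The sketch below, referenced in the first item of the list, shows how the DNIT precision of Equations (1) and (2) can be computed from per-class counts; the class labels and counts in the example are illustrative only.

```python
# Minimal sketch of Equations (1)-(2): the per-class absolute counting error is summed
# over the classes and expressed as a percentage of the ground-truth total.
def dnit_precision(estimated, true_no):
    """estimated, true_no: dicts mapping class label -> vehicle count."""
    total_true = sum(true_no.values())
    abs_error = sum(abs(estimated.get(c, 0) - true_no[c]) for c in true_no)
    error_pct = 100.0 * abs_error / total_true
    return 100.0 - error_pct

# Illustrative example with two hypothetical classes.
print(dnit_precision({'A1': 48, 'I': 170}, {'A1': 50, 'I': 177}))  # ~96.04
```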
5 Analysis and Discussion
We present the evaluation of the DNIT precision metric obtained by varying the hyperparameters YOLO Score and Association K. We then select these values and evaluate the proposed model with the selected hyperparameters in detail. At last, we apply all the metrics presented to each video of the test dataset.

5.1 Evaluating the Metrics Varying the Hyperparameters
Figures 3(a), 3(b), 3(c), and 3(d) show the DNIT precision in the vehicle classificatory counting obtained by varying the YOLO Confidence Score and Association K (aK) hyperparameters for the videos BR60 I, BR262 I, BR262 II, and BR262 III, respectively. Figure 3(a) shows the precision for the BR60 I video. The proposed model obtained precision above 92% for all measured values of the hyperparameters YOLO Confidence Score and aK. Besides that, it had its highest value at YOLO Score equal to 0.5 and aK equal to 6, where 95.15% precision was achieved. Figure 3(b) shows the precision for the BR262 I video. The proposed model obtained the best precision at YOLO Score equal to 0.7 and aK equal to 7, with 85.91%. When we use a YOLO Score equal to 0.7, the precision obtained at aK equal to 5 is 76.5%; for the same YOLO Score at aK equal to 8, the precision is 84.5%. Varying aK helps to deal with the vehicle entering the scene: long vehicles take a long time to fully enter the camera's visual field, and for accurate detection of the number of axles, all axles must enter the scene. Besides that, in Fig. 3(b), analyzing the YOLO Score, we observed a lower precision for YOLO Score equal to 0.6; in this case, the number of ID Switches and False Positives is higher. Also, in the BR262 I, the precision decreases in
Fig. 3. DNIT precision of the proposed model as a function of the YOLO score from 0.5 to 0.7 and the aK from 5 to 8 in the videos BR60 I, BR262 I, BR262 II and BR262 III in (a), (b), (c) and (d) respectively.
most cases for aK equal to 8. Some vehicles in specific frames do not reach the threshold necessary to confirm the track. This aK value can make the proposed model more demanding to confirm a track and start counting, generating False Negatives. Figure 3(c) shows the precision for the BR262 II video. The proposed model obtained stable results under variation of the YOLO Score. The best precision, 92%, was found for YOLO Score equal to 0.5 and aK equal to 8. In Fig. 3(d), the proposed model obtained good results for YOLO Score equal to 0.7 and aK equal to 7, with 90.62%. For YOLO Score values below 0.7, many False Positives and ID Switches are generated, impairing precision.

5.2 Selecting the Hyperparameter Values for Fine-Tuning
In order to choose the optimal parameters for the best precision in the test dataset, we found that the highest average DNIT precision of these videos is achieved with the YOLO Score equal to 0.7 and aK equal to 7 with 90.07% of DNIT precision. Therefore, there is a tendency to have an increase in double vehicle count (ID Switch) and False Positives with the use of values of the Association K (aK ) and the YOLO Score below the ideal hyperparameters. In case of a decrease of aK, the model uses fewer frames to confirm a track. This situation makes the model less demanding to count a vehicle. For the low YOLO Score, more than one Bounding Box may satisfy the established low confidence
level. These Bounding Boxes also favor double counting (False Positives). On the other hand, when the values of aK and YOLO Score are higher than the optimal hyperparameters, the model becomes more demanding to confirm a vehicle's track and detection. In this case, there may be a loss of vehicles in the count (False Negatives). Therefore, it is essential to select the hyperparameters that allow the proposed model to achieve the best counting precision.

5.3 Evaluating the Proposed Model in the DNIT Dataset with the Optimal Parameters
With the optimal parameters selected for the DNIT test dataset, we can evaluate the other metrics described in this article, as shown in Table 1. In the BR60 I video, with 227 vehicles, we observed the best result, with a Positive Predictive Value (PPV) of 96.3% and only 8 False Positives. Of these False Positives, four occurred by classification in the wrong class and four by double counting (ID Sw Double). We also obtained 14 False Negatives, resulting in a 93.8% Recall. Of these False Negatives, the proposed model classified four vehicles in the wrong class, lost track of nine vehicles (Lost Track), and mistakenly identified one vehicle as another vehicle already counted (ID Sw Lost).

Table 1. Detailed analysis with optimal hyperparameters YOLO Score equal to 0.7 and aK equal to 7 in the DNIT test dataset for all classes.

Video name           | BR60 I | BR262 I | BR262 II | BR262 III
True no.             | 227    | 71      | 250      | 96
Global count         | 221    | 69      | 239      | 95
DNIT precis. (%) [↑] | 92.95  | 85.91   | 90.80    | 90.62
PPV (%) [↑]          | 96.38  | 89.85   | 93.72    | 90.52
Recall (%) [↑]       | 93.83  | 87.32   | 89.60    | 89.58
TP [↑]               | 213    | 62      | 224      | 86
FP [↓]               | 8      | 7       | 15       | 9
FN [↓]               | 14     | 9       | 26       | 10
TN [↑]               | 3      | 0       | 1        | 0
Lost Track [↓]       | 9      | 3       | 16       | 5
ID Sw Lost [↓]       | 1      | 0       | 2        | 0
ID Sw Double [↓]     | 4      | 1       | 5        | 4
FPS [↑]              | 7.5    | 7.6     | 7.4      | 7.5
After a general analysis, we evaluate the performance of the solution in each class of Table 2. In the BR60 I video, we noticed a good precision in the classification with a PPV above 80%, except in the classes 4-axles cargo and 6-axles cargo with 60% and 75% of Positive Predictive Value (PPV), respectively. We have found that these classes are responsible for most misclassifications. This
video also features one 4-axle bus and one tractor (Class Undefined), both undetected (False Negatives). This is a consequence of the small number of training samples in these classes.

Table 2. Positive Predictive Value (PPV) by classes in the test dataset.

Class              | PPV% BR60 I | PPV% BR262 I | PPV% BR262 II | PPV% BR262 III
A1 (2-axles truck) | 80          | 100          | 100           | 83.33
A2 (2-axles buses) | –           | –            | –             | –
B1 (3-axles truck) | 88.88       | 100          | 100           | 80
B2 (3-axles buses) | –           | –            | –             | –
C1 (4-axles cargo) | 60          | 50           | 100           | 0
C2 (4-axles buses) | –           | –            | –             | –
D (5-axles cargo)  | 100         | 100          | 66.66         | 100
E (6-axles cargo)  | 75          | 57.14        | 86.66         | 90.90
F (7-axles cargo)  | 100         | 50           | 90            | 100
G (8-axles cargo)  | –           | –            | –             | –
H (9-axles cargo)  | –           | 100          | 100           | 80
I (Passenger...)   | 98.87       | 97.82        | 95.52         | 96.61
J (Motorcycles)    | 100         | 100          | 100           | –
L (Undefined)      | –           | –            | –             | –
In the BR262 I video, we noticed a PPV above 97% in the classification, except in the cargo classes 4-axles, 6-axles, and 7-axles with 50%, 57.1%, 50% of Positive Predictive Value (PPV), respectively. We have found that these classes are responsible for most misclassifications in the BR262 I video. Besides, this video has one 3-axle bus not detected. It is a consequence of the small number of training samples in this class. In the BR262 II video, we noticed a PPV above 86% in the classification, except in the cargo classes 5-axles with 66% of PPV. We have found that these classes are responsible for most misclassifications in the BR262 II video. Looking at the accuracy, we observe that it is more difficult to separate a 6-axles cargo vehicle from 5-axles and 7-axles cargo vehicles. In the BR262 III video, we noticed a PPV above 80% in the classification, except in the cargo class 4-axles with 0% of PPV. This video has only three samples of 4-axles vehicle class, and the proposed model did misclassifications with another cargo class for the three samples. Therefore, we consider the results satisfactory for most DNIT [4] needs with Positive Predictive Value above 89.85%. We can also highlight that the model did not include any element other than vehicles (e.g., pedestrians or bicycles). All False Positives come from double counting or detection of the wrong class. Recall also showed reasonable rates above 87.32%, showing the model’s sensitivity. Besides, the 7.5 Frames Per Second (FPS) rate achieved in this experiment with a GeForce GTX 1050 from a laptop allows processing the classification count with low-cost hardware.
5.4 Evaluating the Cases of Classification in the Wrong Class
This section analyzes the algorithm's behavior from the vehicle's entry into the scene until the end of the count. In this sense, we analyze some interesting counting cases to better understand the cases of classification in the wrong class in the experiment.
Fig. 4. Presentation of one of the counting cases in the incorrect class, in this case a 9-axle cargo vehicle was classified as a 7-axle cargo vehicle.
There was a difficulty in counting long vehicles related to the framing of the footage. In Fig. 4, we see a 9-axle cargo vehicle entering the scene (a). Before the truck has fully entered the scene (b), YOLO already detects the vehicle partially. In scene (c), Deep SORT confirms the track of the vehicle's partial bounding box; at this moment, YOLO erroneously classifies the vehicle as a 7-axle cargo vehicle, so the vehicle is counted in the wrong class. Finally, the vehicle moves away in scene (d), and the tracking is finished. We therefore verify that, as a long vehicle enters the scene, a different classification might occur depending on the part of the vehicle presented to the classifier.
6 Conclusions
To enable traffic research on Brazilian roads using deep learning, we presented a study of the YOLOv3 and Deep SORT models to perform classificatory vehicle detection and counting. It is thus an evolution of the work of Santos et al., which focused only on the global count [10]. Our proposal reaches a Positive Predictive Value above 89.85% and a Recall above 87.32% in the classificatory vehicle count using the DNIT table with 14 classes.
Besides this, we presented an analysis of the hyperparameters of the proposed model with respect to classificatory vehicle counting accuracy, as was done for the global count by Santos et al. [10]. We verified that Association K and YOLO Score values below the optimal hyperparameters resulted in lower precision, mainly because of the increase in ID Switches (e.g., double counts) and wrong-class detections. Conversely, for high values of Association K and YOLO Confidence Score, there is a loss of vehicles in the count (False Negatives). Therefore, this work showed a more realistic application of the works related to classificatory vehicle counting: we used the 14 classes of the DNIT table with 2 h of video from real classificatory surveys on Brazilian federal highways, and we did not remove or merge classes to facilitate counting. This study can be helpful for several entities that need to carry out traffic studies in their countries. In the future, we intend to improve the training dataset with more images for the classes with lower accuracy, such as buses and undefined vehicles. This system will allow automatic vehicle counting by class, bringing higher speed in obtaining traffic information and lower costs for transport organizations. Acknowledgment. We are grateful for the support of DNIT, UPE and CAPES.
References 1. Abdelwahab, M.A.: Accurate vehicle counting approach based on deep neural networks. In: 2019 ITCE, pp. 1–5. IEEE (2019) 2. Adu-Gyamfi, Y.O., Asare, S.K., Sharma, A., Titus, T.: Automated vehicle recognition with deep convolutional neural networks. Transp. Res. Record 2645(1), 113–122 (2017). https://doi.org/10.3141/2645-13 3. Chen, L., Jia, Y., Sun, P.Y., Sinnott, R.O.: Identification and classification of trucks and trailers on the road network through deep learning. In: Proceedings of the 6th IEEE/ACM BDCAT, pp. 117–126 (2019) 4. DNIT: Manual de estudos de tr´ afego. DNIT, Rio de Janeiro, 384 p. publica¸cao ipr-173 edn. (2006) 5. DNIT: Plano nacional de contagem de tr´ afego (pnct) (2021). http://servicos.dnit. gov.br/dadospnct/Inicio/institucional. Accessed 13 Mar 2021 6. Hasnat, A., Shvai, N., Meicler, A., Maarek, P., Nakib, A.: New vehicle classification method based on hybrid classifiers. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 3084–3088. IEEE (2018) 7. Lechgar, H., Bekkar, H., Rhinane, H.: Detection of cities vehicle fleet using YOLO V2 and aerial images. ISPAr 4212, 121–126 (2019) 8. Moreira, J., Carvalho, A., Horvath, T.: A General Introduction to Data Analytics. Wiley (2018). https://books.google.com.br/books?id=UoZfDwAAQBAJ 9. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018) 10. Santos, A.M., Bastos-Filho, C.J.A., Maciel, A.M.A., Lima, E.: Counting vehicle with high-precision in Brazilian roads using yolov3 and deep sort. In: 2020 33rd SIBGRAPI, pp. 69–76 (2020). https://doi.org/10.1109/SIBGRAPI51738. 2020.00018
11. Sivaraman, S., Trivedi, M.M.: Looking at vehicles on the road: a survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Trans. Intell. Transp. Syst. 14(4), 1773–1795 (2013) 12. Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645–3649. IEEE (2017)
Imbalanced Learning for Robust Moving Object Classification in Video Surveillance Applications Rania Rebai Boukhriss1(B) , Ikram Chaabane1 , Radhouane Guermazi2 , Emna Fendri3 , and Mohamed Hammami3 1 2
MIRACL-FSEGS, Sfax University, Road Aeoport, Sfax, Tunisia [email protected] Saudi Electronic University, Riyadh, Kingdom of Saudi Arabia [email protected] 3 MIRACL-FSS, Sfax University, Sfax, Tunisia {emna.fendri,mohamed.hammami}@fss.usf.tn
Abstract. In the context of video surveillance applications in outdoor scenes, the moving object classification still remains an active area of research, due to the complexity and the diversity of the real-world constraints. In this context, the class imbalance object distribution is an important factor that can hinder the classification performance and particularly regarding the minority classes. In this paper, our main contribution is to enhance the classification of the moving objects when learning from imbalanced data. Thus, we propose an adequate learning framework for moving object classification fitting imbalanced scenarios. Three series of experiments which were led on a challenging dataset have proved that the proposed algorithm improved efficiently the classification of moving object in the presence of asymmetric class distribution. The reported enhancement regarding the minority class reaches 116% in terms of F-score when compared with standard learning algorithms. Keywords: Moving object classification
· Imbalanced data

1 Introduction
According to the increase in the rate of delinquency, criminality and terrorism acts over the last decade, the intelligent video surveillance systems are essentially focused on the automatic detection of abnormal events. The performance of the observed-events analysis step is closely dependent on the quality of the low leveltreatments which consists in the detection, tracking, and classification of moving objects. The latter has continued to prosper and evolve owing to its potential application in a variety of fields such as event understanding [1], human action recognition [2] and smart video surveillance systems [3]. Our research framework is integrated in this context to propose and develop solutions for moving object classification. In fact, the classification consists in identifying the class of the c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 199–209, 2022. https://doi.org/10.1007/978-3-030-96308-8_18
detected object. According to the literature studies, two main categories can be distinguished: handcrafted feature-based methods and deep learned featurebased methods. The former category of methods follows two main steps. At first, a feature set is extracted to describe and represent each detected moving object. The features can be related either on shape (area, silhouette, edges, Histogram of Oriented Gradients, etc.), texture descriptors (Markov Random Field, Local Binary Pattern, etc.), and/or movement such as speed, direction and periodicity of the tracked objects [4]. Thereafter, the feature vector is fed to a particular classifier in order to identify the class of each object. In the literature, several classifiers have been used and combined in order to achieve the best classification accuracy, including K-Nearest Neighbor (KNN) [5], Support Vector Machine (SVM) [6], Multi-Layer Perceptron (MLP) [7]. Regarding the deep learned feature-based methods, authors classify the detected moving objects based on convolutional neural network set either to new CNN architectures [8] or to pre-trained ones [9] so as to resolve their classification task. Although the investigations on moving object classification were considerably developed in the literature, it still remains an active research area particularly in outdoor scenes, due to the complexity and the diversity of real-world constraints as well as the data characteristics like the presence of noisy, overlapped, and imbalanced data. Indeed, when the handled task is not linear, which is the case in several real-life applications, the class imbalance can hinder the classification performance towards the majority class. However, the rare class may have the highest error cost. This problem is omnipresent in many applications such as the activity recognition, the behavior analysis, etc. Specifically, our study handles the video sequences so as to detect groups of persons which can cover abnormal activities such as fighting, robbery, burglary, etc. Hence, three main classes are taken into account: P edestrian, V ehicle and P edestrian Group. To highlight the class imbalance problem in our study context, we have conducted an estimation of each class presence in three of the most well-known video sequences used for video surveillance applications in an outdoor scene [10–12]. In fact, all the foreground regions were extracted in order to estimate the presence of each class in the considered sequences. In Table 1, we presented the recorded estimation of the class distributions in a few sequences. An imbalanced class distribution is evident over the different sequences. Table 1. Class imbalance distribution in video surveillance scenarios Data sets
Data sets                       | Pedestrian | Pedestrian group | Vehicle
OTCBVS [10]                     | 88.98%     | 11.02%           | 0%
Groupe Fight (GF) [11]          | 26.98%     | 13.35%           | 59.67%
Visitor Parking (VP) [11]       | 17.09%     | 0%               | 82.91%
Parking Snow (PS) [11]          | 93.8%      | 2.2%             | 4%
Baseline [12]                   | 46.78%     | 2.29%            | 50.93%
Intermittent Object Motion [12] | 60.07%     | 1.53%            | 38.4%
Shadow [12]                     | 21.38%     | 19.29%           | 59.33%
In general, when learning from imbalanced data, the minority class is misclassified towards the majority classes due to the internal bias of the standard learning algorithms, which consider even misclassification errors on all classes and just look for maximizing the overall accuracy. In our context, the presence of Pedestrian Group may refer to dangerous events, which let their misclassification hide several crimes. Hence, it is important to give more focus on the prediction of this class so as to prevent the violence scenarios. In the imbalanced learning context, literature solutions were proposed mainly on data and algorithm levels. Regarding the first strategy, authors balance data by either over-sampling or under-sampling techniques [13]. Whereas, in the second strategy, data is maintained and only algorithms are modified internally so as to take into account the class imbalance [14,15]. In order to enhance the prediction performance when handling video sequences presenting a class imbalance, the literature studies focus either on the complexity of the outdoor scene conditions (e.g., the inter-object similarity) or on the class imbalance regardless the particularities of the application context, but rarely on both directions. In [17], for example, authors have opted for over-sampling strategies to balance their dataset so as to add synthesized frames representing rare objects. Their findings show poor performances regarding the minority classes which drop until 0.323 in F-score. L. Zhang et al. have proposed in [16] a class imbalance loss function to support the recognition of rare objects. Although their approach allowed a slight improvement (i.e., 2%), it is still insufficient. In another recent study [18], authors proposed Rk-means++ to improve the localization accuracy of the objects to be recognized and used Focal Loss introduced in YOLOv2 [19] to decrease the imbalance between the positives and negatives. The obtained results show an improvement in the recognition of the different classes regarding the state-of-the-art works, but this improvement does not exceed 6% when focusing on the minority class. Such statements show that handling the class imbalance in video surveillance system is a challenging problem and need to be more investigated. To improve the prediction of the rare moving objects (i.e., the Pedestrian group) in a multi-class classification problem, an algorithmic solution based on an Asymmetric Entropy for Classifying Imbalanced Data (AECID) and a Random Forest (RF) ensemble learning was proposed in this paper. Originally, AECID was introduced on our previous work [20] to build asymmetric decision trees able to weight the class probabilities in favor of the minority classes. RF from another side proves its efficiency in different application domains since the randomness in the baseline decision trees improves the efficiency and the robustness of the single learners. Therefore, our proposal, referred as RF-AECID, focuses on taking advantage of AECID decision trees in a RF algorithm to enhance the prediction of the rare moving objects in the video surveillance sequences. Although Deep Neural Networks DNN are largely used on computer vision, deep learning from class imbalanced data set is understudied and statistical comparison with the newest studies across a variety of data sets and imbalance levels does not exist [21]. Hence, in the present work, we carried out a comparative study to rank
our approach with DNN based strategies rather than traditional learners. The rest of this manuscript is organized as follows: Sect. 2 describes our approach to improve the moving object classification, Sect. 3 evaluates the efficiency of our proposed process on different situations, and Sect. 4 concludes and draws further investigations.
2 Methodology
Our method aims to generate a prediction model for classifying moving regions into one of the three classes Vehicle, Pedestrian, and Pedestrian Group. Hence, our process relies on three steps: the data collection, the feature extraction and the model generation as shown in Fig. 1. The details of each step are depicted on the following subsections.
Fig. 1. Proposed approach for imbalanced learning for moving object classification
2.1 Data Collection
The collection of a rich and representative learning dataset is a critical step in each machine learning process. To validate our approach, we relied on three well-known video sequences used in the context of video surveillance applications, namely the OTCBVS [10], the INO Video Analytics [11] and the CD.net 2014 [12] datasets. The selected sequences result in a learning dataset of 19,806 objects, each described by 151 handcrafted features or 1000 learned features extracted through deep learning strategies. The whole dataset shows a moving-object class imbalance across the three classes Pedestrian, Vehicle, and Pedestrian Group, with the following proportions: 61.76%, 30.86%, and 7.38% of the whole dataset, respectively.

2.2 Feature Extraction
The feature extraction step aims to provide a representation of each moving object by a feature vector. In the present study, we explored two feature extraction strategies, producing either handcrafted features or deep features generated through deep learning models. Investigating both methods allows us, first, to compare their performance and, second, to capture their impact on the training model performance. On the one hand, when dealing with
handcrafted features, we have opted for a feature set based on shape and movement [24]. On the other hand, encouraged by the great surge and the high performance of convolutional neural networks (CNN), we investigated our model generation method on deep features generated by a pre-trained deep CNN model. In the state-of-the-art studies, there is a wide range of CNN models trained on huge datasets for several data mining applications. DarkNet-53 ranks among the most well-known convolutional neural networks: it was used as a feature extractor to classify detected objects in YOLOv3 [25], and it achieved high classification performance on the ImageNet dataset. Thus, adopting the DarkNet-53 CNN model to generate deep features seems to be a promising strategy.
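As an illustration of the deep-feature extraction strategy, the following sketch uses a pre-trained CNN as a fixed feature extractor. Since DarkNet-53 is not distributed with torchvision, ResNet-50 is used here purely as a stand-in backbone; the principle — dropping the classification head and keeping the pooled activations for each detected object crop — is the same.

```python
# Sketch of deep-feature extraction with a pre-trained CNN (ResNet-50 as a stand-in).
import torch
from torchvision import models, transforms
from PIL import Image

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone = torch.nn.Sequential(*list(backbone.children())[:-1])  # drop the fc layer
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_features(crop: Image.Image) -> torch.Tensor:
    """Return a 2048-d feature vector for one detected moving-object crop."""
    with torch.no_grad():
        x = preprocess(crop).unsqueeze(0)          # (1, 3, 224, 224)
        return backbone(x).flatten(1).squeeze(0)   # (2048,)
```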
2.3 Model Generation
To handle the class imbalance, we proposed an algorithmic solution prone to enhance the classification of moving objects when faced with multi-class imbalanced data. Our proposal, referred as RF-AECID, handles the class imbalance by using asymmetric baseline decision trees. In fact, they rely on an off-centered entropy, named AECID. It is an impurity measure serving on splitting the tree nodes of each level into the least impure partition. Contrary to the common symmetric split criteria, AECID weights inequitably the probability of belonging to each class according to the prior class distribution supposed to be imbalanced. For binary classification, AECID entropy was originally defined on our previous work [20]. The asymmetric property is due to the imbalanced weights associated to each class according to the asymmetric prior class distribution, which let such measure fit each problem’s characteristics. The effectiveness of AECID decision trees comes also with the dynamic tuning of the AECID’s parameters so as to fit the considered classification task. We refer any interested reader to our previous studies [20] and [27] for more details proving the robustness and the effectiveness of AECID-DT in several application fields. In the perspective of boosting the learner performance, we introduce the asymmetric decision trees as a baseline classifier into an ensemble learning framework, particularly the Random Forest (RF) [28]. It is a bagging classifier based on random trees which aggregates the prediction of multiple accurate and diverse binary classifiers. Based on several studies including [29] and [30], RF is proved to improve the diversity of the baseline classifiers thanks to the randomness of the bootstrapped training sets along with the randomness of the feature selection. To train the learning model from the multi-class dataset, we opt for the Error-Correcting Output Codes (ECOC) technique to re-frame the multi-class classification problem as multiple binary classification problems, allowing thereby the direct use of native learners. The ECOC approach is known to be simple yet powerful approach to deal with a multi-class problem [26].
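A minimal sketch of the ensemble setup is given below. The AECID asymmetric split criterion of [20] is not available in scikit-learn, so the stock impurity criterion of RandomForestClassifier is used as a placeholder, and the feature matrix and labels are randomly generated stand-ins; only the combination of ECOC with a random-forest base learner is illustrated.

```python
# Illustrative ECOC + random-forest setup (AECID split criterion not included).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OutputCodeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(300, 20)  # placeholder feature vectors
y = rng.choice(['Pedestrian', 'Pedestrian Group', 'Vehicle'],
               size=300, p=[0.62, 0.07, 0.31])  # imbalanced class distribution

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

ecoc_rf = OutputCodeClassifier(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    code_size=2.0,   # number of binary problems per class
    random_state=0,
)
ecoc_rf.fit(X_tr, y_tr)
print(ecoc_rf.score(X_te, y_te))
```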
3 Experimental Results
In order to evaluate the performance of our moving object classifier learned from imbalanced data (i.e., RF-AECID), we carried out a comparative study of our approach against well-known literature classifiers, namely Decision Trees (DT), Random Forest (RF), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB) and Logistic Regression (LR), along with different Convolutional Neural Network (CNN) based strategies. Hence, we conducted three experimental series. In the first and the second series, the classification performance is reported for standard learners following, however, different feature extraction strategies: handcrafted features in the first series, and features automatically extracted by a deep CNN model (i.e., DarkNet-53) in the second. In the third series of experiments, we compare RF-AECID based on deep learned features with different deep CNN models ranking among the most well-known ones. Therefore, our comparative study includes both standard and recent well-known approaches that were already implemented, thereby making their evaluation possible on our dataset. To implement our approach, we relied on the scikit-learn API [31] after implementing our AECID asymmetric entropy [20]. The ECOC binarization technique as well as the competitive learners, including DT, RF, SVM, GNB, and LR, were based on the default settings of the latest API. To evaluate the competitive approaches, we relied on Precision and Recall to capture the performance on each class separately, F-score as a harmonic mean joining Precision and Recall, and Accuracy as an overall evaluation measure. We carried out our experiments using 5-fold stratified cross-validation; the reported values in Tables 2, 3, and 4 were obtained by averaging the results of the 5 folds for the considered evaluation measures. The moving objects classified, respectively, as Pedestrian, Pedestrian Group and Vehicle are assigned the labels 0, 1 and 2 in the comparison result tables.
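A sketch of this evaluation protocol, under the assumption of placeholder features and an imbalanced three-class label distribution, is shown below; it reports per-class Precision, Recall and F-score for each of the 5 stratified folds.

```python
# Sketch of 5-fold stratified cross-validation with per-class metrics (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report

rng = np.random.RandomState(1)
X = rng.rand(300, 20)                                      # placeholder features
y = rng.choice([0, 1, 2], size=300, p=[0.62, 0.07, 0.31])  # 0=Pedestrian, 1=Group, 2=Vehicle

clf = RandomForestClassifier(n_estimators=100, random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr, te) in enumerate(skf.split(X, y)):
    clf.fit(X[tr], y[tr])
    print(f"Fold {fold}")
    print(classification_report(y[te], clf.predict(X[te]), zero_division=0))
```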
3.1 Evaluation of RF-AECID Based on Handcrafted Features
We present in Table 2 the comparative results of RF-AECID with the aforementioned learning algorithms. The presented results highlight the classification performance on the different classes, in which Class 1 refers to the minority class. At first glance, it is shown that RF-AECID performed the best in terms of Accuracy. In overall, RF-AECID enhances in general the precision, recall and F-score regarding all classes. This enhancement varies from a class to another. In fact, in terms of precision, it reaches 6.9% and 7% for Class 0 and Class 2, respectively. For the minority class, RF-AECID proved also a significant improvement of precision which reaches 60%. According to the recall rates, RF-AECID, LR, and SVM show the best results with slight differences. Since, a tradeoff is obvious between precision and recall, we rely on F-score which proves the effectiveness of RF-AECID almost for all classes.
Table 2. Comparative classification results based on handcrafted features Precision
Recall
F-score
0
1
2
0
1
2
0
1
DT
0.922
0.483
0.903
0.902
0.422
0.842
0.911 0.378
RF
0.932
0.751
0.973 0.985 0.404
0.983 0.957 0.475
RF AECID 0.942 0.811 0.972
0.979
0.513
0.977
0.96
SVM
0.939
0.541
0.968
0.939 0.522
0.852
0.939
0.576
0.954
GNB
0.877
0.324
0.935
0.89
0.223
LR
0.94
0.592
0.952
0.941
0.545 0.966
3.2
Accuracy 2 0.819
0.883
0.978 0.944
0.567 0.974 0.96
0.945 0.933
0.869 0.124
0.842
0.906
0.94
0.959
0.934
0.546
Evaluation of RF-AECID Based on Deep Learned Features
Encouraged by the success of convolutional neural network (CNN), we evaluate our model generation algorithm on deep features generated by the DarkNet-53 CNN model. Table 3 displays the comparison results of our method and the learning algorithms (i.e., DT, RF, SVM, GNB and LR). The reported results show comparable performances across learning algorithms. However, the performance of RF-AECID is clearly enhanced in particular regarding the minority class. In fact, F-score on Class 1 goes from 0.567 to 0.704. Such statement shows, once again, the efficiency of RF-AECID even when it is coupled with deep learned features. Table 3. Comparative classification results based on deep learned features Precision
F-score
1
2
0
DT
0.816
0.476
0.684
0.918 0.376 0.67
RF
0.918
0.746
0.782
0.98
RF AECID 0.933
3.3
Recall
0
0.867 0.967
1 0.54
2
0 0.862
0.748 0.95
Accuracy 1
2
0.4
0.676 0.802
0.53
0.764 0.876
0.975 0.625 0.946 0.953 0.704 0.956 0.940
SVM
0.956 0.624
0.974 0.938 0.762 0.95
0.668
0.96
GNB
0.866
0.664
0.774 0.52
0.668 0.81
0.326
0.668 0.724
LR
0.956 0.642
0.77
0.954 0.78
0.782 0.952
0.662
0.776 0.889
0.318
0.946
0.928
Evaluation of RF-AECID Versus CNN Architectures
The main purpose of this series of experiments was to validate the robustness of the proposed method for moving object classification compared to the use of pretrained deep CNN. In fact, we have confronted our classification results, relied on deep features, with those obtained by well-known deep CNN models dedicated to the classification task in computer vision field (DarkNet-53 [25], ResNet-50 [32], AlexNet [33] and VGG-19 [34]). Table 4 reports the obtained results. In terms of F-score, our method records the best rates in all moving object classes. As for recall and precision, we have achieved the best results in the minority class (Class
206
R. R. Boukhriss et al.
1) compared to those of deep CNN model. In fact, the F-score enhancement reaches 337% (when compared with VGG19 architecture). It is worth noting that when being based also on the same deep feature extraction strategy (DarkNet53), the observed gains are still significant and reaching more than 159% of F-score enhancement. These results confirm that the deep learning-based methods still providing limited performances when dealing with imbalanced data. Table 4. Comparison of the performance of the proposed method with well-known deep CNN models Precision 0
1
Recall 2
0
F-score 1
2
Accuracy 1
2
DarkNet53
0.829
0.374
0.879
0.872
0.208
0.966 0.85
0.268
0.92
ResNet50
0.841
0.163
0.965
0.869
0.248
0.862
0.855
0.197
0.911
0.837
AlexNet
0.781
0.289
0.848
0.832
0.239
0.806
0.806
0.262
0.826
0.769
VGG19
0.777
0.186
0.851
0.844
0.14
0.805
0.809
0.16
0.827
0.761
RF AECID 0.933 0.867 0.967 0.975 0.625 0.946
4
0
0.814
0.953 0.704 0.956 0.940
Conclusion
In this paper, we propose a complete framework for moving object classification which takes into account firstly the challenges of the outdoor scene sequences and secondly the challenge of the class imbalance. Hence, the first contribution concerns the prior model learning step which prepares the learning data. In this context, two strategies were investigated for the feature extraction step: 1) deeplearned feature extraction, and 2) handcrafted feature extraction based on combining shape-based features with velocity so as to overcome the inter-similarity objects. Our second contribution concerns handling imbalanced data during the model training step based on asymmetric decision trees using an asymmetric entropy AECID as a node split criterion. The entire approach was compared, in a first time, with some well recognized learning algorithms (DT, RF, GNB, SVM, and LR) and four well-known deep CNN architectures (DarkNet-53, ResNet-50, AlexNet, and VGG-19), in second time. The reported results show at first the efficiency of our approach regarding the competitive classifiers on enhancing the classification performance of moving objects in overall and particularly regarding the minority class. A second remark states that the handcrafted and the deep learned feature extraction provide competitive results, unless a better prediction regarding the minority class was mainly proved by the second strategy when RF-AECID is considered. The third comparative study considering the CNN models and RF-AECID joined with deep-learned feature extraction have proved spectacular gains on the minority class in favor of our approach. Our findings open several horizons for future works. A first axe can investigate more sophisticated learning algorithms underlining for example better tuned ensemble
Imbalanced Moving Object Classification
207
frameworks regarding the class imbalance problem. A second research field can also interest the enhancing of the ECOC framework allowing the transition from the multiclass to the binary classification in such a way to better fit imbalanced data. Further investigations of other class imbalance approaches like sampling strategies joined with ensemble learning frameworks rank as well among our first interests.
References 1. Xu, H., Li, B., Ramanishka, V., Sigal, L., Saenko, K.: Joint event detection and description in continuous video streams. In: IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, pp. 396–405 (2019) 2. Avola, D., Bernardi, M., Foresti, G.L.: Fusing depth and colour information for human action recognition. Multimed. Tools Appl. 78(5), 5919–5939 (2018). https://doi.org/10.1007/s11042-018-6875-7 3. Muchtar, K., Rahman, F., Munggaran, M.R., Dwiyantoro, A.P.J., Dharmadi, R., Nugraha, I.: A unified smart surveillance system incorporating adaptive foreground extraction and deep learning-based classification. In: Proceedings of the International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 302–305 (2019) 4. Yu, T.T., Win, Z.M.: Enhanced object representation on moving objects classification. In: International Conference on Advanced Information Technologies (ICAIT), Yangon, Myanmar, pp. 177–182 (2019) 5. Honnit, B., Soulami, K.B., Saidi, M.N., Tamtaoui, A.: Moving objects multiclassification based on information fusion. J. King Saud Univ. Comput. Inf. Sci. (2020) 6. Ray, K.S., Chakraborty, A., Dutta, S.: Detection, recognition and tracking of moving objects from real-time video via visual vocabulary model and species inspired PSO. arXiv preprint arXiv:1707.05224 (2017) 7. Mishra, P.K., Saroha, G.P.: Multiple moving object detection and classification using block based feature extraction algorithm and K-NN Classifier. Int. J. Tomogr. Simul. 32(2) (2019) 8. Shima, R., et al.: Object classification with deep convolutional neural network using spatial information. In: International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), Okinawa, Japan, pp. 135–139 (2017) 9. Barat, C., Ducottet, C.: String representations and distances in deep convolutional neural networks for image classification. Pattern Recognit. 54, 104–115 (2016) 10. Davis, J., Sharma, V.: Background-subtraction using contour-based fusion of thermal and visible imagery. Comput. Vis. Image Underst. 106(2–3), 162–182 (2007) 11. St-Laurent, L., Maldague, X., Prevost, D.: Combination of colour and thermal sensors for enhanced object detection. In: 10th International Conference on Information Fusion, Quebec, Canada, pp. 1–8 (2007) 12. Goyette, N., Jodoin, P.-M., Porikli, F., Konrad, J., Ishwar, P.: Changedetection.net: a new change detection benchmark dataset. In: IEEE Workshop on Change Detection (CDW-2012) at CVPR-2012, Providence, RI, pp. 16–21 (2012) 13. Chaabane, I., Guermazi, R., Hammami, M.: Impact of sampling on learning asymmetric-entropy decision trees from imbalanced data. In: 23rd Pacific Asia Conference on Information Systems, China (2019)
208
R. R. Boukhriss et al.
14. Chaabane, I., Guermazi, R., Hammami, M.: Adapted pruning scheme for the framework of imbalanced data-sets. Procedia Comput. Sci. 112, pp. 1542–1553 (2017). ISSN 1877-0509 15. Li, Z., Huang, M., Liu, G., Jiang, C.: A hybrid method with dynamic weighted entropy for handling the problem of class imbalance with overlap in credit card fraud detection. Expert Syst. Appl. 175, 114750 (2021) 16. Zhang, L., Zhang, C., Quan, S., Xiao, H., Kuang, G., Liu, L.: A class imbalance loss for imbalanced object recognition. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 13, 2778–2792 (2020) 17. Elasal, N., Swart, D., Miller, N.: Frame augmentation for imbalanced object detection datasets. J. Comput. Vis. Imaging Syst. 4(1) (2018) 18. Zhongyuan, W., Sang, J., Zhang, Q., Xiang, H., Cai, B., Xia, X.: Multi-scale vehicle detection for foreground-background class imbalance with improved YOLOv2. Sensors 19(15), 3336 (2019) 19. Lin, T., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. In. IEEE International Conference on Computer Vision (ICCV), Venice, Italy, pp. 2980–2988 (2017) 20. Guermazi, R., Chaabane, I., Hammami, M.: AECID: asymmetric entropy for classifying imbalanced data. Inf. Sci. 467, 373–397 (2018) 21. Johnson, J.M., Khoshgoftaar, T.M.: Survey on deep learning with class imbalance. J. Big Data 6, 1–54 (2019) 22. Boukhriss, R.R., Fendri, E., Hammami, M.: Moving object detection under different weather conditions using full-spectrum light sources. Pattern Recognit. Lett. 129, 205–212 (2020) 23. Jarraya, S.K., Boukhriss, R.R., Hammami, M., Ben-Abdallah, H.: Cast shadow detection based on semi-supervised learning. In: Campilho, A., Kamel, M. (eds.) ICIAR 2012. LNCS, vol. 7324, pp. 19–26. Springer, Heidelberg (2012). https://doi. org/10.1007/978-3-642-31295-3 3 24. Boukhriss, R.R., Fendri, E., Hammami, M.: Moving object classification in infrared and visible spectra. In: Proceedings of the SPIE 10341, Ninth International Conference on Machine Vision, p. 1034104 (2016) 25. Joseph, R., Ali, F.: YOLOv3: an incremental improvement. arxiv:1804.02767Comment, Tech Report (2018) 26. Zhou, Z.-H.: Ensemble Methods: Foundations and Algorithms, 1st. edn. Chapman and Hall/CRC (2012) 27. Chaabane, I., Guermazi, R., Hammami, M.: Enhancing techniques for learning decision trees from imbalanced data. Adv. Data Anal. Classif. 14(3), 677–745 (2019). https://doi.org/10.1007/s11634-019-00354-x 28. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001) 29. Patel, R.K., Giri, V.K.: Feature selection and classification of mechanical fault of an induction motor using random forest classifier. Perspect. Sci. 8, 334–337 (2016) 30. Shukla, G., Dev Garg, R., Srivastava, H.S., Garg, P.K.: An effective implementation and assessment of a random forest classifier as a soil spatial predictive model. Int. J. Remote Sens. 39(8), 2637–2669 (2018) 31. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011) 32. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp. 770–778 (2016)
Imbalanced Moving Object Classification
209
33. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Image net classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 1097–1105 (2012) 34. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv, pp. 1409–1556 (2014)
Mining Frequently Traveled Routes During COVID-19 George Obaido1(B) , Kehinde Aruleba2 , Oluwaseun Alexander Dada3,5 , and Ibomoiye Domor Mienye4 1
Department of Computer Science and Engineering, University of California, San Diego, USA [email protected] 2 Department of Information Technology, Walter Sisulu University, Mthatha, South Africa 3 Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Biomedicum 2U, 00290 Helsinki, Finland [email protected] 4 Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa [email protected] 5 The School of Software, 18 Kessington Street, Lekki, Lagos 101245, Nigeria [email protected]
Abstract. The aviation industry has been one of the biggest employers of labor through its value chain until the world came to a halt due to the outbreak of the novel Coronavirus Disease 2019 (COVID-19). During the first quarter of 2020, many countries started implementing travel restrictions to minimize the spread of the virus, leading to a severe impact on the revenues of travel and tourism industry operators. As the world begins to reopen, it has become imperative for airlines to create attractive travel packages to meet the demands of passengers in a post-COVID era. This study introduces an associative data mining approach, using the Apriori algorithm – an associative rule mining algorithm tasked with finding frequent patterns in large heterogeneous datasets. Our preliminary findings show that London had the most frequently traveled destination for travellers out of a dataset containing flight information from March 2020 to April 2021. Our results may assist aviation operators in tailoring ticket fares at affordable prices to meet the demands of passengers. Keywords: COVID-19 · Aviation industry · Data mining · Associative pattern mining · Unsupervised machine learning
· Apriori

1 Introduction
The US Airline Deregulation Act (ADA) in 1978 resulted in remarkable growth and transformation in the global aviation industry, leading to the emergence of c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 210–219, 2022. https://doi.org/10.1007/978-3-030-96308-8_19
newer businesses growing at significant levels [4,14,39]. According to the International Air Transport Association (IATA), the aviation industry contributed to $2.7 trillion in Gross Domestic Product (GDP) and carried over billions of passengers worldwide [14]. The recent Coronavirus Disease 2019 (COVID-19) outbreak has caused immense disruptions to the aviation industry, leading to significant job losses and the grounding of numerous airlines [12]. Consequently, this has negatively impacted other sectors, such as tourism, catering and haulage industries, and even the economy at large [25]. Previous studies have described the impact of previous pandemics on the aviation industry, especially with the outbreak of the Severe Acute Respiratory Syndrome (SARS) to Swine flu, leading to negative perceptions to want to travel [8,24,25]. In some cases, passengers raised concerns about being close to others, especially in airports and on aircrafts, and a vast majority perceived flying as no longer unique and fun. At the time, this lead to a sharp decrease in valuation [18]. Despite the negative perception to travel, Lamb et al. [17], in a study that explores personality traits and emotional heuristics during COVID-19, indicated that most passengers expressed enthusiasm to want to travel for leisure purposes if stricter and preventive measures are implemented. Even so, leisure travels appear common to travelers than business purposes [17]. As the world slowly reopens, it has become imperative for airlines to create attractive travel packages to meet the demands of passengers in a post-COVID era. In addition, this information may assist airline operators in promoting messaging that encourages the resumption of travel. To this end, this study aims to mine frequently traveled routes using an associative data mining algorithm called Apriori in an attempt to aid airlines attempting to create travel discounts and packages. The Apriori has found application to many areas [5]. The remainder of this paper is structured as follows. Section 2 presents the background information of the study. The method used for this study is described in Sect. 3. Section 4 presents the result of the method and Sect. 5 provides the conclusion and future directions.
2
Literature Review
In this section, we present the background related to the study. First, we review relevant works on the impact of COVID-19 on the aviation industry. Next, we describe associative pattern mining and present areas of application.

2.1 COVID-19 and Aviation Industry
Since the World Health Organization (WHO) declared COVID-19 as a pandemic on 11 March 2020, the aviation industry has experienced a severe global transportation crisis, which has affected millions of passengers worldwide and resulted in monumental job losses [3,12,23]. As a huge employer of labor, the aviation sector remains integral to the tourism economy, employing millions of skilled
and semi-skilled workers across its value chain, yet it remains widely vulnerable to external factors. Despite its immense contribution as a global employer of labor, a recent study by Chung [9] indicated that the aviation sector has been vulnerable to catastrophic events, especially past pandemics such as SARS in 2003, Avian influenza in 2006 and the 2009 Swine influenza. Even non-pandemic events, such as the Iran-Iraq war in the 1980s, the Gulf crisis in the 1990s and the 9/11 terrorist attacks in 2001, have challenged the aviation industry [7,12]. Similar to SARS, COVID-19 can be transmitted rapidly among people, leading to adverse effects on both human health and national economies, with the aviation industry at the center of the storm [12,31]. Due to its severe impact, IATA predicted that the global aviation industry could lose about $314 billion in 2020, and EUROCONTROL1 reported that the number of flights decreased by 87% in April 2020 [31], a 55% decline compared to the previous fiscal year. With the current crisis, it remains to be seen how the industry will bounce back, with many flights still grounded and a significant decrease in passenger demand, with only leisure travel preferred [16,32].

2.2 Associative Rule Mining
The concept of associative rule mining was first introduced by Agarwal et al. [2] for mining relationships in large-scale databases. The key motive behind this concept is discovering frequently occurring keywords or associates in large datasets, and it has been shown to be one of the most useful techniques in the field of data mining [19,37]. Generally, associative rule mining algorithms are classified into Apriori and FP-Growth, which are widely used for mining associations on large data items [5,35]. According to Agarwal et al. [2], the metrics of the Apriori algorithm include support, confidence and lift, which were first used in market basket analysis. The support indicates how frequently items occur together, the confidence measures how often one item appears in transactions that contain the other, and the lift indicates how likely two or more items are to occur together. The Apriori algorithm uses rules to express probabilistic relationships between itemsets. It is worth noting that the confidence and support are measures of interestingness, which are intended for selecting and ranking patterns in data [2]. Since its creation, the Apriori algorithm has been widely used in other studies, such as fraud detection, medical applications, web log analysis, etc. [13,20,28]. Figure 1 shows a list of transactions with the Apriori rule evaluation. If the rule had a lift of 1, then {d,f} would be independent. If the lift is >1, then {d,f} are dependent on each other. If the lift is 0, then {d,f} never occur together.
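To make the three metrics concrete, the following is a minimal Python sketch that computes support, confidence and lift for a candidate rule such as {d} → {f}; the toy transactions and the item labels are invented for illustration and are not taken from the study's flight data.

# Toy transactions; item labels are hypothetical, not from the flight dataset.
transactions = [
    {"a", "d", "f"},
    {"d", "f"},
    {"a", "c"},
    {"d", "f", "g"},
    {"c", "g"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"d"}, {"f"}
supp_both = support(antecedent | consequent)   # support of {d, f}
confidence = supp_both / support(antecedent)   # how often f appears when d does
lift = confidence / support(consequent)        # >1 dependent, =1 independent

print(f"support={supp_both:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")

For these toy transactions the rule {d} → {f} has support 0.6, confidence 1.0 and lift about 1.67, i.e. the two items are positively associated.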
2.2 Dempster's Rule of Combination
Let $m_1^{\Omega}$ and $m_2^{\Omega}$ be two bba's modeling two distinct sources of information defined on the same frame of discernment $\Omega$.
The Dempster combination rule is introduced in [5], denoted by $\oplus$ and defined as:

$m_1^{\Omega} \oplus m_2^{\Omega}(C) = \begin{cases} \dfrac{\sum_{A \cap B = C} m_1^{\Omega}(A)\, m_2^{\Omega}(B)}{1 - \sum_{A \cap B = \emptyset} m_1^{\Omega}(A)\, m_2^{\Omega}(B)} & \text{if } C \neq \emptyset,\ \forall C \subseteq \Omega, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$
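A minimal Python sketch of Eq. (2) follows, representing a bba as a dictionary from focal elements (frozensets over the frame) to masses; the example frame and mass values are illustrative only and are not taken from the paper.

from itertools import product

def dempster_combine(m1, m2):
    """Combine two bba's given as {frozenset: mass} over the same frame (Eq. 2)."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + ma * mb
        else:
            conflict += ma * mb          # mass sent to the empty set
    return {c: v / (1.0 - conflict) for c, v in combined.items()}

# Example on a two-element frame {S, notS}; values are invented.
m1 = {frozenset({"S"}): 0.6, frozenset({"S", "notS"}): 0.4}
m2 = {frozenset({"S"}): 0.5, frozenset({"notS"}): 0.2, frozenset({"S", "notS"}): 0.3}
print(dempster_combine(m1, m2))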
2.3 Vacuous Extension
Frequently, we need to aggregate two bba's $m_1^{\Omega_1}$ and $m_2^{\Omega_2}$ that have different frames of discernment. Thus, we rely on the vacuous extension, which extends the frames of discernment $\Omega_1$ and $\Omega_2$, corresponding to the mass functions $m_1^{\Omega_1}$ and $m_2^{\Omega_2}$, to the product space $\Omega = \Omega_1 \times \Omega_2$. The vacuous extension operation is denoted by $\uparrow$ and defined such that:

$m^{\Omega_1 \uparrow \Omega_1 \times \Omega_2}(B) = m^{\Omega_1}(A) \quad \text{if } B = A \times \Omega_2 \qquad (3)$

where $A \subseteq \Omega_1$, $B \subseteq \Omega_1 \times \Omega_2$. It transfers each mass of $A$ to its cylindrical extension $B$ on $\Omega_1 \times \Omega_2$.

2.4 Decision Process
The belief function framework proposes various solutions to make decisions. Within the Transferable Belief Model (TBM) [19], the decision process is performed at the pignistic level, where bba's are transformed into pignistic probabilities, denoted by BetP and defined as:

$BetP(B) = \sum_{A \subseteq \Omega} \frac{|A \cap B|}{|A|}\, \frac{m^{\Omega}(A)}{1 - m^{\Omega}(\emptyset)} \quad \forall B \in \Omega \qquad (4)$
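Continuing the same illustrative dictionary representation as above, the following sketch computes Eq. (4) for the singletons of the frame; the bba values are hypothetical.

def betp(m, singletons):
    """Pignistic probability of each singleton of the frame (Eq. 4)."""
    empty_mass = m.get(frozenset(), 0.0)
    out = {}
    for w in singletons:
        out[w] = sum(
            (len({w} & a) / len(a)) * mass / (1.0 - empty_mass)
            for a, mass in m.items() if a          # skip the empty set
        )
    return out

m = {frozenset({"S"}): 0.55, frozenset({"notS"}): 0.15, frozenset({"S", "notS"}): 0.30}
print(betp(m, ["S", "notS"]))    # e.g. BetP(S) = 0.55 + 0.30/2 = 0.70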
3
Evidential Spammers and Group Spammers Detection (ESGSD) Method
In this section, we elucidate a novel hybrid method named Evidential Spammers and Group Spammers Detection (ESGSD), which aims to classify reviewers into spammers or genuine ones while taking into account both the group spammer and single spammer detection aspects. Our proposal is composed of three parts. In the first one, we rely on the spammers' indicators extracted from each reviewer's history and used as features in the Evidential K-Nearest Neighbors classifier (EK-NN) [6] to model the reviewer spamicity according to the single spammer indicators; this part is inspired from our contribution in [3]. In the second part, we propose to model the reviewer spamicity while taking into account the group spammers' indicators, which are modeled based on the candidate groups and used as features in the EK-NN classifier, which takes into account the pairwise group similarity to define the distance when choosing the K nearest neighbors [4]. Once the reviewer spamicity based on the spammer
indicators and that based on the group spammer indicators are represented as two mass functions, we combine them, in the third part, in order to make a more suitable decision based on both aspects. We detail these three parts in depth below. In the following, a reviewer is denoted by Ri, where i = 1, . . . , n is the id of the corresponding reviewer.

3.1 Modeling the Reviewer Spamicity Using the Spammer Detection Indicators
In this part, we propose to model the reviewer spamicity through a bba, relying on the spammer's indicators extracted from each reviewer's behavior based on its historical data. We handle the problem as a binary classification one in order to model each reviewer's spamicity under the frame of discernment $\Omega_S = \{S, \bar{S}\}$, where $S$ represents the class of the reviewers considered as spammers and $\bar{S}$ is the class of the innocent reviewers.

3.1.1 Step 1: Pre-processing Phase
As is well known in the fake review field, the attitudes and behaviors of the reviewer are considered the most important points in the detection process. These behaviors may be extracted from the historical data of each reviewer. In ESGSD, we propose to use the spammers' indicators as features to train our algorithm. Thus, we rely on the nine significant indicators used in our method in [3]. Four features take values in the interval [0, 1], where values close to 1 indicate a high spamicity degree. These features are Content Similarity (CS), Maximum Number of Reviews (MNR), Reviewing Burstiness (BST) and Ratio of First Reviews (RFR). Moreover, we also rely on five other binary features, where the value 1 indicates a spamming behavior and the value 0 a non-spamming behavior. These are named: Duplicate/Near Duplicate Reviews (DUP), Extreme Rating (EXT), Rating Deviation (RD), Early Time Frame (ETF) and Rating Abuse (RA). All the definitions and calculation details are presented in our previous work [3].

3.1.2 Step 2: EK-NN Application
After extracting the spammers' indicators, we propose to use them as features to train the EK-NN [6] in order to model the reviewer spamicity while taking into account the uncertain aspect. We apply the EK-NN classifier, in which we initialize the parameters, measure the distance between each reviewer R and the target one Ri using d(R, Ri), and select the K most similar neighbors of each target reviewer. After that, we generate bba's for each reviewer and combine them through the Dempster combination rule. The obtained bba represents the reviewer spamicity while taking into account the uncertain aspect and the spammers' indicators. It is obtained as follows: $m_R^{\Omega_S} = m_{R,R_1}^{\Omega_S} \oplus m_{R,R_2}^{\Omega_S} \oplus \cdots \oplus m_{R,R_K}^{\Omega_S}$.
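The following is a simplified sketch of how evidence from the K nearest neighbors can be turned into one bba per neighbor and then combined; it reuses the dempster_combine function sketched in Sect. 2.2, and the discounting parameters, distances and labels are illustrative assumptions rather than the authors' implementation.

import math

def eknn_bba(neighbors, frame=("S", "notS"), alpha=0.95, gamma=1.0):
    """Build one bba per neighbor from (distance, label) pairs and combine them.
    neighbors: list of (d(R, R_j), label) for the K nearest neighbors of reviewer R."""
    bba = {frozenset(frame): 1.0}                    # start from the vacuous bba
    for dist, label in neighbors:
        w = alpha * math.exp(-gamma * dist ** 2)     # evidence strength of this neighbor
        m_j = {frozenset({label}): w, frozenset(frame): 1.0 - w}
        bba = dempster_combine(bba, m_j)             # Dempster rule from Sect. 2.2
    return bba

# Three illustrative neighbors of a target reviewer R (K = 3).
print(eknn_bba([(0.2, "S"), (0.5, "S"), (0.9, "notS")]))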
3.2 Modeling the Reviewer Spamicity Using Group Spammer Detection Indicators
3.2.1 Step 1: Pre-processing Phase
Group spammers usually attack brands together by posting multiple reviews in order to promote or demote target products. Thus, in order to build candidate spammer groups, we use frequent pattern mining, which catches spammers working together on multiple products. After that, we enumerate the group spammers' indicators, which can capture the behaviors of the candidate spammers and help find out whether these groups behave strangely. In order to construct sufficient groups for evaluation from the data, we use Frequent Itemset Mining (FIM). Since we aim to focus on the worst spamming activities in our dataset, we apply Maximal Frequent Itemset Mining (MFIM) to discover groups of maximal size. The different group spammers' indicators used in this part are Time Window (TW), Group Deviation (GD), Group Content Similarity (GCS), Member Content Similarity (MCS), Early Time Frame (ETF), Ratio of Group Size (RGS), Group Size (GS) and Support Count (SC) [4].

3.2.2 Step 2: EK-NN Application
After applying the FIM algorithm with fixed parameters, we find that the suspect clusters turn out to be very similar to each other in terms of members, examined products and evaluations. These small clusters can be favorable for detecting groups in novel ways. Thus, we apply the EK-NN based method to detect groups, relying on the similarities between such groups. We model our detection problem as a binary classification problem in order to assign each reviewer R to a class of $\Omega_{GS} = \{MGS, \overline{MGS}\}$, where $MGS$ represents the class of the members of group spammers and $\overline{MGS}$ is the class of the reviewers who do not belong to a group spammer (innocent ones). The idea is that, given a set of groups, the reviewers who belong to "similar" groups may be more likely to have the same class labels. Thus, the class label of a reviewer R can be determined commonly by a set of K reviewers Ri who belong to groups most "similar" to the groups R belongs to. After applying the EK-NN classifier, the whole bba that models the evidence of the K nearest neighbors regarding the class of the reviewer is measured as: $m_R^{\Omega_{GS}} = m_{R,R_1}^{\Omega_{GS}} \oplus m_{R,R_2}^{\Omega_{GS}} \oplus \cdots \oplus m_{R,R_K}^{\Omega_{GS}}$. It is considered as the mass function which represents the reviewer spamicity relying on the group spammer indicators. The details are given in our previous work [4].
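As an illustration of the candidate-group construction in Step 1 above, the following sketch mines frequent and then maximal itemsets of reviewer ids with the mlxtend library; the library choice, the support threshold and the reviewer ids are assumptions made for the example, since the paper does not specify its tooling.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# One transaction per product: the set of reviewer ids who reviewed it (toy data).
transactions = [
    ["r1", "r2", "r3"],
    ["r1", "r2", "r3", "r7"],
    ["r2", "r5"],
    ["r1", "r2", "r3"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
frequent = apriori(onehot, min_support=0.5, use_colnames=True)

# Keep only maximal frequent itemsets (no frequent superset) as candidate spammer groups.
itemsets = list(frequent["itemsets"])
maximal = frequent[[not any(s < t for t in itemsets) for s in itemsets]]
print(maximal)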
3.3 Distinguishing Between Spammers and Genuine Reviewers
In this part, we aim to combine the two bba's that model the reviewer spamicity, one taking into account the individual spammer behaviors and the other taking into account the group spammer indicators, in order to construct a global bba representing the reviewer spamicity. For this, we must model the bba's under a global frame and transfer them to the decision frame to make the final decision. These steps are detailed below.
3.3.1 Modeling the Reviewer Spamicity Based on Both the Spammers and the Group Spammers Aspects
– Define $\Omega_{GSS}$ as the global frame of discernment relative to the reviewer spamicity according to the group spammers and the spammers indicators. It is the cross product of the two frames $\Omega_{GS}$ and $\Omega_S$, denoted by $\Omega_{GSS} = \Omega_{GS} \times \Omega_S$.
– Extend the two reviewer spamicity bba's, respectively $m_R^{\Omega_{GS}}$ and $m_R^{\Omega_S}$, to the global frame of discernment $\Omega_{GSS}$ to get new bba's $m_R^{\Omega_{GS} \uparrow \Omega_{GSS}}$ and $m_R^{\Omega_S \uparrow \Omega_{GSS}}$ using the vacuous extension (Eq. 3).
– Combine the different extended bba's using the Dempster rule of combination: $m_R^{\Omega_{GSS}} = m_R^{\Omega_{GS} \uparrow \Omega_{GSS}} \oplus m_R^{\Omega_S \uparrow \Omega_{GSS}}$.
Finally, $m_R^{\Omega_{GSS}}$ represents the reviewer spamicity relying on both the group spammers and the spammers indicators.
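A small sketch of this step is given below, reusing the dictionary representation and the dempster_combine function sketched in Sect. 2.2; the hypothesis labels and mass values are illustrative placeholders.

def vacuous_extension(m, other_frame):
    """Extend a bba on frame Omega1 to Omega1 x Omega2 (Eq. 3): A -> A x Omega2."""
    return {
        frozenset((a, b) for a in focal for b in other_frame): mass
        for focal, mass in m.items()
    }

omega_gs = ("MGS", "notMGS")
omega_s = ("S", "notS")

m_gs = {frozenset({"MGS"}): 0.7, frozenset(omega_gs): 0.3}   # group-spammer evidence
m_s = {frozenset({"S"}): 0.6, frozenset(omega_s): 0.4}       # single-spammer evidence

m_gs_ext = vacuous_extension(m_gs, omega_s)                  # bba on Omega_GS x Omega_S
m_s_ext = {frozenset((a, b) for b in focal for a in omega_gs): mass
           for focal, mass in m_s.items()}                   # extension of the second bba
m_gss = dempster_combine(m_gs_ext, m_s_ext)                  # global bba on the product frame
print(m_gss)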
3.3.2 The Reviewer Spamicity Transfer
The next step is to transfer the combined $m_R^{\Omega_{GSS}}$ defined on the product space $\Omega_{GSS}$ to the frame of discernment $\Theta_D = \{RS, \overline{RS}\}$, where $RS$ represents the class of the reviewers confirmed as spammers and $\overline{RS}$ is the class of the genuine reviewers, in order to make the final decision by classifying the reviewer as a spammer or not. For that, a multi-valued operation [5], denoted by $\tau$, is applied. The function $\tau: \Omega_{GSS} \rightarrow 2^{\Theta_D}$ groups the event couples as follows:
– Masses of event couples with at least one element in $\{MGS, S\}$ and none in $\{\overline{MGS}, \bar{S}\}$ are transferred to Reviewer Spammer $RS \subseteq \Theta_D$ as:
$m^{\tau}(\{RS\}) = \sum_{\tau(SR) = RS} m_R^{\Omega_{GSS}}(SR), \quad SR \subseteq \Omega_{GSS} \qquad (5)$
– Masses of event couples with at least one element in $\{\overline{MGS}, \bar{S}\}$ and none in $\{MGS, S\}$ are transferred to Reviewer not Spammer $\overline{RS} \subseteq \Theta_D$ as:
$m^{\tau}(\{\overline{RS}\}) = \sum_{\tau(SR) = \overline{RS}} m_R^{\Omega_{GSS}}(SR), \quad SR \subseteq \Omega_{GSS} \qquad (6)$
– Masses of the remaining event couples, i.e. those with one element in $\{MGS, S\}$ and one in $\{\overline{MGS}, \bar{S}\}$, are transferred to $\Theta_D$ as:
$m^{\tau}(\Theta_D) = \sum_{\tau(SR) = \Theta_D} m_R^{\Omega_{GSS}}(SR), \quad SR \subseteq \Omega_{GSS} \qquad (7)$
3.3.3 Decision Making
Once all the bba's modeling the whole reviewer spamicity have been transferred to the decision frame of discernment $\Theta_D$, we can differentiate between the spammers and the genuine reviewers. Thus, we apply the pignistic probability BetP using (Eq. 4). We select the hypothesis with the greater value of BetP and consider it as the final decision.
4 Experimentation and Results
Data Description
We use two real labeled datasets collected from Yelp.com in order to evaluate the effectiveness of our ESGSD method. These datasets are considered among the most complete. They are labeled through the Yelp filter, which has been used in different related works [2,3,10,14,16] as ground truth owing to its effective detection algorithm based on both expert judgment and several behavioral features. Table 1 presents the datasets' content, where the percentages indicate the filtered fake reviews (not recommended) as well as the spammer reviewers.

Table 1. Datasets description
Datasets | Reviews (filtered %) | Reviewers (Spammer %) | Services (Restaurant or hotel)
YelpZip  | 608,598 (13.22%)     | 260,277 (23.91%)      | 5,044
YelpNYC  | 359,052 (10.27%)     | 160,225 (17.79%)      | 923
In order to evaluate the ESGSD method, we rely on three evaluation criteria: accuracy, precision and recall.

Experimental Results
In our datasets extracted from Yelp.com, we find that the number of genuine reviews is much larger than the number of fraudulent reviews, which can lead to over-fitting. To avoid this problem, we extract a balanced data set (50% spam reviews and 50% trustful ones). After that, we divide the datasets into a 70% training set and a 30% testing set. In addition, we average the values of 10 trials using the 10-fold cross-validation technique to obtain the final estimation of each evaluation criterion. In the first part, we extract the spammers' indicators from the historical data of each reviewer (in our two datasets) to create our features in order to apply the EK-NN algorithm; we choose k = 3. In the second part, we apply frequent itemset mining (FIM), where I is the set of all reviewer ids in our two datasets. Each transaction is the set of the reviewer ids who have reviewed a particular hotel or restaurant. Thus, each hotel or restaurant generates a transaction of reviewer ids. By mining frequent itemsets, we find groups of reviewers who have reviewed multiple restaurants or hotels together. Then, we rely on Maximal Frequent Itemset Mining (MFIM) to spot groups of maximal size in order to focus on the worst spamming activities. In the YelpZip dataset, we found 74,364 candidate groups, and 50,050 candidate groups for the YelpNYC dataset. Since there are no methods which deal with both the spammer and the group spammer aspects, we propose to compare our ESGSD method with two methods from the group spammer detection field which are based on FIM, namely: Detecting Group Review Spam (DGRS) proposed in [12] and Ranking Group
Spam algorithm (GSRank) introduced in [13]. We also compare with two methods from the spammer detection field: the SpEagle framework proposed by Rayana and Akoglu [16] and the method proposed by Fontanarava et al. [10], which we denote FAFR. We add to the comparison study our two previous methods: Evidential Group Spammers Detection (EGSD) introduced in [4] and Evidential Spammer Detection (ESD) proposed in [3]. The evaluation results are presented in Table 2. We observe that our ESGSD method consistently outperforms almost all compared baselines on most evaluation criteria. We obtain the best results in terms of accuracy and precision, and competitive ones in terms of recall, on the YelpZip dataset. For the YelpNYC dataset, we get the best results for all three considered criteria. We record at best an accuracy of 86.9% with the YelpZip dataset. We note an improvement which almost reaches 2% compared with the ESD method and 3% compared with the EGSD method. The precision criterion reaches at best 87%. The performance improvements recorded prove the importance of using both the group spammers' and the spammers' indicators while taking into account the uncertain aspects, which allowed us to detect more particular cases.

Table 2. Comparative results
Evaluation criteria | YelpZip                           | YelpNYC
                    | Precision | Recall | Accuracy     | Precision | Recall | Accuracy
SpEagle [16]        | 75.3%     | 65.2%  | 79%          | 73.5%     | 67.3%  | 76.9%
FAFR [10]           | 77.6%     | 86.1%  | 80.6%        | 74.8%     | 85%    | 81.6%
DGRS [12]           | 70%       | 71%    | 65%          | 62%       | 61.3%  | 60%
GSRank [13]         | 76%       | 74%    | 78%          | 76.5%     | 77.2%  | 74%
ESD [3]             | 85%       | 86%    | 84%          | 86%       | 83.6%  | 85%
EGSD [4]            | 83.5%     | 86%    | 85%          | 83.55%    | 85%    | 84.3%
ESGSD               | 86%       | 85%    | 86.9%        | 87%       | 85.2%  | 85.9%

5 Conclusion
In this paper, we tackle for the first time both the group spammer and the spammer review detection problems together in order to improve the detection quality. This detection allows the different review systems to block suspicious reviewers in order to stop the emergence of fake reviews. Our ESGSD method succeeds in distinguishing between spammers and innocent reviewers even in special cases. It proves its performance and effectiveness against various state-of-the-art approaches from the spammer and the group spammer detection fields. As future work, we aim to create a platform that deals with all aspects of the fake review detection problem in order to detect different types of spammers and deceptive reviews.
References
1. Ben Khalifa, M., Elouedi, Z., Lefèvre, E.: Fake reviews detection based on both the review and the reviewer features under belief function theory. In: Proceedings of the 16th International Conference Applied Computing (AC 2019), pp. 123–130 (2019)
2. Ben Khalifa, M., Elouedi, Z., Lefèvre, E.: Spammers detection based on reviewers' behaviors under belief function theory. In: Wotawa, F., Friedrich, G., Pill, I., Koitz-Hristov, R., Ali, M. (eds.) Proceedings of the 32nd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE 2019), pp. 642–653. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22999-3_55
3. Ben Khalifa, M., Elouedi, Z., Lefèvre, E.: An evidential spammer detection based on the suspicious behaviors' indicators. In: Proceedings of the International Multi-Conference on "Organization of Knowledge and Advanced Technologies" (OCTA), Tunis, Tunisia, pp. 1–8 (2020)
4. Ben Khalifa, M., Elouedi, Z., Lefèvre, E.: Evidential group spammers detection. In: Lesot, M.-J., et al. (eds.) IPMU 2020. CCIS, vol. 1238, pp. 341–353. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50143-3_26
5. Dempster, A.P.: Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Stat. 38, 325–339 (1967)
6. Denoeux, T.: A K-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans. Syst. Man Cybern. 25(5), 804–813 (1995)
7. Elmogy, A., Usman, T., Atef, I., Ammar, M.: Fake reviews detection using supervised machine learning. Int. J. Adv. Comput. Sci. Appl. 12(1), 601–606 (2021)
8. Fayazbakhsh, S., Sinha, J.: Review spam detection: a network-based approach. Final Project Report: CSE 590 (Data Mining and Networks) (2012)
9. Fei, G., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M., Ghosh, R.: Exploiting burstiness in reviews for review spammer detection. In: Proceedings of the Seventh International Conference on Weblogs and Social Media, ICWSM, vol. 13, pp. 175–184 (2013)
10. Fontanarava, J., Pasi, G., Viviani, M.: Feature analysis for fake review detection through supervised classification. In: Proceedings of the International Conference on Data Science and Advanced Analytics, pp. 658–666 (2017)
11. Heydari, A., Tavakoli, M., Ismail, Z., Salim, N.: Leveraging quality metrics in voting model based thread retrieval. Int. J. Comput. Electrical Autom. Control Inf. Eng. 10(1), 117–123 (2016)
12. Mukherjee, A., Liu, B., Wang, J., Glance, N., Jindal, N.: Detecting group review spam. In: Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, ACM 978-1-4503-0637-9/11/03 (2011)
13. Mukherjee, A., Liu, B., Glance, N.: Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st International Conference on World Wide Web, ACM, New York, pp. 191–200 (2012)
14. Mukherjee, A., Venkataraman, V., Liu, B., Glance, N.: What yelp fake review filter might be doing. In: Proceedings of the Seventh International Conference on Weblogs and Social Media, ICWSM, pp. 409–418 (2013)
15. Savage, D., Zhang, X., Yu, X., Chou, P., Wang, Q.: Detection of opinion spam based on anomalous rating deviation. Expert Syst. Appl. 42(22), 8650–8657 (2015)
16. Rayana, S., Akoglu, L.: Collective opinion spam detection: bridging review networks and metadata. In: Proceedings of the 21st International Conference on Knowledge Discovery and Data Mining, ACM SIGKDD, pp. 985–994 (2015)
17. Shafer, G.: A Mathematical Theory of Evidence, vol. 1. Princeton University Press, Princeton (1976)
18. Smets, P.: The transferable belief model for expert judgement and reliability problem. Reliabil. Eng. Syst. Saf. 38, 59–66 (1992)
19. Smets, P.: The transferable belief model for quantified belief representation. In: Smets, P. (ed.) Quantified Representation of Uncertainty and Imprecision, pp. 267–301. Springer, Dordrecht (1998). https://doi.org/10.1007/978-94-017-1735-9_9
20. Wang, G., Xie, S., Liu, B., Yu, P.S.: Review graph based online store review spammer detection. In: Proceedings of the 11th International Conference on Data Mining, ICDM, pp. 1242–1247 (2011)
21. Wang, Z., Hou, T., Song, D., Li, Z., Kong, T.: Detecting review spammer groups via bipartite graph projection. Comput. J. 59(6), 861–874 (2016)
22. Wang, Z., Gu, S., Xu, X.: GSLDA: LDA-based group spamming detection in product reviews. Appl. Intell. 48(9), 3094–3107 (2018). https://doi.org/10.1007/s10489-018-1142-1
NLP for Product Safety Risk Assessment: Towards Consistency Evaluations of Human Expert Panels Michael Hellwig1(B) , Steffen Finck1 , Thomas Mootz2 , Andreas Ehe2 , and Florian Rein2 1
Josef Ressel Center for Robust Decision Making, Research Center Business Informatics, FH Vorarlberg University of Applied Sciences, Dornbirn, Austria [email protected] 2 Hardware Mechanics Maintenance Service and Development, OMICRON Electronics GmbH, Klaus, Austria
Abstract. Recent developments in the area of Natural Language Processing (NLP) increasingly allow for the extension of such techniques to hitherto unidentified areas of application. This paper deals with the application of state-of-the-art NLP techniques to the domain of Product Safety Risk Assessment (PSRA). PSRA is concerned with the quantification of the risks a user is exposed to during product use. The use case arises from an important process of maintaining due diligence towards the customers of the company OMICRON electronics GmbH. The paper proposes an approach to evaluate the consistency of human-made risk assessments that are proposed by potentially changing expert panels. Along the stages of this NLP-based approach, multiple insights into the PSRA process allow for an improved understanding of the related risk distribution within the product portfolio of the company. The findings aim at making the current process more transparent as well as at automating repetitive tasks. The results of this paper can be regarded as a first step to support domain experts in the risk assessment process. Keywords: Product safety · Applied data analysis · Unstructured data · Natural Language Processing · Risk assessment
1
Introduction
Natural Language Processing (NLP) encompasses several language and speech-related tasks, like information extraction, translation, or text generation [1,9,10]. Solution strategies are based on statistical approaches ranging from determining word frequencies to using Machine Learning (ML) for model building. Due to the quality of recent NLP methods, applications to industrial problems have seen a
rapid rise in recent years [6,14], e.g. for virtual assistance or conformity checks. A central idea in NLP is the embedding, i.e. the transformation of text into numerical representations. Only the latter allows one to perform statistical analyses or to build ML applications. To avoid sparse and high-dimensional discrete representations that slow down the performance in NLP tasks, embeddings were developed, where similar or related words are placed close to each other. The resulting representations are of lower dimensionality and in a continuous space. Approaches like word2vec [11,12] use neural networks to learn a suitable embedding. Thus, they require a (large) corpus of data for the training of the network. More recent approaches include word attention and the transformer architecture for (deep) neural networks [7,18]. Attention allows for obtaining embeddings that make use of the semantic context, while the transformer allows for a faster (i.e., parallel) computation of the data. The required training is computationally expensive since the neural networks can have millions of parameters. Yet, several such networks, particularly those based on Bidirectional Encoder Representations from Transformers (BERT) [3], achieved human-level abilities in NLP-related benchmarks. Pre-trained versions of this network are freely available [5]. The use case considered in this paper deals with the analysis of the current Product Safety Risk Assessment (PSRA) process at OMICRON electronics GmbH (OMICRON). In this context, product safety refers to the manufacturing, distribution, and sale of products that are either potentially (or inherently) unsafe for consumer use from various perspectives. Within a production system, a company is responsible for ensuring safe product use by the end customer (user) in accordance with its duty of care. As required by law, sellers and manufacturers are expected to provide appropriate instructions and warnings regarding the use of a product. To this end, international standards exist to ensure guidance on how to integrate safety into the product (e.g., ISO 10377). Such standards intend to provide guidance on hazard identification, reliable risk assessment, and remedial actions to reduce potential product risks. They serve as benchmarks for the elimination of hazards that would be unacceptable during product use and should be implemented consistently. To ensure consistency, checks and adjustments of PSRAs across the product range have to be performed regularly. Consistency checks are complex for a human to perform because of small variations in the data, e.g. dissimilar risk descriptions or level of detail. Given the structure of the available use case data, NLP provides reasonable tools for the identification of similarities and dissimilarities in the historical risk factor assessment [13]. The computerized identification of similar risk factor groups over the product range allows for accelerated consistency checks and for the ability to partially automate risk assessment for new products. This results in the reduction of assessment deviations and strengthens the compliance with the company's obligations of due diligence regarding product safety. The remainder of this paper is organized as follows: Sect. 2 outlines the structure of the use case under consideration. Two NLP-based approaches for clustering similar product safety risks within the company's product range are presented in Sect. 3. Section 4 discusses the results and their potential impact on the PSRA process.
Finally, Sect. 5 provides an outlook on future work.

The work has been funded by the Austrian Research Promotion Agency via the "Austrian Competence Center for Digital Production". Further, we gratefully acknowledge the financial support by the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development and the Christian Doppler Research Association.
2 Data Description
At OMICRON, the PSRA process is based on the manual identification and evaluation of individual risk factors by expert groups. As a result, a consistent risk assessment is very time-consuming and resource-intensive. Further, it may even depend on the composition of the expert group, as different engineers might assess similar risks differently in certain situations. Such subjective risk evaluations need to be largely excluded in a consistent risk management process. The investigated data set contains 32 PSRA files of individual devices produced at OMICRON. For each device, risks for several application scenarios are provided. Each scenario consists of several risk descriptions and associated ratings. Since the textual risk descriptions are unstructured, the mapping to the ratings is subjective and requires regular consistency checks. In consultation with the domain experts, the focus was restricted to particularly relevant parts of the data sets. These include the device under consideration, the textual description of the risk scenario (five columns), and the quantitative risk assessment (four columns of numerical values). The circumstance that only a relatively small data set is available makes it necessary to use a pre-trained model. The data are processed in a Python 3.7 environment. In particular, the Python framework Sentence Transformers [15], which represents a state-of-the-art tool for obtaining sentence and text embeddings, was utilized. Offering a large set of pre-trained BERT models, it is able to transform textual expressions into a high-dimensional vector space. The vector space representation includes information on input order and self-attention. It depends on the architecture of the selected BERT model. We used 'distilbert-base-nli-mean-tokens' [5] in this study. This way, a contextual embedding of the PSRA entries in a high-dimensional vector space is obtained. Such embeddings can then be compared, e.g. by cosine-similarity [17], to find sufficiently similar risk descriptions.
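A minimal sketch of this embedding-and-comparison step is shown below; it uses the named pre-trained model via the Sentence Transformers library, while the example sentences are invented and not taken from the confidential data set.

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("distilbert-base-nli-mean-tokens")

descriptions = [
    "electric shock when touching the terminal",    # invented examples
    "risk of electrical shock at the connector",
    "sharp edge may cut the user's finger",
]
emb = model.encode(descriptions)                     # one 768-dimensional vector per entry

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb[0], emb[1]))    # similar risks -> high similarity
print(cosine(emb[0], emb[2]))    # unrelated risks -> lower similarity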
3
Risk Clustering Approach
For the processing of the PSRA use case, different NLP techniques are used in different steps. In fact, two distinct approaches for grouping similar risk scenario descriptions within the data set are presented. The remainder of this section describes the steps involved in clustering similar risk scenarios. The comprehensive workflow of these approaches is illustrated in Fig. 1(a) and (b). A comparison of both approaches as well as findings with respect to company-relevant questions will be provided in the following section. In the first step, the raw data are reviewed and pre-processed. This step includes the identification of the relevant spreadsheets and the cleaning of the Excel sheets. Cleaning involves the identification and substitution of missing values. Further, the text entries are lowercase, and uninformative stop words are removed [16]. The result of this pre-processing step is a data frame that consists of 10 columns and 666 rows. The number of rows corresponds to the number of data samples under investigation. Each sample refers to an individual
risk assessment obtained from the PSRA for a specific product. The columns cover textual information about the data source (Device), the risk characteristics (Function, Failure, Cause, Effect, Control), as well as numerical information about the associated risk composition (Severity, Probability, Detectability). The latter values determine the so-called Risk Priority Number (RPN), which quantifies the hazardous potential of the product device as judged by the panel of domain experts. The risk categories subdivide potential product risk scenarios with respect to the device function that carries the risk (Function), the malfunction (Failure), the expected reason for failure (Cause), the potential failure impact (Effect), and the implemented remedial actions (Control). Assuming a consistent PSRA, similar risks described by such a characterization should entail comparable risk priority numbers. To check this consistency, the risk samples are to be grouped according to commonalities in their textual risk category descriptions. However, these descriptions are very diverse as they are non-standardized and originate from different expert group compositions. Some entries consist of single words while others include an extensive description of the situation. Spelling errors can also be observed. This demands intelligent processing based on modern NLP methods. Two approaches for the task described above are presented in the following subsections. After having generated groups of similar risk descriptions using one of the approaches, features are determined within all clusters to facilitate the evaluation of the clustering. Such features include the number of entries in a cluster, the count of unassigned entries, the number of unique PSRAs that contribute to each cluster, or the most representative entry of each cluster, etc. Each cluster is then enriched with the numerical risk values of its entries. For the consistency analysis, the distributions of the risk indicators are investigated and, for example, visualized in the form of a box plot for each cluster, as sketched below. Box plot illustrations of selected clusters must be omitted for confidentiality reasons. Similar location and low dispersion in the distributions indicate a high consistency within the risk assessment. From the company's point of view, this is especially interesting if the rating was carried out across different PSRAs. A wide dispersion, then again, prompts the experts to look more closely at the corresponding cluster and investigate the cause of the deviations. At this point, the detailed analysis of the cluster quality must be left to the domain experts.
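The consistency evaluation on the numerical risk values can be sketched as follows, assuming a pandas data frame with one row per risk entry and an already assigned cluster label; the column names and values are placeholders chosen for illustration, not the company's actual spreadsheet headers.

import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data: one row per risk entry with its cluster label and RPN value.
df = pd.DataFrame({
    "cluster": [1, 1, 1, 2, 2, 0],   # 0 marks the leftover class of unassigned entries
    "RPN":     [48, 54, 50, 120, 45, 30],
})

# Per-cluster summary of the risk priority number: similar location and low spread
# indicate a consistent assessment across PSRAs.
print(df[df["cluster"] > 0].groupby("cluster")["RPN"].describe())

# Box plot of the RPN distribution for each useful cluster.
df[df["cluster"] > 0].boxplot(column="RPN", by="cluster")
plt.show()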
3.1 Keyword Extraction
After a pre-processing step of data cleansing and risk category selection, the proposed approach extracts a maximum of 0 < k ≤ 5 keywords for each risk category. Note that omitting this keyword extraction step and working directly with the raw entries resulted in extreme differences in the length of the observed character sequences. This led to significantly worse results in experiments. The automated keyword generation step is applied to each category. It makes use of Tokenization [16], i.e. cutting running text into pieces called tokens. After that, each entry of a risk category is decomposed into specified n-grams, i.e. a
sequence of n consecutive tokens from a given sample of text [16]. The n-grams, as well as the complete tokenized entry (or sentence), are then transformed into a vector space representation by use of Sentence Transformer. The next task is concerned with identifying those k* ≤ k n-grams that turn out most diverse while being most representative for the sentence in terms of their (low) cosine-similarity distance. This balance can be tuned using a single parameter by ranking with respect to the Maximal Marginal Relevance of the keywords [4]. The transformation step combines these keywords in a key phrase. This key phrase comprises a maximum of 5k* keywords that turn out to be sufficiently diverse and at the same time representative for the risk characterization. Consisting only of keyword tokens, key phrases can easily be embedded in a high-dimensional vector space by application of Sentence Transformer [15]. Hence, one obtains a set of 666 vectors in a 768-dimensional real vector space that can be analyzed for subgroups of similar vectors by use of clustering algorithms. As mentioned before, the similarity in this vector representation is measured in terms of cosine-similarity. Several clustering algorithms can identify clusters based on this similarity measure [19], each with its individual advantages and drawbacks. Given the data representation, the similarity function, and the uncertainty about the appropriate cluster number, the clustering algorithm HDBScan [2] is selected for this task. Being provided with a single parameter specifying the minimal demanded cluster size, HDBScan determines groups of similar vectors and maps non-assignable points to one "leftover class" with the class label "0". Useful clusters are labeled with ascending numbering starting from "1". Depending on the parameter settings, this approach yields different groupings of the risk portfolio. More details can be found in Sect. 4.
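One way to cluster the key-phrase embeddings with a cosine-based notion of similarity is to run HDBSCAN on a precomputed cosine-distance matrix; the sketch below follows this idea but is an illustrative configuration, not necessarily the authors' exact setup, and the random array merely stands in for the real embeddings.

import numpy as np
import hdbscan
from sklearn.metrics.pairwise import cosine_distances

# embeddings: (n_samples, 768) array of key-phrase vectors from the Sentence Transformer.
embeddings = np.random.rand(666, 768)            # placeholder for the real embeddings

dist = cosine_distances(embeddings).astype(np.float64)
clusterer = hdbscan.HDBSCAN(min_cluster_size=2, metric="precomputed")
labels = clusterer.fit_predict(dist)             # -1 marks non-assignable entries

# Re-map to the labelling convention used in the paper: 0 = leftover class, 1.. = clusters.
labels = np.where(labels < 0, 0, labels + 1)
print(len(set(labels)) - (1 if 0 in labels else 0), "clusters found")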
Fig. 1. (a) Illustration of the consistency check workflow for the PSRA analysis. (b) Illustration of the clustering workflow using the Levenshtein approach. Determination of cluster features and visualization are afterward performed as in (a).
3.2 Levenshtein Distance
As an alternative approach for obtaining clusters of similar PSRA descriptions, a clustering based on the Levenshtein distance [8] is developed. In comparison to approaches based on word embeddings, no transformation of the characters/strings into a numerical space is required. The workflow is given in Fig. 1. After performing the same pre-processing as before, an additional data cleaning step is executed. In this step, data samples where three or more textual descriptions of the risk characteristics are missing (47 out of 666) are removed. Next, the data are augmented with an additional feature. The feature is the concatenation of the risk characteristic descriptions. In case a description is missing, it is replaced with the string "missing characteristic value", where characteristic is replaced with the respective risk characteristic. The features used are strings (i.e., the five risk descriptions and the concatenated descriptions) and allow for pairwise comparisons. To evaluate the agreement between two strings (s1 and s2), a dissimilarity measure D involving the Levenshtein distance is defined as

$D = \frac{|s_j| - |s_i|}{|s_i|} + \min_{k \in \{0, \ldots, |s_j| - |s_i|\}} \frac{LD(s_i,\ s_j[1+k : |s_i|+k])}{|s_i|} \qquad (1)$
Here, si represents the shorter string of s1 and s2, i.e. |si| ≤ |sj|. The first addend is the relative difference in length, which is independent of the Levenshtein distance measure. The second addend is the minimum normalized Levenshtein distance LD(·) for si and all string sequences of length |si| contained in sj. The notation sj[a : b] refers to the string of characters from position a to position b (both included) of sj. As an example, let s1 be "shock" and s2 be "electric shock". This yields the following values: si = s1 since |s1| = 5 < |s2| = 14; the first part of Eq. (1) is 9/5 = 1.8; the Levenshtein distance is determined for the following |s2| − |s1| + 1 = 10 string pairs: ("shock","elect"), ("shock","lectr"), . . ., ("shock","shock"); and the minimum normalized Levenshtein distance is 0, obtained for the last pair. The dissimilarity is used to identify similar data samples and to assign them to clusters. To this end, the pairwise dissimilarities in all features for all data samples are determined. Using a graph as representation, data samples (nodes of the graph) are connected with an edge if in at least nc risk characteristics the dissimilarity is equal to or below a pre-defined threshold, i.e., D ≤ T. Both T and nc need to be set by the end-user. By computing the connected sub-components within this graph, intermediate clusters are identified. The same procedure is applied by considering the concatenated string alone. This provides a second set of intermediate clusters. The final clustering is obtained by the union of nodes in overlapping clusters, i.e., clusters that have at least one common data sample. Note, given that nodes in one sub-component can belong to different sub-components in the other graph, the employed procedure aims at large clusters and ensures that each node appears only once in the final clustering.
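Equation (1) can be sketched directly in Python with a plain dynamic-programming Levenshtein distance; the worked example from the text is reproduced, and the implementation details (not given in the paper) are an illustrative choice.

def levenshtein(a, b):
    # Standard dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def dissimilarity(s1, s2):
    """Dissimilarity D of Eq. (1); si is the shorter of the two strings."""
    si, sj = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    length_term = (len(sj) - len(si)) / len(si)
    best = min(
        levenshtein(si, sj[k:k + len(si)]) / len(si)
        for k in range(len(sj) - len(si) + 1)
    )
    return length_term + best

print(dissimilarity("shock", "electric shock"))   # 1.8 + 0.0, as in the worked example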
4 Discussion
Characteristic features of the risk clusters obtained from both approaches are summarized in Table 1. At this stage, the aim was a simple comparison of the obtained clusters and the identification of groups which need to be verified by human experts. The values provided for the Keyword Extraction approach (i.e., a complete pass of the workflow in Fig. 1) rely on a fixed choice of the number of keywords generated per risk category, the type of n-grams used, as well as on the minimal cluster size considered by HDBScan [2]. For the results in Table 1, we considered the three most representative keywords per risk category as well as a separation of each category into 2- and 3-grams, accordingly. The minimal cluster size allowed in HDBScan is set to 2. Hence, the approach is able to identify pairs of similar risks within a unique cluster. The parameter selection is motivated by extensive experiments with varying parameter settings. These experiments are left out due to space constraints. The impact of varying the parameter choices is discussed below. In particular, by considering more than a single keyword, the key phrases increase in length and get more distinct. Hence, the number of clusters decreases as the count of unassigned unique risk entries increases. With this in mind, the number of unique clusters varied between 42 and 152 depending on the parameter selection. The number of unassigned entries is inversely proportional to the cluster count, i.e. it varies between 427 and 205. However, the average similarities within the generated clusters turn out to be quite constant, with only small reductions for decreasing cluster numbers. The cluster count with more than three risk entries remains quite constant around 30, suggesting that similar clusters can be found regardless of the parameter choice, which supports the utility of our approach. The keywords extracted in this approach do add structure to the data set. An analysis of this structure can be used to facilitate insight into the PSRA process, e.g., answering questions concerning the domain experts' expectations, like which failure is likely causing what effect. This already creates some added value for the partner company. The results offer a new perspective on the risks within the product portfolio. Yet, due to confidentiality, it is not possible to go into more detail on the analysis of these aspects in this paper.

Table 1. Comparison of the clusters found by the approaches presented in Sect. 3.
Categories                    | Keyword extraction | Levenshtein distance
# unique clusters             | 129                | 44
MAX cluster size              | 9                  | 9
# cluster size > 3            | 25                 | 9
MEAN cluster size             | 2.71               | 2.78
STD cluster size              | 1.16               | 1.28
MEDIAN cluster size           | 2                  | 2
# unassigned entries          | 313                | 541
# mutually contained clusters | 17                 | 11
Regarding the Levenshtein distance clustering (Fig. 1), a dissimilarity threshold of T = 0.3 and nc = 4 for the number of satisfied features were selected. The reasons for these selections are taken from some preliminary experiments: grammatical and spelling errors should not pose an obstacle for identifying groups. Also, entries with the same meaning but using more/fewer terms should to a certain extent fall below the threshold. Note, however, the approach does not consider any context or meaning for the descriptions. By using T = 0.3, the number of possible candidates for clustering is about 50% higher compared to the exact similarity (D = 0). Higher values of T result in more data samples being assigned to a cluster. Using nc = 4 for the risk characteristics (i.e., without the concatenated feature) allows for some fuzziness in the data samples. For example, if the description in one of the characteristics is missing, one should still be able to identify candidates for a cluster by neglecting the respective characteristic. Lower values of nc result in more data samples being clustered and in obtaining clusters with a larger number of elements. Both clustering approaches rely on different mechanisms, thus one expects different clusterings. Clusters that are common in both approaches (i.e., containing exactly the same elements) (indirectly) confirm that these data samples form a valid cluster. By constructing a bipartite graph B, where one part of the nodes comes from the keyword clustering and the other nodes from the Levenshtein clustering, ideal common clusters are identified as connected sub-components consisting of two nodes and a single edge. An extension to this common cluster is a graph with three nodes and two edges, meaning that a single cluster of one approach is represented by two clusters in the other approach. In Fig. 2(A) and (B), examples of common clusters are presented. More complex sub-components of B contain several nodes of both partitions and several edges. Typically, this involves the unassigned elements of both approaches. For the clusters given in Table 1, 36.04% of all data samples are unassigned in both approaches and 50% of the edges in B connect unassigned data samples in one approach to clustered data samples in the other approach. This rather high number of unassigned entries is not really surprising given that the data set contains several unique risk descriptions. Using the edges in such a sub-component allows one to identify sets of clusters with possibly similar entries. An example is shown in Fig. 2(C). On top, clusters from the keyword-based approach and, on the bottom, clusters from the Levenshtein approach are shown as circles. Edges, shown as black lines, between the clusters mean that at least one common element is shared by the clusters. The blue circles identify the collection of unassigned elements for each approach. The green circle identifies the root node, i.e., the node the user is currently interested in. Using an iterative process, possible similar clusters with additional data samples can be identified by following the respective edges in the graph. For the connections in Fig. 2(C), the direct connections and their respective connections are shown. An exception is the collection of unassigned elements (blue nodes), which act as a sink and do not introduce new connections. Given that more than 100 clusters
can be proposed by the approaches, this process allows users to extract a subset of clusters for verification and analysis. Of particular interest are root nodes that represent clusters with risk entries across several devices and that are possibly based on the evaluation of different expert groups. Currently, this information is not available. From the results in Table 1, one observes that Keyword Extraction is able to assign more risk entries to clusters without significant changes in the cluster size statistics. Most of these clusters contain two elements. For the larger clusters (# cluster size > 3), the count increases proportionally to the count of the unique clusters. The number of common clusters (# mutually contained clusters) represents the number of clusters fully contained in the clusters of the other approach. For the results, we find 17 clusters from Keyword Extraction completely contained in 14 clusters of the Levenshtein approach. Conversely, only 11 clusters of the Levenshtein approach are completely contained in the clusters of Keyword Extraction. These 11 clusters are identified equally by both approaches.

Fig. 2. Connections between clusters obtained by the two clustering approaches.
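The bipartite comparison graph described above can be sketched with networkx as follows; the cluster contents are invented for illustration and do not correspond to the confidential risk data.

import networkx as nx

# Clusters from the two approaches as sets of data-sample ids (invented examples).
keyword_clusters = {"K1": {1, 2, 3}, "K2": {4, 5}, "K3": {7, 8}}
leven_clusters   = {"L1": {1, 2, 3}, "L2": {5, 6}, "L3": {7, 8, 9}}

B = nx.Graph()
B.add_nodes_from(keyword_clusters, part="keyword")
B.add_nodes_from(leven_clusters, part="levenshtein")
for k, ks in keyword_clusters.items():
    for l, ls in leven_clusters.items():
        if ks & ls:                      # edge iff at least one common data sample
            B.add_edge(k, l)

# Connected sub-components with exactly two nodes and one edge are common clusters.
for comp in nx.connected_components(B):
    sub = B.subgraph(comp)
    if sub.number_of_nodes() == 2 and sub.number_of_edges() == 1:
        print("common cluster:", sorted(comp))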
5
Outlook
In addition to evaluating the PSRA consistency provided by the panel of domain experts, the presented clustering approaches aim at identifying potential for revising the PSRA process. Such information would help to standardize the risk descriptions. In return, the clustering approach would profit from working on more structured data. Another key focus of future work will be on creating an automated support system for the expert panel. The identified clusters of good quality will serve as a template to support risk ratings by offering previous knowledge and giving informed options. These ratings will then be proposed to the expert panel directly after the risk description has been prepared. The rating score recommendation might be directly integrated into Excel or provided in the form of a separate user interface. This will facilitate the discussion within the expert panel and will help to ensure more consistent ratings in the future. To further enlarge the class of good clusters, the reasons for any inconsistencies found must also be thoroughly investigated and incorporated into the process. Hence, a representation of the inconsistencies needs to be developed.
The experiments performed so far show that most clusters are connected to several clusters across the two approaches. This suggests some potential for combining both approaches. For example, the Levenshtein approach could be used to improve the clustering found by keyword extraction.
References
1. Alves, S., et al.: Information extraction applications for clinical trials: a survey. In: 2019 14th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–6. IEEE (2019)
2. Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) PAKDD 2013. LNCS (LNAI), vol. 7819, pp. 160–172. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2_14
3. Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding (2019)
4. Forst, J.F., Tombros, A., Roelleke, T.: Less is more: maximal marginal relevance as a summarisation feature. In: Azzopardi, L., et al. (eds.) ICTIR 2009. LNCS, vol. 5766, pp. 350–353. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04417-5_37
5. HuggingFace: Hugging Face – The AI community building the future (2021). https://huggingface.co/
6. Kalyanathaya, K.P., et al.: Advances in natural language processing–a survey of current research trends, development tools and industry applications. Int. J. Recent Technol. Eng. 7, 199–202 (2019)
7. Kitaev, N., Kaiser, L., Levskaya, A.: Reformer: The Efficient Transformer. arXiv:2001.04451 (2020)
8. Levenshtein, V.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10, 707–710 (1965)
9. Arai, K., Bhatia, R., Kapoor, S. (eds.): FTC 2018. AISC, vol. 880. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-02686-8
10. Maruf, S., et al.: A survey on document-level machine translation: methods and evaluation. arXiv preprint arXiv:1912.08494 (2019)
11. Mikolov, T., et al.: Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546 (2013)
12. Mikolov, T., et al.: Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 (2013)
13. Miner, G., et al.: Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications. Academic Press, Milton Park (2012)
14. Quarteroni, S.: Natural language processing for industry: ELCA's experience. Informatik Spektrum 41(2), 105–112 (2018). https://doi.org/10.1007/s00287-018-1094-1
15. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (2019)
16. Sarkar, D.: Text Analytics with Python. A Practical Real-World Approach to Gaining Actionable Insights from your Data. Apress, Berkeley (2016). https://doi.org/10.1007/978-1-4842-2388-8
17. Singhal, A.: Modern information retrieval: a brief overview. Bull. IEEE Comput. Soc. Tech. Committee Data Eng. 24(4), 35–43 (2001)
18. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
19. Xu, D., Tian, Y.: A comprehensive survey of clustering algorithms. Ann. Data Sci. 2(2), 165–193 (2015). https://doi.org/10.1007/s40745-015-0040-1
Augmented Reality for Fire Evacuation Research: An A’WOT Analysis El Mostafa Bourhim(B) EMISYS: Energetic, Mechanic and Industrial Systems, Engineering 3S Research Center, Industrial Engineering Department, Mohammadia School of Engineers, Mohammed V University, Rabat, Morocco [email protected]
Abstract. One of the most important disaster management strategies for reducing the impact of man-made and natural risks on building inhabitants is evacuation. Building evacuation readiness and efficacy have been improved using a variety of current technology and gamification concepts, such as immersive virtual reality (VR) and Augmented Reality (AR). These technologies have been used to study human behavior during building crises and to train people on how to handle evacuations. AR is a cutting-edge technology that can help increase evacuation performance by giving virtual content to building inhabitants. The goal of this research is to identify and prioritize the fundamental features of AR as a research tool for human fire behavior. The hybrid multi-criteria decision approach A’WOT is used in this investigation. The Analytic Hierarchy Process (AHP) and the Strengths, Weaknesses, Opportunities, and Threats (SWOT) techniques are combined in this method. This approach can be used to identify and prioritize the elements that affect a system’s operation. As a consequence of this research and the use of the A’WOT technique, the fundamental factors of AR technology were identified and prioritized. Keywords: Building evacuation · Augmented reality · Virtual reality · Analytic hierarchy process · Swot analysis
1 Introduction

Building evacuations are one of the most important risk-reduction techniques for structures that have been damaged by both man-made and natural disasters. Much of the research conducted so far has focused on human behavior in catastrophes such as fires and on simulating those actions in evacuation model tools. Novel technologies, such as VR and AR, have been shown to be critical in understanding human behavior in catastrophes and improving building occupants' preparedness [1, 2]. For example, VR has been used to examine (a) occupant perceptions of evacuation systems [3]; (b) evacuee route and exit options [4]; and (c) evacuee navigation [5]; and to prepare occupants of the building for fire and earthquake incidents [6]. AR is a unique technology that has gained appeal among the general public in recent years as new AR devices and applications have been
released. Furthermore, by providing people with real-time digital information, AR is a technology that might help improve building evacuation performance. During evacuations and crises, users would be able to make real-time, in-place best-action choices. The goal of this research is to determine and prioritize the essential features of AR as a predictive and effective research tool for human fire behavior.
2 Literature Review

2.1 Augmented Reality

Milgram and Kishino established a conceptual foundation for AR and VR more than two decades ago. Real and virtual environments are at opposing ends of the virtuality continuum [7]. Environments made entirely of real items are at one extreme of this spectrum. On the other side, there are settings that are entirely composed of computer-generated things. Mixed Reality settings combine real and virtual material and are situated in the middle of the continuum. Virtual settings containing some real-world material are referred to as augmented virtuality (AV). Reality is the primary component of AR settings, with computer-generated visual information serving as a supplementary component (Fig. 1). Different sorts of applications are possible due to the technological distinctions between AR and VR devices. Users can practice in entirely computer-generated settings in VR, giving them access to a theoretically infinite number of training situations and circumstances. Because AR apps incorporate virtual material in the actual environment, training choices are slightly different. For example, training may be customized to specific situations.
Fig. 1. Simplified representation of a “virtuality continuum”
2.2 AR and VR for Fire Evacuation AR and VR have previously been utilized in a variety of ways to lessen the impact of catastrophes on humans and built environments. Several review articles published to date illustrate the potential of these technologies for certain purposes (e.g. training and human behavior investigations) or specific catastrophes (e.g. fire research); readers may consult [8] for details. Several studies on human behavior in catastrophes have been conducted utilizing VR and AR. The majority of these studies, according to the analysis, have focused on fire catastrophes, particularly in terms of safety design and behavioral research. A variety of scientific observation and
simulation approaches have been developed to better understand how individuals evacuate buildings. Drills, both announced and unannounced, case studies, and laboratory experiments are among them [9–11]. These approaches have helped to advance the simulation of human behavior in many modeling systems by providing insights into human behavior in disasters such as fire [12, 13]. In recent years, emerging technologies such as VR and AR have piqued the curiosity of the safety research community. This tendency has been fuelled by two main expectations [14]. First, VR and AR might provide effective, adaptable, and cost-effective training platforms for safety-related scenarios. Second, as research tools, VR and AR might help researchers strike a balance between ecological validity and experimental control in investigations. Much has been published about VR as a tool for studying human behavior and training individuals for emergency situations [15, 16]. However, in pedestrian evacuation research, AR has only recently been adopted as a training and research tool. AR has attracted considerable interest in related sectors (including construction and industrial safety), and a recent study identified some of its possible uses and problems [14].
3 Methodology

The A’WOT technique was used to identify and prioritize the essential components of AR as a predictive and effective research instrument on human fire behavior. SWOT analysis was used to establish the important elements; the AHP was then used to build the decision hierarchy and to calculate the priorities of the SWOT factors and groups. The flow diagram of the approach is illustrated in Fig. 2.

3.1 The A’WOT Method

The A’WOT approach [17] is a hybrid model that integrates two previously unrelated techniques: SWOT analysis and the AHP methodology [18, 19]. The latter is utilized in the A’WOT technique for prioritizing the SWOT components by assigning weights to them. By converting qualitative data into quantitative data, this analytical technique addresses one of the primary disadvantages of SWOT analysis as a strategic planning tool (the incommensurability of the factors found), which obstructs the determination of priority actions. The A’WOT technique has been widely used by many authors in a variety of fields. It consists of three steps: • The first stage is to create a SWOT analysis by listing the most significant internal (strengths and weaknesses) and external (opportunities and threats) criteria for strategic planning; • The weights of each SWOT category are captured using the AHP method in the second phase; • Finally, the AHP approach is used in the third phase to calculate the relative weights of each component inside the SWOT categories. The first stage in this study was to conduct a literature review; the second and third steps involved conducting structured interviews with a panel of AR experts and practitioners using a questionnaire. Written
questionnaires distributed electronically were used to obtain additional information from independent specialists.
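The AHP step at the core of A’WOT can be illustrated with a short, self-contained sketch. The pairwise comparison matrix below is purely illustrative (the actual expert judgements are reported only in aggregate form in Sect. 4, and the study used the Super Decisions software rather than a script); the priorities are taken from the principal eigenvector and the consistency ratio (CR) follows Saaty’s standard definition.

```python
import numpy as np

# Hypothetical pairwise comparison matrix for the four SWOT groups
# (S, W, O, T) on Saaty's 1-9 scale; values are illustrative only.
A = np.array([
    [1.0, 4.0, 2.0, 3.0],
    [1/4, 1.0, 1/2, 1.0],
    [1/2, 2.0, 1.0, 2.0],
    [1/3, 1.0, 1/2, 1.0],
])

def ahp_priorities(matrix):
    """Principal-eigenvector priorities and consistency ratio (CR)."""
    eigvals, eigvecs = np.linalg.eig(matrix)
    k = np.argmax(eigvals.real)                   # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                               # normalised priority vector
    n = matrix.shape[0]
    ci = (eigvals[k].real - n) / (n - 1)          # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41}[n]  # random index
    return w, ci / ri

weights, cr = ahp_priorities(A)
print(dict(zip("SWOT", weights.round(3))), "CR =", round(cr, 3))  # CR should stay below 0.10
```

The same routine applies unchanged to each within-group comparison matrix (the 12 strengths, 6 weaknesses, 8 opportunities, and 5 threats), which is how the level-3 priorities of Sect. 4 are obtained.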
Fig. 2. The flow diagram of the method
4 Results and Discussion

4.1 SWOT of AR

The A’WOT hybrid approach is used to illustrate the development process for fire evacuation research. Based on a study of the literature [20], the important variables of the internal (strengths and weaknesses) and external (opportunities and threats) environments were determined (see Table 1).

Table 1. The SWOT Matrix.

Strengths (S)                                                         | Weaknesses (W)
S1: Internal validity                                                 | W1: Need for confirmation/validation
S2: Replicability                                                     | W2: Inter-individual differences in ease of interaction with AR
S3: Safety of participants                                            | W3: Technical limitations
S4: Real-time feedback                                                | W4: No gold standard available yet; technology not mature
S5: Precise measurement                                               | W5: Costs for development and maintenance of AR software
S6: Low costs compared to other methods                               | W6: Limited computational power of AR hardware
S7: Design flexibility                                                |
S8: Independent of imagination abilities/willingness of participants  |
S9: No simulator sickness compared to VR                              |
S10: Navigation in real-world environment                             |
S11: Reduced need for 3D models                                       |
S12: Ease of transportation and setup                                 |

Opportunities (O)                                                     | Threats (T)
O1: Intuitive and natural user interfaces                             | T1: Failure to show validity
O2: Graphical developments                                            | T2: Failure to show training success in real world
O3: Multi-modal simulation and feedback                               | T3: Misleading expectations
O4: Usability for researchers                                         | T4: Technical obstacles
O5: Exchange of 3D-scenes or experiments                              | T5: Privacy
O6: Integration in BIM-based design and evacuation modelling          |
O7: Proliferation of AR ready consumer devices                        |
O8: Access to cloud computing                                         |
4.2 Building the Hierarchical Structure

The decision hierarchy for pair-wise comparisons of SWOT factors and groups was created in this stage. The hierarchical structure is divided into three tiers: AR for fire evacuation research is the first level (final objective), the SWOT groups are the second level (criteria), and the SWOT factors are the third level (sub-criteria).

4.3 Prioritization of SWOT Factors and Groups

Following the creation of the decision hierarchy, experts evaluated the levels of significance of SWOT criteria and groups. This step included pair-wise comparisons and the creation of a 31-item questionnaire form. Experts were questioned face to face, and judgements were recorded on the questionnaire as paired comparisons. The questionnaire forms were developed in collaboration with specialists in the AR area. Experts were chosen based on their knowledge and experience working on AR projects. The questionnaire was completed by ten individuals from diverse backgrounds. While the AR academics had at least a master’s degree, the expert group’s other members had more than 5 years of expertise in their respective professions. Geometric means of expert views were employed to establish a group consensus. The Super Decisions (Version 2.8) software program was used to calculate the geometric means of all replies for each pairwise comparison. Furthermore, for each comparison matrix, the consistency ratio (CR) was computed and found to be less than 0.10. Table 2 shows the relative importance of the relevant factors. According to the findings of the AHP, experts believe strengths to be the most significant factor, followed by opportunities, weaknesses, and threats (see Table 2). Strengths are roughly four times as significant as weaknesses, and opportunities are about twice as significant. Table 2 shows that navigation in a real-world setting (0.305) is one of the biggest assets of AR as a predictive and practical study tool on human fire behavior. Users may navigate intuitively in a real-world environment with AR apps. When contrasted with VR, this is another advantage of AR: users in VR are generally connected to a wire, restricting their physical walking range, so VR apps generally need users to navigate using input devices such as gamepads as a workaround. Furthermore, AR’s capabilities will aid in the development of this relatively new technology for fire evacuation studies. The most crucial AR opportunity that should be exploited is the proliferation of AR-ready consumer devices (0.225). Owing to the successful introduction of AR to the broad consumer market, AR systems have developed significantly. The addition of a large number of users is expected to enhance future development, especially for specialized applications like fire safety research, at a lower hardware cost. Despite its benefits and potential, AR technology nevertheless has weaknesses and risks. AR’s most significant weakness is the limited computing capacity of AR hardware (0.323). Because most AR devices are portable, they have limited processing power, memory, and storage. The pace at which spatial representations may be refreshed, the weight of computing operations, and the quantity of data that can be processed are all limited by this. The rising availability of broadband internet via cellular or wireless local area networks, on the other hand, promises to eliminate many of these restrictions. The greatest technical danger to AR technology is the failure to demonstrate training success in the actual
world (0.322). Future study is needed to see how well AR training systems compare to other techniques of training.

Table 2. The priorities of criteria and subcriteria.

Criteria | Priority (level 2) | Subcriteria | Priority (level 3)
S        | 0.470              | S1          | 0.101
         |                    | S2          | 0.020
         |                    | S3          | 0.051
         |                    | S4          | 0.014
         |                    | S5          | 0.029
         |                    | S6          | 0.030
         |                    | S7          | 0.020
         |                    | S8          | 0.014
         |                    | S9          | 0.201
         |                    | S10         | 0.305
         |                    | S11         | 0.204
         |                    | S12         | 0.011
W        | 0.140              | W1          | 0.200
         |                    | W2          | 0.117
         |                    | W3          | 0.210
         |                    | W4          | 0.125
         |                    | W5          | 0.025
         |                    | W6          | 0.323
O        | 0.264              | O1          | 0.211
         |                    | O2          | 0.120
         |                    | O3          | 0.102
         |                    | O4          | 0.122
         |                    | O5          | 0.044
         |                    | O6          | 0.077
         |                    | O7          | 0.225
         |                    | O8          | 0.099
T        | 0.126              | T1          | 0.211
         |                    | T2          | 0.322
         |                    | T3          | 0.141
         |                    | T4          | 0.233
         |                    | T5          | 0.093
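The group-consensus step described in Sect. 4.3 can be sketched as follows. The three expert matrices below are illustrative stand-ins (the real questionnaire responses are not published), aggregation uses the element-wise geometric mean as done with Super Decisions, priorities are approximated by normalised row geometric means, and a factor’s global importance is obtained, in the usual A’WOT way, by multiplying its level-3 weight by its group’s level-2 weight from Table 2.

```python
import numpy as np

# Illustrative judgements from three experts for the four SWOT groups (S, W, O, T).
experts = np.array([
    [[1, 4, 2, 3], [1/4, 1, 1/2, 1], [1/2, 2, 1, 2], [1/3, 1, 1/2, 1]],
    [[1, 3, 2, 4], [1/3, 1, 1/2, 1], [1/2, 2, 1, 2], [1/4, 1, 1/2, 1]],
    [[1, 5, 2, 3], [1/5, 1, 1/3, 1], [1/2, 3, 1, 2], [1/3, 1, 1/2, 1]],
])
group = np.prod(experts, axis=0) ** (1 / len(experts))   # element-wise geometric-mean consensus

# Normalised row geometric means approximate the AHP priority vector.
w = np.prod(group, axis=1) ** (1 / group.shape[1])
w = w / w.sum()

# Global priority of a subcriterion = group weight x local weight, e.g. S10
# (navigation in a real-world environment) using the published Table 2 values.
global_S10 = 0.470 * 0.305
print("group weights (S, W, O, T):", w.round(3))
print("global priority of S10:", round(global_S10, 3))
```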
5 Conclusions For the topic of AR fire research, this article proposes an A’WOT hybrid technique. Many AR strengths are mentioned, which will continue to justify the development of existing AR fire applications and the development of new ones. There are weaknesses in AR, notably specific restrictions on the computing capacity of AR hardware, although they do not jeopardize the field’s dependability. It is expected that AR technology in fire research will continue to expand tremendously and acquire acceptance as a mainstream tool with intelligent system design that addresses fire research applications. Threats to the field do exist, but none are dangerous, and all are likely solvable with high motivation, that
is, by all accounts, through the efforts of the field’s different scholars. The current study combined SWOT analysis with the AHP multicriteria decision-making approach. The quantitative values for the SWOT external and internal variables are calculated using the A’WOT technique. The findings of this study demonstrated its efficacy, simplicity, and capacity to deal with both quantitative and qualitative aspects. The AHP method is a good way to handle the criteria in a SWOT analysis. One of the issues with SWOT is the inconsistency of the outcomes of diverse factors, which may cause comparisons to get muddled. AHP analysis, on the other hand, can handle a variety of situations with certain uncertainties. The hybrid method’s outcomes were promising. Making pairwise comparisons pushes professionals to consider the relative importance of the various elements and to study the issue in greater depth. However, because AHP does not model them, this technique ignores interdependencies and feedback between levels; this can be avoided by using the analytic network process (ANP) instead of AHP. Furthermore, the inconsistency of subjective views can be accounted for by rating SWOT elements using pairwise comparisons. In future work, we will use some typical metaheuristics, such as the krill herd (KH), monarch butterfly optimization (MBO), and moth search algorithms, to handle the problem. Acknowledgements. The authors would like to express their gratitude to the experts for their important input. In addition, El mostafa Bourhim wants to thank, in particular, the patience, care and support from lamya Taki over the past years. Will you marry me?
References 1. Bourhim, E.M., Cherkaoui, A.: Efficacy of virtual reality for studying people’s pre-evacuation behavior under fire. Int. J. Human-Comput. Stud. 142, 102484 (2020). https://doi.org/10.1016/ j.ijhcs.2020 2. Bourhim, E.M., Cherkaoui, A.: Simulating pre-evacuation behavior in a virtual fire environment. In: 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–7 (2018). https://doi.org/10.1109/ICCCNT.2018.8493658 3. Olander, J., et al.: Dissuasive exit signage for building fire evacuation. Appl. Ergon. 59, 84–93 (2017) 4. Lovreglio, R., Fonzone, A., Dell’Olio, L.: A mixed logit model for predicting exit choice during building evacuations. Transp. Res. Part A 92, 59–75 (2016). https://doi.org/10.1016/j. tra.2016.06.018 5. Rios, A., Mateu, D., Pelechano, N.: Follower Behavior in a Virtual Environment’, in Virtual Human Crowds for Immersive Environments (2018) 6. Feng, Z., et al.: Immersive virtual reality serious games for evacuation training and research: a systematic review - working paper (2018) 7. mostafa Bourhim, E., Cherkaoui, A.: How can the virtual reality help in implementation of the smart city? In: 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–6, (2019). https://doi.org/10.1109/ICCCNT45670.2019.894 4508 8. Feng, Z., González, V.A., Amor, R., Lovreglio, R., Cabrera-Guerrero, G.: Immersive virtual reality serious games for evacuation training and research: a systematic literature review. Comput. Educ. 127, 252–266 (2018). https://doi.org/10.1016/J.COMPEDU.2018.09.002
9. Bernardini, G., Lovreglio, R., Quagliarini, E.: Proposing behavior-oriented strategies for earthquake emergency evacuation: a behavioral data analysis from New Zealand. Italy and Japan. Saf. Sci. 116, 295–309 (2019). https://doi.org/10.1016/J.SSCI.2019.03.023 10. Gwynne, S.M.V., et al.: Enhancing egress drills: preparation and assessment of evacuee performance. Fire Mater (2017). https://doi.org/10.1002/fam.2448 11. Lovreglio, R., Kuligowski, E., Gwynne, S., Boyce, K.: A pre-evacuation database for use in egress simulations. Fire Saf. J. 105, 107–128 (2019). https://doi.org/10.1016/J.FIRESAF. 2018.12.009 12. Gwynne, S.M., et al.: A review of the methodologies used in evacuation modelling. Fire Mater 23(6), 383388 (1999) 13. Lindell, M.K., Perry, R.W.: The protective action decision model: theoretical modifications and additional evidence. Risk Anal. 32(4), 616–632 (2012). https://doi.org/10.1111/j.15396924.2011.01647.x 14. Li, X., Yi, W., Chi, H.-L., Wang, X., Chan, A.P.C.: A critical review of virtual and augmented reality (VR/AR) applications in construction safety. Autom. Constr. 86, 150–162 (2018). https://doi.org/10.1016/J.AUTCON.2017.11.003 15. Lovreglio, R., et al.: Comparing the effectiveness of fire extinguisher virtual reality and video training. Virtual Reality 25, 133–145 (2020) 16. Nilsson, D., Kinateder, M.: Virtual reality experiments - the future or a dead end? In: Boyce, K. (ed.) 6th International Symposium Human Behaviour in Fire, pp. 13–22. Interscience Communications, Cambridge (2015) 17. Bourhim, E.M., Cherkaoui, A.: Exploring the potential of virtual reality in fire training research using A’WOT hybrid method. In: Thampi, S.M., et al. (eds.) Intelligent Systems, Technologies and Applications. AISC, vol. 1148, pp. 157–167. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-3914-5_12 18. Bourhim, E.M., Cherkaoui, A.: Selection of optimal game engine by using AHP approach for virtual reality fire safety training. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds.) ISDA 2018 2018. AISC, vol. 940, pp. 955–966. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-16657-1_89 19. Bourhim, E.M., Cherkaoui, A.: Usability evaluation of virtual reality-based fire training simulator using a combined AHP and fuzzy comprehensive evaluation approach. In: Jeena Jacob, I., Kolandapalayam Shanmugam, S., Piramuthu, S., Falkowski-Gilski, P. (eds.) Data Intelligence and Cognitive Informatics. AIS, pp. 923–931. Springer, Singapore (2021). https://doi. org/10.1007/978-981-15-8530-2_73 20. Lovreglio, R., Kinateder, M.: Augmented reality for pedestrian evacuation research: promises and limitations. Safety Sci. 128, 104750 (2020). https://doi.org/10.1016/j.ssci.2020.104750
Optimization of Artificial Neural Network: A Bat Algorithm-Based Approach Tarun Kumar Gupta and Khalid Raza(B) Department of Computer Science, Jamia Millia Islamia, New Delhi, India [email protected]
Abstract. Artificial Neural Networks (ANNs) have been dominant machine learning tools over the last two decades and are part of almost every computational intelligence task. ANNs have several parameters such as the number of hidden layers, the number of hidden neurons in each layer, variations in the inter-connections between them, etc. Proposing an appropriate architecture for a particular problem while considering all parametric terms is an extensive and significant task. Metaheuristic approaches like particle swarm optimization, ant colony optimization, and the cuckoo search algorithm have made large contributions in the field of optimization. The main objective of this work is to design a new method that can help in the optimization of ANN architecture. This work takes advantage of the Bat algorithm and combines it with an ANN to find an optimal architecture with minimal testing error. The proposed methodology has been tested on two different benchmark datasets and demonstrated better results than other similar methods. Keywords: Neural network · Architecture · Optimization · Bat algorithm · Hidden neurons
1 Introduction An artificial neural network (ANN) having one hidden layer is the basic neural network architecture. It consists of an input layer, an output layer, and a hidden layer between the input and output layers. This approach simulates the way the human brain processes information. Neural networks are popular for their simple architecture, good generalization ability, and capability of solving complex problems. ANNs have far-reaching applications such as speech processing [1], signal processing [2], pattern recognition [3], classification and clustering [4], control [5], reconstruction of gene regulatory systems [6], function approximation [7], cancer class prediction [8], medical image analysis [9], etc. The success of a neural network depends on several parameters; the initial weights and the proposed architecture make a major contribution here. Weights can be improved by an appropriate learning algorithm, and estimating the architecture in advance is very crucial. In a neural network architecture having fixed neurons at the input and output layers, the cardinality of neurons in the middle (hidden) layer is not fixed and has been subject to optimization for the last two decades. A tiny model cannot © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 286–295, 2022. https://doi.org/10.1007/978-3-030-96308-8_26
describe the problem with adequate accuracy; while bigger models have the potential to do so, they become very complex. A number of algorithms have been proposed for the optimization of neural networks, and researchers are constantly striving to find better models that could calculate and determine the appropriate architecture exactly. Metaheuristic algorithms frame neural network features like weights, models, neurons, etc. as an optimization problem. They then use different heuristics for searching a near-ideal solution. Also, a multi-objective metaheuristic approach can deal with several objectives concurrently. Neural network architecture optimization is a minimization problem where the aim is to find a minimal structure with a minimum testing error. In a neural network, taking neurons randomly to find an optimal architecture is a very onerous task, and cannot guarantee optimal results. Thus, during the last two decades, numerous researchers have been working in the field of neural network architecture optimization to find globally optimal architectures. Several optimization algorithms have already been applied to optimize neural network architectures. For example, study [10] applies a stochastic optimization technique using a combination of a genetic algorithm (GA) and simulated annealing (SA): GA encodes the ANN architecture and SA helps in finding the optimal one. The study [11] proposes a hybrid approach using Tabu search (TS) and simulated annealing to minimize a multi-layered perceptron (MLP). A multi-objective evolutionary technique was employed in [12] for minimizing ANN architecture. A hybrid Taguchi-genetic algorithm was used in [13] for tuning the structure and parameters of an ANN. Particle swarm optimization (PSO) and improved PSO were used in [14] to design a three-layered feed-forward neural network and its architecture. Bayesian optimization methods have also made notable contributions in ANN design. Studies [15, 16, 17] use Bayesian-method-based techniques to generalize ANN architecture for enhancing the learning ability. A new approach, MCAN [18], demonstrates noteworthy results for video question answering. The study [19] presents a statistical analysis for video question answering. GaTSa (a combination of a genetic algorithm, TS, and SA) is proposed to optimize MLPs in [20]. The study [20] initially starts with random neurons at the hidden layer and then rapidly converges to the global best position. The model proposed in [21] is a hierarchical deep click feature prediction technique for image recognition. Another study [22] uses dimensionality reduction based on unsupervised learning. The study [23] defines a modified bat algorithm for neural network models, but it does not report the parameters of the optimal model obtained. The work in [24] proposes GADNN (a genetic algorithm-based neural network), which consists of two vectors, where the first vector contains information about hidden layers and their respective neurons and the second vector contains information about weights. In the studies [25] and [26], researchers have demonstrated the performance of PSO, GA, and ant colony optimization on MIMO systems. The work done in [27] uses TS and gradient descent with backpropagation to gradually optimize a neural network having multiple hidden layers. A recent review in the area of optimizing neural networks is given in [28].
After a comprehensive review, it can be concluded that there are several lacunae in ANN model optimization which need further consideration for better results, and the area needs further exploration of metaheuristic techniques. This paper aims to combine the advantages of the Bat algorithm [29] and gradient descent with momentum backpropagation (GDM) [30] for searching an ideal model for an ANN. The proposed algorithm calculates the hidden layer neurons automatically, a task that was previously done manually. The application of the proposed algorithm has been tested on two classification benchmark datasets, namely ISOLET and Gas-drift sensor. The paper is structured as follows: Sect. 2 explores the bat algorithm with the optimization methodology and fitness function. Section 3 describes the datasets, hardware, and software of the experimental setup used. Section 4 presents the results obtained by our experiment and compares them with other state-of-the-art methods. Finally, Sect. 5 concludes the paper.
2 Optimization Methodology

2.1 Bat Algorithm

The bat algorithm is a metaheuristic algorithm based on the swarm intelligence technique proposed by [29]. The method is based on the echolocation behavior of bats when estimating the distance to a target. Initially, bats emit a very loud pulse and concentrate on the echo which bounces back from the target. Bats start the journey randomly with attributes such as frequency, velocity, position, pulse rate, and loudness to find the exact location of the target. Different types of bats have different pulse properties. In the bat algorithm, every movement of a bat updates its frequency, velocity, position, loudness, and pulse rate, and these updates determine the bat's further movement. The bat algorithm is a population-based algorithm with local search; the loudness and the pulse rate help to preserve a balanced combination of population-based and local search processes, i.e., to maintain stable exploration and exploitation throughout the search. The algorithm includes a progressive iterative process in which some sets of solutions are updated by a random walk, and it updates the pulse rate and loudness only after it accepts a new solution. The basic computations for updating the frequency, velocity, and position are:

f_i = f_min + (f_max − f_min) β   (1)

v_i^t = v_i^{t−1} + (x_i^{t−1} − x_gbest^t) f_i   (2)

x_i^t = x_i^{t−1} + v_i^t   (3)

where β is a random number in [0, 1] and f_i denotes the frequency of the i-th bat, which controls the range and speed of the i-th bat's movement. The velocity and position of the i-th bat are represented by v_i and x_i respectively, and in Eq. (2), x_gbest^t represents the global best at iteration t. Using the local search advantage, the bat algorithm controls the diversity in each iteration. The local search is realized in the algorithm by a random walk strategy:

x_new = x_old + e Ā^t   (4)

Here, e is a random number in [−1, 1] and Ā^t is the mean loudness of the whole population. In each iteration, the algorithm updates the pulse rate r_i and loudness A_i. According to the bat algorithm, the pulse rate increases and the loudness decreases as the bat moves towards the target:

A_i^{t+1} = α A_i^t   (5)

r_i^{t+1} = r_i^0 [1 − exp(−γ t)]   (6)
In Eqs. (5) and (6), α and γ are constants, where α ∈ [0, 1] and γ > 0.

2.2 Proposed Methodology

In this work, the Bat algorithm is combined with the GDM learning algorithm, and a neural network having only one hidden layer is used for optimization. The objective of the proposed method is to find an optimal solution p_gbest from the bat population P using GDM, such that f(p_gbest) < f(p) for all p ∈ P, where f is the real-valued cost function. The bat population is randomly initialized with neuron counts in between the range of the dataset's input and output neurons. The bat algorithm generates a local-search-based population by updating the frequency and velocity and accepts only the best solution using GDM. Algorithm 1 shows the pseudo-code for the proposed algorithm.
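The GDM-based fitness evaluation used inside Algorithm 1 can be sketched as follows. This is only a minimal stand-in: scikit-learn's MLPClassifier with an SGD solver and momentum 0.7 is assumed here in place of the paper's own GDM implementation, and the train/test split is illustrative.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def evaluate_fitness(neurons, X, y):
    """Train a one-hidden-layer network with SGD + momentum (a stand-in for GDM)
    and return the percentage testing error used as a bat's fitness."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    net = MLPClassifier(hidden_layer_sizes=(neurons,), solver="sgd",
                        momentum=0.7, nesterovs_momentum=False,
                        learning_rate_init=0.001, max_iter=200)
    net.fit(X_tr, y_tr)
    return 100.0 * (1.0 - net.score(X_te, y_te))   # testing error in percent
```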
Algorithm 1: Pseudocode for Proposed Methodology

INPUT: #inputneurons, #outputneurons, popSize, minFreq, maxFreq, maxLoud, minLoud, maxPulse, minPulse, maxVel, minVel, trainData, testData
OUTPUT: number of optimal neurons with minimum testing error

1:  for i = 1 to popSize do
2:      pop[i].neurons ← random integer in [#outputneurons, #inputneurons]        // number of neurons
3:      pop[i].velocity ← 0; pop[i].loudness ← random in [minLoud, maxLoud]
4:      pop[i].pulseRate ← random in [minPulse, maxPulse]
5:      pop[i].fitness ← testError(train(pop[i].neurons, trainData), testData)    // use GDM with momentum 0.7
6:  end for
7:  pgbest ← bat with minimum fitness in pop
8:  for t = 1 to max_iter do
9:      for i = 1 to popSize do
10:         update frequency and velocity of pop[i] using Eqs. (1), (2); clip velocity to [minVel, maxVel]
11:         candidate.neurons ← update position of pop[i] using Eq. (3)
12:         if rand > pop[i].pulseRate then
13:             candidate.neurons ← local solution around pgbest using Eq. (4)
14:         end if
15:         candidate.fitness ← testError(train(candidate.neurons, trainData), testData)   // GDM
16:         if candidate.fitness ≤ pop[i].fitness and rand < pop[i].loudness then
17:             pop[i] ← candidate
18:             update loudness and pulse rate of pop[i] using Eqs. (5), (6)
19:         end if
20:         if pop[i].fitness < pgbest.fitness then pgbest ← pop[i] end if
21:     end for
22: end for
23: return pgbest   // optimal neurons with training and testing error
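A compact, runnable Python sketch of this search loop is given below. It assumes an `evaluate_fitness(neurons)` callable such as the one sketched in Sect. 2.2 (GDM training followed by a testing-error measurement), and the parameter ranges follow those reported in Sect. 4.2; the loudness-based acceptance rule is one possible reading of the generic bat algorithm, not the paper's exact implementation.

```python
import math
import random

def bat_neuron_search(evaluate_fitness, n_in, n_out, pop_size=10, max_iter=10,
                      f_rng=(0.0, 1.0), v_rng=(-3, 3), loud_rng=(1.0, 2.0),
                      alpha=0.9, gamma=0.9):
    """Search for the hidden-neuron count that minimises the testing error."""
    # Each bat's position is a candidate number of hidden neurons (Eqs. (1)-(6)).
    pos = [random.randint(n_out, n_in) for _ in range(pop_size)]
    vel = [0.0] * pop_size
    loud = [random.uniform(*loud_rng) for _ in range(pop_size)]
    pulse0 = [random.random() for _ in range(pop_size)]
    pulse = list(pulse0)
    fit = [evaluate_fitness(p) for p in pos]
    best, best_fit = pos[fit.index(min(fit))], min(fit)

    for t in range(1, max_iter + 1):
        mean_loud = sum(loud) / pop_size
        for i in range(pop_size):
            f_i = f_rng[0] + (f_rng[1] - f_rng[0]) * random.random()          # Eq. (1)
            vel[i] = max(v_rng[0], min(v_rng[1], vel[i] + (pos[i] - best) * f_i))  # Eq. (2)
            cand = int(round(pos[i] + vel[i]))                                # Eq. (3)
            if random.random() > pulse[i]:                                    # local random walk
                cand = int(round(best + random.uniform(-1, 1) * mean_loud))   # Eq. (4)
            cand = max(n_out, min(n_in, cand))
            cand_fit = evaluate_fitness(cand)
            if cand_fit <= fit[i] and random.random() < loud[i] / loud_rng[1]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] = alpha * loud[i]                                     # Eq. (5)
                pulse[i] = pulse0[i] * (1 - math.exp(-gamma * t))             # Eq. (6)
            if fit[i] < best_fit:
                best, best_fit = pos[i], fit[i]
    return best, best_fit
```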
2.3 Fitness Function

A fitness function defines how accurately a given model performs on a given task. It is evaluated in each iteration and compares the result of the new solution with the older solution(s) to attain the goal of the optimization function. Suppose the dataset is divided into D_N classes; then the true class of a sample s from the testing set E satisfies

φ(s) ∈ {1, 2, 3, ..., D_N}   ∀ s ∈ E   (7)

The method follows the "winner takes all" rule, so that the output classes D_N have a one-to-one correspondence with the output neurons for each dataset. Assuming that the output value of node k for the data sample s is O_k(s), the predicted class of data sample s is

δ(s) = argmax_{k ∈ {1, 2, 3, ..., D_N}} O_k(s)   ∀ s ∈ E   (8)

The per-sample error can then be written as

ε(s) ≡ 0 if φ(s) = δ(s), and ε(s) ≡ 1 if φ(s) ≠ δ(s)   (9)

Therefore, the testing classification error of the model on the set E, in terms of percentage, is

Error(E) = (100 / #E) Σ_{s ∈ E} ε(s)   (10)

where #E is the size of the testing dataset E.
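A minimal sketch of Eqs. (7)–(10), assuming the network outputs one score per class and that class labels are 0-indexed rather than 1-indexed as in the equations:

```python
import numpy as np

def testing_error(outputs, true_classes):
    """Percentage classification error on the test set E, Eqs. (8)-(10).

    `outputs` is an (#E, D_N) array of network outputs O_k(s);
    `true_classes` holds the true labels phi(s) in {0, ..., D_N-1}.
    """
    predicted = np.argmax(outputs, axis=1)            # delta(s), Eq. (8)
    errors = (predicted != true_classes).astype(int)  # epsilon(s), Eq. (9)
    return 100.0 * errors.sum() / len(true_classes)   # Error(E), Eq. (10)

# Example: three test samples, four classes, one misclassified sample.
O = np.array([[0.1, 0.7, 0.1, 0.1],
              [0.6, 0.2, 0.1, 0.1],
              [0.2, 0.2, 0.5, 0.1]])
print(testing_error(O, np.array([1, 0, 3])))  # -> 33.33...
```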
3 Datasets and Experimental Setup

The proposed method was tested on two benchmark datasets, i.e., ISOLET [31] and the Gas Sensor Array Drift [32, 33]. The ISOLET (Isolated Letter Speech Recognition) dataset has 150 speakers grouped into 5 groups such that each group consists of 30 speakers; every speaker speaks each letter of the alphabet twice. The dataset has 7797 samples. Each recorded data item consists of 617 input features, and the target is to classify the data items into 26 different classes. The dataset is divided into 5 parts, where 4 parts are used for training and the remaining fifth part is used for testing. The Gas-drift dataset has 13,910 samples taken from 16 chemical sensors used for drift compensation in a classification task of 6 gases at different concentration levels. The Gas Sensor Array Drift dataset has 128 input features and 6 output classes (Table 1). This work is implemented in Python (Anaconda) using the h2o package. Every model in the experiment is validated on 20% of the dataset. The fitness function is implemented with a multinomial distribution, a rectifier-with-dropout activation function, a learning rate of 0.001, a momentum term of 0.2, and a dropout ratio of 0.2. The proposed model was trained and tested using a machine having an Intel® Core(TM) i7-10700 CPU @ 2.90 GHz with 16 GB RAM.

Table 1. Considered dataset statistics

Dataset   | Examples | Features | Classes | References
ISOLET    | 7797     | 617      | 26      | [31]
Gas-drift | 13,910   | 128      | 6       | [32, 33]
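A minimal sketch of how such a model might be configured with the h2o Python API follows. The file path, label column, and split are illustrative assumptions; the hyper-parameters mirror those listed above, and the hidden-layer size would be supplied by the bat search.

```python
import h2o
from h2o.estimators import H2ODeepLearningEstimator

h2o.init()
data = h2o.import_file("isolet.csv")            # hypothetical path
target = "class"                                # hypothetical label column
data[target] = data[target].asfactor()
train, valid = data.split_frame(ratios=[0.8], seed=42)

model = H2ODeepLearningEstimator(
    hidden=[357],                               # candidate hidden-neuron count under test
    activation="RectifierWithDropout",
    hidden_dropout_ratios=[0.2],
    distribution="multinomial",
    adaptive_rate=False,                        # use the fixed rate and momentum below
    rate=0.001,
    momentum_start=0.2,
    epochs=10,
)
model.train(x=[c for c in data.columns if c != target],
            y=target, training_frame=train, validation_frame=valid)
print(model.logloss(valid=True))
```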
4 Results and Discussion The accuracy of the proposed algorithm has been tested using two classification benchmark datasets, namely, ISOLET and Gas drift (Table 1). Estimation of neurons at a
single hidden layer is not a very time-consuming task when the dataset has a small number of input features. This work is proposed to estimate the neurons when the dataset is large and has a large number of input features. Further, the proposed model can also work with datasets having more than two classes.

4.1 Experiments with Random Neurons

In general, a neural network does not have any straightforward method to calculate the number of neurons at the hidden layer. To estimate it, users need to take a random model and test it on their dataset, then, based on the testing and training errors, repeat the process several times. After several repetitions, users can select the result that best suits their requirements. This is a crucial and highly tedious task with many flaws. In this work, every dataset is tested over 30 different architectures and the optimal architecture is chosen. A fully connected neural network is considered for the experiments and the performance is evaluated on the basis of testing error. The result summary is given in Table 2. When experimenting with random parameters, a population of 10 bats was initially chosen, where every bat has its own frequency, velocity, pulse rate, and loudness. The position of every bat corresponds to a number of neurons chosen in between the range of input and output neurons. Every bat is then iterated for 10 iterations; in each iteration, the bat is controlled by pulse rate and loudness. Finally, the algorithm returns the values for the optimal bat. In the case of ISOLET, the best topology with random neurons was found to have 357 neurons with a testing error of 2.289% (MSE), while for Gas-drift the optimal model has 82 neurons with 6.874% (MSE).

4.2 Experiments with Proposed Optimization Methodology

The proposed technique (Algorithm 1) starts with random initialization of hidden neurons (in between the range of input and output features) for the bat population and then calculates the fitness of every bat. It finds the best bat based on the fitness function (having the minimum testing error) and puts it in pgbest. This pgbest contains the testing error, training error, and number of neurons. The whole population of bats is passed into the loop that iterates the population for max_iter = 10. Every iteration updates the bat population (number of neurons) locally and finds the local best. Finally, the global best is returned as in Algorithm 1. In this algorithm, we chose loudness in [1, 2], frequency in [0, 1], velocity in [−3, 3], and pulse rate in [0, 1]. The range of velocity helps to increase or decrease the number of neurons in the bat population (Eq. (3)). Similarly, loudness helps the bat make small random adjustments (Eq. (4)). In each iteration, the bat moves towards the target, increases its pulse rate, and decreases its loudness (Eqs. (5) and (6)). Table 2 shows that for ISOLET, the optimal architecture has 346 neurons with a 2.002% (MSE) testing error, and for the Gas-drift dataset the minimum testing error is 6.141% with 78 neurons at the hidden layer. In the same table, it can also be noted that the proposed methodology is compared with the Tabu search-based algorithm [27] and has slightly better performance.
Table 2. Comparison of the proposed methodology with related work (#N are neurons)

Dataset   | Proposed methodology                  | Tabu search-based algorithm [27]      | Experiment with random neurons
          | #N  | Train MSE (%) | Test MSE (%)    | #N  | Train MSE (%) | Test MSE (%)    | #N  | Train MSE (%) | Test MSE (%)
ISOLET    | 346 | 0.0591        | 2.002           | 335 | 0.0301        | 2.070           | 357 | 0.022         | 2.289
Gas-drift | 78  | 5.013         | 6.141           | 75  | 5.3171        | 6.347           | 82  | 5.611         | 6.874
5 Conclusion The motivation behind this work was to find the hidden neurons for a dataset having a large number of input features that needs to be classified into a large number of classes. We therefore start with the ISOLET dataset, which has 617 input features and 26 output classes. The range between 617 and 26 is very large, so randomly estimating the neurons at the hidden layer is very difficult. In the end, the work shows that the integration of the Bat algorithm and GDM can provide an optimal solution. Similarly, the Gas-drift dataset also shows a minimal testing error when the architecture is generated by the proposed method. This work uses a fully connected network with a single hidden layer; it can be further improved by integrating it with another algorithm. For example, the random fly method can be integrated with some other algorithm that can find a finer architecture. The proposed method can also be implemented for denser neural networks, i.e., those having multiple hidden layers. Availability of Code. Python code will be uploaded to GitHub after the acceptance of the paper.
References 1. Gorin, L., Mammone, R.J.: Introduction to the special issue on neural networks for speech processing. IEEE Trans. Speech Audio Process. 2(1), 113–114 (1994). https://doi.org/10. 1109/89.260355 2. Hwang, J.N., Kung, S.Y., Niranjan, M., Principe, J.C.: The past, present, and future of neural networks for signal processing: the neural networks for signal processing technical committee. IEEE Signal Process. Mag. 14(6), 28–48 (1997). https://doi.org/10.1109/79.637299 3. Jain, A.K., Duin, P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 4–37 (2000). https://doi.org/10.1109/34.824819 4. Zhang, G.P.: Neural networks for classification: a survey. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 30(4), 451–462 (2000). https://doi.org/10.1109/5326.897072 5. Lam, H.K., Leung, F.H.F.: Design and stabilization of sampled-data neural-network-based control systems. In: Proceedings of the International Joint Conference on Neural Networks, vol. 4 (2005). https://doi.org/10.1109/IJCNN.2005.1556251 6. Raza, K., Alam, M.: Recurrent neural network based hybrid model for reconstructing gene regulatory network. Comput. Biol. Chem. 64, 322–334 (2016). https://doi.org/10.1016/j.com pbiolchem.2016.08.002
7. Selmic, R.R., Lewis, F.L.: Neural-network approximation of piecewise continuous functions: application to friction compensation. IEEE Trans. Neural Netw. 13(3), 745–751 (2002). https://doi.org/10.1109/TNN.2002.1000141 8. Raza, K., Hasan, A.N.: A comprehensive evaluation of machine learning techniques for cancer class prediction based on microarray data. IJBRA 11(5), 397 (2015). https://doi.org/10.1504/ IJBRA.2015.071940 9. Raza, K., Singh, N.K.: A tour of unsupervised deep learning for medical image analysis. CMIR 17(9), 1059–1077 (2021). https://doi.org/10.2174/1573405617666210127154257 10. Stepniewski, S.W., Keane, A.J.: Pruning backpropagation neural networks using modern stochastic optimisation techniques. Neural Comput. Appl. 5(2), 76–98 (1997). https://doi. org/10.1007/BF01501173 11. Ludermir, T.B., Yamazaki, A., Zanchettin, C.: An optimization methodology for neural network weights and architectures. IEEE Trans. Neural Netw. 17(6), 1452–1459 (2006). https:// doi.org/10.1109/TNN.2006.881047 12. Gepperth, A., Roth, S.: Applications of multi-objective structure optimization. Neurocomputing 69(7–9), 701–713 (2006). https://doi.org/10.1016/j.neucom.2005.12.017 13. Tsai, J.T., Chou, J.H., Liu, T.K.: Tuning the structure and parameters of a neural network by using hybrid Taguchi-genetic algorithm. IEEE Trans. Neural Netw. 17(1), 69–80 (2006). https://doi.org/10.1109/TNN.2005.860885 14. Huang, D.S., Du, J.X.: A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks. IEEE Trans. Neural Netw. 19(12), 2099–2115 (2008). https://doi.org/10.1109/TNN.2008.2004370 15. Pelikan, M., Goldberg, D.E., Cantú-Paz, E.: BOA: the Bayesian optimization algorithm 1 introduction. In: Proceedings of the Genetic and Evolutionary Computation Conference GECCO 1999, vol. 1 (1999) 16. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, vol. 4 (2012) 17. Zhou, Y., Jun, Y., Xiang, C., Fan, J., Tao, D.: Beyond bilinear: generalized multimodal factorized high-order pooling for visual question answering. IEEE Trans. Neural Netw. Learning Syst. 29(12), 5947–5959 (2018). https://doi.org/10.1109/TNNLS.2018.2817340 18. Yu, Z., Yu, J., Cui, Y., Tao, D., Tian, Q.: Deep modular co-attention networks for visual question answering. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2019 (2019). https://doi.org/10.1109/CVPR.2019.00644 19. Yu, Z., et al.: ActivityNet-QA: a dataset for understanding complex web videos via question answering (2019). https://doi.org/10.1609/aaai.v33i01.33019127 20. Zanchettin, C., Ludermir, T.B., Almeida, L.M.I.: Hybrid training method for MLP: optimization of architecture and training. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 41(4), 1097–1109 (2011). https://doi.org/10.1109/TSMCB.2011.2107035 21. Yu, J., Tan, M., Zhang, H., Tao, D., Rui, Y.: Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans. Pattern Anal. Mach. Intell. (2019). https://doi. org/10.1109/tpami.2019.2932058 22. Zhang, J., Yu, J., Tao, D.: Local deep-feature alignment for unsupervised dimension reduction. IEEE Trans. Image Process. 27(5), 2420–2432 (2018). https://doi.org/10.1109/TIP.2018.280 4218 23. Jaddi, N.S., Abdullah, S., Hamdan, A.R.: Optimization of neural network model using modified bat-inspired algorithm. Appl. Soft Comput. J. 37, 71–86 (2015). 
https://doi.org/10.1016/ j.asoc.2015.08.002 24. Jaddi, N.S., Abdullah, S., Hamdan, A.R.: A solution representation of genetic algorithm for neural network weights and structure. Inf. Process. Lett. 116(1), 22–25 (2016). https://doi. org/10.1016/j.ipl.2015.08.001
25. Sindhwani, N., Singh, M.: Performance analysis of ant colony based optimization algorithm in MIMO systems (2018). https://doi.org/10.1109/wispnet.2017.8300029 26. Sindhwani, N., Bhamrah, M.S., Garg, A., Kumar, D.: Performance analysis of particle swarm optimization and genetic algorithm in MIMO systems (2017). https://doi.org/10.1109/ICC CNT.2017.8203962 27. Gupta, T.K., Raza, K.: Optimizing deep feedforward neural network architecture: a tabu search based approach. Neural Process. Lett. 51(3), 2855–2870 (2020). https://doi.org/10. 1007/s11063-020-10234-7 28. Gupta, T.K., Raza, K.: Optimization of ANN architecture: a review on nature-inspired techniques. Mach. Learn. Bio-Signal Anal. Diagn. Imaging (2019). https://doi.org/10.1016/b9780-12-816086-2.00007-2 29. Yang, X.S.: A new metaheuristic Bat-inspired Algorithm. In: Studies in Computational Intelligence, vol. 284 (2010). https://doi.org/10.1007/978-3-642-12538-6_6 30. Li, Y., Fu, Y., Zhang, S.W., Li, H.: Improved algorithm of the back propagation neural network and its application in fault diagnosis of air-cooling condenser (2009). https://doi.org/10.1109/ ICWAPR.2009.5207438 31. Dua, D., Graff, C.: UCI Machine Learning Repository: Data Sets. School of Information and Computer Science, University of California, Irvine (2019) 32. Rodriguez-Lujan, I., Fonollosa, J., Vergara, A., Homer, M., Huerta, R.: On the calibration of sensor arrays for pattern recognition using the minimal number of experiments. Chemometrics Intell. Lab. Syst. 130, 123–134 (2014). https://doi.org/10.1016/j.chemolab.2013.10.012 33. Vergara, A., Vembu, S., Ayhan, T., Ryan, M.A., Homer, M.L., Huerta, R.: Chemical gas sensor drift compensation using classifier ensembles. Sens. Actuators B: Chem. 166–167, 320–329 (2012). https://doi.org/10.1016/j.snb.2012.01.074
ResD Hybrid Model Based on Resnet18 and Densenet121 for Early Alzheimer Disease Classification Modupe Odusami1(B) , Rytis Maskeli¯ unas1 , Robertas Damaˇseviˇcius2 , and Sanjay Misra3 1
Department of Multimedia Engineering, Kaunas University of Technology, Kaunas, Lithuania {modupe.odusami,rytis.maskeliunas}@ktu.lt 2 Department of Software Engineering, Kaunas University of Technology, Kaunas, Lithuania [email protected] 3 Department of Computer Science and Communication, Ostfold University College, Halden, Norway [email protected] Abstract. AD is a neurodegenerative condition that affects brain cells. It is a progressive and incurable disease. Early detection will save the patient’s brain cells from further damage and thereby prevent permanent memory loss. Various automated methods and procedures for the detection of Alzheimer’s disease have been proposed in recent years. To mitigate the loss to a patient’s mental health, many approaches concentrate on quick, reliable, and early disease detection. Deep learning techniques have greatly enhanced medical imaging for Alzheimer’s disease by delivering diagnostic accuracy close to that of humans; however, the existence of strongly associated features in the brain structure continues to be a challenge for multi-class classification. This study used a ResD hybrid approach based on Resnet18 and Densenet121 for the multiclass classification of Alzheimer’s Disease on an MRI dataset. Information from the two pre-trained models is combined for classification. Experiments show that the proposed hybrid model outperforms alternative techniques from existing works. The proposed ResD model gives a weighted average (macro) precision score of 99.61%. Through experiments, we show that the proposed hybrid model produces less classification error, with a Hamming loss of 0.003.
1 Introduction
AD is a neurological disease that affects more than 50 million individuals around the world [1]. The disease damages the brain irreversibly, impairing cognition, memory, and other functions, and this could result in the person’s death in the instance of brain failure. Because the signs of Alzheimer’s disease advance slowly, and because of the morphological similarity between normal cognition (NC) and mild cognitive impairment (MCI), it is difficult to make a proper diagnosis. Early © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 296–305, 2022. https://doi.org/10.1007/978-3-030-96308-8_27
detection and treatment, on the other hand, can help a patient’s health. The intermediate stage of AD is divided into three stages based on the extent of brain damage and the patient’s condition: MCI, early MCI (EMCI), and late MCI (LMCI). Slight but measurable changes in thinking ability are seen in people with MCI. 32% of people with MCI develop AD within 5 years, indicating that MCI patients have a high chance of developing AD [2]. Researchers have devoted a lot of effort to the early detection of MCI in recent decades. Magnetic resonance imaging (MRI) is a crucial aspect of the clinical examination of individuals with probable Alzheimer’s disease because it can clearly show clinicians the functional and structural alterations in the brain among the intermediate stages of AD. Accurately classifying the functional changes in the brain is essential for the early diagnosis of AD by medical practitioners. This stage is not only difficult and intricate, but it is also time-consuming. This procedure may be automated by creating computerized classification techniques for the brain alterations to aid the early diagnosis of AD using MRI. However, this is not a simple process; there are numerous difficulties connected with MRI categorization, the main challenge being a difficult clinical problem due to the occurrence of highly related traits among the moderate cognitive stages [3]. Many researchers have applied deep learning (DL) approaches to MRI analysis for multi-class classification (EMCI vs LMCI vs MCI vs AD vs NC). VGGNet, ResNet, and DenseNet are examples of pre-trained neural networks that have been effectively used in MRI analysis. Their last layer, known as a Fully Connected (FC) layer, is remodeled with the same number of inputs as the previous layer and as many outputs as the dataset’s total number of classes. Because these pre-trained networks were trained on a big benchmark dataset to tackle a problem similar to the one addressed here, they can automatically learn hierarchical feature representations. However, for multi-class classification, the features of a class might not be discriminative enough for a single pre-trained model to extract the meaningful information needed for accurate classification. The motivation behind this study is to build a reliable method for the multiclass classification of AD [4]. From previous studies, there is interference from inter-class instability [5], and MRI images are usually spatially normalized to a particular template space, which might alter AD-related pathology and lose some useful information. Hence it is often not feasible for a single network to extract enough information for accurate prediction of AD. We explore the use of aggregating features from the FC layers of different pre-trained CNNs to strengthen classification performance. To this end, we introduce a hybrid Resnet18 and Densenet121 model fine-tuned for multiclass classification. We conduct extensive tests with a number of pre-trained CNNs used in existing studies. The following are the key contributions of the proposed model: i. The proposed approach yields a weighted average (macro) precision score of 99.61% with reduced false positives for the MRI dataset. ii. The proposed architecture, which includes the aggregation of the ResNet18 FC and Densenet121 FC layers, can be used with a variety of pre-trained CNN models and results in increased performance.
2 Literature Review
Early diagnosis of AD by analysing MRI datasets with DL architectures based on pre-trained networks has been widely explored by several researchers. The authors of [6] utilized a layer-wise approach based on VGG19 for the early detection of AD; MRI scans from the ADNI database were used for model assessment, and on EMCI/LMCI the proposed model achieved an accuracy of 83.72%. The authors of [7] proposed a two-dimensional deep CNN model that trains using 3D MRI images from OASIS as input. The model not only showed increased accuracy, but also tackled the issue of class imbalance that is inherent in multiclass classification problems. The authors of [8] utilized a deep CNN by investigating two pre-trained models (Vgg16 and Inception V4) on 3D MRI images from OASIS as input; experimental results showed that Inception V4 outperforms Vgg16 with an accuracy of 96.25%. The authors of [9] presented a DL approach for AD classification from the ADNI database using Vgg16 and a support vector machine (SVM), where Vgg16 was utilized for feature extraction and the SVM for classification. On the OASIS dataset, the authors of [10] used a combination of three DenseNet pre-trained networks with various depths and transfer learning; a strong gradient flow was used in the training phase and the model performance was improved, giving an accuracy of 93.18%. Furthermore, the authors of [11] generated a Siamese CNN model based on VGG-16 to classify dementia stages. The proposed model gave a test accuracy of 99.05% on the OASIS dataset. However, preprocessing measures like normalization, skull stripping, and segmentation have complicated parameters that the model cannot handle. The authors of [12] proposed and evaluated an approach based on AlexNet for the multiclass classification of Alzheimer’s disease. For the multi-class classification of un-segmented images, the algorithm produced good results, with an overall accuracy of 92.85%; however, the authors suggested the need to investigate the efficacy of alternative cutting-edge CNN architectures. The authors of [13] proposed a DL model using ResNet18 and a Support Vector Machine (SVM) for the identification of AD in sagittal MRI images from OASIS. ResNet18 was used for feature extraction and these features were fed to the SVM for classification; the proposed model gave an accuracy of 86.81%. The authors of [6] created a layer-wise transfer learning model for AD targeting the best outcomes on binary class classification. The suggested model distinguished between normal control (NC) and AD with high accuracy but gave a low accuracy in distinguishing between the intermediate stages of mild cognitive impairment and AD. For the multiclass categorization of Alzheimer’s disease, the authors of [14] used multiple CNN-based transfer learning approaches with varied parameters. DenseNet outperformed the other pre-trained CNN models by obtaining a maximum average accuracy of 99.05% utilizing a fine-tuned transfer learning strategy. The authors noted that, by employing ensemble-based methodologies, the proposed method’s findings could be enhanced further in the future. The authors of [15] looked at the use of rs-fMRI for AD classification based on multiple stages.
The classification task was carried out with ResNet networks, and the results revealed a broad range of outcomes across the AD stages.
3 Dataset

This study used data from the ADNI database; ADNI is a long-term research project aimed at developing imaging, clinical, genetic, and biochemical indicators for the early diagnosis and tracking of AD. There are 138 fMRI scans in the dataset, with 25 CN, 25 SMC, 25 EMCI, 25 LMCI, 13 MCI, and 25 AD. The participants are above the age of 71, and each has been assigned to one of the phases based on their cognitive test results [15]. Figure 1 depicts the MRI dataset’s class distribution.
Fig. 1. MRI dataset’s class distribution before train/validation/test split
4 Proposed Approach

We discuss our proposed model in this section. Figure 2 depicts the proposed ResD hybrid model, which is based on the Resnet18 and Densenet121 models and employs a variety of hyperparameters for training, optimizing these parameters with a loss function and an SGD optimizer during training. The proposed model is made up of two parts: fine-tuning, which updates all of the model’s parameters for our new task, and a classification module. In an image classification task using a pre-trained CNN, the classification head is usually a softmax classifier. We test various common CNN designs, including ResNet18, DenseNet121, Vgg16, Inceptionv4, and Alexnet, to see which one gives the best result. They are then optimized for classification, and the best model with the highest weighted precision is selected. Resnet18 and Densenet121 offered the best precision after
trying many pre-trained CNN models for this study. The proposed model undergoes two phases of training for the fine-tuning. In the first and second phases, the Resnet18 and Densenet121 models, respectively, are fine-tuned on the training data to obtain the discriminating information for the different classes. In the classification module, the discriminating information acquired for the different classes at the FC layers of the two models is concatenated for the classification of AD.
Fig. 2. Proposed ResD hybrid model based on Resnet18 and Densenet121 models
Resnet18 produces compact features of 512 dimensions, to which a new FC layer with 512 neurons is added. Likewise, Densenet121 extracts discriminative features, and an FC layer is added that takes a 1024-dimensional input and gives a 512-dimensional output. The information from the two FC layers is concatenated and sent to the final FC layer, as sketched below. According to the experimental assessment, this feature-concatenation approach yielded information at multiple scales from the input images for AD categorization.
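A minimal PyTorch sketch of this hybrid is shown below, assuming the torchvision ResNet18 and DenseNet121 backbones; the exact fine-tuning schedule and any layer freezing described in the text are not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision import models

class ResD(nn.Module):
    """Sketch of the ResD hybrid: ResNet18 and DenseNet121 features, each
    projected to 512-D, concatenated, and classified into the AD classes."""

    def __init__(self, num_classes=5):
        super().__init__()
        resnet = models.resnet18(pretrained=True)
        densenet = models.densenet121(pretrained=True)
        self.res_backbone = nn.Sequential(*list(resnet.children())[:-1])  # 512-D pooled features
        self.res_fc = nn.Linear(512, 512)                                  # new FC with 512 neurons
        self.dense_features = densenet.features                           # 1024-D after pooling
        self.dense_fc = nn.Linear(1024, 512)                               # 1024 -> 512 FC
        self.classifier = nn.Linear(512 + 512, num_classes)                # final FC on concatenated features

    def forward(self, x):
        r = torch.flatten(self.res_backbone(x), 1)
        d = self.dense_features(x)
        d = torch.flatten(nn.functional.adaptive_avg_pool2d(nn.functional.relu(d), 1), 1)
        r = self.res_fc(r)
        d = self.dense_fc(d)
        return self.classifier(torch.cat([r, d], dim=1))
```

For a 224 × 224 input batch, `ResD()(torch.randn(2, 3, 224, 224))` returns a (2, 5) tensor of class scores.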
5 Experimental Results

The findings from the experimental analysis of our proposed model are presented in this section. To train our proposed model, we employed a variety of hyper-parameters: the categorical cross-entropy loss function, 10 epochs, a learning rate of 0.003, the SGD optimizer, a weight decay of 0.02, and a batch size of 10. An NVIDIA Corporation TU116 [GeForce GTX 1660] GPU was used to train all of the networks. The DL models were created using the PyTorch package. The proposed model is trained using a random split that divides the dataset into non-overlapping subsets of lengths 5317, 1141, and 1141 for the train, validation, and test phases. Above this number of epochs, the model stopped learning and began to overfit; a weight decay of 0.01 was utilized to avoid overfitting by penalizing large weights. The most appropriate checkpoint to utilize in model evaluation using the underlying performance metrics is selected once the model is trained. The evaluation metrics of the different pre-trained CNN models examined are listed in Table 1. Resnet18 and Densenet121 gave the most effective models, with weighted average precisions of 99.20% and 99.00%, respectively.
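A short sketch of this training setup is given below; `dataset` is a hypothetical PyTorch dataset yielding (image, label) pairs, `ResD` refers to the sketch in Sect. 4, and the split lengths and hyper-parameters follow the description above.

```python
import torch
from torch import nn, optim
from torch.utils.data import random_split, DataLoader

# `dataset` is assumed to yield (image, label) pairs for the ADNI slices.
train_set, val_set, test_set = random_split(dataset, [5317, 1141, 1141])
loaders = {name: DataLoader(split, batch_size=10, shuffle=(name == "train"))
           for name, split in [("train", train_set), ("val", val_set), ("test", test_set)]}

model = ResD(num_classes=5)                        # AD, CN, EMCI, LMCI, MCI as in Table 2
criterion = nn.CrossEntropyLoss()                  # categorical cross-entropy
optimizer = optim.SGD(model.parameters(), lr=0.003, weight_decay=0.02)

for epoch in range(10):
    model.train()
    for images, labels in loaders["train"]:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```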
Table 1. Evaluation metrics result of different pretrained models on the test dataset

Model       | Precision | Recall | F1-score
Vgg16       | 97.00     | 100    | 98.00
Densenet121 | 99.00     | 99.00  | 99.00
Resnet18    | 99.20     | 99.20  | 99.20
Inceptionv4 | 97.90     | 97.90  | 97.90
Alexnet     | 0.958     | 0.958  | 0.981
During the second stage of the proposed model, the features from Resnet18 and Densenet121 are concatenated for the classification task. The evaluation results of concatenating the various characteristics from Resnet18 and Densenet121 were used in training the proposed ResD model. By reshaping the final FC layer, the proposed model gave a better result on the test data. Validation accuracy/training accuracy and validation loss/training loss are depicted in Fig. 3. The classification results of our proposed model are shown in Table 2 for each individual class of the AD diagnosis. For the purpose of benchmarking, the performance of our proposed model was evaluated on multiclass classification. Figure 4 depicts the confusion matrix obtained for the ResD hybrid model. We also measure the Hamming loss and Jaccard score of the proposed model. To further validate our proposed model, in Table 3 we detail the 2-way (AD vs CN) classification results on the ADNI dataset and the OASIS dataset (MRI images in sagittal orientation). The confusion matrix of the proposed model on the OASIS data is depicted in Fig. 5.
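The metrics reported in Tables 2–3 and Figs. 4–5 can be computed as sketched below, reusing the (hypothetical) `model` and test loader from the earlier sketches.

```python
import torch
from sklearn.metrics import (classification_report, confusion_matrix,
                             hamming_loss, jaccard_score)

model.eval()
preds, truths = [], []
with torch.no_grad():
    for images, labels in loaders["test"]:
        preds.extend(model(images).argmax(dim=1).tolist())
        truths.extend(labels.tolist())

print(classification_report(truths, preds, digits=4))   # per-class precision/recall/F1 (Table 2)
print(confusion_matrix(truths, preds))                    # confusion matrix (Fig. 4)
print("Hamming loss:", hamming_loss(truths, preds))
print("Jaccard score:", jaccard_score(truths, preds, average="weighted"))
```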
Fig. 3. Validation accuracy/training accuracy and validation loss/training loss
302
Table 2. Evaluation metrics result of the proposed model

Classes  | Precision | Recall | F1-score
AD (0)   | 1.00      | 0.98   | 0.99
CN (1)   | 1.00      | 1.00   | 1.00
EMCI (2) | 1.00      | 1.00   | 1.00
LMCI (3) | 0.98      | 1.00   | 0.99
MCI (4)  | 1.00      | 1.00   | 1.00
Fig. 4. Confusion matrix for the proposed ResD model
Fig. 5. Confusion matrix for proposed ResD with OASIS data
6 Discussion
The main finding of our work is that, using MRI brain studies, our classifier was able to identify early mild-AD and late mild-AD patients who need treatment before AD develops. This study showed that concatenating the
Table 3. Evaluation metrics result of the proposed model on ADNI and OASIS data based on 2-way classification

Dataset     | Classes | Precision | Recall | F1-score
ADNI data   | AD (0)  | 1.00      | 1.00   | 1.00
ADNI data   | CN (1)  | 1.00      | 1.00   | 1.00
OASIS data  | AD (0)  | 0.76      | 1.00   | 0.87
OASIS data  | CN (1)  | 1.00      | 0.88   | 0.94
information extracted by the proposed Resnet18 and Densenet121 gave excellent performance compared with fine-tuning a single pretrained model, as Tables 1 and 2 show. The proposed model gave an average accuracy and loss of 99.64% and 0.026, respectively, on the test data. The precision produced by the proposed method, based on feature concatenation, is clearly the highest compared with a single pretrained model. The proposed model produced the most effective outcome on the validation data, with a precision of 100% for classes 0, 1, 2, and 4 and a weighted average (macro) precision of 99.61%. Recall and precision still need to be interpreted together, since both are proportional to the number of true positives (TP). The proposed model gave a weighted average (macro) precision of 99.67%. Although the recall is slightly higher than the precision, the margin can be neglected, as it does not indicate that the model is prone to false positives. From Fig. 4 we observe that few data points were misclassified: 4 out of 206 MCI subjects were classified as AD. This is an acceptable outcome, since in medical diagnosis it is preferable to flag a healthy person as diseased than to miss a diseased person by falsely predicting a negative. Table 2 shows that the proposed model gave a good performance for all the predicted classes. The subjects misclassified as AD can be explained by the fact that brain changes in people with MCI are close to those present in AD. Table 3 shows that the proposed model generalized well on the OASIS dataset, with 76% precision, 100% recall, and 87% F1-score in discriminating AD from CN. From Fig. 5, our proposed model misclassified few subjects as AD.
7 Comparison with Existing Studies
We chose the models listed in Table 4 from recent studies on early detection of AD to benchmark our hybrid model against. These models have been validated on the ADNI dataset of MRI images for multiclass classification. Our findings are compared with the selected existing studies in Table 4. For the multiclass classification scenario, the proposed model performs better.
Table 4. Comparison of existing models with our proposed ResD hybrid model

Ref.     | Methods                  | Classification                   | Acc (%) | Prec/MCI (%) | Prec/EMCI (%) | Prec/LMCI (%) | Rec/MCI (%) | Rec/EMCI (%) | Rec/LMCI (%)
[19]     | ResNet18 and SVM         | AD/MCI/CN (3 Way)                | 76.64   | 69.00        | –             | –             | 84.61       | –            | –
[22]     | Densenet                 | AD/MCI/CN (3 Way)                | 99.05   | –            | –             | –             | –           | –            | –
[23]     | Resnet18                 | AD/MCI/EMCI/LMCI/SMC/CN (6 Way)  | 97.88   | 100          | 99.87         | 100           | 97.40       | 97.38        | 97.43
Proposed | Resnet18 and Densenet121 | AD/MCI/EMCI/LMCI/CN (5 Way)      | 99.64   | 100          | 100           | 98.00         | 100         | 100          | 100
The study [15] outperformed our proposed hybrid model in LMCI class precision, reaching 1.00. Although study [15] also utilized fine-tuning, numerous data augmentation approaches were used there to avoid overfitting, while our proposed model relied on weight decay. The idea of weight decay is to penalize large weights; some of the discriminative LMCI features may have large weights, and some of them are penalized during classification. Although LMCI may have a different pathology from AD, our proposed model still extracted the most discriminative features, with higher classification accuracy.
8 Conclusions
DL models for the detection of AD at the prodromal stage were effectively utilized in this study. In this paper, different pre-trained models were initially assessed on the MRI dataset from ADNI to select the best performers. The best two models, namely Resnet18 and Densenet121, were then trained, and meaningful information was extracted from them to build our proposed ResD hybrid model. The extracted features were concatenated and passed to the fully connected layer to classify AD. Experiments show that the proposed ResD hybrid model can accurately identify early AD with a weighted average (macro) precision of 99.67%. Future studies will concentrate on improving the model's interpretability by visualizing the model's decisions to make them more understandable.
References

1. Nichols, E., Szoeke, C.E.I., et al.: Global, regional, and national burden of Alzheimer's disease and other dementias, 1990–2016: a systematic analysis for the global burden of disease study 2016. Lancet Neurol. 18(1), 88–106 (2019)
2. Kang, L., Jiang, J., Huang, J., Zhang, T.: Identifying early mild cognitive impairment by multi-modality MRI-based deep learning. Front. Aging Neurosci. 12, 206 (2020)
3. Jiang, J., Kang, L., Huang, J., Zhang, T.: Deep learning based mild cognitive impairment diagnosis using structure MR images. Neurosci. Lett. 730, 134971 (2020)
4. Odusami, M., Maskeliūnas, R., Damaševičius, R., Krilavičius, T.: Analysis of features of Alzheimer's disease: detection of early stage from functional brain changes in magnetic resonance images using a finetuned resnet18 network. Diagnostics 11(6), 1071 (2021)
5. Lin, W., et al.: Convolutional neural networks-based MRI image analysis for the Alzheimer's disease prediction from mild cognitive impairment. Front. Neurosci. 12, 777 (2018)
6. Mehmood, A., et al.: A transfer learning approach for early diagnosis of Alzheimer's disease on MRI images. Neuroscience 460, 43–52 (2021)
7. Nawaz, A., Anwar, S.M., Liaqat, R., Iqbal, J., Bagci, U., Majid, M.: Deep convolutional neural network based classification of Alzheimer's disease using MRI data. In: 2020 IEEE 23rd International Multitopic Conference (INMIC). IEEE (2020)
8. Hon, M., Khan, N.M.: Towards Alzheimer's disease classification through transfer learning, pp. 1166–1169 (2017)
9. Janghel, R.R., Rathore, Y.K.: Deep convolution neural network based system for early diagnosis of Alzheimer's disease. IRBM 42(04), 258–267 (2021)
10. Islam, J., Zhang, Y.: Understanding 3D CNN behavior for Alzheimer's disease diagnosis from brain PET scan (2019)
11. Mehmood, A., Maqsood, M., Bashir, M., Shuyuan, Y.: A deep Siamese convolution neural network for multi-class classification of Alzheimer disease. Brain Sci. 10(2), 84 (2020)
12. Maqsood, M., et al.: Transfer learning assisted classification and detection of Alzheimer's disease stages using 3D MRI scans. Sensors 19(11), 2645 (2019)
13. Puente-Castro, A., Fernandez-Blanco, E., Pazos, A., Munteanu, C.R.: Automatic assessment of Alzheimer's disease diagnosis based on deep learning techniques. Comput. Biol. Med. 120, 103764 (2020)
14. Ashraf, A., Naz, S., Shirazi, S.H., Razzak, I., Parsad, M.: Deep transfer learning for Alzheimer neurological disorder detection. Multimedia Tools Appl. 80(20), 30117–30142 (2021)
15. Ramzan, F., et al.: A deep learning approach for automated diagnosis and multiclass classification of Alzheimer's disease stages using resting-state fMRI and residual neural networks. J. Med. Syst. 44(2), 1–16 (2019)
Quantum Ordering Points to Identify the Clustering Structure and Application to Emergency Transportation

Habiba Drias1(B), Yassine Drias2, Lydia Sonia Bendimerad1, Naila Aziza Houacine1, Djaafar Zouache3, and Ilyes Khennak1

1 Laboratory of Research in Artificial Intelligence LRIA, USTHB, Algiers, Algeria
[email protected], {lbendimerad,nhouacine,ikhennak}@usthb.dz
2 University of Algiers, Algiers, Algeria
[email protected]
3 University of Bordj Bou Arreridj, Bordj Bou Arreridj, Algeria
Abstract. Studies exploring the use of artificial intelligence (AI) and machine learning (ML) are enjoying undeniable success in many domains. On the other hand, quantum computing (QC) is an emerging field investigated by a large and expanding body of research in recent years. Its high computing performance is attracting the scientific community in search of computing power. Hybridizing ML with QC is a recent concern that is growing fast. In this paper, we are interested in quantum machine learning (QML) and more precisely in developing a quantum version of a density-based clustering algorithm, namely the Ordering Points To Identify the Clustering Structure (QOPTICS). The algorithm is evaluated theoretically, showing that its computational complexity outperforms that of its classical counterpart. Furthermore, the algorithm is applied to cluster a large geographic zone with the aim of contributing to the problem of dispatching ambulances and covering emergency calls in case of a COVID-19 crisis. Keywords: Quantum machine learning · Unsupervised clustering · OPTICS algorithm · Quantum procedures · Quantum OPTICS · Application
1 Introduction
Machine learning has been investigated for more than four decades, resulting in a rich repertoire of interesting tools. In order to deal with huge and complex data, these tools should be highly efficient. One recent and original way to tackle the computational intractability of big data is the use of quantum computing. This technology is developing fast in various disciplines such as neural networks and swarm intelligence. Recently, quantum machine learning (QML) has attracted great interest in the scientific community, as it can speed up classical machine learning algorithms that deal with big data [2,3,11]. On the other hand, and in
order to allow these algorithms to run properly, researchers are also working on quantum computer design and implementation [12]. In this paper, we are interested in the Ordering Points To Identify the Clustering Structure (OPTICS) density-based clustering algorithm, knowing that it has practical applications in numerous domains. We focus in particular on designing a quantum version of this algorithm. Quantum computing notions are first explored in order to meet our objective. Concepts such as superposition of states, interference and entanglement are expected to provide an exponential speed up for solving complex scientific and industrial problems. Geographic information systems (GIS) nowadays exploit sophisticated tools such as density-based clustering techniques to cluster regions with arbitrary shape. The objects determined to lie outside the clusters are of low density and hence represent noise. Among the most popular algorithms in this category is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [8]. Given the minimum desired density for a cluster and the maximum distance that should separate two objects in a cluster, DBSCAN clusters the objects in just one scan. The major drawback of this algorithm resides in its sensitivity to the input parameters, which are usually unknown and consequently hard to determine, especially for real-world and high-dimensional data sets. In order to achieve the right clustering, intensive and long experiments must be performed to tune their values. As an alternative, Ordering Points To Identify the Clustering Structure (OPTICS) [10] is an algorithm that can overcome such limitations. OPTICS is based on similar principles, with the advantage of monitoring a broad range of parameter settings. For this reason, it can be seen as an extension of DBSCAN allowing automatic and interactive cluster analysis. Besides, it creates a file containing an ordering of clusters that can be viewed graphically. From these outcomes, clustering information such as cluster centers, arbitrary-shaped clusters and the clustering structure can be extracted. The aim of this study is to devise a quantum version of OPTICS called QOPTICS and to show its added value relative to its classical counterpart from a theoretical point of view. This paper is organized in five sections. The next one presents preliminaries on the OPTICS algorithm and on basic and popular quantum subroutines. The third section describes the proposed quantum OPTICS algorithm alongside its computational complexity. The fourth section continues with experimental results obtained from an application to a real-life problem, which consists in clustering geographical zones containing a high density of hospitals in the Kingdom of Saudi Arabia. The last section concludes the article by highlighting its major contribution and future open questions.
2 Preliminaries
Our proposal consists in empowering the OPTICS algorithm with quantum computing in order to reduce its computational complexity. Therefore, OPTICS is first recalled, and then some basic and popular quantum subroutines are presented in this section.
2.1 OPTICS Algorithm
OPTICS is an unsupervised density-based clustering algorithm. It consists in grouping the similar objects of an unordered set O = {O1, O2, ..., On} in the same cluster and the dissimilar ones in different clusters. The similarity between objects is measured by a distance function. As for DBSCAN, OPTICS has two input parameters: the maximum distance between objects and the minimum number of objects in a cluster. In addition to clustering the objects while considering both input parameters, OPTICS also creates inner clusterings by taking into account only the second parameter. This way, it discovers additional clusterings, nested inside the outer one, that DBSCAN could detect only with different parameter settings. Moreover, it computes an ordering of the objects, associating with each object its core distance and its reachability distance. This ordering allows the extraction of all clusterings with respect to any distance smaller than the distance parameter. OPTICS is outlined in Algorithm 1.
Algorithm 1. OPTICS Algorithm
input: D, eps, MinPts   /* D is the dataset; eps and MinPts are parameters */
output: Ordered list of clusters including objects with their reachability distance
1:  for each point p of D not processed do
2:      p.processed = true
3:      eps-Neighbors = eps-Neighbors(p, eps)
4:      p.reachability = undefined
5:      p.core = Core-Distance(p, MinPts, eps-Neighbors)
6:      insert p (p.core, p.reachability) in OrderedList
7:      Seeds = empty
8:      if p.core ≠ undefined then
9:          for each q in eps-Neighbors and not processed do
10:             Seeds = Seeds ∪ {q, q.reachability}
11:         end for
12:     end if
13:     while Seeds not empty do
14:         find the object p that has the minimum reachability distance
15:         Seeds = Seeds - {p}   /* retrieve p from Seeds */
16:         p.processed = true
17:         eps-Neighbors = eps-Neighbors(p, eps)
18:         p.reachability = undefined
19:         p.core = Core-Distance(p, MinPts, eps-Neighbors)
20:         insert p (p.core, p.reachability) in OrderedList
21:         if p.core ≠ undefined then
22:             for each q in eps-Neighbors and not processed do
23:                 Seeds = Seeds ∪ {q}
24:             end for
25:         end if
26:     end while
27: end for
28: output (OrderedList, p, p.core, p.reachability)
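For readers who want a concrete reference point, the same quantities (processing order, core distances and reachability distances) are exposed by scikit-learn's classical OPTICS implementation. The call below is only an illustration with a toy dataset and hypothetical parameter values; it is not part of the quantum algorithm developed in the following sections.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Toy 2-D dataset; in this paper the objects are hospital positions.
X = np.random.RandomState(0).rand(100, 2)

optics = OPTICS(min_samples=5, max_eps=0.3, metric="euclidean").fit(X)
print(optics.ordering_)          # order in which objects were processed
print(optics.core_distances_)    # core distance of each object
print(optics.reachability_)      # reachability distance of each object
```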
The computational complexity of OPTICS is O(n²).

2.2 Basic Quantum Concepts and Techniques
In this subsection, the important concepts and techniques of quantum computing are introduced.

Quantum State. A quantum state represents a position in an n-dimensional Hilbert space defined by basis states {b_1, b_2, ..., b_n}. A quantum state is then represented by its amplitudes {α_1, α_2, ..., α_n}, which carry statistical information about its position. Formally, it is represented according to the Dirac notation by a vector |Ψ⟩ = Σ_i α_i |b_i⟩, where the α_i are complex numbers such that Σ_i |α_i|² = 1.

Superposition. A superposition is a linear combination of all the basis states; it encompasses several states simultaneously. This concept is central in quantum computing, as it corresponds to the classical parallelization operation. A qubit (quantum bit) is a quantum state that represents the smallest unit of quantum information storage. It consists of a superposition α_1|0⟩ + α_2|1⟩ of the two basis states |0⟩ and |1⟩ such that |α_1|² + |α_2|² = 1. An n-qubit state is a superposition of 2ⁿ basis states, so when an operator is applied to the set of qubits, it is applied to the 2ⁿ states at the same time, which is equivalent to a parallel calculation on 2ⁿ data.

Measurement. A measurement consists in obtaining one state component with a probability proportional to its weight; it is the operation that turns a quantum state into classical information. For instance, when measuring a qubit, the outcome 0 is obtained with probability |α_1|² and the outcome 1 with probability |α_2|².

Quantum Register. As a quantum state, a quantum register of n qubits belongs to a Hilbert space with 2ⁿ basis states. Using the Dirac notation, it is represented as |Ψ⟩ = Σ_{i=0..2ⁿ−1} α_i |i⟩ under the normalization condition Σ_i |α_i|² = 1. The basis state |i⟩ expresses the binary encoding of i.

Hadamard Transform. The Hadamard gate transforms the pure state |0⟩ into the superposed state (1/√2)|0⟩ + (1/√2)|1⟩ and the pure state |1⟩ into the superposed state (1/√2)|0⟩ − (1/√2)|1⟩. A measurement then has the same probability of producing 1 or 0. The Hadamard transform Hₙ is the application of a Hadamard gate to each qubit of an n-qubit register in parallel.
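These notions can be made concrete with a few lines of linear algebra. The snippet below simulates a one-qubit Hadamard transform and a measurement purely as an illustration of the definitions above; it is not part of QOPTICS.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])                     # |0>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # Hadamard gate

psi = H @ ket0                                  # (|0> + |1>)/sqrt(2)
probs = np.abs(psi) ** 2                        # measurement probabilities |alpha_i|^2
outcome = np.random.choice([0, 1], p=probs)     # sample one measurement result
print(psi, probs, outcome)                      # equal 0.5 probability for 0 and 1
```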
The Grover's Algorithm. Grover's algorithm [9] is a quantum algorithm that searches for an element x₀ belonging to an unordered set S of n elements with O(√n) time complexity and O(log n) space complexity. This time complexity is more interesting than that of the classical algorithm, where the best complexity to perform such a search is O(n), since in the worst case all the elements are tested before encountering x₀. The spatial complexity is also attractive, as in the classical algorithm it is O(n). In [4], the authors provide a quantum technique based on Grover's algorithm for counting the number of occurrences of an element in a list, with the same computational complexity O(√n).

The Quantum Amplitude Amplification Algorithm. The quantum amplitude amplification algorithm [5] is a generalization of Grover's algorithm. It does not start in a uniform superposition as Grover's algorithm does, but initializes the states when information about them is available. If the probability of finding an element a of S is p_a (and not 1/n as in Grover's algorithm) and the probabilities of all elements sum to one, i.e. Σ_{i=1..|S|} p_i = 1, the computational complexity is O(1/√p_a).

The Search of the Minimum. The authors in [7] devised a quantum algorithm that determines the minimum of an unordered set with a computational complexity of O(√n), and in [6] Durr et al. developed a technique called quant find smallest values to find the c closest neighbors of a point in O(√(c × n)) time. These techniques are commonly used in QML, which is developing very rapidly. Despite these important advancements, QML remains rich in open questions, such as a quantized version of the OPTICS algorithm, which is addressed in this paper.
3 Quantization of OPTICS Algorithm
The quantum OPTICS algorithm is called QOPTICS, as aforementioned. It operates on the same input as OPTICS but in a different structure. OPTICS uses as input, in addition to the input parameters, the distance matrix that contains the distances separating any two objects of the dataset. In QOPTICS, the computation of the distance between two objects is performed by a black box, also called an oracle D. The quantum oracle D is shown in Fig. 1. |p⟩ and |q⟩ are one-state inputs representing the index of object p and the index of object q, respectively. r is a quantum register set to 0 in order to obtain dist_{p,q}, the distance between object p and object q.

3.1 QOPTICS Algorithm
In order to quantize the OPTICS algorithm, quantum subroutines such as those introduced in Subsect. 2.2 are exploited in an optimal manner. In Algorithm 1,
Fig. 1. The distance oracle
the actions that can be quantized are Instruction 3, which is repeated in 17, and Instruction 5, repeated in 19. Instruction 3 is a call to a procedure that builds the eps-neighborhood of object p, and Instruction 5 calculates the core distance of object p and the reachability distance of the objects belonging to Seeds. These two procedures are quantized and correspond to Algorithm 2 and Algorithm 3, respectively. Algorithm 2 produces the eps-neighbors {q} of an object p such that dist_{p,q} ≤ eps. The final steps of Algorithm 3 are:

5: using the oracle O defined in Eq. 1, determine q.reachability for all q in eps-neighbors
6: return p.core, q.reachability for q in eps-neighbors

q.reachability = p.core,       if dist_{p,q} ≤ p.core
q.reachability = dist_{p,q},   otherwise                                  (1)

Theorem. The running time of QOPTICS is O(log n × √(MinPts × n)).
Proof. The Quant-eps-neighbors algorithm calls the Quant-Min-Search subroutine a number of times equal to the number of eps-neighbors. The worst-case access time to eps-neighbors is O(log n), as the maximum size of eps-neighbors is n. The computational complexity of Quant-Min-Search is O(√n), so the computational complexity of Quant-eps-Neighbors is O(log n × √n). Algorithm 3 calls quant find smallest values in O(√(MinPts × n)) according to [6]. The computational complexity of the Quant-Core-Distance algorithm is then O(√(MinPts × n)). QOPTICS calculates for each point its eps-neighborhood with a complexity of O(log n × √n). As the worst case of the eps-neighborhood considered is n, this complexity also stands for all the objects. The algorithm also computes for each object its core distance and its reachability distance with a complexity of O(√(MinPts × n)). As the points can be accessed in O(log n), its time complexity is then O(log n × √(MinPts × n)).
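As a point of comparison for the quantized subroutines, the classical computation that Eq. (1) describes fits in a few lines. The function below is only a sketch (the names are ours, not from the paper): it returns the core distance of p and the reachability distance of every object q in its eps-neighborhood.

```python
import math

def core_and_reachability(p, neighbors, dist, min_pts):
    """Classical counterpart of Eq. (1): core distance of p and the
    reachability distance of each object q in its eps-neighborhood."""
    if len(neighbors) < min_pts:
        core = math.inf                                   # 'undefined' core distance
    else:
        core = sorted(dist(p, q) for q in neighbors)[min_pts - 1]
    # Eq. (1): reachability is p.core when dist(p, q) <= p.core, else dist(p, q).
    reach = {q: max(core, dist(p, q)) for q in neighbors}
    return core, reach
```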
4 Experiments
QOPTICS is designed to help solve an NP-hard problem, namely ambulance dispatching and emergency calls covering (ADECC). Ambulance dispatching and zone covering are managed jointly in order to optimize not only the transportation time but also the covering of all the calls. An application in progress consists in simulating the case of the emergency medical services (EMS) of the Kingdom of Saudi Arabia (KSA) in the context of the COVID-19 crisis. QOPTICS is used in a first phase of the application to locate the zones that concentrate an important number of hospitals. Once these regions are identified, ADECC will be addressed using an intelligent optimization method. This second phase will be the subject of another study.
4.1 Dataset Construction
The objects to be clustered are the hospitals of KSA, and the resulting outcome is the set of spatial clusters representing dense regions of hospitals. The dataset is a spatial database containing the geographic positions of the hospitals. It is built using the data available on the Open Data portal of Saudi Arabia [1]. The data underwent a preprocessing step in which redundant data, maternity centers and mental patient centers were removed, and the GPS position of each hospital according to Google Maps was added. An instance of the dataset is composed of a hospital and its geographic coordinates. As we are dealing with geolocation data, the spherical distance described in Eq. 2 is used to calculate the distance between two hospitals h1 and h2.

Distance(h1, h2) = R × arccos(sin(h1.latitude) × sin(h2.latitude)
                  + cos(h1.latitude) × cos(h2.latitude) × cos(h2.longitude − h1.longitude))   (2)
R is the earth radius; its global average value is 6371 km. Once the distances between hospitals are calculated, another dataset containing these distances is generated to be used as input for QOPTICS.
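Equation (2) translates directly into code. The helper below is a sketch (function and variable names are ours) that computes the spherical distance in kilometres between two points given their coordinates in degrees; the clamping of the cosine is a numerical safeguard, not part of the equation.

```python
import math

EARTH_RADIUS_KM = 6371.0

def spherical_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance of Eq. (2); inputs are latitudes/longitudes in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return EARTH_RADIUS_KM * math.acos(min(1.0, max(-1.0, cos_angle)))

# Example with approximate coordinates of Riyadh and Jeddah (illustrative only).
print(spherical_distance(24.7136, 46.6753, 21.4858, 39.1925))  # roughly 850 km
```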
4.2 Results
As all hospitals should be exploited for hosting the calling patients, outliers are not allowed; MinPts is therefore set to 1. To fix eps, the distance that determines the eps-neighbors, experiments were performed with eps values of 5, 10, 15, 20, 25, 30, 35 and 40 km. The experimental outcomes are shown in Table 1. The best setting is the one that minimizes both the number of clusters and eps, in order to keep the calls close to the hospitals. According to the results presented in Table 1, the most appropriate setting is eps = 35, which yields 75 clusters. The dispersion of the 75 clusters on the map of Saudi Arabia is shown in Fig. 2. Each cluster is represented by a distinct color. The figure also shows the "reachability plot" resulting from clustering with the best parameterization found. It shows the structure of the data in clusters with different densities. The correspondence between the representation of the 50th cluster on the map (which includes 43 hospitals) and on the reachability plot is also highlighted.

Table 1. Parameters' setting.

eps (km)   | 5   | 10  | 15  | 20 | 25 | 35 | 40
#clusters  | 144 | 119 | 108 | 98 | 91 | 75 | 72
Fig. 2. A mapping of a cluster between its geographic position and the reachability plot
5 Conclusion
In this paper, a quantum algorithm for Ordering Points To Identify the Clustering Structure is designed. Its computational complexity is calculated and shown to be reduced relative to its classical counterpart. Experiments were performed for an application to emergency transportation: clustering of hospitals in KSA has been carried out prior to dispatching ambulances in case of a COVID-19 crisis. For future work, the proposed quantum algorithm will be tested on calls during a COVID-19 crisis. The clusterings generated respectively for hospitals and calls will be exploited to address the issue of dispatching ambulances and covering zones. Acknowledgement. We would like to express our special thanks of gratitude to the Prince Mohammad Bin Fahd Center for Futuristic Studies for the support of this work.
References

1. The open data portal of Saudi Arabia (2021). https://data.gov.sa/Data/en/dataset/accredited-health-service-providers march2021
2. Bharti, K., Haug, T., Vedral, V., Kwek, L.C.: Machine learning meets quantum foundations: a brief survey. AVS Quantum Sci. 2(3), 034101 (2020). https://doi.org/10.1116/5.0007529
3. Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning. Nature 549(7671), 195–202 (2017). https://doi.org/10.1038/nature23474
4. Boyer, M., Brassard, G., Høyer, P., Tapp, A.: Tight bounds on quantum searching. Fortschr. Phys. 46(4–5), 493–505 (1998). https://doi.org/10.1002/(SICI)1521-3978(199806)46:4/5<493::AID-PROP493>3.0.CO;2-P
5. Brassard, G., Høyer, P., Mosca, M., Tapp, A.: Quantum amplitude amplification and estimation. Quantum Comput. Inf. 305, 53–74 (2002). https://doi.org/10.1090/conm/305/05215
6. Durr, C., Heiligman, M., Hoyer, P., Mhalla, M.: Quantum query complexity of some graph problems. SIAM J. Comput. 35(6), 1310–1328 (2006). https://doi.org/10.1137/050644719
7. Durr, C., Hoyer, P.: A quantum algorithm for finding the minimum. arXiv:quant-ph/9607014 (1996)
8. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD 1996, pp. 226–231. AAAI Press (1996)
9. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC 1996, pp. 212–219. Association for Computing Machinery, New York (1996). https://doi.org/10.1145/237814.237866
10. Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: Proceedings of the ACM SIGMOD 1999 International Conference on Management of Data, Philadelphia, PA, pp. 656–669. ACM (1999)
11. Wittek, P.: Quantum Machine Learning: What Quantum Computing Means to Data Mining (2014)
12. Zahorodko, P., Semerikov, S., Soloviev, V., Striuk, A., Striuk, M., Shalatska, H.: Comparisons of performance between quantum-enhanced and classical machine learning algorithms on the IBM quantum experience. J. Phys.: Conf. Ser. 1840, 012021 (2021). https://doi.org/10.1088/1742-6596/1840/1/012021
Patterns for Improving Business Processes: Defined Pattern Categorization

Nesrine Missaoui1,3(B) and Sonia Ayachi Ghannouchi2,3

1 University of Sousse, IsitCom, 4001 Hammam Sousse, Tunisia
2 University of Sousse, Higher Institute of Management of Sousse, Sousse, Tunisia
3 University of Mannouba, ENSI, RIADI Laboratory, Mannouba, Tunisia
Abstract. Business process improvement (BPI) is a theme that is widely studied in business process management. It permits the continuous improvement of a process by considering the impact of changes on its performance. One of the most highly cited solutions for BPI is the use of patterns: they allow presenting a set of reusable models aiming to transform the as-is model into an improved model. Our research work is devoted to the act of improvement of business processes. More precisely, we aim to put forward an approach for BPI based on a set of improvement patterns, classified according to their application context. The purpose of this paper is to present our proposed pattern categories by detailing the structure of two types of patterns and demonstrating the way a selected pattern in one of these categories would be applied. The definition of these patterns was primarily based on a pattern meta-model describing the structure of BPI patterns. It was defined in our previous research work where we identified a general meta-model and four specific pattern models applicable to specific contexts. To validate their usefulness, we choose to apply the proposed approach to the process of examining patients in an emergency service. Keywords: Business process improvement · Business process improvement patterns · Pattern categorization · Process adaptation · Process variability
1 Introduction

Improving business processes is one of the main preoccupations within the business process management (BPM) discipline. It is deemed the most important task to achieve within an organization, with the purpose of enhancing the performance of its processes. Business process improvement (BPI) is used to analyze and improve existing business processes (BP) by redesigning the process without creating a new version of the model, which permits better control of its model quality as well as of its execution time. This is done by using techniques that describe the way improvement is conducted within BPM projects [2]. Among these techniques, pattern use is highly recommended [3, 8, 18]. In the BPI domain, a pattern is defined as a set of reusable models applied as a step of process modification with the intention of better presenting the act of improvement for a BP, i.e. the transformation of the as-is model into a to-be model representing an improved
model of the process [3]. Several approaches adopted the use of patterns through the improvement process such as Falk et al. [4], Höhenberger et al. [6], Kim et al. [7] and Weber et al. [19]. All of these approaches have presented patterns dealing with the improvement of a process in different aspects (continuous improvement, process change, process automation, process modelling, etc.). This work falls into the process improvement domain where we aim at presenting a pattern categorization regrouping two types of patterns: variability patterns and adaptation patterns. Each proposed pattern is applied according to a specific context e.g. to deal with modifications in the structure of a process model an adaptation pattern needs to be chosen. Also to present a new version of a process, a variability pattern needs to be selected. To demonstrate the use of the defined patterns, we will present an example of applying BPI pattern(s) into the process of patient examination and treatment in the emergency service. The remainder of this paper is structured as follows. Section 2 gives an overview of proposed patterns in literature for BPI domain. In Sect. 3 we detail the process followed to define our patterns. Afterwards, an illustration of pattern application and implementation is presented in Sect. 4. Section 5 presents the discussion section comparing what is already proposed in literature with our work. Finally, Sect. 6 concludes the paper and underlines some implications for further research.
2 Related Work In literature, patterns in process improvement were defined based on a number of measures/procedures presenting their structure and the way to apply them. We focused through this review, on identifying the procedure applied to define patterns for each studied paper as well as the way they have been classified. Among these studies, best practices employed to redesign a BP were proposed by [14]. To define these practices, the authors were based on a literature review and a case study presenting their experience on using the proposed patterns. The work of [7] focuses on change patterns classified based on the control flow perspective of a BP and are regrouped into three types: extend/delete patterns, merge/change patterns, and split/change patterns. Another study proposed by [19] introduced a set of change patterns enhancing the flexibility of a process. They were classified based on the flexibility requirement where two categories were proposed: patterns dealing with exceptions and patterns supporting deferred decisions at run-time. Improvement patterns were proposed by [4]. The authors proposed a list of BPI patterns derived from a study of BPM literature that was mainly identified according to a study of notations describing the structure of patterns. Patterns proposed by [11] focused on optimizing BPs by proposing a catalogue of optimization patterns. It was a result of a study of BPM literature and interviews with production engineers and BPM consultants. Meanwhile, the work of [6] aimed to detect weaknesses within a process using a set of weakness patterns, identified based on a study of a collection of BP models retrieved from different projects in the field of public administration. Another research study presented by [1] concentrated on the aspect of process configuration. They proposed a set of patterns dealing with variability in BPs. These patterns
were derived from a variability-specific constructs language where three categories of patterns were introduced: insertion, deletion, and modification patterns. In the same context, [17] proposed an approach to enhance collaborative processes (CP) flexibility by applying a set of adaptation patterns. The work handles variability within CP by allowing the management of various versions of a process. The defined patterns were derived from the literature and adjusted to meet the goal of the proposed approach. Each of these approaches tackles improvement for a specific goal (e.g. managing process versions, collaborative processes, detecting weaknesses, etc.). However, the majority of the studied works generally focus on one single aspect to derive the category of their patterns. They do not consider the possibility of studying and integrating several aspects to classify their patterns. As an example, [19] targeted the control flow aspect within a process, [1, 17] focused on managing process variability, and [6] dealt with weakness detection. In our work, we focus on the flexibility perspectives of the BPI technique; we consider that process efficiency and process effectiveness can be obtained by allowing the process to adapt to any type of changes (predictable or unpredictable) [9]. So, the definition and classification of our pattern categories mainly consider the flexibility requirement where we focused on two types of patterns: Adaptation patterns and Variability patterns. Each type regroups a list of patterns applied according to specific context where we were based on three types of operators: Add, Replace, and Modify. The definition of these categories is inspired by the studied literature to which we attempt to identify possible improvement by completing, enriching, or replacing existing patterns in order to create a list of improvement patterns favouring good flexibility of the process.
3 Proposed Pattern Categorization This paper focuses on introducing a BPI pattern categorization by highlighting the definition of two types of patterns: Variability patterns and Adaptation patterns. We have mentioned above that the definition of these patterns was essentially based on a pattern meta-model, presented in [10]. The meta-model presents attributes facilitating the definition of a BPIP such as the name of the pattern, the context, the solution, etc. We will present in the following section the proposed pattern categorization, regrouping a set of improvement patterns defined according to their application context. 3.1 Adaptation Patterns The adaptation requirement presents the capacity of a process to deal with modifications affecting its structure, caused by emerging events that are triggered by either internal or external factors which causes inadequacy between what should happen (i.e. real-world process) and what is already happening. In general, process adaptation is triggered by situations that were not foreseen during process modeling but arise during process execution. It can be presented as problems related to processing errors (failed activity, falsified data, missed information, etc.) or exceptions handling (resource unavailability, deadline expiration, external event, etc.). In literature, there are mainly two types of BP adaptations: vertical adaptation and horizontal adaptation [13]. The first refers to modifications made at service-level which
concerns processes executed using web services. In this case, adaptation aims to improve the quality of the service. Meanwhile, horizontal adaptation considers modifying the structure of the process i.e. adding and/or removing a process element in order to change its behaviour. It allows adjusting the process to new needs by only modifying the process model. These changes can be applied by either using high-level changes e.g. adding a new process fragment or using low-level changes by adding or deleting a single node (also called primitive changes) [15]. In our case, we adopted the horizontal adaptation of a process where the proposed patterns focus on improving the structure of the process by introducing a set of highlevel changes into the model. Three categories of patterns are considered: Add a process fragment, replace a process fragment and modify a process fragment. Table 1 gives detailed information about these categories where each is presented according to three attributes: name, description, and proposed patterns. The presentation of these patterns was based on the Business Process Modelling Notation (BPMN), a standard for business process modelling that is widely used by professionals in BPM projects e.g. [2] and [11]. It helps to have a clear representation of a BP by covering all aspects of the process. Figure 1 shows the definition of one of the presented adaptation patterns. The pattern allows substituting a fragment of the process by an event. The fragment can be a task, an activity, a sub-process, etc. It generates a modification to the structure of the model by deleting the selected fragment and replacing it by a new one where, in this case, two consecutive tasks (B and C) are replaced by an intermediate event. Table 1. Proposed pattern categorization for adaptation context Adaptation pattern categorization
Description
Proposed patterns
Add a process fragment
Presents a set of patterns for adding new elements to a process fragment
Add a task Add an event Add a gateway Add two connected tasks Add a decisional process fragment Add a parallel process fragment
Modify a process element attribute
Permits changing the attributes of added elements or those related to the definition of a task (name, data, resources, etc.)
Modify task attributes Change attribute
Replace a process fragment Regroups a set of patterns used to change an element by another or reorganize the order of presented elements
Swap activities Replace a fragment by an event Replace a sub-process by a task Replace a task by an event Replace an intermediary event by a task
3.2 Variability Patterns Variability requirement represents processes that are identical in some ways but differ in other parts. It regroups different process variants that are handled based on welldetermined context. Variability is presented at both design-time and/or run-time and applied essentially by configuring the process. It can follow two types of approaches: single model approach and multiple model approach [20]. Single model approaches capture the different variants of a process in one single model using conditional branching or labels. This results in creating a process family repository containing large variants of process models. The second approach maintains and defines each process variant in a separate process model i.e. various models of the same process are created. In our work, we have chosen the single model approach to define the structure of our patterns where several tools/techniques dealing with this type of variability have been proposed. The work of [16] classified them into four categories regrouping different variability approaches: activity specialization, element annotation, node configuration, and fragment customization approaches. To define our patterns, we have adopted two types of approaches: Provop approach from fragment customization [5] and PESOA approach from activity specialization [12]. These approaches offer the possibility of configuring a process by restriction i.e. all process variants are presented in the process model in which configuration is applied by either restricting or extending the behaviour of the customizable process model.
Fig. 1. Pattern definition: replace a process fragment
The defined list of patterns regroups three categories: specify a variant point, restrict a process element, and substitute a process element. Each category comprises a set of patterns that permits to either restrict the behaviour of a process or extend the structure of the model. This is done by specifying the variant point within a process where we
Patterns for Improving Business Processes: Defined Pattern Categorization
321
used the stereotype concept within BPMN elements (mainly activity) in order to discover variations. The use of these patterns will help in configuring the process by first identifying variation points within the process and then choosing the appropriate pattern to apply according to the context or the situation at hand. Table 2 details the structure of the defined patterns based on the name of the category, its description and the list of the proposed patterns. Table 2. Proposed pattern categorization for variability context Variability pattern categorization
Description
Proposed patterns
Specify a variant point
Distinguish the commonalities and the differences within a process model in order to define process variants
The pattern is presented in the form of a stereotype, regrouping four types: , , , and
Restrict a process element (Aux stereotype)
These patterns modify the behaviour of a process element (restricting its execution)
Skip a process element Block a process element
Substitute a process element (Sub stereotype)
Regroups patterns used to extend the process model
Add a variant Replace a task by a gateway Move a process element Modify element attributes
Figure 2 details the structure of the pattern “skip a process element”. In this case, the element labeled by the Skip stereotype will be omitted from being executed for a certain number of instance(s). However, it will always be presented in the process model. This example shows that task b is presented as a skipped task where it will not be executed for three consecutive instances. After these instances, it will be reactivated and executed for the following instances. The main advantage of such a pattern is that it allows configuring the process model by generating new variants without deleting any element of the initial model.
322
N. Missaoui and S. A. Ghannouchi
Fig. 2. Pattern definition: skip a process element
4 An Illustrative Example for Pattern Application Using the presented pattern categorization, we will demonstrate the way to apply such patterns given the application context. To do so, we have taken as an example the process of handling medical examinations in a public hospital. It describes the different activities required to examine a patient within the emergency service in a public hospital. In the present paper, we targeted the workflow of urgent examination where Fig. 3 describes the different activities performed in the process in which two types of conditions could occur: patient in good shape or patient with serious injuries. For serious injuries, two situations can appear: critical and not critical. Patients with a critical state require an immediate examination in a surgery room in order to present a quick diagnosis about the state of the patient and the required intervention to do. However, during surgery, some complications may appear resulting in changing the course of the surgical operation. This requires adapting the process to these changes by adding some additional activities to the model. Hence, complementary tests, in the form of x-rays, must be immediately carried out during the operation. This results in adapting the process to a certain/sudden event by adding new elements to the model allowing improving the performance of the process without completely modifying its structure. The introduced situation falls into the context of process adaptability in which modifications can be implemented to the model to respond to new needs that occurred while executing the process. Thus, high-level changes have been administrated by adding a new fragment to the process model (fragment in red color in Fig. 3). This allows modifying the same model which facilitates the handling of different situations occurring while executing the process. The applied pattern is a combination of two types of patterns: add a gateway and add two connected tasks which allow improving the quality of the model more specifically the correctness and completeness of the model compared to the reality. In
Patterns for Improving Business Processes: Defined Pattern Categorization
323
fact, the addition of the fragment allows validating the model compared to the real-world process and to include other situations that can occur in the reality. Also, the application of these patterns permits presenting an improved model which is comprehensible by the user and more specifically by the medical staff and thus facilitating the decision-making process.
5 Discussion The paper presents the proposed pattern categorization regrouping two types of patterns: Adaptation patterns and Variability patterns. The definition of these patterns was based on studying the literature and identifying its limits in order to define our list of BPIP. The defined categorization includes a set of patterns classified, according to the objectives to achieve. In fact, applying a given pattern mainly relies on one among three operations: addition, replacement, and modification. The contribution of our work is that it focuses on improving existing process models in order to enhance their quality and promote the flexibility of a process. The main difference between our study and what we found in literature is the fact that we consider two aspects of process flexibility whereas other studies focus only on one aspect. Also, the structure of the defined categories can be considered as a means to appropriately choose a particular pattern to apply based on the proposed pattern meta-model. This helped us to present a clear definition of each category and present a set of BPIP that can be easily applied by the end-user. As with every research, our work still has limitations. The first limitation is the incompleteness of the suggested categories since we only focused on two aspects: adaptation and variability. This can be justified by the fact that these two aspects are the most important to take into consideration while improving a process where the defined patterns can be manipulated and adjusted to be used in other contexts. For example, adaptation can be applied to the context of loosely specified processes where changes can be adopted to improve the process by adding new fragments to the model. Moreover, the work still lacks an implementation of the proposed BPIP where we are in the process of developing a prototype for process improvement allowing creating a pattern database and applying these patterns based on the context of improvement.
The fragment is added after the task ‘Perform surgical operation’
Fig. 3. Workflow for the process of patient examination: applied patterns
324
N. Missaoui and S. A. Ghannouchi
6 Conclusion and Future Works The main purpose of the current paper is to propose a list of BPIP used during the improvement process where two types of patterns were defined. To understand the process of pattern definition and use, we have presented an illustrative example describing the process of patient examination and treatment for urgent situations. There are some gaps that need to be studied in further research including the implementation of the defined pattern categorization into a system allowing selecting and applying a pattern according to a defined context. Also, for promoting the continuous improvement as well as the concept of pattern reuse, we intend to apply the prototype on several case studies from different domains of application which will help us validating the feasibility of our patterns and their impact on the BP lifecycle.
References 1. Ayora, C., Torres, V., de la Vara, J.L., Pelechano, V.: Variability management in process families through change patterns. Inf. Softw. Technol. 74, 86–104 (2016) 2. Dumas, M., La Rosa, M., Mendling, J., Reijers, H.A.: Fundamentals of Business Process Management, vol. 1, p. 2. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-66256509-4 3. Forster, F.: The idea behind business process improvement: toward a business process improvement pattern framework. J. BPTrends, pp. 1–14 (2006) 4. Falk, T., Griesberger, P., Leist, S.: Patterns as an Artifact for Business Process Improvement - Insights from a Case Study. In: vom Brocke, J., Hekkala, R., Ram, S., Rossi, M. (eds.) DESRIST 2013. LNCS, vol. 7939, pp. 88–104. Springer, Heidelberg (2013). https://doi.org/ 10.1007/978-3-642-38827-9_7 5. Hallerbach, A., Bauer, T., Reichert, M.: Capturing variability in business process models: the Provop approach. J. Softw. Maint. Evol. Res. Pract. 22(6–7), 519–546 (2010) 6. Höhenberger, S., Delfmann, P.: Supporting business process improvement through business process weakness pattern collections. Wirtschaftsinformatik 2015, 378–392 (2015) 7. Kim, D., Kim, M., Kim, H.: Dynamic business process management based on process change patterns. In: 2007 International Conference on Convergence Information Technology (ICCIT 2007), pp. 1154–1161. IEEE (2007) 8. Lanz, A., Weber, B., Reichert, M.: Time patterns for process-aware information systems. Requirem. Eng. 19(2), 113–141 (2012) 9. Mejri, A., Ghannouchi, S.A., Martinho, R.: An approach for measuring flexibility of business processes based on distances between models and their variants. In: Madureira, A.M., Abraham, A., Gamboa, D., Novais, P. (eds.) ISDA 2016. AISC, vol. 557, pp. 760–770. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53480-0_75 10. Missaoui, N., Ghannouchi, S.A.: A Pattern Meta-model for Business Process Improvement. IADIS, pp. 11–18 (2020) 11. Niedermann, F., Radeschütz, S., Mitschang, B.: Business process optimization using formalized optimization patterns. In: Abramowicz, W. (ed.) BIS 2011. LNBIP, vol. 87, pp. 123–135. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21863-7_11 12. Puhlmann, F., Schnieders, A., Weiland, J., Weske, M.: Variability mechanisms for process models. PESOA-Report TR 17, 10–61 (2005) 13. Papazoglou, M.P.: Web Services: Principles and Technology. Prentice Hall, Pearson (2008) 14. Reijers, H.A., Mansar, S.L.: Best practices in business process redesign: an overview and qualitative evaluation of successful redesign heuristics. Omega 33(4), 283–306 (2005)
Patterns for Improving Business Processes: Defined Pattern Categorization
325
15. Reichert, M., Weber, B.: Enabling Flexibility in Process-Aware information Systems. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30409-5 16. Rosa, M.L., Aalst, W.M.V.D., Dumas, M., Milani, F.P.: Business process variability modeling: a survey. ACM Comput. Surv. 50(1), 1–45 (2017) 17. Said, I.B., Chaâbane, M.A., Andonoff, E., Bouaziz, R.: BPMN4VC-modeller: easy-handling of versions of collaborative processes using adaptation patterns. Int. J. Inf. Syst. Change Manage. 10(2), 140–189 (2018) 18. Tran, H.N., Coulette, B., Tran, D.T., Vu, M.H.: Automatic reuse of process patterns in process modeling. In: Proceedings of the 2011 ACM Symposium on Applied Computing, pp. 1431– 1438 (2011) 19. Weber, B., Reichert, M., Rinderle-Ma, S.: Change patterns and change support features– enhancing flexibility in process-aware information systems. Data Knowl. Eng. 66(3), 438–466 (2008) 20. Yousfi, A., Saidi, R., Dey, A.K.: Variability patterns for business processes in BPMN. IseB 14(3), 443–467 (2015). https://doi.org/10.1007/s10257-015-0290-7
SAX-Preprocessing Technique for Characters Recognition Using Gyroscope Data Mariem Taktak1 and Slim Triki2(B) 1 Higher Institute of Applied Sciences and Technologies of Sousse, Sousse, Tunisia
[email protected]
2 National Engineering School of Sfax, Sfax, Tunisia
[email protected]
Abstract. This paper presents an experimental evaluation of SAX (SymbolicAggregate approXimation) based preprocessing technique applied on Gyroscope data for character recognition. Different from the classical signal processing techniques such as FFT or DWT, SAX is one of the most popular symbolic dimensionality reduction techniques. SAX’s popularity come essentially from its simplicity and efficiency as it uses precomputed distances in similarity measure between time series data. Both properties are important to tackle storage and preprocessing challenge that require characters recognition application using low-cost mobile device. In this study, characters are writing in a 3D air space by a user via a smartphone and angular velocity data gathered by the built-in Gyroscope sensor are used for the recognition task. Keywords: Time series data · Classification · Symbolic aggregate approXimation · Handwrite character · Smartphone
1 Introduction There is currently a wide range of sensors available that have been used for developing 3D handwriting character recognition problem. In this study, the angular velocity signals which are directly read from MEMS gyroscope internal sensor included in a smart phone are used in the recognition process. Some smartphone-based approaches for character and gesture recognition have been already proposed in the literature. Most of them use three-dimensional accelerometer data to achieve recognition problem. However, for different 3D-character writing movements, the acceleration signal, in contrast to velocity signal, often exhibits similar appearances in their signals as stated in [3]. As Gyroscope become popular in smartphone, we employ 3D gyroscope sensors embedded in such device for the purpose of 3D handwriting character recognition of the 26 lowercase letters in the English alphabet. Available public dataset from handwritten user-independent English lowercase letters is used in this work. Notice that this real dataset was collected by [3] and used to develop a 1-Nearest Neighbor (1NN) classification algorithm based on Bounded Dynamic Time Warping (BDTW) similarity distance measure. To tackle the computational resources challenge in mobile devices, BDTW optimizes the quadratic © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 326–335, 2022. https://doi.org/10.1007/978-3-030-96308-8_30
time complexity of the DTW algorithm in similarity search using a stepwise lower bounding computation. The stepwise BDTW concept includes LB_Keogh extended to the multivariate case in the first step, and a local lower bound in the second step, which allows early abandoning of the DTW computation applied to the set of candidates selected by the first-step bounding function. Although memory resources are also limited in mobile devices, the proposed 1NN-BDTW similarity search in [3] requires the storage of a large amount of multi-dimensional labeled time series in the memory of the smartphone. To overcome this drawback, we focus in this work on SAX-based preprocessing for TS (Time Series) classification.
2 Definition and Background
We begin by giving the necessary definitions to understand the taxonomy used in this paper.
Definition 1: A time series T is a sequence of real-valued numbers $X_i$: $T = [X_1, X_2, \ldots, X_n]$, where n is the length of T.
Definition 2: A z-normalized time series T is a series with zero mean and unit standard deviation: $T_{z\text{-}norm} = \left[\frac{X_1 - \mu}{\sigma}, \ldots, \frac{X_n - \mu}{\sigma}\right]$, where $\mu = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \mu)^2}$.
Definition 3: A subsequence $T_{ij}$ of a time series T is a continuous subset of the values from $X_i$ to $X_j$: $T_{ij} = X_i : X_j$.
Definition 4: A Piecewise Aggregate Approximation (PAA) of a time series T of length n is a reduced time series $\bar{T}$ of length $\omega$ ($\omega \ll n$) obtained by dividing T into $\omega$ equal-sized subsequences, each mapped to its average value: $\bar{T} = [\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_\omega]$, where $\bar{X}_i = \mathrm{mean}\!\left(X_{\frac{n}{\omega}(i-1)+1} : X_{\frac{n}{\omega} i}\right)$.
Definition 5: The MINDIST() distance measure between two time series $T_1$ and $T_2$ of equal length n and with the same number of SAX words $\omega$ is defined as $MINDIST(\hat{T}_1, \hat{T}_2) = \sqrt{\frac{n}{\omega}}\sqrt{\sum_{i=1}^{\omega} dist(\hat{x}_{1i}, \hat{x}_{2i})^2}$, where the SAX symbolic form of a time series T is represented as $\hat{T} = [\hat{x}_1, \ldots, \hat{x}_\omega]$ and the dist() function is a precomputed distance table [13].
Definition 6: The TDIST() distance measure between two time series $T_1$ and $T_2$ is defined as $TDIST(T_1, T_2) = MINDIST(\hat{T}_1, \hat{T}_2) + TD(T_1, T_2)$, where TD() is a trend distance [14].
Definition 7: The SAX_SD() distance measure between two time series $T_1$ and $T_2$ is defined as $SAX\_SD(T_1, T_2) = MINDIST(\hat{T}_1, \hat{T}_2) + SD(T_1, T_2)$, where the SD() function is the standard deviation distance [4].
SAX converts a TS to a symbolic representation in three steps, as illustrated in Fig. 1(a), (b) and (c). In the first step, the z-normalized time series is divided into ω equal-length subsequences. Then, the average value of each subsequence is computed, resulting in a PAA representation. Finally, each entry of the PAA vector is mapped to a symbol according to a discretization of a Gaussian distribution. SAX can also be combined with a sliding window to preprocess a long time series. This technique is illustrated in Fig. 2.
Fig. 1. From time series raw data to several variations of SAX representations: (a) the raw time series divided into subsequences, (a') overlapping subsequences, (b) PAA averaging, and (c) the resulting SAX, overlap-SAX, eSAX, SAX-SD and SAX-TD mappings.
Fig. 2. SAX preprocessing with the sliding window technique: sliding windows over the time series, PAA aggregation of each window, conversion into SAX words, and a histogram of word counts.
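To make Definitions 2, 4 and 5 and the three conversion steps above concrete, the following is a minimal Python/NumPy sketch of z-normalization, PAA, symbol mapping with the Gaussian breakpoints for an alphabet of size 4, and the MINDIST lower bound. It is an illustrative sketch, not the authors' implementation; all function names are hypothetical and the series length is assumed to be divisible by the word length.

```python
import numpy as np

# Gaussian breakpoints for an alphabet of size 4 (equiprobable regions).
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])

def znorm(ts):
    """Definition 2: zero mean and unit standard deviation."""
    return (ts - ts.mean()) / ts.std(ddof=1)

def paa(ts, w):
    """Definition 4: averages of w equal-sized subsequences (assumes len(ts) % w == 0)."""
    return znorm(ts).reshape(w, -1).mean(axis=1)

def sax_word(ts, w):
    """Map each PAA value to a symbol index 0..3 using the breakpoints."""
    return np.searchsorted(BREAKPOINTS, paa(ts, w))

def dist_table(breakpoints=BREAKPOINTS):
    """Precomputed symbol-to-symbol table used by dist() in Definition 5."""
    a = len(breakpoints) + 1
    table = np.zeros((a, a))
    for r in range(a):
        for c in range(a):
            if abs(r - c) > 1:
                table[r, c] = breakpoints[max(r, c) - 1] - breakpoints[min(r, c)]
    return table

def mindist(word1, word2, n, table):
    """Definition 5: lower bound of the Euclidean distance between the original series."""
    w = len(word1)
    return np.sqrt(n / w) * np.sqrt(np.sum(table[word1, word2] ** 2))

# Toy usage: two noisy length-128 signals reduced to 8 SAX symbols each.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 128)) + 0.1 * rng.standard_normal(128)
y = np.cos(np.linspace(0, 4 * np.pi, 128)) + 0.1 * rng.standard_normal(128)
print(mindist(sax_word(x, 8), sax_word(y, 8), n=128, table=dist_table()))
```

In the shape-based setting discussed below, a 1NN classifier simply assigns a test word the label of the training word with the smallest mindist value; the table lookup is what keeps this similarity search cheap on a mobile device.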
The major importance of the symbolic representation by SAX is its ability to be exploited within a classification process. This is achieved by using the MINDIST() function, which is a proper distance measure that guarantees no false dismissals. Since its first appearance in 2003, SAX has attracted the attention of several researchers in the data mining area, who reported some drawbacks, attempted to improve it, and proposed novel SAX-based classifier algorithms. In this work we attempt to find a suitable classification algorithm based on a SAX preprocessing technique that can efficiently achieve
handwritten character recognition using a smartphone device. To reach our goal, several SAX-based preprocessing techniques will be investigated. A summary of the reviewed techniques can be found in Table 1.
Table 1. Summary of the reviewed SAX-based techniques.
Year | Authors | Title | Method
2003 | Jessica Lin et al. | A symbolic representation of time series, with implications for streaming algorithms | SAX
2006 | Bauguldur Lkhagva et al. | New time series data representation ESAX for financial applications | eSAX
2013 | Simon Malinowski et al. | 1d-SAX: a novel symbolic representation for time series | 1d-SAX
2013 | Pavel Senin & Sergey Malinchik | SAX-VSM: interpretable time series classification using SAX and Vector Space Model | SAX-VSM
2014 | Youqiang Sun et al. | An improvement of symbolic aggregate approximation distance measure for time series | SAX_TD
2014 | Tomasz Gorecki | Using derivatives in longest common subsequence dissimilarity measure for time series classification | DDlcss
2016 | Chaw Thet Zan et al. | An improved symbolic aggregate approximation distance measure based on its statistical features | SAX_SD
2016 | Xing Wang et al. | RPM: representative pattern mining for efficient time series classification | RPM
2016 | Rohit J. Kate | Using Dynamic Time Warping distances as features for improved time series classification | F-DTW
2016 | Daoyuan Li et al. | DSCo-NG: a practical language modeling approach for time series classification | DSCo-NG
2017 | Mariem Taktak et al. | SAX-based representation with longest common subsequence dissimilarity measure for time series data classification | LCSS-SAX
2017 | Xiaosheng Li & Jessica Lin | Linear time complexity time series classification with Bag-of-Pattern-Features | BOPF
2018 | Thach Le Nguyen et al. | Interpretable time series classification using all-subsequence learning and symbolic representations | SAX-SEQL
2018 | Jiaping Zhao & Laurent Itti | shapeDTW: Shape Dynamic Time Warping | shapeDTW
2020 | Muhammad Marwan Muhammad Fuad | Modifying the symbolic aggregate approximation method to capture segment trend information | overlap-SAX
It is worth noting that the aggregation step is a key processing step in the SAX technique [7]. In fact, due to the temporal correlation of the data, the high dimensionality, and the different lengths of the TS in the database, the ability to extract key signal features directly influences TS data classification efficiency. However, the original version of SAX extracts only the simple statistical mean value over each subsequence of data. To circumvent this drawback, several researchers reported the need for additional statistical features that extract more important information, such as:
• Maximum and minimum values, as proposed in eSAX [8],
• Trend information obtained by applying linear regression, as proposed in 1d-SAX [13],
• Standard deviation, as proposed in SAX-SD [2],
• The range between the local maximum (resp. local minimum) and the mean value of each subsequence, as proposed in SAX-TD [14],
• Trend information obtained by computing the sample difference (delta value), as proposed in DDlcss [5],
• Trend information obtained by swapping the end points of each subsequence with the end points of the neighboring subsequence (cf. Fig. 1(a')), as proposed in Overlap-SAX [10].
3 SAX-Based Preprocessing for Time Series Classification
The SAX-based preprocessing techniques for TS classification (listed in Table 1) can be divided into two categories, shape-based and structure-based, as depicted in the flowchart of Fig. 3.
Fig. 3. Flowchart of the SAX-based preprocessing technique used for the TS classification.
3.1 Shape-Based Classifier
It is known that a key feature distinguishing TS from other kinds of data is its shape. Accordingly, a similarity measure in combination with a 1NN classifier has been shown to be a simple and excellent method for shape-based classification. The most common similarity measures for matching TS are Euclidean, DTW and LCSS. While the Euclidean distance allows only a one-to-one point comparison, DTW and LCSS allow more elasticity and instead compare one-to-many points. Under realistic conditions (i.e., noisy and/or missing data), LCSS is more powerful than DTW in TS classification. Moreover, LCSS is well suited to both representations of the TS, i.e., real-valued series or sequences of SAX (or SAX-extension) words. In the shape-based category, SAX, eSAX, SAX-TD, SAX-SD and Overlap-SAX are usually used as preprocessing techniques within a 1NN classifier, where the dissimilarity measure is defined as (i) the MINDIST() function for SAX, eSAX and Overlap-SAX, (ii) the TDIST() function for SAX-TD, and (iii) the SAX_SD() function for SAX-SD. Because our goal is to present the benefit of SAX-based preprocessing in handwriting character recognition, we will also investigate the use of subsequence similarity measures with LCSS. The authors in [5] showed that parametric extensions of LCSS (called DDLCSS and 2DDLCSS) using trend information computed from sample differences outperform the classic 1NN-LCSS. Despite its simplicity and speed of computation, the major shortcoming of the sample difference approximation is its sensitivity to noise. In related work [9], we showed that a Piecewise Linear Regression (PLR) transformation can be used as a robust way to estimate a succession of local first derivatives in TS data. That is, just as SAX maps the local average values of the PAA transformation into a sequence of symbols, we propose mapping the local trend values of the PLR transformation into a sequence of symbols as well. To distinguish the proposed purely symbolic representation from the 1d-SAX representation [13], which leads to a binary word, we call the proposed representation 1D-SAX. Another recent subsequence similarity measure based on DTW, named shapeDTW, was proposed in [17]. Aiming for a many-to-many point comparison, shapeDTW attempts to pair locally similar subsequences and to avoid matching points with distinct neighborhood subsequences. Hence, shapeDTW represents the local subsequence around each temporal point by a shape descriptor and then uses DTW to align the two sequences of descriptors. Even though many shape descriptors can be used to map a local subsequence, only SAX will be used in this study when we use the 1NN classifier
with shapeDTW. Despite their simplicity and ease of implementation, many researchers observe that shape-based classifiers work well only for short TS data. In addition, the 1NN classifier compares instances to instances instead of learning an abstract model from the training set; consequently, it requires intensive computation when the size of the training dataset grows. In the next subsection, we present the structure-based classifiers, a common solution that extracts higher-level structural information to deal with the classification of long TS data.
3.2 Structure-Based Classifier
Based on higher-level structural information, SAX-based compression techniques offer an appropriate alternative for determining the similarity between long TS data. For example, the authors in [6] use Bag-of-Patterns (BoP) to represent TS data instances and then apply a 1NN classifier. Bag-of-Patterns are obtained by transforming TS data into SAX symbols, then using a sliding window to scan the symbols and adopting a histogram-based representation of the unique words. 1NN-BoP addresses the scalability of TS data classification while keeping the lazy nearest neighbor classifier. To provide interpretable TS data classification, the authors in [12] propose SAX-VSM as an alternative to the 1NN classifier. Like BoP, SAX-VSM also takes advantage of SAX and uses the sliding window technique to convert TS data into a set of bags-of-words (BoW). During the training phase, SAX-VSM builds TF-IDF (Term Frequency-Inverse Document Frequency) BoW vectors for all classes, and cosine similarity is then used to perform the classification. The extension of SAX-VSM to 1DSAX-VSM is straightforward. Recently, the author in [11] showed that TS classification performance improves if, instead of using similarity distances (especially the DTW distance) with the 1NN technique, they are used as features with a more powerful technique such as a Support Vector Machine (SVM). Moreover, these distance-based features can easily be combined with the SAX method under the BoP representation. Inspired by the work in [11], we will use the combination of LCSS distance-based features and SAX-based features with an SVM (with a polynomial kernel) as the learning method in the handwriting recognition application. Note that, for an unlabeled TS, the LCSS distance-based features (denoted F-LCSS) are defined as its LCSS distances from each of the learning instances. Another recent structure-based TS data classification technique using SAX is Representative Pattern Mining (RPM) [16]. RPM focuses on finding the most representative subsequences (i.e., shapelets) for the classification task. After converting the TS into sequences of SAX symbols, RPM takes advantage of grammatical inference techniques to automatically find recurrent and correlated patterns of variable lengths. This pool of patterns is further refined so that the most representative patterns that capture the properties of a specific class are selected. Another grammar-based classifier was proposed in [4] and named Domain Series Corpus (DSCo). As its name suggests, DSCo applies SAX with the sliding window technique on the training TS data to build a corpus, and subsequently each class is summarized with an n-gram Language Model (LM). The classification is performed by checking which LM best fits the tested TS. Recently, the authors in [15] proposed an ensemble of multiple sequence learners called SAX-SEQL.
In summary, SAX-SEQL uses a multiple symbolic representation which combines SAX with the Symbolic Fourier Approximation to find a set of discriminative subsequences by employing a branch-and-bound feature search strategy.
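As a rough illustration of the structure-based route (BoP and SAX-VSM), the sketch below builds sliding-window bag-of-words histograms from SAX words and compares a test histogram against per-class TF-IDF vectors with cosine similarity. It is a simplified outline under assumptions, not the authors' implementation: it reuses the hypothetical sax_word() helper sketched earlier and omits numerosity reduction and parameter tuning.

```python
import numpy as np
from collections import Counter

def bag_of_words(ts, win, w):
    """Slide a window of length `win` over ts and count its SAX words (cf. Fig. 2)."""
    bag = Counter()
    for start in range(len(ts) - win + 1):
        bag[tuple(sax_word(ts[start:start + win], w))] += 1   # sax_word() sketched above
    return bag

def class_tfidf(bags_per_class):
    """SAX-VSM style: build one TF-IDF weight vector per class (not per sample)."""
    vocab = sorted({word for bag in bags_per_class.values() for word in bag})
    n_classes = len(bags_per_class)
    df = {word: sum(1 for bag in bags_per_class.values() if word in bag) for word in vocab}
    vectors = {}
    for label, bag in bags_per_class.items():
        tf = np.array([np.log1p(bag[word]) for word in vocab])
        idf = np.array([np.log(n_classes / df[word]) for word in vocab])
        vectors[label] = tf * idf
    return vocab, vectors

def classify(bag, vocab, vectors):
    """Assign the class whose TF-IDF vector has the highest cosine similarity."""
    x = np.array([bag.get(word, 0) for word in vocab], dtype=float)
    best_label, best_sim = None, -1.0
    for label, v in vectors.items():
        denom = np.linalg.norm(x) * np.linalg.norm(v)
        sim = float(x @ v) / denom if denom > 0 else 0.0
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label
```

Building one weight vector per class rather than per training sample is what gives the VSM-style approach its favorable runtime compared with lazy 1NN search.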
4 Handwriting Character Recognition with SAX-Based Classification
In this section we conduct experiments on a real handwriting character dataset [3]. For shape-based classification, we handle each complete TS with a single PAA transformation, since the length of a character's gyroscope signal is not very long. Recall that our goal is to evaluate the use of SAX-based preprocessing techniques in a handwriting recognition application. For structure-based classification, SAX (resp. 1D-SAX) bag-of-words dictionaries are built by sliding a window of length n and converting each subsequence into w SAX (resp. 1D-SAX) words. The SAX (resp. 1D-SAX) features are then built from the bag-of-words histogram of word counts. For evaluation purposes, we randomly divide the dataset into a training set (containing 60% of the labeled characters) and a test set (containing the remaining 40% of the labeled characters). Notice that, in addition to the z-score magnitude normalization pre-processing step, length normalization is performed to unify the gyroscope signal lengths. We achieve this using the re-sampling technique of the built-in MATLAB function resample. Furthermore, it is well known that the SAX method was originally designed for single-dimensional time series data. Therefore, we convert the multidimensional gyroscope signals into a single dimension before running the classification algorithms. There are several existing methods to achieve this, and we focus only on simple techniques: Principal Component Analysis (PCA), reduction by summing (SUM) and reduction by concatenation (CONC). PCA is a well-known statistical approach that reduces multidimensional data based on an orthogonal linear transformation. Hence, we start by decomposing the pre-processed 3D gyroscope signal X of size $3 \times n$ (where n is the TS length) according to the SVD transform (singular value decomposition): $X = USV^T$. The output single-dimensional TS, denoted $Y_{1\times n} = U_1 X_{3\times n}$, is then obtained by projecting X onto the left singular vector $U_1$ associated with the largest singular value in S. Unlike PCA, SUM and CONC are straightforward conversions from a multidimensional to a single-dimensional TS. To get the best SAX parameters, we evaluated a 5-fold cross-validation based parameter selection technique with DIRECT [1] search on the training set. Figure 4 shows the recognition accuracy of SAX-based classification achieved using the 1NN classifier on the test set for the three univariate forms. We can see that no classifier provides the best recognition accuracy on all three forms. For the PCA form, 1NN SAX-TD shows the best classification accuracy, 48.14%. However, 1NN shapeDTW provides the best classification accuracy when the 3D gyroscope signals are converted by SUM. For the CONC form, it is 1NN LCSS-1DSAX that yields the best classification accuracy. Although classification was performed with optimal SAX parameters, the best accuracy value is less than 60%, which is insufficient for practical application. That is, SAX-based similarity measures fail to distinguish between similar handwriting characters, since individual alphabetic symbols do not contain any local subsequence shape information. Besides the low classification accuracy rate, the classification time of the best result in the CONC form, for example, is dominated by the quadratic complexity of the LCSS dissimilarity measure (see Fig. 4(a) and Fig. 5). We repeat the same experiment using the SAX-based approach within the SAX structure-based classifiers. The classification accuracy in % is reported in Fig.
4(b), where a practically useful accuracy rate is exhibited only for the CONC form.
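The three univariate conversions used above can be written in a few lines. The following NumPy sketch is illustrative only, assuming a 3 × n array per gyroscope recording: PCA via SVD projects onto the left singular vector of the largest singular value, SUM adds the axes sample-wise, and CONC concatenates them.

```python
import numpy as np

def to_1d_pca(x):
    """PCA: project the 3 x n signal onto the left singular vector of the largest singular value."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)   # X = U S V^T
    return u[:, 0] @ x                                 # Y of shape (n,)

def to_1d_sum(x):
    """SUM: add the three axes sample by sample."""
    return x.sum(axis=0)

def to_1d_conc(x):
    """CONC: concatenate the three axes end to end (length 3n)."""
    return x.reshape(-1)
```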
Fig. 4. Classification accuracy in % of (a) SAX shape-based classifiers and (b) SAX structure-based classifiers.
Fig. 5. Classification speed performance over classification of handwriting test dataset.
To take a closer look at the accuracy for each character, a graphical representation of the F1-score deduced from the confusion matrix (calculated only for the top three classifiers) is given in Fig. 6. Recall that the F1-score is a general measure of a classifier's accuracy that combines precision and sensitivity: $F_1\text{-}score = 2 \cdot \frac{precision \cdot sensitivity}{precision + sensitivity}$. From these radial displays, we can visually examine the per-character accuracy of each classifier when the single-dimensional CONC form is used as input. For example, we can see that the recognition sensitivity of the RPM classifier decreases for the characters "r", "o", "m" and "j". The 1DSAX-VSM classifier shows the highest accuracy in handwriting character classification. Furthermore, like SAX-VSM, 1DSAX-VSM builds only one feature vector per class instead of one vector per sample, which drastically reduces runtime, as we can see in Fig. 5.
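For reference, the per-character F1-score plotted in Fig. 6 follows directly from the counts in one row/column of the confusion matrix; the tiny sketch below uses made-up counts purely for illustration.

```python
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)    # also called recall
    return 2 * precision * sensitivity / (precision + sensitivity)

# A character with 18 correct detections, 4 false positives and 2 misses:
print(round(f1_score(18, 4, 2), 3))   # 0.857
```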
Fig. 6. F1-score in % of RPM, SAX-VSM and 1DSAX-VSM.
5 Conclusion
In this paper, we have investigated several SAX-based preprocessing techniques to assess their advantage in a handwritten character recognition application. Our choice of the SAX method was motivated by its simplicity and its high dimensionality reduction capability, which is important to tackle the storage and processing challenges of smartphone-based applications. Based on experiments on a real database of the 26 lowercase letters of the English alphabet, we show that an extended version of a bag-of-words based approach that relies on SAX and handles supplementary information about the trend of the gyroscope signals improves the precision/recall of handwriting character recognition in the case where the multidimensional data are converted into a single dimension by concatenation.
References 1. Bjorkman, M., Holmstrom, K.: Global optimization using the DIRECT algorithm in Matlab. Adv. Model. Optimiz. 1, 17–37 (1999) 2. Chaw, T.Z., Hayato, Y.: An improved symbolic aggregate approximation distance measure based on its statistical features. In: Proceeding of the 18th International Conference IIWAS, Singapore, 28–30 November 2016
3. Dae-Won, K., Jaesung, L., Hyunki, L., Jeongbong, S., Bo-Yeong, K.: Efficient dynamic time warping for 3D handwriting recognition using gyroscope equipped smartphones. Expert Syst. Appl. 41(11), 5180–5189 (2014) 4. Li, D., Bissyandé, T.F., Klein, J., Le Traon, Y.: DSCo-NG: a practical language modeling approach for time series classification. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds.) IDA 2016. LNCS, vol. 9897, pp. 1–13. Springer, Cham (2016). https://doi.org/10. 1007/978-3-319-46349-0_1 5. Gorecki, T.: Using derivatives in a longest common subsequence dissimilarity measure for time series classification. Pattern Recogn. Lett. 45, 99–105 (2014) 6. Lin, J., Khade, R., Li, Y.: Rotation-invariant similarity in time series using bag-of-patterns representation. J. Intell. Inf. Syst. 39(2), 287–315 (2012) 7. Lin, J., Keogh, E., Lee, W., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Disc. 15(2), 107–144 (2007) 8. Lkhagva, B., Suzuki, Y., Kawagoe, K.: New time series data representation ESAX for financial applications. In: IEEE International Conference on Data Engineering, pp. 17–22 (2006) 9. Mariem, T., Slim, T., Anas, K.: SAX-based representation with longest common subsequence dissimilarity measure for time series data classification. AICCSA 2017, 821–828 (2017) 10. Muhammad Fuad, M.M.: Modifying the symbolic aggregate approximation method to capture segment trend information. In: Torra, V., Narukawa, Y., Nin, J., Agell, N. (eds.) MDAI 2020. LNCS (LNAI), vol. 12256, pp. 230–239. Springer, Cham (2020). https://doi.org/10.1007/ 978-3-030-57524-3_19 11. Kate, R.J.: Using dynamic time warping distances as features for improved time series classification. Data Min. Knowl. Disc. 30(2), 283–312 (2016) 12. Senin, P., Malinchik, S.: SAX-VSM: interpretable time series classification using SAX and vector space model. In: IEEE International Conference on Data Mining, pp. 1175–1180 (2013) 13. Malinowski, S., Guyet, T., Quiniou, R., Tavenard, R.: 1d-SAX: a novel symbolic representation for time series. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds.) IDA 2013. LNCS, vol. 8207, pp. 273–284. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3642-41398-8_24 14. Sun, Y., Li, J., Liu, J., Sun, B., Chow, C.: An improvement of symbolic aggregate approximation distance measure for time series. Neurocomputing 138, 189–198 (2014) 15. Le Nguyen, T., Gsponer, S., Ilie, I., O’Reilly, M., Ifrim, G.: Interpretable time series classification using linear models and multi-resolution multi-domain symbolic representations. Data Min. Knowl. Disc. 33(4), 1183–1222 (2019) 16. Wang, X., et al.: RPM: Representative pattern mining for efficient time series classification. In: 19th International Conference on Extending Database Technology (2016) 17. Zhao, J., Itti, L.: shapeDTW: shape dynamic time warping. Pattern Recogn. 74, 171–184 (2018)
Lower Limb Movement Recognition Using EMG Signals
Sali Issa1(B) and Abdel Rohman Khaled2
1 Physics, Mechanical and Electrical Engineering, Hubei University of Education, Wuhan, China
[email protected]
2 Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
Abstract. This paper presents an enhanced feature extraction method for a lower limb movement recognition application using surface EMG signals. A public SEMG database is used for system evaluation, where subjects, depending on knee normality, are divided into normal and abnormal groups. The spectrogram of the input EMG signals is calculated in the time-frequency domain and then processed with a standard deviation texture. Experimental results show that EMG data of the Semitendinosus (ST) muscle with a Convolutional Neural Network (CNN) classifier provide the highest accuracy: 92% for classifying up to three movements (gait, leg extension, leg flexion) in the normal group, and 95% for classifying two movements (gait, leg flexion) in the abnormal group.
Keywords: Lower limb · Movement · Time-frequency domain · Recognition
1 Introduction
Electromyography (EMG) is a medical test that traces, records and analyses myoelectric signals. These signals are formed by physiological variations in the state of muscle fiber membranes and are directly related to the electrical currents generated in muscles during the contraction process [1]. On the other hand, EMG applications attract many researchers due to their advantages and their important role in daily human life. They cover many scientific areas such as motor control, robotics, remote control, movement disorders, physical therapy, postural control, limb abnormality detection and early disease diagnosis [2–4]. Actually, the idea of classifying lower limb movements based on EMG signals is not a new topic, but it is still a hot and challenging study area because of the advantages and necessary achievements that could be obtained. For instance, it is possible to track and predict limb movements of disabled people and to control robots. Furthermore, it could help in earlier diagnosis, which increases the probability of recovery [5,6]. In this study, a new leg movement prediction system is produced based on the texture analysis of the EMG spectrogram in the time-frequency domain. A public SEMG
database was obtained for system evaluation. The spectrogram of the EMG raw data in both the time and frequency domains is calculated using the Short Time Fourier Transform (STFT). Next, a combination of mathematical operations with a local standard deviation texture filter is applied to produce the final extracted feature. For classification, Convolutional Neural Network (CNN), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and Probabilistic Neural Network (PNN) classifiers are constructed to classify three leg movements: gait, leg extension and flexion. Two different types of experiments are evaluated; the first one relates to normal subjects, while the second one relates to subjects with knee abnormality. As a result, the CNN classifier provides the best results among the other classifiers for both experiments. This paper is organised as follows: Section two gives a description of recent related studies. Section three presents the work methodology. Section four provides the experimental outcomes of the proposed system, and the final section states the conclusions and gives possible recommendations for future work.
2 Literature Review
In fact, predicting lower limb motion is not easy. Compared with upper limb motion prediction, the amount of research is limited, because human lower limbs are affected by body gravity and muscle jitter [7]. Scientists aim to improve classification system performance by enhancing the power of the extracted features, using newer classification methods, and developing EMG sensors with better denoising [8]. For example, Naik et al. [9] used a public EMG database for lower limb movement classification. Their approach is divided into three steps. First, EMG signals were decomposed using Independent Component Analysis with Entropy Bound Minimization. Second, several time domain features were extracted, and finally, the features were processed using the Fisher score and statistical techniques for feature selection. For classification, a Linear Discriminant Analysis (LDA) classifier was trained and tested, and high accuracies of 96.1% and 86.2% were achieved for the healthy and knee-abnormality groups, respectively. In the study of Ai et al. [7], a median filter was applied for EMG data denoising. Then, a power spectral correlation coefficient technique was implemented to locate the start and end points of the active EMG signals. A collection of several time domain features together with wavelet coefficients was extracted. Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) classifiers were constructed to classify five distinct lower limb movements and achieved a high accuracy of 95%. Laudanski et al. [10] used recordings of an EMG database of eight different lower limb muscles. Several time domain and frequency domain features were extracted and reduced to a combination of only ten features using neighborhood component analysis. A KNN classifier was trained to estimate knee flexion postures and obtained an accuracy of 80.1%. In the work of Zhang et al. [11], a lower limb movement identification system was proposed to classify four different motions: flexion of the leg up, hip extension
from a sitting position, stance, and swing. Wavelet Transform (WT) and Singular Value Decomposition (SVD) techniques were used to produce the extracted feature. For classification, a Support Vector Machine (SVM) was built and obtained a high average accuracy of approximately 91.85%. Nazmi et al. [12] applied an EMG database to classify stance and swing motions. Five time domain features were extracted for each subject, and an Artificial Neural Network (ANN) classifier was applied, producing an accuracy of 87.4%. Judging from the previous related works, it is clear that this field still needs improvement [8]. Many studies depended on extracting high-dimensional features, implementing complex feature extraction, and/or hybrid feature combinations to enhance the classification accuracy [7,9–12]. In this paper, an enhanced spectrogram feature is applied with a powerful Convolutional Neural Network (CNN) classifier, which overcomes the main drawbacks of the previous related works for both normal and knee-abnormal subject groups.
3 Methodology
The proposed lower limb movement recognition system is presented in Fig. 1. The recognition system comprises three main stages: a data acquisition and pre-processing stage, a feature extraction stage, and a classification stage.
Fig. 1. The proposed lower limb movement recognition system
3.1 Data Acquisition and Pre-processing
The surface electromyography (SEMG) database for lower limb analysis was used in this research. It includes samples from 22 subjects (11 subjects with knee abnormality). Every subject was asked to perform three different movements: gait, leg extension from a sitting position, and leg up flexion [13].
A Biometrics DataLOG MWX8 device with 8 digital channels and 4 analog channels was used. The electrode signals of the Vastus Medialis, Semitendinosus, Biceps Femoris and Rectus Femoris muscles were recorded, sampled at 1000 Hz, and transferred to a computer from the MWX8 internal storage with a microSD card [13]. For pre-processing, a bandpass filter with a range of 10–250 Hz was applied to the recorded EMG samples.
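As a sketch of this pre-processing step (assuming SciPy and the 1000 Hz sampling rate stated above; the filter type and order are assumptions, since the paper does not specify them), a zero-phase Butterworth band-pass of 10–250 Hz could be applied as follows.

```python
from scipy.signal import butter, filtfilt

FS = 1000.0   # sampling frequency of the SEMG recordings (Hz)

def bandpass(emg, low=10.0, high=250.0, order=4):
    """Zero-phase Butterworth band-pass filter applied to one EMG channel."""
    b, a = butter(order, [low / (FS / 2), high / (FS / 2)], btype="bandpass")
    return filtfilt(b, a, emg)
```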
3.2 Feature Extraction and Classification
Short Time Fourier Transform (STFT) conversion with some mathematical operations is used to obtain the time-frequency representation of the non-stationary EMG samples; next, the statistical standard deviation texture is applied. Feature extraction can be summarized in the following sequential steps:
1. Apply the discrete Short Time Fourier Transform (STFT) to convert the recorded EMG raw data from the time domain to the time-frequency domain. The short time Fourier transform (STFT) is a Fourier transform that finds the sinusoidal frequency and phase content of local sections of time domain signals [14]. Its principle is to divide the EMG time signal into equal, overlapping segments or chunks, and to calculate the Fourier transform of each chunk [14]. The general discrete short time Fourier transform (STFT) is given in the following equation [14]:
$STFT\{x[n]\}(m, \omega) \equiv X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n]\, w[n-m]\, e^{-j\omega n}$ (1)
where x is the EMG time domain signal and w is the window function. The calculated EMG time-frequency representation provides fuller and deeper information than the original time domain [15,16].
2. Find the spectrogram representation of the transformed data by squaring the STFT magnitude, yielding the Power Spectral Density, as illustrated in the equation below [14]:
$Spectrogram\{x(t)\}(\tau, \omega) \equiv |X(\tau, \omega)|^2$ (2)
3. Rescale the spectrogram data to another range, between 0 and 255.
4. Filter the resulting data matrix using the statistical texture of the local standard deviation. Texture analysis is an important measure and feature calculation for data; it aims to extract the characterization or texture content of the regions in image/matrix information [17]. The local standard deviation texture of the EMG spectrum matrix is calculated by filtering the spectrum matrix with the statistical standard deviation. Figure 2 below explains the local standard deviation texture algorithm: the standard deviation of each neighborhood sub-matrix is calculated and placed in its corresponding cell [17].
Fig. 2. Standard deviation texture principle
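Steps 1–4 can be prototyped in a few lines with SciPy. The sketch below is only an approximation of the described pipeline: the segment length and overlap values are assumed placeholders (the paper's own segmentation differs), while the Hamming window, the 10–250 Hz band, the 0–255 rescaling and the 3 × 3 local standard deviation filter follow the text above.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import generic_filter

def emg_feature(emg, fs=1000, nperseg=256, noverlap=128):
    # Steps 1-2: STFT spectrogram |X(tau, omega)|^2 with a Hamming window.
    f, t, sxx = spectrogram(emg, fs=fs, window="hamming",
                            nperseg=nperseg, noverlap=noverlap)
    sxx = sxx[(f >= 10) & (f <= 250), :]               # keep the 10-250 Hz band
    # Step 3: rescale to the 0-255 range.
    sxx = 255.0 * (sxx - sxx.min()) / (sxx.max() - sxx.min() + 1e-12)
    # Step 4: 3 x 3 local standard deviation texture (each cell replaced by the
    # standard deviation of its neighborhood, as in Fig. 2).
    return generic_filter(sxx, np.std, size=3)
```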
The proposed feature provides additional, relational statistical information about the EMG frequency texture/content as it evolves over time. For classification, four machine learning techniques, Convolutional Neural Network (CNN), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and Probabilistic Neural Network (PNN), are constructed and trained to recognize the lower limb movement.
4 Results and Discussion
The extracted feature was calculated for the EMG raw data of all subjects. The spectrum of the EMG signals was extracted in the time-frequency domain using the STFT transform. A Hamming window function was selected for the STFT conversion, with 500 segments (windows) and 50% overlap. Then, a standard deviation texture with a 3 × 3 neighborhood size was applied to filter the spectrum matrix. The extracted feature matrix has a size of 241 × 12, where 241 refers to the EMG frequencies in the 10–250 Hz range, and 12 refers to the time domain intervals. Two experiments were evaluated: the first one predicts three lower limb movements for normal subjects, while the second experiment predicts two and three lower limb movements for subjects with knee abnormality. The database was split into training and testing groups using a ten-fold cross-validation strategy. Convolutional Neural Network (CNN), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), and Probabilistic Neural Network (PNN) classifiers were constructed. Figure 3 [18] below illustrates the basic construction of a Convolutional Neural Network (CNN). It is more complicated and advanced than traditional neural networks: a deeper (third) dimension is used during the training process, which improves the network's ability to deal directly with two-dimensional (2D) features, including images. A basic CNN classifier has the following main layers: an input layer; convolution and pooling layers, which extract the information of neurons connected to local parts of the input image/matrix and reduce the spatial dimension of the calculation, respectively; and a fully connected layer that calculates the class category [19].
Fig. 3. Convolutional Neural Networks (CNN) structure
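For illustration, a network along the lines of the normal-group specification listed below (input 241 × 12, one convolutional layer with two 5 × 5 filters and stride 2, one 2 × 2 pooling layer, and a softmax output for the three movements) could be written in Keras roughly as follows. This is a hedged sketch of the described layer stack, not the authors' code; the optimizer, loss, activation and padding choices are assumptions not stated in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(n_classes=3):
    """Small CNN for the 241 x 12 spectrogram-texture feature (normal-group setup)."""
    model = keras.Sequential([
        keras.Input(shape=(241, 12, 1)),
        layers.Conv2D(2, kernel_size=5, strides=2, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2, strides=1),
        layers.Flatten(),
        layers.Dense(n_classes, activation="softmax"),   # fully connected + softmax
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```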
The classifiers' specifications for the normal group experiment are as follows:
1. Convolutional Neural Network (CNN): an input layer of size 241 × 12; a convolutional layer with two filters of size 5 × 5 and a stride of two steps; a pooling layer with one filter of size 2 × 2 and a stride of one step; a fully connected layer with a softmax loss function; and an output layer.
2. Support Vector Machine (SVM): polynomial kernel function.
3. Linear Discriminant Analysis (LDA): pseudo-linear discriminant.
4. Probabilistic Neural Network (PNN): radial basis spread function of 0.8.
The classifiers' specifications for the abnormal group experiment are as follows:
1. Convolutional Neural Network (CNN): an input layer of size 241 × 12; a convolutional layer with ten filters of size 10 × 10 and a stride of two steps; a pooling layer with one filter of size 2 × 2 and a stride of one step; a fully connected layer with a softmax loss function; and an output layer.
2. Support Vector Machine (SVM): polynomial kernel function.
3. Linear Discriminant Analysis (LDA): pseudo-linear discriminant.
4. Probabilistic Neural Network (PNN): radial basis spread function of 0.5.
Table 1 provides the average accuracy results for normal subjects, while Table 2 provides the average accuracy results for subjects with knee abnormality. Figures 4 and 5 show the average accuracy of the tested classifiers for the normal and knee-abnormal groups, respectively. According to the experimental results, the proposed extracted feature of the Semitendinosus (ST) muscle with the CNN classifier
provides good results in recognizing leg movements for normal subjects, while for subjects with knee abnormality, the proposed extracted feature of the Semitendinosus (ST) muscle with the CNN classifier provides the best results in recognizing only two leg movements. Furthermore, the knee-abnormal group experiments prove the possibility of lower limb movement prediction for people suffering from knee abnormality.
Table 1. Average accuracy results of each muscle for the normal group.
Classifier | RF | BF | VM | ST
CNN | 84 | 83 | 89 | 92
SVM | 68 | 60 | 65 | 60
LDA | 55 | 45 | 55 | 50
PNN | 43 | 35 | 50 | 50
Table 2. Average accuracy results of each muscle for the knee-abnormal group.
Classifier | Two movements (RF, BF, VM, ST) | Three movements (RF, BF, VM, ST)
CNN | 90, 90, 91, 95 | 75, 70, 74, 75
SVM | 67, 70, 78, 75 | 50, 60, 55, 58
LDA | 55, 66, 65, 65 | 55, 53, 60, 60
PNN | 55, 45, 40, 60 | 45, 50, 35, 40
Fig. 4. The average accuracy of the tested classifiers for normal group
Table 3 illustrates a comparison between the proposed and previous approaches. Compared with previous related works, the proposed work achieved good results and avoided the previous drawbacks, such as creating high-dimensional features, implementing complex feature selection methods, and/or using hybrid features [7,9–12].
Fig. 5. The average accuracy of the tested classifiers for abnormal group
Table 3. Comparison between the proposed and previous approaches
Study | Method description | Classification
Naik et al. [9] | Time domain features + Fisher score and statistical techniques for feature selection | LDA classifier, 96.1% for normal group, 86.2% for knee-abnormality group
Ai et al. [7] | Time domain features + wavelet coefficients | LDA and SVM classifiers, 95% for five movements
Laudanski et al. [10] | Time + frequency domain features | KNN classifier, 80.1% for knee postures
Zhang et al. [11] | Wavelet Transform (WT) + Singular Value Decomposition (SVD) | KNN classifier, 91.85% for four movements
Nazmi et al. [12] | Time domain features | ANN classifier, 87.4% for two movements
Proposed method | Short Time Fourier Transform (STFT) + texture analysis | CNN classifier, 92% for three movements (normal group), 95% for two movements (knee-abnormality group)
5 Conclusion
This paper has produced an enhanced lower limb movement classification based on EMG signals of the ST muscle. Short Time Fourier Transform (STFT) and standard deviation texture analysis are used for the feature extraction process. In the experiments, the constructed Convolutional Neural Network (CNN) achieved acceptable and high results for both the normal and knee-abnormal cases. In future work, different types of time-frequency conversions and advanced machine learning methods will be implemented to enhance the system results.
References 1. Konrad, P.: The ABC of EMG. A Practical Introduction to Kinesiological Electromyography, 1st edn (2005) 2. Raez, M.B.I., Hussain, M.S., Mohd-Yasin, F.: Techniques of EMG signal analysis: detection, processing, classification and applications. Biol. Proced. Online 8, 11–35 (2006) 3. Govindhan, A., Kandasamy, S., Satyanarayan, M., Singh, P., Aadhav, I.: Towards development of a portable apparatus for knee health monitoring. In: 3rd International Conference on Recent Developments in Control, Automation and Power Engineering (RDCAPE) 2019, Noida, India. IEEE (2019) 4. Aiello, E., et al.: Visual EMG biofeedback to improve ankle function in hemiparetic gait. In: IEEE Engineering in Medicine and Biology 27th Annual Conference 2019, Shanghai, China. IEEE (2019) 5. Shabani, A., Mahjoob, M.: Bio-signal interface for knee rehabilitation robot utilizing EMG signals of thigh muscles. In: 4th International Conference on Robotics and Mechatronics (ICROM) 2016, Tehran, Iran. IEEE (2016) 6. Adami, A., Adami, A., Hayes, T., Beattie, Z.: A system for assessment of limb movements in sleep. In: IEEE 15th International Conference on e-Health Networking, Applications and Services (Healthcom 2013), Lisbon, Portugal. IEEE (2013) 7. Ai, Q., Zhang, Y., Qi, W., Liu, Q., Chen, K.: Research on lower limb motion recognition based on fusion of sEMG and accelerometer signals. Symmetry 9(8), 147 (2017) 8. Singh, E., Iqbal, K., White, G., Holtz, K.: A Review of EMG Techniques for Detection of Gait Disorders. Machine Learning in Medicine and Biology. Intechopen (2019) 9. Naik, G., et al.: An ICA-EBM-based sEMG classifier for recognizing lower limb movements in individuals with and without knee pathology. IEEE Trans. Neural Syst. Rehabil. Eng. 26(3), 675–686 (2018) 10. Laudanski, A., Acker, S.: Classification of high knee flexion postures using EMG signals. PMID 68(3), 701–709 (2021) 11. Zhang, Y., Li, P., Zhu, X., Su, S., Guo, Q., Yao, D.: Extracting time-frequency feature of single-channel vastus medialis EMG signals for knee exercise pattern recognition. PLoS ONE 12(7), e0180526 (2017) 12. Nazmi, N., Abdul, R.A., Yamamoto, S., Ahmad, S.: Walking gait event detection based on electromyography signals using artificial neural network. Biomed. Signal Process. Control 47, 334–343 (2019) 13. Sanchez, O., Sotelo, J., Gonzales, M., Hernandez, G.: EMG Dataset in Lower Limb Data Set. UCI Machine Learning Repository (2014) 14. Sejdić, E., Djurović, I., Jiang, J.: Time-frequency feature representation using energy concentration: an overview of recent advances. Digital Signal Process. 19(1), 153–183 (2009) 15. Issa, S., Peng, Q., You, X.: Emotion classification using EEG brain signals and the broad learning system. IEEE Trans. Syst. Man Cybern. Syst. 51(12), 7382–7391 (2020) 16. Issa, S., Peng, Q., You, X.: Emotion assessment using EEG brain signals and stacked sparse autoencoder. J. Inf. Assur. Secur. 14(1), 20–29 (2019) 17. Texture Analysis. https://www.mathworks.com/help/images/texture-analysis-1.html. Accessed 30 Sept 2021
18. Image Classifier using CNN. https://www.geeksforgeeks.org/image-classifierusing-cnn. Accessed 08 Nov 2021 19. Convolutional Neural Networks: An Intro Tutorial. https://heartbeat.comet.ml/abeginners-guide-to-convolutional-neural-networks-cnn-cf26c5ee17ed. Accessed 30 Sept 2021
A Model of Compactness-Homogeneity for Territorial Design María Beatriz Bernábe-Loranca1(B) , Rogelio González-Velázquez1 , Carlos Guillen Galván2 , and Erika Granillo-Martínez3 1 Facultad de Ciencias de La Computación, Benemérita Universidad Autónoma de Puebla,
Puebla, México 2 Facultad de Ciencias Físico-Matemáticas, Benemérita Universidad Autónoma de Puebla,
Puebla, México 3 Facultad de Administración, Benemérita Universidad Autónoma de Puebla, Puebla, México
Abstract. This paper proposes a model to solve the problem of compactness and homogeneity in territorial design. In addition, a multi-objective optimization proposal is presented to minimize two objectives: geometric compactness over the geographical location of the data and homogeneity for some of its descriptive variables. In particular, it addresses a drainage and water problem in the Metropolitan Area of the Toluca Valley. This combinatorial problem is NP-hard and is solved under partitioning restrictions together with the requirements of the problem variables. To obtain an approximate set of non-dominated solutions, Pareto dominance and the minima are used. Keywords: Territorial design · Compactness · Homogeneity · Multi-objective
1 Introduction
In the territory design problem addressed here, the objective is to obtain a partition of geographic objects or spatial data. Each object is given by 2 components: geographic coordinates in the R2 plane and a vector of 171 descriptive characteristics of a census nature. The first component allows a distance matrix to be obtained for the geometric compactness calculation process, one of the objective functions to minimize. The description vector is used to optimize the second objective function, which consists of minimizing the homogeneity of some of the census variables, chosen based on some particular interest [1]. A set of solutions is sought, made up of partitions, where the elements of each group of the partition are geographically very close to each other, such that the intra-group geometric compactness is satisfied. For the homogeneity objective, the balance is calculated for some census variables that define the problem, and it is incorporated into the Variable Neighborhood Search within the optimization process to achieve approximate solutions that enter the process of identifying the non-dominated solutions (minima) [2, 3].
This work is organized as follows: the introduction is given in Sect. 1. In Sect. 2 the definitions and basic concepts are presented. The model for compactness is described in Sect. 3. Section 4 presents the modeling of homogeneity. Section 5 introduces the multi-objective concepts and their application. Finally, the conclusions and references are presented.
2 Preliminaries
We agree that the sets involved in this section are immersed in a metric space (Z, d) (a space where a metric d is defined, i.e., a symmetric distance satisfying the triangular inequality). Let A be a subset of Z; the power set of a set A is denoted by $2^A$ and its cardinality by card(A). A partition of a set Z is a collection $P = \{G_\alpha : \alpha \in I\}$ (where I is an index set) of nonempty subsets, pairwise disjoint, such that $Z = \bigcup_{\alpha} G_\alpha$. A k-partition of a set Z is a partition of Z of cardinality k. If P is a k-partition we write $P = P_k$. The class of all partitions of Z is denoted by $\mathcal{P}$ and the class of all k-partitions by $\mathcal{P}_k$. It should be noted that the power set contains all subsets; therefore, all the classes or groups that can be formed from Z are subsets of the power set. The distance between two nonempty sets of objects F and G is defined as $d(F, G) = \min\{d(s, t) : s \in F, t \in G\}$. It is convenient to point out that $s^*, t^*$ belong to the sets but are distinguished elements. If s, t must be on the boundary then the distance would be zero. This idea gives rise to the following: the distance from an object t to a nonempty set of objects G is defined as $d(t, G) = d(\{t\}, G)$. An object $t \in G$ is central if its greatest distance to any other object is as small as possible; formally, if we define the radius of G as $rad(G) = \min_{t \in G}\{\max_{x \in G}\{d(t, x)\}\}$, then $t \in G$ is central if for any $x \in G$, $d(x, t) \le rad(G)$. Given a set of objects G, its diameter is defined as $diam(G) = \max\{d(t, t') : t, t' \in G\}$; it is interpreted as 2 times the radius and implies that there is an intra-group connection. For consistency, for each G we define $d(G, \emptyset) = diam(G)$ and $diam(\emptyset) = 0$. Considering that an object can be represented in the plane by a graph, the following is defined: an undirected graph is a pair of sets $G = (V, E)$, where $E = E(G)$ (the set of edges of G) is a subset of the set of binary subsets of $V = V(G)$ (the set of vertices or nodes of G). The order of G is the cardinality of V and is denoted by |G|. A graph G is connected if for any two different nodes $u, w \in V(G)$ there is a path that connects them, that is, a sequence of edges of G of the form $\{v_0, v_1\}, \{v_1, v_2\}, \ldots, \{v_{n-1}, v_n\}$, where $u = v_0$ and $w = v_n$ [4]. If a graph is not connected, we say that it is disconnected. If the distance between $v_1, v_2$ is different from zero, it is understood that the line joining $v_1$ and $v_2$ keeps them within the class or group, and the intra-group connection is indirectly satisfied.
Statement 1. A component of G is a maximally connected subgraph of G.
Statement 2. A subset X separates G if G − X is disconnected. A graph G is said to be k-connected ($k \in \mathbb{N}$) if $|G| > k$ and $G - X$ is connected for all $X \subseteq V$ with card(X) < k. The connectivity of a graph G, denoted by k(G), is the maximum integer k such that G is k-connected (there is a graph G and within it a subset X, where k indicates the degree of connection of the graph).
3 Compactness
Geometric compactness in land design has not been precisely defined. In this context, researchers have concentrated on describing the measure quantitatively based on the problem, which has implied the existence of many compactness proposals [5]. Intuitively, geometric compactness can be understood when we imagine that the shape of each area grouping resembles a square, a circle or a convex geometric figure; this means that in some cases the measurements are not totally convincing [6]. In other words, geometric compactness has been expressed as a condition that seeks to avoid the creation of areas of irregular or non-polygonal shape. It is also considered that compactness provides clarity in the delimitation of the zones, and it has been observed that compact zones are easier to administer and evaluate due to the reduction of transfer times and communication difficulties [7]. According to the above, we have defined compactness as follows:
Definition 1. We say that there is compactness when several objects are brought together very closely, so that no gaps (or only a few) are left, in such a way that the objects have a minimum distance towards a center that represents the group. To formalize this idea when the objects are spatial data, the distance between groups is first defined.
Definition 2. Compactness. If we denote by $Z = \{1, 2, \ldots, n\}$ the set of n objects to be classified, we try to divide Z into k groups $G_1, G_2, \ldots, G_k$, with $k < n$, in such a way that:
$\bigcup_{i=1}^{k} G_i = Z$; $G_i \cap G_j = \emptyset$, $i \ne j$; and $|G_i| \ge 1$, $i = 1, 2, \ldots, k$.
A group $G_m$ with $|G_m| > 1$ is compact if each object $t \in G_m$ meets the following inequality, called here the neighborhood criterion (NC); that is, the NC between objects to achieve compactness is given by the pairs of distances
$\min_{i \in G_m,\, i \ne t} d(t, i) < \min_{j \in Z - G_m} d(t, j)$.
A group $G_m$ with $|G_m| = 1$ is compact if its object t fulfills
$\min_{i \in Z - \{t\}} d(t, i) < \max_{j, l \in G_f} d(j, l) \quad \forall f \ne m$.
An example of how two objects meet the above constraints is shown in Fig. 1. Considering the above, a group G is compact if it is better to move to a central object of the same group than to go to any object of another group F. In formal terms, the following definition is available:
Definition 3. A set of objects $G \in 2^Z$ is compact if for any central object $z \in Z - G$ it holds that
$rad(G) \le d(G, z)$ (1)
Fig. 1. Four compact groups
A partition P is compact if, for every $G \in P$, (1) is satisfied for every central object z of the elements of the partition P. Note that with this definition the space Z is compact; this may suggest that the trivial partition $P_1 = \{Z\}$ satisfies the desired characteristics, since it is clear that $rad(G) < diam(G)$. See Fig. 2.
Fig. 2. Compactness representation
Another useful way of looking at the compactness of a group G is to think that its elements are "close together", that is, as close as possible. In more formal terms, its radius is close to zero, that is, the radii are very small ($rad(G) \to 0$). Taking the previous observation into account, we define the compactness of a partition as follows:
Definition 4. Given a partition P of Z, its compactness is defined as
$compac(P) = \sum_{G \in P} rad(G)$ (2)
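A direct reading of Definitions 3 and 4 gives a simple way to evaluate candidate partitions. The following Python sketch is illustrative only: it assumes the objects are indexed 0..n-1, a precomputed distance matrix is available, and Eq. (1) is checked conservatively against every object outside the group rather than only central ones.

```python
import numpy as np

def rad(group, dist):
    """rad(G): smallest over t in G of the maximum distance from t to the rest of G."""
    idx = np.asarray(group)
    sub = dist[np.ix_(idx, idx)]
    return sub.max(axis=1).min()

def set_distance(group, others, dist):
    """d(F, G): minimum pairwise distance between two sets of objects."""
    return dist[np.ix_(np.asarray(group), np.asarray(others))].min()

def compac(partition, dist):
    """Definition 4 / Eq. (2): sum of the radii of the groups of the partition."""
    return sum(rad(g, dist) for g in partition)

def is_compact(partition, dist, n):
    """Eq. (1), read conservatively: rad(G) <= d(G, z) for every object z outside G."""
    for g in partition:
        outside = [z for z in range(n) if z not in g]
        if outside and rad(g, dist) > set_distance(g, outside, dist):
            return False
    return True
```

Within a search heuristic such as Variable Neighborhood Search, compac(P) can be used directly as the compactness objective to minimize over candidate partitions P.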
The following proposition establishes an equivalence with Definition 1.
Proposition 1. The partition $P_0$ is compact if and only if
$compac(P_0) = \min\{compac(P) : P \in \mathcal{P}\}$ (3)
Demonstration. If $P_0$ is compact, then for $G \in P_0$, $diam(G) \le d(G, Z - G)$ is fulfilled. For sufficiency, if the partition $P_0$ is not compact, then there exists $G' \in P_0$
such that $diam(G') > d(G', Z - G')$; let $g_1, g_2 \in G'$ be such that $diam(G') = d(g_1, g_2)$ and let $r = d(G', Z - G')$. If $G'' = \{g \in G' : d(g, g_1) < r/2\}$, hence $\sum_{G \in P} diam(G)$ ... $d(G', Z - G') + \sum_{G \in P - \{G'\}} diam(G)$.
4 Homogeneity
There is an imprecision in the terms homogeneity and balance. What we are trying to achieve is that the groups of zones be in equilibrium, or balanced, with respect to one or more properties of the geographical units that form them. For example, zones can be designed that have the same workload, the same travel times, or the same percentages of socioeconomic representation. In general, it is not possible to achieve exact equilibrium; therefore, one chooses to calculate the deviation with respect to the ideal arrangement. The greater the deviation, the worse the equilibrium of the zone.
Population Balance
In the design of zones for a logistical problem, for example, it is desirable that all zones contain the same amount of population, or at least that the population difference between zones is the minimum possible, in order to optimize transfers. Different measures can be defined to calculate the population balance of zones; however, each measure depends on the specific problem, and all of them lead to very similar results. The formulas for homogeneity are listed below:
1. The simplest way to measure the population balance is to add the absolute values of the difference between the population of each zone and the average population per zone, $\sum_i |P_i - \bar{P}|$, where $P_i$ is the population of zone i, $\bar{P}$ is the average population per zone given by $\bar{P} = \sum_{k \in K} P_k / n$, n is the number of zones to create, K is the set of all geographic units, and $P_k$ is the population of geographic unit k.
2. The difference in population between the most populated zone and the least populated zone, $\max_i P_i - \min_i P_i$. Sometimes this difference is divided by the average population, $(\max_i P_i - \min_i P_i)/\bar{P}$.
3. The division of the most populated zone by the least populated one, $\max_i P_i / \min_i P_i$.
4. The following measure is given by the function $\sum_{j \in J} \max\{P_j - (1+\beta)\bar{P},\, (1-\beta)\bar{P} - P_j,\, 0\}/\bar{P}$, where J is the set of all zones, $P_j$ is the population of zone j, and $\beta$ is the population standard deviation percentage. In this way, it is sought that the population of each zone be within the interval $[(1-\beta)\bar{P}, (1+\beta)\bar{P}]$, with $0 \le \beta \le 1$. It is observed that this function takes the value 0 if the population of each zone lies in the interval $[(1-\beta)\bar{P}, (1+\beta)\bar{P}]$; otherwise, it takes a positive value equal to the sum of the deviations with respect to these levels [8].
Homogeneity for Variables
Homogeneity for the spatial data of a population makes sense when the values of the population variables in a grouping process have "more or less" the same values; that is,
homogeneous groups are those in which the groups maintain approximately the same amount of a population variable, for example, each group having roughly the same number of inhabitants. Homogeneous grouping has many applications; it is useful when one wants to obtain groups of zones to which workloads or resources must be assigned equally. For example, assume that it is necessary to supply a state of the republic with a service x that is provided by machine m; if the machine m can only supply v houses, then total homes/v machines would be needed to supply the entire state. It is then possible to divide the state into groups that maintain homogeneity regarding the population variable "number of dwellings" and thus assign each group or zone a machine.

Homogeneity Over Geographic Units for Population Variables. If we denote by Z = {1, 2, . . . , n} the set of n objects to be classified and V1, V2, . . . , Vn the set of values of the variable of which we want to keep the groups homogeneous, we try to divide Z into k groups G1, G2, . . . , Gk with k < n, such that

$$\bigcup_{i=1}^{k} G_i = Z; \quad G_i \cap G_j = \emptyset \ \text{for } i \ne j; \quad |G_i| \ge 1, \ i = 1, 2, \ldots, k.$$

A group $G_m$ is homogeneous if it satisfies $\sum_{j \in G_m} V_j \approx \frac{1}{k}\sum_{i=1}^{n} V_i$. When only homogeneity is considered, the results are not practical since, in the absence of compactness, the dispersion of the objects is an expected consequence [9]. According to the above, a desirable characteristic is that the groups in an area have a balance with respect to one or more properties of the geographic units that form them. A population variable can be considered as a function X with domain a subset G of a population Z and with values in the real numbers, X : 2^Z → R. For example, X(G) can be the number of inhabitants in zone G, or it could represent the number of dwellings in zone G, among many other population parameters.

Definition 5. A partition P is homogeneous with respect to the population variable X if X(F) = X(G) for any F, G ∈ P.

Note that definition 5, in practice, turns out to be an ideal condition, since in many territorial design problems this condition is clearly impossible. To cite an example, if X is a variable that represents the number of economically active women in an area, it is difficult to obtain a non-trivial homogeneous partition with this condition. However, definition 6 gives us the guideline to propose a definition with more practical utility.

Definition 6. A partition P is (ε, μ)-homogeneous or quasi-homogeneous with respect to the population variable X if for all G ∈ P it is fulfilled that $|X(G) - \mu(G)| \le \varepsilon$
(4)
where ε is a non-negative real number and μ is a function μ : 2^Z → R. Note that a (ε, μ)-homogeneous partition is homogeneous if we choose ε = 0 and μ a constant function over the elements of P. Also, ε is a bound for the homogeneity error. The parameter ε and the function μ are determined depending on the characteristics of the problem and the opinion of the expert.
If a partition P is (ε, μ)-homogeneous and F, G ∈ P, from the triangle inequality we have $|X(F) - X(G)| \le 2\varepsilon + |\mu(F) - \mu(G)|$
(4.1)
Then, if the bound ε for the homogeneity error is small and the function μ is close to a constant function, X(F) and X(G) can be considered to be "very similar". In many problems of territorial design, more than one population variable may intervene; for example, it may be desirable to consider an area G where two population variables X and Y are considered, with X(G) representing the number of inhabitants in the area and Y(G) the number of economically active residents. Then definition 6 can be given in more general terms as follows:

Definition 7. A partition P is (ε, μ)-homogeneous or quasi-homogeneous with respect to the population variable vector X(G) = (X_1(G), . . . , X_n(G)) if for all G ∈ P, $|X_i(G) - \mu_i(G)| \le \varepsilon_i$ is satisfied
(5)
for each i = 1, 2, . . . , n, where ε = (ε_1, ε_2, . . . , ε_n) and, for each i = 1, 2, . . . , n, ε_i is a non-negative real number, and μ(G) = (μ_1(G), . . . , μ_n(G)), where the μ_i are real-valued functions over the subsets of Z for each i = 1, 2, . . . , n. Notice that if ‖X(G) − μ(G)‖ ≤ ε, then definition 6 is fulfilled; although this would be a more simplified way of defining quasi-homogeneous partitions, it has the disadvantage that the bounds of the homogeneity errors would be uniform. However, if for all G ∈ P the modulus of each component of the vector X(G) − μ(G) is zero, then the partition P is (ε, μ)-homogeneous for any vector ε with non-negative components.

Definition 8. The μ-homogeneity of a partition P with respect to the population variable vector X(G) is defined by hom_μ(P) = ‖X(G) − μ(G)‖
(6)
where μ is as in definition 4. Notice that if hom_μ(P) = 0 and μ = (c, c, . . . , c), with c a non-negative integer, then the partition P is homogeneous.
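To make these measures concrete, the following minimal Python sketch evaluates the deviation-based balance of measure 4 and a μ-homogeneity value for a set of zones. The zone populations, the value of β, and the use of the maximum componentwise deviation as the norm are assumptions of this illustration, not prescriptions of the model.

```python
# Illustrative sketch of the balance/homogeneity measures above.
# Zone populations, beta and the norm choice are hypothetical example values.

def balance_deviation(zone_pops, beta):
    """Measure 4: sum of deviations outside [(1-beta)*Pbar, (1+beta)*Pbar], divided by Pbar."""
    p_bar = sum(zone_pops) / len(zone_pops)
    lo, hi = (1 - beta) * p_bar, (1 + beta) * p_bar
    return sum(max(p - hi, lo - p, 0) for p in zone_pops) / p_bar

def mu_homogeneity(x_values, mu_values):
    """hom_mu(P): here taken as the maximum deviation |X(G) - mu(G)| over the zones."""
    return max(abs(x - m) for x, m in zip(x_values, mu_values))

zones = [1040, 980, 1150, 910]                     # hypothetical zone populations
print(balance_deviation(zones, beta=0.05))         # 0 when every zone lies inside the band
print(mu_homogeneity(zones, [1020] * len(zones)))  # mu chosen as a constant function
```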
5 Multi-objective Scheme and Methodology

Multi-objective problems can be made clearer by identifying the relationships between the characteristics of the problem, its constraints, and the main objectives to be improved together. For these types of problems, each objective can be expressed as a mathematical function. When referring to the improvement as a whole, it is said that all functions must be optimized simultaneously, thus defining a problem of the type described below:
Definition 9. A multiobjective problem (MOP) can be defined in the minimization case (and analogously for the maximization case) as: minimize $f(x)$, given $f : F \subseteq \mathbb{R}^n \to \mathbb{R}^q$, $q \ge 2$, evaluated in $A = \{a \in F : g_i(a) \le 0\}$.

The performance measures calculated for the IoU threshold λ = 0.6 are reported in the tables below. Table 1 records the mAP values of the Mask R-CNN, YOLOv3, CNN, DBN and Mask YOLOv3 algorithms. The CNN and DBN algorithms cannot obtain comparable performance due to their architectural patterns. Table 2 depicts the prediction accuracy (%) of all five algorithms. Here also, it is found that Mask YOLOv3 outperforms the other algorithms in terms of prediction accuracy.

Table 1. Mean average precision of object prediction algorithms @ IoU > 0.6
           Mask R-CNN   YOLOv3   CNN    DBN    Mask YOLOv3
Cow        0.85         0.83     0.80   0.77   0.87
Sheep      0.87         0.86     0.78   0.79   0.85
Horse      0.84         0.84     0.80   0.75   0.91
Average    0.80         0.83     0.79   0.77   0.88
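For reference, the intersection over union against which the λ = 0.6 threshold is applied can be computed as in the sketch below; the two boxes are hypothetical and given in (x1, y1, x2, y2) form.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

LAMBDA = 0.6  # a detection is counted as correct only when IoU exceeds this threshold
print(iou((10, 10, 110, 90), (20, 15, 120, 95)) > LAMBDA)  # hypothetical boxes
```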
Table 2. Prediction accuracy of object prediction algorithms @ IoU > 0.6

           Mask R-CNN   YOLOv3   CNN     DBN     Mask YOLOv3
Cow        83.12        85.04    82.1    81.9    92.8
Sheep      82.91        89.37    80.06   77.37   89.1
Horse      81.45        80.91    78.56   77.4    95.3
Average    86.49        82.44    80.24   78.89   85.17
Table 3. Confusion matrix of object prediction by Mask YOLOv3 @ IoU > 0.6

           Cow   Sheep   Horse   Total samples   Accuracy %
Cow        79    0       6       85              93%
Sheep      9     123     7       139             89%
Horse      1     3       72      76              95%
The confusion matrix obtained from the Mask YOLOv3 algorithm is shown in Table 3. 93% of the cows are predicted and counted correctly, with only 6 cows misclassified as horses. Since the cow has features which deviate heavily from those of the sheep, no cow is misclassified as a sheep. The Mask YOLOv3 algorithm predicts more accurately when the features are properly extracted. Out of 139 sheep, 16 are misclassified as cow or horse, and the algorithm results in 89% accuracy. The detection accuracy for horses is the highest, with 95% accurate prediction and counting: out of 76 horses, 72 horses standing and walking in different positions and angles are predicted accurately by the Mask YOLOv3 algorithm. In Table 4, the F1 scores of the object prediction algorithms are compared. Since F1 is a combined measure of both precision and recall, it considers both the accuracy of the predictions and the capability of the algorithm in extracting predicted samples out of the total samples. For the given IoU > 0.6, the F1 score of the Mask YOLOv3 algorithm is superior to those of the other four algorithms.

Table 4. F1 score of object prediction algorithms @ IoU > 0.6

           Mask R-CNN   YOLOv3   CNN    DBN    Mask YOLOv3
Cow        0.91         0.85     0.77   0.80   0.93
Sheep      0.89         0.81     0.76   0.78   0.92
Horse      0.89         0.83     0.78   0.79   0.94
Average    0.90         0.83     0.77   0.79   0.93
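To show how the per-class figures in Tables 3 and 4 relate, the short sketch below derives precision, recall, and F1 from a confusion matrix laid out as in Table 3 (rows are true classes, columns are predicted classes); it is an illustration, not the authors' evaluation code.

```python
import numpy as np

# Confusion matrix from Table 3; rows = true class, columns = predicted class.
cm = np.array([[79,   0,  6],   # cow
               [ 9, 123,  7],   # sheep
               [ 1,   3, 72]])  # horse

for i, name in enumerate(["Cow", "Sheep", "Horse"]):
    tp = cm[i, i]
    precision = tp / cm[:, i].sum()  # correct predictions over all predictions of this class
    recall = tp / cm[i, :].sum()     # correct predictions over all true samples of this class
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```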
The precision of each algorithm is plotted against its recall in the precision-recall curve given in Fig. 5.

Fig. 5. Precision recall curve of object prediction algorithms (Mask R-CNN, YOLOv3, CNN, DBN and Mask YOLOv3)
Finally, for 25 videos which are converted into 300 test images, average accuracy of all the videos is found to be 93% for cow, 89% for sheep and 95% for horses. For a video of duration 2 min and 20 s, the algorithm takes about 2 min to detect and count the cattle.
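A minimal OpenCV sketch of the video-to-frame step described above is given below; the sampling rate and the file name are hypothetical, since the exact pipeline details are not specified here.

```python
import cv2

def video_to_frames(path, every_n=25):
    """Sample one frame out of every `every_n` frames for detection and counting."""
    cap, frames, idx = cv2.VideoCapture(path), [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

frames = video_to_frames("cattle_video.mp4")  # hypothetical file name
print(len(frames), "frames extracted for detection and counting")
```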
5 Conclusion and Future Work

The Mask YOLOv3 algorithm is a hybrid algorithm which effectively detects, classifies and counts cattle. The single-stage YOLOv3 and two-stage Mask R-CNN algorithms are combined in an efficient manner so that the benefits of both algorithms are achieved together. The cattle are detected by rectangular bounding boxes, and the performance of the proposed method is measured in terms of accuracy, mAP, precision, recall and F1 score and compared with other existing algorithms. Tracking the movement of the cattle relative to the line drawn on the frame makes it possible to maintain a correct count of the cattle. Even though only cows, sheep and horses are detected in the experiments conducted, this system can be trained and enhanced to detect and count other animals too.
References

1. Burghardt, T., Calic, J.: Real-time face detection and tracking of animals. In: 2006 8th Seminar on Neural Network Applications in Electrical Engineering, pp. 27–32 (2006)
2. Descamps, S., Béchet, A., Descombes, X., Arnaud, A., Zerubia, J.: An automatic counter for aerial images of aggregations of large birds. Bird Study 58(3), 302–308 (2011)
3. Chabot, D., Francis, C.M.: Computer-automated bird detection and counts in high resolution aerial images: a review. J. Field Ornithol. 87(4), 343–359 (2016)
4. Mejias, L., Duclos, G., Hodgson, A., Maire, F.: Automated marine mammal detection from aerial imagery. In: 2013 OCEANS-San Diego, pp. 1–5 (2013) 5. Neethirajan, S.: Recent advances in wearable sensors for animal health management. Sens. Bio-Sens. Res. 12, 15–29 (2017) 6. Huang, J., et al.: Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7310– 7311 (2017) 7. Li, G., et al.: Practices and applications of convolutional neural network-based computer vision systems in animal farming: a review. Sensors 21(4), 1492 (2021) 8. Murthy, C.B., Hashmi, M.F., Bokde, N.D., Geem, Z.W.: Investigations of object detection in images/videos using various deep learning techniques and embedded platforms—a comprehensive review. Appl. Sci. 10(9), 3280 (2020) 9. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
UTextNet: A UNet Based Arbitrary Shaped Scene Text Detector Veronica Naosekpam(B) , Sushant Aggarwal, and Nilkanta Sahu Indian Institute of Information Technology Guwahati, Guwahati, Assam, India {veronica.naosekpam,sushant.aggarwal,nilkanta}@iiitg.ac.in
Abstract. The task of scene text detection has achieved notable success in recent years owing to its wide range of applications, such as automatic entry of information from images into databases, robot sensing, text translation, etc. Many works have already been proposed for horizontal text, and some for multi-oriented scene text. However, works on detecting arbitrarily shaped text, which commonly appears in natural environments, are currently scarce. This paper proposes a segmentation-based arbitrary shaped scene text detector adapted from the UNet, called the UTextNet. It comprises a ResNet-UNet encoder-decoder network in which the residual blocks of the ResNet encoder perform feature extraction and the UNet decoder module performs the segmentation of the text region. A shallow segmentation head called approximate binarization (AB) is added for the post-processing task. It performs binarization using the probability map and threshold map generated by the encoder-decoder framework. The performance of the UTextNet is validated on benchmark datasets, namely ICDAR 2015 and Total-Text, and has demonstrated performance competitive with existing state-of-the-art systems. Keywords: Scene text detection · Arbitrary shaped text · Segmentation · Deep learning · Multi-oriented text

1 Introduction
Text detection is one of the fundamental problems of the scene text understanding pipeline whose aim is to determine the location of text instances in a given natural scene image. The text instance’s position is often represented by a rectangle, oriented rectangle, or polygon. In recent years, the vision community has witnessed an outpouring research interests towards extracting textual information from natural scenes. Some potential applications of scene text detection include image retrieval, product search, visual translation, scene parsing, blind navigation, etc. Despite immense efforts made in the last decade, it still stands out as an open problem. Most of the existing methods, though they have progressed, assumed that scene text is either straight or close to being horizontal and used rotated rectangle or quadrilateral to localize the text instances. Suppose such representations c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 368–378, 2022. https://doi.org/10.1007/978-3-030-96308-8_34
are used to detect arbitrarily oriented or curved text instances, which frequently appear in the real-world environment. In that case, they tend to occupy more background space in the scene, which is considered an inefficient solution. The same is illustrated in Fig. 1. In this paper, we propose UTextNet, a multi-oriented and arbitrary (curved) shaped scene text detection system employing a UNet based architecture to generate a more pliable boundary representation that can fit text of arbitrary shape more compactly. The UTextNet is made up of ResNet50-UNet, a novel encoder-decoder framework which efficiently performs the task of segmenting text from its background. The ResNet50 pre-trained on ImageNet is applied as the backbone. To optimize the model, an objective function comprising the sum of balanced cross-entropy, dice, and L1 loss is proposed. Our proposed UTextNet has demonstrated performance competitive with the state of the art on standard benchmark datasets containing multi-oriented and curved scene texts, namely ICDAR 2015 and Total-Text.
Fig. 1. Different depiction of text instances (a) Input Image (b) Horizontal rectangle. (c) Quadrangle (d) UTextNet
The rest of the paper is organized as follows: Sect. 2 surveys the existing scene text detection methods, their schemes and shortcomings; Sect. 3 explains the proposed UNet based text detection scheme called the UTextNet; Sect. 4 presents the experimental setup and investigation of the results obtained by the proposed system; and finally, a conclusion is drawn in Sect. 5.
2 Related Work
The research area of scene text detection has witnessed an eminent trend switch from conventional methods, that is, the statistical features, machine learningbased approaches [8,11,27] to deep learning-based methods [3,16,19,28,32]. Maximally Stable Extremal Regions (MSER)[18], and Stroke Width Transform (SWT) [5] are the prominent algorithms that have impacted the conventional methods. A method proposed in [31] assumes that pixels belonging to the same character have similar colors and can be separated from the background by segmentation. Neumann et al. [20] explore MSER [18] to detect individual characters as ER (Extremal Regions), taking into consideration the computation complexity ratio and color similarity. There exist few techniques [9,11,27] that utilized the benefits of both statistical features and the deep learning-based
methods to detect text images in a natural scene. Convolutional Neural Network (CNN) [21,25] is allowed to learn MSER features in Huang et al. [8] and later incorporates sliding window and non-maximal suppression as the postprocessing module. A similar approach is also attempted by Wang et al. [27] using a sliding window to detect text line and text location estimation. CNN’s are applied in sliding window fashion in [9] to obtain the text saliency map. In Khatib et al. [11], RGB images are first converted to grayscale, followed by edge detection algorithm and other operations to obtain a binary image. On the binary images, connected components and text selection takes place in a multilayer fashion. Designing a fast and reliable text detector using classical machine learning approaches is strenuous owing to the fact that non-text components that look similar to text in terms of appearance are detected as text sometimes. It also fails when working with the multi-oriented text. In this context, deep learning techniques are considered more powerful and efficient in terms of robustness and accuracy. The deep learning-based scene text detection can be broadly divided into 1) Object detection-based technique (regression-based) and 2) Segmentation based technique. The object detection based methods [12,19,24] draw motivation from the object detection frameworks. Shi et al. [24] utilized the Single Shot Detector (SSD) [14] to generate SegLinks. These are small segments obtained by breaking down texts into smaller segments. By predicting links between neighboring text segments, SegLink is enabled to detect long lines of texts. Textboxes [12] also adopted the SSD architecture and made modifications by adding long default boxes along with filters, which can handle notable alterations in the aspect ratio of the text instances. Efficient and Accurate Scene Text Detector (EAST) [32] produces directly the rotated boxes or quadrangles of text in a per-pixel fashion. Inspired by the You Only Look Once (YOLO) network [22], Veronica et al. [19] proposed a multi-lingual scene text detection scheme using its shallow versions on multi-lingual Indian scene text dataset. It is observed that the regression-based methods require further processing, such as multi-scale feature layers and filters, in order to handle multioriented scene texts more accurately. Segmentation-based methods [3,16,28] on the other hand, consider the problem of scene text detection as partitioning of an image into different segments to classify the text regions at the pixel level. Fully Convolutional Neural Network (FCN) [15] is commonly taken as a base framework for this approach. From the segmentation map generated by FCN, text blocks are extracted using their proposed methods and perform postprocessing to obtain the bounding boxes. To address the adjacent character linking problem, in PixelLinks [3] 8-directional information of each pixel is used. TextSnake [16] proposed by Long et al. detects text by predicting text region and center lines. For better detection of adjacent text instances, Progressive Scale Expansion Network (PSENet) [28] finds kernel with multiple scales to accurately separate scene texts that reside close to each other at the cost of the requirement of a specific amount of resources to run the algorithm. The methods mentioned above have given encouraging results, mostly on horizontal and multi-oriented scene texts. 
However, most works besides the segmentation-based methods have not given emphasis to irregular or arbitrarily shaped scene texts.
3 Proposed Methodology

3.1 Network Architecture
In the proposed scheme, we have utilized a UNet [23]-like encoder-decoder architecture for segmenting text regions from the background regions. The overall architecture of the proposed UTextNet is depicted in Fig. 2. ResNet50 [6] pre-trained on ImageNet [4] is applied as the backbone network. U-Net is an extension of a fully convolutional network, popularly used in the medical image segmentation domain. The intuition behind the UNet is to encode the input image through a series of convolution layers and decode it back to acquire the segmentation mask. The encoding process computes the essential image features and removes the spatial information. The decoding process takes these features and reconstructs the desired output. One of the critical constructs of UNet is the presence of long skip connections between the contraction (encoding) and the expansion (decoding) path, which makes it practical for creating a segmentation mask. It is well known that residual blocks [6] assist in producing a deeper neural network by addressing the problem of vanishing gradients. So, in the proposed scheme, instead of using superficial convolution layers in the encoding module of UNet, we used ResNet50 blocks. The residual block makes use of two 3 × 3 convolutional layers and an identity mapping. A batch normalization layer and a ReLU (Rectified Linear Unit) follow each convolution layer. The identity mapping connects the input and output of the convolutional layers. The default input image size for ResNet is 224 × 224, but when it is used as an encoder, the fully connected (FC) layers are removed; hence the network is adapted to any input size. The ResNet50 layers from the global average pooling layer onward are abandoned, and the features are passed to the decoder. The decoder performs transposed convolutions to upscale the incoming feature maps into the desired shape. These upscaled feature maps are connected with the corresponding feature maps from the encoder through the skip connections. The skip connections aid the model in accumulating the low-level semantic information from the encoder, allowing the decoder to generate the desired feature map. Thus, by substituting the convolutions of UNet at every level with residual blocks, a good performance
Fig. 2. Proposed UTextNet architecture
is achieved. In order to make the dimensions of the input and output feature maps the same, "same" padding is performed in the convolution operations. A notable improvement to the architecture is the presence of local and long skip connections between convolutions at every level, which helps in avoiding gradient explosion and vanishing. A probability map Pr ∈ R^{h×w}, also known as the segmentation map, and a threshold map T ∈ R^{h×w} are the outputs predicted from the ResNet-UNet feature map F. They have the same height (h) and width (w) as the original image but only one channel. Using Pr and T, the approximate binary map AB ∈ R^{h×w} is calculated. During the training period, all the mentioned maps, Pr, T, and AB, have training labels, where Pr and AB share one label. At the inference stage, the irregular-shaped boundary can be acquired from the probability map or the approximate binary map with the help of a box formulation module.
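A compact Keras sketch of such a ResNet50 encoder / UNet decoder producing the probability and threshold maps is shown below. It is an illustration rather than the authors' implementation: the skip-connection layer names assume the TF 2.x Keras ResNet50, and the decoder channel widths are choices of this sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def utextnet_backbone(input_shape=(640, 640, 3)):
    inputs = layers.Input(shape=input_shape)
    encoder = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                             input_tensor=inputs)
    # Encoder feature maps used for the long skip connections (names from TF 2.x ResNet50).
    skip_names = ("conv1_relu", "conv2_block3_out", "conv3_block4_out", "conv4_block6_out")
    skips = [encoder.get_layer(n).output for n in skip_names]
    x = encoder.get_layer("conv5_block3_out").output
    for skip, ch in zip(reversed(skips), (256, 128, 64, 32)):
        x = layers.Conv2DTranspose(ch, 3, strides=2, padding="same")(x)  # upsample by 2
        x = layers.Concatenate()([x, skip])                              # long skip connection
        x = layers.Conv2D(ch, 3, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    prob_map = layers.Conv2D(1, 1, activation="sigmoid", name="probability_map")(x)
    thresh_map = layers.Conv2D(1, 1, activation="sigmoid", name="threshold_map")(x)
    return Model(inputs, [prob_map, thresh_map])

model = utextnet_backbone()  # predicts Pr and T; AB is derived from them as in Sect. 3.2
```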
3.2 Approximate Binarization
The general segmentation task with two classes involves predicting a probability map of the foreground and binarizing it with a previously set threshold. The standard binarization process is defined as

$$SB_{m,n} = \begin{cases} 1, & \text{if } Pr_{m,n} > t \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where SB refers to the binary map, Pr is the probability map, t denotes the predefined threshold, and (m, n) signifies the coordinate points in the map; 1 refers to the text region and 0 to the non-text region. However, this standard binarization function of Eq. 1 is not differentiable and cannot be added to the trainable deep learning network for optimization. Therefore, approximate binarization, which can be optimized along with the segmentation network, is adopted and defined as

$$AB_{m,n} = \frac{1}{1 + e^{-z(Pr_{m,n} - T_{m,n})}} \qquad (2)$$

where AB denotes the differentiable binary map, z denotes the amplifying factor set to 50, T is the adjustable threshold map learned from the network, and Pr is the probability map. It is observed that a boundary-like threshold map is advantageous for finer guidance of bounding box creation. This binarization head is inspired by [13]. The approximate binarization with adaptive threshold supports setting apart the text instance regions from the background and separating texts that are firmly joined together. At the segmentation head of the network, using the differentiable binarization, three output maps are produced: the threshold map T, the probability map Pr, and the approximate binary map AB.
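The following NumPy sketch contrasts the two binarization rules of Eqs. (1) and (2); the probability and threshold maps here are random placeholders, not network outputs.

```python
import numpy as np

def standard_binarization(prob_map, t=0.5):
    """Eq. (1): hard threshold; simple but not differentiable."""
    return (prob_map > t).astype(np.float32)

def approximate_binarization(prob_map, thresh_map, z=50.0):
    """Eq. (2): differentiable surrogate, AB = 1 / (1 + exp(-z * (Pr - T)))."""
    return 1.0 / (1.0 + np.exp(-z * (prob_map - thresh_map)))

pr = np.random.rand(4, 4).astype(np.float32)  # placeholder probability map
th = np.full((4, 4), 0.3, dtype=np.float32)   # placeholder threshold map
print(standard_binarization(pr))
print(approximate_binarization(pr, th))
```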
3.3 Loss Function
It is evident that for any scene text detection dataset, the number of background class instances is greater than the number of text instances. Therefore, training
a deep neural network might be biased towards detecting the background class more than the text class. To tackle this issue, the balanced cross-entropy loss function is adopted for the probability map's loss $L_{Pr}$ and is given by the following equation (Eq. 3):

$$L_{Pr} = \text{bal\_entropy}(\bar{Z}, Z^*) = -\beta Z^* \log \bar{Z} - (1-\beta)(1-Z^*)\log(1-\bar{Z}) \qquad (3)$$

$\bar{Z}$ is the score map prediction and $Z^*$ is the ground truth. $\beta$ denotes the balancing factor given by:

$$\beta = 1 - \frac{\sum_{z^* \in Z^*} z^*}{|Z^*|} \qquad (4)$$

The loss for the approximate binary map $L_{ab}$ is defined by the dice loss as:

$$L_{ab} = 1 - \frac{2\sum_{h,w} y_{h,w}\,\bar{y}_{h,w}}{\sum_{h,w}\left(y_{h,w} + \bar{y}_{h,w}\right)} \qquad (5)$$

where $y$ and $\bar{y}$ are the true and the predicted values. The dice loss is used because it is known to work well for the segmentation of regions that come from an unbalanced dataset, which in our case is dominated by the background class compared to the text regions. The threshold map's loss $L_{Th}$ is determined as the sum of the L1 distances between the prediction and the label interior to the enlarged text polygon:

$$L_{Th} = \sum_{k \in Q} |y_k^* - z_k^*| \qquad (6)$$

where Q is the set of pixels' indices inside the enlarged polygon. The L1 loss is found to be resistant to outliers in the data, making it robust with its least absolute deviations. While training the network in a supervised manner, the loss function of the UTextNet is a combination of $L_{Pr}$, $L_{Th}$ and $L_{ab}$:

$$Loss = L_{Pr} + L_{Th} + L_{ab} \qquad (7)$$

As a result of the merits of each of the individual loss functions, the novel utilization of the ResNet-UNet based network, and the proper segregation of text from non-text using approximate binarization, the outcome of the detection network is found to be promising. During testing, to generate the text boundary bounding box, either the approximate binary map or the probability map can be considered. The steps to obtain the bounding box are: 1) a constant threshold (set to 0.5) is used first to binarize the map and obtain the binary map; 2) the shrunk text regions are accumulated from the binary map; and 3) the shrunk regions are dilated with an offset $\tilde{O}$ using the Vatti clipping algorithm [26], a method of clipping arbitrary polygons. The formula is as follows:

$$\tilde{O} = \frac{\widetilde{Area} \times q}{\tilde{P}} \qquad (8)$$

where $\widetilde{Area}$ denotes the shrunk polygon's area, $\tilde{P}$ is the polygon's perimeter, and $q$ is called the shrink ratio, whose value is empirically set to 1.7.
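A hedged TensorFlow sketch of the combined objective of Eq. (7) is given below. The reduction over pixels, the small epsilon terms, and the masking of the threshold loss to the region Q are assumptions of this illustration.

```python
import tensorflow as tf

def balanced_bce(y_true, y_pred, eps=1e-6):            # Eqs. (3)-(4)
    beta = 1.0 - tf.reduce_sum(y_true) / tf.cast(tf.size(y_true), tf.float32)
    return -tf.reduce_mean(beta * y_true * tf.math.log(y_pred + eps)
                           + (1.0 - beta) * (1.0 - y_true) * tf.math.log(1.0 - y_pred + eps))

def dice_loss(y_true, y_pred, eps=1e-6):                # Eq. (5)
    inter = tf.reduce_sum(y_true * y_pred)
    return 1.0 - 2.0 * inter / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)

def threshold_l1(y_true, y_pred, q_mask):               # Eq. (6), restricted to pixels in Q
    return tf.reduce_sum(tf.abs(y_true - y_pred) * q_mask) / (tf.reduce_sum(q_mask) + 1e-6)

def total_loss(p_true, p_pred, ab_true, ab_pred, t_true, t_pred, q_mask):   # Eq. (7)
    return balanced_bce(p_true, p_pred) + threshold_l1(t_true, t_pred, q_mask) \
        + dice_loss(ab_true, ab_pred)
```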
4 Experiments

4.1 Datasets
The proposed method is analyzed and evaluated using the ICDAR 2015 [10] and Total-Text [2] datasets. The ICDAR 2015 dataset contains 1500 natural images acquired using Google Glass in the streets. The size of each image is fixed at 1280 × 760; 1000 images are set as training data and 500 images as test data. The text instances are annotated by 4-point quadrilaterals. The majority of the images have multiple text instances that are blurred, small, skewed, and multi-oriented. Word regions are annotated as "care" and "do not care"; the "do not care" annotation comprises text in non-Latin scripts and non-readable text, which is not considered during the evaluation. It is one of the regularly used datasets for scene text detection tasks. The Total-Text dataset consists of texts of various shapes but is dominated by curved scene texts or a combination of curved and multi-oriented texts. It has 1255 training and 200 testing images. The ground truths are annotated at word level using 10 coordinates, forming a polygon.

4.2 Training
The models (for ICDAR 2015 and Total-Text) are trained using their respective training sets from scratch. A scene text image with 3 channels (RGB), resized to a height and width of 640 × 640, is passed as input to the network. During the training stage, Adam is employed to optimize the proposed network, with AMSGrad set to true to fix the convergence issue of Adam. The batch size used is 8, and the initial learning rate is set to 0.0000005 for ICDAR 2015 and 0.000005 for Total-Text, respectively. The learning rate is reduced by 10% after each step per epoch. The total number of epochs is set to 10000 with 100 steps per epoch. A weight decay with the value 0.0001 is also introduced, and the momentum value is set to 0.9. The training set size is increased using data augmentation techniques such as (1) random cropping and size adjustment, (2) random flipping, and (3) random rotation with an angle between −10° and +10°, with the blank part filled with 0s. The images are resized to an equal height and width for the training set. The aspect ratio is maintained for the inference set, and the input image's height is resized to a specific appropriate value.
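The optimizer settings described above can be expressed in Keras roughly as follows (ICDAR 2015 learning rate shown); interpreting the 10% reduction as a per-epoch decay is an assumption of this sketch.

```python
import tensorflow as tf

# Adam with AMSGrad, initial learning rate 5e-7 (ICDAR 2015); batch size 8 is set in the data pipeline.
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-7, amsgrad=True)
decay = tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: lr * 0.9)  # reduce by 10%

# model.compile(optimizer=optimizer, loss=total_loss)                # loss from Sect. 3.3
# model.fit(train_data, epochs=10000, steps_per_epoch=100, callbacks=[decay])
```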
4.3 Results and Analysis
The comparison of our proposed UTextNet scene text detection system with the existing methods on ICDAR2015 and Total-Text dataset is shown in Table 1 and Table 2 respectively. Some of the visual results obtained from the experiment are shown in Fig. 3 and 4 respectively.
Fig. 3. Perfect (Row 1) and imperfect (Row 2) detection examples of UTextNet on ICDAR 2015
As shown in Tables 1 and 2, our method has performed quite competitively with the existing state-of-the-art methods. It is evident from the results on ICDAR 2015 visualized in Fig. 3 that our model performs exceptionally well in outdoor settings as well as in indoor settings that contain blurry text instances. A lower precision is obtained on the ICDAR 2015 dataset (Table 1) due to the higher occurrence of false positives caused by multiple bounding boxes that break a word into smaller chunks. For the Total-Text dataset, our model can localize most of the text present in the scene images. Our method fails when the image's background is very similar to the text, such as tiles, windows, or similar foreground/background color combinations. The overall lower accuracy obtained on ICDAR 2015 compared to Total-Text can be attributed to the frequent occurrence of blurred, small text instances captured via the wide-angle Google Glass camera. Nonetheless, the proposed UTextNet achieves effective performance on both datasets.
Fig. 4. Perfect (Row 1) and imperfect (Row 2) detection examples of UTextNet on Total-Text
Table 1. Experimental analysis on ICDAR 2015 to verify the feasibility of the UTextNet on multi-oriented scene texts

Method             Precision   Recall   F1-score
EAST [32]          0.68        0.79     0.74
Yao et al. [30]    0.58        0.72     0.64
SegLink [24]       0.73        0.76     0.75
Ma et al. [17]     0.82        0.73     0.77
Liao [13]          0.86        0.70     0.82
Megvii-Image++     0.72        0.57     0.63
WordSup [7]        0.64        0.74     0.69
Ours               0.77        0.78     0.79

Table 2. Experimental analysis on Total-Text to verify the feasibility of the UTextNet on arbitrary shaped texts

Method             Precision   Recall   F1-score
EAST [32]          0.50        0.36     0.42
Textboxes [12]     0.62        0.45     0.52
SegLink [24]       0.35        0.33     0.34
Ch'ng et al. [2]   0.33        0.40     0.36
Liao [13]          0.88        0.77     0.82
TextSnake [16]     0.82        0.74     0.78
PSENet [28]        0.81        0.75     0.78
Wang et al. [29]   0.76        0.81     0.78
Ours               0.76        0.81     0.86

5 Conclusion
This paper proposes a ResNet-UNet based scene text detector, the UTextNet, principally for multi-oriented and curved text instances. The experimental evaluation has verified that the proposed algorithm achieves comparable, balanced results and outperforms some state-of-the-art systems when tested on benchmark datasets. In the future, we intend to extend our work to deal with multilingual arbitrary shaped scene text data using a more robust and less computationally complex architecture easily deployable in resource-constrained systems like mobile devices and embedded systems. Exploration of a hybrid tree-based computational intelligence approach [1] can also be performed. Another extension is to consider video scene text in place of the static scene text images in the dataset, to tackle the issues of constant blurring and occlusion.
References 1. Chen, Y., Abraham, A.: Tree-Structure Based Hybrid Computational Intelligence: Theoretical Foundations and Applications, vol. 2. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04739-8 2. Ch’ng, C.K., Chan, C.S.: Total-text: a comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 935–942. IEEE (2017) 3. Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: detecting scene text via instance segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018) 4. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009) 5. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2963–2970. IEEE (2010)
6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 7. Hu, H., Zhang, C., Luo, Y., Wang, Y., Han, J., Ding, E.: Wordsup: exploiting word annotations for character based text detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4940–4949 (2017) 8. Huang, W., Qiao, Yu., Tang, X.: Robust scene text detection with convolution neural network induced MSER trees. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 497–511. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2 33 9. Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 512–528. Springer, Cham (2014). https://doi.org/10.1007/978-3-31910593-2 34 10. Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE (2015) 11. Khatib, T., Karajeh, H., Mohammad, H., Rajab, L.: A hybrid multilevel text extraction algorithm in scene images. Sci. Res. Essays 10(3), 105–113 (2015) 12. Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: Textboxes: a fast text detector with a single deep neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017) 13. Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11474–11481 (2020) 14. Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0 2 15. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) 16. Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: Textsnake: a flexible representation for detecting text of arbitrary shapes. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 20–36 (2018) 17. Ma, J., et al.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimedia 20(11), 3111–3122 (2018) 18. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004) 19. Naosekpam, V., Kumar, N., Sahu, N.: Multi-lingual Indian text detector for mobile devices. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds.) CVIP 2020. CCIS, vol. 1377, pp. 243–254. Springer, Singapore (2021). https://doi.org/10.1007/ 978-981-16-1092-9 21 20. Neumann, L., Matas, J.: A method for text localization and recognition in realworld images. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6494, pp. 770–783. Springer, Heidelberg (2011). https://doi.org/10.1007/9783-642-19318-7 60 21. Poma, Y., Melin, P., Gonz´ alez, C.I., Mart´ınez, G.E.: Optimization of convolutional neural networks using the fuzzy gravitational search algorithm. J. Autom. Mob. Robot. Intell. Syst. 14, 109–120 (2020)
22. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) 23. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4 28 24. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2550–2558 (2017) 25. Varela-Santos, S., Melin, P.: A new modular neural network approach with fuzzy response integration for lung disease classification based on multiple objective feature optimization in chest x-ray images. Expert Syst. Appl. 168, 114361 (2021) 26. Vatti, B.R.: A generic solution to polygon clipping. Commun. ACM 35(7), 56–63 (1992) 27. Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), pp. 3304–3308. IEEE (2012) 28. Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019) 29. Wang, X., Jiang, Y., Luo, Z., Liu, C. L., Choi, H., Kim, S.: Arbitrary shape scene text detection with adaptive text region representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6449– 6458 (2019) 30. Yao, C., Bai, X., Sang, N., Zhou, X., Zhou, S., Cao, Z.: Scene text detection via holistic, multi-channel prediction. arXiv preprint arXiv:1606.09002 (2016) 31. Zhong, Y., Karu, K., Jain, A.K.: Locating text in complex color images. Pattern Recognit. 28(10), 1523–1535 (1995) 32. Zhou, X., et al.: East: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551– 5560 (2017)
VSim-AV: A Virtual Simulation Platform for Autonomous Vehicles Leila Haj Meftah(B) and Rafik Braham PRINCE Research Lab, ISITCom, Sousse University, 4011 Sousse, Tunisia [email protected], [email protected]
Abstract. Autonomous cars that can operate on their own will be commercially accessible in the near future. Autonomous driving systems are growing increasingly complex, and they must be thoroughly tested before being used. Our study is contained under this framework. The main goal of this study is to provide a simulation environment for self-driving cars. We first developed VSim-AV, a revolutionary high-fidelity driving simulator that can link arbitrary interfaces, construct simulated worlds comprised of situations and events encountered by drivers in real-world driving, and integrate completely autonomous driving. VSim-AV enables the cloning of human driver behavior as well as some complicated scenarios such as obstacle avoidance techniques. The effort entails developing a virtual simulation environment to collect training data for cars to learn how to navigate themselves. As a result, the simulator is similar to a car racing video game. We did, in fact, use the scenes to create certain driving experiences. Following the collection of training data, we decided to utilize a deep learning approach to develop a model for autonomous cars that avoid obstacles. Clearly, the fundamental problem for an autonomous car is to travel without colliding. This simulator is used to evaluate an autonomous vehicle's performance and to examine its self-driving operations. The proposed solution in this technique is practical, efficient, and dependable for autonomous vehicle simulation study. Keywords: Simulator · Unity 3D · Autonomous vehicles · Obstacle avoidance · VSim-AV
1 Introduction
Since the 1950s, Modeling and Simulation technology has evolved to become a synthesized specialized technology, fast becoming a common and strategic technology, as a result of the pulling of the requirements of diverse applications and the pushing of the evolution of related technologies [1,2]. The current trend in modern modeling and simulation technologies is toward networking, virtualization, intelligence, collaboration, and pervasiveness. Aside from theory and experiment research, modeling and simulation technologies paired with high speed computers is now the third key technique for identifying and recreating the objective reality [3]. c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 379–388, 2022. https://doi.org/10.1007/978-3-030-96308-8_35
Infrastructure expenses and the logistical hurdles of training and testing systems in the actual environment stymie autonomous vehicle research. Even one autonomous vehicle takes substantial funding and labor to outfit and operate [4]. And a single vehicle is far from adequate for collecting the necessary data to cover the variety of corner situations that must be handled for both training and validation. This is true for traditional modular systems, but it is especially true for data-hungry deep learning approaches [5]. Simulation may be used to train and verify driving techniques. Simulation has the potential to simplify research in autonomous driving conditions. It is also required for system verification since some scenarios are too harmful to stage in the real world (e.g., a child running onto the road ahead of the car) [6]. Since the early days of autonomous driving studies, simulation has been used to train driving models. Racing simulations have lately been used to test novel methods to autonomous driving. To train and evaluate autonomous vision systems, special simulation settings are often employed. Commercial games have also been utilized to collect high-quality data for training and testing visual perception systems [5]. To create a realistic simulator for autonomous vehicle testing, numerous scenarios that may occur in the proximity of cars in the actual world must be simulated. This simulator is intended to evaluate an autonomous vehicle’s performance and to examine its self-driving operations. The proposed technique is shown to be feasible and efficient for autonomous vehicle simulation study using this method. To gather data, train, and assess the performance of the autonomous vehicle depicted in this study, the open-source VSim-AV: Virtual simulation platform for AV is implemented. External software is being created to replicate human driving on this simulator track’s training mode by combining image processing with machine learning techniques like as neural networks and behavioral cloning.
2 VSim-AV: Virtual Simulation Platform for AV

2.1 Problem Definition
Following the study and comparison of existing simulators for autonomous vehicles [5], we discovered that we require a new simulation environment that addresses the following challenges:
• Specifically solving obstacle avoidance maneuver problems.
• Building a simple autonomous vehicle training environment.
• The requirement for a real-time simulator.
This paper describes the creation of a real-time simulation environment for autonomous cars. This environment allows you to replicate a vehicle's behavior while performing specific maneuvers. We decided to start with the obstacle avoidance maneuver. This environment is created in the style of a car racing video game. The data from road navigation was then used to clone driver behavior for autonomous vehicles. This environment enables the cloning of a human driver's behavior in complicated scenarios such as the obstacle avoidance maneuver.
As a result, the work done consists of developing a virtual simulation environment designed to collect training data for learning vehicles. To construct this simulator, we utilize the Unity 3D game engine to generate a road environment with realistic circumstances. This simulator is akin to a video game about car racing. Following the collection of training data, we decided to utilize machine learning techniques to create a model, which would then be followed by an autonomous car to avoid obstacles.

2.2 Technologies Used
This section describes the technologies utilized in the implementation of this simulation platform as well as the reasons for adopting them. TensorFlow is an open-source dataflow programming framework. It is commonly utilized in machine learning applications and is also used as a math library and for large computations. Keras, a high-level API with TensorFlow as the backend, is utilized for this project. Keras makes it easier to construct models since it is more user friendly [7]. Python has a number of libraries that may be used to aid machine learning applications, several of which have enhanced the project's performance. This section mentions a few of them. The first is "NumPy", which provides a library of high-level arithmetic functions to handle multidimensional matrices and arrays. This is used in neural networks for quicker calculations over weights (gradients). Second, "scikit-learn" is a Python machine learning library that includes several algorithms and machine learning function packages [7]. Another option is OpenCV (Open Source Computer Vision Library), which is developed for computational efficiency and real-time applications. OpenCV is utilized in this work for image preprocessing and augmentation techniques [7]. The Anaconda environment, an open-source Python distribution that facilitates package management and deployment, is used in this work. It is ideally suited to large-scale data processing. This simulator was built on a personal computer with the following configuration:
• CPU: Intel(R) Core i5-7200U @ 2.7 GHz
• RAM: 8 GB
• System: 64-bit operating system, x64 CPU

2.3 Contribution and Development Tools
Our job is to follow the fundamental design of an existing project [8] and alter the road structure: to minimize the number of curvatures and to integrate items with velocity, such as vehicles, as obstacles. The simulator proposed in this chapter is provided to address the problem of overtaking in autonomous cars in general. We opted to solve the situation of overtaking without velocity in this study (obstacle avoidance).
Unity 3D Game Engine: A game engine is a collection of software components used in video games to execute geometry and physics computations. The whole thing comes together to make a customizable real-time simulator that matches the features of the imagined worlds in which the games take place [9]. Unity 3D is a video game development tool:
• Developed by Unity Technologies
• Written in C, C#, C++
• Latest version used: October 12, 2018
• Environment: Microsoft Windows, macOS, Linux, Xbox One, Wii, Wii U, PlayStation 3, PlayStation 4, PlayStation Vita, iOS, Android, WebGL, Tizen, Facebook, tvOS and Nintendo Switch.
The software is distinguished by the use of a script editor compatible with C#, UnityScript (a language similar to JavaScript and inspired by ECMAScript, discontinued since version 2017.2) and Boo (discontinued since version 5.0) rather than Lua, which is widely used in video games. This is an asset-oriented strategy, as demonstrated by the usage of a specialized IDE, as opposed to engines such as the Quake engine, whose core parts are the source codes. It is the 2D counterpart of the Director authoring program that uses Lingo. It is more convenient for 3D applications such as Shiva, Virtools, and Cheetah3D. It does not enable modeling and instead lets you build scenes using lighting, topography, cameras, textures, music, and movies. It is distinguished by these characteristics as a hybrid of VRML and QuickTime. The design software was originally developed for the Mac platform and has since been ported to Windows, allowing users to obtain compatible applications for Windows, Mac OS X, iOS, Android, TV OS, PlayStation 3, PlayStation Vita, PlayStation 4, Xbox 360, Xbox One, Xbox One X, Windows Phone 8, Windows 10 Mobile, PlayStation Mobile, Tizen, Oculus Rift, Wii U, Nintendo 3DS, Nintendo Switch, WebGL, Samsung TV, and in a web page. Version 4.0, published in November 2012, incorporates the production of Linux-compatible games; as a result, the games created will be able to operate on Linux. Unity has been accessible under Linux with export limitations since August 25, 2015 (no Windows export, for example). Initially, support for Linux-compatible games is based on the Ubuntu distribution and proprietary drivers provided by graphics card vendors. Within a video game team, Unity 3D interacts with Canonical [9]. The choice to use Unity 3D is justified by the following characteristics:
• Portability
• Compatibility
• An easy-to-use programming language
• A virtual reality environment
• Provides an environment where the implementation of the autonomous concept already exists in some games.
VSim-AV: A Virtual Simulation Platform for Autonomous Vehicles
383
The steps to creating a car racing video game in Unity 3D are as follows: 1. 2. 3. 4. 5. 6.
Get the appropriate version. Create a Unity user account. Installation. Load the assets. Design the components and their interactions. Making a new track. You may quickly create a new track by dragging prebuilt road prefab parts from Assets/RoadKit/Prefabs onto the editor. You can also simply snap road pieces together by holding down “v” and dragging a road piece near to another piece (Fig. 1).
Why choose Unity? • • • •
Can be used as a simulator Once developed the game can be exported anywhere A 3D engine Scripts made in several C #, UnityScript, BOO
Fig. 1. VSim-AV: Unity project
Because of Unity’s persistent focus on performance, VSim-AV on Unity provides smooth overall performance while operating at high frames per second. When combined with Visual Studio, you get the ideal development environment that is both integrated and user-friendly. The concept for VSim-AV comes from Udacity’s self-driving course. We take this open source project and modify the road’s architecture, adding some obstacles in the form of yellow cubes. As shown in Fig. 2, we have arbitrarily positioned the obstacles (yellow cubes) in the left and right lanes of the road. We try to avoid it later with driving exams for a competent driver. We need to make this unity 3D project executable after designing the road and the environment. As a result, we used the Inno Setup Compiler program. There are two modes in the first interface of the simulator. The training mode is the initial mode. In training mode, we manually operate the vehicle to record the driving behavior. The data set contains all images captured by the three cameras installed in the autonomous vehicle (center, right, and left cameras). The captured pictures may be used to train any machine learning model.
384
L. H. Meftah and R. Braham
Fig. 2. Road architecture of VSim-AV
It’s similar to a car racing game, except you can record all of your driving data. A driving file log will be created, which will be utilized later in the training process. Anaconda: Python Environment. On some systems, installing a Python machine learning environment might be challenging. Python must be installed initially, and then numerous packages must be installed. As a result, we decided to use Anaconda to set up a Python machine learning development environment. Then we install all machine learning libraries (Tensorflow, Keras, ...) TensorFlow is an open-source library created by the Google Brain team for internal usage. It employs automated learning methods based on the deep neural network concept (deep learning). There is a Python API accessible. Keras is a Python library that provides access to the functionalities provided by many machine learning frameworks, most notably Tensorflow. Keras, in reality, does not natively provide the techniques. It just communicates with Tensorflow. Since the performance of our Google Colab or Colaboratory is a cloud service provided by Google (for free), based on Jupyter Notebook, and intended for machine learning training and research (Fig. 3). Training Mode: In training mode, we manually operate the vehicle to record the driving behavior. The captured pictures may be used to train the machine learning model. Autonomous Mode: In autonomous mode, we are putting your machine learning model to the test to see how effectively it can drive the vehicle without going off the road or falling into the lake. Technically, the simulator operates as a server to which your application may connect and receive a stream of image frames. For example, your Python script might analyse road pictures using a machine learning model to predict the optimal driving directions and transmit them back to the server. Each driving instruction includes a steering angle and an acceleration throttle, which control the direction and speed of the vehicle (via acceleration). Your program will get fresh picture frames in real time as this occurs.
VSim-AV: A Virtual Simulation Platform for Autonomous Vehicles
385
Fig. 3. VSim-AV operation
3
Training Data
The main concept here is to collect training data by driving the vehicle in a simulator, then train the machine learning model with that data, and finally let the machine learning model drive the car. VSim-AV already included several pre-recorded laps, but We chose to experiment with the simulator directly. So far, we’ve completed 5 laps in each direction of the road. Both directions are required to avoid the machine learning model’s bias toward turning to the left side of the road. The recording resulted in 36 534 pictures being recorded. Images feature data from three different cameras on the vehicle: left, center, and right. For training objectives, we just employed the center camera, which proved to be sufficient for obtaining extremely good end-results. To make the model more generic, it is recommended that all three cameras for the vehicle be used to better handle scenarios for returning to the centre of the road. When utilizing multiple cameras, keep in mind that the driving direction must be correctly set with a constant for both the left and right cameras. The training data also includes a CSV file with time stamped image captures as well as steering angle, throttle, brake, and speed. The steering angle is not biased since the training data includes laps in both track directions: The data is designed to be utilized for training 80% of the time and validation 20% of the time (Fig. 4). 3.1
Data Augmentation and Preprocessing
We turned every image around the horizontal axis to expand the size of the training data set even further. Preprocessing of pictures is done in this case within the model since it is less expensive when done on GPU rather than CPU. To eliminate unneeded noise from the pictures, we clipped 50 pixels from the top and 20 pixels from the bottom of each image (front part of the car, sky, trees, etc.). The most difficult problem was generalizing the vehicle’s behavior on the road, something it had never been taught for.
386
L. H. Meftah and R. Braham
Fig. 4. Data distribution
In practice, we will never be able to train a self-driving car model for every potential track since the data will be too large to analyze. Furthermore, it is not possible to collect data for all weather conditions and routes. As a result, there is a need to devise a method for generalizing the behavior on diverse tracks. This issue is addressed utilizing image preprocessing and augmentation approaches, which will be explored more below: • Crop: The pictures in the dataset contain important characteristics in the lower portion where the road can be seen. The external environment above a certain image part is never utilized to decide the output and can thus be cropped. In the training set, approximately 30% of the top section of the picture is clipped and passed. The code snippet and transformation of a picture after cropping and resizing it to its original size. • Flip (horizontal): The image is horizontally flipped (that is, a mirror image of the original image is supplied to the dataset). The goal is for the model to be trained for comparable types of turns on opposing sides as well. This is significant because the route only has left turns. The code snippet and picture transformation after flipping it. • Shift (horizontal/vertical): The picture has been slightly moved. • Brightness: To generalize to weather circumstances such as brilliant sunny days or overcast, lowlight conditions, brightness augmentation can be extremely beneficial. The code snippet and brightness rise Similarly, we have arbitrarily reduced the brightness level for various situations. • Shadows: Even when the light conditions are taken into account, there is a risk that there will be shadows on the road. This will result in a half-lit and half-lowlight scenario in the image. This augmentation is applied to the dataset to throw random shadows and address the shadow fitting problem. • Random blur: This augmentation is used to correct the distortion effect in the camera while capturing images, as an image obtained is not always clear. The camera may fall out of focus at times, but the vehicle must still suit that condition and remain stable. This random blur augmentation can account for such circumstances. • Noise: This adds random noise to the pictures by imitating dust or dirt particles and distortions while recording the image in unclean circumstances.
Fig. 5. Example of augmented image
4 Results and Discussion
The simulator presented in this article is useful and efficient. We investigated two deep learning-based approaches to managing a particularly sensitive maneuver, namely obstacle avoidance for autonomous cars.
• In [10] we propose the first application of our simulator VSim-AV: to build a model for autonomous cars that avoids obstacles, we opted to use deep learning, specifically Convolutional Neural Networks. Clearly, the fundamental problem for an autonomous car is to travel without colliding. The simulator is used to evaluate an autonomous vehicle's performance and to examine its self-driving operations. With this technology, the proposed solution for autonomous vehicle simulation studies is practical, efficient, and reliable.
• In [11] we present the second successful use of VSim-AV: we investigate a model for high-quality obstacle avoidance prediction for autonomous cars based on pictures generated by the virtual simulation platform (VSim-AV) and a VGG16 deep learning approach that includes transfer learning. That article [11] offers a transfer learning strategy utilizing the VGG16 architecture, with the suggested architecture's output compared to the NVIDIA architecture. The experimental findings show that VGG16 with transfer learning outperformed the other evaluated techniques. As a result, we can conclude that the simulator provided in this study is suitable and efficient for testing deep learning techniques to control the obstacle avoidance maneuver for autonomous cars (Fig. 5).
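A hedged Keras sketch of the VGG16 transfer-learning setup referenced above [11] is given below; the input size, head layers and optimizer settings are illustrative assumptions, not the architecture reported in that paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(160, 320, 3))
base.trainable = False  # transfer learning: reuse the ImageNet features

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dense(10, activation="relu"),
    layers.Dense(1),  # predicted steering command for the avoidance maneuver
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
model.summary()
```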
5 Conclusion and Future Work
This paper describes the development of a real-time simulation environment for autonomous vehicles. This environment enables users to replicate a vehicle's behavior while performing specific maneuvers, and it is built in the form of a car racing video game. The data from road navigation was then used to clone driver behavior for self-driving cars. In the future, we may use the simulator provided in this research to address difficulties in urban navigation for autonomous cars, such as overtaking maneuvers, traffic signal recognition, bad weather situations, and so on.
References 1. Yao, W., Dai, W., Xiao, J., Lu, H., Zheng, Z.: A simulation system based on ROS and gazebo for robocup middle size league. In: 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 54–59. IEEE (2015) 2. Daily, M., Medasani, S., Behringer, R., Trivedi, M.: Self-driving cars. Computer 50(12), 18–23 (2017) 3. Takaya, K., Asai, T., Kroumov, V., Smarandache, F.: Simulation environment for mobile robots testing using ROS and gazebo. In: 2016 20th International Conference on System Theory, Control and Computing (ICSTCC), pp. 96–101. IEEE (2016) 4. Rosique, F., Navarro, P.J., Fern´ andez, C., Padilla, A.: A systematic review of perception system and simulators for autonomous vehicles research. Sensors 19(3), 648 (2019) 5. Ahangar, M.N., Ahmed, Q.Z., Khan, F.A., Hafeez, M.: A survey of autonomous vehicles: enabling communication technologies and challenges. Sensors 21(3), 706 (2021) 6. Shah, S., Dey, D., Lovett, C., Kapoor, A.: AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In: Hutter, M., Siegwart, R. (eds.) Field and Service Robotics. SPAR, vol. 5, pp. 621–635. Springer, Cham (2018). https://doi. org/10.1007/978-3-319-67361-5 40 7. Brownlee, J.: Deep learning with Python: develop deep learning models on Theano and TensorFlow using Keras. Machine Learning Mastery (2016) 8. Smolyakov, M., Frolov, A., Volkov, V., Stelmashchuk, I.: Self-driving car steering angle prediction based on deep neural network an example of carnd udacity simulator. In: 2018 IEEE 12th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–5. IEEE (2018) 9. Roedavan, R.: Unity tutorial game engine (2018) 10. Meftah, L.H., Braham, R.: A virtual simulation environment using deep learning for autonomous vehicles obstacle avoidance. In: 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 1–7. IEEE (2020) 11. Meftah, L.H., Braham, R.: Transfer learning for autonomous vehicles obstacle avoidance with virtual simulation platform. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds.) ISDA 2020. AISC, vol. 1351, pp. 956–965. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71187-0 88
Image Segmentation Using Matrix-Variate Lindley Distributions
Zitouni Mouna¹ and Tounsi Mariem²
¹ Laboratory of Probability and Statistics, Sfax University, B.P. 1171, Sfax, Tunisia
² Computer Engineering and Applied Mathematics Department, Sfax National Engineering School, Sfax, Tunisia
Abstract. The aim of this article is to study a statistical model obtained as a mixture of Wishart probability distributions on symmetric matrices. We call it the “matrix-variate Lindley distributions”. We show that this model includes the matrix-variate Lindley distributions of the first and second kinds on the modern framework of symmetric cones. Its statistical properties and its relationship with the Wishart distribution are discussed. For estimating its parameters, an iterative hybrid Expectation-Maximization Fisher-Scoring (EM-FS) algorithm is created. Finally, the effectiveness and applicability of the proposed distributions in medical image segmentation are demonstrated with respect to the Wishart distribution.
Keywords: Expectation-Maximization algorithm · Fisher-Scoring algorithm · Image segmentation · Lindley probability distribution · Mixture models · Wishart probability distribution
1 Introduction
In the last decades, the Wishart and derived distributions on the space of symmetric matrices have received a lot of attention because of their use in graphical Gaussian models. Furthermore, it is well known that the Wishart distribution [10] plays a prominent role in the estimation of covariance or precision matrices in multivariate statistics. It is also of particular importance in Bayesian inference, where it is the conjugate prior of the inverse of the covariance matrix of a multivariate normal distribution. It is very useful for several image processing applications such as image restoration and image segmentation (see [9,11] and [14]). In our research work, we are interested in the generalization of the Lindley probability distribution on the real line [4] to matrix-valued random variables. It is a one-parameter distribution with probability density function given by
f(x) = \frac{\theta^{2}}{1+\theta}\,(1+x)\,e^{-\theta x}, \qquad x > 0,\ \theta > 0.   (1)
From the above equation, it can be seen that this distribution is a mixture of exponential distribution with parameter θ and gamma distribution with
parameters (2, θ). It has attracted wide applicability in survival and reliability analysis. Ghitany et al. [3] discussed its various nice practical properties and showed the better performance of the Lindley distribution over the exponential distribution for modeling the waiting times of bank customers before service. Motivated by these properties, we extend the definition of the Lindley distribution to the matrix case. Shanker et al. [7] introduced another kind of Lindley distribution, the Lindley distribution of the second kind, with probability density function
f(x) = \frac{\theta^{2}}{\theta+\alpha}\,(1+\alpha x)\,e^{-\theta x}, \qquad x > 0,\ \theta > 0,\ \alpha > -\theta.   (2)
The Lindley distribution of the first kind is recovered as a special case of (2) for α = 1. Various important mathematical properties of the Lindley distribution of the second kind, such as moments, stress-strength reliability, parameter estimation, and applications to modeling waiting and survival times data, were tackled in [7] and [8]. The aim of the present paper is to give multivariate analogs of the real Lindley distributions of the first and second kinds on the modern framework of symmetric cones. Further, we provide an application in medical image segmentation (see [12] and [14]) to show that the matrix-variate Lindley distributions (LDs) yield a better fit than the Wishart distribution. The remaining sections of this paper are organized as follows. Some notations and definitions are introduced in Sect. 2. New probability distributions on symmetric matrices, named “the matrix-variate Lindley distributions”, are presented in Sect. 3. The estimation of the model parameters using the hybrid Expectation-Maximization Fisher-Scoring (EM-FS) algorithm is displayed in Sect. 4. The proposed iterative approach is based on the multivariate Fisher-Scoring (FS) method [6] and the Expectation-Maximization (EM) algorithm (see [1] and [13]). An unsupervised segmentation method based on the EM algorithm and the LDs, illustrating the effectiveness and feasibility of the new matrix distributions, is detailed in Sect. 5. The last section wraps up the conclusion.
2 Notations and Definitions
In this section, we display some definitions and notations in the general framework of positive definite real symmetric matrices. Let M_r be the space of real symmetric r × r matrices with identity element I_r. We equip M_r with the inner product ⟨X, Y⟩ = tr(XY), where tr(XY) is the trace of the matrices' ordinary product. We further designate by M_r^+ the cone of positive definite elements of M_r. The determinant of the matrix X is denoted by det(X) and its transpose is X^t.
Definition 1. An r × r positive definite symmetric matrix X is said to have a matrix-variate Wishart distribution with shape parameter p > (r−1)/2 and scale
parameter Σ in M_r^+, denoted as X ∼ W_r(p, Σ), if its probability density function is given by
\frac{\det(\Sigma)^{p}}{\Gamma_r(p)}\, e^{-\langle\Sigma,X\rangle}\, \det(X)^{\,p-\frac{r+1}{2}}\, \mathbf{1}_{M_r^{+}}(X),   (3)
where Γ_r(·) denotes the multivariate gamma function given by
\Gamma_r(p) = \pi^{\frac{r(r-1)}{4}} \prod_{j=1}^{r} \Gamma\!\Bigl(p - \frac{j-1}{2}\Bigr).   (4)

3 Matrix-Variate Lindley Distributions
In this section, we consider matrix-variate generalizations of the Lindley distributions of the first and second kinds defined by the densities (1) and (2), respectively. Since the gamma distribution is replaced by the Wishart distribution on symmetric matrices, we define an extension of the real Lindley distributions to the matrix case. In this study, we use a mixture of two different Wishart distributions (see [5]). Let U be a random matrix from the mixture distribution defined as
f_U(u) = \pi f_X(u;\Sigma) + (1-\pi) f_Y(u;\Sigma),   (5)
where X ∼ W_r((r+1)/2, Σ) and Y ∼ W_r(r+1, Σ). Here π is the mixture weight of the distributions; it belongs to the interval (0, 1) and is given by
\pi = \bigl(\det(I_r + \Sigma^{-1})\bigr)^{-\frac{r+1}{2}}.   (6)
Using the definition of the Wishart probability distribution given in (3), we get
f_U(u) = \Bigl[\frac{(\det(I_r+\Sigma))^{-\frac{r+1}{2}}}{\Gamma_r\!\left(\frac{r+1}{2}\right)} + \frac{\det(u)^{\frac{r+1}{2}}}{\Gamma_r(r+1)} - \bigl(\det(I_r+\Sigma^{-1})\bigr)^{-\frac{r+1}{2}}\,\frac{\det(u)^{\frac{r+1}{2}}}{\Gamma_r(r+1)}\Bigr]\det(\Sigma)^{r+1}\, e^{-\langle\Sigma,u\rangle}\,\mathbf{1}_{M_r^{+}}(u).   (7)
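A small sketch of drawing from this mixture with SciPy is given below. The mapping between the density in Definition 1 and SciPy's Wishart parameterisation (df = 2p, scale = (2Σ)^{-1}) is our own derivation under that convention, stated here as an assumption.

```python
import numpy as np
from scipy.stats import wishart

def sample_matrix_lindley(Sigma):
    """Draw one sample from LD_r^(1)(Sigma) via the mixture (5)-(6)."""
    r = Sigma.shape[0]
    pi = np.linalg.det(np.eye(r) + np.linalg.inv(Sigma)) ** (-(r + 1) / 2)
    p = (r + 1) / 2 if np.random.rand() < pi else (r + 1)   # pick a component
    scale = np.linalg.inv(2 * Sigma)                         # assumed mapping
    return wishart.rvs(df=2 * p, scale=scale)

print(sample_matrix_lindley(np.eye(3)))
```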
Definition 2. The distribution of U is called the matrix-variate Lindley distribution of the first kind on M_r^+ with parameter Σ ∈ M_r^+; it is denoted by U ∼ LD_r^{(1)}(Σ).
Note that when r = 1 in Definition 2, the LD_r^{(1)} reduces to the Lindley distribution on the real line R given in (1). We now proceed to define the matrix-variate Lindley distribution of the second kind.
Definition 3. An r × r random positive definite symmetric matrix V is said to have a matrix-variate Lindley distribution of the second kind with parameters Σ and ψ, where both Σ and ψ are in M_r^+, denoted by V ∼ LD_r^{(2)}(Σ, ψ), if its probability density function is given by
f_V(v) = \Bigl[\frac{(\det(\psi+\Sigma))^{-\frac{r+1}{2}}}{\Gamma_r\!\left(\frac{r+1}{2}\right)} + \frac{\det(v)^{\frac{r+1}{2}}}{\Gamma_r(r+1)} - \bigl(\det(I_r+\Sigma^{-1}\psi)\bigr)^{-\frac{r+1}{2}}\,\frac{\det(v)^{\frac{r+1}{2}}}{\Gamma_r(r+1)}\Bigr]\det(\Sigma)^{r+1}\, e^{-\langle\Sigma,v\rangle}\,\mathbf{1}_{M_r^{+}}(v).   (8)
The relationship between the two kinds of LDs is now exhibited in the following theorem.
Theorem 1.
(i) Let U ∼ LD_r^{(1)}(Σ). If we set V = ψ^{-1/2} U ψ^{-1/2}, then V ∼ LD_r^{(2)}(ψ^{1/2} Σ ψ^{1/2}, ψ).
(ii) Let V ∼ LD_r^{(2)}(Σ, ψ). If we set W = ψ^{1/2} V ψ^{1/2}, then W ∼ LD_r^{(1)}(ψ^{-1/2} Σ ψ^{-1/2}).
Proof.
(i) Transforming V = ψ^{-1/2} U ψ^{-1/2}, with Jacobian J(U → V) = (det ψ)^{(r+1)/2}, in the density of U given in (7), the desired result is obtained.
(ii) Transforming W = ψ^{1/2} V ψ^{1/2}, with Jacobian J(V → W) = (det ψ)^{-(r+1)/2}, in the density of V given in (8), the desired result is obtained.
Note that when ψ = I_r in Definition 3, the LD_r^{(2)}(Σ, I_r) reduces to the LD_r^{(1)}(Σ). Also, when r = 1 in Definition 3, the LD_r^{(2)} reduces to the Lindley distribution on the real line R given in Eq. (2).
4 Parameters Estimation via the EM-FS Algorithm
The aim of this section is to estimate the parameters of the LDs using the EM algorithm proposed by Dempster et al. [1]. We describe in detail how to estimate the parameters of the LDs using the proposed hybrid algorithm, called the Expectation-Maximization Fisher-Scoring (EM-FS) algorithm, based on the EM algorithm and the multivariate Fisher scoring method. Let (X_1, ..., X_N) be N independent random symmetric positive definite matrices of order r with common density function f given by (8), and let (x_1, ..., x_N) be the N associated observations. Let (π_1, π_2) be the vector of mixing proportions; they belong to the interval [0, 1] and sum to one. The incomplete likelihood function l of (x_1, ..., x_N) is
l(x_1,\dots,x_N;\Theta) = \prod_{i=1}^{N}\bigl(\pi_1 f_1(x_i;\Sigma) + \pi_2 f_2(x_i;\Sigma)\bigr),   (9)
where for all 1 ≤ i ≤ N,
f_1(x_i;\Sigma) = \frac{\det(\Sigma)^{\frac{r+1}{2}}}{\Gamma_r\!\left(\frac{r+1}{2}\right)}\, e^{-\langle\Sigma,x_i\rangle}\,\mathbf{1}_{M_r^{+}}(x_i)
and
f_2(x_i;\Sigma) = \frac{\det(\Sigma)^{r+1}}{\Gamma_r(r+1)}\, e^{-\langle\Sigma,x_i\rangle}\,\det(x_i)^{\frac{r+1}{2}}\,\mathbf{1}_{M_r^{+}}(x_i).
Recall that π_1 = det(Σ)^{(r+1)/2}(det(Σ+ψ))^{-(r+1)/2} and π_2 = 1 − π_1. We denote by Θ = (Σ, ψ) the vector of unknown parameters to be estimated by the EM algorithm. Since we need to use the logarithm in order to turn multiplication into addition, the log-likelihood L can be written as
L(x_1,\dots,x_N;\Theta) = \log l(x_1,\dots,x_N;\Theta) = \sum_{i=1}^{N}\log\bigl(\pi_1 f_1(x_i;\Sigma) + \pi_2 f_2(x_i;\Sigma)\bigr).   (10)
So, in order to estimate Θ, for each observed data matrix x_i we associate a discrete random vector z_i = (z_{i1}, z_{i2}) following a multivariate Bernoulli distribution with parameter vector (π_1, π_2). We have, for 1 ≤ k ≤ 2, z_{ik} = 1 if X_i arises from the k-th distribution and 0 otherwise. So, the complete-data log-likelihood function L_c is
L_c(x_1,\dots,x_N,z_1,\dots,z_N;\Theta) = \sum_{i=1}^{N}\Bigl[z_{i1}\log\bigl(\pi_1 f_1(x_i;\Sigma)\bigr) + z_{i2}\log\bigl(\pi_2 f_2(x_i;\Sigma)\bigr)\Bigr].
Let us consider the Expectation step (E), which amounts to computing at iteration (l+1) the conditional expectation of the complete-data log-likelihood Q(Θ‖Θ^{(l)}) given the previous values Θ^{(l)}:
Q(\Theta\|\Theta^{(l)}) = \sum_{i=1}^{N}\tau_{i1}^{(l)}\Bigl[(r+1)\log\det(\Sigma) - \tfrac{r+1}{2}\log\det(\Sigma+\psi) - \log\Gamma_r\!\left(\tfrac{r+1}{2}\right) - \langle\Sigma,x_i\rangle\Bigr] + \sum_{i=1}^{N}\tau_{i2}^{(l)}\Bigl[\log\bigl(1-\det(\Sigma)^{\frac{r+1}{2}}\det(\Sigma+\psi)^{-\frac{r+1}{2}}\bigr) - \log\Gamma_r(r+1) + (r+1)\log\det(\Sigma) + \tfrac{r+1}{2}\log\det(x_i) - \langle\Sigma,x_i\rangle\Bigr],   (11)
where
\tau_{ik}^{(l)} = E\bigl[z_{ik}\mid\Theta^{(l)},x_1,\dots,x_N\bigr] = E\bigl[z_{ik}\mid\Theta^{(l)},x_i\bigr] = \frac{\pi_k^{(l)} f_k(x_i;\Sigma^{(l)})}{\sum_{j=1}^{2}\pi_j^{(l)} f_j(x_i;\Sigma^{(l)})}, \qquad k = 1,2,\ i = 1,\dots,N.   (12)
The maximization step (M) of the algorithm consists of estimating, at the (l+1)-th iteration, the new \widehat{\Theta} = \Theta^{(l+1)} that maximizes Q:
\widehat{\Theta} = \arg\max_{\Theta} Q(\Theta\|\Theta^{(l)}).   (13)
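A minimal sketch of the E-step (12) is given below; f1 and f2 are placeholders for the two component densities in (9), whose implementation is omitted here.

```python
import numpy as np

def e_step(X, pi1, f1, f2):
    """X: list of observed SPD matrices; pi1: current first mixing weight."""
    tau = np.empty((len(X), 2))
    for i, x in enumerate(X):
        num = np.array([pi1 * f1(x), (1.0 - pi1) * f2(x)])
        tau[i] = num / num.sum()          # normalise over the two components
    return tau
```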
To determine the estimators of Σ and ψ, we start by calculating ∂Q(Θ‖Θ^{(l)})/∂Σ and ∂Q(Θ‖Θ^{(l)})/∂ψ. Setting ∂Q(Θ‖Θ^{(l)})/∂Σ and ∂Q(Θ‖Θ^{(l)})/∂ψ equal to zero, we obtain the following system:
\begin{cases}
\displaystyle\sum_{i=1}^{N}\tau_{i1}^{(l)}\Bigl[(r+1)\Sigma^{-1}-\tfrac{r+1}{2}(\Sigma+\psi)^{-1}-x_i\Bigr] + \sum_{i=1}^{N}\tau_{i2}^{(l)}\Bigl[(r+1)\Sigma^{-1}+\tfrac{r+1}{2}\,\dfrac{\Sigma^{-1}-(\Sigma+\psi)^{-1}}{1-\det(I_r+\Sigma^{-1}\psi)^{\frac{r+1}{2}}}-x_i\Bigr]=0;\\[2mm]
\displaystyle-\tfrac{r+1}{2}\sum_{i=1}^{N}\tau_{i1}^{(l)}(\Sigma+\psi)^{-1} - \tfrac{r+1}{2}\sum_{i=1}^{N}\tau_{i2}^{(l)}\,\dfrac{(\Sigma+\psi)^{-1}}{1-\det(I_r+\Sigma^{-1}\psi)^{\frac{r+1}{2}}}=0.
\end{cases}   (14)
These two log-likelihood equations defined by Eq. (14) cannot be solved directly. However, the FS method (see [6]), which is a form of the Newton-Raphson (NR) approach, can be applied to solve them. We give a short description of the FS approach as applied to the parameter vector Θ; the parameters of interest in this study are matrices. The NR method is one of the most used methods of statistical optimization. It can be used to compute the maximum likelihood estimate iteratively when direct solutions do not exist. Its iteration formula is
\Theta^{(l+1)} = \Theta^{(l)} + \bigl(J(\Theta)\bigr)^{-1}\,\frac{\partial Q(\Theta\|\Theta^{(l)})}{\partial\Theta}, \qquad\text{where}\qquad J(\Theta) = -\frac{\partial^{2} Q(\Theta\|\Theta^{(l)})}{\partial\Theta^{2}}.
However, the NR approach has certain limitations, such as its lack of stability (see [6]). This is remedied by taking
\Theta^{(l+1)} = \Theta^{(l)} + \gamma\bigl(J(\Theta)\bigr)^{-1}\,\frac{\partial Q(\Theta\|\Theta^{(l)})}{\partial\Theta},
where 0 < γ < 1 is a constant. An additional method of stabilization is to let
\Theta^{(l+1)} = \Theta^{(l)} + \gamma\Bigl[J(\Theta) + \Bigl(\frac{\partial Q(\Theta\|\Theta^{(l)})}{\partial\Theta}\Bigr)^{2}\Bigr]^{-1}\frac{\partial Q(\Theta\|\Theta^{(l)})}{\partial\Theta}.
This formula avoids taking a large step when ∂Q(Θ‖Θ^{(l)})/∂Θ is large. Moreover, J(Θ) may be negative unless the true parameter Θ is already very close to the maximum likelihood estimator \widehat{\Theta}; additionally, J(Θ) might sometimes be hard to calculate. Furthermore, the optimization will generally be easier if the initial value is good. If the initial value of the parameter is not near the optimum, the new estimate will
be recalculated and the process repeated continuously. To resolve this problem, we use the scoring algorithm known as FS. The latter is given by Sutarman and Darnius [6] to solve the maximum likelihood equations. It is similar to the NR algorithm; the difference is that FS uses the information matrix. The FS update is
\Theta^{(l+1)} = \Theta^{(l)} + \gamma\Bigl[I(\Theta) + \Bigl(\frac{\partial Q(\Theta\|\Theta^{(l)})}{\partial\Theta}\Bigr)^{2}\Bigr]^{-1}\frac{\partial Q(\Theta\|\Theta^{(l)})}{\partial\Theta}, \qquad\text{where}\qquad I(\Theta) = -E\Bigl[\frac{\partial^{2} Q(\Theta\|\Theta^{(l)})}{\partial\Theta^{2}}\Bigr]
is the information matrix. In order to compute the estimators using the EM-FS algorithm, we need to calculate ∂²Q(Θ‖Θ^{(l)})/∂Σ², ∂²Q(Θ‖Θ^{(l)})/∂Σ∂ψ and ∂²Q(Θ‖Θ^{(l)})/∂ψ². A description of the EM-FS algorithm is given below.
Algorithm 1 (EM-FS)
1: begin
2: Initialisation: Σ^{(0)}, ψ^{(0)}
3: π_1^{(0)} = (det(I_r + (Σ^{(0)})^{-1} ψ^{(0)}))^{-(r+1)/2}
4: π_2^{(0)} = 1 − π_1^{(0)}
5: τ_{i1}^{(0)} ← π_1^{(0)} f_1(x_i; Σ^{(0)}) / (π_1^{(0)} f_1(x_i; Σ^{(0)}) + π_2^{(0)} f_2(x_i; Σ^{(0)}))
6: τ_{i2}^{(0)} ← π_2^{(0)} f_2(x_i; Σ^{(0)}) / (π_1^{(0)} f_1(x_i; Σ^{(0)}) + π_2^{(0)} f_2(x_i; Σ^{(0)}))
7: Iteration (l+1):
8: Expectation: Q(Θ‖Θ^{(l)})
9: Fisher Scoring-Maximization:
10: Σ^{(l+1)} ← Σ^{(l)} + γ[−E(∂²Q(Θ‖Θ^{(l)})/∂Σ²) + (∂Q(Θ‖Θ^{(l)})/∂Σ)²]^{-1} ∂Q(Θ‖Θ^{(l)})/∂Σ
11: ψ^{(l+1)} ← ψ^{(l)} + γ[−E(∂²Q(Θ‖Θ^{(l)})/∂ψ²) + (∂Q(Θ‖Θ^{(l)})/∂ψ)²]^{-1} ∂Q(Θ‖Θ^{(l)})/∂ψ
12: Θ^{(l+1)} ← {Σ^{(l+1)}, ψ^{(l+1)}}
13: If ‖Θ^{(l+1)} − Θ^{(l)}‖ < ε is not satisfied, return to step 2.
14: end

5 Application: Medical Image Segmentation
Let us consider an unsupervised method of image segmentation based on the EM algorithm and the LDs so as to illustrate the effectiveness and feasibility of the new matrix distributions.
The proposed unsupervised segmentation algorithm relies upon four steps, defined as follows.
Step 1: Choice of the number of image regions.
Step 2: Initialization of the parameters by the K-means algorithm.
Step 3: Alternation of the following two steps until the constraint
‖Θ^{(l+1)} − Θ^{(l)}‖ < ε   (15)
is satisfied:
– Expectation step: computing the posterior probabilities.
– Maximization step: maximizing the conditional expectation with respect to Σ_K and ψ_K, K = 1, 2.
The image pixels are then classified based on the highest posterior probability.
Step 4: Computation of the posterior probability τ(α; Σ_K, ψ_K) of a pixel α, given by Eq. (12), for each class C_1 and C_2. The matrix of pixels denoted α is said to be tumor if
τ(α; Σ_1, ψ_1) > τ(α; Σ_2, ψ_2);   (16)
otherwise it is considered as normal tissue.
This section highlights the experimental results of the proposed segmentation algorithm on real brain MR images, and a comparative study with respect to the Wishart distribution is carried out [2]. In order to measure the segmentation accuracy, we also calculate the misclassification ratio (MCR). In the following, the original image (see Fig. 1), of size 183 × 183, is taken from the Magnetic Resonance Imaging (MRI) Center, Road of El-Ain, Sfax-Tunisia.
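The loop below is a schematic sketch of this procedure. kmeans_init, e_step and fs_m_step are placeholders for the K-means initialisation, the posterior computation (12) and the Fisher-scoring maximisation; the parameter vector is treated as a flat array for simplicity.

```python
import numpy as np

def segment(pixels, kmeans_init, e_step, fs_m_step, eps=1e-4, max_iter=100):
    theta = kmeans_init(pixels)                    # Step 2: initial parameters
    for _ in range(max_iter):                      # Step 3: EM alternation
        tau = e_step(pixels, theta)                # posterior probabilities
        theta_new = fs_m_step(pixels, tau, theta)
        done = np.linalg.norm(theta_new - theta) < eps   # stopping rule (15)
        theta = theta_new
        if done:
            break
    tau = e_step(pixels, theta)
    # Step 4: classification rule (16), tumor vs. normal tissue.
    return np.where(tau[:, 0] > tau[:, 1], "tumor", "normal")
```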
Fig. 1. Brain image.
For the Lindley distribution of the second kind and the Wishart distribution in medical image segmentation, we take r = 3,
\Sigma = \begin{pmatrix} 2.5972 & 0.2397 & 0.2130\\ 0.2397 & 2.7369 & 0.2369\\ 0.2130 & 0.2369 & 2.3627 \end{pmatrix}
\qquad\text{and}\qquad
\psi = \begin{pmatrix} 1.6262 & 0.1596 & 0.1505\\ 0.1596 & 1.7072 & 0.1661\\ 0.1505 & 0.1661 & 1.5252 \end{pmatrix}.
Fig. 2. Segmentation results of the brain image. (a) Tumor position detected by the Wishart distribution, (b) Tumor position detected by the LDs.

Table 1. The MCR values of the proposed distributions.
Matrix distributions   LD       W_r((r+1)/2, Σ)
MCR                    0.0182   0.0631
According to the segmented images (Fig. 2) and the small MCR values (Table 1), the effectiveness of the proposed algorithm is confirmed. Satisfactory segmentation results were obtained using the proposed method based on the LDs (image (b)). We can notice that the proposed algorithm with LDs gives a better segmentation result than the Wishart distribution; in image (a), there are some misclassified pixels when using the Wishart distribution. We extracted the region of interest (the tumor) from the original brain image and classified the image pixels into two classes: brain tumor and normal tissue. The numerical results given by the MCR criterion confirm the better ability of the LDs to identify the tumor. In future work, we shall apply the new proposed matrix distributions to other real data sets.
6 Conclusion
In this investigation, new matrix-variate distributions, named the matrix-variate Lindley distributions of the first and second kinds, have been addressed. An iterative hybrid algorithm called the EM-FS algorithm has been elaborated for the estimation of the model's parameters. It is a combination of the EM algorithm and the FS method for estimating matrix parameters. To demonstrate the effectiveness of the corresponding distributions, medical image segmentation experiments using the LDs were conducted. Our work is a step that may be taken further, built upon and extended, as it paves the way for future works to further explore the LDs for modeling real data sets.
References 1. Dempster, A., Lair, N., Rubin, R.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B 39(1), 1–38 (1977) 2. Fang, L., Licheng, J., Biao, H., Shuyuan, Y.: Pol-Sar image classification based on Wishart DBN and local spatial information. IEE Trans. Geosci. Remote Sens. 54(6), 1–17 (2016) 3. Ghitany, M.E., Atieh, B., Nadarajah, S.: Lindley distribution and its application. Math. Comput. Simul. 78(4), 493–506 (2008) 4. Lindley, D.V.: Fiducial distributions and Bayes’ theorem. J. Roy. Stat. Soc. Ser. B 20, 102–107 (1958) 5. McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, Hoboken (2004) 6. Sutarman, S.A.P., Darnius, O.: Maximum likelihood based on Newton Raphson, fisher scoring and expectation maximization algorithm application on accident data. Int. J. Adv. Res. 6(1), 965–969 (2018) 7. Shanker, R., Sharma, S., Shanker, R.: A two-parameter Lindley distribution for modeling waiting and survival times data. Appl. Math. J. 4(2), 363–368 (2013) 8. Shanker, R., Mishra, A.: A two-parameter Lindley distribution. Stat. Trans. New Ser. 1(14), 45–56 (2013) 9. Wang, W., Xiang, D., Ban, Y., Zhang, J., Wan, J.: Superpixel-based segmentation of polarimetric SAR images through two-stage merging. Remote Sens. J. 11(4), 402 (2019). https://doi.org/10.3390/rs11040402 10. Wishart, J.: The generalised product moment distribution in samples from a normal multivariate population. Biometrika 20, 32–52 (1928) 11. Jiang, X., Yu, H., Lv, S.: An image segmentation algorithm based on a local region conditional random field model. Int. J. Commun. Netw. Syst. Sci. 13(9), 139–159 (2020) 12. Wu, X., Bi, L., Fulham, M., Feng, D.D., Zhou, L., Kim, J.: Unsupervised brain tumor segmentation using a symmetric-driven adversarial network. Neurocomputing 455, 242–254 (2021) 13. Zitouni, M., Zribi, M., Masmoudi, A.: Asymptotic properties of the estimator for a finite mixture of exponential dispersion models. Filomat 32(19), 1–24 (2018) 14. Mouna, Z., Mourad, Z., Afif, M.: Unsupervised image segmentation using THMRF model. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, T.-P. (eds.) HIS 2020. AISC, vol. 1375, pp. 41–48. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-73050-5 5
Improving Speech Emotion Recognition System Using Spectral and Prosodic Features
Adil Chakhtouna¹, Sara Sekkate², and Abdellah Adib¹
¹ Team Computer Science, Artificial Intelligence and Big Data, MCSA Laboratory, Faculty of Sciences and Technology of Mohammedia, Hassan II University of Casablanca, Casablanca, Morocco
[email protected], [email protected]
² Higher National School of Arts and Crafts of Casablanca, Casablanca, Morocco
[email protected]
Abstract. The detection of emotions from speech is a key aspect of human behavior, and Speech Emotion Recognition (SER) plays an extensive role in a diverse range of applications, especially in human-computer communication. The main aim of this study is to build two Machine Learning (ML) models able to classify input speech into several classes of emotions. To this end, we extract a set of prosodic and spectral features from the sound files and apply a feature selection method to improve the recognition rate of the proposed system. Experiments are conducted to evaluate the accuracy of the emotional speech system on the RAVDESS database. We assess the efficiency of our models and compare them to the existing literature on SER. The obtained results indicate that the proposed system based on Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) achieves a test accuracy of 69.67% and 65.04%, respectively, with 8 emotional states.
Keywords: Speech emotion recognition · Machine learning · Prosodic and spectral features · Feature selection · SVM · KNN

1 Introduction
SER is one of the major aspects of the computational study of human communication behavior. Emotions consist of a set of biological and psychological states generated by a range of sensations, memories and behaviors. In SER, it is essential to design reliable and robust systems adapted to real-world applications for enhancing analytical capabilities for human decision making. For example, we can recognize emotions in e-learning programs [1] by detecting the state of users when they get bored during a training session, translation systems for the efficient delivery of the content [2] or call centers to detect customer emotions and satisfaction [3]. SER requires a high-level workflow, from feature extraction through their selection to classification. In this paper, we used a set of the most commonly
applied features in SER. When the number of features grows, it is often more complicated to obtain a high-performing model; this is where dimension reduction algorithms like Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) come into play to decrease the number of variables. In this paper, we employ a feature selection algorithm, namely Sequential Forward Floating Selection (SFFS), which aims to collect appropriate and relevant attributes and eliminate useless ones. Therefore, it is clear that a strong classifier is needed to effectively label different emotions from speech utterances. We then built two models based on SVM and KNN classifiers. Judging from the experimental results, our model outperforms the preceding frameworks on The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset. The remainder of this document is structured as follows. First, we provide a survey of existing works related to SER in Sect. 2. In Sect. 3, we explain in detail the block schema of our suggested system. Experimental results and discussions are provided in Sect. 4. Finally, the conclusions of this study and future plans for SER are discussed in Sect. 5.
2 Related Works
SER has been widely used in a range of studies. They vary in terms of used databases, feature extraction, feature reduction, classification and performance assessment. All of these factors make challenging comparison between studies. Authors in [4] compared different SVM kernels to classify emotions in a speaker-independent mode. They extracted Continuous Wavelet Transform (CWT) and prosodic features followed by PCA to reduce dimensionality. The obtained results reported a classification accuracy of 60.1% using a quadratic kernel. Getahun and Kebede [5] proposed a prototype for identifying spontaneous speech communication in a collection of telephone dialogues from the Amharic call center. Acoustic features comprising prosodic, spectral and voice quality features were extracted from each sound file. An optimal set of features is chosen by combining three feature selection approaches (Gain Ratio, Info Gain, One R) and fed to Multilayer Perceptron Neural Network (MLPNN). The result on a private dataset indicates a classification accuracy of 72.4% using four emotions. Sun et al. [6] proposed a decision tree SVM model. This ranking framework was built by computing the degree of confusion of the emotion, and then the most meaningful attributes were chosen for each SVM in the decision tree according to the Fisher criterion. The performance of the proposed method showed that the emotion recognition rate was 9% greater than the standard SVM classification method on the CASIA database, and 8.26% greater when considering the EMO-DB corpus. Bhavan et al. [7] aimed to improve SER by employing a bagged ensemble of SVM with a Gaussian kernel and a combination of spectral features including Mel Frequency Cepstral Coefficients (MFCC), its derivatives and Spectral Centroids. Their results showed an overall accuracy of 92.45% on EMO-DB and 84.11% on the Indian Speech Corpus (IITKGP-SEHSC) for eight emotions.
With all these mentioned works, our contribution in this study is to build a SER system with an emphasis on two important stages: Feature extraction and feature selection using SFFS.
3 Methodology
The block diagram of the proposed SER system is presented in Fig. 1. It consists of pre-processing of the audio files in the database, the extraction of prosodic and spectral features, splitting the data into training and testing sets, followed by the selection of features and lastly, the classification of different emotions. Each step of the proposed system will be described in detail.
Fig. 1. The block diagram of the proposed SER.
3.1 Emotional Speech Database
Fig. 2. Distribution of the number of speech samples per emotion in RAVDESS.
Choosing an appropriate database is a very challenging task because of the level of naturalness of the emotional speech. The employed corpora in SER systems
are divided into three groups: (1) Acted, collected from professional actors; (2) Elicited, gathered by creating an artificial emotional situation, without the speakers' knowledge; (3) Spontaneous, expressed in real-life situations. The RAVDESS database is chosen in our work. It contains 7356 files in three operating formats: audio-only, audio-video and video-only. Audio-only files include speech and song files. Hence, we focused only on the speech recordings of 24 professional actors (12 men, 12 women). Figure 2 illustrates the number of emotional speech utterances for the eight emotions (Neutral, Calm, Happy, Sad, Angry, Fearful, Disgust and Surprised). Each utterance is produced at two levels of emotional intensity (normal, strong), with the exception of the neutral emotion.

3.2 Pre-processing
This phase consists of all operations that can be applied to an audio file before the extraction of its features. The speech signal is first pre-processed by a framing process to decrease the length of the voice file and reduce signal distortion. Each voice signal is framed with the same number of samples; in this case, the number of samples N for each frame is 512. The number of frames differs for each voice signal depending on its duration. The frames of each voice signal are treated with an overlap equal to 50%, so as not to lose voice signal information. After the framing process, there are some discontinuities at the frame boundaries of the speech signal. To avoid this issue, windowing is applied; among several types of windows, the Hamming window is adopted in this study [8]. Considering α = 0.54 and β = 1 − α = 0.46, it is defined as
w(n) = \alpha - \beta\cos\!\Bigl(\frac{2\pi n}{N-1}\Bigr), \qquad -\frac{N-1}{2}\le n \le \frac{N-1}{2}.   (1)
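A minimal NumPy sketch of this framing and Hamming-windowing step (frame length N = 512, 50% overlap) is given below.

```python
import numpy as np

def frame_and_window(signal, N=512):
    hop = N // 2                                   # 50% overlap
    n_frames = 1 + (len(signal) - N) // hop
    window = np.hamming(N)                         # 0.54 - 0.46*cos(2*pi*n/(N-1))
    frames = np.empty((n_frames, N))
    for t in range(n_frames):
        frames[t] = signal[t * hop: t * hop + N] * window
    return frames
```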
3.3 Feature Extraction
This process involves capturing the most meaningful information of a speech utterance. Extracting appropriate features is a key decision that affects the SER system performance. With no universal choice, many researchers are confused about which features are better for SER. Acoustic features such as prosodic and spectral, are considered to be among the strongest features of speech due to the improved recognition results achieved by their combination [9]. In this study, a collection of spectral and prosodic features are extracted with the use of jAudio tool [10]. 1. Prosodic features These are a type of para-linguistic features, their increased popularity is attributable to the fact that they convey more information about the emotional state of speakers. In addition, prosodic features are primarily related to the rhythmic aspects of speech, they are those that can be perceived by humans, such as intonation and rhythm [9]. The information expressed in the prosodic features tends to structure the flow of speech. In this study, three prosodic features are retrieved and are explained below.
– Pitch: refers to the perceived fundamental frequency of a given voice and is generated by the vibrations of the vocal cords. The pitch can be estimated with the auto-correlation formula given by [11]:
A(m) = \sum_{n=b_i}^{b_f - m} x(n)\,x(n+m),   (2)
where [b_i, b_f] is the frame width and m is the lag.
– Energy: measures the strength of a voice; it is physically detected by the pressure of sound in the human respiratory system. The energy of a given signal x is obtained by summing the absolute squared values over each frame.
– Zero Crossing: represents the number of times the voice signal changes sign from positive to negative and vice versa.
2. Spectral features
In addition to the prosodic features, spectral features are extracted to improve classification performance; each is detailed below.
– MFCC: a popular feature used in speech processing. The process of its extraction is based on a representation of a short-time signal derived from the Fast Fourier Transform (FFT) of this signal [12]. It is compatible with the human hearing system, which provides a natural and realistic reference for speech recognition. 13 MFCC coefficients were extracted.
– LPC: spectral features obtained from a linear combination of past and present values of a voice signal, computed using the auto-correlation and Levinson-Durbin recursion methods [13]. In this study, 9 LPC coefficients were extracted.
– Spectral Centroid: indicates the center of mass of the magnitude spectrum of a speech signal. It is a powerful indicator of the brightness of a sound.
– Spectral Flux: defined as the spectral correlation between two neighbouring frames. It is a measure of the amount of spectral change in a signal.
– Spectral Rolloff: defined as the frequency below which 85% of the energy in the spectrum of the signal lies. This is a measure of the skewness of the power spectrum.
– Spectral Compactness: closely related to spectral smoothness as proposed by McAdams [14]. It is calculated from sums over the frequency bins of an FFT and is a measure of the noisiness of the sound signal [10].
– Spectral Variability: the standard deviation of the magnitude spectrum. This is a measure of the variance of a signal's magnitude spectrum [10].
In total, we get eight features extracted from the jAudio tool. For each one, seven related statistics are calculated. The pitch and energy features are extracted from other libraries with six statistical features. Figure 3 summarizes the set of features and their statistics used in our study. After feature extraction, we aggregated our data into a CSV file with each feature's numerical value as well as the corresponding emotion classes. Furthermore, in order to put the data in a standard configuration, we normalized it using Z-score normalization; an illustrative extraction sketch is given below.
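The snippet below is an illustrative alternative to the jAudio extraction described above, using librosa for a few of the same descriptors together with Z-score scaling; it is not the exact feature set or tool used in the paper.

```python
import librosa
import numpy as np

def utterance_features(path):
    y, sr = librosa.load(path, sr=None)
    return np.concatenate([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1),
        librosa.feature.spectral_centroid(y=y, sr=sr).mean(axis=1),
        librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.85).mean(axis=1),
        librosa.feature.zero_crossing_rate(y).mean(axis=1),
        [librosa.feature.rms(y=y).mean()],          # energy proxy
    ])

def zscore(F):
    # Z-score normalisation over the whole feature matrix (rows = utterances).
    return (F - F.mean(axis=0)) / F.std(axis=0)
```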
Fig. 3. The extracted spectral and prosodic features.
3.4 Feature Selection
Feature selection has a major impact on enhancing systems performance. This is done by filtering redundant features to generate an ensemble of the most significant ones. Irrelevant features often lead to unsuitable results; so, a better feature selection technique enables easier visualization and understanding of the data, reducing the computational time and minimizing the dimensionality [15]. Feature selection approaches can be wrapper, filter or embedded algorithms. In our work, we applied a wrapper one, namely SFFS. It aims to select the best features firstly by a forward step and then eliminate a number of features by a backward step, assuming that the selected features are much better than the previously evaluated ones at this iteration [16].
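The sketch below shows this forward-floating selection with mlxtend's SequentialFeatureSelector (floating=True); the synthetic placeholder data and the small target subset size are illustrative assumptions (the paper's final SVM model retains 103 features, see Sect. 4.1).

```python
import numpy as np
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.svm import SVC

X_train = np.random.randn(200, 40)       # placeholder for the extracted features
y_train = np.random.randint(0, 8, 200)   # eight emotion labels

sffs = SFS(SVC(kernel="rbf", C=6, gamma=0.01),
           k_features=10, forward=True, floating=True,
           scoring="accuracy", cv=5, n_jobs=-1)
sffs = sffs.fit(X_train, y_train)
X_train_sel = sffs.transform(X_train)    # reduced feature matrix
```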
3.5 Classifier
In the field of ML, many pattern recognition algorithms have been designed for developing SER systems. In this study, experimental results were obtained using SVM and KNN classifiers to evaluate the efficiency of the proposed approach.
1. Support Vector Machine (SVM): SVMs are supervised ML models that are extensively used in many pattern recognition systems [17]. Based on the non-linearity of SVM, the central idea is to find the hyper-plane with the maximum margin between data points of the classes. The values of the parameters C (regularization) and gamma (kernel coefficient) are chosen as 6 and 0.01 respectively, and we used the Radial Basis Function (RBF) as a kernel function. It is defined as follows:
K_{RBF}(x,y) = \exp\!\Bigl(-\frac{\|x-y\|^{2}}{2\sigma^{2}}\Bigr),   (3)
where x and y are two samples and σ is the kernel width.
2. K-Nearest Neighbors (KNN): This algorithm is commonly used among various supervised pattern recognition techniques. It is based on choosing the K nearest neighbors using a distance metric; the point with the most votes among the K nearest neighbors determines the corresponding class. Here, we choose K equal to 3 and the Manhattan distance as the distance metric.
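The two classifiers with the hyper-parameters stated above can be instantiated in scikit-learn as follows (fitting on the selected features is assumed to happen elsewhere).

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# SVM: RBF kernel, C = 6, gamma = 0.01
svm_clf = SVC(kernel="rbf", C=6, gamma=0.01)

# KNN: k = 3 with Manhattan distance
knn_clf = KNeighborsClassifier(n_neighbors=3, metric="manhattan")

# svm_clf.fit(X_train_sel, y_train); knn_clf.fit(X_train_sel, y_train)
```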
4 Experiments
In this section, we evaluate the performance of our proposed method on the RAVDESS database using all eight emotions. Each emotion consists of 192 utterances, except the neutral emotion which has only 96. To ensure the validity and significance of our results, the dataset was partitioned into 30% for testing and 70% for training.

4.1 Results and Discussions
For SVM, the number of selected features with the SFFS algorithm is 103. The accuracy of fearful, happy and sad is improved by 14.6%, 16.7% and 18.1% respectively. For KNN, the number of features is 100. The accuracy of disgust is improved by 22.4% and for surprise by 14.5%. Table 1 presents the training and testing accuracies obtained for each classifier before and after using feature selection. From these results, we can observe an important improvement when applying the SFFS algorithm.

Table 1. Comparison of the recognition rates before and after using SFFS (%).
                             SVM     KNN
Without SFFS  Training rate  61.31   56.35
              Testing rate   59.72   59.29
With SFFS     Training rate  76.99   74.70
              Testing rate   69.67   65.04
As shown in Table 2 and Table 3, the confusion matrices of our models indicate that SVM achieves the highest accuracy compared to KNN; this lies in the strong generalization performance of SVM. The average accuracies for all emotions are very different. Surprise is the most easily identified emotion for the two classifiers. In contrast, neutral and sad are the hardest emotions to recognize, because of the small number of utterances of neutral compared to the other emotions. For the SVM classifier, the sad emotion has an improvement of 1.9% in comparison with KNN.

Table 2. Confusion matrix of SVM (%).
Emotions   Angry  Calm  Disgust  Fearful  Happy  Neutral  Sad   Surprised
Angry      73.7   0     13.1     3.2      6.5    0        0     3.2
Calm       0      67.0  4.8      1.2      4.8    9.7      10.9  1.2
Disgust    17.7   2.2   68.8     2.2      0      2.2      4.4   2.2
Fearful    0      1.7   1.7      77.5     6.8    1.7      8.6   1.7
Happy      2.3    0     4.6      2.3      72.0   2.3      2.3   13.9
Neutral    0      15.6  6.2      0        6.2    59.3     9.3   3.1
Sad        1.8    3.6   9        12.7     3.6    9        54.5  5.4
Surprised  3.5    0     1.7      5.3      7.1    0        1.7   80.3

Table 3. Confusion matrix of KNN (%).
Emotions   Angry  Calm  Disgust  Fearful  Happy  Neutral  Sad   Surprised
Angry      65.2   10.1  4.3      14.4     0      0        –     5.7
Calm       1.3    63.8  4.1      2.7      2.7    9.7      15.2  0
Disgust    10.2   0     82.0     5.1      0      0        0     –
Fearful    0      2.1   8.5      76.5     4.2    0        4.2   4.2
Happy      3.9    1.9   3.9      11.7     54.9   7.8      7.8   7.8
Neutral    0      25    4.5      0        6.8    47.7     9     6.8
Sad        1.7    8.7   5.2      17.5     8.7    0        52.6  5.2
Surprised  7.5    0     1.8      1.8      1.8    5.6      0     81.1
4.2 Comparison with the State-of-the-Art
In the case of RAVDESS database, it was a challenging task to find many studies to be compared, because of its novelty in the field of SER versus other databases like EMO-DB, IEMOCAP, etc. Table 4 shows the comparison of our proposed approach with existing literature works. Zhen-Tao et al. [18], presented a SER system based on SVM with RBF kernel, they obtained an accuracy of 63.82% using formants only. In [19], the authors proposed a real-time application based on an ensemble learning classifier. Using a collection of 222 prosodic and spectral features, they reported a recognition rate of 67.19%. Ancilin et al. [20] proposed the use of Mel Frequency Magnitude
Table 4. Performance comparison with existing works for the RAVDESS database.
Reference        Features                         Number of features  Classifier           Test accuracy
[18]             Formants                         12                  SVM                  63.82%
[19]             Spectral and prosodic features   222                 Ensemble classifier  67.19%
[20]             MFMC                             360                 SVM                  64.31%
Proposed models  Spectral and prosodic features   103                 SVM                  69.67%
                                                  100                 KNN                  65.04%
coefficients (MFMC) and compared them to MFCC, LPCC and Log Frequency Power Coefficients (LFPC) features. The highest accuracy was 64.31% using MFMC. In contrast, we reached in our work a higher performance of 69.67% compared to previous works. From these results, we can notice that a higher number of features does not necessarily lead to a best performance.
5 Conclusion
The study of emotions from speech signals is a challenging task because of the difficulty of their characterization. In this paper, a set of spectral and prosodic features and their statistics has been extracted before feeding SVM and KNN classifiers. SFFS-based feature selection was used to collect the most significant features. The obtained results on the RAVDESS database show that, for SER, feature selection with SFFS provided an additional benefit in improving the accuracy of the proposed system, obtaining a test accuracy of 69.67% and 65.04% using the SVM and KNN classifiers, respectively, with a gain of 9.95% and 5.75% compared to the setting without SFFS. In addition, we compared the performance of the proposed system with other existing works; our best model outperforms them in terms of selected features and recognition rate. In the future, we anticipate improving the current system further by using other feature selection techniques. On the other hand, Deep Learning (DL) based approaches achieve considerable performance. This leads us to evaluate our approach using DL algorithms such as Convolutional Neural Networks (CNN) and Long Short Term Memory networks (LSTM). Acknowledgements. This work was supported by the Ministry of Higher Education, Scientific Research and Innovation, the Digital Development Agency (DDA) and the CNRST of Morocco (Alkhawarizmi/2020/01).
References 1. Bahreini, K., Nadolski, R., Westera, W.: Towards real-time speech emotion recognition for affective e-learning. Educ. Inf. Technol. 21(5), 1367–1386 (2015). https:// doi.org/10.1007/s10639-015-9388-2 2. Abdel-Hamid, L., Shaker, N.H., Emara, I.: Analysis of linguistic and prosodic features of bilingual Arabic-English speakers for speech emotion recognition. IEEE Access 8, 72957–72970 (2020) ˜ M., DeliA, ˜ V., Karpov, A.: Call redistribution for a call center based on 3. BojaniA, speech emotion recognition. Appl. Sci. 10(13), 4653 (2020) 4. Shegokar, P., Sircar, P.: Continuous wavelet transform based speech emotion recognition. In: 2016 10th International Conference on Signal Processing and Communication Systems (ICSPCS), pp. 1–8 (2016) 5. Getahun, F., Kebede, M.: Emotion identification from spontaneous communication. In: 2016 12th International Conference on Signal-Image Technology InternetBased Systems (SITIS), pp. 151–158 (2016) 6. Sun, L., Fu, S., Wang, F.: Decision tree SVM model with fisher feature selection for speech emotion recognition. EURASIP J. Audio Speech Music. Process. 2019, 2 (2019) 7. Bhavan, A., Chauhan, P., Hitkul, Shah, R.R.: Bagged support vector machines for emotion recognition from speech. Knowl.-Based Syst. 184, 104886 (2019) 8. Podder, P., Khan, T.Z., Khan, M.H., Rahman, M.M.: Comparative performance analysis of hamming, hanning and blackman window. Int. J. Comput. Appl. 96(18) (2014) 9. Ak¸cay, M.B., O˘ guz, K.: Speech emotion recognition: emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 116, 56–76 (2020) 10. McKay, C., Fujinaga, I., Depalle, P.: jAudio: a feature extraction library. In: Proceedings of the International Conference on Music Information Retrieval, pp. 600-3 (2005) 11. Park, C.-H., Sim, K.-B.: Emotion recognition and acoustic analysis from speech signal. In: Proceedings of the International Joint Conference on Neural Networks, 2003, vol. 4, pp. 2594–2598. IEEE (2003) 12. Dave, N.: Feature extraction methods LPC, PLP and MFCC in speech recognition. Int. J. Adv. Res. Eng. Technol. 1(6), 1–4 (2013) 13. Makhoul, J.: Linear prediction: a tutorial review. Proc. IEEE 63(4), 561–580 (1975) 14. McAdams, S.: Perspectives on the contribution of timbre to musical structure. Comput. Music. J. 23(3), 85–102 (1999) 15. Aparna, U., Paul, S.: Feature selection and extraction in data mining. In: 2016 Online International Conference on Green Engineering and Technologies (ICGET), pp. 1–3. IEEE (2016) 16. Ferri, F.J., Pudil, P., Hatef, M., Kittler, J.: Comparative study of techniques for large-scale feature selection. In: Machine Intelligence and Pattern Recognition, vol. 16, pp. 403–413. Elsevier (1994) 17. Bandela, S.R., Kishore, K.T.: Speech emotion recognition using semi-NMF feature optimization. Turk. J. Electr. Eng. Comput. Sci. 27(5), 3741–3757 (2019) 18. Liu, Z.-T., Rehman, A., Wu, M., Cao, W.-H., Hao, M.: Speech emotion recognition based on formant characteristics feature extraction and phoneme type convergence. Inf. Sci. 563, 309–325 (2021)
19. Deusi, J.S., Popa, E.I.: An investigation of the accuracy of real time speech emotion recognition. In: Bramer, M., Petridis, M. (eds.) SGAI 2019. LNCS (LNAI), vol. 11927, pp. 336–349. Springer, Cham (2019). https://doi.org/10.1007/978-3-03034885-4 26 20. Ancilin, J., Milton, A.: Improved speech emotion recognition with Mel frequency magnitude coefficient. Appl. Acoust. 179, 108046 (2021)
Spare Parts Sales Forecasting for Mining Equipment: Methods Analysis and Evaluation
Egor Nikitin¹, Alexey Kashevnik², and Nikolay Shilov²
¹ ITMO University, Saint-Petersburg 197001, Russia
[email protected]
² SPC RAS, Saint-Petersburg 199178, Russia
{alexey.kashevnik,nick}@iias.spb.su
Abstract. The work is devoted to finding the optimal solution for predicting the sales of spare parts for a supplier of mining equipment. Various methods and approaches for forecasting sales of commodity items with variable demand were analyzed, such as Croston's method, zero forecast, naive forecast, and moving average forecast. The market of commercial offers on this topic was studied, conclusions were drawn regarding which method is best suited to solve the problem, and a forecast model was built. As a result, software was developed to create correlation matrices that allow selecting relevant positions for further training and building forecasting models. During the experiments, a correlation matrix was built, with the help of which it is possible to determine goods that depend on each other. The models are currently being finalized, since they can make a forecast with high accuracy only for those positions whose sales were at least ten in a specific period.
1 Introduction
The problem of demand forecasting has been attracting significant attention from business, since an accurate forecast can significantly reduce costs related to stock and warehousing as well as the number of lost deals related to the absence of the required goods. This demand in turn causes significant attention in the scientific community. Most of the works aim at demand forecasting for mass sales; however, it is also extremely important to forecast very volatile sales (when a product is sold once in a few months) of expensive products. Equipment and spare parts for the mining and metallurgy industries fall into this case. Some specifics of such sales are as follows:
• High stock costs per product. It is very expensive to store such products in a warehouse for a long time.
• High losses due to lost deals. Since the price of a product is high, losing even one deal due to the absence of the required product in a warehouse can be a significant loss for the company.
• Availability of little statistical data on the sales of such products.
As a result, expensive goods in mining industry are characterized by small amount of sales. The purpose of this research work is to analyze the methods and tools for forecasting sales of the mining and metallurgical industry equipment and spare parts, which are sold rarely, and to propose and develop a proprietary forecasting method. The object of the study is anonymized data on sales of the above-mentioned products. The relevance of the study lies in the novelty of the use of methods for forecasting volatile demand. There are almost no works in this area. The overwhelming majority of research addresses constant demand or volatile demand but with a significant amount of sales. For example, most of the articles studied are devoted to forecasting sales in the field of defense and medical complexes (e.g., [1, 2]). The problem of forecasting volatile demand has arisen as a result of the rapid growth of engineering enterprises. Existing forecasting methods do not allow for a reasonable estimate of the number of necessary components or spare parts, and therefore warehouse and logistics costs increased. The structure of the paper is as follows. In Sect. 2 we present the related work on the topic of spare parts sales forecasting. We presented data analysis and processing in Sect. 3. The conclusion summarizes the paper.
2 Related Work The research is based on the analysis of scientific articles and publications on a given topic. Interestingly, most studies use such methods and approaches as “Weibull distribution”, “Poisson distribution”, “zero forecast and naive forecast”, “moving average forecast”, “exponential smoothing forecast”. The presented analysis is based on Croston’s method as the basis for forecasting volatile demand since it is designed specifically for dealing with unstable and rare data. The implementation of this method consists of two steps. First, separate exponential smoothing estimates are made for the average demand. Second, the average interval between demands is calculated, which is then used as a model to predict future demand. Table 1 compares available scientific articles. In addition, existing commercial solutions were also analyzed. Four main commercial solutions were chosen, specializing in the field of both constant and non-constant demand (Table 2). A total of 15 articles were analyzed. In paper [3] a new Bayesian method based on composite Poisson distributions is proposed and compared to the Poisson-based Bayesian method with a Gamma prior distribution as well as to a parametric frequentist method and to a non-parametric one. Accuracy 80–88%. In paper [4] methods and models such as parametric demand model (DMF) function, c-means clustering, Dombi’s kappa function, linear transformations, and standardization of adapted demand models (SDM) are used. The proposed forecasting method uses a knowledge discovery-based approach that is built upon the combined application of analytic and soft computational techniques and is able to indicate the turning points of the purchase life-cycle curve. Accuracy 90–93%. Paper [5] investigates the problem of demand forecast for repaired spare parts for aircraft. They analyze the factors affecting the demand for spare parts to be repaired,
and then combine the five types of forecasting models to create a two-tier combined forecasting model. The model was trained on 30 thousand positions and was able to make predictions with the accuracy 86,7%. Paper [6] outlines a methodology for using simulation to predict the failure of military equipment. Authors used Weibull distribution and Poisson distribution as main distributions. They also developed four scenarios based on the location of the tank and its surroundings. Thus, the accuracy depends on the chosen scenario. Paper [7] includes methods and models such as gamma distributions for simulating magnification degradation rate during each period and PHM (Prognostics and Health Monitoring) system. This paper aims at presenting a novel spare parts inventory control model for non-repairable items with periodic review. Accuracy 84%. The authors of the paper [8] try to answer the question «Why is the demand for spare parts intermittent and how can we use models developed in maintenance research to forecast such demand?». They use Bernoulli process, Croston’s method, Weibull distribution, exponential smoothing. Accuracy 75%. Paper [9] is dedicated to a new forecasting method that takes into account not only the demand for spare parts, but also the type of component being repaired. This two-step forecasting method separately updates the average number of parts needed per repair and the number of repairs for each type of component. Accuracy 80%. Paper [10] is an overview of existing methods and distributions. Accuracy 70–90% (depending on the method). The authors of paper [11] develop a hybrid forecasting approach, which can synthetically evaluate autocorrelation of demand time series and the relationship of explanatory variables with demand of spare part. Accuracy 36–72%. Paper [12] presents the simulation results of parts demand forecasting and inventory management to select the best policies for each category based on six years of demand data and includes three alternatives for recording demand data, three demand forecasting models, and six demand distribution models. Accuracy 80–90%. The authors of paper [13] develop a method to forecast the demand of these spare parts by linking it to the service maintenance policy using information from active installed base. By tracking the active installed base and estimating the part failure behaviour, authors provide a forecast of the distribution of the future spare parts demand during the upcoming lead time. Accuracy 89,1–98,69%. Paper [14] proposes a set of installed base concepts with associated simple empirical forecasting methods that can be applied in practice for B2C spare parts supply management during the end-of-life phase of consumer products. Accuracy 75%. The authors of [15] try to determine the characteristics of demand for spare parts that affect the relative efficiency of alternative forecasting methods by developing a logistic regression classification model to predict the relative efficiency of alternative forecasting methods. Accuracy 65%. The authors of [16] improve the empirical method by applying extreme value theory to model the tail of the demand distribution, comparing it with the WSS, Croston and SBA methods for a range of demand distributions. Accuracy 82%. The authors of [17] create their own approach to predict the demand distribution for a fixed lead time using a new type of time series bootstrap. Accuracy 95%.
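As a reference point for Croston's method described at the start of this section, the minimal sketch below applies exponential smoothing separately to the non-zero demand sizes and to the inter-demand intervals and forecasts their ratio; the smoothing constant is an illustrative assumption.

```python
import numpy as np

def croston(demand, alpha=0.1):
    """Croston forecast per period for an intermittent demand series."""
    z = p = None          # smoothed demand size and smoothed interval
    q = 1                 # periods elapsed since the last non-zero demand
    for d in np.asarray(demand, dtype=float):
        if d > 0:
            if z is None:                 # initialise on the first demand
                z, p = d, q
            else:
                z = z + alpha * (d - z)
                p = p + alpha * (q - p)
            q = 1
        else:
            q += 1
    return 0.0 if z is None else z / p    # expected demand per period

print(croston([0, 0, 3, 0, 0, 0, 2, 0, 1, 0, 0, 4]))
```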
When talking about commercial solutions, we came to the conclusion that Lokad is not suitable for forecasting for mining equipment with a non-constant demand pattern: it mostly relies on probabilistic forecasting and on the optimization of purchases and Scadian stocks. ForecastNOW! is very similar to Lokad, but sometimes more expensive; its most widely used features are automatic accounting for order restrictions and for sales of analogues and substitute goods. 1C is a whole complex of different systems, and it is not possible to use only the part responsible for forecasting; this makes 1C unsuitable for further use. Using the trial period, it was found that Novo Forecast Enterprise is the most suitable for analyzing the available data, since it basically uses the same Croston method and has an Excel add-in.

Table 1. Scientific papers
[3] A compound-Poisson Bayesian (CPB) method for spare parts inventory forecasting
Approach: Compound Poisson Bayesian approach
Methods: A new Bayesian method based on compound Poisson distributions, negative binomial distribution (NBD)
Quality indicators: Smoothed Mean Absolute Deviation (MAD)

[4] Modeling and long-term forecasting demand in spare parts logistics businesses
Approach: A parametric demand model (DMF) function is fitted for each complete historical time series of demand for end-of-life spare parts. Demand models (DMs) of the life-cycle curves for the purchase of end-of-life parts are functions of the fitted demand model. Demand models are converted to standardized demand models (SDMs)
Methods: C-means clustering, Dombi's kappa function, linear transformations, standardization of adapted demand models (SDM)

[5] A double-level combination approach for demand forecasting of repairable airplane spare parts based on turnover data
Approach: Double-level combination forecast, genetic neural network, exponential smoothing, grey model, cubic exponential smoothing model
Methods: Poisson distribution, exponential distribution
Quality indicators: Mean absolute error (MAE), mean square error (MSE), mean absolute percentage error (MAPE)

[6] A simulation-based optimization approach for spare parts forecasting and selective maintenance
Approach: Genetic algorithm, failure simulation
Methods: Weibull distribution, Poisson distribution
Quality indicators: N/D

[7] A spare parts inventory control model based on Prognostics and Health monitoring data under a fill rate constraint
Approach: A PHM (Prognostics and Health Monitoring) system monitors the level of component degradation and provides, at the beginning of each period, a forecast of the RUL of each component
Methods: Gamma distributions for simulating the degradation rate during each period
Quality indicators: Prediction error (difference between the expected value of the RUL distribution and the component failure time) and prediction variance (the variance of the RUL distribution)

[8] Spare parts demand: Linking forecasting to equipment maintenance
Approach: Simulation research, time delay model of probabilities of failures and replacement checks, time series forecasting model
Methods: Bernoulli process, Croston's method, Weibull distribution
Quality indicators: Exponential smoothing

[9] A two-step method for forecasting spare parts demand using information on component repairs
Approach: First step: forecasting, for each type of component, the number of repairs per unit of time for that component and the number of spare parts (of the type in question) required to repair it. In a second step, these forecasts are combined to forecast the total demand for spare parts
Methods: Zero Forecast (ZF) and Naive Forecast (NF), Croston Forecast Method (CR), Moving Average Forecast (MA), Exponential Smoothing Forecast (ES), Syntetos-Boylan Approximation (SBA)

[10] The development of a hierarchical forecasting method for predicting spare parts demand in the South Korean Navy
Approach: No method of its own; this is an overview of the existing ones
Methods: TDF - top-down forecasting; DF - direct forecasting; CF - combinatorial forecasting; MA - analytical research; ME - empirical research; S - modeling; MAD - mean absolute deviation
Quality indicators: MA - autoregressive (integrated) moving average model; MPE - mean percentage error; MAPE - mean absolute percentage error

[11] A hybrid support vector machines and logistic regression approach for forecasting intermittent demand of spare parts
Approach: Adaptation of the support vector machine (SVM) to predict the occurrence of non-zero demand for spare parts and development of a hybrid mechanism for integrating the results of the SVM forecast and the relationship of the occurrence of non-zero demand with independent variables
Methods: SVM, logistic regression, Poisson model, Croston methods
Quality indicators: Mean absolute percentage error (MAPE)

[12] Demand forecasting and inventory control: A simulation study on automotive spare parts
Approach: SMA - simple moving average; SBA - Syntetos-Boylan approximation; bootstrapping
Methods: Negative binomial distribution, compound Poisson normal, compound Poisson gamma and bootstrapping, Krever approach, Poisson distribution
Quality indicators: Mean Square Error (MSE)

[13] Forecasting spare part demand using service maintenance information
Approach: A method for forecasting the demand for spare parts using information on maintenance operations
Methods: Poisson binomial distribution, Syntetos-Boylan Approximation Method (SBA), exponential smoothing (SES), lognormal distribution, Weibull distribution
Quality indicators: Mean Error (ME) and Root Mean Squared Error (RMSE)

[14] Spare part demand forecasting for consumer goods using installed base information
Approach: A set of five models (AR, IBL, IBW, IBE, and IBM) that can be used to predict the demand for spare parts
Methods: Black box methods, autoregression (AR), least squares, exponentially weighted moving average (EWMA)
Quality indicators: Absolute prediction error (MAPE), mean square prediction error (RMSPE)

[15] Classification model predicting performance of forecasting methods for naval spare parts demand
Approach: Combinatorial Forecasting (CF) based on TDF and DF (Downward Forecasting)
Methods: Classification using logistic regression, Brier model estimation
Quality indicators: Mean absolute deviation (MAD), root mean square error (RMSE), robust direct forecast (RDF)

[16] An improved method to forecast spare parts demand using extreme value theory
Approach: Extreme value theory (the behavior of many uncertain quantities is modeled using the Generalized Pareto Distribution (GPD))
Methods: Croston method, WSS (method from another article)
Quality indicators: Syntetos-Boylan approximation (SBA)

[17] A new approach to forecast intermittent demand for service parts inventories
Approach: Augmented bootstrap version
Methods: Croston's method, bootstrap
Quality indicators: Exponential smoothing, mean absolute percentage error (MAPE)
Table 2. Commercially available tools

Lokad [18]
Cost: 2500–10000$ (depends on the industry)
Functionality: Probabilistic forecasting, optimization of purchases and Scadian stocks, data import and export, scripting (proprietary Envision language), scaling, integration, co-creation (expert help)
Segment: Small and medium businesses, aviation, fashion, retail, manufacturing, watches and jewelry, fresh food

Novo Forecast Enterprise [19]
Cost: From 1500$
Functionality: Calculation of detailed mathematical forecasts; forecast adjustment by internal and external factors; joint planning of promotions, new products, listings, tenders; calculation of optimal orders; surplus management; analysis of forecasts, plans and forecasting factors
Segment: Retailers, food, clothing and more

ForecastNOW! [20]
Cost: 45–205 thousand rub/month
Functionality: Automatic accounting for order restrictions and for sales of analogues and substitute goods, restoration of demand at times of shortage, accounting for storage areas for goods, calculation of the optimal level of service, forecasting the demand for the assortment
Segment: Positions itself as a tool for everyone

1C [21]
Cost: Depends on configuration (from 10 thousand and more)
Functionality: Taking into account the seasonality of demand, forecasting new products, the choice of forecasting method, data preparation and identification of "statistical outliers"
Segment: Positions itself as a tool for everyone
The review made it clear that, for a better forecast, each time interval must be considered explicitly. This means that the existing data needs to be refined, since in the absence of sales for a given day/week/month a zero is not recorded and the period itself is simply skipped.
3 Data Analysis and Processing
We analyzed spare parts sales data from the mining company over a period of two years. The data includes: deal number, product type, deal status, customer number, source, opening and closing dates of the deal, product, and quantity of goods sold. Further work was carried out with the provided data: cleaning, sorting, filtering, etc. For the convenience of the primary analysis, a pivot table was built in which only those goods were taken into account for which a deal had already taken place or was being concluded. We also took goods whose units of measurement are comparable and suitable for comparison, and selected products with statistics of at least ten sales. In the Jupyter Notebook development environment, program code was written with the following algorithm: reading data from a file, data filtering (we have only worked with pieces as units for now), data processing and selection of the required parameters, building a correlation matrix, and data visualization based on monthly sales data. Further, all data on sales were summed up, as indicated earlier, and a matrix of correlations between products was built (Fig. 1, left). Products are shown horizontally and vertically by their unique identifiers. The main task of constructing the correlation matrix was to find those goods whose sales depend on the sales of other products. When constructing the correlation matrix, the main characteristic of comparison is the quantity of purchased goods.
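A minimal sketch of this processing pipeline is given below, assuming pandas in the Jupyter environment mentioned above; the file name, column names and status values are hypothetical placeholders, since the exact export format of the deal data is not described in the paper.

```python
import pandas as pd

# Placeholder export of the deal records (deal number, status, dates, product,
# quantity, unit of measurement, customer, ...)
deals = pd.read_csv("deals.csv", parse_dates=["open_date", "close_date"])

# Keep only concluded deals measured in pieces, as done in the analysis
deals = deals[deals["status"].isin(["closed", "concluded"])
              & (deals["unit"] == "pcs")]

# Keep products with at least ten recorded sales
counts = deals.groupby("product_id")["quantity"].count()
deals = deals[deals["product_id"].isin(counts[counts >= 10].index)]

# Monthly sales per product; months without sales become explicit zeros
monthly = (deals
           .groupby([pd.Grouper(key="close_date", freq="M"), "product_id"])
           ["quantity"].sum()
           .unstack(fill_value=0))

# Product-to-product correlation matrix (Fig. 1, left)
corr = monthly.corr()
```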
Fig. 1. Product correlation matrix (left) and correlation graph between products (right).
The graph shows that there is both positive and negative correlation: positive corresponds to a direct relationship between sales, and negative to an inverse one. Moreover, the largest number of pairs have a negative correlation, which most likely indicates that there are products for which an increase in demand is accompanied by a fall in the demand for another product.
Then, to make sure that the methods work correctly, the most correlated products were selected and a graph was plotted for them (Fig. 2). The x-axis shows time intervals, and the y-axis shows the quantity of goods sold. The graph shows that there is some relationship between the two products, as indicated by the matrix. Since the dependency ratio is not equal to one, some deviations between the sales can be seen. The next step was to build a correlation matrix for customers, taking one product as a basis. Unfortunately, it turned out that there was not enough data to build such a matrix: the maximum number of sales of one product to one customer is no more than three. Based on the data obtained, we plan to study the dependence between buyer and product in more depth, check for errors, compare the sales plan with real sales, and examine relationships between the installed bases (the machines themselves that consume the purchased parts). We then calculated the correlation not for two, but for three elements. To do this, we used the standard partial correlation formula.
Fig. 2. Product sales comparison chart with a correlation coefficient 0.8 or higher.
\[
r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{\left(1 - r_{13}^{2}\right)\left(1 - r_{23}^{2}\right)}}
\]
We also used nested loops to count all possible pairs. As a result, we got “triplets” (correlation coefficient within one set with all possible permutations) presented in Fig. 3.
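A minimal sketch of this step is given below; it assumes the product correlation matrix `corr` built earlier and uses 0.8 as the reporting threshold, matching the threshold used for the figures. The function and variable names are illustrative.

```python
import numpy as np
from itertools import combinations

def partial_corr(r12, r13, r23):
    # First-order partial correlation of products 1 and 2 given product 3
    return (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

def correlation_triplets(corr, threshold=0.8):
    """Enumerate all product triples and keep those whose partial
    correlation exceeds the threshold."""
    triplets = []
    for a, b, c in combinations(corr.columns, 3):
        r = partial_corr(corr.loc[a, b], corr.loc[a, c], corr.loc[b, c])
        if abs(r) >= threshold:
            triplets.append((round(r, 3), a, b, c))
    return triplets
```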
Fig. 3. Product triplets with a correlation coefficient of 0.8 or higher (two triplets shown). The first value in each set is the correlation coefficient; the others are unique product identifiers.
4 Conclusion
During the analysis of the data obtained, it was determined that most commercial solutions are not suitable for predicting rare sales. It was also noted that, of more than two thousand commodity items, only 50 were suitable for analysis, since they had ten or more sales as well as more than 30 sold pieces. Nevertheless, based on these results, it is planned to build a model for forecasting volatile demand, which will be refined in the future. In this work, a proprietary method for analyzing rare sales was proposed which, as described above, allows studying the demand for each of the goods, building a correlation matrix to study the dependencies between several goods, and plotting graphs for a more visual interpretation. Working with data on rare sales, it turned out that most of the available tools are not suitable for analysis: there is very little information about sales, and due to unloading errors and incomplete sales data, all records inappropriate for analysis have to be filtered out. Filtering is necessary for a more descriptive, representative sample without errors such as an empty customer field or an incorrect date. In order to check the correctness of the calculation algorithms, random values were generated that satisfy the condition of rare sales. When there is little data, the prediction accuracy is very low, but the experiment showed that even with such a data set it is possible to build a fairly accurate forecast. In the future, it is planned to test the existing algorithms on a larger amount of data, to develop the models, and to improve their prediction accuracy using factors such as seasonality and national holidays. Another challenge for the future is the development of a dynamic calculation of the correlation coefficient without reference to the number of analyzed objects. Acknowledgments. The research has been supported by the Russian State Research # FFZF2022-0005.
References
1. Lukinskiy, V., Lukinskiy, V., Strimovskaya, A.: Assessment of inventory indicators for nomenclature groups with rare demand. In: Kabashkin, I., Yatskiv Jackiva, I., Prentkovskis, O. (eds.) RelStat 2018. LNNS, vol. 68, pp. 121–129. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-12450-2_11
2. Park, M., Baek, J.: Demand forecast of spare parts for low consumption with unclear pattern. J. Korea Inst. Military Sci. Technol. 21(4), 529–540 (2018)
3. Babai, M.Z., Chen, H., Syntetos, A.A., Lengu, D.: A compound-Poisson Bayesian approach for spare parts inventory forecasting. Int. J. Prod. Econ. 232, 108954 (2021)
4. Dombi, J., Jónás, T., Tóth, Z.E.: Modeling and long-term forecasting demand in spare parts logistics businesses. Int. J. Prod. Econ. 201, 1–17 (2018)
5. Guo, F., Diao, J., Zhao, Q., Wang, D., Sun, Q.: A double-level combination approach for demand forecasting of repairable airplane spare parts based on turnover data. Comput. Ind. Eng. 110, 92–108 (2017)
6. Sharma, P., Kulkarni, M.S., Yadav, V.: A simulation based optimization approach for spare parts forecasting and selective maintenance. Reliab. Eng. Syst. Safety 168, 274–289 (2017)
7. Rodrigues, L.R., Yoneyama, T.: A spare parts inventory control model based on Prognostics and Health monitoring data under a fill rate constraint. Comput. Indust. Eng. 148, 106724 (2020)
8. Wang, W., Syntetos, A.A.: Spare parts demand: linking forecasting to equipment maintenance. Transp. Res. Part E Logist. Transp. Rev. 47, 1194–1209 (2021)
9. Romeijnders, W., Teunter, R., Jaarsveld, W.V.: A two-step method for forecasting spare parts demand using information on component repairs. Eur. J. Oper. Res. 220, 386–393 (2020)
10. Moon, S., Hicks, C., Simpson, A.: The development of a hierarchical forecasting method for predicting spare parts demand in the South Korean Navy – a case study. Int. J. Prod. Econ. 140, 794–802 (2012)
11. Hua, Z., Zhang, B.: A hybrid support vector machines and logistic regression approach for forecasting intermittent demand of spare parts. Appl. Math. Comput. 181, 1035–1048 (2016)
12. Rego, J.R., Mesquita, M.A.: Demand forecasting and inventory control: a simulation study on automotive spare parts. Int. J. Prod. Econ. 161, 1–16 (2015)
13. Auweraer, S.V., Boute, R.: Forecasting spare part demand using service maintenance information. Int. J. Prod. Econ. 213, 138–139 (2019)
14. Kim, T.Y., Dekker, R., Heij, C.: Spare part demand forecasting for consumer goods using installed base information. Comput. Ind. Eng. 103, 201–215 (2017)
15. Moon, S., Simpson, A., Hicks, C.: The development of a classification model for predicting the performance of forecasting methods for naval spare parts demand. Int. J. Prod. Econ. 143, 449–454 (2013)
16. Zhu, S., Dekker, R., Jaarsveld, W.V., Renjie, R.W., Koning, A.J.: An improved method for forecasting spare parts demand using extreme value theory. Eur. J. Oper. Res. 261, 169–181 (2017)
17. Willemain, T.R., Smart, C.N., Schwarz, H.F.: A new approach to forecasting intermittent demand for service parts inventories. Int. J. Forecast. 20, 375–387 (2004)
18. Lokad. https://www.lokad.com/ru/
19. Novo Forecast. https://4analytics.ru/glavnoe/programmi-dlya-prognozirovaniya.html
20. ForecastNOW!. https://fnow.ru/
21. 1C:Enterprise. https://v8.1c.ru/metod/article/prognozirovanie-sprosa.html
Data-Centric Approach to Hepatitis C Virus Severity Prediction Aniket Sharma, Ashok Arora, Anuj Gupta, and Pramod Kumar Singh(B) Computational Intelligence and Data Mining Research Laboratory, ABV-Indian Institute of Information Technology and Management, Gwalior, MP, India {bcs_2019008,bcs_2019075,img_2019009,pksingh}@iiitm.ac.in
Abstract. Every year, around 1.5 million people worldwide are infected with the Hepatitis C Virus, and about 70% of these cases develop chronic infection and cirrhosis within the next 20 years. Because there is no effective treatment for HCV, it is critical to predict the virus in its early stages. The study's goal is to define a data-driven approach for accurately detecting HCV severity in patients. Our approach achieves the highest accuracy of 86.79%, compared to 70.89% using the standard approach. Keywords: Hepatitis C virus · Artificial intelligence · Machine learning · Filter-based statistical feature selection · Cross-validation
1 Introduction
The Hepatitis C Virus (HCV) causes hepatitis C, a liver illness. It is spread through blood contact with an infected person [11]. This chronic virus infects an estimated 58 million people globally, with 1.5 million new infections occurring each year [20]. Nowadays, most people get hepatitis C by sharing needles or other injection equipment. Hepatitis C is a short-term sickness for some people, but it becomes a long-term, chronic condition for more than half of those who get the virus [6]. Cirrhosis and liver cancer are severe and potentially fatal consequences of chronic hepatitis C. Patients with chronic hepatitis C usually have no symptoms and do not feel sick; the symptoms that do emerge are frequently indicative of severe liver disease. Hepatitis C has no vaccine or effective treatment. Avoiding behaviors that may spread the disease is the most effective way to avoid hepatitis C. It is essential to get tested for hepatitis C, since most persons with the illness at an early stage may be cured with the appropriate treatment [10, 25]. There are several methods for predicting HCV and HCV severity. The fundamental goal of these techniques is to improve the machine learning algorithm to obtain better performance. In this work, we propose a data-centric method to find patterns among features, redundant features, relevant features that provide the most information for predictions, and features irrelevant to the task at hand. We have classified HCV severity into three phases, from mild to severe, and focus on predicting the proper stage for HCV-positive individuals.
The rest of this paper is organized as follows. We briefly recall the relevant research in Sect. 2. A detailed description of the proposed method is presented in Sect. 3. Section 4 describes the experimental results. Finally, Sect. 5 summarizes the conclusions and the future scope of the proposed method.
2 Literature Review Hashem et al. [13] explore the connection between features and the target using Pearson’s correlation coefficient and the P-value, a measure of a hypothesis’s statistical significance. However, the relevance of a feature for a prediction task cannot be determined using P-value and is not the appropriate statistical measure for the task [22]. Suwardika [23] focuses on the binary classification of infected and non-infected patients. However, missing values in the data were replaced with the mean without performing any data analysis. Syafa’ah et al. [24] assess models by dividing the data into training and test sets of 80% and 20% composition, respectively. Nevertheless, this is not an accurate assessment criterion for a small dataset of approximately 70 records since it is prone to data variance and can overestimate model performance [5]. Akella and Akella [1] use random oversampling to remove the class imbalance between binary classes. Even though the dataset only comprises 1385 samples, the authors use a 70–30% train test split. The authors perform binary classification of diagnosing the presence of HCV even though the problem specified is severity prediction which is a multi-class problem. Ozer [21] utilizes Recurrent Neural Network (RNN) and 10-fold cross-validation to obtain 97.72% accuracy. Despite the excellent accuracy, the recall score, an important parameter in medical machine learning applications, is relatively low. Hoffmann et al. [14] use decision trees to obtain a 75% accuracy but do not consider precision or recall. Li et al. [16] split the dataset in half for training and testing, which should not be used as a validation criterion for a dataset with just 920 entries. Trishna et al. [26] and Hashem et al. [13] use no data balancing technique despite the data being highly unbalanced. Akyol and Gultepe [2] employ random undersampling to balance the data.
3 Methodology
3.1 Data Description
This research makes use of the HCV data introduced in [18] and made available by the University of California, Irvine [17]. The dataset consists of 615 samples: 540 healthy individuals and 75 patients with diagnosed Hepatitis C Virus. As our aim is severity prediction among HCV-positive patients, only the data of the 75 diagnosed patients is considered. Of these 75 patients, 53 are male and 22 are female, with ages between 19 and 75 years and a mean age of 48. The data contains 19 rows with missing values, where ALB, ALP, ALT, CHOL, and PROT have 1, 18, 1, 3, and 1 missing values, respectively. The data is balanced, as it consists of 24 samples of low severity, 21 samples of Fibrosis (high severity), and 30 samples of Cirrhosis (extremely high severity).
3.2 Approach We follow a data-centric machine learning approach, in which, in addition to data preprocessing, we aim to find, if any, redundant features that provide no information and thus increase training time without much affecting the model performance, irrelevant features which affect model performance negatively, and relevant features which are beneficial for the severity prediction task. This approach is depicted visually in Fig. 1.
Fig. 1. Visual depiction of the data-centric approach followed
To fill the missing values in the data, we first observe the nature of each feature. ALB was found to be right-skewed, ALP and ALT are left-skewed, and CHOL and PROT are Gaussian-like, as shown in Fig. 2. The missing values of the skewed features, namely ALB, ALP, and ALT, were replaced with the median, and those of the Gaussian-like features, namely CHOL and PROT, were filled with the mean. This ensured we did not change the nature of the features. We also perform min-max scaling on the data for standardization, i.e., to ensure the values of all features lie between 0 and 1.
Fig. 2. Frequency plot of variables with missing value to check skewness (bins = 20)
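A minimal sketch of this imputation and scaling step is given below; the file name is a hypothetical placeholder for the subset of 75 HCV-positive records, and pandas/scikit-learn are assumed as the implementation tools, which the paper does not state explicitly.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Placeholder file with the 75 HCV-positive records
df = pd.read_csv("hcv_positive.csv")

# Skewed features are filled with the median, Gaussian-like ones with the mean
for col in ["ALB", "ALP", "ALT"]:
    df[col] = df[col].fillna(df[col].median())
for col in ["CHOL", "PROT"]:
    df[col] = df[col].fillna(df[col].mean())

# Min-max scaling so that every numerical feature lies in [0, 1]
num_cols = df.select_dtypes("number").columns
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
```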
To find redundancy among features, we use two statistical correlation coefficients: Pearson's correlation coefficient for determining linear relationships between features, and Spearman's correlation coefficient for determining non-linear (monotonic) relationships [3]. Figure 3 plots the correlation among the features observed using both Pearson's and Spearman's correlation coefficients. In general, features with an absolute coefficient value close to 1, represented in red in the figure, are considered related to each other, and one of them is dropped to remove redundancy. As evident from the plot, none of the features are redundant, and thus no feature is removed. To discover features irrelevant to the task, we calculate and compare the ANOVA F-value [15] and Kendall's τ coefficient [4] of each feature with the target, i.e., the severity of HCV. The ANOVA F-value is used to find linear relationships between features and the target: the higher the ANOVA F-value, the more influential the feature is for predicting the target. Kendall's τ coefficient is used for finding non-linear relationships between features and the target; the closer its absolute value is to 1, the more important the feature. The ANOVA F-value and Kendall's τ coefficient are suited to numerical features only [4]. Since there is only one categorical feature, 'Sex', we experiment with it individually. Table 1 shows the ANOVA F-value and Kendall's τ coefficient for the numerical features in the dataset.
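A minimal sketch of the redundancy check is given below; it assumes the DataFrame `df` from the previous sketch, a target column named `Category` (an assumption about the data export, not a detail given in the paper), and an illustrative 0.9 cut-off for flagging near-duplicate features.

```python
# Feature-feature redundancy check (cf. Fig. 3): Pearson for linear,
# Spearman for monotonic relationships
features = df.drop(columns=["Sex", "Category"])
pearson = features.corr(method="pearson")
spearman = features.corr(method="spearman")

# Candidate redundant pairs: absolute coefficient close to 1
redundant = [(a, b)
             for i, a in enumerate(features.columns)
             for b in features.columns[i + 1:]
             if abs(pearson.loc[a, b]) > 0.9 or abs(spearman.loc[a, b]) > 0.9]
print(redundant)  # no pair exceeds the cut-off here, so no feature is dropped
```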
Fig. 3. Feature-feature correlation plots
Table 1. Correlation of features with the target value.

Features | ANOVA F-value | Kendall's τ coefficient
Age | 15.299 | 0.377
ALB | 43.960 | −0.598
ALP | 06.678 | 0.410
ALT | 04.878 | −0.217
AST | 01.842 | 0.217
BIL | 08.920 | 0.350
CHE | 48.276 | −0.580
CHOL | 06.291 | −0.300
CREA | 02.095 | 0.003
GGT | 01.386 | 0.186
PROT | 05.837 | −0.191
We thus perform three experiments, based on which we decide which features are irrelevant for the problem. In each experiment, we train multiple standard machine learning models. For the first experiment, we train the models twice: once keeping the only categorical feature, 'Sex', and once after removing it. For the second experiment, we remove features one by one, from the lowest to the highest ANOVA F-value. Similarly, for the third experiment, we use the absolute value of Kendall's τ coefficient. All the statistical methods described have been chosen after extensive research and experimentation on the data.
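A minimal sketch of this feature-scoring step is given below; it again assumes the DataFrame `df` and the target column name `Category`, and encodes the three severity stages as ordered integers for Kendall's τ, which is an implementation assumption rather than a detail stated in the paper.

```python
from scipy.stats import kendalltau
from sklearn.feature_selection import f_classif

X = df.drop(columns=["Sex", "Category"])   # numerical features only
y = df["Category"]

# ANOVA F-value of each feature against the severity target
f_values, _ = f_classif(X, y)

# Kendall's tau between each feature and the severity encoded as ordered stages
y_ordinal = y.astype("category").cat.codes
tau_values = {}
for col in X.columns:
    tau, _ = kendalltau(X[col], y_ordinal)
    tau_values[col] = tau

# Features ranked from the weakest to the strongest ANOVA F-value; experiments
# 2 and 3 drop them one by one and re-evaluate the classifiers (Tables 3 and 4)
ranked = sorted(zip(X.columns, f_values), key=lambda t: t[1])
```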
3.3 Parametric Description of Models
We perform the above-described experiments using seven standard machine learning algorithms. A detailed parametric description of each of these models is given below.
Logistic Regression. It is a basic supervised learning algorithm that models the probability of occurrence of each class. Since our problem is multiclass, we use the one-vs-rest strategy for classification. We use elastic net regularization with a regularization strength of 0.8 and an L1/L2 ratio of 0.5 [27].
k-Nearest Neighbor Algorithm. It is a classification algorithm in which an object is assigned to a class based on the majority vote among the nearest k neighbors [9]. The Ball Tree algorithm [19] has been used to compute the nearest neighbors using the Minkowski distance of order 4. The optimal value of the number of neighbors (k) is highly data-dependent and is 10 for the HCV data.
Gaussian Naive Bayes. It is one of the many naive Bayes classification algorithms. We have used a standard implementation of the Gaussian naive Bayes algorithm with variance smoothing of 10^-10 for increased calculation stability.
Decision Tree. It is a supervised learning algorithm that creates a flowchart-like tree where each split is based on a decision made regarding a feature. The quality of each split is measured using the information gain, and the best random split is taken at each node. At least 40% of the samples are required to split a node.
Random Forest. It is an ensemble of decision trees where bootstrap samples have been taken for building each tree. We have used 50 trees to create this ensemble, where each tree uses Gini impurity to measure the quality of a split. At least five samples are required to split a node, and two samples are needed at a leaf. We have considered at most the square root of the total number of features for each split.
Support Vector Machine. It is a robust supervised learning algorithm [7]. We use a polynomial kernel function with degree 2 and a kernel coefficient that is inversely proportional to the number of features. We also use the L2 regularization penalty with a parameter value of 0.1.
Multi-layer Perceptron with Five Layers. Here, all three hidden layers consist of 32 nodes, and the Rectified Linear Unit (ReLU) is used as the activation function. The model is trained for 500 epochs of the dataset.
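These hyper-parameters map naturally onto scikit-learn estimators. The sketch below is one possible instantiation, not the authors' code: the inverse-regularization value C = 1/0.8, the saga solver, max_iter, the random seed, and the macro averaging of precision, recall and F1 during cross-validation are assumptions that the paper does not specify. `X` and `y` are the feature matrix and target from the previous sketches.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

models = {
    "LR": LogisticRegression(penalty="elasticnet", solver="saga", C=1 / 0.8,
                             l1_ratio=0.5, multi_class="ovr", max_iter=5000),
    "KNN": KNeighborsClassifier(n_neighbors=10, algorithm="ball_tree", p=4),
    "GNB": GaussianNB(var_smoothing=1e-10),
    "DT": DecisionTreeClassifier(criterion="entropy", splitter="random",
                                 min_samples_split=0.4),
    "RF": RandomForestClassifier(n_estimators=50, criterion="gini",
                                 min_samples_split=5, min_samples_leaf=2,
                                 max_features="sqrt"),
    "SVM": SVC(kernel="poly", degree=2, gamma="auto", C=0.1),
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 32, 32), activation="relu",
                         max_iter=500),
}

# 10-fold stratified cross-validation with the four metrics of Sect. 3.4
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv,
                            scoring=["accuracy", "precision_macro",
                                     "recall_macro", "f1_macro"])
    print(name, scores["test_accuracy"].mean())
```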
3.4 Performance Evaluation We evaluate each of the models using four metrics. Classification accuracy is evaluated to measure the closeness of predictions with the actual values. Precision, which is the fraction of relevant instances among retrieved instances, and recall, which is the fraction of relevant instances that were retrieved, are also calculated since classification accuracy can be sometimes misleading if the number of observations in each class is unequal or
if there are more than two classes in the dataset [12]. F1 score, the harmonic mean of precision and recall, is also calculated to measure the degree of precision-recall tradeoff. In order to get a high F1 score, both precision and recall should be high. Since the dataset is small, we use 10-fold stratified cross-validation [8] for model evaluation as it helps curb the overestimation of model performance.
4 Results and Discussion
Table 2 shows the model classification accuracy used to check the relevance of the feature 'Sex'. Most of the models, except Gaussian naive Bayes and the Multi-layer perceptron, obtain better accuracy after removing the feature 'Sex'; this shows that gender is irrelevant and adversely affects the results. Therefore, we can remove it from our dataset.
Table 2. Accuracy (in %) of models with and without the attribute 'Sex'.
Classifier | Keeping 'Sex' | Removing 'Sex'
Decision tree | 57.14 | 62.50
Gaussian naive Bayes | 70.89 | 69.64
Random forest | 67.85 | 76.07
Logistic regression | 69.28 | 76.43
k-nearest neighbor | 66.96 | 76.42
Support vector machine | 64.10 | 76.07
Multi-layer perceptron | 61.25 | 58.21
Table 3. Accuracy (in %) of models for different ANOVA F-values. Here, DT, GNB, RF, LR, KNN, SVM, and MLP refer to Decision Tree, Gaussian naive Bayes, Random Forest, Logistic Regression, k-nearest neighbor, Support Vector Machine, and Multi-layer Perceptron, respectively.

Classifier | 0.00 | 1.39 | 1.84 | 2.09 | 4.88 | 5.84 | 6.29 | 6.68 | 8.92 | 15.30 | 43.96
DT | 62.50 | 70.71 | 61.60 | 55.53 | 67.67 | 65.50 | 64.28 | 62.50 | 64.11 | 62.50 | 55.53
GNB | 70.89 | 69.64 | 72.32 | 75.00 | 72.14 | 74.64 | 77.32 | 73.57 | 72.32 | 67.14 | 67.14
RF | 67.85 | 76.07 | 70.89 | 77.50 | 74.64 | 74.64 | 76.07 | 73.32 | 72.14 | 96.10 | 57.86
LR | 69.28 | 76.43 | 78.92 | 77.50 | 73.57 | 73.57 | 70.89 | 70.89 | 70.89 | 61.25 | 61.25
KNN | 66.96 | 76.42 | 76.60 | 79.10 | 78.75 | 75.00 | 75.00 | 69.82 | 73.75 | 68.75 | 58.75
SVM | 64.10 | 76.07 | 76.25 | 77.32 | 73.39 | 76.43 | 73.57 | 73.57 | 72.50 | 65.71 | 66.25
MLP | 61.25 | 58.21 | 74.82 | 66.60 | 63.92 | 51.60 | 59.82 | 54.46 | 63.92 | 59.96 | 58.92
Table 3 shows the classification accuracy after removing features for different ANOVA F-values, such that all features having an ANOVA F-value less than or equal to the value mentioned in the column are removed. A feature with a lower ANOVA F-value provides less information towards the result. ANOVA F-values of 1.84 and 2.09 both obtain good results for different models. However, we decided to take the lower ANOVA F-value so as not to wrongly remove a feature that could have been beneficial for prediction; if the feature is irrelevant, it will be removed in the experiment with Kendall's τ coefficient. Therefore, we only remove the features 'GGT' and 'AST' from our dataset and keep the feature 'CREA', with an ANOVA F-value of 2.09. Table 4 shows the classification accuracy for the absolute value of Kendall's τ coefficient in an experiment similar to the ANOVA F-value one. As evident from the table, Kendall's τ value of 0.191 obtains the best accuracy for all the models. Based on this result, we conclude that the value 0.191 yields the optimal performance, and the features 'CREA' and 'PROT' should be removed. The feature 'CREA', which was kept during the ANOVA F-value evaluation, is removed here.
Table 4. Accuracy (in %) of models for different values of Kendall's τ coefficient. Here, DT, GNB, RF, LR, KNN, SVM, and MLP refer to Decision Tree, Gaussian naive Bayes, Random Forest, Logistic Regression, k-nearest neighbor, Support Vector Machine, and Multi-layer Perceptron, respectively.
Classifier | 0.000 | 0.003 | 0.191 | 0.217 | 0.300 | 0.350 | 0.377 | 0.410 | 0.580
DT | 61.60 | 55.53 | 74.82 | 65.00 | 64.28 | 62.50 | 61.96 | 62.50 | 40.17
GNB | 72.32 | 75.00 | 78.75 | 74.64 | 77.32 | 73.39 | 63.92 | 67.14 | 62.14
RF | 70.19 | 77.50 | 80.00 | 74.64 | 76.07 | 73.39 | 65.00 | 69.10 | 62.32
LR | 78.92 | 77.50 | 80.35 | 73.57 | 70.89 | 70.89 | 61.25 | 61.25 | 65.50
KNN | 76.60 | 79.10 | 81.60 | 75.00 | 75.00 | 74.82 | 66.42 | 68.75 | 65.35
SVM | 76.25 | 77.32 | 81.78 | 76.42 | 73.57 | 73.39 | 68.21 | 65.71 | 61.42
MLP | 74.82 | 66.67 | 86.78 | 51.60 | 59.82 | 70.71 | 60.53 | 59.96 | 62.68
A final evaluation is done (Table 5) to compare the model performance before and after removing the features found to be irrelevant. We observe that, with our filter-based statistical feature selection, all the models achieve better scores on all the metrics, including the recall score, which is particularly important from a medical standpoint since it reduces false negatives.
Table 5. Comparison of performance after following the data-centric approach and without following it. Here, DT, GNB, RF, LR, KNN, SVM, and MLP refer to Decision Tree, Gaussian naive Bayes, Random Forest, Logistic Regression, k-nearest neighbor, Support Vector Machine, and Multi-layer Perceptron, respectively. Similarly, Acc, Pre, Rec, and F are accuracy, precision, recall and F1 score, respectively.

Classifier | Acc (without) | Pre (without) | Rec (without) | F (without) | Acc (with) | Pre (with) | Rec (with) | F (with)
DT | 70.89 | 0.73 | 0.70 | 0.69 | 78.75 | 0.84 | 0.78 | 0.77
GNB | 57.14 | 0.39 | 0.51 | 0.43 | 74.82 | 0.78 | 0.75 | 0.74
RF | 67.86 | 0.68 | 0.67 | 0.64 | 80.00 | 0.83 | 0.79 | 0.78
LR | 69.29 | 0.68 | 0.66 | 0.64 | 80.35 | 0.81 | 0.78 | 0.78
KNN | 66.96 | 0.74 | 0.68 | 0.65 | 81.61 | 0.86 | 0.82 | 0.81
SVM | 64.11 | 0.55 | 0.61 | 0.55 | 81.78 | 0.85 | 0.81 | 0.79
MLP | 61.25 | 0.58 | 0.59 | 0.56 | 86.79 | 0.86 | 0.86 | 0.84
5 Conclusion
Our data-centric approach obtained better results with all seven standard machine learning algorithms. Thus, it can be concluded that we can achieve good results, even on small datasets like the one we used, without increasing the complexity of the machine learning algorithms. We achieved the best accuracy of 86.79% for the Multi-layer Perceptron, considerably better than the best accuracy of 70.89% achieved by the models not using the approach. We can also conclude that algorithmically focusing on the features Age, Albumin, Alkaline Phosphatase, Alanine Transaminase, Bilirubin, Cholinesterase, and Cholesterol is better for predicting the severity of HCV-positive patients. Further in-depth research can provide a better picture of why the features Sex, Protein, Creatinine, Gamma-glutamyl transferase, and Aspartate aminotransferase are not very useful for the prediction task. We used classic machine learning algorithms that are known to be reliable; future research can aim to test this approach using an ensemble of machine learning algorithms.
References 1. Akella, A., Akella, S.: Applying machine learning to evaluate for fibrosis in chronic hepatitis C (2020). https://doi.org/10.1101/2020.11.02.20224840 2. Akyol, K., Gultepe, Y.: A study on liver disease diagnosis based on assessing the importance of attributes. Int. J. Intell. Syst. Appl. 9(11), 1–9 (2017) 3. Brownlee, J.: How to calculate correlation between variables in python. Machine Learning Mastery. https://machinelearningmastery.com/how-to-use-correlation-to-understand-therelationship-between-variables/. Updated 20 Aug 2020 4. Brownlee, J.: How to choose a feature selection method for machine learning. Machine Learning Mastery. https://machinelearningmastery.com/feature-selection-with-real-and-cat egorical-data/. Updated 20 Aug 2020
5. Brownlee, J.: Train-test split for evaluating machine learning algorithms. Machine Learning Mastery. https://machinelearningmastery.com/train-test-split-for-evaluating-machine-lea rning-algorithms/. Updated 26 Aug 2020 6. Chen, S., Morgan, T.: The natural history of hepatitis C virus (HCV) infection. Int. J. Med. Sci. 3, 47–52 (2006) 7. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995) 8. Diamantidis, N., Karlis, D., Giakoumakis, E.: Unsupervised stratification of crossvalidation for accuracy estimation. Artif. Intell. 116(1), 1–16 (2000) 9. Fix, E., Hodges, J.L.: Discriminatory analysis. Nonparametric discrimination: consistency properties. Int. Statist. Rev./Rev. Int. Statist. 57(3), 238 (1989) 10. Getchell, J., et al.: Testing for HCV infection: an update of guidance for clinicians and laboratorians identifying current HCV infections. MMWR Morb. Mortal. Wkly Rep. 62, 362–365 (2013) 11. Hajarizadeh, B., Grebely, J., Dore, G.: Epidemiology and natural history of HCV infection. Nat. Rev. Gastroenterol. Hepatol. 10(9), 553–562 (2013) 12. Harell, F.: Damage caused by classification accuracy and other discontinuous improper accuracy scoring rules. Statistical Thinking. https://www.fharrell.com/post/class-damage/. Updated 15 Nov 2020 13. Hashem, S., et al.: Comparison of machine learning approaches for prediction of advanced liver fibrosis in chronic hepatitis C patients. IEEE/ACM Trans. Comput. Biol. Bioinform. 15(3), 861–868 (2018) 14. Hoffmann, G., Bietenbeck, A., Lichtinghagen, R., Klawonn, F.: Using machine learning techniques to generate laboratory diagnostic pathways – a case study. J. Lab. Precis. Med. 3(6) (2018). https://jlpm.amegroups.com/article/view/4401 15. Kuhn, M., Johnson, K.: Feature Engineering and Selection: A Practical Approach for Predictive Models (2019) 16. Li, N., et al.: Machine learning assessment for severity of liver fibrosis for chronic HBV based on physical layer with serum markers. IEEE Access 7, 124351–124365 (2019) 17. Lichtinghagen, R., Klawonn, F., Hoffmann, G.: HCV Data Set (2020). https://archive.ics.uci. edu/ml/datasets/HCV+data 18. Lichtinghagen, R., Pietsch, D., Bantel, H., Manns, M.P., Brand, K., Bahr, M.J.: The enhanced liver fibrosis (ELF) score: normal values, influence factors and proposed cut-off values. J. Hepatol. 59(2), 236–242 (2013) 19. Omohundro, S.M.: Five balltree construction algorithms. Tech. Rep. (1989) 20. WHO: Hepatitis C. https://www.who.int/news-room/fact-sheets/detail/hepatitis-c. Updated 26 Aug 2020 21. Ozer, I.: Recurrent neural network based methods for hepatitis diagnosis. In: International Symposium of Scientific Research and Innovative Studies (2021) 22. Vishal, R.: Feature selection – correlation and p-value. Towards Data Science. https://tow ardsdatascience.com/feature-selection-correlation-and-p-value-da8921bfb3cf. Accessed 12 Sept 2018 23. Suwardika, G.: Pengelompokan dan klasifikasi pada data hepatitis dengan menggunakan support vector machine (SVM), classification and regression tree (cart) dan regresi logistik biner. J. Educ. Res. Evaluat. 1(3), 183 (2017) 24. Syafa’ah, L., Zulfatman, Z., Pakaya, I., Lestandy, M.: Comparison of machine learning classification methods in hepatitis C virus. J. Onl. Inform. 6(1), 73 (2021) 25. Thrift, A., El-Serag, H., Kanwal, F.: Global epidemiology and burden of HCV infection and HCV-related disease. Nat. Rev. Gastroenterol. Hepatol. 14(2), 122–132 (2017)
26. Trishna, T.I., Emon, S.U., Ema, R.R., Sajal, G.I.H., Kundu, S., Islam, T.: Detection of hepatitis (a, b, c and e) viruses based on random forest, k-nearest and naive Bayes classifier. In: 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–7. IEEE (2019) 27. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B (Statist. Methodol.) 67(2), 301–320 (2005)
Automatic Crack Detection with Calculus of Variations Erika Pellegrino(B) and Tania Stathaki Imperial College London, London, UK [email protected]
Abstract. Nowadays the civil infrastructure is exposed to several challenges, such as daily vehicular traffic and extreme weather conditions, e.g. strong winds and heavy rain. It is well known that these may cause structural deterioration and damage, which can even lead to catastrophic collapses associated with significant socio-economic losses. For this reason it is evident that automatic inspection and maintenance must play a decisive role in the future. With the objective of quality assessment, cracks on concrete buildings have to be identified and monitored continuously. Due to the availability of cheap devices, techniques based on image processing have been gaining popularity, but they require a rigorous analysis of large amounts of data. Moreover, the detection of fractures in images is still a challenging task, since these structures are sensitive to noise and to changes in environmental conditions. This paper proposes an automatic procedure to detect cracks in images, along with a parallel implementation on heterogeneous High Performance Architectures, aiming both at automatizing the whole process and at reducing its execution time. Keywords: Machine vision · Calculus of variations · Smart infrastructure · High performance computing

1 Introduction
Infrastructures can be exposed to different loading conditions: recurrent ones due to vehicular traffic, and extraordinary ones caused by earthquakes, wind, and heavy rain. The induced stresses may cause structural deterioration and damage, which can even lead to catastrophic collapses associated with significant socio-economic losses [8]. Therefore, the question of how to reach or increase a level of automation for the inspection and maintenance of infrastructure remains an active research topic. During the last years, the classical activities commonly conducted by human inspectors through visual quality control for damage assessment have been undergoing a deep renovation thanks to newly available tools coming from information and communication technologies. Indeed, current visual inspection, which relies heavily on an inspector's subjective and empirical knowledge and is prone to false evaluations [9], can be enhanced by robotic/automatic assisted operations [30]. Usually, the actions performed by inspectors require a long time to examine large areas, which may be
difficult to access. Inspection can also be performed with specialized equipment such as large under-bridge units, large trucks, special elevating platforms, or scaffolding on structures. Those solutions are in most cases expensive and cause high logistical efforts and costs. These units can even interfere with the operational conditions of structures and infrastructure. Recent works address the problem of the automation of inspection and maintenance tasks based on robotic systems [10]. Existing automatic or robotic systems based on ground or aerial solutions have been proposed for the inspection of dangerous sites or those difficult to access, but at the present state of the art, human-based procedures have not yet been completely replaced. Examples of ground systems used for inspection are wheeled robots [11] and legged robots [12], but an efficient type of locomotion is the hybrid one, combining the advantages of both types, as discussed in [13,14]. For the inspection of vertical surfaces, wall-climbing robots were developed using magnetic devices [15] or vacuum suction techniques [16], and remote-controlled unmanned aerial vehicles (UAVs), equipped with high-definition photo and video cameras, were used to obtain high-quality data. In particular, UAVs have shown great potential not only in inspection applications [7,17], but also in additive building manufacturing [32]. Although robotic systems for inspection, together with newer measurement techniques, can significantly enhance infrastructure inspections, the development level of these robotic systems is still much lower than in other areas and needs to be improved. Indeed, automated inspection promises to decrease costs and to increase inspection speed, accuracy and safety [20]. The integration of robotics, automation, and information and communication technologies allows the creation of useful tools that help generate very reliable models, which support decision-making processes [6,21]. Most infrastructure and civil structures are made of concrete, steel and masonry, which are prone to cracks due to creep, shrinkage and corrosion of reinforcements. Crack information (e.g., the number of cracks and the crack width and length) provides the current structural health indicators, which can be used for proper maintenance to improve structural safety [22]. Nowadays, damage in buildings and bridges can easily be captured using a commercial digital camera and subsequently analyzed and classified by image processing algorithms, but the detection of fractures is still challenging in image processing [5]. The main reasons are that fractures have a complex topology, a thickness similar to the image resolution, and are easily corrupted by noise. The most used techniques are those based on color detection. In [23] a comparative analysis is proposed among different color spaces to evaluate the performance of color image segmentation using an automatic object image extraction technique. In [24] an RGB-based image processing technique was proposed for rapid and automated crack detection. Even though these techniques allow fast processing and are highly robust to geometric variations of object patterns and viewing directions, they are not suitable for inspections because they are too sensitive to environmental conditions and noise. Recently, algorithms based on Convolutional Neural Networks have shown promising results.
In particular, [26] uses these techniques to
detect concrete cracks without calculating the defect features [27,28]. Furthermore, a Fusion Convolutional Neural Network is proposed and employed in [29] for crack identification in steel box girders containing a complicated disturbing background and handwriting. As stated in [30], these methods are affected by a high incidence of false alarms and need to be combined with pre/post-processing techniques in order to process corrupted images. In this paper we propose an automatic procedure to detect cracks in images that is robust to changes in lighting conditions and to noise corruption, and can easily be applied in combination with machine learning techniques in order to prevent false alarms. Aiming both at automatizing the whole process and at reducing its execution time, a parallel implementation on heterogeneous High Performance Architectures is provided.
2 Mathematical Model and Parallel Implementation
Variational methods have successfully addressed problems such as image segmentation and edge detection. They propose as a solution a minimizer of a global energy. A first example is described by Mumford and Shah (MS) in their famous paper [37], where they proposed a first-order functional whose minimization determines an approximation of the image by means of a piecewise smooth function and detects edges as singularities in the image intensity. However, this model is not suitable for cracks, because they do not represent singularities in the intensity function, but in its gradient instead. For this reason, we propose a second-order variational model based on the Blake-Zisserman (BZ) functional [33]. This was introduced with the aim of overcoming some limitations of the MS approach, such as over-segmentation and the inability to detect gradient discontinuities. Since the original formulation is not suitable for numerical treatment, we had to work with a different approach, based on the approximation proposed by Ambrosio and Tortorelli (AT) for the MS functional [35,36]. In their model, they replaced the unknown discontinuity set by an auxiliary function which smoothly approximates its indicator function. In our case, two auxiliary functions are introduced as indicators of both the intensity discontinuity and the gradient discontinuity sets. The qualifying terms "free discontinuities" and "free gradient discontinuities" mean that the functional is minimized over three variables: two unknown sets K0, K1 with K0 ∪ K1 closed, and u, a smooth function on Ω \ (K0 ∪ K1), as follows:

\[
F(u, K_0, K_1) = \int_{\Omega \setminus (K_0 \cup K_1)} \bigl(|\nabla^2 u|^2 + \Phi(x, u)\bigr)\, dx + \alpha\, \mathcal{H}^{n-1}(K_0 \cap \Omega) + \beta\, \mathcal{H}^{n-1}\bigl((K_1 \setminus K_0) \cap \Omega\bigr) \tag{1}
\]

α and β being two positive parameters. The set K0 represents the set of jump points for u, and K1 \ K0 is the set of crease points of u, those points where u is continuous but ∇u is not. Under certain conditions, the existence of minimizers for the Blake-Zisserman functional is ensured over the space {u : Ω ⊂ R^n → R | u ∈ L^2(Ω), u ∈ GSBV(Ω), ∇u ∈ (GSBV(Ω))^n}, GSBV(Ω) being the space of generalized special functions of bounded variation, see [4].
By properly adapting the techniques of [34], two auxiliary functions s, z : Ω → [0, 1] (aimed at approximating the indicator functions of the discontinuity sets) are introduced into the model, and a Γ-convergence approximation of F is proposed via the following family of uniformly elliptic functionals

\[
F(u, s, z) = \delta \int_{\Omega} \bigl(s^{2} + o\bigr)\, |\nabla u|^{2}\, dx + (\alpha - \beta) \int_{\Omega} z^{2}\, \bigl|\nabla^{2} u\bigr|^{2}\, dx + \xi \int_{\Omega} \Bigl(|\nabla s|^{2} + \tfrac{1}{4}(s - 1)^{2}\Bigr)\, dx + \beta \int_{\Omega} \Bigl(|\nabla z|^{2} + \tfrac{1}{4}(z - 1)^{2}\Bigr)\, dx + \mu \int_{\Omega} |u - g|^{2}\, dx
\]

where (s, z, u) ∈ [W^{1,2}(Ω, [0, 1])]^2 × W^{2,2}(Ω) = D(Ω). As the numerical minimization algorithm we chose an "inexact" block-coordinate descent scheme (BCD) in order to address the heterogeneous hardware environment. Although the model is global, several numerical experiments have highlighted that the solutions depend weakly on boundary conditions and are energetically close to the initial data. This motivates the adoption of a tiling scheme to address very large images: the minimizer is assembled by merging together local solutions restricted to portions of the image. Regarding the implementation, initial results pointed out the need for an approach that increases data locality: this can be achieved by partitioning data and variables and considering independent subproblems. In this approach the data dimensionality decreases and the variables are more likely to fit in the hardware cache, thus reducing the impact of extensive memory accesses. A tiling technique is exploited in order to generate a number of independent tasks that can be solved concurrently. Due to the iterative nature of the inner BCD solver, different running times are expected for the solution of the subproblems; to overcome this disadvantage we adopted a manager/workers pattern that ensures run-time distribution of independent tasks among POSIX threads. A number of computational threads (workers) is initialized and set to wait on a shared task queue, while a monitor thread (master) is responsible for extracting the initial data and for collecting the computed solution of each subproblem. Mutex-protected queues collect both task inputs and output results; as a consequence, two different queues are present in the implementation:
– a job queue: a single manager is the producer of the queue elements, while all workers are consumers;
– a results queue: in this case each worker fills the queue with the results of its assigned subproblems, while the manager is responsible for inserting them into the overall segmentation variables (u, s, z).
Both cases can be handled by the same implementation, which provides:
– a thread-safe interface for insert/remove operations;
– a signaling mechanism for the communication of available resources.
We provide a simple C++ class that stores resources in a private std::queue variable, while exposing only two methods, push and pop, for resource insertion and removal, respectively. This implementation can be used in conjunction with POSIX threads, since additional private members are present:
– a mutex variable of type pthread_mutex_t, used as a safeguard for the shared resource;
– a condition variable of type pthread_cond_t, associated with the previously mentioned mutex, for signaling procedures.
This implementation choice allows for mutually exclusive access to the internal queue in a multi-threading environment. Moreover, through the adoption of a condition variable, producer threads can communicate information about the state of shared data: for example, to signal that a queue is no longer empty. In order to provide a reliable queue implementation even in the presence of exceptions, the RAII (Resource Acquisition Is Initialization) programming idiom is adopted when locking/unlocking operations are executed on a mutex. The job queue is used to communicate both commands and data from the master to the workers: in this implementation, only two basic job types are used. The first job type contains a complete description of one of the tasks (references to the subproblem local data, objective function parameters and algorithm parameters). A second type of job is used by the master thread in order to ensure the clean termination of the worker threads. Each worker thread is structured as a while loop: as long as the thread can pick a subproblem description, it solves it and puts the results on the results queue; when a termination job is picked, the thread exits.
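The implementation itself is written in C++ with POSIX threads, as described above. Purely as an illustration of the manager/workers pattern and of the two thread-safe queues, the sketch below shows an analogous structure in Python, whose queue.Queue already bundles the mutex and condition-variable behaviour; the tile contents and the per-tile solver are placeholders, not the authors' code.

```python
import queue
import threading

job_q, result_q = queue.Queue(), queue.Queue()   # thread-safe queues
STOP = object()                                  # termination "job"

def solve_tile(tile):
    # Placeholder for the block-coordinate descent minimisation on one tile
    return tile

def worker():
    # Each worker loops: pick a tile sub-problem, solve it, post the result
    while True:
        job = job_q.get()
        if job is STOP:
            break
        tile_id, tile_data = job
        result_q.put((tile_id, solve_tile(tile_data)))

def manager(tiles, n_workers=4):
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for tile_id, tile in enumerate(tiles):      # manager produces jobs
        job_q.put((tile_id, tile))
    for _ in threads:                           # one termination job per worker
        job_q.put(STOP)
    results = [result_q.get() for _ in tiles]   # collect and later merge tiles
    for t in threads:
        t.join()
    return dict(results)
```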
3 Numerical Results
We tried our method on images of cracks taken in tunnels in Greece and on back-scattered electron images of concrete samples. In the former case the challenge was to reconstruct the whole structure while avoiding the effects of noise and of the environmental conditions (i.e. lighting). In the latter, the aim was to detect the structure despite the complex texture of the background. In both cases the structures were detected correctly (Figs. 1 and 2). In order to reduce the execution time and to provide an automatic procedure, we compared a sequential implementation with a parallel one based on the OpenMP framework, which implements two strategies for collaboratively executing a program on an environment composed of devices of different types (heterogeneous architectures). The experiments were performed both on a commodity PC and on a High Performance Computing cluster. The sequential version was executed on a workstation equipped with an Intel(R) Xeon CPU E6-79 processor at 3.40 GHz with 32 GB of RAM and a total of 12 cores, running the Ubuntu 18.04 operating system. The parallel version was executed on a heterogeneous cluster equipped with x86-64 processors, running the CentOS 7.6 operating system. Overall, the results show a significant reduction in execution time with respect to the sequential algorithm (Table 1).
Fig. 1. Crack on a concrete wall
Fig. 2. Microcracks on a BSE microscopy of concrete materials
Table 1. Run time comparisons for a single image

Algorithm | Time (s)
Sequential | 13.184868
Parallel 24 cores | 1.056701
Parallel 48 cores | 0.5915374
4 Conclusions
In this paper we proposed a variational method to detect cracks in images and a parallel implementation targeting modern High Performance Computing architectures, with the aim of automatizing the detection process and of reducing its execution time. We obtained promising results both in the quality of the reconstruction and in the reduction of the execution time. We also plan to reduce the execution time further by testing our software on a domain-specific hardware accelerator such as GRAPHCORE [2,3] or on a commodity single-board computer cluster [1]. Moreover, it would be interesting to test its robustness by applying it to different domains such as medical images of blood vessels. Acknowledgment. We would like to thank the National Technical University of Athens for having provided images of cracks in the tunnels of Egnatia Motorway in Metsovo and the Concrete Durability Laboratory at Imperial College London for the back-scattered electron images.
References 1. Johnston, S.J., et al.: Commodity single board computer clusters and their applications. Future Gener. Comput. Syst. 89, 201–212 (2018) 2. Louw, T., McIntosh-Smith, S.: Using the Graphcore IPU for traditional HPC applications. No. 4896. EasyChair (2021) 3. Ortiz, J., et al.: Bundle adjustment on a graph processor. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) 4. Carriero, M., Leaci, A., Tomarelli, F.: A survey on the Blake-Zisserman functional. Milan J. Math. 83(2), 397–420 (2015) 5. Stent, S., et al.: Detecting change for multi-view, long-term surface inspection. In: BMVC (2015) 6. Sacks, R., et al.: Construction with digital twin information systems. Data-Centric Eng. 1 (2020) 7. Ruggiero, F., Lippiello, V., Ollero, A.: Aerial manipulation: a literature review. IEEE Robot. Autom. Lett. 3(3), 1957–1964 (2018) 8. Phares, B.M., et al.: Reliability of visual bridge inspection. Public Roads 64(5), 22–29 (2001) 9. Kim, H., Sim, S.-H., Cho, S.: Unmanned aerial vehicle (UAV)-powered concrete crack detection based on digital image processing. In: 6th International Conference on Advances in Experimental Structural Engineering, 11th International Workshop on Advanced Smart Materials and Smart Structures Technology (2015) 10. Lim, R.S., La, H.M., Sheng, W.: A robotic crack inspection and mapping system for bridge deck maintenance. IEEE Trans. Autom. Sci. Eng. 11(2), 367–378 (2014) 11. Hirose, S., Tsutsumitake, H.: Disk rover: a wall-climbing robot using permanent. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 3. IEEE (1992) 12. Figliolini, G., Rea, P., Conte, M.: Mechanical design of a novel biped climbing and walking robot. In: Parenti Castelli, V., Schiehlen, W. (eds.) ROMANSY 18 Robot Design, Dynamics and Control. CICMS, vol. 524, pp. 199–206. Springer, Vienna (2010). https://doi.org/10.1007/978-3-7091-0277-0 23
13. Ottaviano, E., Rea, P., Castelli, G.: THROO: a tracked hybrid rover to overpass obstacles. Adv. Robot. 28(10), 683–694 (2014) 14. Ottaviano, E., Rea, P.: Design and operation of a 2-DOF leg-wheel hybrid robot. Robotica 31(8), 1319–1325 (2013) 15. Guo, L., Rogers, K., Kirkham, R.: A climbing robot with continuous motion. In: Proceedings of the 1994 IEEE International Conference on Robotics and Automation. IEEE (1994) 16. Savall, J., Avello, A., Briones, L.: Two compact robots for remote inspection of hazardous areas in nuclear power plants. In: Proceedings of IEEE International Conference on Robotics and Automation, Detroit, Michigan (1999) 17. Hallermann, N., Morgenthal, G.: Visual inspection strategies for large bridges using Unmanned Aerial Vehicles (UAV). In: Proceedings of 7th IABMAS, International Conference on Bridge Maintenance, Safety and Management (2014) 18. Ottaviano, E., et al.: Design improvements and control of a hybrid walking robot. Robot. Auton. Syst. 59(2), 128–141 (2011) 19. Rea, P., Ottaviano, E.: Design and development of an inspection robotic system for indoor applications. Robot. Comput. Integr. Manuf. 49, 143–151 (2018) 20. Kang, D., Cha, Y.-J.: Autonomous UAVs for structural health monitoring using deep learning and an ultrasonic beacon system with geo-tagging. Comput.-Aided Civil Infrastruct. Eng. 33(10), 885–902 (2018) 21. Gattulli, V., Ottaviano, E., Pelliccio, A.: Mechatronics in the process of cultural heritage and civil infrastructure management. In: Ottaviano, E., Pelliccio, A., Gattulli, V. (eds.) Mechatronics for Cultural Heritage and Civil Engineering. ISCASE, vol. 92, pp. 1–31. Springer, Cham (2018). https://doi.org/10.1007/9783-319-68646-2 1 22. Liu, Y., et al.: Automated assessment of cracks on concrete surfaces using adaptive digital image processing. Smart Struct. Syst. 14(4), 719–741 (2014) 23. Khattab, D., et al.: Color image segmentation based on different color space models using automatic GrabCut. Sci. World J. 2014 (2014) 24. Jung, H.K., Lee, C.W., Park, G.: Fast and non-invasive surface crack detection of press panels using image processing. Procedia Eng. 188, 72–79 (2017) 25. Wang, X., et al.: Comparison of different color spaces for image segmentation using graph-cut. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), vol. 1. IEEE (2014) 26. Cha, Y.-J., Choi, W., B¨ uy¨ uk¨ ozt¨ urk, O.: Deep learning-based crack damage detection using convolutional neural networks. Comput.-Aided Civil Infrastruct. Eng. 32(5), 361–378 (2017) 27. Cha, Y.-J., et al.: Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput.-Aided Civil Infrastruct. Eng. 33(9), 731–747 (2018) 28. Lin, Y., Nie, Z., Ma, H.: Structural damage detection with automatic featureextraction through deep learning. Comput.-Aided Civil Infrastruct. Eng. 32(12), 1025–1046 (2017) 29. Xu, Y., et al.: Surface fatigue crack identification in steel box girder of bridges by a deep fusion convolutional neural network based on consumer-grade camera images. Struct. Health Monit. 18(3), 653–674 (2019) 30. Protopapadakis, E., Voulodimos, A., Doulamis, A., Doulamis, N., Stathaki, T.: Automatic crack detection for tunnel inspection using deep learning and heuristic image post-processing. Appl. Intell. 49(7), 2793–2806 (2019). https://doi.org/10. 1007/s10489-018-01396-y
31. Lee, H.X.D., Wong, H.S., Buenfeld, N.R.: Self-sealing of cracks in concrete using superabsorbent polymers. Cem. Concr. Res. 79, 194–208 (2016) 32. Dams, B., et al.: Aerial additive building manufacturing: three-dimensional printing of polymer structures using drones. In: Proceeding of the Institution of Civil Engineers: Construction Materials, pp. 1–12 (2017) 33. Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge (1987) 34. Ambrosio, L., Faina, L., March, R.: Variational approximation of a second order free discontinuity problem in computer vision. SIAM J. Math. Anal. 32(6), 1171– 1197 (2001) 35. Tortorelli, L., Ambrosio, V.M.: Approximation of functionals depending on jumps by elliptic functionals via Gamma-convergence. Commun. Pure Appl. Math. 43, 999–1036 (1990) 36. Ambrosio, L., Tortorelli, V.: On the approximation of free discontinuity problems, pp. 105–123 (1992) 37. Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
Deep Squeeze and Excitation-Densely Connected Convolutional Network with cGAN for Alzheimer’s Disease Early Detection Rahma Kadri1(B) , Mohamed Tmar2 , Bassem Bouaziz2 , and Faiez Gargouri2 1
Faculty of Economics and Management of Sfax - FSEG Sfax, MIRACL Laboratory, University of Sfax Tunisia, Sfax, Tunisia 2 Higher Institute of Computer Science and Multimedia, University of Sfax Tunisia, Sfax, Tunisia Abstract. Personalised medicine is a new approach that ensures a tailored and optimal diagnosis for the patient based on the patient's own data and profile information, such as lifestyle, medical history, genetic data, behaviours, and environment. These data are vital to predict potential disease progression, and extracting insights from such heterogeneous data is a challenging task. The detection and prediction of brain disorders such as neurodegenerative diseases is still an open research challenge, and the early prediction of these diseases is the key to preventing their progression. Deep learning methods have shown outstanding performance in the diagnosis of brain diseases such as Alzheimer's disease (AD). In this paper we present two contributions. Firstly, we adopt a conditional generative adversarial network for data augmentation, based on two datasets, the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS), as a solution to the shortage of health data. Additionally, we present a new method for the early detection of Alzheimer's disease based on the combination of the DenseNet 201 network and the Squeeze and Excitation network. Furthermore, we compare our approach with the traditional DenseNet 201, the Squeeze and Excitation network and other networks, and find that it yields the best results, reaching an accuracy of 98%. Keywords: Deep learning · Alzheimer disease early detection · Deep Squeeze and Excitation-Densely Connected Convolutional Network · cGAN · Data augmentation
1
Introduction
Alzheimer's disease is an irreversible neurodegenerative disease that destroys brain cells and is the main cause of brain degeneration. This progressive disorder is the leading cause of dementia. Over time, the person becomes unable to carry out daily tasks such as walking and communicating. Preventing the progression of Alzheimer's disease is the only strategy and
solution to decrease its risk. Scientists collect various types of data to understand this disease, such as medical history, lifestyle, genetic data, and demographic information about the person. Further, they adopt various validated biomarkers for Alzheimer's disease diagnosis, including brain imaging modalities and cerebrospinal fluid (CSF). Neuroimaging techniques such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) are the most widely used to non-invasively capture and cover the brain structure and function. Extracting knowledge from these images is challenging due to the complexity and the volume of neuroimaging data. Hence, there is a growing need for an effective computer-aided diagnosis of Alzheimer's disease. The emergence of sophisticated methods to analyze brain imaging data based on different machine learning techniques has encouraged scientists and led to impressive results for Alzheimer's detection and prediction. Deep learning networks such as the convolutional neural network have shown considerable results within this context based on MRI data. However, this network has some drawbacks, because it requires a huge amount of data for effective training. In addition, deep networks such as VGG16, AlexNet and the Inception model can suffer from overfitting and the vanishing gradient problem. Recent architectures such as DenseNet and residual networks deal with these problems and exploit the depth of the network to improve the ability of the CNN to extract complex features. Despite this significant improvement, these networks cannot improve the feature representation of the CNN and the channel information. In this study, we address these issues with the following contributions: – We propose a data augmentation method based on the conditional generative adversarial network to mitigate the imbalanced dataset and the lack of data, and to improve the training process. – We present a Deep Squeeze and Excitation-Densely Connected Convolutional Network that combines the Densely Connected Convolutional Network and the squeeze-and-excitation network for the early detection of Alzheimer's disease. – We compare our proposed model with different CNN models.
2
Related Work
Convolutional neural networks are widely adopted for brain disease detection and prediction [3,4,7]. In this section, we investigate the performance of different CNN models within this context. For example, [2] adopted a deep convolutional neural network for AD detection using MRI data from the Kaggle dataset. They also used transfer learning based on VGG16 and VGG19 and reported that VGG19 yields the best results; however, this network requires a huge amount of data and leads to high computational complexity. [12] investigated the use of attention mechanisms within VGG16 for AD detection in order to decrease the computational complexity. They integrated convolutional block attention modules within the VGG network. The VGG16 with the attention mechanism yields better performance than the traditional VGG network, and this approach achieves 97% accuracy on the ADNI dataset. Deep networks are prone to a training issue called the vanishing gradient problem. Residual networks and
DenseNet networks are among the most effective solutions to this problem thanks to their short connections. [10] proposed a three-dimensional DenseNet network for early Alzheimer detection based on magnetic resonance imaging from the ADNI dataset. The main advantage of this architecture is that it requires few parameters; however, they only achieved an 88% accuracy. [15] improved the architecture of the DenseNet network with a connection-wise attention mechanism for the prediction of the conversion from mild cognitive impairment (MCI) to the AD stage. The main advantage here is that this model can extract compact high-level features from the MRI images, and they reached a good accuracy. [5] proposed a 2D CNN for the 3-class AD classification problem using T1-weighted MRI from the ADNI dataset. The model uses ResNet34 as feature extractor and then trains a classifier using 64 × 64 patches from coronal 2D MRI slices. In this work, the authors used residual blocks to make the network deeper; however, they achieved an accuracy of only 68%. [14] used the same dataset and improved the ResNet network by adding a selective kernel network (SKNet) with a channel shuffle within its architecture. Through this improvement, the authors enhanced the feature detection capability of the network and achieved a good classification accuracy. [11] also proposed an improved residual network based on spatial transformer networks (STN) and the non-local attention mechanism for Alzheimer prediction, again using the ADNI dataset. They replaced the ReLU function of ResNet50 with the new Mish activation function. The main advantage of this model is that it resolves the problem of local information loss in traditional CNN models; it achieved a 97.1% accuracy. [6] also modified the structure of Inception v3 for Alzheimer's disease stage classification on the ADNI dataset; the enhanced Inception v3 reaches an accuracy of 85.7%. [1] deployed a patch-based approach for AD detection using an ensemble classification with two CNNs on the MRI modality from the ADNI dataset, one network for the left and one for the right hippocampus region. The patch-based approach solved the overfitting problem and ensured a good accuracy; however, the selection of the patches can affect the classification accuracy. Various studies are based on hybrid methods that combine the CNN with other networks to overcome the main problems of the CNN related to feature extraction. [9] proposed a cascade method that combines a 3D CNN and a 3D convolutional LSTM using the structural MRI modality from the ADNI dataset for AD classification and detection. Here the authors enhanced the capability of the CNN to learn high-level features with the LSTM network: the 3D CNN is used to extract the key features from the image, and the 3D LSTM is then adopted to learn the channel-wise higher-level features. This method achieved an accuracy of 94.19%; however, it is slow to train and it does not address the lack of an adaptive channel-weighting mechanism. Many other studies [8,13] combined CNN networks with different networks to improve feature extraction and disease classification. The lack of large annotated neuroimaging datasets and the imbalance of datasets are among the most challenging issues in Alzheimer's disease detection using deep learning. Furthermore, current CNN-based studies cannot improve channel interdependencies at an optimal computational cost. Hence, we propose a method that deals with these issues.
3
Methods
In this section we explain the proposed contributions. The first contribution is the data augmentation using the cGAN; the second is the early detection of Alzheimer's disease using the Deep Squeeze and Excitation-Densely Connected Convolutional Network. One of the most widely adopted methods to mitigate the effects of imbalanced data and the lack of data is data augmentation. Traditional data augmentation methods are based on geometric deformation, translation, flipping, rotation and color augmentation of the original image to create new images; however, these methods are not suitable for medical data generation. The generative adversarial network (GAN) is an innovative method for synthetic data creation and augmentation. A GAN consists of two sub-networks, the Generator (G) and the Discriminator (D): the generator network generates new samples and the discriminator network classifies the output of the generator into real or fake images. However, a GAN is not able to generate data with a target label, and the dataset generated from it lacks diversity. A smart extension of the GAN, the conditional GAN, addresses this issue. The cGAN is a subtype of GAN that adds a label y as a parameter to the input of the generator and tries to generate the corresponding data. It also adds labels to the discriminator input to better distinguish real data.
3.1 Data Augmentation Using cGAN
The network takes as input a noise vector z together with a target class label y in order to create realistic images.
Fig. 1. Conditional generative adversarial network for data augmentation
The generator tries to generate a fake image as similar as possible to a real image for the given label. The objective function of the cGAN is defined as:

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\big[\log D(x,y)\big] + \mathbb{E}_{z\sim p_{z}(z)}\big[\log\big(1-D(G(z,y),y)\big)\big]$$
The proposed architecture consists of two networks. The first network is the generator, a deconvolutional neural network composed of an input layer that receives a random noise input, a dense layer, a reshape layer, an embedding layer, a concatenation layer and three image up-sampling blocks. Each block involves a transposed convolutional layer, a batch normalization layer and a LeakyReLU layer, as depicted in Fig. 1. We used the Leaky ReLU instead of the ReLU activation function to avoid the dying ReLU problem. Firstly, the generator converts the noise input into a tensor using the reshape layer. It further converts the additional information y into an embedding vector using the embedding layer. Then the network concatenates the two inputs using the concatenation layer (Fig. 2).
Fig. 2. Generator network architecture
The model then outputs a 128 × 128 × 3 image through a series of transposed convolution layers, each followed by batch normalization and LeakyReLU layers. The output of the generator is fed into the discriminator to be classified as fake or real. The discriminator is composed of an embedding layer, a dense layer, a reshape layer, a concatenation layer and a series of three down-sampling blocks, each composed of a convolutional layer, a batch normalization layer and a Leaky rectified linear unit (LeakyReLU) layer, followed by a flatten layer, a dropout layer and a sigmoid function. This function is used to classify the images as fake or real. The output of the discriminator represents the predicted probability that the given data sample is real. The generator is updated based on the decision of the discriminator. Figure 3 illustrates the results of the proposed cGAN, showing the generated images at epoch 50 and at epoch 500. As shown by the images, the generator at
Fig. 3. Generated images at the epoch 50 and 100
each epoch improves the quality of the generated images according to the decision of the discriminator. In total, the number of epochs is 2000. We used binary cross-entropy as the loss function for both the discriminator and the generator, and the Adam optimizer with a learning rate of 0.001.
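A compact sketch of generator and discriminator builders consistent with the layer sequence described above is given below for illustration (Keras-style). The latent dimension, embedding size, filter counts and kernel sizes are assumptions and are not values reported in this work; only the overall layout (label embedding, concatenation, three up-/down-sampling blocks, 128 × 128 × 3 output, binary cross-entropy and Adam with learning rate 0.001) follows the description above.

```python
# Assumed sizes: 5 disease stages, 100-d noise vector, 50-d label embedding,
# and the listed filter counts; these are illustrative, not the authors' values.
from tensorflow.keras import layers, Model, optimizers

NUM_CLASSES, LATENT_DIM = 5, 100

def build_generator():
    noise = layers.Input(shape=(LATENT_DIM,))
    label = layers.Input(shape=(1,), dtype="int32")
    y = layers.Embedding(NUM_CLASSES, 50)(label)            # label -> embedding vector
    y = layers.Dense(16 * 16)(layers.Flatten()(y))
    y = layers.Reshape((16, 16, 1))(y)
    x = layers.Dense(16 * 16 * 128)(noise)
    x = layers.Reshape((16, 16, 128))(x)
    x = layers.Concatenate()([x, y])                         # condition on the label
    for filters in (128, 64, 32):                            # three up-sampling blocks
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)                         # LeakyReLU avoids dying ReLU
    img = layers.Conv2D(3, 3, padding="same", activation="tanh")(x)  # 128x128x3 output
    return Model([noise, label], img, name="generator")

def build_discriminator():
    img = layers.Input(shape=(128, 128, 3))
    label = layers.Input(shape=(1,), dtype="int32")
    y = layers.Embedding(NUM_CLASSES, 50)(label)
    y = layers.Dense(128 * 128)(layers.Flatten()(y))
    y = layers.Reshape((128, 128, 1))(y)
    x = layers.Concatenate()([img, y])
    for filters in (32, 64, 128):                            # three down-sampling blocks
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.Dropout(0.3)(layers.Flatten()(x))
    out = layers.Dense(1, activation="sigmoid")(x)           # real vs. fake probability
    d = Model([img, label], out, name="discriminator")
    d.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy")                    # loss/optimizer as stated above
    return d
```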
3.2 Deep Squeeze and Excitation-Densely Connected Convolutional Network
Deep convolutional neural networks have shown significant performance in various computer vision tasks. The feature representation and the network depth are key parameters for an effective convergence and generalization of the model. The key idea is to improve this feature representation and detection for effective image recognition and classification. Despite this success, deep convolutional neural networks present some challenges, such as channel interdependencies, overfitting, the vanishing gradient problem and the lack of a content-aware mechanism to weight each channel adaptively. Our proposed model deals with these issues by integrating the Squeeze-and-Excitation (SE) block into the DenseNet 201 network to boost its representational capability. The main objective of this network is to enhance the quality of the feature representations and to ensure an adaptive weighting of the network channels. The network takes an MRI image as input; the image is passed through a convolution layer, a max pooling layer and a series of SEDense blocks, each followed by a transition layer that reduces the number of channels between the SEDense blocks, as illustrated in Fig. 4.
Fig. 4. Layered architecture of the proposed model
The transition layer involves a 1 × 1 convolution layer and an average pooling layer. The output is then passed through a global average pooling layer and a softmax layer to predict the disease stage. The integration of the SE block into the DenseNet network improves the accuracy of the model by enhancing its sensitivity to reliable features and makes it deeper without decreasing the network performance. The network is based on short connections between layers; this architecture makes the network deeper without degrading its performance. Furthermore, the network requires fewer parameters than standard CNN models. For the training strategy, we split the data into training, test and validation sets. We used the Adam optimizer with a learning rate of 0.01 and cross-entropy as the loss function. The total number of epochs is 100. We adopted dropout with a rate of 0.5 to avoid overfitting.
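A minimal PyTorch sketch of a squeeze-and-excitation block and of one possible way to attach it to the dense blocks of a torchvision DenseNet-201 is given below; the reduction ratio, the exact insertion points and the classifier head are illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global pooling -> bottleneck MLP -> channel re-weighting."""
    def __init__(self, channels, reduction=16):              # reduction ratio is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)      # per-channel weights in (0, 1)
        return x * w                                          # adaptive channel weighting

# Output widths of the four dense blocks in a standard DenseNet-201.
SE_CHANNELS = {"denseblock1": 256, "denseblock2": 512,
               "denseblock3": 1792, "denseblock4": 1920}

class SEDenseNet201(nn.Module):
    def __init__(self, num_classes=5):                        # 5 stages (CN/EMCI/MCI/LMCI/AD)
        super().__init__()
        self.features = nn.Sequential()
        for name, module in models.densenet201().features.named_children():
            self.features.add_module(name, module)
            if name in SE_CHANNELS:                           # SE block after each dense block
                self.features.add_module("se_" + name, SEBlock(SE_CHANNELS[name]))
        self.classifier = nn.Linear(1920, num_classes)

    def forward(self, x):
        x = F.relu(self.features(x))
        x = x.mean(dim=(2, 3))                                # global average pooling
        return self.classifier(x)                             # cross-entropy loss applies softmax
```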
Data Selection and Preparation. We selected data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, as illustrated in Table 1.

Table 1. Demographic information

Alzheimer's stage | Age    | Sex                     | Samples
AD                | 65–95  | Female (F) and Male (M) | 490
CN                | 65–95  | (F) and (M)             | 600
MCI               | 40–100 | (F) and (M)             | 480
EMCI              | 45–90  | (F) and (M)             | 630
LMCI              | 40–100 | (F) and (M)             | 570
We used the advanced search functionality of this dataset with the following criteria: we selected 5 stages of Alzheimer's disease (CN, MCI, EMCI, AD, and LMCI), chose only the MRI modality, used the axial plane as the acquisition plane and, for the weighting, selected T2. We also collected data from the OASIS dataset. The pre-processing steps include converting the images from DICOM to JPG format, image cropping, image resizing, conversion to a PyTorch tensor and image normalization.
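A minimal sketch of this preprocessing pipeline (DICOM reading, cropping, resizing, tensor conversion and normalization) using pydicom and torchvision is shown below; the crop size, output resolution, normalization statistics and file name are assumptions for illustration.

```python
import numpy as np
import pydicom
from PIL import Image
from torchvision import transforms

def load_slice(path):
    """Read a DICOM file and rescale its pixel array to an 8-bit RGB image."""
    arr = pydicom.dcmread(path).pixel_array.astype(np.float32)
    arr = (arr - arr.min()) / (arr.max() - arr.min() + 1e-8) * 255.0
    return Image.fromarray(arr.astype(np.uint8)).convert("RGB")

# Crop box, output size and normalisation statistics below are illustrative only.
preprocess = transforms.Compose([
    transforms.CenterCrop(180),                            # image cropping
    transforms.Resize((128, 128)),                         # image resizing
    transforms.ToTensor(),                                 # PyTorch tensor in [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),   # image normalization
])

tensor = preprocess(load_slice("subject_0001.dcm"))       # hypothetical file name
```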
3.3 Results
In this section, we outline the performance of our model. Table 2 shows the performance of different CNN models and of our proposed approach for Alzheimer's early detection using the ADNI and OASIS datasets. Furthermore, we integrated the SE module within the residual blocks to compare it with the Squeeze and Excitation-Densely Connected Convolutional Network. Figure 5(a) outlines the model metrics. We used TensorBoard to visualize the accuracy and the loss per epoch (Fig. 5(b)). As illustrated by the table, our proposed method outperforms the other CNN networks.

Table 2. Comparative table

Network                                                             | Accuracy
Densenet                                                            | 92%
Squeeze-and-Excitation Networks                                     | 87%
VGG16                                                               | 72%
ResNet 50                                                           | 83%
Residual squeeze-and-excitation network                             | 90%
Deep Squeeze and Excitation-Densely Connected Convolutional Network | 98%
Fig. 5. Model metrics
4
Conclusion and Future Work
Convolutional neural networks have made a big leap in Alzheimer's disease detection and prediction. However, their application has many limitations, such as overfitting, the lack of an adaptive channel-weighting mechanism and the vanishing gradient problem. In this paper we addressed these issues. Firstly, we proposed an image generation method based on the cGAN. Furthermore, we exploited the main strengths of DenseNet 201 and of the Squeeze and Excitation network for the early detection of Alzheimer's disease, reaching an accuracy rate of 98%. We compared our proposed method with several CNN models and with the residual squeeze-and-excitation network, and our method outperforms these networks. However, our approach is based only on the MRI modality. In future work, we will adopt a multi-modality approach for a more effective disease diagnosis. This method could also be adopted for other brain disorders such as Autism Spectrum Disorders.
References 1. Ahmed, S., et al.: Ensembles of patch-based classifiers for diagnosis of Alzheimer diseases 7, 73373–73383 (2019). https://doi.org/10.1109/access.2019.2920011 2. Ajagbe, S.A., Amuda, K.A., Oladipupo, M.A., AFE, O.F., Okesola, K.I.: Multiclassification of Alzheimer disease on magnetic resonance images (MRI) using deep convolutional neural network (DCNN) approaches 11(53), 51–60 (2021). https:// doi.org/10.19101/ijacr.2021.1152001 3. Al-Khuzaie, F.E.K., Bayat, O., Duru, A.D.: Diagnosis of Alzheimer disease using 2d MRI slices by convolutional neural network. Appl. Bionics Biomech. 2021, 1–9 (2021). https://doi.org/10.1155/2021/6690539 4. Alshammari, M., Mezher, M.: A modified convolutional neural networks for MRIbased images for detection and stage classification of Alzheimer disease. IEEE (2021). https://doi.org/10.1109/nccc49330.2021.9428810 5. de Carvalho Pereira, M.E., Fantini, I., Lotufo, R.A., Rittner, L.: An extended2D CNN for multiclass Alzheimer’s disease diagnosis through structural MRI. In: Hahn, H.K., Mazurowski, M.A. (eds.) Medical Imaging 2020: Computer-Aided Diagnosis. SPIE, March 2020. https://doi.org/10.1117/12.2550753 6. Cui, Z., Gao, Z., Leng, J., Zhang, T., Quan, P., Zhao, W.: Alzheimer’s disease diagnosis using enhanced inception network based on brain magnetic resonance image. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, November 2019 7. Kaur, S., Gupta, S., Singh, S., Gupta, I.: Detection of Alzheimer’s disease using deep convolutional neural network. Int. J. Image Graph., 2140012 (2021). https:// doi.org/10.1142/s021946782140012x 8. Li, F., Liu, M.: A hybrid convolutional and recurrent neural network for hippocampus analysis in Alzheimer’s disease 323, 108–118 (2019). https://doi.org/10.1016/ j.jneumeth.2019.05.006 9. Pan, D., Zeng, A., Jia, L., Huang, Y., Frizzell, T., Song, X.: Early detection of Alzheimer’s disease using magnetic resonance imaging: a novel approach combining convolutional neural networks and ensemble learning. Front. Neurosci. 14 (2020). https://doi.org/10.3389/fnins.2020.00259 10. Solano-Rojas, B., Villal´ on-Fonseca, R.: A low-cost three-dimensional DenseNet neural network for Alzheimer’s disease early discovery. Sensors 21(4), 1302 (2021). https://doi.org/10.3390/s21041302 11. Sun, H., Wang, A., Wang, W., Liu, C.: An improved deep residual network prediction model for the early diagnosis of Alzheimer’s disease. Sensors 21(12), 4182 (2021). https://doi.org/10.3390/s21124182 12. Wang, S.H., Zhou, Q., Yang, M., Zhang, Y.D.: ADVIAN: Alzheimer’s disease VGGinspired attention network based on convolutional block attention module and multiple way data augmentation. Front. Aging Neurosci. 13 (2021). https://doi. org/10.3389/fnagi.2021.687456 13. Xia, Z., et al.: A novel end-to-end hybrid network for Alzheimer’s disease detection using 3D CNN and 3D CLSTM. IEEE (2020). https://doi.org/10.1109/isbi45749. 2020.9098621
14. Xu, M., Liu, Z., Wang, Z., Sun, L., Liang, Z.: The diagnosis of Alzheimer’s disease based on enhanced residual neutral network. In: 2019 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). IEEE, October 2019. https://doi.org/10.1109/cyberc.2019.00076 15. Zhang, J., Zheng, B., Gao, A., Feng, X., Liang, D., Long, X.: A 3D densely connected convolution neural network with connection-wise attention mechanism for Alzheimer’s disease classification. Magn. Reson. Imaging 78, 119–126 (2021). https://doi.org/10.1016/j.mri.2021.02.001
Recognition of Person Using ECG Signals Based on Single Heartbeat Sihem Hamza(B) and Yassine Ben Ayed Multimedia InfoRmation systems and Advanced Computing Laboratory, MIRACL, University of Sfax, Sfax, Tunisia
Abstract. In this paper, the performance of an ECG biometric recognition system is evaluated. This research focuses on improving the performance of this identification system, using the necessary phases namely the preprocessing phase, the feature extraction phase, and the classification phase. For the preprocessing phase, the bandpass filter is proposed in this paper to eliminate the noise of the signal and the detection of T peaks is carried out to realize the segmentation. For the feature extraction phase, the combination of different parameters such as Zero Crossing Rate (ZCR), Cepstral Coefficients (CC), and Entropy (E) is proposed in this research. Then, the machine learning techniques such as the K-Nearest Neighbor (K-NN) and the Support Vector Machine (SVM) are presented to classify the subject. The experiment results are conducted on 54 subjects of the Physikalisch-Technische Bundesanstalt (PTB) diagnostic database and the best result is obtained with Radial Basis Function (RBF) kernel of the SVM classifier with an accuracy rate is equal to 95.4%. Keywords: Person recognition · ECG signals · Single heartbeat · Features extraction · K-nearest neighbor · Support vector machine
1
Introduction
Today, many applications use biometric technology [1,2]. Biometrics quickly emerged as the most relevant way to identify and authenticate people reliably and quickly based on their characteristics [3]. Two categories of biometric technologies exist [4]: behavioral measurements (the most widespread are voice recognition, signature dynamics, computer keyboard typing dynamics, gait, gestures, etc.) and physiological measurements [4]. Physiological measurements can be morphological (mainly fingerprints, the shape of the hand, the finger, the venous network, the eye (iris and retina), or the shape of the face) or biological (most commonly DNA, blood, saliva, or urine) [5]. However, the different types of characteristics do not have the same level of reliability [6]. It is believed that physiological measurements have the advantage of being more stable over the life of an individual. Identification is determining the
identity of a person. Recent studies indicate that the ECG signal can be used as a new biometric modality [6]. In recent years, many features have been used for human identification from ECG signals [7–11]. Irvine et al. [7] applied Linear Discriminant Analysis (LDA) to 104 persons from a public database and proposed 15 fiducial points for feature extraction; the result obtained is equal to 90%. Wang et al. [8] proposed the AutoCorrelation and Discrete Cosine Transform (AC/DCT) method applied to 13 subjects on lead I; this method gives an accuracy rate of 84.61%. Dar et al. [9] proposed the Discrete Wavelet Transform (DWT) method to identify the person and applied the K-nearest neighbor classifier to obtain an identification rate of 82.3%. In 2019, Hanilci et al. [10] used AC/DCT and Mel-Frequency Cepstral Coefficient (MFCC) features with a two-dimensional Convolutional Neural Network (2D CNN) model on 42 subjects (lead I) and obtained an accuracy rate of 90.48%. Turky et al. [11] used 200 subjects with a single limb-based lead I from a public database, proposed the Common Spatial Pattern (CSP) method for feature extraction and obtained an accuracy rate of 95.15%. The aim of this paper is to build a person recognition system using physiological signals such as the ECG signal, which is a recent research domain. This work presents three principal stages: the preprocessing stage, the feature extraction stage, and the classification stage. In the first stage, we apply a bandpass filter to the signal to eliminate noise [12]. After filtering, we use the QRS detection algorithm [13] to detect the T peaks of the signal in order to perform the segmentation; each ECG signal is then segmented into heartbeats. In the second stage, we use new sources of information, namely the zero crossing rate, cepstral coefficients, and entropy, and integrate them into one vector for the third step of our system, which is the classification. In this stage, we use the K-nearest neighbor and support vector machine classifiers in order to obtain an efficient system. In the rest of this paper, Sect. 2 introduces the material and the proposed methodology, Sect. 3 concentrates on the experimental results and discussion, and Sect. 4 concludes with a short conclusion and future work.
2
Material and Methodology
2.1
Architecture of Our Proposed System
For human biometric identification systems based on ElectroCardioGram (ECG) signals, four steps are always worked on:
1. ECG signal databases,
2. ECG signal pre-processing,
3. Feature extraction,
4. Classification.
Fig. 1 presents the architecture proposed in this paper for human identification using the ECG signal.
Fig. 1. Architecture of our proposed system.
ECG Signal Databases. The ECG is a signal that can be measured on the skin and indicates the electrical activity of the heart [14]. To evaluate our proposed combination of features we use a public data set, the Physikalisch-Technische Bundesanstalt (PTB) Diagnostic database. This database is available on PhysioNet1. It was chosen because it includes more than two recordings for some of its subjects. In this work, all
https://physionet.org/content/ptbdb/1.0.0/.
recordings of this database are captured on lead I with a sampling frequency of 1000 Hz. In this study, we chose 54 subjects of the PTB Diagnostic database so that we could compare our work with another work that used the same database [10].

ECG Signal Pre-processing. For pre-processing, a bandpass filter with cut-off frequencies of 2 Hz and 50 Hz is applied to each raw ECG signal [14], since ECG signals are corrupted by power-line noise (frequencies above 50 Hz), baseline wander (frequencies below 2 Hz), etc. After the preprocessing step, we detect all the T peaks of each raw ECG signal in order to perform the segmentation: each ECG signal is segmented into heartbeats based on the T peaks detected with the QRS detection algorithm [13]. After that, we normalize all the segments so that they have the same size. In the next section, we present the characteristics extracted from each segment, which are then combined into one descriptor.

Features Extraction. Feature extraction is an important step after preprocessing to perform the classification and identify the subject. In our study, we propose to use new characteristics to identify the subject: the zero crossing rate, 12 cepstral coefficients, and the entropy, which are integrated into one vector. In other words, each ECG signal has one vector that is specific to it.

• Zero Crossing Rate (ZCR)
The zero crossing rate, which detects the number of zero crossings, is used in many works in the signal processing field [15]. The parameter ZCR is defined by the following equation:

$$ZCR = \frac{1}{N-1}\sum_{n=1}^{N-1}\operatorname{sign}\big(s(n)\,s(n-1)\big) \qquad (1)$$
• Cepstral Coefficients (CC)
The ECG is a signal related to the human body, for which we need a transformation of the signal, namely the Fourier Transform (FT). Cepstral coefficients are widely used in most signal processing applications, such as speech [16]. The cepstral coefficients are given by the Inverse Fast Fourier Transform (IFFT) applied to the logarithm of the Fast Fourier Transform (FFT) modulus of the ECG signal [14] (see Fig. 2).
• Entropy (E)
This characteristic is employed in several fields of data processing. The entropy is a measure of the amount of information in the realizations of a random variable [17], and is determined by the following equation:
Fig. 2. Steps to calculate the cepstral coefficients.
$$H(x) = -\sum_{k} P(x_k)\,\log_{2}\big[P(x_k)\big] \qquad (2)$$
with x = {x_k}, 0 ≤ k ≤ N − 1, where P(x_k) is the probability of x_k. In our study, we concatenate these features (ZCR, CC, and E) into one vector. Then, we apply the K-Nearest Neighbor (K-NN) and the Support Vector Machine (SVM) classifiers in order to identify the person. In the next section, we introduce the classification methods applied to the fusion of the extracted characteristics.

Classification. In this section, we introduce the machine learning techniques used for classification in order to identify the person. The techniques applied to the combination of features are K-NN and SVM. We use machine learning methods because they are employed in a variety of applications (bioinformatics, information retrieval, computer vision, etc.) and give good results.

• K-Nearest Neighbor (K-NN)
The K nearest neighbors method is a supervised classification approach, often used in the context of machine learning [18]. It aims to classify target points according to their distances from the points constituting a learning sample. In our study, we use the Euclidean distance because it is the method most used in the literature [8].

• Support Vector Machine (SVM)
The SVM is a family of machine learning algorithms that solve both classification and regression problems [19]. The principle of SVMs is simple: this technique makes it possible to separate the data into classes using a frontier, such that the margin between the different groups of data and the frontier that separates them is maximal [6]. The linear, polynomial and Radial Basis Function (RBF) kernel functions are very well covered in the literature [14]. These functions separate the data by projecting them into a feature space. The hyperparameters of these kernel functions are γ, r, and d, which are, respectively, a kernel flexibility control parameter, a weighting parameter, and the degree of the polynomial [14].
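A compact sketch of the full pipeline of Sect. 2.1, from the per-heartbeat descriptor (ZCR, 12 cepstral coefficients and entropy) to an RBF-SVM classifier with a 70/30 split, is given below for illustration. The histogram-based entropy estimate, the omission of the zeroth cepstral coefficient and the helper names are assumptions; the RBF hyperparameters shown are the best values reported in Sect. 3.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def zcr(s):
    """Zero crossing rate of one heartbeat segment, Eq. (1)."""
    return np.mean(np.sign(s[1:] * s[:-1]))

def cepstral(s, n_coeffs=12):
    """Cepstral coefficients: IFFT of the log magnitude spectrum (Fig. 2)."""
    spectrum = np.abs(np.fft.fft(s)) + 1e-12
    ceps = np.real(np.fft.ifft(np.log(spectrum)))
    return ceps[1:n_coeffs + 1]              # skipping c0 is an assumption

def entropy(s, bins=32):                     # histogram estimate; bin count is an assumption
    p, _ = np.histogram(s, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))           # Eq. (2)

def descriptor(beat):
    """Concatenate ZCR, 12 cepstral coefficients and entropy into one 14-d vector."""
    return np.hstack([zcr(beat), cepstral(beat), entropy(beat)])

def identify(beats, labels):
    """beats: fixed-length heartbeat segments, labels: subject identifiers."""
    X = np.vstack([descriptor(b) for b in beats])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, stratify=labels)
    clf = SVC(kernel="rbf", C=1000, gamma=0.0005)   # best RBF values reported in Sect. 3
    clf.fit(X_tr, y_tr)
    return clf.score(X_te, y_te)                    # accuracy = correct / total predictions
```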
2.2 Performance Evaluation
In our work, the metric used to evaluate the performance of classification and identification during testing is the accuracy rate, which is determined by the following formula:

$$\mathrm{Accuracy\ rate\ (Acc)} = \frac{\mathrm{Number\ of\ correct\ predictions}}{\mathrm{Total\ number\ of\ predictions}} \qquad (3)$$

3
Experimental Results and Discussion
In this paper, experiments are carried out on a benchmark database, the PTB Diagnostic database (lead I) with 54 subjects. For each subject, the raw ECG signal goes through the processing stage in order to obtain the filtered samples. Then, we carry out the feature extraction stage, in which we concatenate the different sources of information into one descriptor. To evaluate our proposed methodology (fusion of ZCR, CC, and E), we apply machine learning algorithms such as K-NN and SVMs. In this experiment, we divided each person's data into two parts: 70% for training and 30% for testing. Firstly, we apply the K-nearest neighbor algorithm to our proposed descriptor, varying the value of the parameter K (the number of nearest neighbors) each time. The best accuracy is obtained when K is equal to 2, with an accuracy rate of 86.1%. Secondly, after testing with the K-nearest neighbor algorithm, we apply the support vector machine to enhance the accuracy rate. The SVM technique is based on its kernel functions (linear, polynomial, and RBF) and on the hyperparameters of each kernel function (c, γ, r, and d). For each kernel, we varied the value of each parameter in order to obtain the best result. We start with the linear kernel function; this kernel has only the c parameter (a regularization parameter). We varied the value of this parameter and the best result is obtained when c = 10000, with an accuracy of 94%. Next, we test
with the polynomial kernel function; for this kernel we varied its parameters one by one and the maximum accuracy rate is equal to 94.4%. The varied parameters of this kernel are c, γ, r, and d (the best result is found when c = 1000, γ = 0.05, r = 1000, and d = 3). We then move to the RBF kernel function; this kernel has two parameters, c and γ. We varied these parameters one by one, and the maximum accuracy rate is equal to 95.4% (this result is obtained when c = 1000 and γ = 0.0005). Table 1 presents the best rate obtained with each kernel of the SVM.

Table 1. The best rate obtained with K-NN and SVM kernel functions for the PTB database (features: ZCR + 12 cepstral coefficients + entropy)

Classifier | Parameter / Kernel function | Acc (%)
K-NN       | K = 2                       | 86.1
K-NN       | K = 3                       | 78.2
K-NN       | K = 4                       | 74.5
SVM        | Linear                      | 94
SVM        | Polynomial                  | 94.4
SVM        | RBF                         | 95.4
Our proposed work is compared with some previous studies in the literature that used the same benchmark data set (PTB Diagnostic database, lead I) with other characteristics. Table 2 presents some of these previous studies.

Table 2. Comparison with some previous studies that used the PTB database (lead I)

Author              | Features                                 | Classifier     | Accuracy (%)
Wang et al. [8]     | AC/DCT                                   | NN             | 84.61
Hanilci et al. [10] | AC/DCT, MFCC                             | 2D-CNN         | 90.48
Turky et al. [11]   | CSP                                      | RBF-SVM        | 95.15
Proposed approach   | ZCR + 12 cepstral coefficients + entropy | K-NN           | 86.1
                    |                                          | Linear-SVM     | 94
                    |                                          | Polynomial-SVM | 94.4
                    |                                          | RBF-SVM        | 95.4
In our research, we proposed to use the ECG signal as a biometric for pattern recognition because, according to the literature, it is a recent research field. In this context, the human identification system is based on three main steps: preprocessing, feature extraction, and finally classification. In the first stage, we first used the bandpass filter; next we detected all the T peaks using the QRS detection algorithm; then we carried out the segmentation, which depends on the T peak detection. After the preprocessing step, we combined all the features extracted from the preprocessed ECG signal. To evaluate our work, we applied K-NN and SVM to the PTB Diagnostic database. The best accuracy is achieved by the SVM technique, with an accuracy of 95.4%. This work is compared with another work that used the same public database [10] and obtained an accuracy of 90.48% with its classifier.
4
Conclusion
In this paper, we proposed an identification system based on physiological signals, namely the ECG signal, because, according to the literature, it is a recent research field. This system is built from three main steps: the preprocessing step, the feature extraction step, and the classification step. A bandpass filter was proposed to remove the noise from the ECG signal, and the QRS detection algorithm is used to detect the T peaks in order to perform the segmentation. Next, we proposed to concatenate the zero crossing rate feature, the cepstral coefficient features, and the entropy feature into one vector. The experimental results show that the support vector machine achieves the best accuracy rate, equal to 95.4%, on 54 subjects of the public PTB Diagnostic database. In the future, we will modify the type of segmentation used in the preprocessing step so that each sample contains two R peaks: that is, we will use an algorithm to detect the R peaks (the Pan-Tompkins algorithm) and then segment the signal according to the detected R peaks. In this research, we detected the T peaks and the segmentation depends on the T peak detection, so each sample contains only one R peak; in the future, each sample of the ECG signal will contain two R peaks. We will then apply the support vector machine to the feature vector proposed in this research and, to validate our proposed system, we will use a larger data set.
References 1. Sarier, N.D.: Efficient biometric-based identity management on the Blockchain for smart industrial applications. Pervasive Mob. Comput. 71, 101322 (2021) 2. Benouis, M., Mostefai, L., et al.: ECG based biometric identification using onedimensional local difference pattern. Biomed. Signal Process. Control 64, 102226 (2021) 3. Chhabra, G., Sapra, V., Singh, N.: Biometrics-unique identity verification system. In: Information Security and Optimization, pp. 171–180 (2020)
4. Rinaldi, A.: Biometrics’ new identity measuring more physical and biological traits. Sci. Soc. 17(1), 22–26 (2016) 5. Houalef, S., Bendahmane, A., et al.: Syst`eme de reconnaissance biom´etrique multimodal bas´e sur la fusion: empreinte digitale, visage, g´eom´etrie de la main. Colloque Africain sur la Recherche en Informatique, D´epartement d’informatique, vol. 1, pp. 1–7 (2012) 6. Hamza, S., Ben Ayed, Y.: Biometric individual identification system based on the ECG signal. In: Abraham, A., Siarry, P., Ma, K., Kaklauskas, A. (eds.) ISDA 2019. AISC, vol. 1181, pp. 416–425. Springer, Cham (2021).https://doi.org/10.1007/9783-030-49342-4 40 7. Irvine, J.M., Wiederhold, B.K., et al.: Heart rate variability: a new biometric for human identification. In: Proceedings of the International Conference on Artificial Intelligence, pp. 1106–1111 (2001) 8. Wang, Y., Agrafioti, F., et al.: Analysis of human electrocardiogram for biometric recognition. EURASIP J. Adv. Signal Process. 148658, 1–11 (2008) 9. Dar, M.N., Akram, M.U., et al.: ECG biometric identification for general population using multiresolution analysis of DWT based features. In: Second International Conference on Information Security and Cyber Forensics (InfoSec), pp. 5–10 (2015) 10. Hanil¸ci, A., G¨ urkan, H.: ECG biometric identification method based on parallel 2-D convolutional neural networks. J. Innov. Sci. Eng. 3(1), 11–22 (2019) 11. Alotaiby, T.N., Alshebeili, S.A., et al.: ECG based subject identification using common spatial pattern and SVM. J. Sens. 2019, 1–9 (2019) 12. Zhang, Q., Zhou, D., et al.: HeartID: a multiresolution convolutional neural network for ECG based biometric human identification in smart health applications. IEEE Access 5, 11805–11816 (2017) 13. Sedghamiz, H.: BioSigKit: a matlab toolbox and interface for analysis of biosignals. J. Open Sour. Softw. 3(30), 671 (2018) 14. Hamza, S., Benayed, Y.: SVM for human identification using the ECG signal. Intell. Procedia Comput. Sci. 176, 430–439 (2020) 15. Al Dujaili, M.J., Ebrahimi-Moghadam, A., et al.: Speech emotion recognition based on SVM and KNN classifications fusion. Int. J. Electr. Comput. Eng. (IJECE), 11(2), 1259–1264 (2021) 16. Pelletier, C.: Classification des sons respiratoires en vue d’une detection automatique des sibilants. PhD thesis, Universit´e Du Quebec (2006) 17. Mohammad-Djafari, A.: Entropie en traitement du signal. Traitement du Signal 15(6), 545–551 (1998) 18. Benouis, M., Mostefai, L., et al.: ECG based biometric identification using onedimensional local difference pattern. Biomed. Signal Process. Control 64, 102226 (2021) 19. Hasan, M., Boris, F.: SVM: machines a vecteurs de support ou s´eparateurs a vastes marges. Survey, Versailles St. Quentin, vol. 64 (2006)
Semantic Segmentation of Dog's Femur and Acetabulum Bones with Deep Transfer Learning in X-Ray Images D. E. Moreira da Silva1 , Vitor Filipe1,2 , Pedro Franco-Gonçalo3 , Bruno Colaço4 , Sofia Alves-Pimenta4 , Mário Ginja5 , and Lio Gonçalves1,2(B) 1
School of Science and Technology, University of Trás-os-Montes e Alto Douro (UTAD), 5000-801 Vila Real, Portugal [email protected], {vfilipe,lgoncalv}@utad.pt 2 INESC Technology and Science (INESC TEC), 4200-465 Porto, Portugal 3 Department of Veterinary Science, Veterinary and Animal Research Centre (CECAV), UTAD, Vila Real, Portugal [email protected] 4 Department of Animal Science, Centre for the Research and Technology of Agro-Environmental and Biological Sciences (CITAB) and CECAV, UTAD, Vila Real, Portugal {bcolaco,salves}@utad.pt 5 Department of Veterinary Science, CITAB and CECAV, UTAD, Vila Real, Portugal [email protected]
Abstract. Hip dysplasia is a genetic disease that causes the laxity of the hip joint and is one of the most common skeletal diseases found in dogs. Diagnosis is performed through an X-ray analysis by a specialist and the only way to reduce the incidence of this condition is through selective breeding. Thus, there is a need for an automated tool that can assist the specialist in diagnosis. In this article, our objective is to develop models that allow segmentation of the femur and acetabulum, serving as a foundation for future solutions for the automated detection of hip dysplasia. The studied models present state-of-the-art results, reaching dice scores of 0.98 for the femur and 0.93 for the acetabulum.
Keywords: Dog hip dysplasia · X-ray · Deep learning · Semantic segmentation
This work was financed by project Dys4Vet (POCI-01-0247-FEDER-046914), cofinanced the European Regional Development Fund (ERDF) through COMPETE2020 - the Operational Programme for Competitiveness and Internationalisation (OPCI). The authors are also grateful for all the conditions made available by FCT- Portuguese Foundation for Science and Technology, under the projects UIDB/04033/2020, UIDB/CVT/00772/2020 and Scientific Employment Stimulus-Institutional CallCEECINST/00127/2018 UTAD. c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 461–475, 2022. https://doi.org/10.1007/978-3-030-96308-8_43
1
Introduction
Canine hip dysplasia (CHD) is an inherited disease that manifests itself in the malformation of the hip joint [1]. This condition starts as dogs grow, leading to instability and laxity of the hip joint [2]. In Fig. 1(a) (normal hip) the head of the femur is perfectly adapted to the acetabulum, while in 1(b) (abnormal hip) this does not happen. This radiographic feature is an important aspect taken into account in CHD scoring systems and is known as congruence [3]. This condition mainly affects large dogs and has high debilitating consequences [4]. Moreover, Richardson [5] states that CHD is one of the most common skeletal diseases encountered in veterinary medicine, accounting for 29% of orthopaedic cases. According to the literature, the best way to reduce the incidence of CHD is to carry out selective breeding [2,6]. In other words, individuals who present signs of CHD should be removed from breeding as early as possible. The diagnosis is usually carried out by radiographic imaging (X-ray) analysis, using a scoring scheme, such as the F´ed´eration Cynologique Internationale’s (FCI), which proposes the grades scoring system from A (normal hip joint) to E (severe hip dysplasia) [6]. However, X-ray images are intricate by nature, as they often present noise, overlapping tissues and organs on the bone structure, and low contrast [7]. In the gap between the femur head and the acetabulum, these challenges are more accentuated [8]. As such, the assigned FCI grading to an X-ray is many times subjective, being heavily dependent on the skills of the scrutineers [6]. Therefore, it highlights the need for a complete automated solution that supports veterinary medicine professionals in diagnostics. One of the most common and crucial approaches in medical imaging is image semantic segmentation [9]: isolating the objects (i.e. the femur and acetabulum) from the background. With the present study, we aim to develop a model that can achieve state-of-the-art and human-like segmentations. These segmentations would allow future research to build a new fully automated system on top of the developed model to identify dysplastic cases in X-rays.
(a) Healthy hip joint.
(b) Hip joint with severe dysplasia.
Fig. 1. Comparison of a normal hip joint (a) and an hip joint with a severe case of dysplasia (b). Femur contours are denoted in green, and acetabulum contours in blue.
The rest of this paper follows this structure: Literature Review (Sect. 2); Methods (Sect. 3); Experimental Evaluation and Results (Sect. 4); Conclusions (Sect. 5).
2
Literature Review
The democratization of technology allowed the development of computer-aided detection systems (CAD) for medical imaging to improve diagnosis. However, in the past, these systems were misleading with low accuracy. The advent of deep learning (DL) algorithms allowed us to overcome these issues, yielding human-like accuracy and reliability in many medical imaging tasks [10]. In this section, we perform a literature review of the advances in DL, specifically image segmentation, the use of these techniques applied to X-ray images, as well as the use of DL in related CHD work. 2.1
Segmentation
Image segmentation aims to provide pixel-wise classification to isolate the objects from the background, and it is considered one of the most challenging tasks in computer vision [7]. Some of the easiest and classical segmentation methods are thresholding, region-based and edge-based. However, most of these methods often fail to generalize due to the diverse and heterogeneous nature of the data. As such, research has shifted its focus into developing DL models, as the previous showed far superior performance in many visual tasks. Shelhamer et al. [11] used a deep CNN for feature extraction, adding a final deconvolutional layer that upsamples the encoded image back to the original size to produce pixel-wise label predictions. However, this upsampling process generates coarse segmentations because pooling operations from the encoder discard the spatial information. To contend with this effect, the authors use some skip connections from higher resolution feature maps. This network is fully convolutional, and since then, it has influenced many other works of literature. One of the most notorious is UNet [12]. This network uses an encoder-decoder symmetric architecture (Fig. 2). The encoder path successively down-samples the image through convolutional layers and pooling layers, while the decoder path successively upsamples the image. The main reason for the success of this network was the introduction of skip connections between the encoder and decoder layers, which allowed combining (through concatenation) high-resolution features of the encoder path with the upsampled output. This combination of high-level and low-level yields high accuracy classification while retaining high localization, crucial in biomedical applications [9].
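A compact PyTorch sketch of this encoder-decoder-with-skip-connections idea is shown below for illustration; the depth and channel widths are arbitrary assumptions and are not the configuration used in the present work or in the original U-Net.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 conv + ReLU layers, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    """Two-level U-Net: down-sampling encoder, up-sampling decoder, and skip
    connections that concatenate encoder features with up-sampled decoder features."""
    def __init__(self, in_ch=1, n_classes=3):
        super().__init__()
        self.enc1 = double_conv(in_ch, 32)
        self.enc2 = double_conv(32, 64)
        self.bottleneck = double_conv(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = double_conv(128, 64)         # 64 (skip) + 64 (up-sampled)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = double_conv(64, 32)          # 32 (skip) + 32 (up-sampled)
        self.head = nn.Conv2d(32, n_classes, 1)  # pixel-wise class scores

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)
```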
Fig. 2. Original U-Net architecture. Source: [12]
2.2
Bone Segmentation in X-Ray
According to Siddique et al. [9] survey only ≈3.9% of biomedical segmentation papers address bone X-ray imaging, despite being the dominant tool of CHD diagnosis. In addition, most of the literature we found focuses on human bone segmentation. Nonetheless, it is still important to mention such works. Ouertani et al. [13] introduced a dual-edge detection algorithm to extract the contour of the femur head and the acetabulum arch. Yet, it required a priori form knowledge, thus not being an end-to-end automated segmentation system. Hussain et al. [14] performed the manual extraction of seven features maps from dual-energy X-ray absorptiometry images, performing the pixel-wise classification (segmentation) through a decision tree. The authors achieved a 91.4% femur segmentation accuracy, significantly higher than the 84.8% obtained with an artificial neural network. However, this comparison suffers from the fact that deep convolutional neural networks (CNN) were not used, even though CNNs have seen tremendous success in medical image applications, even with low amounts of data, and have been one of the main reasons for the emergence of DL [15]. U-Net has been used and modified in several works focused on biomedical semantic segmentation. For instance, Bullock et al. [16] proposed a modified U-Net, that uses multistage down-sampling and up-sampling blocks for entire human body X-ray segmentation. The authors state that this network is suitable for applications where the available data is scarce, which is often the case in medical applications [9,12], and have achieved a dice score of 0.9. On a similar note, Ding et al. [17] proposed a lightweight multiscale U-Net, by reducing the number of downsampling and upsampling operations, for human hand bone segmentation. Their best dice score was 0.931. Similar to the present work, Lianghui et al. [8], also using a U-Net, addressed femur segmentation by removing pooling layers and introducing Batch Normalization (BN) layers [18], achieving a dice score of 0.966. In recent work, following a similar path, Shen et al. [19] introduced
a novel U-Net architecture by removing pooling layers and introducing residual connections for the segmentation of the femur and tibia. However, cutting out the pooling layers is a bottleneck when it comes to building deeper neural networks: as the image size is not reduced at each layer, we quickly run into memory allocation issues. The authors achieved a dice score of 0.973 and compared it with a classical U-Net, which only reaches a dice score of 0.918. We decided to benchmark our models against these results for the femur segmentation task. Regarding the acetabulum segmentation task, we could not establish a baseline performance, due to the lack of literature addressing acetabulum segmentation in X-ray imaging. However, this gap highlights the need for such work.
2.3 Related Canine Hip Dysplasia Work
Searching for works that directly address the problem of CHD, whether through segmentation or direct classification, we found comparatively little literature. McEvoy et al. [20] applied a two-step transfer-learning approach using a model pre-trained on a large public dataset and adapting its prior knowledge to a specific domain. First, they trained a YoloV3 Tiny CNN [21] to predict two bounding boxes per image, one for each hip joint. This step allowed the authors to isolate the two regions of interest (ROI) in each image. This first model achieved decent detections, with an intersection over union (IOU) score of 0.85. With the cropped regions (output of the previous model), the authors trained another YoloV3 Tiny CNN [21] to perform binary classification: dysplasia/no dysplasia. This two-stage approach reduces unnecessary convolutions when classifying an image by allowing the model to focus on feature extraction of the ROI. However, their model only achieved 0.53 sensitivity in the dysplasia class (true positive rate), meaning that their classifier struggled to identify dysplastic joints. Likewise, Gomes et al. [22] used the pre-trained Inception-v3 [23] model and tried to predict, using the entire image, whether a dog was dysplastic or not. However, their model generalized poorly to the test images. Even though it correctly identified 83% of the dysplastic cases, it had many false positives, with a specificity of 0.66. The last two approaches tried to predict dysplastic conditions through convolutions (for feature extraction) and fully connected layers (for class prediction). However, such models are often called black boxes [24], as their predictions might be hard to explain. We believe that precise segmentation of the femur and acetabulum is an indispensable step for further feature analysis. For instance, it has been shown that simple features like the acetabular area occupied by the femoral head can be associated with the FCI grading system [unpublished work]. This technique of numeric feature analysis would allow for a precise, objective and explainable diagnosis.
3 Methods
U-Net is the most used architecture for segmentation in the literature [9]. As such, we decided to build upon it. To compare results with U-Net, we also built an alternative model, the Feature Pyramid Network (FPN) [25].
3.1 Dataset
We collected 138 DICOM images from the Veterinary Teaching Hospital of the University of Trás-os-Montes and Alto Douro, where about 70% of the hips were normal or near normal and the remaining 30% dysplastic. Then, we carried out manual segmentation of every X-ray using the open-source polygonal annotation tool LabelMe [26]. With this tool, we annotated two regions for each image: the femur and the acetabulum. Each generated mask is a three-channel image, where each channel contains a binary mask for a particular label (including the background). Figure 3 exhibits an example image and its corresponding generated mask.
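As a rough sketch of how such three-channel masks can be rasterized from LabelMe polygon annotations (the paper does not describe its conversion script, and the label strings "femur" and "acetabulum" are our assumptions), one possible implementation is:

import json
import numpy as np
from PIL import Image, ImageDraw

def labelme_to_mask(json_path, labels=("femur", "acetabulum")):
    # Builds an (H, W, 3) binary mask: channel 0 = background, 1 = femur, 2 = acetabulum.
    with open(json_path) as f:
        ann = json.load(f)
    h, w = ann["imageHeight"], ann["imageWidth"]
    mask = np.zeros((h, w, len(labels) + 1), dtype=np.uint8)
    for shape in ann["shapes"]:
        if shape["label"] not in labels:
            continue
        channel = labels.index(shape["label"]) + 1
        canvas = Image.new("L", (w, h), 0)
        polygon = [tuple(p) for p in shape["points"]]
        ImageDraw.Draw(canvas).polygon(polygon, outline=1, fill=1)
        mask[..., channel] = np.maximum(mask[..., channel], np.array(canvas))
    # Background channel is set wherever neither annotated region is present.
    mask[..., 0] = (mask[..., 1:].sum(axis=-1) == 0).astype(np.uint8)
    return mask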
(a) Original DICOM Image.
(b) Generated 3-channel ground truth mask (colourized). For a comprehensive overview, refer to Figure 6.
Fig. 3. Sample data.
3.2 Transfer Learning and Architectures
Both U-Net and FPN follow the fully convolutional encoder-decoder architecture. One advantage of this design is that either part can be changed independently. We propose swapping the entire encoder path for EfficientNet [27] modules. EfficientNet is a state-of-the-art CNN that has been trained on ImageNet to classify 1000 different classes. This network has seven convolutional modules, and we integrate these pre-trained modules to serve as feature extractors. The use of transfer learning is beneficial due to the low amount of data we have, as the pre-trained CNN has already learned to extract many low-level features that generalize well to different domains. Furthermore, EfficientNet has eight different variants, ranging from B0 to B7, with the lower variants offering more speed and the higher variants offering better accuracy. We opted for the B5 variant, as it provided the best performance without overfitting in our experiments. Unlike the original U-Net, our architecture is asymmetric (Fig. 4).
To further prevent overfitting, we use a smaller number of decoder blocks. The first decoder block (DB1) has 256 filters, and each subsequent block has the number of filters reduced by half. Likewise, our FPN uses the same seven encoder blocks and five decoding blocks (Fig. 5), with a fixed number of 256 filters.
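The paper does not state which toolkit was used to assemble these models; as a minimal sketch under that assumption, the segmentation_models package offers one convenient way to build a U-Net or FPN with a pre-trained, frozen EfficientNet-B5 encoder and sigmoid outputs for the three mask channels (decoder filter counts are left at the library defaults here):

import segmentation_models as sm

sm.set_framework("tf.keras")

# U-Net with an ImageNet-pretrained, frozen EfficientNet-B5 encoder.
unet = sm.Unet("efficientnetb5", encoder_weights="imagenet",
               encoder_freeze=True, classes=3, activation="sigmoid")

# FPN decoder on top of the same frozen encoder.
fpn = sm.FPN("efficientnetb5", encoder_weights="imagenet",
             encoder_freeze=True, classes=3, activation="sigmoid")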
Fig. 4. Proposed U-Net architecture.
Fig. 5. Proposed FPN architecture.
Another important aspect is that our architectures use the sigmoid activation function to produce the segmentation maps, instead of the classical softmax. In our images, the femur head and the acetabulum overlap, so the classes are not mutually exclusive: a pixel may belong to both the femur and the acetabulum, which a softmax output cannot express. Figure 6 provides a comprehensive overview of the segmentation mask structure.
Fig. 6. Mask structure (simplified): each mask is a height × width image with three binary channels (background, femur and acetabulum), where the femur and acetabulum channels share an overlapping region.
3.3 Loss Function and Metrics
To evaluate our models' performance we use the Dice coefficient, one of the most common metrics in semantic segmentation [9], which measures how similar two images are. This metric lies in the range [0, 1] (higher indicates more similarity). It is calculated according to (1):
Dice = 2 × |GT ∩ PS| / (|GT| + |PS|)   (1)
where GT is the ground truth and PS is the predicted segmentation. Additionally, we use another common metric, the IOU score, which measures the overlap between the ground truth and the predicted mask. It is calculated according to (2):
IOU = |GT ∩ PS| / |GT ∪ PS|   (2)
where GT is the ground truth and PS is the predicted segmentation. As a loss function, we choose to optimize the dice score. This can be achieved by using the following formula (3):
L_dice = 1 − Dice   (3)
However, our classes are highly imbalanced, as noticed in Fig. 3(b). If we solely used L_dice, the background and femur classes would dominate the acetabulum class, and our model would only optimize these dominant classes. As such, we introduce a second loss function to penalize the dominant classes, the Focal Loss [28] (4):
L_focal = −GT · α(1 − PS)^γ · log(PS) − (1 − GT) · α · PS^γ · log(1 − PS)   (4)
where GT is the ground truth, PS is the predicted segmentation, γ = 2 and α = 0.25.
Focal Loss has proven to be a prominent tool against highly imbalanced data, such as ours. It works by down-weighting easy-to-classify samples and giving a higher weight to hard-to-classify samples, thereby preventing the model from focusing too much on the dominant classes. Our final loss function (5) is a linear combination of the two aforementioned functions; it allows dice coefficient optimization under an imbalance constraint:
L = L_dice + L_focal   (5)
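A minimal Keras/TensorFlow sketch of this combined loss, following Eqs. (1) and (3)–(5) (tensor shapes, reduction over the whole batch, and numerical-stability constants are our assumptions, not details given in the paper):

import tensorflow as tf

def dice_loss(gt, ps, smooth=1e-6):
    # 1 - Dice (Eqs. 1 and 3), computed over all pixels and channels in the batch.
    gt = tf.cast(gt, tf.float32)
    intersection = tf.reduce_sum(gt * ps)
    dice = (2.0 * intersection + smooth) / (tf.reduce_sum(gt) + tf.reduce_sum(ps) + smooth)
    return 1.0 - dice

def focal_loss(gt, ps, alpha=0.25, gamma=2.0, eps=1e-7):
    # Binary focal loss with alpha = 0.25 and gamma = 2, written as in Eq. 4.
    gt = tf.cast(gt, tf.float32)
    ps = tf.clip_by_value(ps, eps, 1.0 - eps)
    pos = -gt * alpha * tf.pow(1.0 - ps, gamma) * tf.math.log(ps)
    neg = -(1.0 - gt) * alpha * tf.pow(ps, gamma) * tf.math.log(1.0 - ps)
    return tf.reduce_mean(pos + neg)

def combined_loss(gt, ps):
    # Final loss L = L_dice + L_focal (Eq. 5).
    return dice_loss(gt, ps) + focal_loss(gt, ps)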
4 Experimental Evaluation and Results
4.1 Experimental Settings
In this section, we briefly present the pre-processing steps, as well as the data augmentation procedures. Moreover, we provide details about our experimental setup. First, all DICOM images were converted to PNG format and reduced to a size of 576 × 448 through cropping operations. Data augmentation introduces artificially generated samples derived from the original data, increasing its variability. This technique is widely used and plays a vital role in low-data regimes, such as the present study, preventing model overfitting. Generally, augmentations are applied a priori. However, segmentation models are memory intensive, so to avoid the additional memory overhead we decided to use on-the-fly data augmentations.
Table 1. Data augmentation transformations.
Transformation     | Description                                                   | p (Probability)
Image invert       | Subtract pixel values from 255                                | 0.5
Horizontal flip    | Flips the image around the y-axis                             | 0.5
Random rotate      | Rotates the image between [−5°, 5°]                           | 0.5
Random shift       | Shifts the image by a factor between [−0.1, 0.1]              | 0.5
Random contrast    | Changes the image contrast by a factor between [−0.2, 0.2]    | 0.5
Random brightness  | Changes the image brightness by a factor between [−0.2, 0.2]  | 0.5
Before feeding it to the model, the training data goes through a pipeline that applies the transformations according to probabilistic laws; Table 1 describes the transformations present in our pipeline. We implemented all of our practical experiments using Keras with a TensorFlow and CUDA 11.2 backend. Our system has an NVIDIA RTX 3090, with 24 GB of VRAM, for parallel computing.
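The paper does not name the library that implements this on-the-fly pipeline; as one possible sketch (Albumentations is our assumption, and shift and rotation are combined in a single transform here, which slightly changes the independent probabilities of Table 1):

import albumentations as A

train_transform = A.Compose([
    A.InvertImg(p=0.5),                               # image invert: 255 - pixel (uint8 input)
    A.HorizontalFlip(p=0.5),                          # flip around the y-axis
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.0,
                       rotate_limit=5, p=0.5),        # random shift and ±5° rotation
    A.RandomBrightnessContrast(brightness_limit=0.2,
                               contrast_limit=0.2, p=0.5),
])

def augment(image, mask):
    # Geometric transforms are applied jointly to image and mask; photometric ones only to the image.
    out = train_transform(image=image, mask=mask)
    return out["image"], out["mask"]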
4.2 Femur and Acetabulum Segmentation
To train and evaluate the proposed models, we split our data according to a standard 70-15-15 split: 70% for training, 15% for validation and another 15% for testing. To train our models, we used the Adam optimizer with a learning rate of 0.001 and a small batch size of 4 due to our limited system VRAM. It is also worth noting that we freeze all the weights of the EfficientNet modules: the initial random weights of the decoder modules cause large gradient updates, which would destroy the pre-learned features of the EfficientNet modules. If the dice score does not improve after eight epochs, we reduce the learning rate by a factor of 0.1 to improve convergence. Additionally, we implemented early stopping to prevent overtraining - we halt the training if the dice score does not improve after ten epochs. As such, we set the number of epochs to a very high value of 500. U-Net stopped its training at epoch 39, with a wall time of 10 m 19 s, while FPN stopped at epoch 26, with a wall time of 7 m 35 s. In Figs. 7 and 8 we exhibit the dice score curves for our models, respectively. As can be seen, FPN presented steadier training, starting to converge earlier than the U-Net. Furthermore, in U-Net it took longer for early stopping to trigger, reaching a dice score of 0.9691 for the training data and 0.9493 for the validation data. On the other hand, FPN reached a dice score of 0.9699 for the training data and 0.9482 for the validation data. In short, both models achieved the same performance level, but FPN was faster to train.
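A minimal sketch of this training setup in Keras, assuming the models, data arrays and the combined loss from the previous sketch are available (the dice metric name and data variables are our assumptions):

import tensorflow as tf
from tensorflow import keras

def dice_coef(gt, ps, smooth=1e-6):
    gt = tf.cast(gt, tf.float32)
    inter = tf.reduce_sum(gt * ps)
    return (2.0 * inter + smooth) / (tf.reduce_sum(gt) + tf.reduce_sum(ps) + smooth)

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss=combined_loss, metrics=[dice_coef])

callbacks = [
    # Reduce the learning rate by a factor of 0.1 if the validation dice stalls for 8 epochs.
    keras.callbacks.ReduceLROnPlateau(monitor="val_dice_coef", mode="max",
                                      factor=0.1, patience=8),
    # Stop training if the validation dice does not improve for 10 epochs.
    keras.callbacks.EarlyStopping(monitor="val_dice_coef", mode="max",
                                  patience=10, restore_best_weights=True),
]

model.fit(x_train, y_train, validation_data=(x_val, y_val),
          batch_size=4, epochs=500, callbacks=callbacks)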
Fig. 7. U-Net dice score curves (training and validation dice per epoch; wall time 10 m 19 s).
Fig. 8. FPN dice score curves (training and validation dice per epoch; wall time 7 m 35 s).
Next, we used the test data to evaluate the models' generalization capability. However, to provide more realistic and non-inflated results, we calculated each metric individually for the femur and the acetabulum, discarding the background class. Table 2 reveals our results.
Table 2. Test set results.
Model | Dice score (Femur) | Dice score (Acetabulum) | IOU score (Femur) | IOU score (Acetabulum)
U-Net | 0.9775 | 0.9340 | 0.9560 | 0.8777
FPN   | 0.9803 | 0.9145 | 0.9613 | 0.8434
(a) Ground Truth Femurs.
(c) Ground Truth Acetabulums.
(b) Predicted Femurs.
(d) Predicted Acetabulums.
Fig. 9. Comparison between the ground truth and the predictions made by U-Net in a test image.
As can be seen, FPN produces slightly better femur segmentations, superior to the ones we found in the literature. However, U-Net presents considerably better results for the acetabulum, the most challenging region due to its reduced area. This difference can be explained by the fact that U-Net has skip connections from earlier EfficientNet modules, introducing higher-resolution features, which allow it to preserve a greater amount of spatial information [12]. Additionally, we
(a) Predicted Femurs.
(b) Predicted Acetabulums.
Fig. 10. Predictions made by U-Net highlighted in the original test image.
measured the average prediction wall time for each model. In this aspect, U-Net is slightly faster, at 215.82 ms, while FPN took an average of 307.15 ms. However, this difference does not seem to be impactful in biomedical applications, where precision is prioritized over speed. Figure 9 illustrates the segmentation maps, split by label, output by the model (U-Net) for a particular test image. This model can segment the different regions with a very granular level of detail, as the predicted masks are nearly identical to the ground truth. In addition, in Fig. 10 we show the same segmented regions highlighted in the original image. These results visually demonstrate how effective our model is.
5 Conclusions
Our goal with this work is to define a basis and a starting point for future research on the development of automated end-to-end systems for reliable detection of CHD. Such a system could be a powerful ally in preventing and decreasing the occurrence of CHD, a very debilitating condition. We achieved state-of-the-art segmentation results with a training time of only 10 m 19 s for the highest-performing model, as opposed to the several hours often described in the literature. With our transfer-learning approach, U-Net provides highly accurate segmentations, even for the acetabulum, the smallest and hardest region. In greater detail, this model was able to segment the femur regions with a dice score of 0.98 and the acetabulum regions with a dice score of 0.93.
References 1. Ginja, M.M., Silvestre, A.M., Gonzalo-Orden, J.M., Ferreira, A.J.: Diagnosis, genetic control and preventive management of canine hip dysplasia: a review, pp. 269–276, June 2010. https://pubmed.ncbi.nlm.nih.gov/19428274/ 2. Fries, C.L., Remedios, A.M.: The pathogenesis and diagnosis of canine hip dysplasia: a review, pp. 494–502 (1995) 3. Ginja, M.M., et al.: Hip dysplasia in Estrela mountain dogs: prevalence and genetic trends 1991-2005. Vet. J. 182(2), 275–282 (2009) https://pubmed.ncbi.nlm.nih. gov/18722145/ 4. Ginja, M.M., et al.: Early hip laxity examination in predicting moderate and severe hip dysplasia in Estrela mountain dog. J. Small Anim. Pract. 49(12), 641–646 (2008) 5. Richardson, D.C.: The role of nutrition in canine hip dysplasia, pp. 529–540 (1992) 6. Fl¨ uckiger, M.: Scoring radiographs for canine hip dysplasia - the big three organizations in the world. Eur. J. Companion Anim. Pract. 17, 135–140 (2007). Table 1. www.fci.org 7. Shah, R., Sharma, P.: Bone segmentation from X-ray images: challenges and techniques. In: Bhateja, V., Nguyen, B., Nguyen, N., Satapathy, S., Le, D.N. (eds.) Information Systems Design and Intelligent Applications. AISC, vol. 672, pp. 853– 862. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-7512-4 84 8. Lianghui, F., Gang, H.J., Yang, J., Bin, Y.: Femur segmentation in X-ray image based on improved U-Net. In: IOP Conference Series: Materials Science and Engineering, vol. 533, no. 1 (2019) 9. Siddique, N., Paheding, S., Elkin, C.P., Devabhaktuni, V.: U-Net and its variants for medical image segmentation: a review of theory and applications. IEEE Access (2021). https://arxiv.org/abs/2011.01118v1 10. Kim, M., et al.: Deep learning in medical imaging, pp. 657–668 (2019). /pmc/articles/PMC6945006//pmc/articles/PMC6945006/?report=abstract. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6945006/ 11. Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2015). https://arxiv.org/abs/1411.4038v2 12. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W., Frangi, A. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi. org/10.1007/978-3-319-24574-4 28. https://arxiv.org/abs/1505.04597v1 13. Ouertani, F., Vazquez, C., Cresson, T., De Guise, J.: Simultaneous extraction of two adjacent bony structures in X-ray images: application to hip joint segmentation. In: Proceedings - International Conference on Image Processing, ICIP, vol. 2015, pp. 4555–4559. IEEE Computer Society, December 2015 14. Hussain, D., Al-Antari, M.A., Al-Masni, M.A., Han, S.M., Kim, T.S.: Femur segmentation in DXA imaging using a machine learning decision tree. J. X-Ray Sci. Technol. 26(5), 727–746 (2018). https://www.researchgate.net/publication/ 326597757 15. Xu, W., He, J, Shu, Y., Zheng, H.: Advances in convolutional neural networks. In: Advances and Applications in Deep Learning. IntechOpen, October 2020. https:// www.intechopen.com/chapters/73604 16. Bullock, J., Cuesta-Lazaro, C., Quera-Bofarull, A.: XNet: a convolutional neural network (CNN) implementation for medical X-ray image segmentation suitable for small datasets, p. 69 (2019)
17. Ding, L., Zhao, K., Zhang, X., Wang, X., Zhang, J.: A lightweight U-Net architecture multi-scale convolutional network for pediatric hand bone segmentation in X-ray image. IEEE Access 7, 68436–68445 (2019) 18. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: 32nd International Conference on Machine Learning, ICML 2015, vol. 1, pp. 448–456. PMLR, June 2015. http://proceedings. mlr.press/v37/ioffe15.html 19. Shen, W., et al.: Automatic segmentation of the femur and tibia bones from Xray images based on pure dilated residual U-Net. Inverse Probl. Imaging (2020). https://www.aimsciences.org/article/doi/10.3934/ipi.2020057 20. McEvoy, F.J., et al.: Deep transfer learning can be used for the detection of hip joints in pelvis radiographs and the classification of their hip dysplasia status. Vet. Radiol. Ultrasound 62, no. 4, pp. 387–393, July 2021. https://onlinelibrary. wiley.com/doi/full/10.1111/vru.12968. https://onlinelibrary.wiley.com/doi/abs/ 10.1111/vru.12968. https://onlinelibrary.wiley.com/doi/10.1111/vru.12968 21. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement, April 2018. https://arxiv.org/abs/1804.02767v1 22. Gomes, D.A., Alves-Pimenta, S., Ginja, M.M., Filipe, V.: Predicting canine hip dysplasia in X-ray images using deep learning. In: International Conference on Optimization, Learning Algorithms and Applications, Bragan¸ca, Portugal, pp. 1–8 (2021) 23. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016, pp. 2818–2826. IEEE Computer Society, December 2016. https://arxiv.org/abs/1512. 00567v3 24. Buhrmester, V., M¨ unch, D., Arens, M.: Analysis of explainers of black box deep neural networks for computer vision: a survey (2019). https://arxiv.org/abs/1911. 12116v1 25. Lin, T.-Y., Doll´ ar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection, December 2016. https://arxiv.org/ abs/1612.03144v2 26. Wada, K.: Labelme: image polygonal annotation with Python (2016). https:// github.com/wkentaro/labelme 27. Tan, M., Le, Q.V.: EfficientNet: rethinking model scaling for convolutional neural networks. In: 36th International Conference on Machine Learning, ICML 2019, vol. 2019. International Machine Learning Society (IMLS), pp. 10691–10700, May 2019. https://arxiv.org/abs/1905.11946v5 28. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 42(2), 318–327 (2020). https:// arxiv.org/abs/1708.02002v2
Automatic Microservices Identification from Association Rules of Business Process Malak Saidi1 , Mohamed Daoud2 , Anis Tissaoui3(B) , Abdelouahed Sabri4 , Djamal Benslimane2 , and Sami Faiz5 1
National School for Computer Science, Manouba, Tunisia 2 University Lyon 1, Villeurbanne, France [email protected] 3 VPNC Lab, University of Jendouba, Jendouba, Tunisia [email protected] 4 Dhar Mahraz, University of Sciences, Fes, Morocco 5 ISAMM, University of Manouba, Manouba, Tunisia [email protected]
Abstract. Compared to monolithic systems, the microservice-oriented architecture is an architectural style that is gaining more and more popularity, both in academia and in industry. Microservices emerged as a solution for breaking down monolithic applications into small, self-contained, highly cohesive, and loosely coupled services. However, identifying microservices remains a major challenge that could compromise the success and value of this migration. In this article, we propose an association-rules-based architecture to automatically identify microservices from a business process. In this approach we exploit the association analysis method to discover the hidden relationships between the attributes of each activity; consequently, activities that share the same attributes are classified into the same microservice. A case study on a bicycle rental system is adopted with the aim of illustrating and demonstrating our approach. Keywords: Monolithic systems
1
· Microservices · Business process ·
Introduction
The business life cycle is characterized today by increasingly frequent phases of change induced by a continuous search for competitiveness [8,9]. As a result, the fundamental challenge for each company is to ensure control over its evolution and to quickly adjust its information system and its business processes to bring them into conformity with its working practices (taking into account new customer needs, new regulations, etc.). Thus, the business process represents the organized set of activities and of software, hardware and human resources [6]; it is considered the central element
and the backbone of every company. Suddenly, any economic success depends on the ability of its business process to easily integrate changes imposed by the environment. However, this BP is represented as a monolithic system. In other words, it is formed as a single block with components that are interconnected and interdependent [11] rather than flexibly associated, which will make their maintenances and upgrades too slow and inefficient. Unlike monoliths [10], microservice-based systems [2,7] have emerged in order to allow the company to react more quickly to new demands and to avoid an endless development process over several years. Indeed, this new approach aims to break down the system into small autonomous services, each one carries out its business process autonomously [1,3] and each microservice has its own database. One of the challenges today is finding the right granularity and cohesion of microservices, both when starting a new project and when thinking about maintenance, the evolution and scaling of an existing system. To deal with these problems, we aim in this paper to propose an approach, which aims to identify microservices based on the dependence linked to the activities of our BP in terms of control and in terms of correlation between attributes. Each activity will be described by a set of artifacts and attributes. Therefore, in order to guarantee a weak coupling as well as a high cohesion we will decompose our space so that each microservice has its own space and its own operations. In other words, microservices partition the activities of our BP into disjoint sets so that the operations of each microservice can directly access only its variables. In this regard, we adopted the technique of association rules, which is widely used in datamining. Through this famous technique, we will identify the interesting correlations between the different attributes of our system and subsequently the activities that will share data, we will classify them together and at the end, we will have a local database per microservice and in this way we have minimized the interactions between the different components.
2 Related Work
Over monolithic systems, microservices have become the software architecture of choice for business applications. Initially originating from Netflix and Amazon, they resulted from the need to partition both software development teams and runtime components to respectively promote agile development and scalability. Currently, there are a large number of monolithic applications that are migrating to a microservices architecture. In [4], authors presented an approach for rebuilding an integration platform that is based on an SOA architecture to a new microservices-oriented platform. This new platform overcomes the gaps related to the number of messages that must be processed as well as the number of new integrations that must be supported. It offers flexibility and agility in the maintenance, scalability and deployment of software products.
Since enterprise developers are faced with the challenges of maintaining and evolving large applications, in [5] Escobar and al. proposed a model-centric approach to analyze and visualize the current application structure and the dependencies between business capacity and data capacity. This approach allows developers to decompose the JEE application into microservices through diagrams resulting from the analysis of data belonging to each EJB (Enterprise Java Beans) via the clustering technique. In [1], authors have proposed a method of identifying microservices that decomposes a system using the clustering technique. To this end, they modeled a system as a set of business processes and they took into account two aspects of structural dependency and data dependency. In addition, the authors conducted a study to assess the effect of the process characteristics on the accuracy of identification approaches. Their approach is essentially based on three steps: First, a relation TP which shows the structural dependence of activities within a business process. Then a relation TD is defined to show the dependency of the activities according to their used data objects and finally they aggregated these two relations to define the final relation T. Amiri and al. were the first to work on identifying microservices from a set of BPs. However, they did not model the control dependency by taking into account the different types of logical connectors (Xor, or..). Recently, the approach of Daoud and al.[3] was proposed to address the limitations of the approach of amiri et al. already mentioned in their work. The main goal of the approach is to automatically identify microservices based on two types of dependencies (control and data) using collaborative clustering. To do this Daoud et al. proposed formulas for calculating direct and indirect control dependencies as well as proposed two strategies for calculating data dependency. Then they used a collaborative clustering algorithm to automatically extract candidate microservices. Although both works in [1,3] discuss control and data dependencies between activities, they did not take into account the correlation between attributes in order to reduce interactions and define a database per microservice. Concerning the data partitioning problem, the K-means clustering is one of the simplest and popular unsupervised machine learning algorithms. The authors in [12] presented an approach which describes a global K-means algorithm. this incremental approach makes it possible to add one center cluster at a time through a deterministic global search technique made up of N execution of the K-means algorithm starting from the appropriate initial position.
3 3.1
Our Approach for Identifying Microservices Case Study
See Table 1. Table 1. Bicing’s BPs as a set of activities, artefacts and attributes Activity Artefact
Attributes
a1
Bike
bike id, bike status
User
user id
Bike
bike status
User
user destination, user id
Bike
bike id, bike status, anchor point
User
user id
Bike
bike status, anchor point
User
user id, user history
Bike
bike id
Broken
problem id
a6
Bike
bike status
a7
Bike
bike status, anchor point
a2 a3 a4 a5
Repairment repair id, agree repair a8
User
user id, user history, user validity
a9
User
user id, user status
a10
Bike
bike id, bike status
a11
User
user id, authenticate
a12
User
user id, user credit, user destination
a13
User
user history
Validity
validity id
a14
User
user id, user history, user validity, user credit
a15
User
user history, user status
Validity
validity id
a16
Bike
bike id, bike status
a17
Bike
bike id,
Rental
rent id, duration rent
Booking
booking id
a18
Rental
rent id, duration rent, rent cost
a19
Notification notification id
a20
Invoice
invoice id,
Rental
rent id, rent cost
User
user id
Rental
rent id, duration rent, rent cost
Booking
booking id
Payment
payment status
a21
479
480
3.2
M. Saidi et al.
Foundations
A business process is defined as a set of interrelated activities to collectively achieve a business objective by defining roles and functional interactions within an organizational structure. These activities, which are interrelated represent a strong dependence in term of the logical relations which control the routing and the order of execution of these activities and in term of association between attributes. Indeed, to migrate to an architecture based on microservices, we must guarantee a strong cohesion and a loose coupling. The association rules technique will allow us to identify weak associations and strong associations. In other words, we will determine the dependence between the activities via the correlation between the different attributes of our system. Indeed, the relationships discovered can be represented in the form of association rules or a set of frequent items. Association Rules Modeling Let I = (i1, i2, ... in) be a set of binary attributes distinct from the database, and A = (a1, a2, .. an) a set of activities. An activity being a subset of items I such as A⊆ I. Let D be the database containing all the activities. Each activity a is represented by a binary vector with a[i] = 1 if the activity shares the attribute, else a[i] = 0. A non-empty subset X = {i1, i2, ...} of A is called itemsets and we denote it by I. The length of I is given by the value of k corresponds to the number of items contained in X, we denote it: K-itemsets. An association rule is a 2-tuple (X, Y) itemsets of A representing an implication of the form X → Y with X ⊂ I, Y ⊂ I and such that X ∩ Y = ∅. It is generally expressed by: if (x1, x2, .... xn) then (y1, y2, .. yn). In the two parts premise {X}and the conclusion {X}, we find a conjunction of items in the logical sense. Given the set of activities A, we are interested in generating all rules that satisfy certain constraints such as support and confidence. Through this dimension, each microservice has its own database in order to minimize communication. Suddenly activities that share attributes, most likely will be classified in the same microservice. Our approach described in Fig. 1 is composed of three essential steps which are: Dependency examination, which aims to extract the association rules dependencies. In the second step, we will generate the dependency matrices and finally, we will be based on clustering algorithm in order to identify the microservices.
Microservices Identification
481
Fig. 1. Our architecture
3.3
Association Rules Analysis
Binary Representation: This phase consists in selecting the data (attributes and activities) of our system useful for the extraction of the association rules and transforming this data into an extraction context. Indeed, the data of our BP can be represented as a two-dimensional Boolean matrix. This context, or dataset, is a triplet T = (A, I, R) in which “A” is a set of activities, “I” is a set of attributes, also called items, and R is a binary relation between A and I. In such representation, each tuple represents a transaction while the different fields correspond to the items included in the transaction. We denote by “0” the absence of the attribute and by “1” its presence in the activity. Extraction of Frequent Itemsets: The Extraction of frequent patterns is considered as our first step to generate association rules. It allows us to extract the context of the set of binary attributes (I). This step takes as input a database and minimal support to give as output a set of frequent items with their supports. First, we calculate the item support and remove those that do not reach the minimum support. Then, we calculate the support of the itemset of level (n + 1) and we remove those which have a support less than a minimal support. Therefore, the frequent patterns are calculated iteratively in ascending order according to their sizes. The following table describes the association rules generated from the activities a1, a2 and a3 (Table 2).
Table 2. Frequent itemset and support
Frequent itemset                               | Support
{Bike status, Bike id}                         | 0.19
{User id, Bike id}                             | 0.09
{User id, Bike status}                         | 0.19
{user destination, User id}                    | 0.09
{Anchor point, Bike id}                        | 0.04
{Anchor point, Bike status}                    | 0.14
{User id, Bike status, Bike id}                | 0.09
{User id, Bike status, Bike id}                | 0.09
{Anchor point, Bike status, Bike id}           | 0.04
{Anchor point, User id, Bike status, Bike id}  | 0.04
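To make this extraction step concrete, the sketch below shows one way to derive frequent itemsets and rules from such a binary activity-attribute matrix using the mlxtend library; the paper does not name an implementation, and the small matrix, attribute subset and thresholds here are illustrative assumptions.

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy binary matrix: rows = activities, columns = attributes (1 = activity uses the attribute).
data = pd.DataFrame(
    [[1, 1, 1, 0],   # a1: bike id, bike status, user id
     [0, 1, 1, 0],   # a2: bike status, user id (user destination omitted for brevity)
     [1, 1, 1, 1]],  # a3: bike id, bike status, user id, anchor point
    index=["a1", "a2", "a3"],
    columns=["bike_id", "bike_status", "user_id", "anchor_point"],
).astype(bool)

frequent = apriori(data, min_support=0.04, use_colnames=True)            # frequent itemsets
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "confidence", "lift"]])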
Association Rules Generation: After computing the frequent itemsets, in this step we calculate the confidence of each resulting association rule and keep only those that satisfy our confidence criterion (in our case, the minimum confidence threshold is 0.5). The higher the confidence, the better the association rule (Table 3).
Table 3. Generation of association rules
Frequent itemsets
Association rules
Confidence
{Bike status, anchor point, user id}
Bike status → anchor point, user id
0.25
{Bike id, bike status, anchor point, user id}
Anchor point → bike status, user id
0.66
User id → bike status, anchor point
0.2
Bike status, anchor point → user id
0.66
Bike status, user id → anchor point
0.5
Anchor point, user id → bike status
1
Bike id → bike status, anchor point, user id 0.16 bike status → Bike id, anchor point, user id 0.12 anchor point → Bike id, bike status, user id 0.33 user id → bike status, anchor point, Bike id 0.1 Bike id, bike status → anchor point, user id 0.25 Bike id, anchor point → bike status, user id 1 Bike id, user id → bike status, anchor point 0.5 bike status, anchor point → Bike id, user id 0.33 bike status, user id → anchor point, Bike id 0.25 anchor point, user id → bike status, Bike id 0.5 Bike id, bike status, anchor point → user id 1 Bike id, bike status, user id → anchor point 0.5 Bike id, user id, anchor point → bike status 1 Bike status, user id, anchor point → bike id 0.5 (continued)
Table 3. (continued) Frequent itemsets
Association rules
Confidence
{Bike id, bike status, anchor point}
Bike id → bike status, anchor point
0.16
Bike status → Bike id, anchor point
0.12
Anchor point → Bike id, Bike status 0.33
{Bike id, bike status} {Bike id, user id} {Bike status, user id} {Bike id, bike status, user id}
{User id, user destination} {Bike id, anchor point} {Bike status, anchor point}
Bike id, bike status → anchor point
0.25
Bike id, anchor point → bike status
1
bike status, anchor point → Bike id
0.33
Bike id → bike status
0.6
Bike status → bike id
0.5
Bike id → user id
0.33
User id → bike id
0.2
Bike status → user id
0.5
User id → bike status
0.4
Bike id → bike status, user id
0.33
bike status → Bike id, user id
0.25
user id → Bike id, bike status
0.2
Bike id, bike status → user id
0.5
Bike id, user id → bike status
1
bike status, user id → Bike id
0.5
User id → user destination
0.2
User destination → user id
1
Bike id → anchor point
0.16
Anchor point → bike id
0.33
Bike status → anchor point
0.37
Anchor point → bike status
1
The Table 4 describes the association rules generated from the activities a1, a2 and a3. To evaluate the effectiveness and to appreciate more finely the generated rules we will measure the LIFT which quantifies the mutual dependence between the attributes in the antecedent and the consequence of the rule (Tables 5 and 6). Table 4. Lift calculation Association rules
Confidence LIFT
Bike id → bike status
0.6
1.57
Bike status → bike id
0.5
1.78
Bike status → user id
0.5
1.06
Bike id, bike status → user id
0.5
1.06
Bike id, user id → bike status
1
2.63
bike status, user id → Bike id
0.5
1.78
User destination → user id
1
2.12
Anchor point → bike status
1
2.63
Bike id, anchor point → bike status
1
2.63 (continued)
M. Saidi et al. Table 4. (continued) Association rules
Confidence LIFT
anchor point → Bike status, user id
0.66
Bike status, anchor point → user id
0.66
1.14
Bike status, user id → anchor point
0.5
3.57
anchor point, user id → Bike status
3.47
1
2.63
Bike id, anchor point → bike status, user id 1
5.26
Bike id, user id → bike status, anchor point 0.5
3.57
anchor point, user id → bike status, Bike id 0.5
2.63
Bike id, bike status, anchor point → user id 1
2.12
Bike id, bike status, user id → anchor point 0.5
1.06
Bike id, user id, anchor point → bike status 1
2.63
Bike status, user id, anchor point → bike id 0.5
1.78
Intermediate Matrices and Dependency Measure: After the generation of the association rules and the calculation of the lift, in this step we aim to determine our dependency matrix. To do this, we start by calculating the intermediate matrices: we go through the list of rules with their lift and, for each rule, we compute the initial dependence value for each pair of activities (ai, aj). If ai ∩ aj = {RG1 .. RGn}, then (ai, aj) = Lift × P, where P is the total number of rule items; otherwise (ai, aj) = 0. Thus, for N rules we will have N intermediate matrices. Once these matrices are generated, we compute our dependency matrix, where D(ai, aj) is obtained by summing the corresponding entries:
D(ai, aj) = Σ_{l=1}^{N} (Lift_l × P_l)
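As a rough sketch of this step (the rule representation, the toy values and the number of clusters below are our assumptions), the dependency matrix can be accumulated over the rules and its rows then clustered with K-means to obtain candidate microservices:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical rules: each retained rule carries its lift, its number of items (P),
# and the set of activities whose attributes it covers.
rules = [
    {"lift": 1.57, "p": 2, "activities": {"a1", "a3"}},
    {"lift": 1.78, "p": 2, "activities": {"a1", "a3"}},
]
activities = ["a1", "a2", "a3"]
idx = {a: i for i, a in enumerate(activities)}

# D(ai, aj) accumulates lift * P for every activity pair that shares a rule.
D = np.zeros((len(activities), len(activities)))
for r in rules:
    for a in r["activities"]:
        for b in r["activities"]:
            if a != b:
                D[idx[a], idx[b]] += r["lift"] * r["p"]

# The rows of D serve as feature vectors; K-means groups activities into candidate microservices.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(D)
print(dict(zip(activities, labels)))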
As an example, we have calculated the intermediate matrices for the first two rules described in Table 4.
Rule 1 = Bike id → bike status
Table 5. Matrix 1
     a1        a2   a3
a1   –         0    K × 1.57
a2   0         –    0
a3   K × 1.57  0    –
Rule 2 = Bike status → bike id
Table 6. Matrix 2
     a1        a2   a3
a1   –         0    K × 1.78
a2   0         –    0
a3   K × 1.78  0    –
4
Implementation
Algorithm 1: Dependency measure
Input: RG, Lift, P
Output: D(ai, aj)
begin
  if ai ∩ aj = {RG1 .. RGn} then
    foreach RG ∈ {RG1 .. RGn} do
      for i = 1 to n do
        for j = 1 to n do
          C[i][j] ← P × Lift(RG)
  else
    C[i][j] ← 0
  return C
  for i = 1 to n do
    for j = 1 to n do
      D[i][j] ← Σ_{L=1}^{N} C_L[i][j]
  return D(ai, aj)
end

Algorithm 2: Frequent itemset generation
Input: Binary database B, minimal support minsupp
Output: Frequent itemsets Li
begin
  Li = ∅; i = 0
  C1 = the candidate itemsets of size 1 in B
  L1 = the frequent itemsets of C1
  while Li+1 is not empty do
    Ci+1 = Candidate-gen(Li)
    Li+1 = the frequent itemsets of Ci+1
    i++
  return ∪ Li
end

Algorithm 3: Association rules generation
Input: itemF
Output: RG
begin
  RG ← AprioriGen(itemF)
  return RG
end
5
Conclusion
The automatic identification of microservices is used with the goal of migrating monolithic systems to microservices oriented architectures composed of finegrained, cohesive and weakly coupled services. We have proposed in this article an approach based on association rules and which treat BP as input. Indeed, we went through four essential steps: First, we generated the frequent items from the binary representation of our BP. Second, we generated the association rules and for each rule, we calculated the Lift. Third, we have calculated the intermediate matrices and we have proposed a formula for the calculation of the final matrix. Finally, the final matrix is used as an input for K-means algorithm to identify microservice candidates. In terms of future work, we aim to improve our contribution and compare what we have done with existing work and we aim afterwards to deal with other types of input, for example configurable bp and its specificities.
References 1. Amiri, M.J.: Object-aware identification of microservices. In: 2018 IEEE International Conference on Services Computing (SCC), pp. 253–256. IEEE (2018) 2. Chen, R., Li, S., Li, Z.: From monolith to microservices: a dataflow-driven approach. In: 201724th Asia-Pacific Software Engineering Conference (APSEC), pp. 466–475. IEEE (2017) 3. Daoud, M., Mezouari, A.E., Faci, N., Benslimane, D., Maamar, Z., Fazziki, A.E.: Automatic microservices identification from a set of business processes. In: Hamlich, M., Bellatreche, L., Mondal, A., Ordonez, C. (eds.) SADASC 2020. CCIS, vol. 1207, pp. 299–315. Springer, Cham (2020). https://doi.org/10.1007/978-3-03045183-7 23 4. Djogic, E., Ribic, S., Donko, D.: Monolithic to microservices redesign of event driven inte-gration platform. In: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1411–1414. IEEE (2018) 5. Escobar, D., et al.: Towards the understanding and evolution of monolithic applications as microservices. In: 2016XLII Latin American Computing Conference (CLEI), pp. 1–11. IEEE (2016) 6. Ferchichi, A., Bourey, J.P., Bigand, M.: Contribution ` a l’integration des processus metier: application a la mise en place d’un referentiel qualite multi-vues. Ph.D. thesis, Ecole Centralede Lille; Ecole Centrale Paris (2008) 7. Indrasiri, K., Siriwardena, P.: Microservices for the Enterprise. Apress, Berkeley (2018) 8. Kherbouche, M.O., Bouneffa, M., Ahmad, A., Basson, H.: Analyse a priori de l’impact duchangement des processus m´etiers. In: INFORSID (2013) 9. Kherbouche, M.O.: Contribution ` a la gestion de l’´evolution des processus m´etiers. Universit´edu Littoral Cˆ ote d’Opale (2013) 10. Ponce, F., M´ arquez, G., Astudillo, H.: Migrating from monolithic architecture to microservices: a rapid review. In: 2019 38th International Conference of the Chilean Computer Science Society (SCCC), pp. 1–7. IEEE (2019)
11. Richardson, C.: Pattern: monolithic architecture. Dosegljivo (2018). https:// microservices.io/pattern-s/monolithic.html 12. Likas, A., Vlassis, N., Verbeek, J.J.: The global k-means clustering algorithm. Pattern Recognit. 36(2), 451–461 (2003)
Toward a Configurable Thing Composition Language for the SIoT Soura Boulaares1(B) , Salma Sassi2 , Djamal Benslimane3 , Zakaria Maamar4 , and Sami Faiz5 1
2
National School for Computer Science, Manouba, Tunisia Faculty of law, Economic, and Management Sciences, Jendouba, Tunisia 3 Claude Bernard Lyon 1 University, Lyon, France 4 Zayed University, Dubai, UAE 5 Higher Institute of Multimedia Arts, Manouba, Tunisia
Abstract. With the advent of the Social Internet-of-Things (SIoT), which builds upon the success of IoT, different IoT devices engage in relationships to achieve common goals calling for a composition language that would dictate who will do what, when, and where. Since the number of service interactions may become overwhelming the composition plan may have many alternatives. For that reason, variability is an important concern for IoT and SIoT. A configurable composition model could provide a consolidated view of multiple variants of composition plans. It promotes the reuse of proven practices by providing analysts with a generic modeling approach from which to derive individual composition plan models. Unfortunately, the scope of existing notations for configurable composition modeling is restricted or even non-existent in the IoT domain. In this paper, we propose a variability awareness IoT service Composition Framework. The new Framework handles both configurable and classical IoT service composition. We represent the new primitives, the configurable composition language, and algorithms of customization. Finally we demonstrate the features of the framework through a proof of concept application.
Keywords: SIoT
1
· Variability · IoT Composition · Configuration
Introduction
Thanks to the latest Information and Communication Technology (ICT) progress, not only humans engage in social relationships, but components like Internet-of-Things (IoT) devices [3] and social Web services [9] do. The former components exemplify the result of blending social computing with IoT leading to what the ICT community refers to as Social Internet-of-Things (SIoT). Unlike traditional business systems, IoT services will require the interaction of billions of services as the number of connected objects (and therefore services) increases c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 488–497, 2022. https://doi.org/10.1007/978-3-030-96308-8_45
rapidly. The new approaches on Social IoT, studied the concept of If This Then That (IFTTT) [1,5,10] the example of application where the service operation are controlled by a unique service. Recently, studies on the SIoT [6,7] focused on the programmatic representation of the inter-thing relationships. The inter-thing relationship framework broadens the social IoT thing-level relationships with service-level relationships that logically and functionally show how the things’ services may tie to build applications.SIoT things [2]are characterised by their complexity, heterogeneity, variety and scalability in multiple dimensions. Moreover, in a real-world situation, one or more of such heterogeneous devices need to be orchestrated in a specific sequence to achieve a specific objective. Taking into consideration that the service composition (SC) [6] enables the interaction between objects. Most of the time, multiple composition plans can represent the same goal using related IoT services that looks the same. As a consequence, a SIoT Composition Framework must facilitate the “Principle of Reuse” for modeling, re-designing the variable composition rather than designing it “from scratch”. The variability has been modeled in the context of configurable Business Process cBP which reflects inclusive representations of multiple variants of an applied BP in a given domain and recently in cBP/IoT [11,13,13]. Their model [8,12] handle commonalities between variants which are captured by mandatory elements that cannot be removed or customized while variability is captured by optional elements and by those that can be instantiated multiple times. The questions are: how is it possible to represent SIoT composition plans, which are similar in many ways and at the same time they are different in other ways, into one single reference model? And how to achieve the customization process of this reference model. Our work focuses on adapting the configurable cBP approach to handle the variability of SIoT composition to support flexibility and re-usability. The rest of the paper is structured as follows. In Sect. 2 we present the related works, in Sect. 3. We represent our atlas+ language, in Sect. 4 we validate our approach by the proposed algorithms. And we conclude our work in the last section. 1.1
Motivation Scenario
Our illustrative scenario concerns the general IoT composition where several composition plans could be selected to establish an IoT service composed from multiple composite services Fig. 1. As illustrated in Fig. 1 the composition plans1..n the variants that represent the same action specification. Each of which handle common and variable parts. The common parts consists of permanent IoT services that are essential for the composition completeness. In the other side, the variable parts are optional IoT services that could be present in some variant and absent in other variants. This large number of possible variants for the same composition plan accentuate the challenges of re-usability, efficiency and quality of composition. For the
Fig. 1. Commonality and variability in IoT CP
sake of reuse we aim to propose a language that could fuse all the possible variant in one composition reference model. This is handled through a composition configuration language.
2
State of the Art
In the field of SIoT/IoT composition and cBP domain many approaches proposed new solution for service relationship representation, composition languages and variability. Khaled et al.[7] proposed an inter-thing relationships programming framework as basis for a distributed programming ecosystem for SIoT services. Authors in [3] proposed a paradigm of a social network of intelligent objects called SIoT to mimic human behavior. If This Then That (IFTTT) [1,5,10] is a web service that allows users to connect to various Internet services by creating rules (called recipes). The World Wide Web Consortium (W3C) Web of Things (WoT) framework [4,14] is an active research area that explores accessing and managing digital representations of objects through a set of web services. These services are based on eventcondition-action rules that involve these virtual representations as proxies for physical entities. Suri et al. [13] proposed an approach to integrate IoT perspective in the BPM domain and to support the development of IoT-Aware CPMs. They provided a novel concepts to enable configurable IoT resource allocation in CPMs. Despite that some works handled variability in IoT while adapting existing approaches for cBP for example none of them has offered a language specifically to IoT domain. As a consequence in the next section we will present our proposition of the configurable IoT language.
3 3.1
Configurable Composition Language: Atlas+ An Overview of the Atlas+ Architecture Components:
Figure 2 depicts Atlas+, which is an extension of the Atlas SIoT framework [7]. The Atlas+ architecture is composed of three levels: the configurable thing service level, the configurable thing relationship level, and the configurable thing recipe level.
Fig. 2. Atlas+ architecture
3.2
Primitives and Operator: The Variation Points
In this section we presents the different variation points. Theses primitives and operator that we propose are an extension of [7]. A) Primitive 1 – Configurable Thing Service:cTS is an abstraction of a service offered by an Atlas thing. A cTS is also a Thing Service TS that has attributes (SpaceId,name,..) and an interface (Input, Output). In our case a cTS (Eq. 1) could be optional i.e. configured then to be activated or deactivated in the composition plan. cT S =< {Attributes, Interf ace} , State ∈ {conf igurable, classic} >
(1)
B) Primitive 2 −Configurable Thing Relationship: cTR is an abstraction of a configurable connection between two or more TSs and cTSs to compose the IoT application. As a classic Atlas relationship our cTR (Eq. 2) is composed from Attributes and an Interface. However its formula is specified differently according to each cTR’s type. cT R =< T Si , cT Sj >
(2)
C) Primitive 3 –Configurable Recipe:cR A cR (Eq. 3) is an abstraction of how different TSs, cTS, TR and cTR are composed to build up a segment of an application. i.e. the IoT app composition plan is a sequence of one or more recipe. Additionally, each recipe is composed of the two members; attributes and interface. cR =< cT Ri , T Rj , cT Sk , T Sh >
(3)
D) Operator – Configurable Evaluate: cEvaluate Accepts either TS/cTSs or TR/cTR and triggers the interface member defined in the corresponding
object (Eq. 4). Each primitive is evaluated approprially; the TS/cTS is evaluated by sending announcements to the thing that offers the service and return the result. The TR/cTR is evaluated by accessing the formula of the interface and evaluating each TS/cTS in the input and returning the result. cEvaluate(prim), whereprim ∈ {T R, T S, cT R, cT S} = accessT S | cT S.interf ace if prim is ∈ {T R, T S, cT R, cT S} cEvaluate T Ss and cT Ss in cT R f orumla if prim is cT R
3.3
(4)
Configurable Thing Relationship Formalisation:cTR
In this section, we present the different types of relationships and the corresponding formula defined in the configurable TR interface member. These relations are extensions of the relations presented in [7] in order to support the variability. A relationship type can be configurable cooperative or configurable competitive. In this paper we discuss only the cooperative family(control/controlled by, drive/drive-by, support/supported-by and extend/extended-By), other types will be treated later. – Configurable Relation 1 - Control/Controlled by: evaluates TSb /cTSb (Eq. 5) if the evaluation of TSa /cTSa logically results in condition C. Condition C reflects either the successful evaluation of TSa /cTSa or the output of the evaluation of cTSb /TSb is mathematically comparable to an input value. The Control/Controlled by relation is said to be configurable if the relation contains in its specification at least one TS service configurable in the action or precondition part. We say that TSa /cTSa controls cTSb /TSb . A Control/Controlled by configurable relationship can be extended to replace T Sa (Eq. 5) (the precondition) by two or more services accumulated by configurable logic connectors cAND, cOR, cXOR accumulated by configurable operators. For the action part of the relation the control can be extended to evaluate two more alternatives. C
cEvaluate | Evaluate(T Sa ) → cEvaluate | Evaluate(T Sb ) Successf ul call f or T Sa interf ace C= T Sa.output ◦ v, ◦ ∈ {=, ≡, >, IG(Z|Y ).
2.3 Adaptive Boosting Technique
Boosting is a popular ensemble learning technique that builds a robust classifier from multiple weak classifiers. Firstly, a training algorithm is selected to build a model from the training set. Another model is then trained to correct the classification errors of the previous model. This iteration continues until the training data is correctly classified or a suitable stopping criterion is met. Adaptive boosting, or AdaBoost, is a technique used for training a boosted classifier, first developed by Freund and Schapire [23]. Assume S = {(x1, y1), ..., (xi, yi), ..., (xn, yn)} represents the training dataset, where yi denotes the class label corresponding to sample xi and yi ∈ Y = {−1, +1}. Let t = {1, ..., T} represent the number of iterations, while ht(x) denotes the weak classifiers trained using the base ML algorithm L. In the AdaBoost implementation, weights are assigned to each sample in S at every iteration. The sample weight D1 and weight update Dt+1 are obtained using:
D1(i) = 1/n, i = 1, 2, ..., n   (4)
Dt+1(i) = (Dt(i)/Zt) exp(−αt yi ht(xi)), i = 1, 2, ..., n   (5)
where Zt is a normalization factor and αt denotes the weight of the classifier ht(x):
Zt = Σ_{i=1}^{n} Dt(i) exp(−αt yi ht(xi))   (6)
αt = (1/2) ln((1 − εt)/εt)   (7)
where εt is the classifier's error rate. The classifier weight αt estimates the importance of ht(x) when computing the final strong classifier. Furthermore, in the AdaBoost algorithm, the examples that are misclassified by ht(x) are given higher weights in iteration t + 1. Meanwhile, the objective of the base classifiers is to minimize the error rate εt:
εt = P(ht(xi) ≠ yi) = Σ_{i=1}^{n} Dt(i) I(ht(xi) ≠ yi)   (8)
Lastly, after the specified number of iterations, the strong classifier is obtained as:
H(x) = sign(Σ_{t=1}^{T} αt ht(x))   (9)
where εt is the classifier’s error rate. The classifier weight αt estimates the importance of ht (x) when computing the final strong classifier. Furthermore, in the AdaBoost algorithm, the examples that are misclassified in ht (x) are given higher weights in t + 1 iteration. Meanwhile, the objective of the base classifiers is to minimize the error rate εt : n εt = P ht (xi ) = yi = Di (i)I ht (xi ) = yi (8) i=1
Lastly, after the specified number of iterations, the strong classifier is obtained as: T (9) H (x) = sign αt ht (x) t=1
Algorithm 1 provides a summary of the steps followed in implementing the AdaBoost classifier.
532
I. D. Mienye et al. Algorithm 1: AdaBoost Algorithm Input: training data = {( , ), … , ( , the number of iterations . Initialize the weight of the sample
:
), … , (
,
)}, base ML algorithm , and
( ) = , = 1,2, … , .
for = 1, … , : 1. Apply base ML algorithm and current weight to fit the weak classifier ℎ ( ) by minimizing the error rate . according to Eq. 7. 2. Compute the classifier weight 3. Update the weight of the samples according to Eq. 5 end for combine the outputs of the weak classifiers using Eq. 9 to obtain the strong classifier. Output: The final strong classifier ( ).
In this study, three boosted classifiers are developed using three base learners L, i.e. LR, decision tree, and SVM. For simplicity, these boosted classifiers are called AdaBoost-LR, AdaBoost-SVM, and AdaBoost-DT, respectively.
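A sketch of how the three boosted classifiers could be assembled with scikit-learn is shown below. This is not the authors' exact configuration; the base estimator is passed positionally (the keyword name differs across scikit-learn versions), and probability=True is needed for the SVM base learner under the default real-valued boosting.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

boosted = {
    "AdaBoost-LR": AdaBoostClassifier(LogisticRegression(max_iter=1000), n_estimators=50),
    "AdaBoost-SVM": AdaBoostClassifier(SVC(kernel="linear", probability=True), n_estimators=50),
    "AdaBoost-DT": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50),
}
# Each model is then fitted on the (complete or reduced) CKD feature set:
# boosted["AdaBoost-DT"].fit(X_train, y_train)
```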
3 Results and Discussion
The CKD prediction models developed in this study include the following: AdaBoost-LR, AdaBoost-SVM, and AdaBoost-DT, and these boosted classifiers are benchmarked against the standard LR, SVM, and DT. Meanwhile, accuracy, precision, sensitivity, and F-measure are employed to effectively present the results and conduct performance comparisons. These metrics are calculated using the following formulas:

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (10)

Sensitivity = TP / (TP + FN)   (11)

Precision = TP / (TP + FP)   (12)

F-measure = (2 × precision × recall) / (precision + recall)   (13)
where TP and TN represent the true positive and true negative predictions, while FP and FN denote the false positive and false negative predictions [24, 25]. In the context of this study, a true positive indicates a correctly predicted CKD sample, and a false positive indicates a NOTCKD sample that is incorrectly classified as CKD. Furthermore, a true negative indicates a correctly classified NOTCKD example, and a false negative indicates an incorrectly classified CKD sample. After applying the IG-based feature selection technique on the CKD dataset containing 24 independent variables, the ranked features are visualized in Fig. 1.
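A short sketch of how Eqs. (10)–(13) can be computed from a confusion matrix is shown below, assuming the labels are encoded as NOTCKD = 0 and CKD = 1 (our assumption, not stated by the authors).

```python
from sklearn.metrics import confusion_matrix

def report(y_true, y_pred):
    """Accuracy, sensitivity, precision and F-measure as in Eqs. (10)-(13)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, precision, f_measure
```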
Fig. 1. IG Feature ranking
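As a hedged illustration of the IG-based ranking shown in Fig. 1, the sketch below uses scikit-learn's mutual information estimator as a stand-in for information gain and applies the threshold at the 'pe' feature described next. The file path and the 'class' column name are hypothetical.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

df = pd.read_csv("chronic_kidney_disease.csv")      # hypothetical preprocessed CKD file
X, y = df.drop(columns=["class"]), df["class"]

ig = pd.Series(mutual_info_classif(X, y), index=X.columns)
ranking = ig.sort_values(ascending=False)           # ranked features, as in Fig. 1

threshold = ranking["pe"]                            # cut-off at the 'pe' feature
selected = ranking[ranking >= threshold].index.tolist()
X_reduced = X[selected]                              # reduced feature set used in Table 3
```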
We set the threshold at feature 'pe', and all the features with information gain lower than it are discarded. Therefore, the reduced feature set contains the following attributes: 'al', 'hemo', 'pcv', 'rc', 'sc', 'bgr', 'sg', 'bu', 'sod', 'wc', 'htn', 'pot', 'age', 'pc', 'dm', 'bp', 'rbc', 'pe'. Meanwhile, to assess the significance of the ranked features, we conducted two experiments. Firstly, the selected algorithms are utilized to build models using the complete feature set, and the various model performances are recorded in Table 2. Secondly, the reduced feature set is employed in training the models and their performance tabulated in Table 3.

Table 2. Performance evaluation of the classifiers using the complete feature set

Algorithm      Accuracy   Precision   Sensitivity   F-Measure
LR             0.940      0.951       0.936         0.943
SVM            0.927      0.934       0.930         0.932
DT             0.931      0.947       0.900         0.922
AdaBoost-LR    0.970      0.978       0.983         0.981
AdaBoost-SVM   0.986      0.991       0.970         0.980
AdaBoost-DT    0.984      0.979       0.970         0.974
The results in Tables 2 and 3 show that the feature selection enhanced the classification performance of the various algorithms. Also, in both experimental conditions, the boosted algorithms outperformed their standard counterparts. For example, using the complete feature set, the decision tree and AdaBoost-DT obtained accuracies of 93.1% and 98.4%, respectively, whereas, when trained with the reduced feature set, the decision tree obtained an accuracy of 94% and the AdaBoost-DT achieved an accuracy of 100%. We can draw two inferences from the experiments; firstly, the information gain based feature selection enhanced the classification performance of the various classifiers. Secondly, the boosted classifiers outperformed their standard versions. Therefore,
Table 3. Performance evaluation of the classifiers using the reduced feature set

Algorithm      Accuracy   Precision   Sensitivity   F-Measure
LR             0.961      0.949       0.975         0.962
SVM            0.947      0.950       0.945         0.947
DT             0.940      0.951       0.930         0.940
AdaBoost-LR    0.992      0.987       0.990         0.988
AdaBoost-SVM   1.000      0.992       1.000         0.996
AdaBoost-DT    1.000      1.000       1.000         1.000
the combination of the IG-based feature selection and AdaBoost algorithm significantly improved CKD prediction. Figure 2 shows the receiver operating characteristic curves (ROC) for the various classifiers to further validate the enhanced performance. The corresponding area under the ROC curve (AUC) values of the classifiers are also shown. The ROC curve is a plot of the true positive rate against the false-positive rate, and it shows the diagnostic ability of a classifier [26]. Meanwhile, the AUC indicates a measure of separability, i.e. it shows a classifier’s capability to distinguish between the CKD and NOTCKD classes. The higher the AUC value, the better the classifier predicts CKD as CKD and NOTCKD as NOTCKD.
Fig. 2. ROC curve of the various classifiers
Figure 2 further demonstrates the performance of the boosted classifiers over the other methods. It is observed that the boosted classifiers obtained better performance compared to the individual classifiers. Notably, the AdaBoost-SVM and AdaBoost-DT got perfect AUC values, which indicates that both models can perfectly distinguish between the sick and healthy patients in the test set. Furthermore, it is observed that the AdaBoost-DT obtained the best performance and is benchmarked against other chronic
kidney disease prediction models in recent literature. The methods include an integrated model which combines random forest (RF) and logistic regression [8], a composite hypercube on iterated random projection (CHIRP) technique [27], an improved extreme gradient boosting (XGBoost) method [28], a cost-sensitive random forest [29], and a linear support vector machine (LSVM) combined with SMOTE [9]. The performance comparison is shown in Table 4.

Table 4. Performance comparison with other methods in recent literature

Reference                Method              Accuracy   Precision   Sensitivity   F-Measure
Qin et al. [8]           RF + LR             0.998      –           0.998         0.998
Khan et al. [27]         CHIRP method        0.997      0.998       0.998         0.998
Ogunleye and Wang [28]   XGBoost             1.000      1.000       1.000         1.000
Mienye and Sun [29]      Cost-sensitive RF   0.986      0.990       1.000         0.995
Chittora et al. [9]      LSVM + SMOTE        0.988      0.966       1.000         0.983
This paper               AdaBoost-DT         1.000      1.000       1.000         1.000
From Table 4, it is observed that the approach proposed in this study achieved excellent performance compared to other state-of-the-art CKD prediction methods in the literature. Furthermore, this research has shown the importance of effective feature learning combined with robust boosted classifiers for the efficient prediction of CKD.
4 Conclusion In this paper, an efficient chronic kidney disease prediction approach is presented. The method employs the information gain technique for feature selection, which ranked the CKD features. The least informative features are discarded, and only the relevant features were used in building the CKD prediction model. Secondly, six machine learning models were developed using the LR, SVM, DT, AdaBoost-LR, AdaBoost-SVM, and AdaBoostDT algorithms. The models were trained using both the complete feature set and the reduced feature set. Meanwhile, the boosted decision tree (AdaBoost-DT) achieved the best performance with a value of 1.000 in all the four performance evaluation metrics, i.e. accuracy, precision, sensitivity, and F-measure. Compared to existing CKD prediction models in the literature, our approach achieved excellent performance. Furthermore, recent advances in deep learning-based image recognition could help build efficient CKD detection models. Therefore, future research could use CKD image data in building ML models since deep learning has shown great potential in medical image feature extraction and classification.
References 1. Forbes, A., Gallagher, H.: Chronic kidney disease in adults: assessment and management. Clin. Med. 20(2), 128–132 (2020). https://doi.org/10.7861/clinmed.cg.20.2
2. Elshahat, S., Cockwell, P., Maxwell, A.P., Griffin, M., O’Brien, T., O’Neill, C.: The impact of chronic kidney disease on developed countries from a health economics perspective: a systematic scoping review. PLoS ONE 15(3), e0230512 (2020). https://doi.org/10.1371/jou rnal.pone.0230512 3. Wilson, S., Mone, P., Jankauskas, S.S., Gambardella, J., Santulli, G.: Chronic kidney disease: definition, updated epidemiology, staging, and mechanisms of increased cardiovascular risk. J. Clin. Hypertens (Greenwich) 23(4), 831–834 (2021). https://doi.org/10.1111/jch.14186 4. Ontiveros-Robles, E., Castillo, O., Melin, P.: Towards asymmetric uncertainty modeling in designing general type-2 Fuzzy classifiers for medical diagnosis. Exp. Syst. Appl. 183, 115370 (2021). https://doi.org/10.1016/j.eswa.2021.115370 5. Melin, P., Sánchez, D.: Optimal design of type-2 fuzzy systems for diabetes classification based on genetic algorithms. Int. J. Hybrid Intell. Syst. 17(1–2), 15–32 (2021). https://doi.org/10.3233/HIS-210004 6. Carvajal, O., Melin, P., Miramontes, I., Prado-Arechiga, G.: Optimal design of a general type-2 fuzzy classifier for the pulse level and its hardware implementation. Eng. Appl. Artif. Intell. 97, 104069 (2021). https://doi.org/10.1016/j.engappai.2020.104069 7. Mienye, I.D., Sun, Y.: Improved heart disease prediction using particle swarm optimization based stacked sparse autoencoder. Electronics 10(19), 2347 (2021). https://doi.org/10.3390/ electronics10192347 8. Qin, J., Chen, L., Liu, Y., Liu, C., Feng, C., Chen, B.: A machine learning methodology for diagnosing chronic kidney disease. IEEE Access 8, 20991–21002 (2020). https://doi.org/10. 1109/ACCESS.2019.2963053 9. Chittora, P., et al.: Prediction of chronic kidney disease - a machine learning perspective. IEEE Access 9, 17312–17334 (2021). https://doi.org/10.1109/ACCESS.2021.3053763 10. Almustafa, K.M.: Prediction of chronic kidney disease using different classification algorithms. Inf. Med. Unlocked 24, 100631 (2021). https://doi.org/10.1016/j.imu.2021.100631 11. Bakshi, G., et al.: An Optimized Approach for Feature Extraction in Multi-Relational Statistical Learning. JSIR 80(06) (2021). http://nopr.niscair.res.in/handle/123456789/57918. Accessed 17 Nov. 2021 12. Singh, H., Rehman, T.B., Gangadhar, C., Anand, R., Sindhwani, N., Babu, M.V.S.: Accuracy detection of coronary artery disease using machine learning algorithms. Appl. Nanosci. (2021). https://doi.org/10.1007/s13204-021-02036-7 13. Mienye, I.D., Ainah, P.K., Emmanuel, I.D., Esenogho, E.: Sparse noise minimization in image classification using genetic algorithm and DenseNet. In: 2021 Conference on Information Communications Technology and Society (ICTAS), pp. 103–108 (2021). https://doi.org/10. 1109/ICTAS50802.2021.9395014. 14. Sindhwani, N., Anand, R., Shukla, M.S,R., Yadav, M., Yadav, V.: Performance analysis of deep neural networks using computer vision. EAI Endorsed Trans. Ind. Netw. Intell. Syst. 8(29) (2021). https://eudl.eu/doi/10.4108/eai.13-10-2021.171318. Accessed 17 Nov 2021 15. Awan, S.E., Bennamoun, M., Sohel, F., Sanfilippo, F.M., Chow, B.J., Dwivedi, G.: Feature selection and transformation by machine learning reduce variable numbers and improve prediction for heart failure readmission or death. PLoS ONE 14(6), e0218760 (2019). https:// doi.org/10.1371/journal.pone.0218760 16. Mienye, I.D., Sun, Y., Wang, Z.: Improved predictive sparse decomposition method with densenet for prediction of lung cancer. Int. J. Comput. 1, 533–541 (2020). https://doi.org/10. 
47839/ijc.19.4.1986 17. Shahraki, A., Abbasi, M., Haugen, Ø.: Boosting algorithms for network intrusion detection: a comparative evaluation of Real AdaBoost, Gentle AdaBoost and modest AdaBoost. Eng. Appl. Artif. Intell. 94, 103770 (2020). https://doi.org/10.1016/j.engappai.2020.103770
18. Mienye, I.D., Sun, Y., Wang, Z.: An improved ensemble learning approach for the prediction of heart disease risk. Inf. Med. Unlocked 20, 100402 (2020). https://doi.org/10.1016/j.imu. 2020.100402 19. UCI Machine Learning Repository: Chronic_Kidney_Disease Data Set. https://archive.ics. uci.edu/ml/datasets/Chronic_Kidney_Disease. Accessed 20 Jul 2021 20. Han, J., Kamber, M., Pei, J.: Data preprocessing. In: Data Mining, pp. 83–124. Elsevier (2012). https://doi.org/10.1016/B978-0-12-381479-1.00003-4 21. Gao, Z., Xu, Y., Meng, F., Qi, F., Lin, Z.: Improved information gain-based feature selection for text categorization. In: 2014 4th International Conference on Wireless Communications, Vehicular Technology, Information Theory and Aerospace Electronic Systems (VITAE) (2014), pp. 1–5. https://doi.org/10.1109/VITAE.2014.6934421. 22. Witten, I.H., Frank, E., Hall, M.A.: Data transformations. In: Data Mining: Practical Machine Learning Tools and Techniques, pp. 305–349. Elsevier (2011). https://doi.org/10.1016/B9780-12-374856-0.00007-9 23. Schapire, R.E.: A brief introduction to boosting. Ijcai 99, 1401–1406 (1999) 24. Aruleba, K., Obaido, G., Ogbuokiri, B., Fadaka, A.O., Klein, A., Adekiya, T.A., Aruleba, R.T.: Applications of computational methods in biomedical breast cancer imaging diagnostics: a review. J. Imag. 6(10), 105 (2020). https://doi.org/10.3390/jimaging6100105 25. Mienye, I.D., Sun, Y., Wang, Z.: Improved sparse autoencoder based artificial neural network approach for prediction of heart disease. Inf. Med. Unlocked 18, 100307 (2020). https://doi. org/10.1016/j.imu.2020.100307 26. Ebiaredoh-Mienye, S.A., Esenogho, E., Swart, T.G.: Integrating enhanced sparse autoencoder-based artificial neural network technique and softmax regression for medical diagnosis. Electronics 9(11), 1963 (2020). https://doi.org/10.3390/electronics9111963 27. Khan, B., Naseem, R., Muhammad, F., Abbas, G., Kim, S.: An empirical evaluation of machine learning techniques for chronic kidney disease prophecy. IEEE Access 8, 55012–55022 (2020). https://doi.org/10.1109/ACCESS.2020.2981689 28. Ogunleye, A., Wang, Q.-G.: XGBoost model for chronic kidney disease diagnosis. IEEE/ACM Trans. Comput. Biol. Bioinf. 17(6), 2131–2140 (2020). https://doi.org/10.1109/TCBB.2019. 2911071 29. Mienye, I.D., Sun, Y.: Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. Inf. Med. Unlocked 25, 100690 (2021). https://doi.org/10. 1016/j.imu.2021.100690
An Adaptive-Backstepping Digital Twin-Based Approach for Bearing Crack Size Identification Using Acoustic Emission Signals Farzin Piltan and Jong-Myon Kim(B) Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 680-749, South Korea [email protected]
Abstract. Bearings are used to reduce inertia in numerous utilizations. Lately, anomaly detection and identification in the bearing using acoustic emission signals has received attention. In this work, the combination of the machine learning and adaptive-backstepping digital twin approach is recommended for bearing anomaly size identification. The proposed adaptive-backstepping digital twin has two main ingredients. First, the acoustic emission signal in healthy conditions is modeled using the fuzzy Gaussian process regression procedure. After that, the acoustic emission signals in unknown conditions are observed using the adaptivebackstepping approach. Furthermore, the combination of adaptive-backstepping digital twin and support vector machine is proposed for the decision-making portion. The Ulsan Industrial Artificial Intelligence (UIAI) Lab dataset is used to test the effectiveness of the proposed scheme. The result shows the accuracy of the fault diagnosis by the proposed adaptive-backstepping digital twin approach is 96.85%. Keywords: Bearing · Backstepping · Fault analysis · Digital twin · Neural network · Fuzzy technique · Support vector machine · Gaussian process regression
1 Introduction
The bearing has many applications in various industries. These components play an important role in reducing friction. Due to the widespread use of these components, it is particularly significant to investigate the faults created in them [1]. In recent years, various methods for fault analysis in bearings have been investigated, which can be divided into the following groups: model-based methods, artificial intelligence approaches, signal-based approaches, and hybrid schemes. Recently, a lot of attention has been paid to hybrid techniques for fault diagnosis. These techniques are able to increase the reliability and accuracy of fault identification with the help of a combination of the previous procedures [2, 3]. Digital twins are one of the emerging techniques that can be used in anomaly detection. These methods provide the possibility of various and much more detailed analyses
by designing a digital model of the system. Digital twins have different parts, but the most considerable section is the modeling unit. Various techniques can be selected for system modeling, including mathematical and data-driven approaches. Recently, many articles represented data-driven methods for system modeling. The basis of most data-driven modeling techniques is regression. Linear regressors can provide acceptable results for modeling stationary signals, but they have many challenges for non-stationary signals such as vibration or acoustic emission (AE) bearing signals [4, 5]. In this work nonlinear regressor is suggested for AE signal modeling the bearing. To solve the challenge of robustness, reliability, and accuracy for bearing anomaly detection, observation approaches are proposed [6]. Estimators can be divided into linear and nonlinear groups. For AE signals due to the high volume of samples per unit time, linear estimators will not have a good result. Thus, nonlinear approaches such as feedback linearization, sliding mode, fuzzy logic, neural network, and backstepping observers are recommended. The main limitations of feedback linearization observers are the system’s model dependency and robustness in unknown and uncertain conditions. Moreover, the chattering phenomenon is the main limitation in sliding mode observers. Furthermore, reliability is the main challenge of fuzzy and neural network observers [7–10]. The application of fuzzy logic approach for fault classification of bearing is presented in [11]. Besides, to improve the reliability of fuzzy technique the global fuzzy entropy with fuzzy based classifier is explained in [12]. In this work the backstepping observer is recommended to improve the reliability and robustness of the signal estimation. In this research, an adaptive-backstepping digital twin is recommended for signal modeling and estimation. The support vector machine (SVM) is recommended for fault decision and signal classification. This work has two contributions: • Design an adaptive-backstepping digital twin, which is a combination of fuzzyGaussian regression for modeling and fuzzy adaptive neuro-backstepping observer for AE signal estimation. So, first, the AE signals are modeled by fuzzy gaussian process technique. After that, the digital twin is designed using the combination of backstepping observer, neural network, and adaptive fuzzy technique. Next, the AE residual signal is generated. • Combination of adaptive-backstepping digital twin and SVM for fault decision. The division of this article is as follows. The dataset is described in Sect. 2. The combination of the proposed adaptive-backstepping digital twin and support vector machine is represented in the Sect. 3. The fourth part has analyzed the results. Finally, in the Sect. 5, the conclusions and future works are presented.
2 Dataset
Figure 1 illustrates the Ulsan Industrial Artificial Intelligence (UIAI) Lab testbed used for simulation of the bearing's faults. This testbed has the following parts: a) a three-phase induction motor, b) a gearbox to transfer the load to the shaft, and c) acoustic emission sensors for data collection. In this paper, 3 mm and 6 mm crack sizes in the bearing are tested. Furthermore, 8 different conditions to test the bearing are introduced: healthy
condition (HC), ball condition (BC), inner condition (IC), outer condition (OC), inner-ball condition (IBC), inner-outer condition (IOC), outer-ball condition (OBC), and inner-outer-ball condition (IOBC). Moreover, the sampling rate used to collect the data is 250 kHz. Besides, the motor rotational speeds are 300 and 500 RPM.
3 Proposed Approach Figure 2 shows the proposed scheme for fault diagnosis in the bearing. It has three main parts: a) adaptive-backstepping digital twin (ADT), b) acoustic emission (AE) residual signal generator, and c) decision making for detecting and identifying the fault. The ADT has two main parts: a) AE bearing signal modeling in healthy condition using the combination of the Gaussian Process Regression (GPR) approach and fuzzy logic (FL) procedure and b) unknown AE signal estimation using the combination of the backstepping (BS) approach, neural network (NN) procedure, and adaptive fuzzy (AF) technique. Thus, ADT is designed using the combination of the fuzzy-GPR for modeling and an adaptive neural-fuzzy backstepping scheme for AE signal estimation. Next, the residual signals are obtained. The fault decision section has two main parts: a) extract the energy feature from the resampled AE residual signals and b) decision making the resampled energy AE residual signals using SVM. 3.1 Adaptive-Backstepping Digital Twin Based on Fig. 2, the proposed ADT has two steps: modeling the normal signal and estimating all signals. To modeling the data in the healthy condition, the combination of the GPR technique and fuzzy logic approach, GF, is recommended in this work. The GPR method is a nonlinear regressor to modeling the AE signals. The state-space definition of the GPR approach is represented using the following equation. QX −G (k + 1) = CG QX −G (k) + δQi Qi (k) + eG (k) (1) QY −G (k) = (δQo )T (xn )CG−1 × QX −G (k) Here, QX −G (k), Qi (k), eG (k), QY −G (k), and δQi , δQo are the state of the AE data modeling for the healthy condition of bearing using GPR approach, the measurable original AE signal in normal condition, the data modeling’s error of the AE signal in normal condition of the bearing using GPR approach, the AE modeled in normal condition using GPR technique, and the coefficient to tuning the signal’s model in normal condition using GPR algorithm, respectively. The covariance matrix to find the GPR approach, CG , is denoted as the following definition. CG = ∅2 e
^{−0.5 Q_{X−G}ᵀ W⁻¹ Q_{X−G}} + ∂   (2)

and W = diag(ε)²   (3)
where ∅ is the variance of the AE signal, ∂ is the noise variance, and ε is the width of the kernel. The modeling error of the AE signal in the normal condition of the bearing using the GPR approach, e_G(k), can be introduced by the following definition:

e_G(k) = Q_{Y−G}(k) − Q_{Y−G}(k − 1)   (4)
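As a hedged illustration of the healthy-signal modeling step, the sketch below fits a Gaussian process regressor to a short segment of healthy AE data using scikit-learn. The file name, the lag length, and the use of an RBF plus white-noise kernel (mirroring the signal variance, kernel width and noise variance in Eqs. (2)–(3)) are our assumptions, not the paper's exact implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

ae = np.load("ae_healthy.npy")                    # hypothetical healthy-condition AE segment
lag = 10                                          # number of past samples used as inputs
X = np.array([ae[i - lag:i] for i in range(lag, len(ae))])
y = ae[lag:]                                      # next-sample target

# exact GPR scales cubically, so fit only a short healthy segment
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

y_hat = gpr.predict(X)
modeling_error = y - y_hat                        # error signal, cf. Eq. (4)
```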
Fig. 1. Experimental setup for bearing data collection: a) record the data, b) acoustic emission data acquisition.
Fig. 2. Adaptive-backstepping digital twin and machine learning for bearing fault diagnosis.
To reduce the error of the signal modeling and reduce the complexity of system modeling in the GPR approach, the combination of the GPR algorithm and fuzzy logic (FL)
approach, FG, is suggested in this work. The fuzzy approach is a nonlinear technique to reduce the effect of uncertain conditions. The state-space AE signal modeling in normal conditions using fuzzy Gaussian Process Regression (FG) is represented as follows. QX −FG (k + 1) = CG QX −FG (k) + δQi Qi (k) + δQf Qf (k) + eFG (k) (5) QY −FG (k) = (δQo )T (xn )CG−1 × QX −FG (k) Here, QX −FG (k), eFG (k), QY −FG (k), Qf (k), and δQf are the state of the AE data modeling for the healthy condition of bearing using the combination of GPR and fuzzy logic approaches, the data modeling’s error of the AE signal in normal condition of the bearing using the combination of GPR and fuzzy logic approaches, the AE modeled in normal condition using the combination of GPR and fuzzy logic approaches, and the coefficient to fuzzy logic part, respectively. The data modeling’s error of the AE signal in normal condition of the bearing, eFG (k), using the combination of GPR approach and fuzzy logic technique can be introduced by the following definition. eFG (k) = QY −FG (k) − QY −FG (k − 1)
(6)
After modeling the AE signals in normal conditions by the combination of GPR approach and fuzzy logic technique, the proposed digital twin should be designed. First, the backstepping approach is suggested to improve the robustness and accuracy of the combination of GPR approach and fuzzy logic technique. Thus, the combination of backstepping approach and fuzzy-GPR, BFG, is represented as the following equation. ⎧ ⎨ QX −BFG (k + 1) = CG QX −BFG (k) + δQi Qi (k) + δQf Qf (k) + δQo ( (7) Q (k) + ωB (k)) + δB ϑB Qi (k) + B (k) ⎩ X −BFG −1 T QY −BFG (k) = (δQo ) (xn )CG × QX −BFG (k) Here, QX −BFG (k), ωB (k), ϑB , δB , QY −BFG (k), and B (k) are the state of the AE signal estimation for unknown conditions of bearing using the combination of fuzzyGPR and backstepping approach, the nonlinear function of the backstepping technique for AE signals, the parameter to calculate the backstepping observer, the coefficient to tuning the parameter in backstepping approach, the output state of the AE signal estimation for unknown conditions of bearing using the combination of fuzzy-GPR and backstepping approach, and the impact of uncertainty estimation in the bearing AE signals using backstepping approach, respectively. To reduce the estimation error in the combination of fuzzy-GPR and backstepping approach, the uncertainties can be estimated using the following definition. B (k + 1) = B (k) + ωB (k) + δB ϑB (Qi (k) − QY −BFG (k))
(8)
Furthermore, the error of the backstepping signal estimation is represented as the following equation:

e_BFG(k) = Q_i(k) − Q_{Y−BFG}(k)   (9)
To improve the power of uncertainty estimation, reduce the error of signal estimation, and improve visualization, the combination of backstepping fuzzy-GPR and an artificial neural
network, NBFG, is suggested. To design an artificial neural network (ANN) approach, a three-layer ANN technique with a nonlinear-based hidden layer and a linear-based output layer is suggested. In this work we selected the tangent hyperbolic and the derivative of the function for activation and derivative of the function, it can be introduced using the following equations.
1 −1 (10) H (x) = 2 1 + ex 1 − H 2 (x) H˙ (x) = 2 Furthermore, the ANN is represented using the following definition. m 1 2 QN (k) = nk 2 −1 + Bk2 . − 1nj ψj +Bn1 1+e n=1
(11)
(12)
Here, QN (k), (1nj , 2nk ), Bn1 , Bk2 , ψj , and m are the estimated signal to improve the accuracy of signal estimation using neural network, the weight of the first and second layers, the acoustic emission measurable signal, and the number of hidden layer neurons, respectively. The state-space artificial neural network-based backstepping fuzzy-GPR, NBFG, is introduced using the following description. ⎧ ⎨ QX −NBFG (k + 1) = CG QX −NBFG (k) + δQi Qi (k) + δQf Qf (k) + δQN QN (k) + δQo ( Q (k) + ωB (k)) + δB ϑB Qi (k) + NB (k) ⎩ X −NBFG QY −NBFG (k) = (δQo )T (xn )CG−1 × QX −NBFG (k) (13) Here, QX −NBFG (k), NB (k), QY −NBFG (k), and δQN are the state of the AE signal estimation for unknown conditions of bearing using the combination of backstepping fuzzy-GPR and ANN approach, the impact of uncertainty estimation in the bearing AE signals using combination of ANN and backstepping approach, the output state of the AE signal estimation for unknown conditions of bearing using the combination of backstepping fuzzy-GPR and ANN approach, and the coefficient, respectively. To reduce the estimation error in the combination of backstepping fuzzy-GPR and ANN approach, the uncertainties can be estimated using the following definition. NB (k + 1) = NB (k) + ωB (k) + δB ϑB (Qi (k) − QY −NBFG (k))
(14)
To improve the robustness and reliability of the NBFG (neural-backstepping fuzzyGPR), the adaptive technique is suggested. Based on (14), δB is one of the main coefficients in the proposed method. Thus, to improve the performance of the proposed technique, adaptive approach is used for online tuning the δB . To optimize the coefficient, the fuzzy adaptive technique is implemented based on the following description. δAB = δB × γ
(15)
where δAB and γ are tunned coefficient using adaptive fuzzy algorithm and the fuzzy output to tuning the coefficient, respectively. The state-space adaptive artificial neural network-based backstepping fuzzy-GPR (adaptive digital twin), ADT, is introduced using the following description. ⎧ ⎨ QX −ADT (k + 1) = CG QX −ADT (k) + δQi Qi (k) + δQf Qf (k) + δQN QN (k) + δQo ( Q (k) + ωB (k)) + δB ϑB Qi (k) + ADT (k) ⎩ X −ADT QY −ADT (k) = (δQo )T (xn )CG−1 × QX −ADT (k) (16) Here, QX −ADT (k), QY −ADT (k), and ADT (k). are the state of the AE signal estimation for unknown conditions of bearing using the proposed adaptive digital twin, the output state of the AE signal estimation for unknown conditions of bearing using the proposed adaptive digital twin, and the impact of uncertainty estimation in the bearing AE signals using the proposed digital twin, respectively. Here, the uncertainties can be estimated using the proposed ADT and represented as the following definition. ADT (k + 1) = ADT (k) + ωB (k) + δAB ϑB (Qi (k) − QY −ADT (k)).
(17)
3.2 Acoustic Emission Residual Signal Generator
The AE residual signal, R_ADT(k), is obtained as the difference between the measurable original AE signal and the signal estimated using the proposed adaptive digital twin:

R_ADT(k) = Q_i(k) − Q_{Y−ADT}(k)
(18)
3.3 Fault Decision
After calculating the residual signal, the signals are characterized for training and testing. Table 1 illustrates the window characterization for training and testing of the residual signal.

Table 1. Window characterization for dataset for 8 conditions.

Classes                     8 Classes
Samples/Classes             400
Training samples/Classes    300
Testing samples/Classes     100
Moreover, the energy signal, E_ADT(k), is extracted from the AE bearing signals and represented as follows:

E_ADT(k) = Σ_{i=1}^{K} R_{ADTi}(k)²   (19)
Here, RADTi (k) and K are the window residual signal and the number of windows, respectively. Finally, the SVM is suggested for fault diagnosis and crack size identification.
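A minimal sketch of the fault-decision stage is shown below: the residual signal of each condition is split into windows, the energy of each window (Eq. 19) is used as the feature, and an SVM is trained on the resulting samples. The `residuals` dictionary, the window length, and the RBF kernel are our assumptions; the 300/100 split per class follows Table 1.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def window_energies(residual, window_len):
    """Energy of each residual window, cf. Eq. (19)."""
    n = len(residual) // window_len
    return [float(np.sum(residual[i * window_len:(i + 1) * window_len] ** 2))
            for i in range(n)]

# residuals: condition label -> residual signal from the adaptive digital twin (assumed)
X, y = [], []
for label, signal in residuals.items():
    for e in window_energies(signal, window_len=2500):
        X.append([e])
        y.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75, stratify=y)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("classification accuracy:", clf.score(X_te, y_te))
```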
4 Results Figure 3 shows the error of normal signal modeling using the proposed fuzzy-Gaussian process regression (FGPR) and Gaussian process regression (GPR). Based on this figure, the error of the proposed FGPR is less than GPR approach. Figure 4 illustrates the residual signal using the proposed ADT. Based on this figure, the visibility of the residual signals for all conditions based on the proposed ADT is excellent.
Fig. 3. Error of AE signal modeling using proposed fuzzy-GPR and GPR approaches.
Fig. 4. Residual signal in all conditions using the proposed adaptive digital twin.
Moreover, Fig. 5 illustrates the energy of residual signals for all conditions. Based on this figure, it is clear, the level of energy based on the proposed method is separated for different conditions. To validate the effectiveness of the proposed method, the proposed ADT is compared with the neural-backstepping fuzzy-GPR (NBFG) technique.
Figures 6(a) and 6(b) illustrate the fault diagnosis results of the proposed method and NBFG when the motor speed is 300 RPM and 500 RPM, respectively. Based on these figures, the average accuracy of the proposed ADT is 96.505% and 97.19% at 300 RPM and 500 RPM, respectively, while the average accuracy of the NBFG technique is 88.76% and 89.44%. Thus, the proposed ADT outperforms the NBFG technique, yielding accuracy improvements of 7.745% and 7.75% for the 3 mm and 6 mm crack sizes.
Fig. 5. Energy of resampled residual signal monitoring using the proposed adaptive digital twin.
Fig. 6. Crack-variant fault diagnosis techniques: a) torque speed is 300 RPM, and b) torque speed is 500 RPM.
5 Conclusion
In this work, the combination of an adaptive-backstepping digital twin and machine learning was proposed for bearing crack size identification under varying torque load and speed. In the first step, the AE signal in the normal condition was modeled using the fuzzy-Gaussian process regression (FGPR) approach. Next, the adaptive-backstepping digital twin was developed using the combination of FGPR and an adaptive neural backstepping observer to estimate the AE signals in normal and abnormal states. After developing the adaptive-backstepping digital twin, the residual signals were generated by the difference between
original and estimated AE signals. Finally, in the fault decision part, the energy feature was extracted from residual signals and support vector machine is suggested for fault classification. The UIAI Lab bearing dataset is selected to test the effectiveness of the proposed adaptive-backstepping digital twin. The result shows the accuracy of the fault diagnosis by the proposed adaptive-backstepping digital twin approach is 96.85%. In the future work, the robust backstepping approach will be developed to improve the robustness of adaptive-backstepping digital twin. Acknowledgements. This work was supported by the Korea Technology and Information Promotion Agency(TIPA) grant funded by the Korea government(SMEs) (No. S3126818).
References 1. Neupane, D., Seok, J.: Bearing fault detection and diagnosis using case western reserve university dataset with deep learning approaches: a review. IEEE Access 8, 93155–93178 (2020) 2. AlShorman, O., Irfan, M., Nordin Saad, D., Zhen, N.H., Glowacz, A., AlShorman, A.: A review of artificial intelligence methods for condition monitoring and fault diagnosis of rolling element bearings for induction motor. Shock Vibrat. 2020, 1–20 (2020) 3. Liu, Z., Zhang, L.: A review of failure modes, condition monitoring and fault diagnosis methods for large-scale wind turbine bearings. Measurement 149, 107002 (2020) 4. Xia, M., et al.: Intelligent fault diagnosis of machinery using digital twin-assisted deep transfer learning. Reliabil. Eng. Syst. Saf. 215, 107938 (2021) 5. Guo, K., Wan, X., Liu, L., Gao, Z., Yang, M.: Fault diagnosis of intelligent production line based on digital twin and improved random forest. Appl. Sci. 11(16), 7733 (2021) 6. Piltan, F., Kim, J.-M.: Crack size identification for bearings using an adaptive digital twin. Sensors 21(15), 5009 (2021) 7. Zaki, A.A., Diab, A.-H., Al-Sayed, H.H., Mohammed, A., Mohammed, Y.S.: Literature review of induction motor drives. In: Development of Adaptive Speed Observers for Induction Machine System Stabilization. SECE, pp. 7–18. Springer, Singapore (2020). https://doi.org/ 10.1007/978-981-15-2298-7_2 8. Ontiveros-Robles, E., Castillo, O., Melin, P.: Towards asymmetric uncertainty modeling in designing General Type-2 Fuzzy classifiers for medical diagnosis. Exp. Syst. Appl. 183, 115370 (2021) 9. Ontiveros, E., Melin, P., Castillo, O.: Designing hybrid classifiers based on general type-2 fuzzy logic and support vector machines. Soft. Comput. 24(23), 18009–18019 (2020) 10. Ontiveros-Robles, E., Melin, P.: A hybrid design of shadowed type-2 fuzzy inference systems applied in diagnosis problems. Eng. Appl. Artif. Intell. 86, 43–55 (2019) 11. Wenhua, D., et al.: A new fuzzy logic classifier based on multiscale permutation entropy and its application in bearing fault diagnosis. Entropy 22(1), 27 (2019) 12. Ziying, Z., Xi, Z.: A new bearing fault diagnosis method based on refined composite multiscale global fuzzy entropy and self-organizing fuzzy logic classifier. Shock Vibrat. 2021, 1–11 (2021)
Implementation-Oriented Feature Selection in UNSW-NB15 Intrusion Detection Dataset
Mohammed M. Alani(B)
School of Information Technology Administration and Security, Seneca College of Applied Arts and Technology, Toronto, ON, Canada [email protected]

Abstract. With a daily global average of 30,000 websites breached in 2021, security challenges grow in difficulty, complexity, and importance. Since its publication, the UNSW-NB15 dataset has been used in many machine-learning and statistics-based intrusion detection solutions. It provides over 2.5 million instances of benign and malicious network flow captures. In this paper, we present an implementation-oriented feature selection that reduces the number of features while maintaining high accuracy. The proposed reduction resulted in a dataset with 5 features that are focused on making machine learning models more implementable, practical, and efficient. Testing showed that the reduced dataset maintained an accuracy of 99% with a testing time reduction of up to 84%.

Keywords: Intrusion detection · Dataset · IoT · Malware

1
Introduction
The dataset UNSW-NB15 was introduced in 2015 in [1]. The dataset contains 2,540,044 instances of malicious and benign network flows. Table 1 shows a list of the different attack categories captured in this dataset along with the number of instances captured for each category.

Table 1. Types of attacks captured in the UNSW-NB15 dataset

Attack category   Number of network flows
Benign            2,218,761
Fuzzers           24,246
Reconnaissance    12,228
Shellcode         1,511
Analysis          2,677
Backdoors         2,329
DoS               16,353
Exploits          44,525
Generic           215,481
Reconnaissance    1,759
Worms             174
UNSW-NB15 dataset's raw network packets were about 100 GB in size and were captured as pcap files. The dataset was created by extracting 49 features from the raw packets to produce the 2,540,044 instances. A detailed list of the features of the dataset can be found in [1]. In this paper, we present an implementation-oriented feature selection that produces a smaller number of features while maintaining high accuracy and supporting implementability. The proposed feature reduction produced an implementation-oriented dataset that can be used in building machine-learning classification models that are easier to deploy. The work focuses on selecting features that are easy to extract, such that real-life deployments of the trained models can be realized. Although the dataset captures data for 10 different types of attacks, our work is focused on detecting these attacks without identifying a specific attack type. We found that this approach provides higher accuracy. Identifying the specific attack type is beyond the scope of this work.
2
Previous Works
Since its introduction in 2015, UNSW-NB15 dataset was referenced in over 1000 papers according to Google Scholar[2]. However, only few of these papers targeted feature reduction. In 2015, the same author of the data set, Moustafa et al.[3], presented a paper discussing the features they considered significant in UNSW-NB15. The study focused on replicating the features of UNSW-NB15 dataset into KDD99[4] to measure their efficiency. The research applies an Association Rule Mining algorithm as a method of feature selection to generate the strongest features in both datasets. The study concluded that although UNSW-NB15’s features were more efficient, the accuracy of KDD99 was higher. In 2017, Janarthanan and Zargari published a paper discussing feature selection in KDD99 and UNSW-NB15 [5]. The paper concluded that five features are considered significant and can lead to high accuracy. Those features are service, sbytes, sttl, smean, and ct dst sport ltm. However, the paper did not go into details of how these selections were made. The proposed reduction of the dataset led to an accuracy of 81.6%. Another study comparing UNSW-NB15 to KDD99 was presented in 2020 by Al-Dewari et al. in [6]. The study employed rough-set theory to calculate dependency ratio between features. Then, the features were fed into a back-propagation neural network, and feature selection was performed afterwards based on the performance of these features in the neural network classifier. The study looked into selecting different features for each category of attack to achieve higher accuracy with number of features ranging from 12 to 29 features. However, the study could not come up with a unified version of the dataset that can be used to train IDS system to capture all of the attacks.
In 2019, Khan et al. proposed performance improvements to machine learning models using feature selection [7]. The proposed method also employs feature importance to select the highest-ranking features to improve accuracy and reduce prediction time. The proposed subset of the dataset includes 11 features only and resulted in an accuracy of 71 to 75% depending on the classifier type. Kanimozhi and Jacob proposed, in 2019, feature selection based on feature importance [8]. The paper proposed the use of an ensemble classifier of random forest and decision tree to reduce the number of features to 4. After measuring the feature importance, the four features with the highest importance were selected and used to train a deep learning network based classifier. The classifier performed very well in detecting benign traffic. However, it performed with lower accuracy in detecting the malicious class, with around a 15% false-negative rate. In 2020, Kasongo and Sun published a paper that proposes a filter-based feature reduction technique using the XGBoost algorithm [9]. The proposed method calculates feature importance using the XGBoost algorithm and ranks features accordingly. The research concluded that 19 features are considered "optimal" and were observed to produce an accuracy ranging between 60.89% and 90.85% depending on the training classifier type. Almomani published, in 2020, a research paper proposing a feature selection model for UNSW-NB15 based on Particle Swarm Optimization (PSO), Grey Wolf Optimizer (GWO), Firefly Optimization (FFA) and Genetic Algorithm (GA) [10]. The proposed model deploys filtering-based methods for the Mutual Information (MI) of the GA, PSO, GWO and FFA algorithms that produced 13 sets of rules. Based on the experiment, Rule 13 reduces the features to 30 features, and Rule 12 reduces the features to 13 features. The study concluded that intrusion detection systems with fewer features will have higher accuracy. In 2021, Moualla et al. presented a paper introducing a scalable multiclass machine learning-based network IDS [11]. The proposed system is composed of several stages, starting from the Synthetic Minority Oversampling Technique (SMOTE) method to solve the imbalanced classes problem in the dataset; it then selects the important features for each class in the dataset by the Gini Impurity criterion using the Extremely Randomized Trees Classifier (Extra Trees Classifier). The number of selected features was different for each attack class. Results varied from an accuracy of 0.82 to 0.99 in some classes. Kumar and Akthar presented, in 2021, a paper discussing efficient feature selection based on decision trees [12]. The proposed feature selection resulted in the selection of 20 features, and provided an accuracy of 0.9734.
3
Preprocessing
The preprocessing, training, and testing in this research was conducted on a computer with the following specifications: • Processor: AMD Ryzen 5 3600 (4.2 GHz) • RAM: 32GB • Operating System: Windows 10 Pro
• Python: 3.8.5
• Sci-Kit Learn: 0.24.2

Upon detailed examination of the dataset, we found certain anomalies that we wanted to address at the preprocessing phase. These findings are listed below.
• The number of records in each specific attack type was highly imbalanced, as shown in Table 1.
• Over one million records contained missing values.
• Only 12% of the instances in the dataset are labelled "Malicious".
• The dataset contains many non-numerical features, such as IP addresses, transaction protocol, service, and state.
• Some port number fields were filled with hexadecimal values.
We decided to address these findings with the preprocessing steps shown in Algorithm 1.

Algorithm 1. Dataset Preprocessing
Input: Original Dataset (2,540,044 instances, 49 features)
Output: Balanced Dataset with no missing data (89,533 instances, 48 features)
  Array ← RawDataset
  In (Array) remove attack_cat feature
  for instance ∈ Array do
    if feature ∈ instance is empty then
      Remove instance
    end if
  end for
  Array ← RandomUndersampling(Array)
  convert srcip, dstip to decimal
  label-encode proto, state, service
  for instance ∈ Array do
    if sport, dsport ∈ instance is hexa then
      Convert sport, dsport to decimal
    end if
  end for
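A pandas sketch of the preprocessing steps in Algorithm 1 is given below. The CSV file name and the exact column names ("label", "attack_cat", "srcip", etc.) are assumptions based on the dataset's feature list, not the authors' scripts, and the undersampling count follows the numbers reported later in this section.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("UNSW-NB15.csv")                       # hypothetical merged dataset file
df = df.drop(columns=["attack_cat"])                    # keep only the binary label
df = df.dropna()                                        # remove instances with missing values

# random undersampling of the benign majority class
benign = df[df["label"] == 0].sample(n=67318, random_state=0)
malicious = df[df["label"] == 1]
df = pd.concat([benign, malicious])

# convert dotted-decimal IP addresses to integers
ip_to_int = lambda ip: int.from_bytes(bytes([int(o) for o in ip.split(".")]), "big")
for col in ["srcip", "dstip"]:
    df[col] = df[col].map(ip_to_int)

# label-encode the categorical features
for col in ["proto", "state", "service"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# some port fields hold hexadecimal strings such as "0x20205321"
to_port = lambda v: int(v, 16) if str(v).startswith("0x") else int(float(v))
for col in ["sport", "dsport"]:
    df[col] = df[col].map(to_port)
```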
In the first step of preprocessing, shown in Algorithm 1, we decided to remove the specific attack label because we noticed very large disparity between the number of packet flows in each attack. As shown in Table 1, some classes formed as little as 0.000069% while others formed 87.35%. This kind of severe imbalance could cause very poor performance in detecting the under-represented minority attack types. Hence, we decided to remove the specific attack labels and use the “malicious” and “benign” general labels instead to create a binary classifier. The next step was to remove instances with one or more empty features. This resulted in a reduction of the total number of instances to 1,087,202, of which 1,064,987 labelled “benign”, and only 22,215 labelled “malicious”. To address this significant imbalance in the dataset, we performed random undersampling
of the majority class down to 67,318 instances. This brought the total number of instances to 89,533. In the following steps, we performed label encoding on the features named proto, state, and service, as their values were not numerical in the original dataset. At the end of the preprocessing phase, the dataset included 89,533 instances (67,318 benign, and 22,215 malicious), with 48 features in each instance.
4
Proposed Feature Selection
The direction of this research is focused on generating a sub-dataset from UNSW-NB15 that has a smaller number of features, while maintaining high accuracy when used to train machine learning models. This dictates that the selected features have to be explainable, and can be easily acquired during real-life data acquisition. This approach rules out certain statistical dimensionality reduction algorithms such as Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and Linear Discriminant Analysis (LDA). The method that we used to reduce the number of features in this research was successive feature reduction using feature importance. The steps followed are shown in Algorithm 2.

Algorithm 2. Successive Feature Reduction Using Feature Importance
Input: Dataset with 48 features
Output: Dataset with 5 features
  Array ← Dataset
  model ← RandomForestClassifier
  TargetFeatures ← 5
  while Features(Dataset) > TargetFeatures do
    RandomSplit(Array) → Train_Array, Test_Array
    train model with Train_Array
    importance ← FeatureImportance(model)
    i ← index of feature with lowest importance
    Array.DeleteFeature(i)
  end while
  Dataset ← Array

As shown in Algorithm 2, we created a Random Forest classifier model and trained it with 75% of the dataset records, and tested it with the remaining 25% of the records. Then, the feature importance is calculated for each feature. The feature with the lowest feature importance is then removed and another cycle of training and testing is done, and so on. Based on this reduction method, the number of features that need to be captured in live deployments is reduced, and not only the dimensionality of the data input to the system. This enables more efficient data acquisition, training, testing, and more agile real-life deployment. Several other papers, as explained in Sect. 2, relied on feature importance as well to select the features with the highest importance. In our research, we did not follow the
same method of selecting the features with the highest importance. Instead, our proposed method relies on repetitive elimination of the feature with the lowest importance, and re-training the model again. We proposed this method because the features might be correlated and can be impacting each other's importance. This means that the importance of one feature might be impacted by the existence (and therefore the elimination) of another feature. That is why we re-train and re-calculate the importance after the elimination of each feature.
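A compact sketch of Algorithm 2's elimination loop in scikit-learn is shown below; it assumes the preprocessed data is a pandas DataFrame X with a label series y, and the split ratio and forest size are illustrative rather than the authors' exact settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def successive_reduction(X, y, target=5):
    """Repeatedly drop the feature with the lowest importance and retrain (cf. Algorithm 2)."""
    features = list(X.columns)
    while len(features) > target:
        X_tr, X_te, y_tr, y_te = train_test_split(X[features], y, train_size=0.75)
        model = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
        worst = features[int(np.argmin(model.feature_importances_))]
        features.remove(worst)            # eliminate the least important feature, then retrain
    return features
```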
5
Implementation and Results
The proposed algorithm was implemented in Python using the Sci-Kit Learn package. We created two groups of four machine-learning models using the algorithms listed below.
• Random Forest (RF)
• Logistic Regression (LR)
• Decision Tree (DT), and
• Gaussian Naive-Bayes (GNB)
The first group of classifiers was trained and tested prior to the feature selection and reduction process, while the second group of models was trained and tested with the feature-reduced dataset. The consecutive elimination of the feature with the lowest importance resulted in the selection of five final features: srcip, sttl, Dload, dmeansz, and ct_state_ttl. Fig. 1 shows the change in the average f1-score with the reduction of features. As shown in Fig. 1, the f1-score remained within the 99% region even when the number of features was reduced to five. We did not proceed with further reduction because the accuracy and the f1-score were noticeably impacted by further reduction.
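The evaluation of the two groups of models could look like the sketch below, which also records the per-instance testing time reported later in Table 4. The model hyperparameters are defaults and the split arrays are assumed to exist from the preprocessing step; this is not the authors' exact benchmarking code.

```python
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

models = {
    "RF": RandomForestClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "GNB": GaussianNB(),
}

def evaluate(X_tr, X_te, y_tr, y_te):
    """Fit each classifier and report weighted f1, training time and per-instance test time."""
    for name, model in models.items():
        t0 = time.perf_counter()
        model.fit(X_tr, y_tr)
        train_time = time.perf_counter() - t0
        t0 = time.perf_counter()
        pred = model.predict(X_te)
        test_time = (time.perf_counter() - t0) / len(X_te)
        print(name, f1_score(y_te, pred, average="weighted"), train_time, test_time)
```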
Fig. 1. Average f1-score change with feature reduction
Table 2. Machine-learning models performance before and after reduction

         48 features                               5 features
Model    Accuracy   f1-score   FP       FN        Accuracy   f1-score   FP       FN
RF       0.9978     0.9978     0.0028   0.0001    0.9969     0.9969     0.0027   0.0041
LR       0.9966     0.9967     0.0044   0.0000    0.8974     0.9019     0.1250   0.0337
DT       0.9965     0.9965     0.0022   0.0074    0.9965     0.9965     0.0028   0.0056
GNB      0.9964     0.9964     0.0042   0.0014    0.9965     0.9965     0.0042   0.0012
Table 3. Feature importance of the selected features

Feature        Importance
srcip          0.3746
sttl           0.2784
Dload          0.0641
dmeansz        0.0037
ct_state_ttl   0.2791
To have a clearer view of the impact of reduction on accuracy, we created Table 2 to compare the system performance before and after reduction. The f1-score mentioned in the table is the weighted average f1-score, reported along with the False-Positive (FP) and False-Negative (FN) performance measures. By examining Table 2, we can see that only Logistic Regression suffered noticeable performance degradation after the feature reduction, while the other algorithms maintained similar performance measures. This shows that the reduction was successful in achieving its first goal, which is maintaining high accuracy with a smaller number of features. Table 3 shows the feature importance of the selected 5 features after feature reduction. By examining Table 3, we can see that the srcip address carries the highest importance along with sttl and ct_state_ttl, while dmeansz and Dload are on the lower side.
6
Implementation Consideration
As one of our research goals was to improve implementability of the models trained with the UNSW-NB15 dataset, while maintaining accuracy, we focused on two points; making sure that the selected features are easy to extract from real-life packet captures, and improving efficiency of the models, without sacrificing accuracy.
Table 4. Timing parameters of machine-learning models

         48 features                             5 features
Model    Training time (s)   Testing time (µs)   Training time (s)   Testing time (µs)
RF       4.1470              4.647               1.520               4.199
LR       0.6388              0.0893              0.1067              0.0892
DT       0.2785              0.0653              0.312               0.0402
GNB      0.0556              0.5362              0.0107              0.1340
As we examined the ease of acquisition of the five selected features, we reached the following conclusions:
• srcip: The source IP address is one of the easiest features to extract. It can be extracted from a single packet without the need to collect the rest of the flow packets.
• sttl: The source-to-destination time-to-live value. This value is calculated as the average TTL value of the flow's IP packets.
• Dload: The dataset definition of this feature is 'Destination bits per second'. In general, this feature counts the bit-rate of data transmitted from the destination. This requires capturing the full packet flow to be able to calculate it accurately.
• dmeansz: The mean size of packets transmitted by the destination host. This can be calculated based on the complete flow information. Hence, it requires capturing the full flow of packets.
• ct_state_ttl: This feature is calculated by finding the mean number of each state's occurrence within the packet flow. The value of this feature is highly dependent on the values of the features state, sttl, and dttl. Hence, it also needs information collected from the whole packet flow after capturing.
As mentioned in the list above, 4 out of 5 of the selected features require capturing the whole packet flow to be able to calculate them (see the extraction sketch below). In terms of implementation, this can either be implemented on a network-border device, such as a proxy server or a firewall, or in a host-based model. If it is to be implemented in a host-based model, further study of the memory and storage requirements of deploying such a machine-learning-based model on an IoT device is needed. This is beyond the scope of this study. Border devices can have the memory, processing, and storage requirements to capture packet flows and feed them into the model for predictions. In terms of efficiency, Table 4 shows the timing parameters for all machine learning models with the full 48-feature dataset and with the reduced 5-feature dataset. The training time is for the full training subset of the dataset, while the testing time is calculated per single instance. According to Table 4, with the exception of Decision Tree, all algorithms achieve a noticeable reduction in training time that ranged between 64%–84%, while DT witnessed an increase of 12%.
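To make the acquisition requirements above concrete, the following sketch computes most of the selected features from one captured flow. The packet-record format (a list of dicts with direction, size and TTL fields) is an assumed capture representation, not the paper's tooling, and ct_state_ttl is omitted because it additionally requires the per-packet state field aggregated over the flow.

```python
from statistics import mean

def flow_features(packets, flow_duration):
    """Selected features from one packet flow; `packets` is an assumed capture format."""
    src = [p for p in packets if p["direction"] == "src->dst"]
    dst = [p for p in packets if p["direction"] == "dst->src"]
    return {
        "srcip": packets[0]["srcip"],                                    # from the first packet
        "sttl": mean(p["ttl"] for p in src) if src else 0,               # mean src->dst TTL
        "Dload": 8 * sum(p["size"] for p in dst) / flow_duration,        # destination bits/s
        "dmeansz": mean(p["size"] for p in dst) if dst else 0,           # mean dst packet size
    }
```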
Fig. 2. Change of training time before and after feature reduction
Fig. 3. Change of testing time before and after feature reduction
On the other hand, testing time witnessed a reduction ranging between 8%– 77%, in exception of Logistic Regression that witnessed an increase of 5% in testing time per instance. Figures 2 and 3 show the impact of feature reduction on training time, and testing time, respectively.
7
Proposed System Comparison
A brief comparison with previous works is presented in Table 5. The comparison included papers [7–9] as these papers used a feature selection method that employs feature importance, which is the closest to what we are proposing. As can be seen in the table, our proposed system provided higher accuracy with 3 out of 4 algorithms used. In terms of accuracy, our Random Forest model achieved the highest accuracy with 99.69%. With regards to training and testing time, it was not possible to reach a conclusive answer, as [8, 9] did not include timing parameters, while [7] did not specify whether the timing mentioned is for a single instance or for all of the testing subset.
Table 5. Comparison of proposed system with related works

Paper      Features   Classifier          Accuracy (%)                         Training time (s)                  Testing time (s)
[7]        11         RF XGB BME DT KNN   74.875 71.437 74.641 74.227 71.103   3.656 109.946 9.817 2.753 25.189   0.532 6.331 1.144 0.06 118.457
[8]        4          MLP                 89                                   –                                  –
[12]       20         DT                  97.34                                –                                  –
[9]        19         ANN LR kNN SVM DT   84.39 77.64 84.46 60.89 90.85        – – – – –                          – – – – –
Our work   5          RF LR DT GNB        99.69 89.74 99.65 99.65              4.1470 0.6388 0.3120 0.0107        4.199µ 0.0892µ 0.0402µ 0.1340µ
8
Conclusions and Future Work
In this paper, we proposed a feature-reduced version of the UNSW-NB15 dataset. This reduced version of the cleaned and preprocessed dataset produced superior training and testing speed along with an accuracy exceeding 99% in the detection of attacks. Our proposed feature reduction was based on successive elimination of features according to feature importance, measured by training a Random Forest model. The proposed implementation-oriented reduction of features produced a dataset of 89,533 instances (67,318 benign and 22,215 malicious) with 5 features only, compared to the original 48-feature version. Testing demonstrated that the high accuracy did not deteriorate after the elimination of 43 features. It also showed a noticeable reduction in training time that exceeded 60% in most algorithms used, while the testing time witnessed a reduction ranging between 8%–77% in most algorithms. Future directions in our research are listed below:
1. Deploying the trained models on border firewalls and measuring their performance.
2. Deploying the trained models on IoT devices and measuring their processing requirements and performance.
3. Further comparisons with the original dataset (UNSW-NB15) using deep neural networks.
References
1. Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6. IEEE (2015)
2. UNSW-NB15 - Google Scholar (2021). Accessed 30 Sept 2021. https://scholar.google.com/scholar?q=unsw-nb15
3. Moustafa, N., Slay, J.: The significant features of the UNSW-NB15 and the KDD99 data sets for network intrusion detection systems. In: 2015 4th International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), pp. 25–31. IEEE (2015)
4. Elkan, C.: Results of the KDD'99 classifier learning. ACM SIGKDD Explorat. Newslett. 1(2), 63–64 (2000)
5. Janarthanan, T., Zargari, S.: Feature selection in UNSW-NB15 and KDDCUP'99 datasets. In: 2017 IEEE 26th International Symposium on Industrial Electronics (ISIE), pp. 1881–1886. IEEE (2017)
6. Al-Daweri, M.S., Zainol Ariffin, K.A., Abdullah, S., et al.: An analysis of the KDD99 and UNSW-NB15 datasets for the intrusion detection system. Symmetry 12(10), 1666 (2020)
7. Khan, N.M., Nalina Madhav, C., Negi, A., Thaseen, I.S.: Analysis on improving the performance of machine learning models using feature selection technique. In: Abraham, A., Cherukuri, A., Melin, P., Gandhi, N. (eds.) Intelligent Systems Design and Applications, pp. 69–77. Springer, Cham, Switzerland (2019). https://doi.org/10.1007/978-3-030-16660-1_7
8. Kanimozhi, V., Jacob, P.: UNSW-NB15 dataset feature selection and network intrusion detection using deep learning. Int. J. Recent Technol. Eng. 7(5S2), 443–446 (2019)
9. Kasongo, S.M., Sun, Y.: Performance analysis of intrusion detection systems using a feature selection method on the UNSW-NB15 dataset. J. Big Data 7(1), 1–20 (2020)
10. Almomani, O.: A feature selection model for network intrusion detection system based on PSO, GWO, FFA and GA algorithms. Symmetry 12(6), 1046 (2020)
11. Moualla, S., Khorzom, K., Jafar, A.: Improving the performance of machine learning-based network intrusion detection systems on the UNSW-NB15 dataset. Comput. Intell. Neurosci. 2021 (2021)
12. Suresh Kumar, P., Akthar, S.: Building an efficient feature selection for intrusion detection system on UNSW-NB15. In: Jyothi, S., Mamatha, D.M., Zhang, Y.-D., Raju, K.S. (eds.) Proceedings of the 2nd International Conference on Computational and Bio Engineering, pp. 641–649 (2021). https://doi.org/10.1007/978-981-16-1941-0_64
Augmented Reality SDK's: A Comparative Study
El Mostafa Bourhim1(B) and Aziz Akhiate2
1 EMISYS: Energetic, Mechanic and Industrial Systems, Engineering 3S Research Center, Industrial Engineering Department, Mohammadia School of Engineers, Mohammed V University, Rabat, Morocco
[email protected]
2 Artificial Intelligence and Complex Systems Engineering (AICSE), Hassan II University of Casablanca, Ecole Nationale Supérieure Des Arts Et Des Métiers, ENSAM, Casablanca, Morocco
Abstract. Augmented Reality (AR) is one of the most exciting and rapidly developing technologies of recent years. AR technology is utilized by a variety of devices and platforms, each with its own set of specifications and software development kits (SDKs). The major benefit of SDKs is that they allow developers to apply typical and time-tested techniques and shortcuts in development, rather than tackling all generic and typical problems individually and wasting time on them. This paper provides a study of various SDKs, including their advantages and disadvantages. Four different AR SDKs were considered: Vuforia, Wikitude, ARToolKit and Kudan. The study compares the AR SDKs in terms of licence, supported platforms, cloud recognition, geolocation, SLAM (Simultaneous Localization and Mapping), and other additional features. The comparative study has shown that none of the discussed AR SDKs can satisfy all the criteria, because each framework is suitable for a particular application as well as its own area of implementation. Keywords: Augmented reality · AR SDK's · Vuforia SDK · Wikitude SDK · ARToolKit SDK · Kudan SDK
1 Introduction
The term "AR" refers to a new technology that allows the real-time combining of digital data processed by a computer with data from the actual environment via appropriate computer interfaces. AR is a complete information technology that includes image computing, computer design, artificial intelligence, multimedia and other areas. Simply defined, AR employs computer-aided visuals to provide an additional layer of information to enhance comprehension of and/or interaction with the real world. The most widely recognized definition of AR [1] is based on: creating a virtual image on top of a real image, allowing real-time interaction, and blending 3D (or 2D) virtual things with real ones in a seamless manner. The AR interface makes the implicit explicit, which implies
that information that is implicitly connected with a context is rendered usable and immediately accessible. AR mixes real and virtual things and is both interactive and 3D-registered. A special SDK must be used to integrate AR into a mobile application. These SDKs are widely available on the market, and each has its unique set of standards. The decision is made based on the requirements and specific factors. The most popular AR SDKs on the market now provide intriguing possibilities for projecting oneself into the creation of a commercial application. AR development kits, such as Apple's ARKit for iOS and Google's ARCore for Android, have made it easier for developers to create AR applications. The goal of these platforms is to make AR more accessible to developers. As a result, the creation of diverse AR tools is equally important. To build any AR tool, however, it is necessary to select proper development tools. In this paper, we suggest a survey of the most regularly used SDKs. This study compares four well-known AR SDKs (Vuforia, Wikitude, ARToolKit and Kudan) to demonstrate the accuracy and benefits of each SDK in terms of licence, supported platforms, cloud recognition, geolocation, SLAM, and other additional features. In addition, this study compares the advantages and disadvantages of one SDK against the other. The remaining content of this work is organized as follows: In Sect. 2, the literature review of related work is presented. Section 3 contains a description of the selected AR SDKs for this research. Section 4 covers the methodology used and the comparative analysis of these AR SDKs, while in Sects. 5 and 6, the discussion and directions for future research are presented.
2 Literature Review
2.1 Augmented Reality
More than two decades ago, Milgram and Kishino laid the conceptual foundation for AR and virtual reality (VR). On the virtuality spectrum, real and virtual worlds are at opposite extremes [2]. At one end of the spectrum are environments constructed completely of real objects. On the other hand, some environments are totally made up of computer-generated elements. Mixed Reality environments are in the center of the continuum, combining real and virtual elements. Virtual settings containing some real-world material are referred to as augmented virtuality (AV). Reality is the primary component of AR settings, with computer-generated visual information serving as a supplementary component (Fig. 1). Different sorts of applications are possible due to the technological distinctions between AR and VR devices. Users can practice in entirely computer-generated settings in VR [3], giving them access to a theoretically infinite number of training situations and circumstances. Because AR apps incorporate virtual material in the actual environment, training choices are slightly different. For example, training may be customized to specific situations.
Augmented Reality SDK’s: A Comparative Study
561
Fig. 1. Simplified representation of a “virtuality continuum”
2.2 Augmented Reality Frameworks
The tools used to construct AR apps are known as SDKs, or frameworks. Frameworks provide a coding environment where users may create and implement all functions that will comprise the core applications with AR capabilities. According to [4], these AR frameworks make various components of the AR application easier to use, such as: a) Recognition: the recognition component serves as the AR application's brain; b) Tracking: the tracking component serves as the AR experience's eyes; and c) Content Rendering: essentially the rendering of creative virtual objects in real time. Because this field is on the rise and is frequently mentioned in current studies, there are a variety of AR frameworks on the market. Similar studies [4, 5] that drew comparisons to identify the major current AR platforms, as well as the collaborative creation of an online comparative table [6], are also available. All of these studies, on the other hand, either gave a broad overview of AR or focused on specific technologies that helped them achieve their goals. There are now numerous tools in use for the execution of activities including the creation of AR-enabled applications. The next section goes through some of the many AR SDKs that may be used to create apps for smartphones, tablets, and even smart glasses.
3 Augmented Reality SDK
An SDK is a set of development tools that enables software experts to produce applications for a certain software package, hardware platform, computer system, gaming console, operating system, or other platform. The SDK maximizes the capabilities of each platform while reducing integration time. This section goes through some of the many AR SDKs that may be used to create apps for smartphones, tablets, and even smart glasses. We are concentrating on SDKs that offer a native platform, Unity support, and compatibility with common hardware. Finally, we will give comparison tables for those SDKs.
3.1 Vuforia
Vuforia is an AR SDK that consistently ranks at the top of most "Top AR" rankings. Vuforia offers a number of AR development tools, including the Vuforia Engine, Studio,
and Chalk. It has numerous important characteristics that make it one of the finest for object identification and 3D modeling. Ground Plane (for adding content to horizontal surfaces), Visual Camera (for expanding supported visual sources beyond mobile phones and tablets), and VuMarks are among these functionalities [7].
3.2 Wikitude
Another excellent option for AR app development is Wikitude. It is a relative newcomer to the market, having been created in 2008, yet it has already established a strong reputation. In fact, the debate between Vuforia and Wikitude has lately heated up. Wikitude may be used to create AR apps for iOS, Android, and smart glasses. Wikitude offers a range of tracking methods and technologies, as do other top AR development packages, but it also includes geolocation, cloud detection, and distance-based scaling capabilities [8].
3.3 ARToolKit
ARToolKit is an open-source and free-to-use SDK for AR development on a variety of platforms. ARToolKit is utilized for AR programs for Windows, Linux, and OS X in addition to Android and iOS. ARToolKit was first released in 1999 and has since received several upgrades. Tracking of planar pictures and basic black squares, natural feature marker creation, real-time speed support, and easy camera calibration are all incorporated in the current release. ARToolKit also includes numerous extra plugins for Unity and Open Scene Graph development [9].
3.4 Kudan
Kudan offers a professional AR SDK that can be used to create engaging mobile applications. This high-speed SDK, with a small data size and footprint, is compatible with iOS and Android and also provides Unity plugins. It does not require the latest mobile devices thanks to its hardware-agnostic algorithm. Kudan also provides a CV SDK, which is a tracking component without rendering [10].
4 Comparison of Augmented Reality SDK's
The comparative study of the above-mentioned SDKs has been done with respect to several important parameters, as shown in Table 1. The advantages and disadvantages of the AR SDKs are also discussed in this section (Table 2). Each AR SDK has some advantages and disadvantages; we give some benefits and limitations of these frameworks below.
Augmented Reality SDK’s: A Comparative Study
563
Table 1. Differences among frequently used augmented reality SDK's

| Comparison parameters | Vuforia | Wikitude | ARToolKit | Kudan |
| Licence | Free, commercial (annual licence of $504) | Commercial | Free, open source | Free, commercial |
| Supported platforms | Android, iOS, UWP | Android, iOS | Android, iOS, Linux, Windows, macOS | Android, iOS |
| Cloud recognition | Yes | Yes | No | No |
| Geolocation | Yes | Yes | Yes | No |
| SLAM | No | Yes | No | Yes |
| Unity support | Yes | Yes | Yes | Yes |
| 3D recognition | Yes | Yes | No | Yes |
| Smart glasses support | Yes | Yes | Yes | No |
| Cost | Free, but there are also paid plug-ins that cost $99 a month | Free trial; full functionality starts from €1990 | Free | The free version is for application testing only; the cost of a paid license is $1230 |
| OpenCV (Open Source Computer Vision) | No | No | No | No |
Table 2. Augmented reality SDK's advantages and disadvantages

Vuforia
Advantages:
– Enables preserving monitoring even when the target is out of view, and viewing targets from an increased distance
– Cloud Database allows storing thousands of image targets
– 3D tracking
– VuMarks are used for augmenting and identifying objects as part of a series
– Webcam/Simulator Play Mode
Disadvantages:
– Limitations on the number of VuMarks and the amount of Cloud recognition
– Device database can only support 100 image targets
– Vuforia SDK for Android does not expose any utility function to easily load a 3D model from any standard format
– Poor developer documentation
– SDK has some issues and bugs

Wikitude
Advantages:
– AR content can be programmed using basic HTML5, JavaScript and CSS
– Easy portability of AR apps from one platform to another
– Offline 2D image recognition and 2D image tracking
– Distance-based scaling
– Offline object recognition and object tracking
– Wikitude supports a wide variety of development frameworks
Disadvantages:
– Doesn't track 3D models, which limits its use to 2D tracking only
– Target images to track need to be of solid colors to be recognized

ARToolKit
Advantages:
– Multiple-platform AR app development possible
– Simultaneous tracking
– Provides fast and precise tracking
– Supports square marker, multimarker, and 2D barcode
Disadvantages:
– It has a huge variety of functions, so it is difficult to integrate the library, and it takes more time to explore all the available options and settings
– Less accuracy in tracking markers even when camera and marker are still
– It itself doesn't support location-based AR

Kudan
Advantages:
– Flexible to work on mobiles, Head-Mounted Displays, and Robotics applications
– Makes use of instantaneous SLAM with high-quality models
– Used for advanced IoT (Internet of Things) and AI (Artificial Intelligence)
– Unlimited local visual search (no network connection required)
Disadvantages:
– Kudan also uses markers, but it does not facilitate their creation in its platform; the markers can only be stored directly in the software's source code
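As a side note on how such criteria can be used in practice, the following minimal sketch (not part of the original study) encodes the Boolean criteria of Table 1 as data and shortlists the SDKs that satisfy a hypothetical set of application requirements.

```python
# Minimal sketch, not part of the original study: Table 1's Boolean criteria
# encoded as data, with a shortlist helper driven by hypothetical requirements.
TABLE_1 = {
    "Vuforia":   {"cloud_recognition": True,  "geolocation": True,  "slam": False,
                  "unity": True, "recognition_3d": True,  "smart_glasses": True},
    "Wikitude":  {"cloud_recognition": True,  "geolocation": True,  "slam": True,
                  "unity": True, "recognition_3d": True,  "smart_glasses": True},
    "ARToolKit": {"cloud_recognition": False, "geolocation": True,  "slam": False,
                  "unity": True, "recognition_3d": False, "smart_glasses": True},
    "Kudan":     {"cloud_recognition": False, "geolocation": False, "slam": True,
                  "unity": True, "recognition_3d": True,  "smart_glasses": False},
}

def shortlist(required):
    """Return the SDKs that satisfy every required criterion."""
    return [name for name, caps in TABLE_1.items()
            if all(caps.get(criterion, False) for criterion in required)]

# Example: an application that needs SLAM and 3D recognition with Unity support.
print(shortlist(["slam", "recognition_3d", "unity"]))  # -> ['Wikitude', 'Kudan']
```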
5 Discussion
In this work, the discussion covered four popular AR SDKs, Vuforia, Wikitude, ARToolKit and Kudan, by describing their working mechanisms and showing their strengths and weaknesses, as shown in Table 1 and Table 2. Vuforia is an excellent tool for capturing, identifying, and recognizing physical things and/or their constituents. Vuforia, on the other hand, is not without its drawbacks. The majority of them are linked to flat images that serve as markers and to object recognition in general; for example, if an item covers a portion of the marker, it may not be identified. Wikitude provides an SDK that can be used with a variety of development platforms to allow apps to be built using Wikitude's content. This framework can monitor planar markers, numerous targets, 3D objects, geolocated markers, and even markerless SLAM
Augmented Reality SDK’s: A Comparative Study
565
technology. Wikitude faces some challenges, for example: it does not track 3D models, which limits its use to 2D tracking only. ARToolKit is an open-source AR programming framework that allows you to customize the source code to fit your own needs. Despite the fact that ARToolKit contains a lot of features for a free SDK, it will take some effort to integrate and set up. Kudan can perceive 2D and 3D images and supports SLAM. This AR SDK makes use of the KudanCV engine, which has a minimal memory footprint. It faces some challenges as well, for example: Kudan allows only offline target tracking. The comparative study has shown that none of the discussed AR SDKs can satisfy all the criteria, because each framework is suitable for a particular application as well as its own area of implementation.
6 Conclusion
There is no single way to choose an AR development tool; different authors categorize their choices depending on the features they offer: licence, supported platforms, cloud recognition, geolocation, SLAM, and more. Depending on the purpose of the application, you can use not only cross-platform engines but also sets of development tools. Such SDKs allow you to speed up and simplify the process of developing any program with elements of AR. These advantages and disadvantages will help beginners choose the most convenient tool for developing AR applications. The aim behind this study was not to figure out which SDK is superior to the others, but to show that the selection of the correct SDK depends on the required application conditions and domain. In future work, we will use other methods such as AHP [11], A'WOT [12], and fuzzy AHP [13, 14] for AR SDK selection.
References
1. Milgram, P., et al.: Augmented reality: a class of displays on the reality-virtuality continuum. In: Telemanipulator and Telepresence Technologies, vol. 2351. International Society for Optics and Photonics (1995)
2. El Mostafa, B., Abdelghani, C.: How can the virtual reality help in implementation of the smart city? In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). IEEE (2019)
3. El Mostafa, B., Abdelghani, C.: Simulating pre-evacuation behavior in a virtual fire environment. In: 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT). IEEE (2018)
4. Amin, D., Govilkar, S.: Comparative study of augmented reality SDKs. Int. J. Computat. Sci. Appl. 5(1), 11–26 (2015)
5. Jooste, D., Rautenbach, V., Coetzee, S.: Results of an evaluation of augmented reality mobile development frameworks for addresses in augmented reality. Spat. Inf. Res. 24, 1–13 (2016). https://doi.org/10.1007/s41324-016-0022-1
6. Social Compare. Official Social Compare Web Page (2017). http://socialcompare.com/en/comparison/augmented-reality-sdks/
7. Vuforia. Official Vuforia Web Page (2021). https://www.vuforia.com/
8. Wikitude. Official Wikitude Web Page (2021). https://www.wikitude.com/
9. ARToolKit. Official ARToolKit Web Page (2021). https://artoolkit.org/
10. Kudan. Official Kudan Web Page (2021). https://www.kudan.io/news
11. El Mostafa, B., Abdelghani, C.: Selection of optimal game engine by using AHP approach for virtual reality fire safety training. In: International Conference on Intelligent Systems Design and Applications. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-16657-1_89
12. Bourhim, E.M., Cherkaoui, A.: Exploring the potential of virtual reality in fire training research using A'WOT hybrid method. In: Thampi, S., et al. (eds.) Intelligent Systems, Technologies and Applications. Advances in Intelligent Systems and Computing, vol. 1148. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-3914-5_12
13. El Mostafa, B., Abdelghani, C.: Usability evaluation of virtual reality-based fire training simulator using a combined AHP and fuzzy comprehensive evaluation approach. In: Data Intelligence and Cognitive Informatics, pp. 923–931. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-8530-2_73
14. Bourhim, E.M., Cherkaoui, A.: Efficacy of virtual reality for studying people's pre-evacuation behavior under fire. Int. J. Human-Comput. Stud. 142, 102484 (2020). https://doi.org/10.1016/j.ijhcs.2020.102484. ISSN 10715819
Hybrid Neural Network for Hyperspectral Satellite Image Classification (HNN)
Maissa Hamouda1,2(B) and Med Salim Bouhlel2
1 ISITCom, Sousse University, Sousse, Tunisia
2 SETIT Laboratory, Sfax University, Sfax, Tunisia
[email protected]
Abstract. The computer vision field is attracting growing interest. Several algorithms have been applied for object detection, such as deep learning methods (Convolutional Neural Networks (CNN), for example) and clustering methods (Fuzzy C-Means (FCM), for example), and have generated good results. However, some types of data (Hyperspectral Satellite Images (HSI), for example) require special handling due to their amount of information and the correlation between features. In this paper, we present a deep approach, a mix of extraction, reduction, clustering and classification, for Hyperspectral Satellite Images. This hybrid method, which takes advantage of CNN for precision and FCM for computational time, is composed of two parts: (1) extraction and reduction of spectral characteristics by CNN; (2) extraction and classification of spatial features by FCM. The tests carried out on two public hyperspectral images show the effectiveness of the proposed approach in terms of precision and computation time.
1 Introduction
Artificial Intelligence and its various fields (Computer Vision, Machine Learning, etc.) are very important these years [1,2] for automating learning tasks that the human visual system can perform [3,4]. The classification of Hyperspectral Satellite Images (HSI) [5–7] is very important in various areas of security, mapping and natural disasters [8–10]. The extraction and classification methods are various [5,11], and their efficiency depends on the types of images and objectives. The HSI is a scene captured at multiple wavelengths and at great distances, which makes it very difficult to process. Several techniques have been proposed for the classification of the HSI, such as: HSI and LiDAR data fusion [12], automation of selective parameters [13,14], use of graphs [15], reinforcement learning [16], adaptive learning [17,18], alternative hybrid learning [19], geo-transformation [20], etc. Among the most successful methods for classification, we cite deep methods based on artificial neural networks. Among the deep techniques most used for image processing are Convolutional Neural Networks (CNN) [21] and Generative Adversarial Networks (GAN) [22,23].
CNNs are perfect for image classification in general. Thus, there are several architectures and modes of application of these networks. They are all based on several types of repeating layers: convolutional, correction, subdivision, full connection and classification. The order of the layers can influence the precision, and the feature extraction method also influences the results, especially in the case of the HSI. The HSI are made up of several spectral bands with a large spatial resolution. There are several methods of extracting characteristics of the HSI: either extracting the spatial features of each band separately and then merging them [24,25], extracting the spectral and spatial features together [21,26,27], selecting only a few relevant bands [28], or reducing the spectral features and then processing the spatial features [29–31]. A recently proposed method, DCNN [32], uses two CNNs: the first to extract the spectral characteristics of the HSI and perform the reduction, and the second to extract the spatial characteristics of the HSI and perform the classification. The results obtained by this method are very effective. Thus, in order to increase the precision and reduce the computation time, we propose in this paper an improvement of the second part of the DCNN [32], where we replace the second CNN branch with Fuzzy C-Means (FCM) clustering. The injection of this algorithm into the approach reduces, above all, the processing time and increases the precision. In the next sections, we explain the proposed approach in detail and show its effectiveness through the results obtained.
2 Hybrid Neural Network
The HSI are made up of several spectral bands (10 to 200 bands) [18]; each provides characteristics of the scanned land area. The classification of the HSI requires several skills for the extraction, reduction and processing of spectro-spatial data. The proposed approach is composed of two parts (Fig. 1a): first, the extraction of the spectral characteristics of the HSI and the reduction of information by CNN; second, the extraction of the spatial characteristics of the HSI and random clustering (a random number of classes) by FCM.
2.1 Extraction of Spectral Characteristics and Reduction of Dimensionality
In this section, we describe the first phase of the approach. In this paper, we take the first part of the DCNN network proposed in [32]. The network used is simple and not very deep; however, it can give very good results in a short time. It is made up of the following layers:
Fig. 1. The proposed CNN model: (a) CNN architecture; (b) the convolution layer; (c) the max-pooling layer
• The Convolutional layer (Fig. 1b), which makes it possible to add to each active pixel its neighbors by multiplying them by the corresponding weights of the kernel. The convolution layer is calculated by $y_n^l = f^l\left(\sum_{m \in V_n^l} y_m^{l-1} \otimes \eta_{m,n}^l + \beta_n^l\right)$, with $n$ the neuron, $l$ the layer, $y_n^l$ the input data, $V_n^l$ the list of all groups in layer $l-1$, $\eta_{m,n}^l$ the convolution kernels of neuron $m$ in layer $l-1$, and $\beta_n^l$ the bias.
• The ReLU layer, which makes it possible to introduce a non-linearity into the functioning of the neuron by replacing all the negative values received as inputs by zeros. The ReLU layer is calculated by $X_{out} = \max(0, X_{in})$, with $X_{in}$ being the input value.
• The Max-Pooling layer (Fig. 1c), which reduces the size of the images while preserving their important characteristics. The Max-Pooling layer is calculated by $X_{out} = \max(X_1, X_2, \ldots, X_n)$, with $X_1, X_2, \ldots, X_n$ the pixels to reduce and $n$ the size of the subdivision filter.
• The Fully-Connected layer, which makes it possible to connect each neuron to each neuron of the previous layer; each connection has a weight. The Fully-Connected layer is calculated by $y_n^l = f^l\left(\sum_{m=1}^{N^{l-1}} y_m^{l-1} \eta_{m,n}^l + \beta_n^l\right)$.
• The Summation layer, which calculates the summation of the output neurons instead of using the Softmax function.
At the end of this phase, we obtain an image with reduced spectral characteristics (a 2D image), which goes through the following phase for the extraction of spatial characteristics and clustering.
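As a concrete illustration of the layer operations just listed, the following NumPy sketch applies one convolution kernel, the ReLU correction, max-pooling and a fully connected summation to a single 2-D patch. It is an assumed, minimal implementation for clarity, not the authors' code; the parameter values simply mirror Table 2.

```python
# Assumed minimal NumPy implementation of the layers described above (not the
# authors' code); one 3x3 kernel, ReLU, 3x3 max-pooling and a fully connected
# summation applied to a single 15x15 patch.
import numpy as np

def conv2d(x, kernel, bias=0.0):
    """Valid convolution: sum over each neighbourhood of x * kernel, plus bias."""
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel) + bias
    return out

def relu(x):
    return np.maximum(0.0, x)                       # Xout = max(0, Xin)

def max_pool(x, size=3):
    h, w = x.shape[0] // size, x.shape[1] // size   # keep whole windows only
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def fully_connected(x, weights, bias=0.0):
    return x.ravel() @ weights + bias               # one output neuron: sum of y_m * eta_m + beta

rng = np.random.default_rng(0)
patch = rng.random((15, 15))                        # one spectral patch (window size 15 x 15)
feat = max_pool(relu(conv2d(patch, rng.random((3, 3)))), size=3)
out = fully_connected(feat, rng.random(feat.size))  # scalar "summation" output
```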
Table 1. The HSI datasets

| Datasets | Spectral bands | Sensor | Region | Size | Classes |
| SalinasA | 204 | AVIRIS | Salinas Valley, California | 86 × 83 pixels | 6 |
| Indian Pines | 200 | AVIRIS | Indiana, USA | 145 × 145 pixels | 16 |
Table 2. Parameters of the Convolution Neural Network (CNN)

| Parameters | Kernel Size² | Window Size² | Subdivision Filter² | Kernels number |
| Configuration | 3 × 3 | 15 × 15 | 3 × 3 | 10 |
2.2 Extraction of Spatial Characteristics and Clustering
In this section, we describe the second phase of the approach. At this level, we propose to use the well-known FCM clustering algorithm. The FCM algorithm [33] partitions a finite collection $X$ of $n$ elements, $X = \{x_1, \ldots, x_n\}$, into a collection of $c$ fuzzy clusters with respect to a given criterion. Given a finite set of data, the algorithm returns a list of $c$ cluster centers $C = \{c_1, \ldots, c_c\}$ and a partition matrix $W = (\omega_{i,j}) \in [0, 1]$, where $i = 1, \ldots, n$, $j = 1, \ldots, c$, and each element $\omega_{i,j}$ indicates the degree to which the element $x_i$ belongs to cluster $c_j$. FCM aims to minimize the objective function

$\underset{C}{\arg\min} \sum_{i=1}^{n}\sum_{j=1}^{c} \omega_{ij}^{m} \left\| x_i - c_j \right\|^2$, where $\omega_{ij} = \dfrac{1}{\sum_{k=1}^{c}\left(\dfrac{\left\|x_i - c_j\right\|}{\left\|x_i - c_k\right\|}\right)^{\frac{2}{m-1}}}$, $c_j = \dfrac{\sum_{i=1}^{n} \omega_{ij}^{m} x_i}{\sum_{i=1}^{n} \omega_{ij}^{m}}$,

and $1 \le m \le \infty$. By applying the FCM function (predefined in Matlab) [34], we must indicate the input image (a 2D image) and the number of classes (which is unknown in general cases) to obtain a subdivided image as output.
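For clarity, the following is a hedged NumPy sketch of the FCM update rules given above, standing in for Matlab's predefined fcm function used by the authors; the pixel layout and the choice of 20 classes below are purely illustrative.

```python
# Hedged NumPy sketch of the FCM updates given above (illustrative only).
import numpy as np

def fcm(X, c, m=2.0, iters=100, eps=1e-5, seed=0):
    """X: (n_samples, n_features); returns cluster centers and the partition matrix W."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], c))
    W /= W.sum(axis=1, keepdims=True)                 # fuzzy memberships, each row sums to 1
    for _ in range(iters):
        Wm = W ** m
        centers = (Wm.T @ X) / Wm.sum(axis=0)[:, None]                     # c_j update
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        W_new = d ** (-2.0 / (m - 1.0))
        W_new /= W_new.sum(axis=1, keepdims=True)                          # omega_ij update
        if np.abs(W_new - W).max() < eps:
            return centers, W_new
        W = W_new
    return centers, W

# Example: cluster a reduced 2-D image (one value per pixel) into 20 random classes.
img = np.random.default_rng(1).random((86, 83))
centers, W = fcm(img.reshape(-1, 1), c=20)
labels = W.argmax(axis=1).reshape(img.shape)          # hard labels for display
```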
3 Results and Discussions
Table 3. Classification results of SalinasA datasets

| Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | OA (%) | AA | Kappa | Time (s) |
| 0.8903 | 0.6277 | 0.8256 | 0.5730 | 0.8105 | 0.7758 | 98.8792 | 0.9884 | 0.9864 | 224.6225 |

Table 4. Classification results of Indian Pines datasets

| Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 | Class 8 | Class 9 | Class 10 |
| 0.9956 | 0.8638 | 0.9201 | 0.9768 | 0.9537 | 0.9298 | 0.9973 | 0.9556 | 0.9981 | 0.9080 |
| Class 11 | Class 12 | Class 13 | Class 14 | Class 15 | Class 16 | OA (%) | AA | Kappa | Time (s) |
| 0.7671 | 0.9751 | 0.9430 | 0.9802 | 0.8805 | 0.9628 | 98.2307 | 0.9911 | 0.9746 | 658.6338 |
In this section, we present the datasets used for the tests, the parameters of the CNN network, the results obtained by the approach, and the discussion with comparisons.
3.1 Datasets
To test the approach, we used two public satellite images, which are shown in Table 1 [35].
3.2 Results
In this section, we present the results obtained by the approach on the different hyperspectral datasets (Fig. 2). We first present the parameters used in the first part, i.e. the CNN (Table 2). For the choice of parameters, we kept the same parameters assigned in the previous approaches (state-of-the-art approaches), so that we can compare our new fusion approach with the previous results. In other upcoming work we will discuss these parameters and look for the best possibilities. For the second part, we worked with the predefined Matlab function FCM [34], with a number of classes chosen at random, equal to 20. The results are presented in tables whose standard evaluation criteria are: Cohen's Kappa (K) (the effectiveness of the classification with respect to a random assignment of values), Overall Accuracy (OA) (the number of correctly classified pixels divided by the total number of reference pixels), Average Precision (AA) (the average of the precision per class, i.e. the sum of the precision for each class divided by the number of classes), and Time (seconds). In Tables 3 and 4, we present the results obtained by the proposed approach. The results obtained on the first dataset (SalinasA) are 99.1454%–100%, and the execution time for this dataset is 227.1330 seconds. The results obtained on the second dataset (Indian Pines) are 98.2307%, and the execution time for this dataset is 663.6364 seconds. From all the tests done, we noticed two things: first, the results obtained are very precise and efficient compared to the state-of-the-art methods (Table 5); second, the calculation time is reduced and very acceptable compared to the state-of-the-art methods (Table 5).
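As an illustration of the evaluation criteria just described (and not the authors' evaluation code), the following sketch computes OA, AA and Cohen's Kappa from a confusion matrix; the matrix values are made up for the example.

```python
# Illustration of the evaluation criteria described above (made-up confusion matrix).
import numpy as np

def criteria(conf):
    """conf[i, j] = number of pixels of true class i predicted as class j."""
    total = conf.sum()
    oa = np.trace(conf) / total                                    # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)                   # per-class accuracy
    aa = per_class.mean()                                          # average accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                                   # Cohen's Kappa
    return oa, aa, kappa

example = np.array([[50, 2, 0],
                    [3, 45, 2],
                    [0, 1, 47]])
print(criteria(example))
```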
Table 5. Comparison with state-of-the-art methods

(a) OA (%)
| Method | CNN (Only) | FE-CNN-1 | FE-CNN-2 | Ad-CNN-1 | Ad-CNN-2 | Proposed CNN-FCM |
| SalinasA | 77.1224 | 90.6556 | 93.1634 | 91.8605 | 94.8585 | 98.8792 |
| Indian Pines | 71.7337 | 81.0131 | 96.0428 | 94.5493 | 96.7087 | 98.2307 |

(b) Time (s)
| Method | CNN (Only) | FE-CNN-1 | FE-CNN-2 | Ad-CNN-1 | Ad-CNN-2 | Proposed CNN-FCM |
| SalinasA | 356.0504 | 484.2463 | 710.5860 | 1.6805e+04 | 1.3528e+04 | 224.6225 |
| Indian Pines | 1.1384e+03 | 778.3728 | 2.2785e+03 | 1.3428e+05 | 1.3315e+05 | 658.6338 |
Fig. 2. Comparison with state-of-the-art methods (OA %)
3.3 Discussions
In this last section, we compare the proposed method to the state-of-the-art approaches (Table 5):
• CNN (Only): we tested the CNN (the same architecture used in the approach) with reduction of the spectral characteristics of the images by the mean.
• CNN based on adaptive parameters: Ad-CNN-1 [14] (CNN based on adaptive kernels and batches, with Mean-pooling layers), and Ad-CNN-2 [21] (CNN based on adaptive kernels and batches, with Max-pooling layers).
• CNN based on spectral feature reduction: FE-CNN-1 [5] (CNN based on Smart Spectral Feature Extraction), FE-CNN-2 [32] (CNN based on Dual Extraction).
All of the methods mentioned above were re-calculated with the same initial parameters as the current paper (Table 2: Window Size² = 15 × 15, Subdivision Filter² = 3 × 3, Kernel Size² = 3 × 3, and Number of Kernels = 10) and, for some cases (Ad-CNN-1 [14] and Ad-CNN-2 [21]), spectral reduction by the mean. From the experiments, we notice that the proposed method is always very efficient on the different test data in terms of precision, and sometimes in terms of calculation time.
4 Conclusions
In this paper, we have proposed a mix between deep learning and fuzzy clustering, in a hybrid processing approach, for the classification of the HSI. Indeed, the approach is based on two steps: the first is the phase of extraction of spectral characteristics and reduction of spectral dimensionality, using a modified CNN based on an output summation. The second phase is the extraction of spatial features and clustering using FCM. The results obtained by this approach prove its efficiency in terms of precision and processing time.
Acknowledgment. This work was supported by the Ministry of Higher Education and Scientific Research of Tunisia.
References
1. Landolsi, M.Y., Haj Mohamed, H., Ben Romdhane, L.: Image annotation in social networks using graph and multimodal deep learning features. Multimed. Tools Appl. 80, 12009–12034 (2021). https://doi.org/10.1007/s11042-020-09730-8
2. Meftah, L.H., Braham, R.: Transfer learning for autonomous vehicles obstacle avoidance with virtual simulation platform. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds.) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol. 1351. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71187-0_88
3. Abdmouleh, M.K., Khalfallah, A., Bouhlel, M.S.: A novel selective encryption scheme for medical images transmission based-on JPEG compression algorithm. Proc. Comput. Sci. 112, 369–376 (2017)
4. Ferchichi, O., Beltaifa, R., Jilani, L.L.: A reinforcement learning approach to feature model maintainability improvement. SCITEPRESS - Science and Technology Publications, Setubal (2021)
5. Hamouda, M., Ettabaa, K.S., Bouhlel, M.S.: Smart feature extraction and classification of hyperspectral images based on convolutional neural networks. IET Image Process. (2020)
6. Singh, M.K., Mohan, S., Kumar, B.: Hyperspectral image classification using deep convolutional neural network and stochastic relaxation labeling. J. Appl. Remote Sens. 15 (2021)
7. Deshpande, A.M., Roy, S.: An efficient image deblurring method with a deep convolutional neural network for satellite imagery. J. Indian Soc. Remote Sens. (2021)
8. Hamouda, M., Bouhlel, M.S.: Modified convolutional neural networks architecture for hyperspectral image classification (extra-convolutional neural networks). IET Image Process. (2021)
9. Rawal, R., Pradhan, P.: Climate adaptation: reliably predicting from imbalanced satellite data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 78–79 (2020)
10. Nishchal, J., Reddy, S., Priya, N.N., Jenni, V.R., Hebbar, R., Babu, B.S.: Pansharpening and semantic segmentation of satellite imagery. In: 2021 Asian Conference on Innovation in Technology (ASIANCON). IEEE, August 2021
11. Woodbright, M., Verma, B., Haidar, A.: Autonomous deep feature extraction based method for epileptic EEG brain seizure classification. 444, 30–37 (2021)
12. Mohla, S., Pande, S., Banerjee, B., Chaudhuri, S.: Fusatnet: dual-attention based spectrospatial multimodal fusion network for hyperspectral and lidar classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 92–93 (2020)
13. Merrill, N., Olson, C.C.: Unsupervised ensemble-kernel principal component analysis for hyperspectral anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020
14. Hamouda, M., Ettabaa, K.S., Bouhlel, M.S.: Modified convolutional neural network based on adaptive patch extraction for hyperspectral image classification. In: 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–7. IEEE (2018)
15. Rahiche, A., Cheriet, M.: Forgery detection in hyperspectral document images using graph orthogonal nonnegative matrix factorization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 662–663 (2020)
16. Rout, L., Shah, S., Moorthi, S.M., Dhar, D.: Monte-Carlo Siamese policy on actor for satellite image super resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 194–195 (2020)
17. Zhang, L., Nie, J., Wei, W., Zhang, Y., Liao, S., Shao, L.: Unsupervised adaptation learning for hyperspectral imagery super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020
18. Hamouda, M., Ettabaa, K.S., Bouhlel, M.S.: Hyperspectral imaging classification based on convolutional neural networks by adaptive sizes of windows and filters. IET Image Process. 13(2), 392–398 (2018)
19. Garnot, V.S.F., Landrieu, L., Giordano, S., Chehata, N.: Satellite image time series classification with pixel-set encoders and temporal self-attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12325–12334 (2020)
20. Lu, X., Li, Z., Cui, Z., Oswald, M.R., Pollefeys, M., Qin, R.: Geometry-aware satellite-to-ground image synthesis for urban areas. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 859–867 (2020)
21. Hamouda, M., Saheb Ettabaa, K., Bouhlel, M.S.: Adaptive batch extraction for hyperspectral image classification based on convolutional neural network. In: Mansouri, A., El Moataz, A., Nouboud, F., Mammass, D. (eds.) International Conference on Image and Signal Processing, pp. 310–318. Springer (2018). https://doi.org/10.1007/978-3-319-94211-7_34
22. Mehta, A., Sinha, H., Narang, P., Mandal, M.: Hidegan: a hyperspectral-guided image dehazing GAN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 212–213 (2020)
23. Tasar, O., Tarabalka, Y., Giros, A., Alliez, P., Clerc, S.: Standardgan: multi-source domain adaptation for semantic segmentation of very high resolution satellite images by data standardization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 192–193 (2020)
24. Fotiadou, K., Tsagkatakis, G., Tsakalides, P.: Deep convolutional neural networks for the classification of snapshot mosaic hyperspectral imagery. Electron. Imaging 2017(17), 185–190 (2017)
25. Yang, J., Zhao, Y.-Q., Chan, J.C.-W.: Learning and transferring deep joint spectral–spatial features for hyperspectral classification. IEEE Trans. Geosci. Remote Sens. 55(8), 4729–4742 (2017)
26. Liu, Q., Zhou, F., Hang, R., Yuan, X.: Bidirectional-convolutional LSTM based spectral-spatial feature learning for hyperspectral image classification. Remote Sens. 9(12), 1330 (2017)
27. Hamouda, M., Ettabaa, K.S., Bouhlel, M.S.: Framework for automatic selection of kernels based on convolutional neural networks and CK means clustering algorithm. Int. J. Image Graph. 19(04), 1950019 (2019)
28. Lorenzo, P.R., Tulczyjew, L., Marcinkiewicz, M., Nalepa, J.: Hyperspectral band selection using attention-based convolutional neural networks. IEEE Access 8, 42384–42403 (2020)
29. Santara, A., et al.: BASS net: band-adaptive spectral-spatial feature learning neural network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 55(9), 5293–5301 (2017)
30. Luo, F., Zhang, L., Du, B., Zhang, L.: Dimensionality reduction with enhanced hybrid-graph discriminant learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 58(8), 5336–5353 (2020)
31. Luo, F., Zhang, L., Zhou, X., Guo, T., Cheng, Y., Yin, T.: Sparse-adaptive hypergraph discriminant analysis for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 17(6), 1082–1086 (2020)
32. Hamouda, M., Bouhlel, M.S.: Dual convolutional neural networks for hyperspectral satellite images classification (DCNN-HSI). In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) ICONIP 2020. CCIS, vol. 1332, pp. 369–376. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-63820-7_42
33. Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy C-means clustering algorithm. Comput. Geosci. 10(2–3), 191–203 (1984)
34. Fuzzy c-means clustering. https://www.mathworks.com/help/fuzzy/fcm.html
35. GIC: Hyperspectral remote sensing scenes. Grupo de Inteligencia Computacional, April 2014
Implementation of the Business Process Model and Notation in the Modelling of Patient's Clinical Workflow in Oncology
Nassim Bout1,5(B), Rachid Khazaz2, Ali Azougaghe1,3, Mohamed El-Hfid3,4, Mounia Abik1, and Hicham Belhadaoui5
1 National Higher School of Computer Science and Systems' Analysis, Mohammed V University, Rabat, Morocco
nassim [email protected]
2 ENOVA R&T, Rabat, Morocco
3 Tangier-Tetouan-Al Hoceima University Hospital Center, Tangier, Morocco
4 Faculty of Medicine and Pharmacy, Abdelmalek Essaadi University, Tangier, Morocco
5 National Higher School of Electricity and Mechanics, Hassan II University, Casablanca, Morocco
Abstract. In the field of oncology, the patient's case needs to be treated by various services due to the crucial dependency on the advice of several specialists; hence, multiple activities and tasks are performed. A lack of organization occurs in the absence of a standard business process model that unifies the oncology patients' clinical workflow in Morocco, which affects the Hospital Information System operations directly. The main objective of this research is to create a generic business process model of the patient's clinical workflow in oncology which unifies the service's process regarding the Moroccan requirements. After exploring the state of the art, we established a qualitative field study to fully understand the specificity of the Moroccan hospital systems' organization and then modelled its business processes using the Business Process Model and Notation. This research proposes a business process model of the patient's clinical workflow in oncology to contribute to the intelligent digital transformation of the Moroccan hospital information system that will improve the clinical regime of the patients. This research fills an organizational gap by proposing a clear representation of the patient's clinical workflow at the Moroccan hospitals' oncology services; this representation can be understood by both administrative and technical staff without misinterpretation.
Keywords: BPMN · Clinical workflow · Digital transformation · Hospital Information System · Oncology
1 Introduction
Humanity went through a successive number of outstanding innovations that changed the shape of the world, mostly in healthcare, among these innovations,
healthcare information technologies are considered foremost tools for improving healthcare quality and safety. Telemedicine and asynchronous telemedicine, for example, are as effective as face-to-face care, and electronic consultations may reduce patient wait times for specialists' appointments, meetings, and decision-making [2]. A collaboration was launched during this research with the motivation of investigating possible ways to contribute to the intelligent digital transformation that Moroccan hospitals have known in recent years, by creating and improving telemedicine capabilities among different specialists and services. A key to the success of any medical treatment is time management, especially in the oncology domain: early cancer detection increases the chances of a successful diagnosis and might save the patient's life. Due to the COVID-19 pandemic, the idea of adopting virtual proceedings such as meetings has become a necessity to solve and avoid the blockage of medical services and the severe delays that affect cancer patients [10]. Nevertheless, the problem that arises is the complexity of the Patient's Clinical Workflow. The Patient's Clinical Workflow is mainly the interaction between actors in the hospital over various tasks held in the appropriate services. The medical information flows according to the clinical workflow, which must be represented very carefully with respect to the detailed specification of the facility; any errors occurring in the patient's clinical workflow will affect the data recorded in the Electronic Health Record (EHR), and hence the whole Hospital Information System (HIS). This research intends to model a generic Business Process Model based on the Patient's Clinical Workflow of the University Hospital Tangier-Tetouan-Al Hoceima - which relies on the procedures established by Moroccan legislation and the Ministry of Health - concerning any type of oncology facility, information system, variant health records, and especially software engineering foundations. The objectives of the current research can be summarized as follows:
– Understanding the previous oncology service's business processes (if they exist).
– Gathering field information extracted by observing the Regional Oncology Center's (RCO) clinical workflow by applying a qualitative approach.
– Modeling a business process of the patient clinical workflow in oncology by unifying its representation.
2 State of the Art
Workflow in healthcare, or clinical workflow, refers to the sequence of tasks performed (physically and mentally) by patients, professionals, and administrators within and between hospital environments [1]. Whether sequentially or simultaneously, the workflow occurs at various levels [3]:
– Inter-Organisational Workflow: between services, for example, the hospitalization service and the pharmacy service.
– Clinical-Level Workflow: between people (patients, doctors, nurses, etc.) or between patients' EHR and their clinical and healthcare process.
– Intra-Visit Workflow: the workflow that occurs within a given service by following the protocols and rules of the facility.
– Cognitive Workflow: the mental workflow that occurs during decision-making or the orders' processes.
2.1 The Clinical Workflow's Conceptual Aspects
Usually, the clinical workflow is a process that occurs at various levels within the health facility. In information technology, workflows and processes are often represented using business process modeling languages; the most popular among them are BPMN (Business Process Model and Notation) [8] and the UML Activity Diagram [9]. In a study concerning healthcare processes and their representation using BPMN [7], the researchers address and identify the particular problems related to roles and task assignment. They claim that BPMN fails to produce "nice and easily comprehensible results", and they specify that the problems become apparent during process elicitation in a medical environment. For this reason, they proposed an original approach that incorporates role information in process models, using the color attribute of tasks as a complementary visualization to the usage of lanes. The clinical processes in a multidisciplinary hospital are inherently complex, which leads the researchers to introduce specific modeling requirements of the healthcare domain, which will then be supported by BPMN to capture the processes:
– Many roles participate in one process.
– Several specialists work together on a shared task.
– A task can be alternatively performed by different roles.
– A task can optionally involve additional roles.
Fig. 1. Core BPMN graphical modeling elements [7]
Fig. 2. A process of the preparation for a surgery [7]
BPMN is known for its core modeling elements (Fig. 1). Pools and lanes are used to structure the process diagram and separate, respectively, the organizational units and organizations. Without going into the details of the approach, Fig. 2 shows how a clinical process is modeled using BPMN, describing in this example the preparation process for a difficult surgery. In particular, this is done by capturing the requirements of many roles and shared tasks simultaneously: where the task is performed, by whom, and which role is assigned to the performer [7].
Fig. 3. A demonstration model of wisdom tooth treatment [4]
Another example of the usage of BPMN to model a clinical workflow is shown in [4], where the researchers provide a valid BPMN extension for clinical pathways (CPs) that can be applied by domain experts or even customized by model engineers. Figure 3 demonstrates the evolved BPMN extension by presenting a simplified, specific clinical practice guideline for a wisdom tooth treatment.
Fig. 4. Workflow diagram for the chemotherapy department [12]
In an additional study [12], researchers present a conceptual model of an oncology information system based on the users' requirements. They used a UML Activity Diagram to model the structural and behavioral workflow of the system - more precisely, the chemotherapy and radiotherapy clinical workflow (Fig. 4) - based on the data elements and functional requirements extracted, along with the cancer care workflows reported in [11]. Now that we have reviewed the conceptual aspects of the patient's clinical workflow, we have the technical guidelines to conceptualize its business process; however, there is still a need to match these guidelines with the Moroccan context and investigate their applicability.
2.2 Related Works to the Moroccan Context
In the context of the Moroccan hospital information system, several research studies have focused on the organization of the Patient Clinical Workflow. In [5], a group of researchers analyzed the existing EHR at the National Institute of Oncology (NIO) in Rabat; the results were a reduction of the steps taken by the patient going back and forth between the institute's services, and an improvement of the healthcare quality. They claim that they were able to reduce these twelve steps to two (Fig. 5). This proposal lacks precision because it is only focused on the medical assistance plan's patients (RAMED), while the authors assert that the proposal is global for the Moroccan context, without paying attention to the plurality of services and the different types of patients' medical coverage within the institute. The hypothesis is that such a plurality will cause redundancy, which will lead to severe delays; therefore, rather than generalizing, we end up multiplying the clinical workflow.
Fig. 5. NIO’s EHR clinical workflow [5]
Another study [6] addressed a very important point, namely the factors that influence delays throughout a patient's clinical pathway, by documenting time intervals in cervical cancer care pathways. According to the authors, the concept of delayed diagnosis has become an important issue; they categorized it into four components:
– Patient delay
– Healthcare provider's delay
– Referral delay
– System delay
Taking into account other influencing factors such as clinical, sociodemographic, and treatment factors, they studied time intervals between 2013 and 2017, starting
from symptom onset to disease detection and the beginning of treatment. They concluded that the integration of a model that standardizes the care pathways of the Moroccan health system is essential to unify the cervical cancer care process in the country; this unification makes it possible to improve care pathways and reduces long waiting times. The Patient Clinical Workflow is extremely advantageous, as it streamlines the communication process and makes all information rapid, available, and interoperable, which makes for a better connected HIS. The main benefit is allowing physicians to spend more time with the patient, as there is no need to chase clinical information (reports, paper records, files, and scans); it can also be shared within the hospital's services and among practitioners, not only at the local boundaries but also between any HIS around the world. In light of these studies regarding the Moroccan context, the current research proposes a Patient Clinical Workflow representation of the oncology service of the RCO Tangier.
3 Methods and Materials
We investigated the various services, actors, and events within the workflow process of Tangier's Regional Center of Oncology (RCO), which is affiliated with the UHC of TTA. The current research proposes a generic business process model that can be implemented in oncology facilities regardless of the patient's type of medical coverage. To achieve that, we organized field research at the RCO; the field research helped us propose several solutions that meet the:
– Oncology Patient Clinical Workflow specifications.
– Analysis of the existing Clinical Workflow.
– Analysis of the existing clinical workflow's situation.
– Alignment, genericity, and interoperability of any proposed business process with Hospital Information Systems' specifications.
By observing and conducting a qualitative study that includes noting all the characteristics of the various services provided by the RCO, the intention was to include them effectively in the business process to clarify and reduce the steps taken by the patients during their visits or their stay, and most importantly to extract the main medical information included in the workflow and necessary for other procedures carried out at the RCO, for example, the Multidisciplinary Team Meeting's discussions. These noted services include:
– Consultation service.
– Anatomical Pathology service.
– Biological Analysis service.
– Radiotherapy service.
– Chemotherapy service.
– Pharmacy service.
– Hospital Care service.
Using BPMN, we represent the patient workflow, which contains various activities in oncology services performed in both administrative and operational ways (Fig. 6). These activities are controlled by several events, connected using multiple associations, and communicated using messages or documents. As we are interested in elaborating a complete workflow, we studied the existing one in the RCO; in addition, we mapped various documents which are used in the other services and included them in the contribution.
Fig. 6. The RCO’s Oncology Clinical Workflow Main Components
By studying the existing situation at the RCO, we found that the patient's clinical workflow is divided into four BPMN models: two models for new patients and previously registered patients, with the second pair of models representing patients who have RAMED, who have medical coverage, and who do not. Following that logic, each service will have at least four business process models representing a simple clinical workflow with minor changes. The more complex the clinical workflow's representation gets, the more redundancy and errors occur in the OIS, which also affects the EHR. If the process is time-consuming, it will create a loop effect that never ends. For this reason, our contribution is to unify the clinical workflow for each service of the oncology department using BPMN, and then merge the verification process of patients' medical coverage to have one path, avoiding business process redundancy.
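As a purely illustrative sketch (not the authors' BPMN model), such a unified flow with a single merged coverage-verification task could be represented as a small graph of typed nodes; all node and lane names below are hypothetical.

```python
# Purely illustrative sketch: one unified reception flow with a single merged
# coverage-verification task instead of four near-identical process variants.
UNIFIED_RECEPTION = {
    "start":             {"type": "startEvent",       "next": ["is_registered"]},
    "is_registered":     {"type": "exclusiveGateway", "next": ["register_new_patient", "retrieve_ehr"]},
    "register_new_patient": {"type": "task", "lane": "Reception of New Cases",
                             "next": ["verify_coverage"]},
    "retrieve_ehr":      {"type": "task", "lane": "Reception and Admission Service",
                          "next": ["verify_coverage"]},
    "verify_coverage":   {"type": "task", "lane": "Reception and Admission Service",
                          "note": "single merged check: RAMED / insured / uninsured",
                          "next": ["orient_to_service"]},
    "orient_to_service": {"type": "task", "lane": "Nursing Service", "next": ["end"]},
    "end":               {"type": "endEvent",         "next": []},
}

def walk(flow, node="start"):
    """Print one linearised path through the flow as a quick sanity check."""
    seen = set()
    while node and node not in seen:
        seen.add(node)
        print(f"{flow[node]['type']:>16}  {node}")
        node = flow[node]["next"][0] if flow[node]["next"] else None

walk(UNIFIED_RECEPTION)
```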
4 Results
This section exhibits a part of the proposed business process model and reveals its relationship with some proceedings in the oncology facility. Each time a patient arrives at the RCO, the Reception and Admission Service (RAS) checks whether he already exists in the system or not, to be oriented to the reception of new cases. The RAS is in charge of the management of medical analyses and anatomopathological examinations. Figure 7 shows the Reception and Admission Service reception workflow.
Fig. 7. Reception and Admission Service Patient’s Clinical Workflow
This is only one portion of the created business process, which covers all the components mentioned previously. As a result, this proposal mainly marks the development of a strategic vision to guide the implementation of well-conceptualized digital solutions in hospitals. Such a standardized description of the clinical workflow (healthcare process) will guarantee a logical assessment of the data and technological infrastructures necessary for the development and deployment of information technology in Moroccan hospitals, especially intelligent ones. Table 1 reviews the sequence of tasks performed by patients and professionals within the various services of the oncology department, as represented in our business process model. After the patient visits each service and specialist, multiple diagnoses, analyses, medications, and check-ups from all the services mentioned earlier are generated and included directly in the EHR; these pieces of information form the basis of different procedures in the RCO, such as the Multidisciplinary Team Meetings' sheets and the annual patient reports.
Table 1. Patient’s Clinical Workflow analysis for the Sequence of Tasks Performed
– Patient: The patient is oriented by the RAS to be received by other services, which either give him an appointment or a consultation; every service given to the patient is invoiced and must be recorded.
– Nursing Service: Conducts a preliminary sorting of patients, then orients them.
– Reception and Admission Service: Orients new cases, checks patients' admission, and manages the cash register.
– Reception of New Cases: Registers new patients, edits patients' cards and EHR labels.
– Physicians/Specialists (Medical Oncology, Radiotherapy, etc.): Perform a physical and/or artificial exam to assess patients and determine the possible appropriate diagnosis.
– Receptions (Medical Oncology, Radiotherapy, etc.): Inform and refer patients during their visit to a service, or refer them to a related service if necessary.
– Treatment Rooms (Medical Oncology, Radiotherapy, etc.): Where patients are treated depending on the service.
5 Conclusion
Nowadays, with the explosion of information technology usage, IT has become the backbone of healthcare and patient treatment in hospitals. In large complexes such as hospitals, adopting an information technology solution requires study and modeling, which gives us the Hospital Information System (HIS). Within this HIS, information circulates along a specific patient clinical pathway (according to the form of the organization) under the Electronic Health Records; patient information is recorded in the EHR and can be extracted when needed. However, based on the literature review and the field research done at the Regional Center of Oncology of Tangier, the patient's clinical workflow must be represented with precision to obtain a fluent information system with the minimum errors possible. This research implemented the Business Process Model and Notation to create a representation of the patient's clinical workflow in oncology by reviewing previous practical experiences of national and international oncology facilities, filling an organizational gap regarding the patient's clinical workflow in general, and unifying its representation at the national scale. As an extension of the current study, the researchers are working on improving various telemedicine solutions based on detailed BPMN representations like the one suggested here; the continuity of this work should also be preserved for complex hospital services in Morocco that still lack a clear clinical workflow analysis and standardization, such as the surgery service.
Acknowledgement. We thank Prof. Mohamed El Hfid, MD (CHU TTA) for providing support in the construction of the models in this paper, and we thank the Bio-MSCS Master's degree community (ENSIAS); special thanks also go to Prof. Hicham Belhadaoui (ENSEM).
Mobile Cloud Computing: Issues, Applications and Scope in COVID-19 Hariket Sukesh Kumar Sheth1(B)
and Amit Kumar Tyagi2
1 School of Computer Science and Engineering, Vellore Institute of Technology,
Chennai, Tamil Nadu 600127, India [email protected] 2 Centre for Advanced Data Science, Vellore Institute of Technology, Chennai, Tamil Nadu 600127, India [email protected]
Abstract. As the world is transitioning into a tech-savvy era, the twenty-first century is evidence of many technological advancements in the fields of AI, IoT, ML, etc. Mobile Cloud Computing (MCC) is one such emerging technology: it provides services regardless of time and place, contours the limitations of mobile devices in processing bulk data, and provides multi-platform support and dynamic provisioning. Not only is there an enhancement in computation speed, energy efficiency, execution, and integration, but MCC also raises considerable issues in terms of client-to-cloud and cloud-to-client authentication, privacy, trust, and security. Reviewing and overcoming the addressed concerns is essential to provide reliable yet efficient service in the near future. Mobile Cloud Computing has the potential to bring wonders to fields such as education, medical science, biometry, forensics, and automobiles, which could counter the challenges faced in the ongoing COVID-19 pandemic. To combat the prevailing challenges due to COVID-19, it has become critical that more efficient and specialized technologies like Mobile Cloud Computing be adopted to enable appropriate reach and delivery of vital services involving gamification, cloud rendering, and collaborative practices. This paper provides a detailed study of MCC, mitigated security and deployment attacks, issues, and applications of MCC, providing developers and practitioners with opportunities for future enhancements. Keywords: Mobile cloud computing · Data processing · Security · Authentication system · Issues in MCC · COVID-19 · Smart applications
1 Introduction
Cloud Computing is an ever-growing technology, incorporating the potential to attract most organizations because of its incomparable efficiency in providing services such as virtual machines, storage, custom networks, middleware, and resources to their customers and users. The question then arises of how Mobile Cloud Computing differs. Calling Mobile Cloud Computing (MCC) an inheritor of Cloud Computing would not be wrong: incorporating the services offered by Cloud Computing within a mobile computing environment, MCC has proved to be a potential technology
of the future. The cloud services are made available via wireless media responsible for successful communication between mobile devices and clouds. There is a need for interfacing between mobile devices and the cloud so that the computational phases of any application can be offloaded and resources are received whenever requested. MCC is not only smartphone-specific but is implementable for a wide range of devices [1]. It enables the delivery of applications and services from a remote cloud server or environment. Driven by the spike in the number of mobile users and by the need to contour the issues faced by mobile devices, such as slow processing power, limited storage space, and low bandwidth, the MCC market is anticipated to reach USD 118.70 billion by the end of 2026, recording a Compound Annual Growth Rate (CAGR) of 25.28% during the forecast period (2021–2026) [2]. Mobile computing involves how mobile devices learn the context related to their mobility and networking and access the Internet in an ad hoc communication environment. Despite the benefits offered by MCC, the proportion of new users switching to this technology is comparatively small. Since the world is now transitioning to an era where paradigms such as IoT and Artificial Intelligence (AI) are consistently researched for integration with mobile computing technologies, it is the need of the hour to review the challenges, issues, and solutions that have been addressed and proposed to date. The foremost reason is the presence of crucial issues in disciplines like client-to-cloud and cloud-to-client authentication, security, communication channels, and resource protection. Apart from this, there is a need to ensure QoS (Quality of Service) provisions, standard protocols, signaling, context-aware mobile cloud services, and service integration.
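As a rough illustration of the offloading idea mentioned above, the following Python sketch compares an estimated local execution cost against an estimated remote (cloud) cost that includes network transfer. The cost model, speeds, and numbers are purely hypothetical assumptions for exposition, not a prescribed MCC algorithm.

```python
def should_offload(work_units: float, data_mb: float,
                   local_speed: float = 1_000.0,     # device speed (hypothetical units)
                   cloud_speed: float = 20_000.0,    # cloud speed (hypothetical units)
                   bandwidth_mbps: float = 10.0,
                   rtt_s: float = 0.05) -> bool:
    """Offload a task to the cloud if the estimated remote time
    (upload + remote execution + round trip) beats local execution."""
    local_time = work_units / local_speed
    transfer_time = (data_mb * 8) / bandwidth_mbps + rtt_s
    remote_time = work_units / cloud_speed + transfer_time
    return remote_time < local_time

# A heavy task with little data favours offloading; a light task
# with a large payload stays on the device.
print(should_offload(work_units=5_000, data_mb=1))   # True
print(should_offload(work_units=200, data_mb=50))    # False
```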
2 Motivation and Structure of Work
Integrating Artificial Intelligence and the IoT (Internet of Things) with Mobile Cloud Computing can yield solutions for real-world problems and, additionally, practical solutions for the ongoing COVID-19 pandemic in sectors such as healthcare, education, logistics, and management, by addressing the research gaps, comparing the solutions proposed so far, and analyzing the contributions made in this field. The structure of this paper is as follows: Sect. 3 discusses the applications of Mobile Cloud Computing (MCC) in various domains. Section 4 highlights the implementation of various models for the discussed applications. Section 5 states the current challenges and issues faced in MCC and the solutions given to these challenges by other researchers. Section 6 discusses the scope of Mobile Cloud Computing in the COVID-19 pandemic for the mentioned applications. In Sect. 7, the authors conclude the paper and highlight the research gaps and topics for future research.
3 Applications of Mobile Cloud Computing
The number of databases can make it difficult to find accurate and appropriate information among the available resources. Even when we have well-defined needs, the resources may not be accessible from our current location; with mobile cloud computing, any database can be accessed given an internet connection and a device that supports cellular data or Wi-Fi where available.
• Blockchain: Specifically, in this pandemic time, we have witnessed a transition to Electronic Health Records (EHRs) [4, 5] on mobile cloud environments rather than the traditional printed medical reports. This shift also raises concerns about data privacy and network security. Frameworks exist that combine blockchain and the decentralized InterPlanetary File System (IPFS) [3] on a mobile cloud platform; using the Ethereum blockchain with Amazon cloud computing can provide an effective solution for reliable data exchanges on mobile clouds, eliminating the need for specialized, centralized storage systems (a minimal record-hashing sketch is given right after this list).
• Artificial Intelligence (AI): AI tools handle large workloads, and organizations are selling products with improved abilities, enabling users to access inordinate functionalities of the software. Integrating AI with a cloud-based application makes it possible to suggest services based on behavioral study and to provide live and automated services such as chats, emails, and user-tone prediction. Cloud-based AI is an asset enabling many organizations to deliver the utmost in this digitalizing era.
• Internet of Things (IoT): Mobile IoT cloud computing is the intersection of fields such as cloud computing, IoT cloud computing, IoT, mobile IoT computing, mobile computing, and mobile cloud computing [6]. IoT follows the principle of multiple data-offloading schemes to increase the performance, energy efficiency, and execution support of smartphones and device applications. The Machine Learning-Based Mobile Offloading Scheduler (MALMOS) [7] takes a novel approach using online machine learning algorithms; it assumes attributes are independent of each other and has the drawback of biasing towards earlier observations. The computing models proposed for MCC are not limited to the field of IoT; they also extend to branches such as the Internet of Nano Things (IoNT) and the Internet of Underwater Things (IoUW).
• Internet of Mobile Things (IoMT): IoMT is an often abused and misunderstood term. As previously mentioned regarding mobile IoT cloud computing, IoMT deals with the challenges put forward by devices such as mobiles, smartphones, smartwatches, and wearable technologies; such mobile devices include smartphones, vehicles, wearable devices, and smartwatches. The Internet of Mobile Things [8] is also referred to as the Internet of Moving Things [9], the Internet of Medical Things [10], the Internet of Multimedia Things [11], and the Internet of Manufacturing Things.
• Machine Learning: Machine learning and the code-offloading mechanism in the Mobile Cloud Computing concept enable the operation of services to be optimized, among others, on mobile devices. This technology enables hybrid applications to be built with code transfer that runs on different operating systems (such as Android, iOS, or Windows), which decreases the amount of work required from developers, as the same code is executed on a mobile device and in the cloud.
• DevOps: App development faces challenges such as handling multiple screen sizes and variant operating systems. Apps can be deployed with cross-functional capabilities by transferring program data and moving servers to the cloud, and the processes of both the development and operations teams can be tremendously sped up. The storage capabilities with added computing power have played a vital role in the advancement of app development.
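As a small, hedged illustration of the integrity idea behind the blockchain/IPFS EHR frameworks mentioned in the Blockchain item above, a record can be kept off-chain while only its cryptographic digest is anchored on-chain. The snippet below, using only the Python standard library, shows how such a digest could be computed and later re-checked; it is a simplification, not the framework of [3], and the record fields are hypothetical.

```python
import hashlib
import json

def ehr_digest(record: dict) -> str:
    """Deterministically serialize an EHR record and return its SHA-256 digest.
    In a blockchain/IPFS design, only this digest would be anchored on-chain,
    while the (typically encrypted) record lives off-chain."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

record = {"patient_id": "P-001", "diagnosis": "C50", "visit": "2021-11-02"}
on_chain_digest = ehr_digest(record)

# Later, anyone holding the off-chain copy can verify it was not altered.
assert ehr_digest(record) == on_chain_digest
record["diagnosis"] = "C34"                    # tampering ...
print(ehr_digest(record) == on_chain_digest)   # ... is detected: False
```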
There are several pros of using MCC, such as multi-platform support for cloud-based applications, faultless database integration, expeditious app development, comprehensive data recovery, and secure data storage.
• Healthcare: Healthcare is a very important application to be adapted via the mobile cloud computing approach. It has become the need of the hour to innovate and implement technologies that reduce the communication gaps between the several departments in this sector so that services are offered with speed and utmost efficiency. Rather than resorting to traditional models, MCC encourages a shift to a consumer-driven healthcare model. MCC would not only ensure faster transmission of data, sending EHRs, generating reports, and maintaining the confidential data of patients, but would also open the scope for a new field that contours the common issues faced these days, such as huge storage needs, online services, adapted protocols, and an architecture that can be molded to any type of system architecture without complex permutations.
• Education: Being highly efficient, feasible, and flexible, mobile cloud computing-based systems can surely be integrated with educational services and software. This will not only benefit the management through easier execution but will also allow students to learn from anywhere and at any time. Mobile Cloud Computing would be the best option when it comes to the gamification of course content and educational resources; gamification requires sufficient memory and computation for the system as a whole to work without disruptions. MCC can ease this situation through cloud rendering and offloading, which ensure that content, quantity, and productivity are not compromised. Integrating MCC with the education sector would ensure that students can access content from almost anywhere and would encourage collaborative practices as well.
4 Mobile Cloud Computing Models
After the discovery and evolution of personal computers, Mobile Cloud Computing is one of the paradigms that has changed how data is stored and shared. It departs from the processes followed by traditional computational models (distributed computing), the Von Neumann architecture, and Turing machines; the presently proposed computing models are based on human-machine and machine-machine interactions. Even though it is a future technology, MCC capital costs are more or less the same as those estimated for traditional computational systems. MCC combines a plethora of fields catering to accessibility, authorization, accounting, and efficiency, amalgamating wireless web technology, the cloud, and mobile processing. In MCC, the data from mobile devices intended for a cloud-based platform is moved to a central processor via a base transceiver. Similar to version control technologies such as Git and Bitbucket, information about user identity, location, network statistics, and routes is stored and maintained; MCC thus allows the maintainer to keep an appropriate check on the authenticity of the changes being made. Before any transfer, MCC also certifies that a legible copy of the data files sent over the channels is created (used in case of re-transmission or transfer faults). As explained in Fig. 1, MCC extends its services by catering to authorization, accessibility, accounting, and efficiency.
Fig. 1. Mobile cloud computing infrastructure
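The re-transmission safeguard described in Sect. 4 can be pictured with the following minimal Python sketch: the sender keeps a checksummed copy of the payload and retries until the receiver's checksum matches. The lossy transfer function is a stand-in assumption; real MCC stacks use their own transport protocols.

```python
import hashlib
import random

def checksum(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def unreliable_send(data: bytes) -> bytes:
    """Stand-in for a lossy wireless transfer that occasionally flips a byte."""
    if data and random.random() < 0.3:
        i = random.randrange(len(data))
        data = data[:i] + bytes([data[i] ^ 0xFF]) + data[i + 1:]
    return data

def send_with_retry(payload: bytes, max_attempts: int = 20) -> bytes:
    reference = checksum(payload)          # legible copy kept by the sender
    for _ in range(max_attempts):
        received = unreliable_send(payload)
        if checksum(received) == reference:
            return received                # transfer verified
    raise RuntimeError("transfer kept failing checksum verification")

print(send_with_retry(b"patient telemetry batch #42"))
```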
4.1 Service and Deployment Models
4.1.1 Service Models
• Infrastructure as a Service (IaaS): IaaS deals with access to resources such as networks, data storage, and computers. However, the user is not authorized to have control over the resources or any sort of deployment. Even though these are considered basic, they are very important building blocks of the cloud. IaaS allows the user to pay only for the resources utilized and to have limited control over the network components.
• Platform as a Service (PaaS): PaaS grants resources for the development and management of applications. The user is allowed to access these resources through programming languages, tools, libraries, and services. It mainly deals with software application deployment and testing, because even when deployment is successful there may still be concerns and parameters to be checked. It frees users from having to focus on the underlying architecture.
• Software as a Service (SaaS): SaaS is software application delivery to end-users. SaaS focuses on an easily reachable application that is accessed from web browsers or any program interface. In this service, the user does not need to be concerned about the architecture and maintenance of the underlying infrastructure.
• Virtualization: Virtualization is a concept that involves decoupling the hardware from the system and then putting it up on the machine. Unlike physical machines, virtual machines can be considered their representations, closely maintained to run on monitoring host software termed a hypervisor [7]. Hypervisors are mainly of two types and are responsible for implementing virtualization on the physical machine: Type 1 hypervisors are native hypervisors that run on bare metal, directly control the host's hardware, and monitor the guest operating systems, whereas Type 2 hypervisors are hosted hypervisors and run within an environment.
• Computing as a Service (CaaS): Here the main focus is on investing less in hardware-related services and instead opting for services structured and designed according to one's needs and requirements. Hence, Computing as a Service gives the real essence of cloud-based computing systems, removing the hassle of any sort of maintenance and installation while at the same time offering many features and benefits. In CaaS, the computations are handled on virtual servers, for example the EC2 service [27] (a hedged provisioning sketch is given at the end of Sect. 4.1).
• Security as a Service (SECaaS): Security as a Service is a model in which an external organization or third-party service is responsible for handling and managing the security of the services offered by the host. SECaaS is best suited to corporate infrastructures and provides subscription-based services. SECaaS outsources cybersecurity services along with added advantages such as access to the latest security tools, Identity and Access Management (IAM), and Security Information and Event Management (SIEM) [28].
4.1.2 Deployment Models
Public Cloud: Public clouds are among the most common and widely used models because they are owned by third-party cloud service providers that manage tasks such as the delivery of resources like storage and servers. The software, hardware, and other infrastructure are owned and controlled by the cloud service provider. The user can use the services after proper authentication, following specific security norms, and finally logging into their account in a browser.
Private Cloud: Resources are utilized mainly by a single organization or company. The private network is responsible for the maintenance of the complete infrastructure.
Hybrid Cloud: Public and private clouds are combined and bound together using technology that enables data and application sharing. Such a combination offers more flexibility, more deployment options, and more efficient optimization of resources.
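To ground the CaaS idea from Sect. 4.1.1 (for example EC2 [27]), the following Python sketch uses the boto3 client to request a single virtual server. The AMI ID, instance type, and region are placeholders, and AWS credentials are assumed to be configured in the environment; this is an illustrative snippet, not part of the reviewed models.

```python
import boto3

# Assumes AWS credentials are already configured (environment or ~/.aws).
ec2 = boto3.client("ec2", region_name="us-east-1")

# Request one small virtual server; ImageId is a hypothetical placeholder AMI.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Provisioned compute-as-a-service instance:", instance_id)
```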
5 Challenges in Mobile Cloud Computing-Based Systems
Mobile Cloud Computing broadly focuses on offloading the two vital processes of data processing and data storage. The issues lie specifically in energy, QoS (Quality of Service), application, and security. Because of the numerous advancements in MCC, industries and a wide range of sectors are resorting to it. Regardless of the benefits, MCC brings a substantial increase in the number of security hacks, breaches of data privacy, and malicious attacks. Any attack on cloud systems or architectures with the malicious intention of accessing resources illegally or without proper permission, acquiring unsecured data, or modifying or deleting resources is termed a cloud attack. Figure 2 broadly specifies the challenges faced by MCC.
5.1 Analysis of Issues in Mobile Cloud Computing
1. Issues related to Energy: The cause of these issues is mostly low battery lifetime; mobile devices are not efficient in terms of energy. The solution proposed for dealing with these issues is to offload specific complex computational tasks. Specific frameworks are adopted in which the devices offload such tasks dynamically according to the application content, with the use of virtual machines and the concepts of virtualization and parallelization [13]. The frameworks were tested on different types of networks, and the results were recorded and analyzed based on the performance metrics set.
The models proposed by the authors of the dynamic energy-aware cloudlet-based mobile cloud computing model for green computing [14], the Mobile Cloud Offloading Architecture (MOCA) [15], and the Performance Evaluation of Remote Display Access for Mobile Cloud Computing (PERDAM) [16] have addressed the issues of low computation power, remote display access, restrictions of wireless bandwidth, and latency delays. The authors of [14, 15], and [16] implemented and proposed solutions for the above-mentioned issues by setting up AlterNet, a testbed, a user-developed simulator (DECM-Sim), and the OpenStack Cloud Platform.
2. Issues related to Security: Security and privacy are among the most abused words in the literature because they are often misapprehended. Security mainly covers maintaining confidentiality, integrity, and availability, whereas privacy only concerns access to personal information and ensuring data quality. The Mobile Cloud Computing issues are further divided into cloud data-center security and mobile data security. The authors of [17, 18] have discussed extensively the RMTAC, EACDAC, and PADMC schemes; the issues stressed in those contributions relate to the lack of a proper system for authentication and the absence of fine-grained secure access.
3. Security issues related to Dynamic Offloading: Offloading is the formal process of transferring computational tasks that are complicated and hard to handle to web servers. In [18], the author evaluated the cost by taking execution time as one of the primary parameters in making the offloading decision. The drawback of the proposed model is that offloading disregards the status of both the device and the cloud; even though security is disregarded in [18], it improves mobile device resource consumption. The computations are classified such that the trusted cloud is mainly used for critical operations, whereas requests to the offloaded data are processed in parallel by the modified cloud on encrypted data. The research paper [19] focuses on the fundamental issues of deployment decisions and the battery life of the mobile device; experiments were performed on Android devices for individual components.
5.2 Mobile Cloud Computing Protocols
The challenges faced by MCC, also summarized in Fig. 2, are:
– Multifactor authentication
– Client-to-cloud authentication
– Encryption keys and data security
– Privacy
– Cloud-to-client authentication
– Denial of Service (DoS)
– Unsecure protocols
– Attacks related to IaaS
– Internal and external attacks
– VM (Virtual Machine) attacks
Fig. 2. Challenges faced by mobile cloud computing
1. Hyperjacking: Hyperjacking is a type of virtual machine (VM) attack that is very rare but has the potential to wreak great havoc on virtualized environments and servers. Because of the fatal effect hyperjacking can have on a system, it is considered a real-world threat. In hyperjacking, attackers gain complete control over the processes and activities happening in virtualized cloud environments by targeting vulnerable hypervisors. If the attacker succeeds, all the services associated with the hypervisor are affected and can be manipulated. Hence, the hypervisor is one of the foremost issues on which researchers should focus.
2. Denial of Service (DoS) Attacks: A method is proposed in [20] that exploits the advantages of virtual machines (VMs) along with the CPU; this method identifies Denial of Service (DoS) attacks in cloud data centers. The information entropy mentioned by the authors in [22] needs to be applied in the monitoring stage to stay updated about malicious virtual machines; such malicious VMs exhibit a particular status similar to launching a DoS attack (a small entropy-monitoring sketch is given at the end of Sect. 5.3). In Distributed DoS (DDoS) attacks, the conventional traffic of the virtual environment is disturbed by overburdening the server with an immense amount of spam or bogus data, resulting in a DoS attack.
3. Cache Side-Channel Attacks (CSCA): A cache attack is a kind of side-channel attack that uses the timing information leaked at the interface. The CSCA attack is system-centric rather than an attack on the protocols or algorithms used for ensuring the security and privacy of the transmitted data [21]. For a CSCA attack, the intruder or hacker needs a good idea of the internal architecture of the system. Even though various mitigation techniques have been proposed or implemented, the probability of facing a CSCA attack remains the same. Processor caches are often shared globally or used by multiple cores; cache attacks exploit the execution time taken by the cache and tend to bypass some of the essential common security isolation mechanisms [12].
4. Internal and External Attacks: An internal attack is executed by a cloud service provider, customer, or third-party provider, basically anyone authorized to access the system. Internal attacks can also occur because of existing privileges given to users in the recent past who are directly or indirectly dependent on third parties for task execution. They can pose critical threats such as data leaks. Internal attackers do not resort to a single type of attack, making it difficult to build a robust system safe from all of them. The range of attacks spans accessing sensitive
or private data, overwhelming the servers, and introducing viruses into the network; internal attacks like DoS, ICMP, and UDP flood attacks are becoming prominent. External attacks are the vulnerabilities through which attackers get unauthorized access to resources from the outside environment.
5.3 Analysis of MCC Security Models Proposed
The open issues and challenges faced in MCC (Mobile Cloud Computing) are broadly mentioned in Sect. 5.2. The security models proposed by various authors are summarized and discussed in this section. A pairing-free incremental re-encryption model mainly revolves around certificateless file modification operations: all users are allocated a specific partial secret key, the data owner further generates full private and public keys, and the encrypted EHR is uploaded to the cloud; if the user requests to download the EHR, it is fetched from the cloud using the private key, and decryption starts once the data is received. The scheme proposed in [24] is an improved model of Tsai et al.'s protocol; [24] gives a robust authenticated key agreement protocol with a formal security analysis, whereas Tsai et al.'s protocol has some vulnerabilities, such as being more prone to desynchronization and server-based attacks. In [25], the main contributions are maintaining the integrity of the data and ensuring data security through an outsourced ciphertext attribute-based encryption (SO-CP-ABE) scheme and a provable data possession (PDP) scheme. The model proposed in [26] uses the IMEI number of mobile devices for authentication. One of the most common issues faced in MCC is communication overhead: [26] delegates extensive tasks and complex methods to the cloud, which becomes the main reason for communication overhead between the mobile device and the cloud. Authentication is one of the open challenges of MCC. Some security models have completely excluded the authentication module, and recent models focus on adopting only one dimension of authentication; a potential issue with a single dimension of authentication is that it pushes mobile devices beyond their resource capabilities even when authentication modules are added. [23] and [26] have proposed requiring a second level of authentication, known as cloud-to-client authentication. In an MCC-based system, all user requests need to pass through the attack detection module; the user requests, once approved, are passed to the security module, after which they can safely be granted access to the requested data from the cloud (Fig. 3).
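To illustrate the entropy-based monitoring idea referred to in Sect. 5.2 (item 2) and the request-vetting stage of the proposed pipeline, the Python sketch below computes the Shannon entropy of the source addresses seen in a traffic window and flags a window whose entropy deviates strongly from a baseline. The thresholds, window sizes, and addresses are illustrative assumptions, not the detector of [20].

```python
import math
from collections import Counter

def shannon_entropy(items) -> float:
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_dos(window_sources, baseline_entropy: float,
                   tolerance: float = 1.0) -> bool:
    """Flag a traffic window whose source-address entropy deviates from the
    baseline by more than `tolerance` bits (floods dominated by one source
    and highly distributed floods both shift the entropy)."""
    return abs(shannon_entropy(window_sources) - baseline_entropy) > tolerance

normal = ["10.0.0.%d" % (i % 20) for i in range(200)]      # mixed legitimate users
flood = ["203.0.113.7"] * 180 + normal[:20]                # one dominant source

baseline = shannon_entropy(normal)
print(looks_like_dos(normal, baseline))   # False
print(looks_like_dos(flood, baseline))    # True
```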
6 Mobile Cloud Computing in COVID-19
1. Healthcare: The first and foremost industry of critical importance is healthcare. During COVID-19, the world has seen a tremendous increase in online video-conferencing services such as Zoom, Microsoft Teams, and Google Meet, whose cloud-based facilities have helped physicians and doctors to monitor patients and conduct virtual checkups. This does not violate any social-distancing guidelines, which are indirectly violated in physical checkups. Apart from online video conferencing,
Fig. 3. Proposed system for mobile cloud computing based systems
other services such as Amazon Web Services and Microsoft Azure offer features that can ease the process and bring automation to tasks. During COVID-19, Electronic Health Records (EHRs) and epidemiology tools [29] have been used extensively; even though many novel approaches have been published, the issues regarding security and maintenance are still a matter of concern.
2. Education: Education is one of the sectors that came to a halt abruptly because of the COVID-19 pandemic. Even though some schools, universities, and institutions were already utilizing some digital resources and services, the transition from offline to online teaching was quite burdensome for academicians, faculty, and students. There was a need not only for a platform that handles the needs of the students but also for one that eases the process of management and evaluation for teachers and faculty. MCC can not only save hefty costs but is also highly efficient for the proper functioning of e-learning platforms, archiving student data, assignments, and work on the cloud. The challenge of conducting sessions online is resolved by video-conferencing services, but there is still a challenge in conducting assessments and exams in online mode.
3. Artificial Intelligence: Artificial Intelligence is an advancing technology with a lot of unexplored potential. In [30], the authors proposed a novel voice-analysis model that helps in early detection (of asthma and COVID-19) by analyzing changes in voice patterns; the main aim of this model was to be able to distinguish between COVID-19 and asthma detections. The mobile application proposed there records and stores the user/patient's voice regularly in cloud storage, which in turn helps in quick and accurate analysis. Other models combine machine learning and artificial intelligence to analyze and process images to detect COVID-19; since an adequate number of photos needs to be stored for analysis, a cloud-based infrastructure is needed.
4. Blockchain: Blockchain is a versatile field because it has applications in almost all other sectors, be it healthcare, education, governance, management, or transportation, in applications such as sharing patient information, contactless delivery, online education, surveillance, automation, and contact tracing. Blockchain ensures that privacy is maintained while data is shared over the network or any
platforms. For example, in patient information sharing, since the data of COVID-19-affected patients has to be shared nationally as well as internationally to conduct specific research and identification analyses, blockchain ensures that the patients' personal information stays safe. Apart from this, blockchain technology can help in linking stakeholders, which is very commonly stated but not solved. Secondly, it can be a game changer in developing supply chains that have provenance and complete transparency [31].
7 Conclusion and Future Remarks
In this paper, we analyzed how Mobile Cloud Computing plays an ideal role in increasing the functionalities provided by devices such as mobiles, smartphones, and PDAs. MCC has been widely accepted lately in the cloud and mobile computing communities because of its cost-effectiveness, accessibility, and availability, and it can have a wide variety of applications in sectors such as blockchain, education, healthcare, IoT, and artificial intelligence. But even though MCC has scope to address the real-world challenges and problems faced during the lockdown and the COVID-19 pandemic, it still faces several challenges in security, authentication, data encryption, assurance, and interoperability. In this paper, the models proposed by various authors and researchers were reviewed in terms of the technicalities and methodologies followed to contour the issues present in MCC technology, and the challenges faced by MCC were discussed. The number of mobile users has witnessed an exponential increase, which is one of the prominent reasons to research and adopt MCC. MCC not only provides access to complex computational processes irrespective of the device configuration and the location from which the user raises requests, but also addresses processing and resource constraints. The findings concluded after reviewing the proposed models are the following: there is a need for a comprehensive model that addresses all the issues, since disregarding any aspect in such an architecture can pose critical threats and attacks; the authentication process, specifically cloud-to-client authentication, has not been addressed widely, and it is advisable not to disregard it because it invites man-in-the-middle attacks; and future research should be conducted considering real-world scenarios like COVID-19, which can give insights into underlying issues and possible benefits.
Acknowledgement. We thank our college, Vellore Institute of Technology, Chennai, for their constant support and encouragement. The authors have properly acknowledged and cited the scholars whose articles or content were consulted for this manuscript. The authors extend their gratitude to the authors/editors/publishers of all those articles, journals, and books. All the diagrams/figures/illustrations used in the paper were made using open-source software, ensuring no copyright issues.
Authorship Statement. All persons who meet the authorship criteria are listed as authors and take public responsibility for the content, including participation in the concept, analysis, writing, and revision of the manuscript. All authors revised and gave final approval for the version submitted.
Conflict of Interest. The authors have thoroughly discussed the contents of this research paper and declare that they do not have any conflicts of interest.
References
1. Shahzad, A., Hussain, M.: Security issues and challenges of mobile cloud computing. Int. J. Grid Distrib. Comput. 6, 37–50 (2013). www.sersc.org
2. Mobile Cloud Market – Growth, Trends, COVID-19 Impact, and Forecasts (2021–26). https://www.mordorintelligence.com/industry-reports/global-mobile-cloud-market-industry
3. Nguyen, D.C., Pathirana, P.N., Ding, M., Seneviratne, A.: Blockchain for secure EHRs sharing of mobile cloud based E-Health systems. IEEE Access 7, 66792–66806 (2019). https://doi.org/10.1109/ACCESS.2019.2917555
4. Dubovitskaya, A., Xu, Z., Ryu, S., Schumacher, M., Wang, F.: Secure and trustable electronic medical records sharing using blockchain. AMIA Annu. Symp. Proc. 2017, 650–659 (2018). PMID: 29854130; PMCID: PMC5977675
5. Holbl, M., Kompara, M., Kamisalic, A., Zlatolas, L.N.: A systematic review of the use of blockchain in healthcare. Symmetry 10(10), 470 (2018). https://doi.org/10.3390/sym10100470
6. Elazhary, H.: Internet of Things (IoT), mobile cloud, cloudlet, mobile IoT, IoT cloud, fog, mobile edge, and edge emerging computing paradigms: disambiguation and research directions. J. Netw. Comput. Appl. 128, 105–140 (2019). https://doi.org/10.1016/j.jnca.2018.10.021
7. Eom, H., Figueiredo, R., Cai, H., Zhang, Y., Huang, G.: MALMOS: machine learning-based mobile offloading scheduler with online training. In: 2015 3rd IEEE International Conference on Mobile Cloud Computing, Services, and Engineering, pp. 51–60 (2015). https://doi.org/10.1109/MobileCloud.2015.19
8. Talavera, L.E., Endler, M., Vasconcelos, I., Vasconcelos, R., Cunha, M., Silva, F.J.d.S.e.: The mobile hub concept: enabling applications for the internet of mobile things. In: 2015 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops), pp. 123–128 (2015). https://doi.org/10.1109/PERCOMW.2015.7134005
9. Hernandez, L., Cao, H., Wachowicz, M.: Implementing an Edge-fog-cloud Architecture for Stream Data Management. Cornell University Library (2017). https://arxiv.org/abs/1708.00352
10. UST Global: Internet of Medical Things (IoMT) – Connecting Healthcare for a Better Tomorrow (2017). https://www.ust-global.com/sites/default/files/internet_of_medical_things_iomt.pdf
11. Alvi, S.A., Shah, G.A., Mahmood, W.: Energy efficient green routing protocol for internet of multimedia things. In: 2015 IEEE Tenth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pp. 1–6 (2015). https://doi.org/10.1109/ISSNIP.2015.7106958
12. Amit Kumar, T., Shamila, M.: Spy in the crowd: how user's privacy is getting affected with the integration of Internet of Things devices. In: Proceedings of the International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Amity University Rajasthan, Jaipur, India, February 26–28, 2019
13. Kosta, S., Aucinas, A., Hui, P., Mortier, R., Zhang, X.: ThinkAir: dynamic resource allocation and parallel execution in the cloud for mobile code offloading. Proc. IEEE INFOCOM 2012, 945–953 (2012)
14. Gai, K., Qiu, M., Zhao, H., Tao, L., Zong, Z.: Dynamic energy-aware cloudlet-based mobile cloud computing model for green computing. J. Netw. Comput. Appl. 59, 46–54 (2016)
15. Banerjee, X., Chen, J.E., Gopalakrishnan, V., Lee, S., Van Der Merwe, J.: MOCA. In: Proceedings of the Eighth ACM International Workshop on Mobility in the Evolving Internet Architecture, pp. 11–16. ACM, Miami, FL, USA (2013)
16. Lin, Y., Kämäräinen, T., Di Francesco, M., Ylä-Jääski, A.: Performance evaluation of remote display access for mobile cloud computing. Comput. Commun. 72, 17–25 (2015). https://doi.org/10.1016/j.comcom.2015.05.006
17. Khan, A.N., Kiah, M.L., Ali, M., Shamshirband, S.: A cloud-manager-based re-encryption scheme for mobile users in cloud environment: a hybrid approach. J. Grid Comput. 13(4), 651–675 (2015)
18. Kaur, S., Sohal, H.S.: Hybrid application partitioning and process offloading method for the mobile cloud computing. Proc. First Int. Conf. Intell. Comput. Commun. 87, 95 (2016). https://doi.org/10.1007/978-981-10-2035-3_10
19. Gu, Y., March, V., Sung Lee, B.: GMoCA. In: Workshop on Green and Sustainable Software (GREENS), Zurich, 3 June 2012, pp. 15–20. Print ISBN: 978-1-4673-1833-4. https://doi.org/10.1109/GREENS.2012.6224265
20. Cao, J., Yu, B., Dong, F., Zhu, X., Xu, S.: Entropy-based denial-of-service attack detection in cloud data center. Concurrency Computat. Pract. Exper. 27, 5623–5639 (2015). https://doi.org/10.1002/cpe.3590
21. Tong, Z., Zhu, Z., Wang, Z., Wang, L., Zhang, Y., Liu, Y.: Cache side-channel attacks detection based on machine learning. In: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 919–926 (2020). https://doi.org/10.1109/TrustCom50675.2020.00123
22. Donald, A.C., Arockiam, L., Kalaimani, S.: ORBUA: An effective data access model for MobiCloud environment. Int. J. Pure Appl. Math. 118, 79–84 (2018)
23. Bhatia, T., Verma, A.K., Sharma, G.: Towards a secure incremental proxy re-encryption for e-healthcare data sharing in mobile cloud computing. Concurr. Computat. Pract. Exper. 32, e5520 (2020). https://doi.org/10.1002/cpe.5520
24. Irshad, A., Chaudhry, S.A., Shafiq, M., Usman, M., Asif, M., Ghani, A.: A provable and secure mobile user authentication scheme for mobile cloud computing services. Int. J. Commun. Syst. 32, e3980 (2019). https://doi.org/10.1002/dac.3980
25. Yadav, H., Dave, M.: Secure data storage operations with verifiable outsourced decryption for mobile cloud computing. In: International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014), pp. 1–5 (2014). https://doi.org/10.1109/ICRAIE.2014.6909236
26. Rashidi, O., Azzahra Mohd Zaifuddin, F., Mohd Hassan, Z.: Carotenoid biosynthesis regulatory mechanisms in plants. J. Oleo Sci. 63(8), 753–760 (2014). Online ISSN 1347-3352, Print ISSN 1345-8957. https://doi.org/10.5650/jos.ess13183
27. Mathew, S.M.: Implementation of cloud computing in education - a revolution. Int. J. Comput. Theory Eng. 4, 473–475 (2012). https://doi.org/10.7763/IJCTE.2012.V4.511
28. Agrawal, S.: A survey on recent applications of cloud computing in education: COVID-19 perspective. J. Phys. Conf. Ser. 1828(1), 012076 (2021). https://doi.org/10.1088/1742-6596/1828/1/012076
29. Dadhich, P., Kavita: Cloud Computing Impact on Healthcare Services During COVID-19 Pandemic (2021)
30. Popadina, O., Salah, A.-M., Jalal, K.: Voice analysis framework for asthma-COVID-19 early diagnosis and prediction: AI-based mobile cloud computing application. IEEE Conf. Russian Young Res. Electr. Electron. Eng. (ElConRus) 2021, 1803–1807 (2021). https://doi.org/10.1109/ElConRus51938.2021.9396367
31. Sharma, A., Bahl, S., Bagha, A.K., Javaid, M., Shukla, D.K., Haleem, A.: Blockchain technology and its applications to combat COVID-19 pandemic. Res. Biomed. Eng. 1, 8 (2020). https://doi.org/10.1007/s42600-020-00106-3
Designing a Humanitarian Supply Chain for Pre and Post Disaster Planning with Transshipment and Considering Perishability of Products Faeze Haghgoo1 , Ali Navaei1 , Amir Aghsami1 , Fariborz Jolai1 , and Ajith Abraham2,3(B) 1 School of Industrial Engineering, College of Engineering, University of Tehran, Tehran, Iran
{Faeze.haghgoo1997,a.navaei,a.aghsami,fjolai}@ut.ac.ir 2 Machine Intelligence Research Labs, Scientific Network for Innovation and Research Excellence, Auburn, WA, USA [email protected] 3 Center for Artificial Intelligence, Innopolis University, Innopolis, Russia
Abstract. Every year, many human lives are endangered by natural disasters. Preparing to deal appropriately with various crises can prevent potential risks or reduce their effects. In this research, for better preparation, both the pre- and post-crisis stages are addressed. A multi-product, multi-period model is presented in which the locations of warehouses and local distribution centers used to send the required items in each crisis scenario must be selected from among the candidate sites. The perishability of products kept in warehouses is also considered. The purpose of the model presented in this research is to reduce operating costs in the two stages before and after the disaster. Finally, the presented model is solved with the GAMS software, and some sensitivity analyses are provided to validate the model and evaluate the parameters. Keywords: Disaster management · Perishable products · Transshipment · Location
1 Introduction
In the current century, various disasters have endangered human lives; these incidents fall into the category of natural or man-made, such as earthquakes, floods, storms, or wars and fires. Human and natural crises are inevitable, and as the population grows, more crises occur. According to statistics, 106,654 people were killed by natural disasters between 2003 and 2012, and the financial costs are estimated at $157 billion [1]. Therefore, we must have accurate plans to respond to a crisis in order to save human lives and avoid potential dangers. The emergence of a crisis in an area increases the demand for some specific commodities, and responding to these needs as quickly as possible is a factor that must be planned in advance, because a delay in meeting these needs causes a secondary crisis and doubles the costs of the crisis.
In general, crisis management is described in two parts. First, we try to reduce the risk of the crisis, and then we prepare for it; preparedness dramatically reduces the risk of a secondary crisis. Many activities are recommended for preparation, such as educating people or devising pre-arranged plans for each crisis. In post-crisis management, in addition to improving the relief chain as much as possible, logistics cost issues are also raised, and we should try to minimize them with proper management [2]. In this regard, places are designated for storing the required items before the crisis, and then the distribution plan for these items in the post-crisis phase is considered [3]. In previous research, the existence of warehouses before the crisis was discussed, but during a crisis, the proximity of distribution centers to the affected areas can reduce crisis costs and handling time as much as possible [4]. Lateral transshipment between distribution centers is provided to prevent shortages after the crisis; due to the disaster conditions and the temporary distribution centers allocated, some needed relief items may be in short supply when responding as quickly as possible [2]. Relief items can be considered perishable due to their particular characteristics, and to determine a policy at this stage, we must consider the shortages and losses resulting from the maintenance and disposal of spoiled inventories [5]. In this study, we examine the storage of crisis-required items that have perishable properties in the pre-crisis periods and replace them according to cost factors; then, in the post-crisis phase, by considering shorter periods, we reduce the cost of providing relief items and consider lateral transshipment between temporary distribution centers. In the next section, related works are reviewed and the research gap is determined. The problem description and mathematical modeling are presented in Sect. 3. Numerical experiments and sensitivity analysis are conducted in the fourth section, and in the final section, the conclusion and some suggestions for future research are discussed.
2 Literature Review
In this part of the research, a review of other studies is provided, the research gap is identified, and the research requirements are clarified. One study examined factors such as the volume of the incident and the allocation of a time window for each area [6]. Different methods are used to deal with product shortages; in [7], the authors proposed a demand function based on price and inflation. In another study [8], uncertainties in the model were considered and the customer was directly related to the manufacturer; however, in critical situations it is not possible to produce the required goods on the spot. Rawls and Turnquist [9] considered responding to shelters in a timely manner and in the shortest possible time, with the aim of pre-crisis preparations being to reduce the response time; they then presented a model with the decision to buy the required items before and after the crisis, because prepositioning items before the crisis together with the ability to purchase items after the crisis reduces the shortage of emergency items. The required items vary depending on the type of incident; even earthquakes and floods differ. One study examined different types of earthquakes [3]. There are also scenario-based studies in which scenarios were proposed together with the probability of each scenario occurring; another study used the Monte Carlo simulation method to obtain the probability of scenarios occurring [10].
of each scenario occurring. Another study used the Monte Carlo simulation method for the probability of scenarios occurring [10]. Some research has been conducted on the location of distribution centers after the crisis. Finding optimal locations for distribution centers minimizes logistics costs [11]. In the pre-crisis phase, distribution centers are considered warehouses for items needed for the crisis. When a crisis occurs, the facilities available in the warehouse are used and placed in the chain as distribution centers. Roh et al. [12] used TOPSIS and AHP methods to find the best locations for warehouses. Few research works have been done in the field of supply chain in the COVID19 situation. A production–distribution–inventory–allocation–location problem in a sustainable medical supply chain was designed, and three new hybrid meta-heuristic algorithms were developed to solve the problem [15]. Doodman et al. [2] considered several local distribution centers (LDC) to enhance communications between affected areas and warehouses. The perishability of products is one of the vital matters that has received less attention. Several other studies exist in this field are as follows: [9, 13, 14]. Using mobile technology in rigs location problem can help in crisis [15]. Momeni et al. [16] designed a humanitarian relief supply chain by considering repair groups and reliability of routs. The related works are summarized in Table 1. We proposed a multi-product, multi-period, MINLP model. In disaster situations, the number of items needed by the victims is more than one item. Here, it is necessary to examine the model in the case of multiple products. The main contributions of our paper are handling both pre and post-disaster phase in planning and also considering a different scenario in preparing phase. Transshipment between LDCs and the perishability of products are two cases that were used less in other studies. Table 1. The summary of literature review Name
[Table 1 could not be reconstructed from the extracted text. It compares the related studies (Alem et al.; Najafi et al.; Rath and Gutjahr; Sebatli et al.; Tavana et al.; Shahparvari et al.; Yeon Roh et al.; Baskaya et al.; Dehghani et al.; Pradhananga; Bozorgi-Amiri et al.; Noham et al.; Loree et al.; Doodman et al.; and this study) against the following criteria: objective function (cost, time, covering demand), scenario-based modeling, pre-disaster planning, post-disaster planning, transshipment, perishability, and solution approach (e.g., heuristic and CPLEX, DDRA, NSGA-II, RPBNSGA-II, fuzzy AHP and fuzzy TOPSIS, GAMS solver).]
3 Problem Description
Response to demand happens in two stages, before and after the disaster. There are several suppliers and warehouses before the disaster; warehouses are filled periodically by the suppliers, and supplier selection issues are raised before the disaster. When a disaster occurs, a set of local distribution centers (LDCs) is specified; one of their essential features is their proximity to the disaster site, and they are fed by the warehouses. Warehouses and LDCs can be used depending on the circumstances, and the allocation of each facility is based on the travel time between locations (points). In the event of a disaster, a number of goods are essential for the disaster area, so we should prepare for this kind of situation. In this study, we consider perishable goods, with the possibility that goods purchased from suppliers break down, at a degradation rate of α percent. Sometimes an LDC cannot meet the needs of its allocated points; therefore, it is possible to transfer commodities between LDCs to respond to demand in the best possible way, and the corresponding humanitarian constraints have been implemented to investigate this in the model. This model aims to minimize the costs of preparing for crisis situations while considering humanitarian relief supply, shortages, and the perishability of products. The main assumptions of the model are as follows:
• Communication between suppliers and warehouses remains even after the disaster.
• After the disaster, goods can be transferred between suppliers and warehouses.
• In the first period after the catastrophe, the transfer of goods from suppliers to warehouses is prohibited.
The notation of the model is as follows:
Sets
I: set of suppliers
J: set of warehouses
L: set of candidate LDCs
K   set of demand points
S   set of scenarios
T   set of periods after the disaster occurrence
R   set of relief goods
T'  set of periods before the disaster occurrence
h   set of the remaining lifetime periods of goods of type r (h_r = 1, ..., H_r)
Parameters
C_Lr      capacity of LDC L for relief good r
CC_L      cost of establishing LDC L
RC_r      removal cost for commodity type r
P_s       probability of scenario s
d^s_rkt   demand for commodity type r at demand point k under scenario s at period t
PS_r      cost of each product's shortage before the disaster
Pb        cost of each product's shortage after the disaster
pc_irhr   supplying cost of the relief item r from supplier i in the pre-disaster phase
tc_ij     unit transportation cost from supplier i to warehouse j
tc^s_jl   unit transportation cost from warehouse j to LDC l
tc^s_lk   unit transportation cost from LDC l to demand point k under scenario s
tc^s_ll'  unit transportation cost from LDC l to LDC l' under scenario s
β_r       allowable remaining lifetime (time periods) of commodity type r for removal from warehouses
α_r       current lifetime (time periods) of commodity type r at purchase
M         a large number
Decision variables
Y^s_L       1 if LDC L is established under scenario s, 0 otherwise
U_j         1 if warehouse j is used, 0 otherwise
X^s_rkt     amount of demand for relief item r covered at demand point k at period t under scenario s
q_jr        amount of relief item type r held at warehouse j
qc_irjt'hr  quantity of relief item r purchased from supplier i for warehouse j at pre-disaster period t' with remaining lifetime h_r
In_rjt'hr   pre-disaster inventory of relief item r at warehouse j at period t' with remaining lifetime h_r
In^s_rjt    inventory of relief item r at warehouse j at period t under scenario s
In^s_rLt    inventory of relief item r at LDC L at period t under scenario s
S_rjt'      shortage of relief item type r at warehouse j at time period t'
y^s_rjlt    quantity of the relief item r sent from warehouse j to LDC l at period t under scenario s
x^s_rlkt    quantity of the relief item r sent from LDC l to demand point k at period t under scenario s
u^s_rLL't   quantity of the relief item r sent from LDC L to LDC L' at period t under scenario s
b_rjt'hr    amount of relief item type r removed from warehouse j at time period t' with h_r remaining lifetime
B^s_rkt     shortage of commodity type r at demand point k under scenario s at period t
The proposed MINLP model is as follows:

Model Formulation

MinZ = Σ_i Σ_r Σ_j Σ_{t'=2}^{T'} Σ_{h_r=1}^{H_r} pc_{irh_r} · qc_{irjt'h_r}
     + Σ_r Σ_j Σ_{t'=2}^{T'} Σ_{h_r=α_r}^{β_r} RC_r · b_{rjt'h_r}
     + Σ_i Σ_r Σ_j Σ_{t'=2}^{T'} Σ_{h_r=α_r}^{H_r} tc_{ij} · qc_{irjt'h_r}
     + Σ_r Σ_j Σ_{t'=2}^{T'} PS_r · S_{rjt'}
     + Σ_s P_s [ Σ_r Σ_k Σ_t Pb · B^s_{rkt}
               + Σ_j Σ_L Σ_r Σ_t tc^s_{jl} · y^s_{rjlt}
               + Σ_L Σ_{L'} Σ_r Σ_t tc^s_{ll'} · u^s_{rLL't}
               + Σ_L Σ_k Σ_r Σ_t tc^s_{lk} · x^s_{rlkt} ]                               (1)

s.t.
Σ_k x^s_{rlkt} ≤ C_{Lr}                                   ∀ s, r, t, L                  (2)
X^s_{rkt} = Σ_L x^s_{rlkt}                                ∀ s, k, r, t                  (3)
X^s_{rkt} ≤ d^s_{rkt}                                     ∀ s, k, r, t                  (4)
B^s_{rkt} = d^s_{rkt} − X^s_{rkt}                         ∀ s, k, r, t                  (5)
b_{rjt'h_r} ≤ In_{rjt'h_r}                                ∀ r, j, t' ∈ {2, …, T'}, h_r ∈ {1, …, β_r}    (6)
In_{rjt'h_r} = Σ_i qc_{rijt'h_r} + In_{rj(t'−1)(h_r+1)}   ∀ r, j, t' ∈ {2, …, T'}, h_r ∈ {α_r, …, H_r}  (7)
In_{rj(t'+1)(h_r−1)} = In_{rjt'h_r} − b_{rjt'h_r}         ∀ r, j, t' ∈ {2, …, T'}, h_r ∈ {1, …, β_r}    (8)
Σ_{h_r=1}^{H_r} In_{rjt'h_r} = q_{jr}                     t' = 1, ∀ r, j                (9)
S_{rjt'} = q_{jr} − Σ_{h_r=1}^{H_r} In_{rjt'h_r}          ∀ r, j, t' ∈ {2, …, T'}       (10)
Σ_{h_r=1}^{H_r} In_{rjt'h_r} ≤ q_{jr}                     ∀ r, j, t'                    (11)
In^s_{rjt} = Σ_{h_r=1}^{H_r} In_{rjT'h_r} − Σ_L y^s_{rjlt}                      ∀ r, j, s, t = 1       (12)
In^s_{rjt} = In^s_{rj(t−1)} − Σ_L y^s_{rjlt}                                     ∀ r, j, s, t ≥ 2       (13)
In^s_{rLt} = Σ_j y^s_{rjlt} − Σ_k x^s_{rlkt} − Σ_{L'} u^s_{rLL't} + Σ_{L'} u^s_{rL'Lt}            ∀ r, L, s, t = 1   (14)
In^s_{rLt} = In^s_{rL(t−1)} + Σ_j y^s_{rjlt} − Σ_k x^s_{rlkt} − Σ_{L'} u^s_{rLL't} + Σ_{L'} u^s_{rL'Lt}   ∀ s, r, L, t ≥ 2   (15)
qc_{rijt'h_r} ≤ M · U_j                                   ∀ i, r, j, t', h_r            (16)
y^s_{rjlt} ≤ M · Y^s_L                                    ∀ s, r, j, t, l               (17)
x^s_{rlkt} ≤ M · Y^s_L                                    ∀ s, r, l, t, k               (18)
u^s_{rL'Lt} ≤ M · Y^s_L                                   ∀ s, r, L', t, L              (19)
u^s_{rLL't} ≤ M · Y^s_L                                   ∀ s, r, L', t, L              (20)
X^s_{rkt}, u^s_{rL'Lt}, u^s_{rLL't}, y^s_{rjlt}, qc_{rijt'h_r}, In^s_{rjt}, q_{jr} ≥ 0   (21)
U_j, Y^s_L ∈ {0, 1}                                       (22)
In the objective function (1) we try to reduce logistics and supply chain costs as much as possible. These include the cost of purchasing from the suppliers and the cost of building warehouses before the disaster, the cost of transporting relief items from warehouses to LDCs and on to the affected areas, the cost of exchanging goods between LDCs, the cost of removing relief items which have reached the brink of perishing from the decision maker's point of view, and finally the costs of shortages. Constraint (2) states that the amount of relief items sent to all affected areas should not exceed the capacity for each relief item in each period. Constraint (3) ensures that the total amount of relief goods received at an affected area equals the sum of the relief goods sent to that area from all LDCs in each period. Constraint (4) states that each damaged point should not receive more than its needs of each relief item in each period. Constraint (5) expresses the shortage of each relief item at the affected areas in each scenario of the post-disaster period. Constraint (6) ensures that the number of removed relief items cannot exceed the inventory level at each warehouse at each time; relief items are removed upon reaching β_r, which is determined by the decision maker. Constraints (7) and (8) describe the inventory balance of the warehouses before the disaster. Constraint (9) states that the quantity of relief items in the first period equals the inventory level. Constraint (10) gives the amount of shortage of relief items in the warehouses before the disaster. Constraint (11) ensures that the inventory level does not exceed the amount of relief items held at the warehouse. Constraints (12) to (15) express the inventory balance in the warehouses and LDCs. Constraints (16) to (20) state that the transfer of relief items from a facility takes place only if that facility is open. Constraints (21) and (22) define the domains of the decision variables.
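To make the structure of the model concrete, the following is a minimal sketch in Pyomo of the post-disaster demand-covering part only, i.e. constraints (2)–(5) for a single scenario and period, together with the transport and shortage terms of (1). The set members, data values and solver are illustrative assumptions of ours; the authors solve the full MINLP in GAMS.

```python
# Minimal sketch of constraints (2)-(5) for one scenario/period; illustrative data only.
import pyomo.environ as pyo

L = ["L1", "L2"]                         # candidate LDCs
K = ["K1", "K2", "K3"]                   # demand points
R = ["R1"]                               # relief goods
cap = {("L1", "R1"): 60, ("L2", "R1"): 40}                    # C_Lr
dem = {("R1", "K1"): 30, ("R1", "K2"): 25, ("R1", "K3"): 20}  # d_rk
tc  = {(l, k): 1.0 for l in L for k in K}                     # tc_lk (transport LDC -> demand point)
Pb  = 50.0                                                    # post-disaster unit shortage cost

m = pyo.ConcreteModel()
m.x = pyo.Var(R, L, K, domain=pyo.NonNegativeReals)   # x_rlk: shipped from LDC l to point k
m.X = pyo.Var(R, K, domain=pyo.NonNegativeReals)      # X_rk: covered demand at point k
m.B = pyo.Var(R, K, domain=pyo.NonNegativeReals)      # B_rk: shortage at point k

# (2): shipments leaving an LDC may not exceed its capacity for good r
m.capacity = pyo.Constraint(L, R, rule=lambda m, l, r: sum(m.x[r, l, k] for k in K) <= cap[l, r])
# (3): covered demand equals everything shipped in from all LDCs
m.cover = pyo.Constraint(R, K, rule=lambda m, r, k: m.X[r, k] == sum(m.x[r, l, k] for l in L))
# (4): a demand point never receives more than it needs
m.no_excess = pyo.Constraint(R, K, rule=lambda m, r, k: m.X[r, k] <= dem[r, k])
# (5): shortage is the uncovered part of the demand
m.shortage = pyo.Constraint(R, K, rule=lambda m, r, k: m.B[r, k] == dem[r, k] - m.X[r, k])

# transport + shortage slice of objective (1)
m.obj = pyo.Objective(
    expr=sum(tc[l, k] * m.x[r, l, k] for r in R for l in L for k in K)
         + sum(Pb * m.B[r, k] for r in R for k in K),
    sense=pyo.minimize)

# pyo.SolverFactory("glpk").solve(m)   # any locally installed LP/MILP solver
```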
4 Experimental Analysis To investigate the proposed model, we use GAMS 25.13 on a computer with an Intel Core i7 3.30 GHz CPU and 8.00 GB of RAM to evaluate the proposed model's performance and applicability. We examined the proposed model in two dimensions: a small-scale problem and a medium-scale problem. The dimensions and results of each are given in Tables 2, 3 and 4. The remainder of this section investigates the effects of the parameters on the decision variables and the objective function.
Table 2. Initial information of test problems

Small-scale problem: suppliers I1, I2, I3; warehouses J1, J2, J3; LDCs L1, L2, L3, L4; demand points K1, ..., K5; relief goods R1, R2; pre-disaster periods T1, ..., T24; post-disaster periods T1, ..., T10; remaining lifetimes hr1, ..., hr5; scenarios S1, S2.
Medium-scale problem: suppliers I1, I2, ..., I6; warehouses J1, J2, ..., J6; LDCs L1, L2, ..., L6; demand points K1, ..., K10; relief goods R1, R2, ..., R4; pre-disaster periods T1, ..., T30; post-disaster periods T1, ..., T15; remaining lifetimes hr1, ..., hr5; scenarios S1, S2, ..., S4.
Table 3. The range of parameters (input data)

C_Lr: Uniform(30, 50); CC_j: Uniform(300, 800); CC_L: Uniform(300, 800); RC_r: Uniform(20, 50); P_s: Uniform(0.2, 0.8); d^s_rkt: Uniform(40, 100); PS_r: Uniform(100, 400); pc_irhr: Uniform(20, 50); tc_ij: Uniform(20, 50); tc^s_jl: Uniform(20, 50); tc^s_lk: Uniform(20, 50); tc^s_ll': Uniform(20, 50).
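A small sketch of how instances with the uniform ranges of Table 3 could be sampled is shown below. The array shapes follow the small-scale problem of Table 2, while the random seed and the normalisation of the scenario probabilities are assumptions of ours (the paper does not state how P_s is normalised).

```python
# Illustrative random instance generation following the ranges in Table 3.
import numpy as np

rng = np.random.default_rng(seed=0)
n_L, n_K, n_R, n_S, n_T = 4, 5, 2, 2, 10          # small-scale dimensions from Table 2

C_Lr  = rng.uniform(30, 50,  size=(n_L, n_R))      # LDC capacity
CC_L  = rng.uniform(300, 800, size=n_L)            # LDC establishment cost
RC_r  = rng.uniform(20, 50,  size=n_R)             # removal cost
P_s   = rng.uniform(0.2, 0.8, size=n_S)
P_s   = P_s / P_s.sum()                            # assumed: scenario probabilities normalised to sum to 1
d_rkt = rng.uniform(40, 100, size=(n_S, n_R, n_K, n_T))   # post-disaster demand
PS_r  = rng.uniform(100, 400, size=n_R)            # pre-disaster shortage cost
tc_lk = rng.uniform(20, 50,  size=(n_S, n_L, n_K)) # LDC -> demand point transport cost
```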
All parameters were determined as above and the proposed model was solved. Because of the lack of real data in this field, the parameter values were generated randomly. Since the aim is to minimize the logistics cost, the resulting objective function values are reported in Table 4.

Table 4. The results of the solution approach

Objective function | Shortage cost | Removal cost | Post-disaster cost
626971.0     | 219000.0    | 135000.0    | 22471.0
61195479.51  | 109803.262  | 1466573.764 | 1902460.507
As the results show, a large part of the objective function is driven by the shortage cost. In special situations such as disasters, the shortage cost should be higher than usual, so we investigate the effect of the post-disaster shortage cost of each product on the amount of shortage in the two test problems. The results in Fig. 1 confirm that the model behaves correctly: as the shortage cost increases, the amount of shortage decreases.

Fig. 1. Post-disaster shortage versus shortage cost (×10^3) for the small- and medium-scale problems
4.1 Sensitivity Analysis In this section, we investigate the sensitivity of the decision variables and the objective function to the main parameters. In this study we allow transshipment between LDCs in the post-disaster scenarios to make the network more flexible, so we first investigate the effect of the transportation cost between LDCs on the total amount of transshipped goods.
Fig. 2. Transshipment cost effect: the amount of transshipped goods and the amount of shortage at demand nodes for increasing sets of transshipment cost
As is evident in Fig. 2, increasing the transshipment cost decreases the amount of transshipped goods; the cost of transshipment therefore has a reverse effect on the number of goods moved between the LDCs. The reason for allowing transshipment is to reduce shortages in disaster situations. The decrease in transshipped goods severely affects the inventory of products in the LDCs, shortages appear in the LDCs, and this in turn causes a shortage of vital items in the affected areas, as reflected in the increasing number of shortages.
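A sweep of the kind behind Fig. 2 can be organised as a simple loop over transshipment-cost levels. The sketch below assumes a hypothetical wrapper build_and_solve() around the MINLP (for example the GAMS or Pyomo model) that returns the total transshipped volume and the total shortage for a given cost multiplier; both the function and the multipliers are illustrative.

```python
# Hypothetical sensitivity sweep over the transshipment cost tc_ll'.
results = []
for scale in [0.5, 1.0, 1.5, 2.0, 2.5]:             # multipliers on the baseline tc_ll'
    transshipped, shortage = build_and_solve(tc_ll_scale=scale)  # assumed model wrapper
    results.append((scale, transshipped, shortage))

for scale, trans, short in results:
    print(f"tc_ll' x{scale:>3}: transshipped={trans:7.1f}  shortage={short:7.1f}")
```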
Fig. 3. The effect of LDC capacity on the amount of shortage for relief goods R1 and R2
We investigate the effect of LDC capacity on the amount of shortage for each relief good in Fig. 3. If the capacity exceeds the demand, supply is not logistically difficult, but a reduction of capacity in any LDC requires the opening of a new one; otherwise, we face a significant shortage. The amount of shortage is inversely related to the capacity of the LDCs (Fig. 4).
Fig. 4. Comparison of purchased goods (qc), removed items, and post-disaster shortage for increasing values of α_r
In this study, we also examined the issue of goods perishability. Every product has a certain life: goods must be purchased (qc) at a certain age and removed from the supply chain at a certain age, and both the purchasing age and the removal age are determined by the decision maker (DM). DM decisions have a significant impact on the state of the supply chain, such as the amount of shortage or the amount of spoiled goods that incurs removal costs. Keeping β_r fixed at the sixth period, we examine the effect of the decision maker's choice of the α_r parameter. As α_r increases, the goods with this specific lifetime decrease and, as a result, the number of purchased goods decreases. Decreasing the total purchase also has a significant effect on the other decision variables. As the inventory of each commodity decreases, fewer commodities reach the age of six periods and lower removal costs are incurred; however, the decline in the initial purchase of products increases the shortage after the crisis, and we face the irreparable costs of shortages. The decision maker therefore has to choose an equilibrium point between the number of shortages and the number of removed items. Analyzing the small-scale test problem, at the point where α_r has a value of three — where the shortage and removed-item curves intersect — the logistics costs of the system are at a minimum.

4.2 Managerial Insights Shortages in times of crisis are irreparable and cost more than financial penalties, because the lack of facilities endangers many people's lives. Therefore, the sensitivity analysis was performed on the post-disaster shortage. One of the factors influencing the coverage of demand after the crisis is the prompt delivery of important goods to the affected areas; shortages can be reduced by transshipping goods between LDCs, by reducing shipping costs through lower-cost transportation or by establishing closer LDCs, and, as far as possible, by sizing the capacity of the LDCs in each period commensurately with the demand for goods.
5 Conclusions and Future Research In this paper, a multi-product, multi-period model is presented for pre- and post-disaster phase planning. The locations of warehouses and local distribution centers from which the required items are sent in each crisis scenario must be selected among the candidate locations. The perishability of the products is also considered, and the operational cost is minimized. Finally, the presented model is solved with GAMS, and several sensitivity analyses are provided to validate the model and evaluate the parameters. For future research, sustainability and resiliency measures can be considered in the mathematical model, and several parameters can be treated as uncertain to make the conditions more realistic. Meta-heuristic approaches such as the Non-dominated Sorting Genetic Algorithm, Particle Swarm Optimization, and the Strength Pareto Evolutionary Algorithm, or uncertainty-oriented solution approaches such as possibilistic programming, robust optimization, and stochastic programming, could be used to solve the problem and deal with uncertainty. Acknowledgement. This research has been financially supported by The Analytical Center for the Government of the Russian Federation (Agreement No. 70-2021-00143 dd. 01.11.2021, IGK 000000D730321P5Q0002).
References 1. Baskaya, S., Ertem, M.A., Duran, S.: Pre-positioning of relief items in humanitarian logistics considering lateral transshipment opportunities. Soc. Econ. Plan. Sci. 57, 50–60 (2017). https://doi.org/10.1016/j.seps.2016.09.00
2. Doodman, M., Shokr, I., Bozorgi-Amiri, A., Jolai, F.: Pre-positioning and dynamic operations planning in pre- and post-disaster phases with lateral transhipment under uncertainty and disruption. J. Indust. Eng. Int. 15(1), 53–68 (2019) 3. Bozorgi-Amiri, A., Jabalameli, M.S., Mirzapour Al-e-Hashem, S.M.J.: A multi-objective robust stochastic programming model for disaster relief logistics under uncertainty. OR Spectrum 35(4), 905–933 (2011) 4. Noham, R., Tzur, M.: Designing humanitarian supply chains by incorporating actual postdisaster decisions. Eur. J. Oper. Res. 265(3), 1064–1077 (2018) 5. Ferreira, G.O., Arruda, E.F., Marujo, L.G.: Inventory management of perishable items in longterm humanitarian operations using Markov decision processes. Int. J. Dis. Risk Reduct. 31, 460–469 (2018) 6. Kim, S., Shin, Y., Lee, G.M., Moon, I.: Network repair crew scheduling for short-term disasters. Appl. Math. Model. 64, 510–523 (2018) 7. Haghgoo, F., Rabani, M., Aghsami, A.: Designing supply chain network with discount policy and specific price-dependent demand function. J. Ind. Syst. Eng. 13 (Special issue: 17th International Industrial Engineering Conference), 30–36 (2021) 8. Sanjari-Parizi, M., Navaei, A., Abraham, A., Torabi, S.A.: A daily production planning model considering flexibility of the production line under uncertainty: a case study. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds.) Intelligent Systems Design and Applications: 20th International Conference on Intelligent Systems Design and Applications (ISDA 2020) held December 12-15, 2020, pp. 621–634. Springer International Publishing, Cham (2021). https://doi.org/10.1007/978-3-030-71187-0_57 9. Rawls, C.G., Turnquist, M.A.: Pre-positioning and dynamic delivery planning for short-term response following a natural disaster. Socioecon. Plann. Sci. 46(1), 46–54 (2012) 10. Dehghani, M., Abbasi, B., Oliveira, F.: Proactive Transshipment in the Blood Supply Chain: A Stochastic Programming Approach. Omega, 102112 (2019) 11. Loree, N., Aros-Vera, F.: Points of distribution location and inventory management model for post-disaster humanitarian logistics. Transp. Res. Part E: Log. Transp. Rev. 116, 1–24 (2018) 12. Roh, S.Y., Shin, Y.R., Seo, Y.J.: The Pre-positioned warehouse location selection for international humanitarian relief logistics. Asian J. Shipp. Logist. 34(4), 297–307 (2018) 13. Pradhananga, R., Mutlu, F., Pokharel, S., Holguín-Veras, J., Seth, D.: An integrated resource allocation and distribution model for pre-disaster planning. Comput. Ind. Eng. 91, 229–238 (2016) 14. Samani, M.R.G., Torabi, S.A., Hosseini-Motlagh, S.M.: Integrated blood supply chain planning for disaster relief. Int. J. Dis. Risk Reduct. 27, 168–188 (2018) 15. Goodarzian, F., Taleizadeh, A.A., Ghasemi, P., Abraham, A.: An integrated sustainable medical supply chain network during COVID-19. Eng. Appl. Artif. Intell. 100, 104188 (2021) 16. Momeni, B., Aghsami, A., Rabbani, M.: Designing humanitarian relief supply chains by considering the reliability of route, repair groups and monitoring route. Adv. Ind. Eng. 53(4), 93–126 (2019)
Innovative Learning Technologies as Support to Clinical Reasoning in Medical Sciences: The Case of the “FEDERICO II” University Oscar Tamburis1(B) , Fabrizio L. Ricci2 , Fabrizio Consorti3 , Fabrizio Pecoraro2 , and Daniela Luzi2 1
Department of Veterinary Medicine and Animal Productions, University of Naples Federico II, 80137 Naples, Italy [email protected] 2 Institute for Research on Population and Social Policies, National Research Council, 00185 Rome, Italy {f.ricci,f.pecoraro,d.luzi}@irpps.cnr.it Department of Surgical Sciences, Sapienza University of Rome, 00185 Rome, Italy [email protected]
Abstract. The paper describes the first deployment phases of the HIN (Health Issue Network) approach as innovative learning technology for both the Departments of Veterinary Medicine and Animal Productions and of Public Health of the “Federico II” University of Naples. To test this approach, the researchers involved were called to translate clinical cases from their professional experiences by means of a friendly version of HIN’s Petri Nets-based formalism, called f-HINe. A specific software learning environment (fHINscene) was also tested, which allows drawing a f-HINe diagram, as well as designing clinical exercises for medical students according to the Case-Based Learning approach. The results of the tests proved the importance of having a synthetic graphic representation able to analyze complex clinical cases and encouraging inquiry-based learning methods. Keywords: Human medicine · Veterinary medicine · HIN · f-HINe · Innovation · Learning technology

1 Introduction
Students of medical classes are required to reach the end of their curricula with a level of competence that makes them capable of addressing clinical practice as early as possible. Medical professionals are also expected to pursue continuing education, so as to remain constantly active players in the community setting. The Dreyfus model [1] describes the structure of the learning process, and places students and early postgraduate physicians in the lowest ranking levels. They therefore need a rich and challenging simulation environment based on real/realistic clinical
cases that integrate theory and practice to explore and understand the issues, identify problems, and make decisions. This context suffers from the lack of: (i) real/realistic clinical cases reflecting the case-mix of the “epidemiological transition”; (ii) teaching methods that allow the learner to analyze the evolution of a patient’s health issue. Moreover, education and professional development need to be linked to the ability to competently practice medicine within changing and evolving health care systems: new knowledge and skills are necessary to develop unique and iterative approaches to addressing medical problems. In this scenario the Petri Nets-based Health Issue Network (HIN) approach [2–5] has been designed to support teaching and learning activities for medical sciences, so a learner can: (i) browse a clinical case over time; (ii) train to detect the interactions and the evolutions of the health issues; (iii) represent, via the use of diagrams, different kinds of clinical cases in a synthetic way; (iv) develop the ability of clinical reasoning over time. A lighter version of HIN, named f-HINe, was developed to: (i) provide users with a “friendly” tool to handle the evolution of a patient’s health status instead of Petri Nets formalism as such, although based on the same mathematical properties [6]; (ii) design networks reproducing teacher-designed realistic clinical histories or clinical stories related to a real, specific subject, “extracted” from a veterinary/electronic health record (V/EHR) to meet specific learning objectives [7]; and (iii) support an automatic assessment of the learner’s performance during the execution of the clinical exercises. The present work shows the first results of a project focused on the implementation of f-HINe as innovative learning technology for researchers of both human and veterinary classes within the “Federico II” University of Naples.
2 Materials
Several didactic methods are reported in the literature focusing on the education of health sciences students, among which are the following: • Didactic methods based on expert patients. The “patient expert” and the “patient trainer” are both figures who have lived or are living a given pathological condition and then possess the skills to lead doctors, students, social and health workers to acquire a thorough knowledge of the disease as well as its related problems; • Didactic methods based on the use of predefined questions. The International Comorbidity Evaluation Framework, or ICEF, was developed to timely introduce in the training programs of future physicians the concept of comorbidity, intended as “a morbid condition that, more than the others, causes a worsening of an individual’s health status” [9]. The effectiveness of the ICEF method is especially relevant when applied in association with CBL, as ICEF supports the integration between an abstract model and the simulation of a real clinical practice;
• PBL/CBL - based didactic methods. Contemporary educational methods such as Problem-Based Learning (PBL) and Case-Based Learning (CBL) are being increasingly recognized as important research topics in medical science education. CBL in particular, making use of real/realistic clinical cases links theory to practice through the direct application of theoretical knowledge to the cases themselves and encourages the use of inquiry-based learning methods [10].
3 The f-HINe Model
A Health Issue (HI), or clinical condition, can be a disease hypothesis, a sign/symptom, a diagnosis, a risk factor, or any other piece of clinical information. A HI network (HIN) describes the health status of an individual throughout his/her life. It is therefore capable to highlight how e.g.: (i) clinical conditions have changed over time; (ii) the interactions between different conditions have influenced their evolutions; (iii) a treatment plan for a specific condition may have changed into a structured treatment pathway. A clinical condition in the HIN model can: (i) evolve (spontaneously or after treatment) either to improve or to worsen; (ii) generate (although remaining active) other clinical conditions as a complication/cause, or catalyse the evolution of another problem as comorbidity; (iii) relapse after resolution. A clinical condition can also trigger an in-depth examination, which points out the passage from: (i) a symptom to a diagnostic hypothesis/a diagnosis; (ii) a diagnostic hypothesis to a specific diagnosis, by means of a diagnostic test; (iii) a diagnosis to another one, in the case the first one turned out to be incorrect. Compared to HIN’s PNs-based graphs, the f-HINe origins diagrams composed by nodes (HIs) and edges (evolutions from input HIs to output HIs). Edges can be drawn via: (i) a solid line, when evolutions do not affect or alter HIs’ nature (e.g. recurrence, worsening, improvement, examining in-depth); (ii) a dashed line, in case the evolution of a HI implies the generation of a new HI (e.g. complication, cause). A static branch node (or aggregator) can be used in case of more than one input HIs and/or output HIs involved in the same evolution: it is the case of e.g. a worsening or a complication in presence of a co-morbidity. The possible persistence of one or more conditions over time can lead to the design of another primitive, depicted as a thick edge that connects the same duplicated HI. Evolutions are therefore always labelled, and their related descriptive data sheets report information about the activities performed during the diagnostic-therapeutic process. The whole set of activities associated to the evolutions sets forth the actual treatment process the patient has undergone. Table 1 shows the main graphic primitives of the f-HINe model, along with specific clinical examples.
Table 1. Graphic representation of the main f-HINe primitives
A f-HINe diagram can be set up so as to provide two different analytical perspectives, clinical/semeiotic and pathophysiological, related to the two different points of view on the same clinical history. Time plays an important role as well in the evolution of a clinical condition. In a f-HINe diagram the problems are partially ordered: to this end, an implicit right-oriented time abscissa can be associated to the diagram. In case e.g. of two HIs, the agreed rule points out the rightmost one as occurred later [4,5].
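As a rough illustration of how such a partially ordered network of health issues and labelled evolutions could be handled programmatically, the sketch below encodes a tiny f-HINe-like diagram as a directed graph. The health issues, statuses and evolution labels are invented for the example and are not taken from the paper's case studies.

```python
# Illustrative f-HINe-like network as a labelled directed graph (hypothetical case).
import networkx as nx

hin = nx.DiGraph()
hin.add_node("HI1", description="persistent cough", status="sign/symptom")
hin.add_node("HI2", description="suspected pneumonia", status="diagnostic hypothesis")
hin.add_node("HI3", description="pneumonia", status="diagnosis")

# solid-line evolutions keep the nature of the HI; a thick edge marks persistence over time
hin.add_edge("HI1", "HI2", evolution="examining in-depth", line="solid")
hin.add_edge("HI2", "HI3", evolution="diagnostic test", line="solid")
hin.add_edge("HI3", "HI3", evolution="persistence", line="thick")   # same HI duplicated over time

for u, v, data in hin.edges(data=True):
    print(f"{u} -[{data['evolution']}]-> {v}")
```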
4 Methodology
The project involved researchers, directly dealing with clinical activities, from both the Departments of Veterinary Medicine and Animal Productions and of Public Health of the Federico II University of Naples. A first training phase was conducted via a series of meetings (virtual or in presence), with the purpose of showcasing the main features of the HIN approach. A first “teacher version” of a didactic textbook for the deploying of HIN for Medical Sciences was presented to the subjects. They were then asked to figure out as many case studies as possible starting from their actual clinical experiences. The objective was to train them toward a twofold objective: (i) how to translate a real clinical study by means of the HIN features, thus matching the learning objectives and addressing the specific needs related to the human- and animal-related nature of the cases; and (ii) how to design CBL-like exercises based on f-HINe diagrams, to be later delivered to the students. To this regard, a database of exercises to be downloaded for educational use is meant to be hosted in a website specifically designed for the project. The case studies figured out in the first place were then translated into f-HINe diagrams with fHINscene, a tailor-made software learning environment for the design, validation, and evaluation of f-HINe networks for educational purpose, whose features have been introduced and extensively described in [5]. The evaluation of the outputs of the first phase was performed through the submission to the same subjects of a written survey, composed with open-ended questions, delivered and returned via email. The answers were analyzed according to the thematic analysis, in order to extract patterns of meanings [11]. After a careful, repeated reading, the units of meanings - i.e. portions of text that convey a meaning significant for the purposes of the researcher - were identified. The units were then coded with a term or a short expression that conveyed in turn the meaning, and the codes were inductively gathered into themes. The choice of exploring with a method of qualitative research the reaction of the subjects involved, was motivated by the very preliminary phase of the project. Qualitative research can easily produce hypotheses and drafts of systems of meanings, to be then further assessed with quantitative methods. The following three themes emerged: • “Learning needs” that grouped the topics of “applying theory to practice” and “applying legal rules to prevention”; • “Teaching problems” that grouped the topics of “how to develop critical thinking”, “how to develop the ability of identifying the relevant information from the background” and “how to enhance self-directed, problem-based learning”; • “Added value” that grouped the topics of “the power of a synthetic graphic representation” and “the ability to represent patho-physiological correlations”.
5 Results
Each researcher was asked to represent the original material by means of a clinical vignette, as this method is widely used to report clinical cases to provide insight into clinical practice and generate hypotheses for innovations in clinical practice, education, and research [12]. The two following examples focus on a human- (Table 2) and a veterinary-based (Table 3) clinical studies, respectively. When necessary, the ICD-10 code [21] of the HI has been reported within square brackets. Table 2. Human-based case study (Mary)
The main GUI of fHINscene contains the work environment where the user can draw the f-HINe diagram, which is divided into two main sections separated by a slidebar, representing the clinical/semeiotic (up) and the pathophysiological (down) levels. The drawable HIs feature some specific characteristics (at the present moment, only related to human medicine): HI code (e.g. ICD-9 coded [13] issue or free term), HI description (e.g. ICD-9 coded or free description), status (e.g. diagnostic hypothesis, etc.). It is anyway possible to handwrite the name of the HI whether it is missing from the ICD list (it is e.g. the case for most veterinary HIs). The distinction between clinical/semeiotic and pathophysiological levels makes it possible to distinguish between the different perspectives the same case can be analyzed through. On the one hand, the analysis of the sole clinical/semeiotic level (upper side of Fig. 1) provides an immediate, comprehensive vision of the “classic” sequence of clinical activities performed to work out Mary’s health issues during the considered time period. As expressed in Table 2, only an occasional check of Mary’s chronic conditions - represented by means of the “persistence” evolution - led to acknowledge the parathyroid neoformation
that turned out to be the real origin of her whole clinical history; on the other hand, the addition of the pathophysiological level (lower side of Fig. 1) showed what really lied “behind the curtain”, making it clear that in a given moment a remarkable increase of PTH/parathormone caused an as high increase of the level of calcemia, which led in turn to the first witnessed episode of bilateral renal microlithiasis. The persistence of these phenomena - as only a treatment of the kidney-related issues was made - caused eventually the adenoma.
Fig. 1. The f-HINe diagram of the patient “Mary” diagnosed with parathyroid adenoma
Table 3. Veterinary-based case study (Nala) (Source: [14])
Also in this case, the clinical/semeiotic level (upper side of Fig. 2) reflects the contents of Table 2, which respond to the traditional way of describing a clinical case, that is reporting the steps performed to find out the origin(s) of the main health issues, as well as the activities to (try to) work them out. Completing it with the pathophysiological level (lower side of Fig. 2) unveils the real role of the pathogenic agent, which stands as the most likely sole origin, although in different moments of time, of the whole set of HIs the dog’s clinical history is characterized by.
Fig. 2. The f-HINe diagram of the patient “Nala” diagnosed with Angiostrongylus
6 Discussion
According to the “dual process theory” [15] clinical reasoning is based on two different cognitive systems: (i) Intuitive or non-analytic, based on experience and characterized by the rapidity of implementation; (ii) Analytical, through which a decision is reached through a process of rational and conscious analysis, implemented in complex situations, or when there is availability of time. After an initial predominance of the intuitive system, the analytical system takes progressively over, allowing to confirm or reject the hypothesis made in the first place, through careful analysis of data, reports and clinical examinations. This means for human medicine, to deal not only with those diseases included in the list of Essential Levels of Care (LEA, see [22]) but also to e.g. focus, in the near future, on the likely increase of the incidence of rare diseases. On the other hand, for the veterinary medicine this means the possibility to set up a correct clinical-diagnostic approach for as many animal species as possible, and to facilitate the definition and implementation of effective standard operating procedures (SOPs): it is the case, for example, of the increasingly widespread food-borne diseases, which could highlight the presence of links between human and animal health. One of the main innovations introduced by HIN is related to its capacity to allow students to develop the ability to manage these as dynamic phenomena, in their evolution over time, becoming therefore able to anticipate the mutual influences between multiple diseases and co-present conditions.
In particular, the exercises analyzed show the added value of f-HINe, related to its capacity to provide a synthetic graphic representation of a complex clinical case. The diagram that translates the “traditional” written clinical case also enables students to apply in a new way theoretical knowledge to the case itself and encourages the use of inquiry-based learning methods. It is therefore clear how HIN is capable to address in many ways the issues that the didactic methods related to the education of health sciences students are called to deal with. With specific reference to those previously mentioned: (i) the empowered patient can play as a valid support for the health mentor/doctor teacher in order to contribute to an as more realistic as possible design of a network [16]; (ii) the HIN approach allows students to make practice with an instrument that helps them learning how to address complexity of a patient’s clinical path [5]; (iii) the Petri Nets-based graphic formalism allows to represent the patient’s clinical history distributed over a period of several years. Moreover, the deployment of fHINscene to define, validate and compare f-HINe networks falls within the more general process of refinement of the case-based educational research methods (CBL) coming with it [17].
7 Conclusions and Future Perspectives
In this paper the first results have been described of a project focused on the implementation of HIN as innovative learning technology for professors and researchers of both human and veterinary classes within the “Federico II” University of Naples. In the steps performed so far, the subjects involved have approached the HIN model to work out the ways it is supposed to be integrated within their classes, and several case studies have been analyzed and discussed to elicit the added value related to an effective deployment of HIN The next steps will focus on the implementation of such learning technology - including the new discovered P-HIN horizon - to the students, in order to gather information as to the ease of use, the immediacy in delivering clinical/organizational outputs, as well as the possibility to pursue common paths of case analysis involving both humans and animals, towards a “One Health” perspective [19,20]. Acknowledgements. This research was partially supported by the Board of Directors of the University of Naples Federico II through the FEDERICO project (Innovation in education for tenure-track researchers, grant EO/2020/933 of 11/06/2020) funded by University of Naples Federico II. Authors wish to thank the researchers and professors from “Federico II” University (M. D’Ambra, L. D’Angelo, G. Della Valle, P. Gargiulo, M. Gizzarelli, J. Guccione, R. Marrone, M.P. Maurelli) for the support provided in testing and validating both the HIN approach, and the fHINscene software.
References 1. Dreyfus, S. E., Dreyfus, H. L.: A five-stage model of the mental activities involved in directed skill acquisition. California Univ Berkeley Operations Research Center (1980) 2. Ricci, F.L., Consorti, F., Pecoraro, F., Luzi, D., Mingarelli, V., Tamburis, O.: HINhealth issue network as means to improve case-based learning in health sciences education. Decis. Support Syst. Educ. Help Support Healthcare 255, 262 (2018) 3. Ricci, F.L., et al.: Understanding petri nets in health sciences education: the health issue network perspective. Stud. Health Technol. Inform. 270, 484–488 (2020) 4. Ricci, F.L., Pecoraro, F., Luzi, D., Consorti, F., Tamburis, O.: HIN (Health Issue Network). Rete dei problemi di salute. Uso delle reti di Petri per l’educazione nelle scienze mediche. IRPPS Working Papers, pp. 1–140 (2020) 5. Pecoraro, F., Ricci, F.L., Consorti, F., Luzi, D., Tamburis, O.: The friendly health issue network to support computer-assisted education for clinical reasoning in multimorbidity patients. Electronics 10(17), 2075 (2021) 6. Murata, T.: Petri nets: properties, analysis and applications. Proc. IEEE 77(4), 541–580 (1989) 7. Caroprese, L., Veltri, P., Vocaturo, E., Zumpano, E.: Deep learning techniques for electronic health record analysis. In: 2018 9th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1–4. IEEE (2018) 8. Phillpotts, C., Creamer, P., Andrews, T.: Teaching medical students about chronic disease: patient-led teaching in rheumatoid arthritis. Musculoskeletal Care 8(1), 55-60 (2010) 9. Valderas, J.M., Starfield, B., Sibbald, B., Salisbury, C., Roland, M.: Defining comorbidity: implications for understanding health and health services. Ann. Family Med. 7(4), 357–363 (2009) 10. Banchi, H., Bell, R.: The many levels of inquiry. Sci. Child. 46(2), 26 (2008) 11. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qualitat. Res. Psychol. 3(2), 77–101 (2006) 12. Brenner, M., O’Shea, M.P., Larkin, P., Luzi, D., Pecoraro, F., Tamburis, O., et al.: Management and integration of care for children living with complex care needs at the acute-community interface in Europe. Lancet Child Adolescent Health 2(11), 822–831 (2018) 13. MEA, V. D., Vuattolo, O., Frattura, L., Munari, F., Verdini, E., Zanier, L., et al.: Design, development and first validation of a transcoding system from ICD-9-CM to ICD-10 in the IT. DRG Italian project. In: Digital Healthcare Empowering Europeans: Proceedings of MIE2015, vol. 210, p. 135 (2015) 14. Ciuca, L., et al.: Irreversible ocular lesions in a dog with Angiostrongylus vasorum infection. Topics Compan. Ani. Med. 36, 4–8 (2019) 15. Pelaccia, T., Tardif, J., Triby, E., Charlin, B.: An analysis of clinical reasoning through a recent and comprehensive approach: the dual-process theory. Med. Educ. Online 16(1), 5890 (2011) 16. Towle, A., Brown, H., Hofley, C., Kerston, R.P., Lyons, H., Walsh, C.: The expert patient as teacher: an interprofessional Health Mentors programme. Clin. Teach. 11(4), 301–306 (2014) 17. Hege, I., et al.: Experiences with different integration strategies of case-based elearning. Med. Teach. 29(8), 791–797 (2007) 18. www.healthissuenetwork.org
19. Wong, D., Kogan, L.R.: Veterinary students’ attitudes on one health: implications for curriculum development at veterinary colleges. J. Veterin. Med. Educ. 40(1), 58–62 (2013) 20. de Lusignan, S., Liyanage, H., McGagh, D., Jani, B.D., Bauwens, J., Byford, R., et al.: COVID-19 surveillance in a primary care sentinel network: in-pandemic development of an application ontology. JMIR Publ. Health Surveill. 6(4), e21434 (2020) 21. World Health Organization: International statistical classification of diseases and related health problems – 10th revision, Fifth edition (2016) 22. https://www.salute.gov.it/portale/lea/menuContenutoLea.jsp?lingua=italiano& area=Lea&menu=leaEssn. Accessed 27 Jan 2022
Convolutional Neural Networks (CNN) Model for Mobile Brand Sentiment Analysis
Hamidah Jantan(B) and Puteri Ika Shazereen Ibrahim
Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM) Terengganu Kampus, 23000 Kuala Terengganu, Terengganu Darul Iman, Malaysia [email protected]
Abstract. Nowadays, the demand of sentiment analysis is increasing due to the requirement of analyzing and structuring hidden information from unstructured data. The sentiment on public views such as on products, services, and issues can be collected from social media platforms in text form. Convolutional neural network (CNN) is a class of deep neural networks method which can enhance the learning procedures by utilizing the layers with convolving filters that are updating to local features CNN models. This will achieve excellent results for Natural Language Processing (NLP) tasks especially as it can better reveal and obtain the internal semantic representation of text information. Due to this reason, this study attempts to apply this technique in mobile brand reviews sentiment analysis. There are four phases involved in this study which is knowledge acquisition and data preparation; CNN model development and enhancement; and model performance evaluation phases. As a result, the CNN model has been proposed by enhancing the model strategic mapping for optimal solution in producing high accuracy model. In future work, this study plans to explore other parameters such as in data pre-processing and network training to enhance the performance of CNN model. The proposed method can be used as sentiment analysis mechanism in many areas such as in review analytic, search query retrieval and sentence modelling. Keywords: Sentiment analysis · Convolutional neural network (CNN) · Mobile brands review
1 Introduction Recently, there has been tremendous and continuous growth of user-opinionated texts everywhere on the web and social media. Sentiment analysis is a Natural Language Processing (NLP) method that is widely used to mine web and social media contents, including texts, blogs, reviews, and comments. It is also known as opinion mining and is used to analyze people's opinions expressed in written language [1–3]. In the business field, the demand for sentiment analysis is nurtured
as customer reviews are crucial and play important role for future business growth. However, millions of reviews are available on the Internet about the feedbacks on their product or services. Due to this reason, the systematic sentiment analysis approach is needed to be implemented and it will enhance future business direction that based on their customer feedbacks. The patterns or trends of the reviews produced by sentiment analysis are important to determine whether the feedbacks are good (positive) or not (negative). The review analysis is beneficial for both the customers and company. For customer, it can help them in deciding whether to buy or not the mobile product. While for the company, they can use it for their future business planning on their product. Deep learning method is used to model the sentences and has been proven in previous work as superior compared to the traditional structure corresponding to learning models [4–6]. Besides that, deep learning method can better reveal and obtain the internal semantic representation of text information in different domains; it is generally superior to the traditional graph model method and the statistical learning method. Convolutional neural network (CNN) is a class of deep neural networks, commonly applied to analysing visual imagery. However, it has also achieved great successful results for NLP task. CNN are more efficient than existing approaches such as conditional random fields (CRFs) or linguistic patterns analysis [7, 8]. In this study, the CNN model is proposed to analyze the customer reviews on mobile brand products by enhancing the model strategic mapping for optimal solution in producing high accuracy CNN model. The reviews that commented by the customers will be classified whether they are positive or negative feedbacks towards the product. This paper is organized in the following manner: related work on sentiment analysis, CNN applications and social media review analysis in the second section. The Sect. 3 describes the research method. Then the Sect. 4 is on the result analysis and discussions; and followed by the conclusion and future work in the Sect. 5.
2 Related Work 2.1 Sentiment Analysis Sentiment analysis refers to the management of sentiments, opinions, and subjective text that provide the comprehension information related to public views [9, 10]. Public reviews are used to evaluate a certain entity such as a person, product, location and many others that might be found on the different websites. Sentiment analysis focuses on processing the text in order to identify opinionated information. Nowadays, the demand of sentiment analysis is raising due to the increase requirement of analysing and structuring hidden information which comes from social media in the form of unstructured data [11]. In data analysis, sentiment analysis is a field of computational study that analyses people’s opinions expressed in written language to identify the hidebound information [1]. Besides that, sentiment analysis can be used in text mining tasks such as classification, association, prediction, and clustering. For examples, classifying the review on customer experiences using linguistic based approach [2], predicting the gender in online forums for hotspot detection and forecasting [3], predicting the Box Office success [10] and many others. Sentiment analysis is also used for document clustering by exploring
the performance of clustering methods in document analysis [12]. The implementation of sentiment analysis is quite challenging due to the task complexity especially in data preparation and model analysis. Nowadays, deep neural networks approach has shown great results to figure out the salient features for these complex tasks in NLP. These networks have number of parameters that can be effectively used in training the huge number of text dataset [3, 13–15]. In business field, sentiment analysis can help the business growth by analyzing the feedbacks or opinions from their customers. Sentiment analysis methods will analyze people’s opinion, judgment, emotions, and attitude towards individual, organization, products, issues and events [1, 6]. In addition, customers review about the product is very important in enhancing future development and marketing strategic direction. The reviews will in the form of unstructured text and need the right method to identify the concepts, patterns, topics, keywords, and other terms in the reviews. In text analysis, the identified terms or keywords will will be transformed to numerical values for analysis purposes. In this stage, the patterns and relationships among the terms will be identified. There are several algorithms that has been used such as Naive Bayes, Association Rule Mining, Support Vector Machine, K-Means and many others [10]. 2.2 Convolutional Neural Network (CNN) Technique and Application Convolutional neural network (CNN) is a class of deep neural networks, commonly applied to analysing visual imagery. It has also achieved great successful results in Natural Language processing (NLP) task, because it has a convolutional layer to make a piece of words that to be considered together [3, 14, 16]. CNN comprised of a convolutional layer to extract information by a large piece of text. The importance task at this stage is to formulate the mapping strategic model for the convolutional feature maps and activation unit to develop the higher performance of CNN model. This task will be executed by enhancing the learning procedures to handle the challenges in deep learning method especially in learning the structure of model, and the quantity of layers and hidden variables for each layer. This can be achieving by initialize the weight of CNN, to train the model accurately and to avoid the requirement in adding new feature [5]. Figure 1 shows the framework of sentiment analysis using CNN technique. Convolutional layer in CNN has a set of filters that are applied to a sliding window of length over each sentence. The filters are learned during the training phase of the neural network. Before entering a pooling layer, the output of the convolutional layer has passed through a non-linear activation function. While the latter aggregates vector elements by taking the maximum over a fixed set of non-overlapping intervals. For hidden layer, a fully connected hidden layer computes the transformation. The output vector of this layer corresponds to the sentence embedded for each data. Finally, the outputs of the hidden layer are fully connected to a soft-max regression layer [13]. In previous works, CNN has been applied in text classification or NLP tasks whether to distribute or to discrete the embedding of words [16]. CNN can capture the underlying semantic information of input texts, such as dependency relations, co-references, and negation scopes [17]. 
The advantage of CNN is the relevant information that contained in word order, proximity, and relationships are not lost [8, 11, 18].
Fig. 1. Sentiment analysis using CNN framework
2.3 Sentiment Analysis Using CNN Deep Neural Network has shown a great performance achievement in many NLP tasks such as in text mining, information retrieval and sentiment analysis [19–21]. There are several studies on sentiment analysis using CNN as shown in Table 1. For example, CNN model proposed in Stanford Sentiment Treebank analysis shows that a significant gap between the training error and test error discovered the model over-fitting. Hence, to handle this issue, they used predefined 300-dimensional vectors from word2vec and kept it fixed during the training phase, this will make the over fitting decreases and the performance improves. The result also shows significant accuracy improvement [21]. In other study, CNN for sentiment analysis on tweets dataset discovered the initialization of parameter weights is very important in model training to produce an accurate model and to avoid to adding any additional features [22]. Besides that, the study on pre-trained CNN model in extracting the sentiment, emotion, and personality features for sarcasm detection in news articles also produce a good result. The result shows that CNN can directly extract the keys features of the training data where the contextual local features will be grasped from the sentences after several convolution processes by producing a global features [23]. In classifying business reviews using a word embedding on large datasets, where the study aim is to capture the semantic relationship between business reviews using deep learning techniques obtained a good [7]. In another study on business review, it has been proved that model that use CNN produced high accuracy score which is 95.6% when applying 3-fold cross-validation in training stage [5].
Table 1. CNN applications for sentiment analysis

Dataset | Architecture | Network model
Movie reviews [21] | Sentence, matrix, activations, convolutional, pooling and softmax layers | Convolutional Neural Network and selected machine learning algorithms
Message-level from SemEval [22] | Sentence, matrix, activations, convolutional, pooling and softmax layers | Convolutional Neural Network and word2vec
News articles [23] | Convolution, pooling, fully connected and dropout layers | CNN-LSTM
Twitter data [7] | Input sentence matrix, convolution layer, max-pooling, fully connected layer without dropout, and softmax layers | CNN-SVM
Business reviews [5] | Sentence matrix, convolution, max-pooling and softmax layers | Convolutional Neural Network, word2vec and GloVe word vectors
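A recurring design point in these studies [21, 22] is initialising the embedding layer with pre-trained 300-dimensional word2vec vectors and keeping them fixed during training to reduce over-fitting. A minimal sketch of that idea is shown below; the vector file name, the tokenizer word index and the sequence length are illustrative assumptions.

```python
# Sketch: build a fixed (non-trainable) embedding layer from pre-trained word2vec vectors.
import numpy as np
from gensim.models import KeyedVectors
from tensorflow.keras.layers import Embedding

w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def build_embedding_layer(word_index, dim=300, max_len=500):
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, idx in word_index.items():        # word_index from a fitted Keras Tokenizer (assumed)
        if word in w2v:
            matrix[idx] = w2v[word]
    # trainable=False keeps the pre-trained vectors fixed during training, as in [21]
    return Embedding(len(word_index) + 1, dim, weights=[matrix],
                     input_length=max_len, trainable=False)
```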
3 Research Method This study attempts to use CNN technique as a potential method for sentiment analysis on mobile brand reviews. Figure 2 shows the process flow of this study that consist of several phases such as data collection, data pre-processing, implementation of CNN and result analysis. Mobile brands review data are extracted from Amazon website as described in Table 2 and Fig. 3 show the sample of dataset that will be used in model development and analysis. In data preparation phase, there are several steps involved such as lowercase setting, stop word removal, tokenization and normalization as shown in Fig. 4. Besides that, in lowercase setting, all words in the sentence will be converted into lowercase. Next is the process to remove all stop words. It is hen followed by tokenization that convert sentences into vectors. Lastly, in normalization step, lemmatization approach is used to group together the inflected forms of a word so they can be analyzed as a single item, known as lemma. The representation will correctly identify the meaning of the word in a sentence by referring to the vocabulary and morphological analysis of words. The class label is assigned using sentiment intensity analyses to label the sentences whether it is positive or negative based on product reviews. Figure 5 shows the sample of clean dataset after pre-processing.
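A minimal sketch of this pre-processing pipeline (lowercasing, stop-word removal, tokenization, lemmatization and intensity-based labelling) is given below. The use of NLTK and its VADER sentiment-intensity analyser is an assumption of ours; the paper does not name the libraries used.

```python
# Sketch of the Fig. 4 pre-processing steps; NLTK/VADER are assumed, not stated in the paper.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

for pkg in ("punkt", "stopwords", "wordnet", "vader_lexicon"):
    nltk.download(pkg, quiet=True)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
sia = SentimentIntensityAnalyzer()

def preprocess(review: str):
    tokens = word_tokenize(review.lower())                                   # lowercase + tokenize
    tokens = [t for t in tokens if t.isalpha() and t not in stop_words]      # drop stop words/punctuation
    tokens = [lemmatizer.lemmatize(t) for t in tokens]                       # group inflected forms (lemma)
    label = "positive" if sia.polarity_scores(review)["compound"] >= 0 else "negative"
    return tokens, label

print(preprocess("The battery life of this phone is amazing and the camera is great!"))
```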
Fig. 2. Sentiment analysis process for mobile brand reviews using the CNN method

Table 2. Data description

Characteristic | Descriptions
Data source | Amazon website (web scraping)
Data type | Primary data
Data | Reviews about phone brands
Attribute | Customer reviews, Brands
Mobile brands | LG, IPHONE, HUAWEI, SAMSUNG
Language | English
Number of data | 1500

Fig. 3. Sample of dataset
Fig. 4. Data preprocessing step-by-step process
Fig. 5. Sample of clean dataset
In experimental phase, Fig. 6 shows the CNN architecture for this study that consist of several layers. Convolution layer plays an important role in CNN for feature extraction to identify the kernel that work well for the given task in dataset. Besides that, in network training, it involved the process of finding kernels in convolution layers and the weights in fully connected layers that minimize differences between actual output and desired output as given in class labels for the training dataset. At pooling layer, max pooling is used to reduce dimensionality by extracting the patches from the features, outputs the maximum value in each patch and discards other values. Then, at concentrate layer, once the features are extracted by the convolution layers and down sampled by the pooling layers, it will be mapped with the subset in fully connected layers to produce the final outputs of the network training process. Finally, at Softmax layer, the process of classification will be implemented to determine the positive or negative review. In this study, there are several experiments conducted by splitting the dataset into training and testing datasets to produce CNN classifier. Table 4 shows the experiment setup involved in this study (Table 3).
Fig. 6. CNN Architecture for mobile brand review
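The architecture of Fig. 6 can be sketched as a multi-branch 1-D CNN, using the selected settings from Table 3 (input length 500, filter sizes 2–6, 50 feature maps, dropout 0.1, batch size 32). The vocabulary size and embedding dimension below are illustrative assumptions, and this is a sketch rather than the authors' exact implementation.

```python
# Sketch of a multi-filter 1-D CNN for binary sentiment classification (settings from Table 3).
from tensorflow.keras import layers, models

vocab_size, embed_dim, max_len = 20000, 300, 500   # assumed vocabulary and embedding size

inputs = layers.Input(shape=(max_len,))
embed = layers.Embedding(vocab_size, embed_dim)(inputs)

# one convolution + max-pooling branch per filter size, concatenated afterwards
branches = []
for k in (2, 3, 4, 5, 6):
    conv = layers.Conv1D(filters=50, kernel_size=k, activation="relu")(embed)
    branches.append(layers.GlobalMaxPooling1D()(conv))

merged = layers.concatenate(branches)
merged = layers.Dropout(0.1)(merged)
outputs = layers.Dense(2, activation="softmax")(merged)    # positive / negative

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, batch_size=32, validation_split=0.1)   # with padded integer sequences
```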
Table 3. Experiment setup

Properties | Values
Numbers of dataset | 1500
Numbers of experiment | 27

Parameter | Experiment range | Setting selection
Train_Test_Split | 90:10, 80:20, 70:30 | 90:10
Size Input Vector | 300, 500 | 500
Number of Feature Map | 50, 100, 200 | 50
Filter Size | 2–6, 8–12 | 2–6
Dropout | 0.1, 0.5 | 0.1
Batch Size | 8, 32, 64 | 32
In the evaluation phase, a confusion matrix is used to measure the performance of the CNN classifier by evaluating its accuracy, recall and precision. Besides that, a prototype system is developed to determine the sentiment of mobile brand reviews entered by the user.
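The confusion-matrix-based evaluation can be computed as in the sketch below; the label vectors stand in for the 50 held-out reviews and the classifier's predictions and are purely illustrative.

```python
# Sketch of the evaluation metrics (confusion matrix, precision/recall, AUC) on toy labels.
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]            # 1 = positive, 0 = negative (toy labels)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]            # classifier's predicted classes
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.85]   # predicted P(positive)

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))
print("AUC:", roc_auc_score(y_true, y_prob))
```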
4 Results and Discussion There are several experiments was conducted by parameter tuning to determine the performance of CNN classifier on mobile brands dataset. Tables 4, 5 and 6 show the sample of results at on CNN Classifier analysis on different setting of parameters. As a result, the split dataset for 90:10 shows higher accuracy as compared to other split datasets. Besides that, dropout 0.1 produced better accuracy as compared to 0.5 especially for batch size 64. However, for filter size 8–12 and dropout 0.1, the accuracy increased
from 0.8023 to 0.9530, which produced the highest accuracy of the classifier. An input vector size of 500 produces a more accurate classifier than 300.
Table 4. CNN classifier analysis for split 90:10, size input vector 500 and number of feature map 200
Filter size | Dropout | Batch size | Accuracy
2–6 | 0.5 | 32 | 0.8460
2–6 | 0.1 | 32 | 0.9366
2–6 | 0.5 | 64 | 0.7677
2–6 | 0.1 | 64 | 0.8847
8–12 | 0.5 | 32 | 0.8204
8–12 | 0.1 | 32 | 0.9506
8–12 | 0.5 | 64 | 0.8023
8–12 | 0.1 | 64 | 0.9530
Table 5. CNN classifier analysis for split 80:20, size input vector 500 and number of feature map 200
Filter size | Dropout | Batch size | Accuracy
2–6 | 0.5 | 32 | 0.8350
2–6 | 0.1 | 32 | 0.9416
2–6 | 0.5 | 64 | 0.7785
2–6 | 0.1 | 64 | 0.8684
8–12 | 0.5 | 32 | 0.7980
8–12 | 0.1 | 32 | 0.9055
8–12 | 0.5 | 64 | 0.7868
8–12 | 0.1 | 64 | 0.9305
Table 6. CNN classifier analysis for split 70:30, size input vector 500 and number of feature map 200
Filter size | Dropout | Batch size | Accuracy
2–6 | 0.5 | 32 | 0.8125
2–6 | 0.1 | 32 | 0.8750
2–6 | 0.5 | 64 | 0.7691
2–6 | 0.1 | 64 | 0.8400
8–12 | 0.5 | 32 | 0.8189
8–12 | 0.1 | 32 | 0.9343
8–12 | 0.5 | 64 | 0.7235
8–12 | 0.1 | 64 | 0.8962
In the evaluation phase, 50 new mobile brand reviews are used to evaluate the accuracy of the proposed CNN classifier using confusion matrix analysis. As a result, the accuracy of the CNN classifier is 86%, and both recall and precision are above 0.8, which can be considered good enough for classification. In addition, the AUC is 0.82, which also demonstrates the high classification accuracy of the proposed classifier. Table 7 shows the results of the evaluation phase. For the prototype system, Fig. 7 shows an example of the mobile brand analysis results and the positive or negative words extracted from the reviews after applying the proposed CNN classifier. Based on this study, the CNN method can be considered a potential approach for sentiment analysis, especially for mobile brand reviews. By applying the proposed CNN classifier, reviews can be classified automatically for the purpose of analyzing them and visualizing the common words that appear in reviews of the products. Sales and marketing personnel no longer need to read every customer review to monitor product feedback.
Table 7. CNN classifier evaluation
Properties | Precision | Recall | F1-score
Negative | 1.00 | 0.65 | 0.79
Positive | 0.81 | 1.00 | 0.90
Accuracy | | | 0.86
Macro Avg | 0.91 | 0.82 | 0.84
Weighted Avg | 0.89 | 0.86 | 0.85
AUC | | | 0.82
Fig. 7. Sample of prototype system
5 Conclusion
This study designed a convolutional neural network (CNN) for sentiment analysis of mobile brand reviews. From the experimental results, it is possible to identify the most suitable parameters for producing a highly accurate classifier. The results showed that the dataset split ratio, input vector size, number of feature maps, filter size, dropout, and batch size are among the parameters that contribute to the accuracy of the CNN classifier. In future work, this study will explore other parameters, such as those in data pre-processing and training, to enhance the performance of the classification model. Nowadays, big data technologies accelerate the processing of information for decision making, for example in cross-lingual problems, textual and product review analysis, and many others. This study gives a new direction for applying the CNN method, especially in sentiment classification, to other review datasets.
References 1. Mali, D., et al.: Sentiment analysis of product reviews for E-commerce recommendation. Int. J. Manage. Appl. Sci. 2(1), 127–131 (2016) 2. Ordenes, F., et al.: Analyzing customer experience feedback using text mining: a linguisticsbased approach. J. Serv. Res. 17(3), 278–295 (2014) 3. Nhan, C., María, N., Prieta, F.: Sentiment analysis based on deep learning: a comparative study. Electronic 9(3), 483–512 (2020) 4. Lia, N., Wu, D.D.: Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decis. Support Syst. 48(2), 354–368 (2010) 5. Salinca, A.: Convolutional neural networks for sentiment classification on business reviews. In: CEUR Workshop Proceedings (2017) 6. Patel, D.K.A.: Comparison of sentiment analysis and domain adaptation techniques with research scopes. In: Proceedings of the Sixth International Conference on Inventive Computation Technologies [ICICT 2021], pp. 900–908 (2021) 7. Poria, S., Cambria, E., Gelbukh, A.: Aspect extraction for opinion mining with a deep convolutional neural network. Knowl.-Based Syst. 108, 42–49 (2016) 8. Liao, S., et al.: CNN for situations understanding based on sentiment analysis of twitter data. Procedia Comput. Sci. 2017(111), 376–381 (2015) 9. Taj, S., Shaikh, B., Fatemah Meghji, A.: Sentiment analysis of news articles: a lexicon based approach. In: 2019 2nd International Conference on Computing, Mathematics and Engineering Technologies, ICoMET 2019. Sukkur, Pakistan (2019) 10. Kim, Y., Kang, M., Jeong, S.R.: Text mining and sentiment analysis for predicting box office success. KSII Trans. Inter. Inf. Syst. 12(8), 4090–4102 (2018) 11. Ain, Q.T., et al.: Sentiment analysis using deep learning techniques: a review. Int. J. Adv. Comput. Sci. Appl. 8(6), 424–433 (2017) 12. Ma, B., Yuan, H., Wu, Y.: Exploring performance of clustering methods on document sentiment analysis. J. Inf. Sci. 43(1), 57–74 (2017) 13. Deriu, J., Cieliebak, M.: Sentiment analysis using convolutional neural networks with multitask training and distant supervision on Italian tweets. In: Basile, P., Cutugno, F., Nissim, M., Patti, V., Sprugnoli, R. (eds.) EVALITA. Evaluation of NLP and Speech Tools for Italian: Proceedings of the Final Workshop 7 December 2016, Naples, pp. 184–188. Accademia University Press (2016). https://doi.org/10.4000/books.aaccademia.2009 14. Nassif, A., Elnagar, A., Shahin, I., Henno, S.: Deep learning for Arabic subjective sentiment analysis: Challenges and research opportunities. Appl. Soft Comput. J. 98(106836), 2–24 (2021) 15. Vicari, M., Gaspari, M.: Analysis of news sentiments using natural language processing and deep learning. AI & Soc. 36(3), 931–937 (2020) 16. Georgakopoulos, S.V., et al.: Convolutional neural networks for toxic comment classification. In: ACM International Conference Proceeding Series (2018) 17. Zhang, L., Chen, C.: Sentiment classification with convolutional neural networks: an experimental study on a large-scale Chinese conversation corpus. In: Proceedings - 12th International Conference on Computational Intelligence and Security, CIS 2016. pp. 65–169 (2017) 18. Madasu, A., Rao, V.A.: Gated convolutional neural networks for domain adaptation. In: Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M. (eds.) NLDB 2019. LNCS, vol. 11608, pp. 118–130. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-232818_10 19. Zhang, L., Wang, S., Liu, B.: Deep learning for sentiment analysis: a survey. WIREs Data Min. Knowl. Disc. 8(4), 1–25 (2018)
20. Ibrahim, N.M.: Text mining using deep learning article review. Int. J. Sci. Eng. Res. 9(9), 1916–1933 (2018) 21. Dholpuria, T., Rana, Y.K., Agrawal. C.: A Sentiment analysis approach through deep learning for a movie review. In: 8th International Conference on Communication Systems and Network Technologies (CSNT), IEEE (2018) 22. Severyn, A., Moschitti, A.: Twitter sentiment analysis with deep convolutional neural networks. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 959–962. Association for Computing Machinery, Santiago, Chile (2015) 23. Nguyen, D., Vo, K., Pham, D., Nguyen, M., Quan, T.: A deep architecture for sentiment analysis of news articles. In: Le, N.-T., van Do, T., Nguyen, N.T., Thi, H.A.L. (eds.) Advanced Computational Methods for Knowledge Engineering, pp. 129–140. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-61911-8_12
How Knowledge-Driven Class Generalization Affects Classical Machine Learning Algorithms for Mono-label Supervised Classification Houcemeddine Turki(B) , Mohamed Ali Hadj Taieb, and Mohamed Ben Aouicha Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia [email protected], {mohamedali.hajtaieb, mohamed.benaouicha}@fss.usf.tn
Abstract. Beyond class imbalance and the sample size per class, this research paper studies the effect of knowledge-driven class generalization (KCG) on the accuracy of classical machine learning algorithms for mono-label classification. We apply our analysis to five classical machine learning models (Perceptron, Support Vectors, Random Forest, K-nearest neighbors, and Decision Tree) to classify a set of animal image files generated from Wikimedia Commons, a large-scale repository of free images. Thanks to our analysis, we found that the accuracy rates of mono-label classification models, mainly Support Vectors and K-nearest neighbors, are affected by KCG. The increasing or decreasing behavior of accuracy rates is driven by the settings of the classification categories and the generalization. The analysis of KCG should be useful to understand the limitations of classical machine-learning algorithms and to fuel debates about the improvement of classical models and supervised classification evaluation methods within the framework of Explainable Artificial Intelligence.
Keywords: Mono-label classification · Class generalization · Evaluation · Classical machine learning · Knowledge-driven generalization · Explainable Artificial Intelligence
1 Introduction Nowadays, supervised classification primarily depends on statistical methods that compare between predicted labels and assigned labels to provide insights of the efficiency of the classical machine learning algorithms: Support Vectors, Random Forest, Perceptron, K-nearest neighbors, and Decision Tree [1]. Particularly, the evaluation of classical mono-label classification algorithms assigning a unique class to every considered item is mainly based on measuring intuitive metrics such as accuracy, recall, precision, Fmeasure, ROC Curves and ROC Gini [1]. Despite the value of these measures, they are valid assessment options only for the classification of well-distributed and unbiased datasets [1]. Effectively, there are several situations where the evaluation metrics can be easily tricked [2–6]. On one hand, the class imbalance problem, defined as the existence © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 637–646, 2022. https://doi.org/10.1007/978-3-030-96308-8_59
of an uneven distribution of classified items among categories, is associated with a false increase in accuracy rates [2, 3]. On the other hand, the accuracy rates for supervised classification cannot be precise for a sample size per class below a significant threshold [4–6]. A common scenario where these two major deficiencies can happen is class generalization. Class generalization is the process of merging two specific labels (e.g., German Shepherd and Belgian Shepherd) in a supervised classification problem into a general label (e.g., Dog) [7–9]. It is true that such a process allows to have an adequate label granularity for the supervised classification. However, it can cause a random change of the values of evaluation metrics making the assessment of the supervised classification algorithms quite challenging for the developers [7–9]. That is why the effect of class generalization on the evaluation metrics should be analyzed to develop adjusted classification evaluation approaches that are robust to class generalization [7–9]. There are currently two methods for class generalization in the context of mono-label classification: Statistical based on the statistical similarity between recognized patterns [7, 8] and Knowledge-Driven based on the taxonomic relations between classes (e.g., instance of , subclass of , and part of ) in a reference semantic resource. In this research paper, we analyze the effect of knowledge-driven class generalization (KCG) on the accuracy of classical mono-label classification algorithms. We provide such an analysis thanks to Traingenerator (https://traingenerator.jrieke.com), a web service that generates source codes for image classification and object detection [10], and to Wikimedia Commons (https://commons.wikimedia.org), a free and collaborative repository of media files created in 2004, regulated by Wikimedia Foundation and currently involving over 68 million items [11]. First, we define the context for this study by explaining the use of large-scale controlled taxonomies for item categorization as well as outlining the classical machine learning algorithms (Sect. 2). Then, we outline the methods that will be used to assess our assumption through the construction of an image file dataset from Wikimedia Commons and its analysis using multiple classical machine learning models (Sect. 3). After that, we provide the results of our experimental study and discuss them with reference to previous related research papers (Sect. 4). Finally, we draw conclusions for this work, and we specify future directions for its outcomes (Sect. 5).
2 Context 2.1 Using Large Semantic Resources for Item Categorization Currently, many controlled vocabularies are used for annotating large-scale datasets. For example, Medical Subject Headings (MeSH) Terms are used as keywords to scholarly publications in PubMed Citation Index [12, 13]. Similarly, Tencent ML Images, a large image database, uses WordNet concepts to classify its items [14]. In such a situation, the concepts that are assigned to entities are themselves linked to their parents (so-called hypernyms) using an “is-a” relation forming a tree of categories where every specific term is matched to its direct hypernyms. This can be clearly found in Wikimedia Projects, particularly Wikipedia and Wikimedia Commons, where content pages and category pages are both linked to parent categories as shown in Fig. 1 [15]. This allows every item to be related to direct categories as well as to more general categories encompassing
these specific categories. Such an option allows the easy generalization of a class for a set of items using the semantics of the category graph (i.e., class taxonomy).
Fig. 1. Category of sample Wikimedia Commons pages. A) Image File Page [Source: https://w. wiki/yRW], B) Category Page [Source: https://w.wiki/yRW]
2.2 Classical Machine Learning Classical Machine Learning algorithms are the first models that have been developed for the controlled learning (supervised classification and correlations) as well as for unsupervised learning (clustering, association rule and dimensionality reduction) of facts [16]. They include Markov random fields, k-means and random forest among other common techniques [16]. Despite being less effective than the advanced machine learning techniques, particularly neural networks and embeddings, they have a simple and easily implementable structure that allow them to be more practical in various situations [16]. That is why they are currently used as a baseline for evaluating the efficiency of newly developed learning algorithms in terms of accuracy and runtime [17–19]. This could not be achieved without the comparative analysis of the behavior of each classical machine learning model for performing every regular task, particularly supervised classification [20, 21]. This comparison mostly aims to map the most accurate set of classical models for each task and consequently to generate recommendations over the use of models in machine learning [20, 21]. But the comparison can be also useful to infer the common features of the behaviors of all the classical models when doing the same type of assigned work [20, 21]. This is just why we aim to apply multiple machine learning models for our analysis.
3 Methods The process includes the creation of a dataset of image files. This is made possible through extraction from Wikimedia Commons, a large scale free, open and collaborative database of media files hosted by Wikimedia Foundation and driven by MediaWiki Software (https://commons.wikimedia.org) [11]. Just like the other Wikimedia Projects including Wikipedia, all the pages of Wikimedia Commons are assigned categories including item pages and category pages [11]. This caused the development of a collaboratively created taxonomy of categories where specific classes are linked to their
direct hypernyms [11]. This resource is called Wikimedia Commons Category Graph and can be very valuable for driving knowledge engineering applications such as measuring semantic similarity [15]. That is why it can be useful to create fully structured semantic databases of freely available images like DBPedia Commons [11] and IMGPedia [22] and to train image classification algorithms [23], particularly in our study as Wikimedia Commons Categories can be easily generalized to their direct hypernyms. The experiment also involves the application of classical machine learning techniques on the created dataset to assess whether the training accuracy of the classical models can be influenced by KCG or not. For simplicity, we restrict our assessment to five main classical machine learning models: Support Vectors (SVM), Random Forest (RF), Perceptron (PER), K-nearest neighbors (KNN), and Decision Tree (DT). The images are resized and center-cropped to 28 pixels and then processed using the five considered models thanks to Scikit-learn [24]. The source codes are generated using Traingenerator tool [10] and then compiled using Spyder [25]. The analysis will cover two types of KCG: General KCG where all classes are generalized to their hypernyms and Restricted KCG where only classes having the same common hypernym are substituted. 3.1 Dataset Construction We manually extract a subset of the Wikimedia Commons Category Graph dealing with animals. This allowed the creation of a three-level animal taxonomy where the common hypernym of the taxonomy (Level 0, General term) is Animal and each third-level concept (Level 3, Most specific term) is linked to 50 randomly chosen corresponding image files in Wikimedia Commons (Fig. 2). The images are downloaded from Commons using the Imker tool (https://w.wiki/yVS) and are linked to third-level concepts to form an image dataset including 18 categories each constituted of 50 images (Class 3). The same images are related to second-level concepts to create a database involving 6 categories each composed of 150 images (Class 2) and to first-level concepts to develop a dataset including two categories each having 450 images (Class 1). The three developed databases are later uploaded to Zenodo to ensure the reproducibility of the work [26]. Instead of constructing the dataset from scratch, we could use known benchmarks for hierarchical mono-label image classification, particularly CIFAR-100 [27]. However, we wanted to emphasize how Wikimedia Commons can be used to build customized datasets with a broader coverage of classes and topics for studying the effect of KCG in the efficiency of mono-label classification algorithms. By contrast to CIFAR-100 that included 20 first-level classes and 100 s-level classes, the Wikimedia Commons Category Graph includes various types of categories as revealed by https://commons.wikimedia. org/wiki/Category:CommonsRoot and these categories form a large category tree with more than five category levels. Such characteristics allow Wikimedia Commons-based datasets to be more relevant resources for studying KCG effects. 3.2 Experimental Study To study the effect of KCG, we train the five classical models on all the image files labelled with Class 3 categories, then with Class 2, and finally Class 1 categories. We
subsequently compute the training accuracy returned by every considered machine learning algorithm. Then, to get a deeper understanding of the obtained tendencies of accuracy measures, we reproduce our analysis for Class 3 Categories by adding a Class 2 category and eliminating all its subclasses from the modified dataset. To ensure that class imbalance will not alter and bias accuracy measures for every generated combined dataset [28], we restrict the number of items assigned to the generalized second-level class to 50 random images instead of 150 images.
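A minimal sketch of the restricted-KCG manipulation described above: third-level labels sharing a common hypernym are replaced by that hypernym, and the merged class is subsampled back to 50 images so that class sizes stay balanced. The taxonomy dictionary and the file-per-class layout are illustrative assumptions about how the dataset is organized:

```python
import random

# illustrative fragment of the "is-a" taxonomy (third-level class -> direct hypernym)
HYPERNYM = {
    "German Shepherd": "Dog", "Belgian Shepherd": "Dog",
    "Persian cat": "Cat", "Siamese cat": "Cat",
}

def restricted_kcg(dataset, hypernym_to_merge, per_class=50, seed=0):
    """dataset: dict mapping class name -> list of image file paths."""
    random.seed(seed)
    merged, generalized = {}, []
    for cls, files in dataset.items():
        if HYPERNYM.get(cls) == hypernym_to_merge:
            generalized.extend(files)      # collect all subclasses of the hypernym
        else:
            merged[cls] = files            # other Class-3 categories stay untouched
    # keep only 50 random images for the generalized class to avoid class imbalance
    merged[hypernym_to_merge] = random.sample(generalized, per_class)
    return merged

# usage (illustrative): combined = restricted_kcg(class3_dataset, "Dog")
```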
Fig. 2. Taxonomy of considered classes for this research study: Animal is the meta-class for this “is-a” taxonomy. Bird and Mammal are first-level classes (Class 1).
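A sketch of the training loop from Sect. 3.2, assuming the images have already been resized and center-cropped to 28 pixels and flattened into feature vectors; the placeholder data are random, and only the five scikit-learn models match the paper's setup:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

MODELS = {
    "SVM": SVC(), "RF": RandomForestClassifier(), "PER": Perceptron(),
    "KNN": KNeighborsClassifier(), "DT": DecisionTreeClassifier(),
}

def training_accuracy(X, y):
    """X: (n_images, 28*28*3) array of flattened images, y: class labels."""
    scores = {}
    for name, model in MODELS.items():
        model.fit(X, y)
        scores[name] = model.score(X, y)   # training accuracy, as reported in Fig. 3
    return scores

# illustrative placeholder: 18 Class-3 categories of 50 images each
X = np.random.rand(18 * 50, 28 * 28 * 3)
y = np.repeat(np.arange(18), 50)
print(training_accuracy(X, y))
```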
4 Results and Discussion When conducting our analysis, we found that Decision Tree has been the most efficient model in classifying animal images followed by Random Forest and then by Perceptron and Support Vectors and that the least efficient model is K-nearest neighbors as shown in Fig. 3. It seems that the better efficiency of one model depends on the type of classified items and images. For instance, KNN has been identified as less accurate than SVM in the image classification of cars, airplanes, faces and motorbikes [29]. By contrast, KNN has been proved as the best model for multiple sclerosis detection from brain images [21]. Consequently, the better values of training accuracy for a given model does not show a significant advantage of that model over other classical machine learning algorithms in performing mono-label classification. Yet, when seeing the variation of the training accuracy of the models over category levels, most of the models are clearly affected by the generalization of all considered classes, particularly KNN and SVM as shown in Fig. 3. While the general behavior of the classical models reveals an increase of training accuracy with a global generalization of the classes, this action can be rarely coupled with a decrease of accuracy values. Clear examples of this behavior are PER and RF when moving from Class 2 to Class 1. This goes in line with the findings that confirm a general trend that the sample size per class (increasing thanks to global class generalization) can be generally associated with an
Fig. 3. Accuracy of classical machine-learning models for classifying animal images according to the three taxonomic levels of considered classes.
increase of accuracy rates [4, 6]. This also confirms the tendency of training accuracy to vary between high and limited values for different training sets until a precise value is returned once a sufficiently large sample size per class is reached [5]. However, this does not give enough proof of a direct effect of KCG. That is why we should evaluate the effect of KCG in a situation where the sample size per class does not change during the experiment. Replacing each set of third-level categories by their common hypernym one by one and restricting the size of the upper-level category to 50 items, as for the Class 3 categories, is the solution. When applying the five classical models to the combined datasets where only a subset of classes is generalized to their common direct hypernym, we found that substituting several classes by their direct hypernym increased training accuracy in some situations and decreased it in others, as shown in Table 1. The obtained accuracy rates can even lie outside the interval (noted INT) between the accuracy computed for Class 2 categories and the one computed for Class 3 categories.
Table 1. Accuracy of classical machine-learning models for combined datasets with restricted KCG: The accuracy rates outside INT ranges are underlined, the values for Class 2 and Class 3 are put in italics, and the best value for each model is marked in bold.
Dataset | PER | RF | SVM | KNN | DT
Class3 + Dog | 0.7762 | 0.9925 | 0.7725 | 0.4525 | 0.9975
Class3 + Cat | 0.8650 | 0.9975 | 0.7837 | 0.4412 | 1.0000
Class3 + Cattle | 0.8312 | 0.9912 | 0.7587 | 0.4275 | 0.9975
Class3 + Columbidae | 0.8625 | 0.9937 | 0.7737 | 0.4350 | 0.9975
Class3 + Phoenicopteridae | 0.8475 | 0.9900 | 0.7712 | 0.4262 | 0.9975
Class3 + Psittacidae | 0.8662 | 0.9900 | 0.7600 | 0.4275 | 0.9975
Class3 | 0.8288 | 0.9933 | 0.7622 | 0.4366 | 0.9977
Class2 | 0.9055 | 0.9944 | 0.8200 | 0.5188 | 1.0000
This behavior can be explained by the links between confused classes in the classification process. As shown in Fig. 4, when two confused categories have the same direct hypernym, their generalization will neutralize the confusion (e.g., Class3 + Psittacidae for PER). But, when two confused categories have distinct direct hypernyms, their confusion will be inherited by their hypernyms during generalization and this will result in a decrease of training accuracy (e.g., Class3 + Dog for PER).
Fig. 4. Two situations of the consequence of KCG on the confusion between two classes: The blue color demonstrates the confusion propagation for every situation. A green arrow defines the confusion between two classes. A black arrow defines class subcategorization.
Such a finding can be very important in the context of Explainable Artificial Intelligence [30]. First, this implies that restricted KCG can be used to identify the strength and weaknesses of every machine learning model in a mono-label multi-class classification problem (i.e., the categories that are classified the best or the worst by each considered model). This is clearly encouraged by the different trends of accuracy rates for every model in Table 1 towards various situations of restricted KCG. This can also be useful to expand comparisons of the accuracy of classical machine learning models in mono-label classification and obtain explanations for their outcomes through a precise mapping of category types where each model achieves better precision and recall [21, 29]. At a later stage, this can even contribute to the development of fully explainable hybrid models for the classification of items according to heterogenous types of categories with a less computational complexity and a better quality. Second, KCG can be very efficient in debugging datasets to identify its limitations that affect the work of mono-label classification models. This can be achieved using a process of restricted KCG imitating multivariate linear regression modeling that eliminates one-by-one the variables affecting the correlation until the best fit is achieved [31]. The benefit of such a process can range from eliminating labeling faults in datasets and identifying a scarcity of items for a given category to the recognition of deficiencies in the considered classification taxonomy including the useless splitting of a unique category and the false association between a category and its hypernym.
5 Conclusion In this research work, we investigate how knowledge-driven class generalization (KCG) can affect the accuracy of the classical Machine Learning models for mono-label classification. We found that KCG significantly affects the training accuracy of the main classical machine learning models (Perceptron, Random Forest, Support Vector Machine, K-NN
and DT) where the variation of accuracy measures is influenced by the situation of confusion categories inside the classification taxonomy. Given this, KCG can be a tool that reveals the deficiencies of machine learning algorithms for mono-label classification and that feeds discussions about the improvement of machine learning models in the quest of Explainable Artificial Intelligence. As a future direction of this work, we are working to reproduce this work to assess the effect of KCG on multi-label classification as well as on deep learning algorithms for supervised mono-label classification, particularly AlexNet, ResNet, DenseNet and VGG [32]. We also look forward to enhancing supervised classification evaluation by generating metrics that combine statistical methods with other taxonomy-based metrics such as semantic similarity measures [15]. Acknowledgements. This research work is supported by the Ministry of Higher Education and Scientific Research, Tunisia (MoHESR) through the Federated Research Project PRFCOV19-D1P1 as well as of the Young Researcher Support Program Grant 19PEJC08-03. We thank Johannes Rieke (TU Berlin, Germany) for providing source codes at https://traingenerator.jrieke.com/. We also thank Leila Zia (Wikimedia Foundation, United States of America) and Boulbaba Ben Ammar (University of Sfax, Tunisia) as well as four reviewers for contributing to the development of this final output.
References 1. Tsoumakas, G., Katakis, I.: Multi-Label Classification: An Overview. Aristotle University of Thessaloniki (2006) 2. Buda, M., Maki, A., Mazurowski, M.A.: A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 106, 249–259 (2018). https://doi.org/10.1016/ j.neunet.2018.07.011 3. Luque, A., Carrasco, A., Martín, A., de las Heras, A.: The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit. 91, 216–231 (2019). https://doi.org/10.1016/j.patcog.2019.02.023 4. Shahinfar, S., Meek, P., Falzon, G.: “How many images do I need?” Understanding how sample size per class affects deep learning model performance metrics for balanced designs in autonomous wildlife monitoring. Eco. Inform. 57, 101085 (2020). https://doi.org/10.1016/ j.ecoinf.2020.101085 5. Blatchford, M.L., Mannaerts, C.M., Zeng, Y.: Determining representative sample size for validation of continuous, large continental remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 94, 102235 (2021). https://doi.org/10.1016/j.jag.2020.102235 6. Guo, Y., Graber, A., McBurney, R.N., Balasubramanian, R.: Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms. BMC Bioinf. 11(1), 1–19 (2010). https://doi.org/10.1186/1471-2105-11-447 7. Yang, Y.-Y., Rashtchian, C., Salakhutdinov, R., Chaudhuri, K.: Close Category Generalization for Out-of-Distribution Classification. In: SoCal ML & NLP Symposium 2021, 5:1–5:16. University of California San Diego, San Diego, California (2021) 8. Jiang, S., Xu, T., Guo, J., Zhang, J.: Tree-CNN: from generalization to specialization. EURASIP J. Wirel. Commun. Netw. 2018(1), 1–12 (2018). https://doi.org/10.1186/s13638018-1197-z 9. Carvalho, P.F., Chen, C.-H., Chen, Y.: The distributional properties of exemplars affect category learning and generalization. Sci. Rep. 11, 1 (2021). https://doi.org/10.1038/s41598-02190743-0
10. Rieke, J.: Traingenerator – A Web App to Generate Template Code for Machine Learning. GitHub (2020). https://traingenerator.jrieke.com 11. Vaidya, G., Kontokostas, D., Knuth, M., Lehmann, J., Hellmann, S.: DBpedia commons: structured multimedia metadata from the wikimedia commons. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9367, pp. 281–289. Springer, Cham (2015). https://doi.org/10.1007/ 978-3-319-25010-6_17 12. Turki, H., Hadj Taieb, M.A., Ben Aouicha, M.: MeSH qualifiers, publication types and relation occurrence frequency are also useful for a better sentence-level extraction of biomedical relations. J. Biomed. Inform. 83, 217–218 (2018). https://doi.org/10.1016/j.jbi.2018.05.011 13. Turki, H., Hadj Taieb, M.A., Ben Aouicha, M., Fraumann, G., Hauschke, C., Heller, L.: Enhancing knowledge graph extraction and validation from scholarly publications using bibliographic metadata. Front. Res. Metrics Anal. 6, 694307 (2021). https://doi.org/10.3389/ frma.2021.694307 14. Wu, B., et al.: Tencent ML-Images: a large-scale multi-label image database for visual representation learning. IEEE Access 7, 172683–172693 (2019). https://doi.org/10.1109/ACC ESS.2019.2956775 15. Ben Aouicha, M., Hadj Taieb, M.A., Ezzeddine, M.: Derivation of “is a” taxonomy from wikipedia category graph. Eng. Appl. Artif. Intell. 50, 265–286 (2016). https://doi.org/10. 1016/j.engappai.2016.01.033 16. Seo, H., et al.: Machine learning techniques for biomedical image segmentation: an overview of technical aspects and introduction to state-of-art applications. Med. Phys. 47(5), e148–e167 (2020). https://doi.org/10.1002/mp.13649 17. Chen, Z., Zhu, Z., Jiang, H., Sun, S.: Estimating daily reference evapotranspiration based on limited meteorological data using deep learning and classical machine learning methods. J. Hydrol. 591, 125286 (2020). https://doi.org/10.1016/j.jhydrol.2020.125286 18. Li, R.Y., Di Felice, R., Rohs, R., Lidar, D.A.: Quantum annealing versus classical machine learning applied to a simplified computational biology problem. NPJ Quant. Inf. 4, 1 (2018). https://doi.org/10.1038/s41534-018-0060-8 19. Menger, V., Scheepers, F., Spruit, M.: Comparing deep learning and classical machine learning approaches for predicting inpatient violence incidents from clinical text. Appl. Sci. 8(6), 981 (2018). https://doi.org/10.3390/app8060981 20. Shah, K., Patel, H., Sanghvi, D., Shah, M.: A comparative analysis of logistic regression, random forest and KNN models for the text classification. Augment. Human Res. 5(1), 1–16 (2020). https://doi.org/10.1007/s41133-020-00032-0 21. Zhang, Y., et al.: Comparison of machine learning methods for stationary wavelet entropybased multiple sclerosis detection: decision tree, k-nearest neighbors, and support vector machine. SIMULATION 92(9), 861–871 (2016). https://doi.org/10.1177/0037549716666962 22. Ferrada, S., Bustos, B., Hogan, A.: IMGpedia: a linked dataset with content-based analysis of wikimedia images. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10588, pp. 84–93. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68204-4_8 23. Huang, S.: An Image Classification Tool of Wikimedia Commons. Humboldt-Universität zu Berlin (2020). https://doi.org/10.18452/21576 24. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al.: Scikitlearn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011) 25. Kadiyala, A., Kumar, A.: Applications of python to evaluate environmental data science problems. Environ. Prog. Sustain. 
Energy 36(6), 1580–1586 (2017). https://doi.org/10.1002/ ep.12786 26. Turki, H., Hadj Taieb, M.A., Ben Aouicha, M.: Semantics-aware dataset for the mono-label supervised classification of animals. Zenodo, 4514256 (2021). https://doi.org/10.5281/zen odo.4514256
27. Krizhevsky, A.: Learning Multiple Layers of Features from Tiny Images. University of Toronto (2009) 28. Yoon, K., Kwek, S.: An unsupervised learning approach to resolving the data imbalanced issue in supervised learning problems in functional genomics. In: Fifth International Conference on Hybrid Intelligent Systems (HIS’05), p. 6. IEEE, Rio de Janeiro, Brazil (2005). https:// doi.org/10.1109/ICHIS.2005.23 29. Kim, J., Kim, B.-S., Savarese, S.: Comparing image classification methods: K-nearestneighbor and support-vector-machines. In: Proceedings of the 6th WSEAS International Conference on Computer Engineering and Applications and Proceedings of the 2012 American conference on Applied Mathematics, pp. 133–138. WSEAS (2012). https://doi.org/10.5555/ 2209654.2209684 30. Hoffman, R.R., Mueller, S.T., Klein, G., Litman, J.: Metrics for explainable AI: challenges and prospects. arXiv preprint arXiv:1812.04608 (2018) 31. Yang, L., Liu, S., Tsoka, S., Papageorgiou, L.G.: Mathematical programming for piecewise linear regression analysis. Expert Syst. Appl. 44, 156–167 (2016). https://doi.org/10.1016/j. eswa.2015.08.034 32. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708. IEEE, Honolulu (2017). https://doi.org/10.1109/CVPR. 2017.243
Deep Residual Network for Autonomous Vehicles Obstacle Avoidance Leila Haj Meftah(B) and Rafik Braham PRINCE Research Lab ISITCom H- Sousse, Sousse University, 4011 Sousse, Tunisia [email protected], [email protected]
Abstract. With the advancement of artificial intelligence and machine learning, autonomous automobiles are emerging as a lucrative subject of study and a source of interest for car companies. Our research falls within this framework. The fundamental purpose of this research is to propose an obstacle avoidance strategy for self-driving vehicles. We develop a model for high-quality obstacle avoidance prediction for autonomous cars that is based on images generated by our virtual simulation platform and then trained with a ResNet50 deep learning technique. The primary challenge for an autonomous car is to move without collision. For autonomous vehicle simulation research, the suggested technique is feasible, efficient, and trustworthy. The performance of the proposed design is then compared to that of current architectures. The experimental results suggest that the ResNet50 design outperforms the other approaches tested.
Keywords: ResNet50 · Deep learning · Autonomous vehicles · Obstacle avoidance · Simulation
1 Introduction
Over the next decade, autonomous vehicles will have a massive economic impact. Developing models that can compete with or outperform human drivers might save thousands of lives each year. The task at hand is to develop an open source autonomous vehicle [1]. Humans can easily identify obstacles, lane markings, and other vehicles, but computers find this task extremely difficult. Machine learning algorithms can help to solve the problem of simulating human perception. Although machine learning algorithms are a promising approach to autonomous driving, they are not without their own set of difficulties. One such challenge is the quantity of data that is required to train them. A combination of machine learning and other algorithms enables autonomous cars. These algorithms rely largely on data gathered by human drivers and necessitate a tiered approach to converting input into vehicle controls. This data is frequently made up of video feeds from several on-board sensors like as cameras, lidar, radar, and infrared. This data may be utilized to optimize or train algorithms, basically functioning as experience from which to learn. c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Abraham et al. (Eds.): ISDA 2021, LNNS 418, pp. 647–656, 2022. https://doi.org/10.1007/978-3-030-96308-8_60
Deep Learning and Artificial Intelligence (AI) have been the driving forces behind several advancements in computer vision [1,2], robotics, and Natural Language Processing (NLP). They also have a significant influence on the current autonomous driving revolution in academics and business. Autonomous Vehicles (AVs) and self-driving cars are starting to transition from laboratory research and testing to public road driving. Their incorporation into our environmental scene promises a reduction in traffic accidents and congestion, as well as an increase in mobility in congested areas [3]. Deep learning innovations have occurred since Hinton's publication of a novel deep structured learning architecture known as the deep belief network (DBN) [4]. Hinton, Bengio, and LeCun, the founders of deep learning, received the ACM Turing Award in 2019. Deep learning techniques have advanced rapidly during the last decade, having a substantial influence on signal and information processing. Hinton's group won first place in the ImageNet Challenge 2012, employing a new Convolutional Neural Network (CNN) dubbed AlexNet [5]. ResNet is one of the most effective deep neural networks, and it performed spectacularly in the ILSVRC 2015 classification challenge [6]. ResNet demonstrated outstanding generalization performance on additional identification tasks, taking first place in the ILSVRC and COCO 2015 contests for ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation. There are several ResNet architectural versions, i.e. the same principle but with a different number of layers. ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-110, ResNet-152, ResNet-164, and ResNet-1202 are a few examples [6]. The term ResNet followed by a two- or more digit number merely indicates that the ResNet model has that number of neural network layers. In this paper, we examine ResNet-50 in depth, one of the most widely used of these networks.
2 Related Work
A variety of methodologies have been utilized in autonomous vehicle research. In this section, we focus on those who employ deep neural networks, and we provide an overview of important research using that approach, with a special emphasis on those who use simulations, end-to-end networks, or real-world verification. By inventing the Autonomous Land Vehicle in a Neural Network (ALVINN) technology, Pomerleau (1989) [7] pioneered the use of a neural network for autonomous vehicle navigation. The model structure was simple, consisting of a modest, fully-connected network by today’s standards. Based on pixel inputs, the network predicted actions in simple driving scenarios with little barriers. However, it demonstrated the potential of neural networks for end-to-end autonomous navigation. Ciresan et al. developed a neural network capable of accurately recognizing street signs 90.4% of the time in their 2012 research article [8]. While the suggested system was trained on driving-related pictures, the classifier may be taught to categorize any sort of image.
In addition, Ciresan et al. published another article in 2012 employing a similar neural network trained to classify human handwriting [9]. On the MNIST (Mixed National Institute of Standards and Technology) handwriting dataset, that network had an error rate of 0.48%. Ciresan et al.'s studies demonstrate that neural networks are extremely capable image classifiers. According to the authors, their classifier is insensitive to contrast and lighting. More recent efforts to use deep CNNs and RNNs to handle the difficulties of video categorization, scene interpretation [10], and object identification [11] have spurred the use of more complex CNN architectures in autonomous driving. The paper "Learning Spatio-temporal Features with 3D Convolutional Networks" describes how to build 3D convolutional networks to capture spatio-temporal features in a sequence of pictures or videos [12]. Closer to our intended contribution, Carnegie Mellon University's ALVINN [7] was the first system to deploy a neural network for driving. ALVINN's steering output was controlled by a reactive neural network. Reflexive neural networks immediately translate input data to a control output; in this example, the input is ALVINN's camera picture, and the output is a steering angle [13,14]. The control network was trained with real-world data and tested on a real-world vehicle. ALVINN's neural network had to function on relatively small inputs due to the low processing capacity of the time; the pictures given to the network were just 30 × 32 pixels. Additionally, the paper [15] introduces a CNN model for self-driving vehicles that avoids obstacles. Clearly, the fundamental problem for an autonomous car is to navigate without colliding, and the solution proposed there for autonomous vehicle simulation studies is practical, efficient, and reliable. Further, the article [16] offers a learning strategy utilizing the VGG16 architecture, with the output of the suggested architecture compared to other existing architectures. The experimental findings show that VGG16 with transfer learning outperformed the other evaluated techniques. In our case, we need to tackle the obstacle avoidance problem for autonomous cars, and the challenge is to utilize a very deep CNN approach. We decided to use the ResNet50 architecture.
3 Methodology
3.1 Deep Residual Networks
ResNet is an abbreviation for Residual Network. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun originally described it in their 2015 computer vision research article titled ‘Deep Residual Learning for Image Recognition’ [6]. ResNets, also known as Deep Residual Networks, proposed a method for solving complicated issues in CNNs as the network became deeper. This model was enormously successful, as evidenced by its ensemble winning first place in the ILSVRC 2015 classification competition with an error of only 3.57%. It also won first place in the ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation categories in the 2015 ILSVRC & COCO competitions.
ResNet has various variants that employ the same idea but have differing numbers of layers. ResNet50 refers to the version with 50 neural network layers. Because the separate layers may be trained for different duties to generate extremely accurate results, these extra layers help in the speedier resolution of complex challenges. ResNet was designed with the express purpose of addressing the degradation problem of very deep networks. To improve model accuracy, deep residual nets employ residual blocks. The concept of "skip connections," which is at the heart of the residual blocks, is the strength of this type of neural network: a skip connection adds the input of a block to its output, so the block only has to learn a residual mapping. This guarantees that the model's upper layers perform no worse than its lower layers.
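A minimal sketch of a single residual block with a skip connection, written with Keras to make the idea concrete; the filter count and kernel size here are illustrative and not the exact ResNet50 configuration:

```python
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # assumes x already has `filters` channels so the addition is shape-compatible
    shortcut = x                                               # the "skip connection"
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])                            # add input back to the output
    return layers.Activation("relu")(y)
```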
3.2 Model Architecture
The challenge is to train a deep learning model to allow a car to drive itself around a track in a driving simulator without colliding with any obstacles. It is a supervised regression problem involving vehicle control and real-time road pictures from a vehicle's cameras. We used ResNet50 with pre-trained weights as the base of the model and removed its final four layers to attach our own custom network. The ResNet-50 model is divided into five stages, each having a convolution block and an identity block. Each convolution block contains three convolution layers, and each identity block contains three convolution layers as well. There are about 23 million trainable parameters in ResNet-50. We placed a flatten layer on top of the ResNet backbone, followed by three dense layers of 100, 50, and 10 neurons with ELU as the activation function. In between, we used 50% dropout to reduce over-fitting to the training set (Fig. 1).
Fig. 1. Model structure using Resnet 50
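A sketch of the transfer-learning architecture described above, assuming the Keras ResNet50 application with ImageNet weights, a 227 × 227 × 3 input (as used in Sect. 4.1), and a single regression output for the steering command; the exact output head is an assumption, since the paper does not list it explicitly:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

base = ResNet50(weights="imagenet", include_top=False, input_shape=(227, 227, 3))

x = layers.Flatten()(base.output)        # flatten layer on top of the ResNet backbone
x = layers.Dense(100, activation="elu")(x)
x = layers.Dropout(0.5)(x)               # 50% dropout between the dense layers
x = layers.Dense(50, activation="elu")(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(10, activation="elu")(x)
steering = layers.Dense(1)(x)            # assumed single steering-angle output

model = models.Model(base.input, steering)
model.compile(optimizer="adam", loss="mse")
```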
3.3 Simulation Phase
We utilize our virtual simulation platform, which is similar to a car racing game and has been deployed on two separate routes. One was utilized to acquire training data, whereas the other was never viewed by the model and serves as a replacement for the test set. The driving simulator saves frames from three front-facing "cameras," which capture data from the car's perspective, as well as various driving characteristics such as throttle, speed, and steering angle. We feed the camera data into the model and expect it to anticipate how to avoid obstacles. We redesigned the road's shape, constructed a new track, and added obstacles to the scenario. Among these obstacles, we picked yellow cubes that are randomly dispersed along the route. After recording these modifications, the simulator is ready to be used to collect data on safe driving behavior. The goal is to collect road navigation data from three cameras installed on the car. Then, using these data as training data, we build a model to be used in the simulator's autonomous mode.
3.4 Collecting Training Data
Before we could begin the deep neural network (Resnet50) training process, we needed to collect the data that would be used to train the network. Our virtual simulation platform primary function is to generate training data. In the background, a data collecting script captures information on the driver’s steering wheel angle and throttle value. The simulator’s images are annotated with the relevant steering and throttle settings. After the data has been captured, it is utilized to train the neural network as an example. The image recorded by the vehicle camera is sent into the neural network, which is then requested to generate the appropriate steering and throttle settings. The network can learn to adapt to a range of driving circumstances and make generalizations about the task of driving by detecting the conditions present in each image. To further enlarge the collection of data, we enhance it by cropping chosen portions of each image, which offers examples of poor driving and the correction required in certain circumstances. This not only increases the amount of data we have, but it also gives the neural network the capacity to rectify its mistakes (Fig. 2).
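A small sketch of how the collected driving data might be loaded for training; the CSV layout (three camera image paths plus steering and throttle per row) is an assumption about the simulator's log format, not something the paper specifies:

```python
import csv

def load_driving_log(path="driving_log.csv"):
    """Returns (image_paths, steering, throttle) lists from the simulator log."""
    image_paths, steering, throttle = [], [], []
    with open(path) as f:
        for row in csv.reader(f):
            center, left, right, angle, accel = row[:5]
            # use all three front-facing cameras as training examples
            for img in (center, left, right):
                image_paths.append(img.strip())
                steering.append(float(angle))
                throttle.append(float(accel))
    return image_paths, steering, throttle
```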
4 Experiments, Results and Discussion
We must divide the data and apply the 80–20 rule, which states that we should use 80% for training and the remaining 20% for testing the model on unseen images. We also displayed the distributions of sample training and validation data (Fig. 3). The pre-processing task: we need to apply some pre-processing steps to the image dataset created by our virtual simulation platform (VSim-AV), such as:
• Image sizing.
• Image augmentation.
Fig. 2. Recording training data from the simulator for the obstacle avoidance scenario
Fig. 3. Data distribution
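A minimal sketch of the 80–20 split described above, using scikit-learn; the variable names are placeholders for the image paths and labels loaded from the simulator log (see the loader sketched in Sect. 3.4):

```python
from sklearn.model_selection import train_test_split

image_paths = ["img_%03d.jpg" % i for i in range(1000)]   # illustrative placeholders
steering = [0.0] * 1000

X_train, X_valid, y_train, y_valid = train_test_split(
    image_paths, steering, test_size=0.2, random_state=42)  # 80% train / 20% validation
print(len(X_train), len(X_valid))
```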
4.1 Data Augmentation
The brightness is adjusted at random to mimic different lighting situations. We create augmented images with varying brightness by first converting images to HSV, then scaling the V channel up or down and converting back to RGB.
Shadow augmentation: shadows are projected across images at random. The assumption is that even if the camera is shaded (perhaps by rain or dust), the model should still be able to predict the correct position on the road. This is accomplished by selecting random points and darkening all of the points on one side of the image.
Horizontal and vertical shifts: to simulate the effect of the vehicle being at different positions on the road, we shift the camera images horizontally and add an offset equal to the shift to the position. We also randomly shift the images vertically to simulate driving up or down a slope.
Rotation: the images were likewise rotated around the center during the data augmentation experiment. The idea behind rotations is that the model should be independent of camera position, and that rotations may help prevent overfitting.
Pre-processing: we then conducted some image processing. We cropped the image to eliminate extraneous features, converted it to YUV format, applied Gaussian blur, reduced the size for simpler processing, and normalized the results.
We normalize the value range from [0, 255] to [−1, 1] for each image by image = −1 + 2 × (original image)/255. The image is then rescaled to a 227 × 227 × 3 square for the deep residual network model (Fig. 4).
Fig. 4. Pre-processed image
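A sketch of the brightness augmentation and normalization steps above, using OpenCV and NumPy as assumed tooling (the paper does not name the image library); the brightness scaling range and the placeholder frame are illustrative:

```python
import cv2
import numpy as np

def random_brightness(image, low=0.6, high=1.4):
    """Scale the V channel in HSV space to mimic different lighting conditions."""
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[..., 2] = np.clip(hsv[..., 2] * np.random.uniform(low, high), 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

def preprocess(image):
    """Resize to 227x227x3 and normalize pixel values from [0, 255] to [-1, 1]."""
    image = cv2.resize(image, (227, 227))
    return -1.0 + 2.0 * image.astype(np.float32) / 255.0

dummy = np.random.randint(0, 256, (160, 320, 3), dtype=np.uint8)  # placeholder frame
out = preprocess(random_brightness(dummy))
print(out.shape, out.min(), out.max())
```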
4.2 Training Process
Loss Function and Optimisation: The mean-square-error loss function was utilized in our model. This loss function is frequently used in regression situations; large deviations are severely punished by the MSE. Simply put, this function is the mean of the sum of the squared differences between the actual and predicted values (see Eq. 1). The results are presented as the root-mean-square error (RMSE), which is just the square root of the MSE.
MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}   (1)
As the learning rate optimizer, Adam, a variant of stochastic gradient descent (SGD) [17,18], is used to facilitate model training. The mean squared error (MSE) is used to evaluate the model's output, as shown in Eq. 1, and the Adam optimizer is used to minimize this loss. These optimizers are also the preferred option for deep learning applications. Initial testing of this model revealed that the change in loss level decreased after a few epochs. Although Adam computes an adaptive learning rate, learning rate decay was also utilized: the optimizer's decay rate was changed from 0 to the learning rate divided by the number of epochs. During training, the Keras Adam optimizer's other default settings produced good results. TensorFlow is an open-source library developed by the Google Brain team. It implements automated learning methods based on deep neural networks (deep learning) and provides a Python API. Keras is a Python package that encapsulates access to the functionalities offered by several machine learning frameworks, especially TensorFlow, and interfaces readily with it.
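A small sketch of the optimizer setup described above, with a per-epoch decay equal to the learning rate divided by the number of epochs; the epoch count and batch size (25 and 128, per Sect. 4.3) come from the paper, while the learning rate value and the stand-in model are assumptions:

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import LearningRateScheduler

LEARNING_RATE, EPOCHS, BATCH_SIZE = 1e-3, 25, 128   # learning rate is an assumption
DECAY = LEARNING_RATE / EPOCHS                       # decay = learning rate / epochs

# per-epoch decay schedule mirroring the Adam decay setting described above
schedule = LearningRateScheduler(lambda epoch: LEARNING_RATE / (1.0 + DECAY * epoch))

# stand-in model; in practice this would be the ResNet50-based model from Sect. 3.2
model = models.Sequential([layers.Input(shape=(10,)), layers.Dense(1)])
model.compile(optimizer=Adam(learning_rate=LEARNING_RATE), loss="mse")

X, y = np.random.rand(256, 10), np.random.rand(256)  # illustrative placeholder data
model.fit(X, y, validation_split=0.2, epochs=EPOCHS,
          batch_size=BATCH_SIZE, callbacks=[schedule], verbose=0)
```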
We trained our models on Google Colab (Colaboratory), a Google cloud service based on Jupyter Notebook and designed for machine learning training and analysis. This platform allows machine learning models to be trained directly in the cloud, without having to install anything other than a browser on our machine. The model was trained on 80% of the data and validated on the remaining 20%. It is crucial to realize that deep learning is stochastic, so there will always be some inaccuracy in prediction; this is assessed through the testing method. The prediction error may be exacerbated by the quality of the learning data and the learning process, as well as by the stochastic nature of the algorithms utilized. When the maneuver to avoid an obstacle is accomplished, the autonomous vehicle switches from the right to the left lane. The prediction is based on the car's location on the road.
4.3 Discussion
Now that we’ve constructed our model and used it to make data predictions, it’s time to evaluate its success. We may visually compare the predictions to the real validation data, or we can use any sort of measure (in our instance, loss) to evaluate actual performance (Fig. 5).
Fig. 5. The Loss value
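A minimal sketch of how training and validation loss curves like those in Fig. 5 can be plotted from the Keras training history; `history` stands for the object returned by `model.fit` in the earlier training sketch:

```python
import matplotlib.pyplot as plt

def plot_loss(history):
    """history: the object returned by model.fit(...)."""
    plt.plot(history.history["loss"], label="training loss")
    plt.plot(history.history["val_loss"], label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("MSE loss")
    plt.legend()
    plt.show()
```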
To summarize, we obtained a loss value of 0.05. This measure is excellent, and we have developed a highly dependable model. Finally, with a batch size of 128, we trained the model for 25 epochs. We also plotted the training and validation loss as a function of epochs. In only 25 epochs, the model converges quite well, which indicates that it is learning a reasonably good policy for driving a car in unknown road settings. Feel free to experiment with the hyper-parameters for better results. We believe that the model developed in this study is superior and more effective. Following the development of the model, the autonomous vehicle is run in our VSim-AV simulator's autonomous mode.
There are no collisions in this setting. We were pleasantly impressed by how well the vehicle performed on the test track, which the model had never seen before. The performance on the practice track was a little off, but we believe that is acceptable since it demonstrates that the car was not just memorizing the circuit. It recovered effectively from a few critical situations, despite the fact that none of those maneuvers had been part of the training data.
5 Conclusion and Future Work
We addressed the development of a deep residual network model to be utilized later in autonomous mode. Training data are used in this research to train a deep neural network, specifically a deep residual network. The objective is to mimic human driver behavior when performing a dynamic maneuver to avoid obstacles. The fact that the training data are images explains the decision to use a deep learning technique. Using Keras, the Python deep learning package, we trained, validated, and tested a model. The output was saved as a model that will be utilized later for an autonomous vehicle. The strategy described in this paper demonstrates that the deep residual network is an excellent choice for teaching an autonomous vehicle how to avoid obstacles. The challenges for future work include ensuring that the autonomous vehicle avoids obstacles collision-free at high speed, and then replacing the obstacles with other automobiles to resolve this maneuver in a safe manner while also treating the scenario of bad weather.
References 1. Huang, Y., Chen, Y.: Autonomous driving with deep learning: a survey of stateof-art technologies. arXiv preprint arXiv:2006.06091 (2020) 2. Bagloee, S.A., Tavana, M., Asadi, M., Oliver, T.: Autonomous vehicles: challenges, opportunities, and future implications for transportation policies. J. Modern Transp. 24(4), 284–303 (2016). https://doi.org/10.1007/s40534-016-0117-3 3. Takaya, K., Asai, T., Kroumov, V., Smarandache, F.: Simulation environment for mobile robots testing using ROS and gazebo. In: 2016 20th International Conference on System Theory, Control and Computing (ICSTCC), pp. 96–101, IEEE (2016) 4. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006) 5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012) 6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 7. Pomerleau, D.A.: Alvinn: an autonomous land vehicle in a neural network. Technical report Carnegie-Mellon Univ Pittsburgh pa artificial intelligence and psychology (1989)
8. Ciresan, D.C., Meier, U., Masci, J., Gambardella, L.M., Schmidhuber, J.: Flexible, high performance convolutional neural networks for image classification. In: Twenty-Second International Joint Conference on Artificial Intelligence (2011) 9. Ciresan, D., Giusti, A., Gambardella, L., Schmidhuber, J.: Deep neural networks segment neuronal membranes in electron microscopy images. Adv. Neural. Inf. Process. Syst. 25, 2843–2851 (2012) 10. Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1915–1929 (2012) 11. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Largescale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014) 12. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015) 13. Muller, U., Ben, J., Cosatto, E., Flepp, B., Cun, Y.: Off-road obstacle avoidance through end-to-end learning. In: Advances in Neural Information Processing Systems, pp. 739–746, Citeseer (2006) 14. Bojarski, M., et al.: End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016) 15. Meftah, L.H., Braham, R.: A virtual simulation environment using deep learning for autonomous vehicles obstacle avoidance. In: 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 1–7, IEEE (2020) 16. Meftah, L.H., Braham, R.: Transfer learning for autonomous vehicles obstacle avoidance with virtual simulation platform. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds.) International Conference on Intelligent Systems Design and Applications, pp. 956–965, Springer, Cham (2020). https://doi.org/10.1007/978-3-030-71187-0 88 17. Bottou L.: Stochastic gradient descent tricks. In: Montavon, G., Orr, G.B., M¨ uller, KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol.7700. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-64235289-8 25 18. Hsieh, S.-T., Sun, T.-Y., Lin, C.-L., Liu, C.-C.: Effective learning rate adjustment of blind source separation based on an improved particle swarm optimizer. IEEE Trans. Evolution. Comput. 12(2), 242–251 (2008)
Modeling Travelers Behavior Using FSQCA Oumayma Labti(B)
and Ez-zohra Belkadi
Laboratory of Research in Management, Information and Governance, Faculty of Juridical Economic and Social Sciences Ain-Sebaa, Hassan II University of Casablanca, Route des Chaux et Ciments Beausite, BP 2634 Casablanca, Morocco
Abstract. The fsQCA (fuzzy set Qualitative Comparative Analysis) method was used in this paper to examine consumer buying behavior through OTAs (Online travel Agencies). The main purpose of this study is to determine the necessary conditions and combinations of conditions that drive high levels of purchase intention and purchase behavior of travel via the OTAs. Using an empirical study of 374 consumers, we have identified the necessary conditions and configurations that influence consumer buying behavior through OTAs. The findings indicated the critical role of attitude, perceived relative benefits, communicability, and personal innovation in ensuring purchase intention and behavior among OTAs. Further, the results showed that purchase intention is a necessary condition for travel purchase behavior from OTAs. Keywords: Purchase Behavior (PB) · Purchase Intention (PUI) · Fuzzy Set Qualitative Comparative Analysis (fsQCA)
1 Introduction The travel literature has shown that the adoption of a booking channel varies depending on users' previous experiences, attitudes, and personal traits [1], which are in most instances not directly observable. For example, it has been shown that the motivation to adopt and use a technology hinges on consumers' attitudes and thoughts about the risk it entails [2]. From the preceding discussion, additional research is still required to gain substantial knowledge regarding the drivers of consumers' purchasing intention and behavior through OTAs. This paper focuses on clarifying the key drivers behind consumers' PB of travel from OTAs. Firstly, the majority of studies have targeted factors impacting online booking intentions; this paper fills this gap by examining the drivers of actual travel purchasing behavior via OTAs. Secondly, most studies have concentrated on the antecedents of purchasing behavior for booking travel online in an isolated way and have not attempted to investigate the combinations of factors that may contribute to high PUI and PB levels through OTAs. We assume that several elements jointly drive a person to perform a specific behavior, so that human behavior is a multifactorial state; to address this issue, we look for variables and combinations of variables that impact travel purchase behavior. Thirdly, earlier research has looked at the
antecedents of purchasing intention towards online booking, whether in industrialized or emerging markets. The present study investigates the predictors of attitude, PUI, and PB concerning travel booking through OTAs of North African consumers. Moreover, a new methodological analysis is introduced in this paper to examine consumers’ intention and behavior when purchasing travel from OTAs, applying a FSQCA. Using FSQCA can generate more in-depth information [3]. The FSQCA approach is becoming increasingly prominent in information systems (IS) and information technology (IT) research [4]. Consequently, for the study’s objectives, this approach is applied to extend and refine the outcome. As a result, this research provides concrete, theoretical and methodological contributions. The document has three principal contributions to the available literature. Firstly, it examined PB as a key variable for purchasing through OTAs. Secondly, it discusses the necessary conditions and the different combinations of variables that lead to high levels of purchasing intention and behavior through OTAs. Finally, it also investigated the combinations of factors that may prevent consumers from purchasing travel using OTAs. The document is structured as follows. In Sect. 2, the theoretical background and the hypotheses of the study are presented. Section 3 illustrates the methodology employed in the research. Then Sect. 4 summarizes the key findings of the empirical analysis. Finally, Sect. 5 concludes the paper.
2 Literature Review 2.1 Online Consumer Behavior The behavior of online consumers was the focus of various studies, theories, models, and constructions. Most of the academic papers include Fishbein and Ajzen’s TRA [5], TAM [6] and UTAUT [7]. A useful implementation can be found in the field of online booking and tourism management [8]; as the Internet facilitates access to travel data, customers have to rationalize their requests to obtain relevant touristic information so that they can manage and organize their travel. The majority of surveys’ focus has been on intentions [9], instead of focusing on actual behaviors. Previous investigations have pointed out the importance of socio-demographic characteristics, risk, attitude, and trust in shaping online travel purchasing intentions [10]. Further analysis has been conducted to explore the drivers of consumer’ willingness to book travel online [11]. Those papers have demonstrated many integrative and interdisciplinary perspectives [12]. Recent scholars have further developed the analysis by integrating additional theories like the consumption values theory to assess the PUI travel online [13]. In contrast, few investigations have revealed the consumers’ personalities, admitting that online travel consumers are inclined to be more innovative and more technologically advanced [14]. So far, we are not aware that any prior study has looked at consumer PB instead of purchasing intention through OTAs. 2.2 Conceptual Model The conceptual model is composed of six predictors, consisting of personal innovativeness, extroversion, compatibility, communicability, perceived relative advantages, and
experience, and attitude toward purchasing travel using OTAs, the PUI from OTAs, and the PB through OTAs. 2.2.1 PB and PUI in the Online Context He et al. (2008) [15] consider that the lack of PUI is really a significant barrier to ecommerce growth and dramatically affects online business. Limayem et al. (2000) [16] have examined online PB on the Web and have confirmed empirically the fact that online PB is driven directly by the intention to purchase. Several studies have demonstrated a meaningful and positive relationship linking consumers’ PUI to their online PB [17]. According to Bhatti (2018) [18], it is essential to analyze the PUI of the consumer that mediates the online buying behavior. Consumer PB is considered as a construct that has been investigated in several areas related to marketing, such as green marketing [19], shopping for luxury goods [20], business-to-business dealings [21] as well as online shopping [22]. The intention is a major driver of online PB [23]. In contrast, Lee and Johnson (2002) [24] have highlighted in their paper that online PUI may not translate into actual online shopping. Lim et al. (2016) [25] report that both online PUI and online PB should receive more attention. According to a recent study, it is necessary for the future to target online PB [26]. Regarding online travel purchasing, several studies have addressed online purchasing intention as the main dependent variable [27], instead of the actual PB of travel through OTAs. 2.2.2 Communicability Chawla et al. (2015) [12] have highlighted, in particular, the relevance of communicability for determining the PUI travel online. Following Amaro et al. (2016) [28], Communicability in this study indicates others’ influence. Given that only a few studies have tackled the topic of communicability in the field of online travel purchasing, this research will explore the relation between communicability and consumer behavior. 2.2.3 Compatibility The consistent influence of compatibility on the decision-making process regarding online shopping behavior was noted in many earlier surveys showing a robust association of compatibility with attitudes [29]. Several studies have highlighted the relevant role of compatibility for predicting consumers’ online intentions [30]. Compatibility has been shown to be significantly and positively correlated to intentions to book travel online [31]. Peña-García et al. (2020) [17] have found a significant and positive relationship between compatibility and online PUI. Similarly, the perceived compatibility appeared to be an essential predictor of online travel purchases. Amaro and Duarte’s (2015) [10] findings show that compatibility impacts positively attitude and online PUI of travel. 2.2.4 Perceived Relative Advantages Relatively recent research performed by Tan and Ooi (2018) [32] reported time-saving to positively affect online shopping’s perceived usefulness, which positively impacts consumers’ PUI online. Tomás Escobar-Rodríguez and Bonsón-Fernández (2017) [33]
have also suggested a positive relationship between time savings and the perceived value that enhance the intention to buy online. Dani (2017) [34] also recently completed an additional study where four key parameters were found as influencing consumers to buy online: time savings, convenience, website design, and security. Convenience has always been a significant aspect when it comes to online shopping for clients. Convenience is linked within an online context to the time and energy savings perceived by shoppers to buy and engage with online services [35]. There are different convenience gains from online shopping, including time savings, more flexibility, and decreased physical efforts [36]. Convenience has been shown to hold a major significance for explaining consumers’ online shopping behavior. Multiple scholars that have examined consumers’ online buying behavior consistently have identified convenience as one of the key drivers pushing customers to buy online. As a result, purchasers highly motivated by convenience are more inclined to shop from online retailers [37]. Kaur (2018) [38] has examined how purchasing direction affects consumers’ online PUI. He has identified that convenience orientation and shopping pleasure orientation had the most potent effect on consumers’ online PUI. Akram (2018) [39] investigated the relationship between online shopping benefits and consumer online PUI and revealed that convenience remained by far to be the most prominent predictor of consumer online PUI. Khan and Khan (2020) [40] also determined that convenience positively affects attitudes regarding e-shopping. It has been indicated that experience increases consumers’ comfort level [41]. In the context of travel, this variable has not assumed much weight, unlike other areas where this variable is an important predictor of consumer behavior. 2.2.5 Personal Innovativeness San Martín and Herrero (2012) [42] have demonstrated that consumers’ innovative character positively affects the PUI online in rural tourism. Consumers’ innovativeness has been consistently shown to be positively related to online booking of travel [43] and has been proven to mediate the relationship that exists among travelers’ attitudes and their PUI of travel online [44]. In fact, shoppers of online travel are generally prone to high technology, tend to be highly open towards new advances in technology [45], and enjoy experimenting and trying out new technologies [46].
3 Data and Methods 3.1 Data Collection A questionnaire was used in this study. The survey tool was initially pretested through face-to-face interviews with 20 participants to verify the questions’ comprehensibility. The questionnaire was provided in Arabic, French, and English. A sample of 415 respondents was collected. But only the responses of consumers who have already made an online travel purchase were considered. Therefore, once incomplete responses were excluded, 374 surveys were used for data analysis. The items employed were derived from pertinent literature and published studies that have been adjusted and validated through empirical analysis to suit this research’s specific needs. In the primary survey,
each item was scored on a five-point Likert scale, anchored by (1) strongly disagree to (5) strongly agree, to rate the extent of agreement or disagreement concerning the constructs. 3.2 FSQCA Approach The purpose of fsQCA is to identify all potential combinations of causal conditions by which a specific outcome may be achieved. The FSQCA method is used here principally to capture the intricate linear and non-linear combinations between the antecedents of PUI and PB through OTAs. In addition, FSQCA is a derivative of Qualitative Comparative Analysis (QCA) that discloses the conditions that are necessary or sufficient to generate a given outcome [47]. Rubinson (2013) [48] further describes FSQCA as the most refined variant of QCA. The major advantage of FSQCA over alternative statistical techniques is that it accommodates equifinality, implying that many diverse combinations of factors could drive a specific outcome.
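As a sketch of how the raw five-point Likert scores described in Sect. 3.1 can be turned into fuzzy-set memberships, the snippet below applies the direct (logistic) calibration method with three qualitative anchors; the anchor values 4 (full membership), 3 (crossover) and 2 (full non-membership) are the ones reported in the next section, while the exact routine of the fsQCA software used by the authors is not specified in the paper, so this function is only an illustrative assumption.

```python
import numpy as np

def calibrate(scores, full=4.0, crossover=3.0, non_member=2.0):
    """Direct-method calibration of mean Likert scores into fuzzy memberships in [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    # Log-odds scaled so the full-membership anchor maps to about 0.95
    # and the non-membership anchor to about 0.05.
    log_odds = np.where(
        scores >= crossover,
        3.0 * (scores - crossover) / (full - crossover),
        3.0 * (scores - crossover) / (crossover - non_member),
    )
    return 1.0 / (1.0 + np.exp(-log_odds))

# Example: mean attitude scores of five hypothetical respondents.
print(calibrate([4.6, 3.2, 2.1, 5.0, 1.4]))
```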
4 Results FSQCA is an asymmetric modeling method integrating fuzzy sets with fuzzy logic. Its purpose is to discover all possible combinations of causal conditions from which a particular outcome can be obtained. For this paper, high levels of purchasing intention through OTAs and high levels of purchasing behavior through OTAs represent the outcomes. A key element of the FSQCA is the calibration of the data. At this stage, the researcher must convert (calibrate) all of the variables' values into fuzzy sets. This starts with checking the database for missing values; no values were missing in the sample. Then, since this study's causal conditions (variables) consisted of multi-item dimensions, each construct's mean score was calculated. The literature supports using the direct method by defining three qualitative anchors: full membership, full non-membership, and the cross-over point. Following the recommendations of previous researchers [49], we calibrated the measurements into fuzzy sets with values ranging from 0 to 1, where 1 = full set membership, 0.5 = crossover point, and 0 = full non-membership. As proposed for five-point Likert scales, the values of 4, 3, and 2 were taken as the corresponding thresholds. We then analyzed whether any of the conditions were necessary for ensuring a high level of PUI and PB through OTAs, i.e., whether the presence (absence) of a condition was observed in all situations in which the outcome was present (absent). A condition qualifies as "necessary" if the respective consistency score is greater than 0.90. Table 1 shows that attitude, convenience, and time-saving are all considered necessary to reach a high level of PUI of travel through OTAs; this is evidenced by their consistency scores, which are all greater than 0.90. The analysis of necessary conditions for predicting high levels of PB through OTAs indicates that PUI, attitude, communicability, convenience, and time-saving are all necessary to reach high levels of PB (Table 1). Once the measures were calibrated and the necessary conditions assessed, a truth table analysis was performed by implementing the FSQCA algorithm. The FSQCA offers three types of solutions (complex, parsimonious, intermediate) to be analyzed by the investigator. The complex approach includes all combinations of conditions using logical operations,
Table 1. Analysis of necessary conditions leading to intention and behavior to purchase travel using OTAs. The table contains two panels, one for each outcome (PB and PUI); for every condition (variable), Consistency and Coverage are reported for the High and the Low level of the outcome.
PIV
0.710
0.933
0.338
0.151
PIV
0.695
0.885
0.495
Coverage
~PIV
0.353
0.611
0.850
0.174
0.500
~PIV
0.355
0.595
0.765
0.489
PUI
0.921
0.950
~PUI
0.129
0.349
0.291
0.102
ATT
0.938
0.864
0.495
0.174
0.857
0.787
~ ATT
0.103
0.349
0.615
0.793
ATT
0.951
0.903
0.478
0.154
COMN
0.898
0.847
0.518
0.186
~ATT
0.109
0.381
0.700
0.829
~ COMN
0.137
0.427
0.576
0.684
COMN
0.910
0.885
0.487
0.161
CONV
0.901
0.843
0.523
0.186
~COMN
0.137
0.440
0.652
0.712
~CONV
0.130
0.418
0.561
0.684
CONV
0.915
0.882
0.510
0.167
TS
0.903
0.823
0.582
0.202
~CONV
0.136
0.451
0.643
0.720
~TS
0.125
0.441
0.493
0.660
TS
0.910
0.856
0.575
0.184
~TS
0.132
0.478
0.550
0.676
Note: ATT = Attitude; COMN = Communicability; CONV = Convenience; PIV = Personal innovativeness; PUI = Purchase intention; PB = Purchase behaviour; TS: Time saving. Source: own research
which are simplified into parsimonious and intermediate solutions that are easier to analyze. The intermediate solution, which is contained in the complex solution and contains the parsimonious solution, is the one presented in this paper. The intermediate solution for reaching high PUI levels through OTAs consists of 3 different combinations of conditions, and there are 2 solutions for high PB levels through OTAs (refer to Table 2). The intermediate solution for high PUI of travel through OTAs provides an overall coverage of 83%, and its overall consistency equals 0.89. For the first and the third solutions, consumers demonstrate a high PUI of travel from OTAs when they have a high level of personal innovativeness or a high level of time-saving, combined with a high level of attitude, communicability, and convenience. This points out the critical role of attitude, communicability, and convenience while also identifying the consumers who are most likely to purchase travel through OTAs. These two solutions explain 59% and 78%, respectively, of the consumers who have a high PUI of travel using OTAs. For Solution 2, high levels of personal innovativeness, attitude, convenience, and time-saving are sufficient for high PUI levels, and this configuration accounts for 61% of the sample cases. The FSQCA configurations for high levels of PB of travel from OTAs cover 81% of consumers. All the solutions and the overall solution indicate a high consistency (0.94) that exceeds the recommended threshold (0.80). Solution 1 shows that consumers with high levels of attitude, communicability, convenience, and time-saving will have high levels of PB regardless of whether they have a high PUI and/or high personal innovativeness. This solution covers 79% of the empirical cases. The FSQCA assumes the presence of asymmetries among the parameters. Therefore, it is useful to assess whether the configurations leading to the opposite outcomes (low levels of PUI and PB) can be distinguished from those leading to the outcomes of interest (high levels of PUI and PB). Our
analyses of the inverse of the outcomes provide two solutions for each outcome (low levels of PUI and PB through OTAs). For both solutions, 1b and 2b, low levels of personal innovativeness, attitude, and communicability constitute sufficient conditions that lead to low PUI levels via OTAs. A low level of PUI from OTAs is the key condition driving low PB levels from OTAs.

Table 2. Sufficiency analysis.

Causal configurations for PUI | Coverage | Consistency
Solu1: PIV*ATT*COMN*CONV | 0.599 | 0.925
Solu2: PIV*ATT*CONV*TS | 0.616 | 0.927
Solu3: ATT*COMN*CONV*TS | 0.783 | 0.894
Overall consistency | 0.891
Overall coverage | 0.831
Causal configurations for negation (PUI)
Solu1b: ~PIV*~ATT*~COMN*~TS | 0.356 | 0.901
Solu2b: ~PIV*~ATT*~COMN*~CONV | 0.398 | 0.917
Overall consistency | 0.898
Overall coverage | 0.420

Causal configurations for PB | Coverage | Consistency
Solu1: ATT*COMN*CONV*TS | 0.798 | 0.940
Solu2: PIV*PUI*ATT*CONV*TS | 0.506 | 0.995
Overall consistency | 0.941
Overall coverage | 0.818
Causal configurations for negation (PB)
Solu1b: ~PUI | 0.713 | 0.871
Solu2b: ~PIV*~PUI | 0.738 | 0.886
Overall consistency | 0.852
Overall coverage | 0.810

"~" indicates the absence of a condition, and "*" denotes the combination of sufficiency conditions.
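The consistency and coverage figures reported in Tables 1 and 2 follow the standard fuzzy-set definitions; the sketch below shows how they can be computed from calibrated membership vectors, with the membership in a configuration obtained as the minimum across its conditions. The variable names in the commented usage lines refer to hypothetical calibrated vectors (one value per respondent), not to data shipped with the paper.

```python
import numpy as np

def necessity_consistency(condition, outcome):
    """How far the outcome is a subset of the condition (condition necessary for the outcome)."""
    return np.minimum(condition, outcome).sum() / outcome.sum()

def sufficiency_consistency(configuration, outcome):
    """How far the configuration is a subset of the outcome (configuration sufficient for the outcome)."""
    return np.minimum(configuration, outcome).sum() / configuration.sum()

def coverage(configuration, outcome):
    """Share of the outcome accounted for by the configuration."""
    return np.minimum(configuration, outcome).sum() / outcome.sum()

def config_membership(*conditions):
    """Membership in a configuration = minimum across its conditions (negate a condition with 1 - x)."""
    return np.minimum.reduce(conditions)

# Example with the paper's Solution 1 for PB (ATT*COMN*CONV*TS), assuming att, comn,
# conv, ts and pb are calibrated membership vectors:
# solu1 = config_membership(att, comn, conv, ts)
# print(sufficiency_consistency(solu1, pb), coverage(solu1, pb))
```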
5 Discussion and Conclusion We have employed the FSQCA to explore the combinations of conditions that contribute to the formation of PUI and PB. The FSQCA results indicate the existence of a strong positive relationship between PUI and PB. These findings expand the tourism management literature by showing that PUI has the potential to be translated into actual PB in the travel context. This result leads to two key insights. First, an adequate level of PUI can be converted into actual behavior. Second, the FSQCA results highlight that attitude is a crucial condition for both PUI and PB: a positive attitude towards online booking enables consumers to generate positive emotions that result in an actual purchase. More specifically, we implemented the FSQCA to evaluate the necessary conditions and the multiple combinations of causal conditions that lead to high levels of PUI and PB of travel through OTAs. The FSQCA (asymmetric) analysis found that the independent study variables attitude, convenience, and time-saving were all necessary to attain high PUI levels but were not sufficient on their own.
References 1. Fong, L.H.N., Lam, L.W., Law, R.: How locus of control shapes intention to reuse mobile apps for making hotel reservations: evidence from Chinese consumers. Tour. Manag. 61, 331–342 (2017). https://doi.org/10.1016/j.tourman.2017.03.002
2. Joo, Y.J., Park, S., Shin, E.K.: Students’ expectation, satisfaction, and continuance intention to use digital textbooks. Comput. Hum. Behav. 69, 83–90 (2017). https://doi.org/10.1016/j. chb.2016.12.025 3. Pappas, I.O., Giannakos, M.N., Sampson, D.G.: Fuzzy set analysis as a means to understand users of 21st-century learning systems: the case of mobile learning and reflections on learning analytics research. Comput. Hum. Behav. 92, 646–659 (2019). https://doi.org/10.1016/j.chb. 2017.10.010 4. Ortiz de Guinea, A., Raymond, L.: Enabling innovation in the face of uncertainty through IT ambidexterity: a fuzzy set qualitative comparative analysis of industrial service SMEs. Int. J. Inf. Manag.50, 244–260. https://doi.org/10.1016/j.ijinfomgt.2019.05.007 5. Fishbein, M., Ajzen, I.: Attitude, intention and behavior: an introduction to theory and research reading. J. Bus. Ventur. (1975) 6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q. Manag. Inf. Syst. 13, 319–340 (1989). https://doi.org/10.2307/249008 7. Venkatesh, V., Morris, M.G., Davis, G.B., Davis, F.D.: User acceptance of information technology: toward a unified view. MIS Q. Manag. Inf. Syst. 27, 425–478 (2003). https://doi.org/ 10.2307/30036540 8. Pourfakhimi, S., Duncan, T., Coetzee, W.J.L.: Electronic word of mouth in tourism and hospitality consumer behaviour: state of the art. Tour. Rev. 75, 637–661 (2020). https://doi. org/10.1108/TR-01-2019-0019 9. Jensen, J.M.: Travellers’ intentions to purchase travel products online: the role of shopping orientation. In: Advances in Tourism Economics: New Developments (2009). https://doi.org/ 10.1007/978-3-7908-2124-6_13 10. Amaro, S., Duarte, P.: An integrative model of consumers’ intentions to purchase travel online. Tour. Manag. 46, 64–79 (2015). https://doi.org/10.1016/j.tourman.2014.06.006 11. Ukpabi, D.C., Karjaluoto, H.: Consumers’ acceptance of information and communications technology in tourism: a review. Telemat. Inform. 34, 618–644 (2017) 12. Chawla, M., Khan, M., Pandey, A.: Online buying behaviour: ABrief review and update. J. Manag. Res. 9, (2015) 13. Talwar, S., Dhir, A., Kaur, P., Mäntymäki, M.: Why do people purchase from online travel agencies (OTAs)? A consumption values perspective. Int. J. Hosp. Manag. 88, 102534 (2020). https://doi.org/10.1016/j.ijhm.2020.102534 14. Escobar-Rodríguez, T., Carvajal-Trujillo, E.: Online purchasing tickets for low cost carriers: an application of the unified theory of acceptance and use of technology (UTAUT) model. Tour. Manag. 43, 70–88 (2014). https://doi.org/10.1016/j.tourman.2014.01.017 15. He, D., Lu, Y., Zhou, D.: Empirical study of consumers’ purchase intentions in C2C electronic commerce. Tsinghua Sci. Technol. 13, 287–292 (2008). https://doi.org/10.1016/S1007-021 4(08)70046-4 16. Limayem, M., Khalifa, M., Frini, A.: What makes consumers buy from Internet? A longitudinal study of online shopping. IEEE Trans. Syst. Man, Cybern. - Part A Syst. Hum. 30(4), 421–432 (2000). https://doi.org/10.1109/3468.852436 17. Peña-García, N., Gil-Saura, I., Rodríguez-Orejuela, A., Siqueira-Junior, J.R.: Purchase intention and purchase behavior online: a cross-cultural approach. Heliyon 6, e04284 (2020). https://doi.org/10.1016/j.heliyon.2020.e04284 18. Bhatti, A.: Consumer purchase intention effect on online shopping behavior with the moderating role of attitude. Int. J. Acad. Manag. Sci. Res. IJAMSR (2018) 19. 
Nguyen, T.N., Lobo, A., Greenland, S.: Pro-environmental purchase behaviour: the role of consumers’ biospheric values. J. Retail. Consum. Serv. 33, 98–108 (2016). https://doi.org/10. 1016/j.jretconser.2016.08.010
20. Beuckels, E., Hudders, L.: An experimental study to investigate the impact of image interactivity on the perception of luxury in an online shopping context. J. Retail. Consum. Serv. 33, 135–142 (2016). https://doi.org/10.1016/j.jretconser.2016.08.014 21. Wei, C.L., Ho, C.T.: Exploring signaling roles of service providers’ reputation and competence in influencing perceptions of service quality and outsourcing intentions. J. Organ. End User Comput. 31, 86 (2019). https://doi.org/10.4018/JOEUC.2019010105 22. Sundström, M., Hjelm-Lidholm, S., Radon, A.: Clicking the boredom away – Exploring impulse fashion buying behavior online. J. Retail. Consum. Serv. 47, 150–156 (2019). https:// doi.org/10.1016/j.jretconser.2018.11.006 23. Ayo, C.K., Oni, A.A., Adewoye, O.J., Eweoya, I.O.: E-banking users’ behaviour: e-service quality, attitude, and customer satisfaction. Int. J. Bank Mark. 34(3), 347–367. https://doi. org/10.1108/IJBM-12-2014-0175 24. Lee, M.Y., Johnson, K.K.P.: Exploring differences between Internet apparel purchasers, browsers and non-purchasers. J. Fash. Mark. Manag. 6(2), 146–157 (2002). https://doi.org/ 10.1108/13612020210429485 25. Lim, Y.J., Osman, A., Salahuddin, S.N., Romle, A.R., Abdullah, S.: Factors influencing online shopping behavior: the mediating role of purchase intention. Procedia Econ. Finan. 35, 401–410 (2016). https://doi.org/10.1016/s2212-5671(16)00050-2 26. Bhatti, A., Saad, S., Gbadebo, S.: Convenience risk, product risk, and perceived risk influence on online shopping: moderating effect of attitude. Int. J. Bus. Manag. (2018) 27. Huang, D., Li, Z., Mou, J., Liu, X.: Effects of flow on young Chinese consumers’ purchase intention: a study of e-servicescape in hotel booking context. Inf. Technol. Tour. 17(2), 203– 228 (2017). https://doi.org/10.1007/s40558-016-0073-0 28. Amaro, S., Duarte, P., Henriques, C.: Travelers’ use of social media: a clustering approach. Ann. Tour. Res. 59, 1–15 (2016). https://doi.org/10.1016/j.annals.2016.03.007 29. Vijayasarathy, L.R.: Predicting consumer intentions to use on-line shopping: the case for an augmented technology acceptance model. Inf. Manage. 41, 747–762 (2004). https://doi.org/ 10.1016/j.im.2003.08.011 30. Fazal-e-Hasan, S.M., Amrollahi, A., Mortimer, G., Adapa, S., Balaji, M.S.: A multi-method approach to examining consumer intentions to use smart retail technology. Comput. Hum. Behav. 117, 106622 (2020). https://doi.org/10.1016/j.chb.2020.106622 31. Christou, E., Kassianidis, P.: Consumer’s perceptions and adoption of online buying for travel products. J. Travel Tour. Mark. 12, 93–107 (2002). https://doi.org/10.1300/J073v12n04_06 32. Tan, G.W.H., Ooi, K.B.: Gender and age: do they really moderate mobile tourism shopping behavior? Telemat. Inform. 35, 1617–1642 (2018). https://doi.org/10.1016/j.tele.2018.04.009 33. Escobar-Rodríguez, T., Bonsón-Fernández, R.: Analysing online purchase intention in Spain: fashion e-commerce. IseB 15(3), 599–622 (2016). https://doi.org/10.1007/s10257016-0319-6 34. Dani, N.J.: A study on consumers’ attitude towards online shopping. Int. J. Res. Manag. Bus. Stud. 4, 42–46 (2017) 35. Khan, S., Khan, M.A.: Measuring service convenience of e-retailers: an exploratory study in India. Int. J. Bus. Forecast. Mark. Intell. 4, 353–367 (2018). https://doi.org/10.1504/ijbfmi. 2018.10012173 36. Roy, G., Datta, B., Mukherjee, S.: Role of electronic word-of-mouth content and valence in influencing online purchase behavior. J. Mark. Commun. 25, 1–24 (2019). https://doi.org/10. 
1080/13527266.2018.1497681 37. Pham, Q.T., Tran, X.P., Misra, S., Maskeliunas, R., Damaševiˇcius, R.: Relationship between convenience, perceived value, and repurchase intention in online shopping in Vietnam. Sustain. Switz. 10, 156 (2018). https://doi.org/10.3390/su10010156 38. Kaur, M.: Shopping orientations towards online purchase intention in the online apparel purchase environment. Int. J. Adv. Res. Innov. (2018)
39. Akram, M.S.: Drivers and barriers to online shopping in a newly digitalized society. TEM J. (2018). https://doi.org/10.18421/TEM71-14 40. Khan, A. Khan, S.: Purchasing grocery online in a nonmetro city: investigating the role of convenience, security, and variety. J. Public Aff. (2020). https://doi.org/10.1002/pa.2497 41. Pappas, I.O: User experience in personalized online shopping: a fuzzy-set analysis. Eur. J. Mark. (2018). https://doi.org/10.1108/EJM-10-2017-0707 42. San Martín, H., Herrero, Á.: Influence of the user’s psychological factors on the online purchase intention in rural tourism: integrating innovativeness to the UTAUT framework. Tour. Manag. 33, 341–350 (2012). https://doi.org/10.1016/j.tourman.2011.04.003 43. Kamarulzaman, Y.: Adoption of travel e-shopping in the UK. Int. J. Retail Distrib. Manag. 35, 703–719 (2007). https://doi.org/10.1108/09590550710773255 44. Lee, H.Y., Qu, H., Kim, Y.S.: A study of the impact of personal innovativeness on online travel shopping behavior - a case study of Korean travelers. Tour. Manag. 28, 886–897 (2007). https:// doi.org/10.1016/j.tourman.2006.04.013 45. Kim, W.G., Ma, X., Kim, D.J.: Determinants of Chinese hotel customers’ e-satisfaction and purchase intentions. Tour. Manag. 27, 890–900 (2006). https://doi.org/10.1016/j.tourman. 2005.05.010 46. Heung, V.C.S.: Internet usage by international travellers: reasons and barriers. Int. J. Contemp. Hosp. Manag. 15, 370–378 (2003). https://doi.org/10.1108/09596110310496015 47. Huarng, K.H., Hui-Kuang Yu, T., Rodriguez-Garcia, M.: Qualitative analysis of housing demand using Google trends data. Econ. Res.-Ekon. Istraz (2020). https://doi.org/10.1080/ 1331677X.2018.1547205 48. Rubinson, C.: Contradictions in fsQCA. Qual. Quant. 47, 2847–2867 (2013). https://doi.org/ 10.1007/s11135-012-9694-3 49. Pappas, I.O., Kourouthanassis, P.E., Giannakos, M.N., Lekakos, G.: The interplay of online shopping motivations and experiential factors on personalized e-commerce: a complexity theory approach. Telemat. Inform. 35, 730–742 (2017). https://doi.org/10.1016/j.tele.2016. 08.021
AHP Approach for Selecting Adequate Big Data Analytics Platform Naima EL Haoud and Oumaima Hali(B) Laboratoire Ingénierie Scientifique des Organisations (ISO), Université Hassan II- ENCG Casablanca-Maroc, Casablanca, Morocco
Abstract. A big data analytics (BDA) platform is a vital investment that may have a major impact on a company's future competitiveness and performance. Based on the analytic hierarchy process (AHP), this study offers a thorough methodology for identifying an appropriate BDA platform. The framework may be used to define BDA selection objectives in a systematic way that supports an organization's business goals and strategies, to identify important attributes, and to offer a consistent evaluation standard for group decision-making. Simplicity, rationality, comprehensibility, good computational efficiency, and the ability to assess each alternative's relative performance in a simple mathematical form are all characteristics of the AHP technique. The model is tested by comparing seven BDA platforms. Keywords: Big data · Platform selection · Decision-making · Analytical Hierarchy Process (AHP)
1 Introduction Day by day, Big Data (BD) keep growing, and they are a source of value creation and competitive advantage for companies; they support modern medicine, help solve scientific problems, etc. [1]. BDA is the area where advanced analytic techniques are applied to BD sets. The term "analytics" refers to the science of logical analysis that transforms data into actions for decision-making or problem-solving [2]. With the evolution of storage technology and software, data analysis keeps improving [3]. However, due to the characteristics of BD, namely great variety, high velocity, and high volume, traditional methods are not able to extract the right information from these data sets. This is why new methods that can pull useful information out of massive data are still needed. Researchers are working more than ever to develop original techniques for analyzing large data sets, which has led to the development of many different algorithms and platforms. A platform's capability to cope with the high demands of data processing plays a crucial role in deciding whether it is appropriate to build analysis-based solutions on that platform [4]. BD is driving significant changes in data analysis platforms, and the selection of the right platform becomes an important decision for users because of the nature of the data.
This paper makes three principal contributions to the literature. Firstly, it helps to select the most suitable BDA platform. Secondly, it proposes an AHP model; this straightforward but essential evaluation method can help businesses determine the most suitable BDA platform. Finally, the approach is flexible enough to incorporate additional attributes or decision-makers in the evaluation. We have examined the different data analysis platforms in detail. The paper is structured as follows: Section 2 reviews the literature related to BD and BDA and then defines this paper's main contributions. Section 3 describes the methodology used for evaluating the BDA platforms, while the results are presented in Sect. 4. The conclusion is provided in the last section.
2 Literature Review Lee et al. [5] provided a timely survey of the status of MapReduce studies and of related work aiming to improve and enhance the MapReduce framework, concentrating primarily on overcoming the framework's constraints. The goal of their study is to help the database and open-source communities better understand the technical characteristics of the MapReduce framework. In this survey, they describe the MapReduce structure, explore its inherent benefits and drawbacks, present the optimization techniques recently described in the literature, and go through some of the outstanding concerns and challenges raised by parallel data analysis with MapReduce. Elgendy and Elragal [6] also discussed some of the different methods and analytical tools that can be applied to BD and the possibilities offered by the application of BDA in various decision domains. They concluded that BDA can be used to drive business change and enhance decision-making by applying advanced BDA techniques and revealing hidden insights and valuable knowledge. Lněnička demonstrated the use of an AHP model for BDA platform selection, which may be used by businesses, public sector institutions, as well as citizens to solve multiple criteria decision-making problems, helping them find patterns, relationships, and valuable information in their BD, make sense of them, and take responsive action. He proposed an AHP model for BDA platform selection based on three defined use cases; a variety of BDA systems are reviewed in detail, and the purposes and opportunities they provide in several BD life cycle phases are portrayed. He offers added value through a classification of existing BDA platforms based on the BD life cycle [7]. Singh and Reddy analyze the different platforms available for executing BDA and assess each platform's strengths and weaknesses based on metrics such as scalability, fault tolerance, data I/O rate, data size supported, iterative task support, and real-time processing [4]. Bhargava et al. test the performance of three BD platforms (MapReduce, Spark, and Yarn) and report the results obtained by running these systems under different settings at various TPC-H benchmarking scales [8]. Ilieva compares frequently used cloud-based resources for BD, focusing on their particular characteristics; this comparison is intended as a stepping stone towards creating a fuzzy multi-criteria system for evaluating cloud platforms for operating, analyzing, and deploying BD [9].
Uddin et al. consider various criteria for selecting analytics platforms for public sector institutions and businesses. Multiple criteria decision making (MCDM) is frequently used to rank one or more alternatives from a finite set of options with respect to various criteria. They analyze and determine the main criteria for the platform selection problem [10].
3 Research Methodology 3.1 Criteria for Comparing Big Data Analytics Platforms
From the literature review, we selected the criteria for comparing the BDA platforms. They are grouped into three categories.
a. System/platform
• Scalability – the platform can process BD and adapt to rising processing demand.
• Data I/O performance – the rate at which data is passed to/from a peripheral device.
• Fault tolerance.
b. Application/Algorithm
• Real-time processing – the ability to process BD and produce results within time constraints.
• Data size supported – the volume of data that a platform can run and manage efficiently.
• Iterative tasks support – the capability of a platform to support repetitive tasks efficiently.
c. Cost and policy perspective
• Cost – the price a customer has to pay for the solution.
• Sustainability – the cost related to the configuration, adjustment, maintenance, and quantity of data that organizations must manage now and in the future.
• Policy and regulation – the policies related to the use of the selected solution, such as legal conflicts, privacy policy, and limitations of use.
3.2 Big Data Analytics Platforms Selection
Apache Hadoop is a free software platform for storing and analyzing large amounts of data on commodity hardware clusters.
– It is a project that lets you manage your BD; it enables you to deal with high volumes of rapidly changing data and a wide variety of data.
– It is not just a single open-source project; Hadoop is a whole ecosystem of projects that work together to provide a standard set of services.
– It allows huge distributed computations.
– Key attributes:
• Redundant and reliable (no data loss)
• Extremely powerful
• Batch-processing centric
• Easy to program distributed applications
• Runs on commodity hardware
Apache Spark: the open standard for flexible in-memory data processing that allows batch, real-time, and advanced analytics on the Apache Hadoop platform.
MapR: MapR Technologies provides a distributed data platform for artificial intelligence (AI) and analytics that enables enterprises to apply data modeling to their business processes in order to increase revenue, reduce costs, and mitigate risks. MapR is built for organizations with demanding production needs.
Cloudera: Cloudera Data Platform (CDP) is a platform that allows firms to analyze data using self-service analytical tools within hybrid and multi-cloud environments. It enables companies to construct data lakes in the cloud rapidly in order to make the most of all their data.
Oracle Big Data: Oracle BD is one of the most frequently used and dependable relational database management systems. Oracle Corporation is a software company that develops and sells object-relational database management systems, and it offers one of the most established SQL databases available.
Microsoft Azure: Azure is a cloud platform provided by Microsoft. It consists of more than 200 products and cloud services meant to assist users in developing innovative solutions, overcoming current difficulties, and shaping the future. With their preferred tools and frameworks, users can build, operate, and manage applications across various clouds, on-premises, and at the edge.
Hortonworks Data Platform: the Hortonworks Data Platform (HDP) is built by Hortonworks from the Hadoop Distributed File System (HDFS), Hadoop MapReduce, Apache Pig, Apache Hive, Apache HBase, and Apache ZooKeeper. Large volumes of data are analyzed, stored, and manipulated using this platform.
3.3 Evaluation Index Method
The hierarchy index system established for BDA selection is summarized in Fig. 1. The goal of the model, which is to select the optimal platform, lies at the top level. The first level contains the criteria that must be satisfied to accomplish the overall goal. This general criteria level involves three significant criteria: System/Platform, Application/Algorithm, and Cost and policy perspective. Each of these three criteria is further decomposed into specific elements.
Fig. 1. The hierarchy index system for BDA selection
3.4 The AHP Method
AHP is recommended for solving complex multi-criteria decision problems. The strength of this approach is that it organizes factors in a structured way while providing a relatively simple solution to decision-making problems. It allows a problem to be decomposed in a logical way, moving from a higher level to a lower level until a simple comparison is reached for each pair of criteria, after which one can go back to the higher level for decision-making. Generally, AHP is used when there are multiple variables or criteria, which may be qualitative or quantitative. It has been applied in various applications because of its strength and strict mathematical background. The steps of this method are explained below:
(1) Organizing the decision problem into a hierarchical model. In this step, we decompose the decision problem into the elements that are essential for the decision.
(2) Constructing the judgment matrix A.
(3) Using the eigenvalue method to determine the relative weights of the decision elements: λmax, the largest eigenvalue of A, and W, its right eigenvector, are obtained.
(4) Checking the consistency of the judgment matrix. The following condition must be satisfied for a consistency check:
AW = λmax W    (1)
The equation to calculate the consistency check index CI is expressed as follows:
CI = (λmax − n) / (n − 1)    (2)
The equation to calculate the consistency ratio CR is expressed as follows:
CR = CI / RI    (3)
RI is the average random consistency index; it is independent of the attribute values and varies with the number of attributes (the order of the matrix). If CR ≤ 0.10, the level of inconsistency is considered acceptable. Otherwise, the decision-makers should be asked to review and revise their elicited preferences. The conceptual framework of the proposed evaluation procedure is shown in Fig. 2 [11].
Fig. 2. Flowchart of the evaluation procedure [11].
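To make the computation concrete, the sketch below implements steps (3) and (4) in Python: it derives the priority vector from a pairwise comparison matrix with the eigenvalue method and computes CI and CR using Saaty's random index values. The judgment matrix shown is purely hypothetical (the paper does not publish its matrices); it is chosen only so that the resulting weights come out close to the first-level weights reported in Table 1.

```python
import numpy as np

# Saaty's average random consistency index (RI) by matrix order n.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(A):
    """Priority vector W, consistency index CI and consistency ratio CR for a judgment matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)              # principal eigenvalue lambda_max
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                          # normalized relative weights
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)
    cr = ci / RI[n] if RI[n] > 0 else 0.0
    return w, ci, cr

# Hypothetical first-level judgment matrix (for illustration only).
A = [[1.0, 3.0, 1 / 5],
     [1 / 3, 1.0, 1 / 9],
     [5.0, 9.0, 1.0]]
w, ci, cr = ahp_weights(A)
print(w, ci, cr)   # roughly [0.18, 0.07, 0.75]; CR <= 0.10 means acceptable consistency
```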
4 Results The decision problem addressed here aims to determine the most suitable platform for BDA using AHP. After the aim had been determined, the criteria were selected for this aim and were
incorporated into the hierarchic structure. As decision alternatives, Apache Hadoop, Apache Spark, MapR Converged Data Analytics, Cloudera, Oracle BDA, Microsoft Azure, and Hortonworks Data Platform were determined. A group of decision-makers was created to build the comparison matrices in order to obtain the factor and sub-factor weights. When all sub-criteria at a given level were compared with respect to the upper-level criteria, the corresponding comparison matrices were derived. After applying the AHP procedure according to the methodology described above, the results were calculated based on the pairwise comparisons (Tables 1 and 2). Table 1 contains the criteria weights, while Table 2 shows the final ranking of platforms considering all the criteria. The overall synthesis shows that Apache Hadoop outranks the other six alternatives. Therefore, Apache Hadoop should be selected as the optimal platform for BDA, with a priority of 0.391.

Table 1. The priorities of the sub-criteria

First level index | Weight | Second level index | Weight
System/platform | 0.178 | Scalability | 0.528
 | | Data I/O performance | 0.333
 | | Fault tolerance | 0.140
Application/algorithm | 0.070 | Real-time processing | 0.089
 | | Data size supported | 0.588
 | | Iterative tasks support | 0.323
Cost and policy perspective | 0.751 | Cost | 0.770
 | | Sustainability | 0.068
 | | Policy and regulation | 0.162
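The local weights in Table 1 can be combined into global sub-criterion weights by the usual multiplicative AHP synthesis (first-level weight times local weight). The short sketch below does this with the values from Table 1; it is only an illustration of the synthesis step, not code from the study.

```python
first_level = {
    "System/platform": 0.178,
    "Application/algorithm": 0.070,
    "Cost and policy perspective": 0.751,
}
second_level = {
    "System/platform": {"Scalability": 0.528, "Data I/O performance": 0.333, "Fault tolerance": 0.140},
    "Application/algorithm": {"Real-time processing": 0.089, "Data size supported": 0.588,
                              "Iterative tasks support": 0.323},
    "Cost and policy perspective": {"Cost": 0.770, "Sustainability": 0.068, "Policy and regulation": 0.162},
}

# Global weight of a sub-criterion = weight of its first-level criterion x its local weight.
global_weights = {sub: first_level[crit] * w
                  for crit, subs in second_level.items()
                  for sub, w in subs.items()}

for sub, w in sorted(global_weights.items(), key=lambda kv: -kv[1]):
    print(f"{sub}: {w:.3f}")   # e.g. Cost ~ 0.578, Policy and regulation ~ 0.122
```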
Table 2. The AHP model final synthesis result

Alternative | Priority | Ranking
Apache Hadoop | 0.391 | 1
Cloudera | 0.195 | 2
MapR Converged Data | 0.138 | 3
Apache Spark | 0.083 | 4
Oracle Big Data Analytics | 0.073 | 5
Microsoft Azure | 0.070 | 6
Hortonworks Data Platform | 0.050 | 7
λmax = 7.348, CI = 0.058, CR = 0.04
5 Discussion of Results The obtained results indicate that, among the platforms considered for BDA, the most suitable platform is Apache Hadoop, with Cloudera coming in a close second. The results of the analysis also show that the least suitable platform for the defined task is Hortonworks. In the case of Hortonworks, its low score was caused mostly by the fault tolerance, scalability, iterative tasks support, cost, and policy and regulation criteria.
6 Conclusions and Future Work In this position paper, we present an MCDM method, namely AHP, to select the optimal platform for BDA. Due to the complexity of BDA platform selection, this research used the AHP approach to evaluate the optimal BDA platform among multiple platforms; AHP is a dynamic tool that can assess and structure planning problems involving several criteria. The selection approach based on AHP has various benefits: first, it takes pairwise prioritization of requirements as input; second, it provides consistency checking; and third, it offers relative rather than linear prioritization. This paper gives a basic calculation process using the AHP approach to select the optimal BDA platform and analyzes the criteria and weights for BDA platform selection. The calculation result shows that the BDA platform alternative "Apache Hadoop" should be selected as the optimal one due to its highest ranking score (0.391). Applying the AHP approach to BDA platform selection is reasonably effective, practical, and robust, and this method can support decision-makers in BDA platform selection. In future work, we will use other MCDM methods such as fuzzy AHP [12, 13] or the A'WOT method [14] for BDA platform selection. Meanwhile, we can also compare the results obtained from different MCDM approaches.
References 1. Tien, J.M.: Big data: unleashing ınformation. J. Syst. Sci. Syst. Eng. 22(2), 127–51 (2013). https://doi.org/10.1007/s11518-013-5219-4. 2. Liberatore, M.J., Wenhong, L.: The analytics movement: implications for operations research. Interfaces 40(4), 313–24 (2010). https://doi.org/10.1287/inte.1100.0502 3. Lake, P., Drake, R.: Information Systems Management in the Big Data Era. Vol. 10. Springer International Publishing (2014). https://doi.org/10.1007/978-3-319-13503-8 4. Singh, D., Reddy, C.K.: A survey on platforms for big data analytics. J. Big Data 2(1) (2014). https://doi.org/10.1186/s40537-014-0008-6 5. Lee, K.-H., et al.: Parallel data processing with MapReduce: a survey. ACM SIGMOD Rec. 40(4), 11–20 (2012) 6. Elgendy, N., Elragal, A.: Big data analytics: a literature review paper. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects, pp. 214–227. Springer International Publishing, Cham (2014). https://doi.org/10.1007/978-3-319-08976-8_16 7. Lnenicka, M.: AHP model for the big data analytics platform selection. Acta Inf. Pragensia 4(2), 108–121 (2015). https://doi.org/10.18267/j.aip.64 8. Bhargava, S., Drdinesh, G., Keswani, B.: Performance Comparison of Big Data Analytics Platforms. Int. J. Eng. Appl. Manag. Sci. Paradigms (IJEAM), ISSN, 2320–6608. (2019) 9. Ilieva, G.: Decision analysis for big data platform selection. Eng. Sci., LVI1(2) (2019). https:// doi.org/10.7546/EngSci.LVI.19.02.01
10. Uddin, S., et al.: A fuzzy TOPSIS approach for big data analytics platform selection. J. Adv. Comput. Eng. Technol. 5(1), 49–56 (2019) 11. Bourhim, E.M., Cherkaoui, A.: Selection of optimal game engine by using AHP approach for virtual reality fire safety training. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds.) Intelligent Systems Design and Applications: 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018) held in Vellore, India, December 6-8, 2018, Volume 1, pp. 955–966. Springer International Publishing, Cham (2020). https:// doi.org/10.1007/978-3-030-16657-1_89 12. Mostafa Bourhim, E.L., Cherkaoui, A.: Efficacy of virtual reality for studying people’s preevacuation behavior under fire. Int. J. Human-Comput. Stud. 142, 102484 (2020) 13. Bourhim, E.M., Cherkaoui, A.: Usability evaluation of virtual reality-based fire training simulator using a combined AHP and fuzzy comprehensive evaluation approach. In: Jeena Jacob, I., Shanmugam, S.K., Piramuthu, S., Falkowski-Gilski, P. (eds.) Data Intelligence and Cognitive Informatics: Proceedings of ICDICI 2020, pp. 923–931. Springer Singapore, Singapore (2021). https://doi.org/10.1007/978-981-15-8530-2_73 14. Bourhim, E.M., Cherkaoui, A.: Exploring the potential of virtual reality in fire training research using A’WOT hybrid method. In: Thampi, S.M., Trajkovic, L., Sushmita Mitra, P., Nagabhushan, E.-S., El-Alfy, Z.B., Mishra, D. (eds.) Intelligent Systems, Technologies and Applications: Proceedings of Fifth ISTA 2019, India, pp. 157–167. Springer Singapore, Singapore (2020). https://doi.org/10.1007/978-981-15-3914-5_12
Combining Bert Representation and POS Tagger for Arabic Word Sense Disambiguation Rakia Saidi1(B) and Fethi Jarray1,2 1
1 LIMTIC Laboratory, UTM University, Skudai, Tunisia
[email protected]
2 Higher Institute of Computer Science of Medenine, Medenine, Tunisia
[email protected]
Abstract. In this paper, we address the problem of Word Sense Disambiguation for the Arabic language, where the objective is to determine an ambiguous word's meaning. We model the problem as supervised sequence-to-sequence learning. We propose recurrent neural network models, BERT-based models, and combined POS-BERT models for Arabic Word Sense Disambiguation. We achieve 96.33% accuracy with the POS-BERT system. Keywords: Word Sense Disambiguation · Arabic language · Deep learning · Supervised learning · Recurrent Neural Network · Part of speech tagging · BERT
1 Introduction
Word Sense Disambiguation (WSD) is a subfield of natural language processing (NLP). It is the process of determining the sense of a word in its context. Since natural language is inherently ambiguous, a given word can have several possible meanings, hence the difficulty of the task. WSD is encountered in several types of real-life applications, such as machine translation, information retrieval, lexicography, knowledge extraction, semantic interpretation, etc. A wide variety of traditional machine learning techniques have been proposed to train a classifier on manually annotated corpora. Recently, deep neural networks (DNN) have shown impressive capabilities and have revolutionized artificial intelligence for most tasks; in particular, in the field of NLP, DNNs outperform traditional learning methods. In this paper we present three contributions. First, we propose recurrent neural network architectures for WSD. Second, we design a BERT-based model for AWSD. Third, we strengthen the BERT model with a POS tagger, since the sense of a word may depend on its grammatical category. The rest of the paper is organized as follows: Sect. 2 presents the state of the art. Section 3 explains our approaches for Arabic WSD. Section 4 discusses the numerical results. We conclude the paper with a summary of our contributions and mention some future extensions.
2 Related Work
The Arabic WSD (AWSD) approaches can be classified into six categories: Lesk algorithm based approaches [2], information retrieval based approaches [8, 23–25, 27], graph-based approaches [3], metaheuristic based approaches [5, 11], Machine Learning (ML) based approaches [1, 4, 8, 14–18, 24], and, recently, Deep Learning (DNN) approaches [19]. Concerning the DNN based approaches, El Razzez et al. [19] presented an Arabic gloss-based WSD technique. They used the Bidirectional Encoder Representation from Transformers (BERT) to build two models for Arabic WSD. This method achieves an F1-score of 89%. Table 1 summarizes the accuracy and the corpus used for each method.

Table 1. State of the art approaches of AWSD.

Method | Category | Dataset | Predicting accuracy
Zouaghi et al. [2] | Lesk algorithm | Arabic WordNet | 78%
Bouhriz et al. [6] | Information retrieval | Arabic WordNet | 74%
Abood et al. [20] | Information retrieval | Quranic translation (4 words) | 77%
Abderrahim et al. [21] | Information retrieval | Arabic WordNet | Unspecified
Bounhas et al. [22] | Information retrieval | Encyclopedic books of Arabic stories | Unspecified
Bounhas et al. [23] | Information retrieval | Dictionaries of Hadith | Unspecified
Merhbene et al. [3] | Graph | Arabic WordNet | 83%
Menai [5] | Metaheuristic | Unspecified | 79%
Elmougy et al. [1] | ML | Limited annotated corpus | 76%
El-Gedawy [4] | ML | Arabic WordNet | 86%
Alkhaltan et al. [8] | ML | Arabic WordNet | SkipGram = 82.17%, GloVe = 71.73%
Hadni et al. [14] | ML | Arabic WordNet | 74.6%
El-Gamml et al. [15] | ML | Five Arabic words | 96.2%
Merhbene et al. [16] | ML | Annotated samples | 54.70%
Laatar et al. [17] | ML | Historical Arabic Dictionary Corpus | SkipGram = 51.52%, CBOW = 50.25%
M. Eid et al. [18] | ML | Five Arabic nouns | 88.3%
Pinto et al. [24] | ML | English and Arabic texts | Unspecified
El Razzez et al. [19] | DNN | Context-gloss benchmark | 96%
We note that the best accuracy score depends on both the size of the data set and the corpus. For example, for a corpus of five Arabic words [15], the accuracy is about 96.2%, and for an automatically created context-gloss benchmark [19], the accuracy is 96%. Our contribution in this manuscript falls into the deep neural network techniques. More precisely, we propose different variants of recurrent neural networks (RNN) and BERT-based models.
3 Materials and Methods

3.1 Corpus Details
In this paper we use the publicly available Arabic WordNet (AWN) corpus [32]. AWN is a lexical resource for Modern Standard Arabic. It was constructed following the design of Princeton WordNet. It is structured around elements called synsets, which are lists of synonyms together with pointers linking them to other synsets. Currently, this corpus has 23,481 words grouped into 11,269 synsets. One or more synsets may contain a common word. The senses of a word are given by the synsets of the AWN to which it belongs. Some word synsets (i.e., senses) extracted from AWN are presented in Fig. 1.
Fig. 1. Examples of AWN synsets for three Arabic example words: (a), (b), and (c).
As we mentioned above, in the AWN corpus each word has a list of meanings; for example, in Fig. 1 the first word has four meanings (Fig. 1a), the second word has only two meanings (Fig. 1b), and the third word has three meanings (Fig. 1c). So the main task is to predict the intended meaning. All proposed models are coded in the Python programming language and are available on Kaggle upon request.

3.2 RNN Based Models for AWSD
The vanilla RNN learning model is a sequence-to-sequence model where the input sequence represents the words and the output sequence represents the senses. Each sentence is tokenized into words and fed entirely into the neural network. We used the basic encoded word embedding and a random initialization for unknown words. We compared different recurrent architectures such as LSTM, bi-LSTM, and GRU. The experiments were carried out based on the pre-trained
Arabic embedding. Figure 2 depicts the common architecture of the different RNN models.
Fig. 2. RNN based models for AWSD.
We briefly explain the different extensions of the basic RNN. Long Short-Term Memory (LSTM): this type of neural network was designed to avoid the long-term dependency issue of the basic RNN. It remembers information for a long period of time: it can learn what to store in the long-term state, what to forget, and what to read from it. Bidirectional Long Short-Term Memory (bi-LSTM): bidirectional LSTM is an extension of the classic LSTM that can improve model performance on sequence classification problems. When the whole sequence is available, the bi-LSTM combines two LSTMs, one on the input sequence and one on a reversed copy of the input sequence. Gated Recurrent Unit (GRU): the GRU cell is a simplified version of the LSTM cell, and it is increasingly popular due to its good performance and smaller size. We note that both LSTM and GRU cells have contributed to the success of recurrent neural networks because they are well suited for classifying, processing, and predicting sequences of variable lengths. They train the models using backpropagation and mitigate the vanishing/exploding gradient issue of the basic RNN. For our system, we obtained the best results with the GRU model.
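As an illustration of this family of models, the following is a minimal Keras sketch of a GRU-based sequence-to-sequence tagger in the spirit of the architecture in Fig. 2; the 300-dimensional embedding follows the setting in Sect. 4, while the vocabulary size, sense-inventory size, padding length, and hidden size are our own assumptions rather than the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 50000   # assumed size of the word vocabulary
NUM_SENSES = 2000    # assumed size of the sense inventory
MAX_LEN = 50         # assumed maximum (padded) sentence length
EMB_DIM = 300        # embedding size reported in Sect. 4

def build_gru_tagger() -> tf.keras.Model:
    """Sequence-to-sequence WSD tagger: one sense label per input token."""
    words = layers.Input(shape=(MAX_LEN,), dtype="int32", name="word_ids")
    x = layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(words)
    # A GRU layer returning one hidden state per token; replace with
    # layers.LSTM or layers.Bidirectional(layers.LSTM(...)) to obtain the
    # other variants compared in the paper.
    x = layers.GRU(128, return_sequences=True)(x)
    senses = layers.TimeDistributed(
        layers.Dense(NUM_SENSES, activation="softmax"), name="senses")(x)
    model = models.Model(words, senses)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_gru_tagger()
model.summary()
```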
3.3 BERT Based Models for AWSD
Recurrent neural networks (RNN) and their various extensions such as LSTM and GRU suffer from vanishing gradients and a lack of parallelism because they are sequential models by nature. In order to address these problems, Vaswani et al. [25] proposed a new neural network architecture called the Transformer, which does not use any recurrence. It is essentially based on two important changes: (1) a new attention mechanism called "multi-head attention", and (2) a new way to encode the position of a word in a sequence. Devlin et al. [26] introduced the BERT model, which is a multi-layer bidirectional Transformer encoder. It is designed to pretrain deep bidirectional representations from unlabeled text by conditioning on both left and right context in all layers. BERT models are delivered in many versions depending on the size, the supported languages and the application domain. Concerning the Arabic language, the main available BERT models are AraBERT [27], ARBERT [28], Arabic BERT [29] and the multilingual mBERT [30]. As a second contribution, we build an AWSD system based on each BERT model and assess their performance, as depicted in Fig. 3.
Fig. 3. BERT based models for AWSD
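For concreteness, below is a minimal sketch of fine-tuning one of the pretrained Arabic BERT encoders discussed above for token-level sense classification with the Hugging Face transformers library; the checkpoint name, label count, padding length, and dataset objects are placeholder assumptions, not the exact configuration used by the authors.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForTokenClassification

# Placeholder checkpoint: any of the Arabic BERT variants compared in the
# paper (AraBERT, ARBERT, Arabic BERT, mBERT) could be plugged in here.
CHECKPOINT = "aubmindlab/bert-base-arabertv02"   # assumed AraBERT checkpoint
NUM_SENSES = 2000                                # assumed sense inventory size

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = TFAutoModelForTokenClassification.from_pretrained(
    CHECKPOINT, num_labels=NUM_SENSES)

sentence = "..."  # an Arabic sentence containing the ambiguous word
enc = tokenizer(sentence, return_tensors="tf",
                truncation=True, padding="max_length", max_length=50)

# One sense logit vector per word-piece token; word-level predictions are
# read off the first sub-token of each word.
logits = model(**enc).logits                     # shape (1, 50, NUM_SENSES)
pred = tf.argmax(logits, axis=-1)

model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(train_dataset, validation_data=val_dataset, epochs=3)
```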
3.4 Integrating POS Tagger and BERT Representation Models for AWSD
Part-of-speech (POS) tagging consists in assigning a lexical category, also known as a part of speech (noun, verb, etc.), to each word in a sentence [30]. In this contribution we freeze the POS tagger and use the Stanford Arabic Part
of Speech Tagger [32]. However, in future research, we plan to train the POS tagger and the AWSD classifier simultaneously. Our integrated architecture can be described as follows: we feed the input sentence into the BERT network, which produces contextualized word embeddings for all input tokens. Simultaneously, we feed the same sentence to an Arabic POS tagger to determine the grammatical category of each token. Finally, we feed the outputs of the POS tagger and of BERT to a fully connected layer that outputs the meaning of the ambiguous words (see Fig. 4).
Fig. 4. POS-BERT architecture for AWSD
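A minimal sketch of this fusion step is given below: BERT token embeddings are concatenated with one-hot encoded POS tags before the fully connected classification layer. The POS tag set size, the hidden layer size, and the way the frozen embeddings are supplied are illustrative assumptions rather than the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

MAX_LEN = 50        # assumed padded sentence length
BERT_DIM = 768      # hidden size of a BERT-base encoder
NUM_POS_TAGS = 24   # assumed size of the Arabic POS tag set
NUM_SENSES = 2000   # assumed sense inventory size

# Inputs: contextual embeddings from a (frozen) BERT model and one-hot POS
# tags from a frozen Arabic POS tagger, both aligned token by token.
bert_embeddings = layers.Input(shape=(MAX_LEN, BERT_DIM), name="bert_embeddings")
pos_onehot = layers.Input(shape=(MAX_LEN, NUM_POS_TAGS), name="pos_tags")

# Fuse the two views of each token and classify its sense.
fused = layers.Concatenate(axis=-1)([bert_embeddings, pos_onehot])
hidden = layers.TimeDistributed(layers.Dense(256, activation="relu"))(fused)
senses = layers.TimeDistributed(
    layers.Dense(NUM_SENSES, activation="softmax"), name="senses")(hidden)

pos_bert = models.Model([bert_embeddings, pos_onehot], senses)
pos_bert.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
```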
4 Results and Discussion
Open-Source Software and Codes. This research relies heavily on open-source software. As a programming language, Python 3.8 is used. Libraries including Pandas, NumPy, lxml, and Scikit-Learn are used for preprocessing and managing the data. TensorFlow and Keras are the deep-learning frameworks employed. For each RNN, a different code is developed. All figures are generated with Matplotlib. Hyper-parameters Setting. We randomly select 70% of the dataset as a training set, 15% as a test set and the remaining 15% as a validation set. In all the experiments below, the ratio remains the same, and all the results are reported on the test set. We used a grid search strategy to find the
best combination of parameters; we set the optimal sizes for the word embedding, batch size, and number of epochs to 300, 64 and 50, respectively, for all our models. Table 2 displays the results of the different models.
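The 70/15/15 split described above can be reproduced, for instance, with scikit-learn; the random seed and variable names in this sketch are our own choices, and the sentence and label containers are placeholders for the AWN-derived dataset of Sect. 3.1.

```python
from sklearn.model_selection import train_test_split

# sentences: list of tokenized sentences, labels: per-token sense labels.
X_train, X_rest, y_train, y_rest = train_test_split(
    sentences, labels, train_size=0.70, random_state=42)
# Split the remaining 30% evenly into test (15%) and validation (15%).
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=42)
```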
Table 2. Results of our proposed RNN models for AWSD

Method | Predicting accuracy
Vanilla RNN | 83.08%
LSTM | 86.87%
Bi-LSTM | 88.95%
GRU | 91.02%
AraBERT | 92.14%
mBERT | 94.50%
ArabicBERT | 94.85%
POS-BERT | 96.33%
The number of epochs is a major parameter for training a neural network since the objective function is improved iteratively. For all models, 50 epochs seem sufficient to reach the optimum. Another parameter that influences the accuracy of the models is the batch size. The cross-validation experiments showed that if the batch size is too small (such as 28) or too large (such as 128), the predicting accuracy degrades. This issue is fixed by setting this parameter to 64. According to Table 2, among the methods used, the POS-BERT for AWSD model performs best in 50 epochs, with an accuracy of 96.33% over the testing set. We note also that the GRU-based model is the best RNN model and that it outperforms even the LSTM-based model. This may be due to the fact that GRU has fewer training parameters and thus uses less memory, executes faster, and trains quicker than the LSTM and bi-LSTM models. For the BERT models, we point out that ArabicBERT outperforms the others and that the results of mBERT and ArabicBERT are very close, maybe due to the similar vocabulary size of the two models. Finally, we compare our POS-BERT model with the state-of-the-art approaches ([8], [19] and [17]). Table 3 shows that the POS-BERT model outperforms the existing works on AWSD. It is also slightly better than the approach based on an automatically created corpus [19]. This supports the hypothesis that the sense of an ambiguous word depends on its lexical category, so explicitly integrating POS tagging in the disambiguation process strengthens it.
Table 3. Comparing the POS-BERT model with recent works for AWSD.

Approach | Dataset | Predicting accuracy
AWSD-GloVe [8] | AWN | 71.73%
AWSD-SkipGram [8] | AWN | 82.17%
AWSD-SkipGram [17] | Benchmark [17] | 51.52%
AWSD-CBOW [17] | Benchmark [17] | 50.25%
AWSD-ARBERT [19] | Benchmark [19] | 89%
AWSD-AraBERT [19] | Benchmark [19] | 96%
POS-BERT for AWSD | AWN | 96.33%

5 Conclusion
In this paper, we proposed a novel approach, POS-BERT, to solve Arabic word sense disambiguation. The idea is to simultaneously feed the outputs of a BERT representation model and of a POS tagger into a fully connected layer to predict the meaning of a target word. We validated our approach on the Arabic WordNet dataset. We showed that the POS-BERT model outperforms the existing approaches with 96.33% accuracy. As future work, we plan to enrich POS-BERT by automatically extracting the meaning of Arabic multiword expressions.
References 1. Elmougy, S., Taher, H., Noaman, H.: Na¨ıve Bayes classifier for Arabic word sense disambiguation. In: Proceeding of the 6th International Conference on Informatics and Systems, pp. 16–21 (2008) 2. Zouaghi, A., Merhbene, L., Zrigui, M.: A hybrid approach for Arabic word sense disambiguation. Int. J. Comput. Process. Lang. 24(02), 133–151 (2012) 3. Merhbene, L., Zouaghi, A., Zrigui, M.: A semi-supervised method for Arabic word sense disambiguation using a weighted directed graph. In: Proceedings of the Sixth International Joint Conference on Natural Language Processing, pp. 1027–1031 (2013) 4. El-Gedawy, M.N.: Using fuzzifiers to solve word sense ambiguation in Arabic language. Int. J. Comput. Appl. 79(2) (2013) 5. Menai, M.E.B.: WSD using evolutionary algorithms-application to Arabic language. Comput. Hum. Behav. 41, 92–103 (2014) 6. Bouhriz, N., Benabbou, F., Lahmar, E.B.: WSD approach for Arabic text. Int. J. Adv. Comput. Sci. Appl. 7(4), 381–385 (2016) 7. Alian, M., Awajan, A., Al-Kouz, A.: Arabic WSD-survey. In: 2017 International Conference on New Trends in Computing Sciences (ICTCS), pp. 236–240. IEEE (2017) 8. Alkhatlan, A., Kalita, J., Alhaddad, A.: Word sense disambiguation for Arabic exploiting Arabic wordnet and word embedding. Procedia Comput. Sci. 142, 50– 60 (2018) 9. Sun, X.R., Lv, S.H., Wang, X.D., Wang, D.: Chinese word sense disambiguation using a LSTM. In: ITM Web of Conferences, vol. 12, p. 01027. EDP Sciences (2017)
10. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005) 11. Menai, M.E.B., Alsaeedan, W.: Genetic algorithm for Arabic word sense disambiguation. In: 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, pp. 195–200. IEEE(2012) 12. Classify Sentences via a Recurrent Neural Network (LSTM). https:// austingwalters.com/classify-sentences-via-a-recurrent-neural-network-lstm/. Accessed 3 May 2021 13. Apaydin, H., Feizi, H., Sattari, M.T., Colak, M.S., Shamshirband, S., Chau, K.W.: Comparative analysis of recurrent neural network architectures for reservoir inflow forecasting. Water 12(5), 1500 (2020) 14. Hadni, M., Ouatik, S.E.A., Lachkar, A.: Word sense disambiguation for Arabic text categorization. Int. Arab J. Inf. Technol. 13(1A), 215–222 (2016) 15. El-Gamml, M.M., Fakhr, M.W., Rashwan, M.A., Al-Said, A.B.: A comparative study for Arabic word sense disambiguation using document preprocessing and machine learning techniques. In: Arabic Language Technology International Conference, Bibliotheca Alexandrina, CBA’11, Alexandria, Egypt (2011) 16. Merhbene, L., Zouaghi, A., Zrigui, M.: An experimental study for some supervised lexical disambiguation methods of Arabic language. In: Fourth International Conference on Information and Communication Technology and Accessibility (ICTA), pp. 1–6. IEEE (2013) 17. Laatar, R., Aloulou, C., Belghuith, L.H.: Word2vec for Arabic word sense disambiguation. In: Silberztein, M., Atigui, F., Kornyshova, E., M´etais, E., Meziane, F. (eds.) NLDB 2018. LNCS, vol. 10859, pp. 308–311. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91947-8 32 18. Eid, M.S., Al-Said, A.B., Wanas, N.M., Rashwan, M.A., Hegazy, N.H.: Comparative study of Rocchio classifier applied to supervised WSD using Arabic lexical samples. In: Proceedings of the Tenth Conference of Language Engineering (SEOLEC’2010), Cairo, Egypt (2010) 19. El-Razzaz, M., Fakhr, M.W., Maghraby, F.A.: Arabic gloss WSD using BERT. Appl. Sci. 11(6), 2567 (2021) 20. Abood, R.H., Tiun, S.: A comparative study of open-domain and specific-domain word sense disambiguation based on Quranic information retrieval. In: MATEC Web of Conferences, vol. 135, p. 00071. EDP Sciences (2017) 21. Abderrahim, M.A., Abderrahim, M.E.A.: Arabic word sense disambiguation with conceptual density for information retrieval, vol. 06, Issue 01 (2018) 22. Bounhas, I., Elayeb, B., Evrard, F., Slimani, Y.: Organizing contextual knowledge for Arabic text disambiguation and terminology extraction. Knowl. Organ. 38(6), 473–490 (2011) 23. Soudani, N., Bounhas, I., ElAyeb, B., Slimani, Y.: Toward an Arabic ontology for Arabic word sense disambiguation based on normalized dictionaries. In: Meersman, R., et al. (eds.) OTM 2014. LNCS, vol. 8842, pp. 655–658. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45550-0 68 24. Pinto, D., Rosso, P., Benajiba, Y., Ahachad, A., Jim´enez-Salazar, H.: Word sense induction in the Arabic language: a self-term expansion based approach. In: 7th Conference on Language Engineering of the Egyptian Society of Language Engineering-ESOLE, pp. 235–245 (2007) 25. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)
26. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. ArXiv:1810.04805 (2018) 27. Antoun, W., Baly, F., Hajj, H.: AraBERT: transformer-based model for Arabic language understanding. arXiv preprint arXiv:2003.00104 (2020) 28. Abdul-Mageed, M., Elmadany, A., Nagoudi, E.M.B.: ARBERT & MARBERT: deep bidirectional transformers for Arabic. ArXiv:2101.01785 (2020) 29. Safaya, A., Abdullatif, M., Yuret, D.: KUISAIL at SemEval-2020 task 12: BERTCNN for offensive speech identification in social media. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 2054–2059 (2020) 30. Libovick´ y, J., Rosa, R., Fraser, A.: How language-neutral is multilingual BERT? arXiv preprint arXiv:1911.03310 (2019) 31. Saidi, R., Jarray, F., Mansour, M.: A BERT based approach for Arabic POS tagging. In: Rojas, I., Joya, G., Catal` a, A. (eds.) IWANN 2021. LNCS, vol. 12861, pp. 311–321. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-85030-2 26 32. Arabic WordNet. http://globalwordnet.org/resources/arabic-wordnet/awnbrowser/. Accessed 15 Oct 2021
Detection of Lung Cancer from CT Images Using Image Processing

S. Lilly Sheeba1(B) and L. Gethsia Judin2

1 SRM Institute of Science and Technology, Ramapuram Campus, Chennai, India
[email protected]
2 Anna University, Chennai, India
Abstract. Cancer is a life-threatening disease which involves abnormal cell division and invasion of such cells to other parts of the body as well. Lung cancer is medically named as Lung Carcinoma. Lung cancer may be diagnosed using Chest Radiographs or Computed Tomography (CT) scans. In order to detect the tumor accurately and more precisely we go for CT scan, which has less noise when compared to Magnetic Resonance Imaging (MRI) images. To further improve the quality and accuracy of images, Median filter and Watershed segmentation is used. MATLAB is an Image Processing Tool used to exploit a comprehensive set of standard algorithms for image processing, analysis and visualization. Here image processing techniques like pre-processing, segmentation and feature extraction are utilized to identify the exact location of the tumor in the lung using CT images as the input. This enables lung cancer to be detected and diagnosed for adequate treatment and medications without considerable time delay. Keywords: Computed Tomography · Lung cancer · Image processing · Watershed segmentation · Feature extraction
1 Introduction

Lung cancer predominantly affects people who smoke, besides those with a family history, constant exposure to radiation, or exposure to certain gases. Lung cancer, known as lung carcinoma in medical terminology, is a deadly disease and the second major contributor to the mortality rate of the human population. Tumors that comprise cancerous cells are known as malignant tumors. Tumors in the lungs can grow quite large, and there are two kinds of tumors: benign tumors and malignant tumors. Lung cancer is an uncontrolled growth of cells in lung tissue. Cancerous cells multiply by breaking away from the original tumor. Cancerous cells are diagnosed by different methods, including scanning methods such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI) and sputum cytology. Computed Tomography (CT) [11, 12, 16] is said to be more effective than a plain chest X-ray in providing high-quality images with more detail. The advantage of the CT scan over MRI is that it is a painless technique and is less noisy when compared
to MRI scan [15]. CT generally gives a 360-degree view of internal organs including the spine. The purpose of this work is to detect the lung cancer more accurately by employing image processing techniques. This work includes conversion of color images into gray scale images besides applying smoothening, enhancement and feature extraction techniques to obtain more accurate results. Enhancement techniques are used to improve the perception of information in images. The resultant image is more suitable for further processing than the original image. Digital images usually contain noises which in turn reduces its overall quality. These noises could be imparted either due to fluctuations in lighting or attributed by sensor or transmission noise. Hence smoothening is employed to provide an enhanced image quality by removing unwanted or undesired noises from the original image. It uses filters to suppress noise or fluctuation present in an image to improve the quality of the image. Median filtering is used to reduce salt and pepper noise and employs a non-linear smoothing operation to accomplish this. Image Segmentation [19, 20] is a process in image processing which advances the image for further analysis. It involves splitting of an image into multiple regions or segments, with each segment corresponding to different objects or parts of objects. Every pixel in an image must be associated to any one of the regions or segments. In a good segmentation technique pixel in the same region have similar greyscale values and form a connected region while pixels which are in neighboring regions have dissimilar greyscale values. Watershed segmentation is employed to detect the affected area. In an image processing, watershed involves transformation of a grayscale image into another image, where vital objects or regions for further analysis can be identified by marking desired foreground objects and background locations. Watershed transformation is implemented mainly using any of the three methods namely, distance transform approach, gradient method or marker-controlled approach. The marker-controlled watershed segmentation is a suitable method for segmenting objects. Markers are placed inside an object of interest. Here internal markers are used to locate objects of interest, and external markers are used to locate the background. After segmentation, watershed regions are located on the desired ridges and thereby separates the desired object from its neighbors [19–21]. Feature Extraction is an important step in image processing that uses algorithms and techniques to detect and isolate desired region of an image based on features like area, perimeter, centroid, diameter, eccentricity and mean intensity should be obtained. Thus, cancer region is obtained exactly by identifying the cancer cells. Once the cancer region is detected, the result is displayed.
2 Related Works Lung cancer is the most dangerous disease and it involves growth of abnormal cells or tissues. This unwanted growth is known as a tumor. The symptoms of lung carcinoma comprise of insistent coughing, weight loss, shortness of breath and subsequent chest pain. Earlier and easier diagnosis of lung cancer can be done by performing a CT scan or a Magnetic Resonance Imaging scan on the patient. But the easiest and most efficient one that is most preferred for diagnosis is CT scan [6, 11, 12, 16].
Primary lung cancer is the cancer that at first originated in the lung. Lung Cancer in general is classified into two main classes namely, Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC). Small cell lung cancer is otherwise known as oat cell cancer. About 10% to 15% of lung cancer comes under SCLC. It has a small cancer cell which are mostly filled with the nucleus. It is usually caused by smoking. Doctors often suggest chemotherapy treatment for SCLC. In Non-small cell lung cancer (NSCLC), malignant cancer cells are mostly located in the tissues of the lung. Smoking increases the risk of this lung cancer. Signs of NSCLC includes persistent cough that lasts long and shortness of breath. In most of the cases, there is no cure for this type of cancer. In [1], lung cancer is detected by using image processing techniques like preprocessing, histogram equalization, threshold and feature extraction. In this model, artificial neural networks are used. Here the detection system uses a feed forward back propagation neural network and works on the principle of minimizing the error. The work employs MATLAB2016a to implement the design. This helped radiologists by preventing incorrect diagnosis. The overall efficiency of system was 78%. Lung cancer can be diagnosed more efficiently method and accurately using Support Vector Machine (SVM) and image processing techniques [2]. MATLAB is a tool best suited for easily implementing image processing techniques. In this system, Median filter and segmentation were employed to get more accurate results. This system assists in earlier diagnose of cancer prevalence in different organs of the human body and there by helps in suppressing further growth of abnormal cells to other parts of the body. Watershed segmentation is used in [3] to detect earlier stage lung cancer more accurately. This work sequentially uses various image processing techniques to achieve its specified target. Among all segmentation techniques, watershed segmentation is a powerful segmentation approach which sheds more light to inner depths of the marked segment. It also uses feature extraction techniques to identify the area to be extracted from the image. By measuring the area of tumor, the lung cancer region can be detected accurately. It gives good potential for lung detection at early stage of the lung cancer. The main objective of [4] is to extract lung region from human chest CT images using Artificial Neural Network (ANN) classifier model. Here Median filter and Morphological operation like erosion and dilation which is used to extract the border of the lung. After the extraction, the extracted region is segmented using unsupervised modified Hopfield neural network classifier. The extracted region is then clustered based on similar characteristics. This process can be used as an input for the next diagnosis. [5] mainly focuses on detecting lung cancer in earlier stages within a shorter period and provides a better solution to reduce further spreading of cancer cells. This work uses Gabor filters and watershed segmentation approach for quick detection of lung cancer. Here Gabor filter was used for image enhancement instead of Discrete Wavelet Transform (DWT) and histogram equalization. This proposed method is highly suitable for manufactures of cancer detection equipment’s and medical practitioners.
3 Proposed System

Cancer is one of the most dangerous diseases in the world, and among cancers, lung cancer is the leading killer due to its lower survival rates. The toughest part of lung cancer is that it often cannot be detected at an early stage. Lung cancer occurs when there is abnormal cell growth that is totally out of control. A few methods are available to detect the presence of cancerous tumors in the lungs, including X-rays, Computed Tomography, Magnetic Resonance Imaging and sputum cytology. Diagnosis is mostly based on Computed Tomography images; except for CT, the other techniques are time consuming, so we prefer the CT scan. CT is considered to be more effective than a chest X-ray in detecting and diagnosing lung cancer. Recently, image processing techniques have played a major role in detecting lung cancer. Image processing is a form of signal processing where images and their properties are used to gather and analyze information about the objects in the image. Digital image processing exploits digital images and computer algorithms to enhance, manipulate or transform images in order to precisely determine the prevailing condition and subsequently make appropriate decisions.
Fig. 1. Proposed cancer detection model: obtaining the CT scan → smoothening → image enhancement → segmentation → feature extraction → identification of cancer cells → result.
Figure 1 depicts the various image processing steps involved in predicting the exact location of cancerous cells in the lung image. The input images are acquired in the form of a CT scan. The lung CT scan images are in the RGB color format. Median filtering is used to smoothen the grayscale image. Image enhancement is applied to
improve the color, contrast and overall quality of the images for further analysis. Then the images are segmented by a morphological operation using the watershed transformation. Geometric features including the area, perimeter and eccentricity of the tumor are extracted to identify the exact location of the cancer in the lungs.

A. Collection of CT Image
The lung cancer image is acquired using a Computed Tomography (CT) scan [7–10]. This is the first step for detecting the presence of cancerous cells in the CT image. The main advantage of CT images is that they have less noise compared to X-ray and MRI images. CT is easier to use and gives better clarity and less distortion with enhanced quality. Figures 2 and 3 depict the CT images of a non-cancerous and a cancerous lung, respectively. It can be easily noticed from Fig. 3 that some kind of abnormality exists in the lung when compared to the lung in Fig. 2.
Fig. 2. CT image of non-cancerous lung
Fig. 3. CT image of cancerous lung
B. Conversion into Grayscale Image
Images in their original form are in Red Green Blue (RGB) form. The RGB form of an image is logically represented as a combination of three matrices, one per color channel, each holding one value per pixel. These data are difficult to process, and therefore we first convert the images into grayscale. A grayscale image is one in which each pixel value is a single sample representing only the amount of light, i.e., it carries only intensity information, ranging over many shades of gray between black and white. To convert a colored image into a grayscale image, each RGB pixel value is replaced with its average value, given as

Average Value = (R-Value + G-Value + B-Value)/3

The grayscale image obtained from Fig. 3 is depicted in Fig. 4.
Fig. 4. Grayscale image of cancerous lung
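The paper implements this step in MATLAB; purely as an illustration, an equivalent channel-averaging conversion can be written in Python with NumPy and Pillow as follows (the file name and variable names are placeholders).

```python
import numpy as np
from PIL import Image

# Hypothetical CT slice stored as an RGB image.
rgb = np.asarray(Image.open("lung_ct.png").convert("RGB"), dtype=np.float64)
# Replace each RGB pixel by the average of its three channel values.
gray = rgb.mean(axis=2).astype(np.uint8)
```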
C. Pre-processing The goal of this process is to obtain an improvement of the image either by eliminating unwanted distortions or by enhancing certain features for further processing. Preprocessing does not increase any image information but only aims at clearly exposing information pertaining to specific critical analysis. It mainly involves three techniques including image smoothening, image enhancement and image segmentation techniques with an intention to assist in the identification process of cancerous tissues in the lung at an earlier stage. Early detection is considered as a vital step in promoting health-care solutions and thereby improving mortality rate in humans.
D. Image Smoothening
Image enhancement is the process of adjusting digital images so that the results are suitable for display. It tends to enhance the perception of information in images and is considered ideal for providing better input for other image processing techniques.
Fig. 5. Median filtered image
Smoothening suppresses the noise and some unwanted lighting fluctuations prevailing in CT images. Unfortunately, smoothening also blurs sharp edges that contain information about the image. To remove unwanted noise from the image, a median filter is employed. The median filter is a non-linear filter that is especially suited to removing salt-and-pepper noise. Figure 5 shows the output obtained after the smoothening process.

E. Image Segmentation
Segmentation is the process of partitioning a digital image into multiple segments. The goal of this process is to represent the image in a meaningful manner that is easier to analyze. The main objective of segmentation here is to transform the CT scan so that it can be examined and analyzed with ease; it also reduces unnecessary information present in the image. Figure 6 shows the presence of cancerous tissues in the lung image. Here, we use the watershed segmentation technique. Marker-based watershed segmentation is applied to separate the desired object from an image, which is one of the most difficult image processing operations. The marker-controlled watershed approach uses two types of markers: an external marker associated with the background of the desired object, and an internal marker associated with the desired objects of interest.
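As an illustration of these two steps outside MATLAB, the sketch below applies a median filter and then a marker-controlled watershed using scikit-image and SciPy; the disk radius, Otsu thresholding, and the distance-based choice of internal markers are our own assumptions, not the authors' exact parameters.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import filters, morphology, segmentation

# 'gray' is the grayscale CT slice obtained in the previous step.
denoised = filters.median(gray, morphology.disk(3))      # salt-and-pepper removal

# Threshold to a binary mask and build internal/external markers.
binary = denoised > filters.threshold_otsu(denoised)
distance = ndi.distance_transform_edt(binary)
internal_markers, _ = ndi.label(distance > 0.5 * distance.max())  # objects of interest
markers = internal_markers.copy()
markers[~binary] = internal_markers.max() + 1             # external (background) marker

# Marker-controlled watershed separates the candidate tumor regions.
labels = segmentation.watershed(-distance, markers, mask=binary)
```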
Fig. 6. Binary image after watershed segmentation
F. Feature Extraction
Feature extraction is an essential step of the image processing pipeline used here. It is a technique to recognize patterns in an image. The segmented output is the input image to the feature extraction phase. Features such as area, perimeter and eccentricity are extracted to accurately identify the presence of cancer in the lung at an earlier stage. In the feature extraction stage, detection and isolation of the desired portions or shapes of an image is done. From the spotted region of the image, features such as (i) area, perimeter, centroid and diameter, (ii) eccentricity and (iii) mean intensity are obtained. Here we use region-based analysis of the segmented image. After extracting the features, the accuracy can be obtained; using the median filter and region-based segmentation, the accuracy can be further enhanced.
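A minimal sketch of extracting exactly these region features from the watershed labels, assuming the Python/scikit-image pipeline sketched above rather than the paper's MATLAB code, is:

```python
from skimage import measure

# 'labels' is the watershed label image and 'gray' the grayscale slice.
for region in measure.regionprops(labels, intensity_image=gray):
    if region.label == labels.max():        # skip the background marker label
        continue
    print("area:", region.area,
          "perimeter:", round(region.perimeter, 1),
          "centroid:", region.centroid,
          "equivalent diameter:", round(region.equivalent_diameter, 1),
          "eccentricity:", round(region.eccentricity, 3),
          "mean intensity:", round(region.mean_intensity, 1))
```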
4 Conclusions Lung cancer is a deadly disease that is described by abnormal growth of cells in tissues of the lungs. In our proposed system we are diagnosing the lung cancer using different image processing techniques inclusive of grayscale conversion, smoothening and segmentation for image enhancement and accurate analysis. All these algorithms are involved in preprocessing the colored CT image of the lung. Median filter and watershed segmentation techniques contribute significantly in improving the accuracy in predicting the cancerous region. Hence, this proposed method is highly suitable for diagnosing the lung cancer accurately even in earlier stages of cancer.
5 Future Works This prediction can further be extended to determine the link between pollutants (contents in industrial emissions in water, air pollution contents and presence of radioactive contents in soil) pertaining to a particular geographical location and the onset of lung cancer using approaches in [22–24].
It can also be extended to predict the persistence of lung cancer province- or district-wise in particular regions. Future works can also consider attributes including:
a. area (D1, D2, D3, …, D32)
b. age (child, adult, senior)
c. sex (male, female)
d. smoking (yes, no)
e. lung cancer type (NSCLC, SCLC)
f. lung tumor (malignant, benign)
References 1. Kalaivani, S., Chatterjee, P., Juyal, S., Gupta, R.: Lung cancer detection using digital image processing and artificial neural networks. In: International Conference on Electronics, Communication and Aerospace Technology (ICECA) (2017) 2. Rahane, W., Dalvi, H., Magar, Y., Kalane, A.: Lung cancer detection using image processing and machine learning healthcare. In: IEEE International Conference on Current Trends toward Converging Technologies (ICCTCT) (2018) 3. Kulkarni, A., Panditrao, A.: Classification of lung cancer stages on CT scan images using image processing. In: International Conference on Advanced Communication Control and Computing Technologies (lCACCCT) (2014) 4. Sammouda, R.: Segmentation and analysis of CT chest images for early lung cancer detection. In: Global Summit on Computer & Information Technology (2016) 5. Avinash, S., Manjunath, K., Senthil Kumar, S.: An improved image processing analysis for the detection of lung cancer using gabor filters and watershed segmentation technique. In: International Conference on Inventive Computation Technologies (2016) 6. Al-Tarawneh, M.S.: Lung cancer detection using image processing techniques. Leonardo Electron. J. Pract. Technol. 11(21), 147–158 (2012) 7. Pandey, N., Nandy, S.: A novel approach of cancerous cells detection from lungs CT scan images. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2(8), 316–320 (2012) 8. Chaudhary, A., Singh, S.S.: Lung cancer detection on CT images using image processing. In: International Transaction on Computing Sciences, vol. 4 (2012) 9. Bandyopadhyay, S.K.: Edge detection from CT images of lung. Int. J. Eng. Sci. Adv. Technol. 2(1), 34–37 (2012) 10. Kaur, A.R.: Feature extraction and principal component analysis for lung cancer detection in CT scan images. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3(3), 187–190 (2013) 11. Hadavi, N., Nordin, M.J., Shojaeipour, A.: Lung cancer diagnosis using CT scan images based on cellular learning automata. In: International Conference on Computer and Information Sciences. IEEE (2014) 12. Vijaya, G., Suhasini, A., Priya, R.: Automatic detection of lung cancer in CT images. Int. J. Res. Eng. Technol. 3(7), 182–186 (2014) 13. Miah, M.B.A., Yousuf, M.A.: Detection of Lung cancer from CT image using image processing and neural network. In: 2nd International Conference on Electrical Engineering and Information and Communication Technology (2015) 14. Agarwal, R., Shankhadhar, A., Sagar, R.K.: Detection of lung cancer using content based medical image retrieval. In: 5th International Conference on Advanced Computer and Communication Technologies, pp. 48–52 (2015)
15. Deshpande, A.S., Lokhande, D.D., Mundhe, R.P., Ghatole, J.M.: Lung cancer detection with fusion of CT and MRI images using image processing and machine learning. Int. J. Adv. Res. Comput. Eng. Technol. 4(3), 763–767 (2015) 16. Mahersia, H., Zaroug, M., Gabralla, L.: Lung cancer detection on CT scan images: a review on the analysis techniques. Int. J. Adv. Res. Artif. Intell. 4(4), 38–45 (2015) 17. Pratap, G.P., Chauhan, R.P.: Detection of lung cancer cells using image processing techniques. In: 1st IEEE International Conference on Power Electronics, Intelligent Control and Energy Systems. IEEE (2016) 18. Rossetto, A.M., Zhou, W.: Deep learning for categorization of lung cancer CT images. In: IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (2017) 19. Amandeep Kaur, A.: Image segmentation using watershed transform. Int. J. Soft Comput. Eng. 4(1), 5–8 (2014) 20. Lu, N., Ke, X.Z.: A segmentation method based on gray-scale morphological filter and watershed algorithm for touching objects image. In: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (2007) 21. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice-Hall, Hoboken (2002) 22. Jemimah, C., Lilly Sheeba, S.: Analysis of bodily fluids and fomites in transmission of ebola virus using bigdata. Procedia Comput. Sci. 92, 56–62 (2016) 23. Varela-Santos, S., Melin, P.: A new modular neural network approach with fuzzy response integration for lung disease classification based on multiple objective feature optimization in chest X-ray images. Expert Syst. Appl. 168, 114361 (2021) 24. Varela-Santos, S., Melin, P.: A new approach for classifying coronavirus COVID-19 based on its manifestation on chest X-rays using texture features and neural networks. Inf. Sci. 545, 403–414 (2021)
An Overview of IoT-Based Architecture Model for Smart Home Systems Odamboy Djumanazarov1,2(B) , Antti Väänänen1 , Keijo Haataja1 , and Pekka Toivanen1 1 School of Computing, University of Eastern Finland, Kuopio, Finland [email protected], {antti.vaananen,keijo.haataja, pekka.toivanen}@uef.fi 2 Urgench Branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Urgench, Uzbekistan
Abstract. A smart home recognizes specific situations that occur in the building and responds to them according to previously developed algorithms. Most smart home systems are controlled by sensors, which are used to control and monitor home functions using wireless methods. We explore the concept of a "smart home" with integrated Internet of Things (IoT) services: smart sensors and actuators are combined with other smart things on the network using appropriate technology to allow easy access from different places. A smart environment should be able to respond in case of emergency or risk and to report any abnormal behavior. In this paper, we present a review towards building an IoT-based architecture model as a reliable approach to an advanced concept of a smart home system. Keywords: Smart home · IoT architecture · Sensors · Actuators · Home appliances · Wireless communication
1 Introduction

In detail, a "smart home" is a special platform that connects different objects into a network, meets the requirements of the entire system, and provides more convenient operation using the Internet of Things, computer technology, and communication technology [1]. In recent years we have seen the development of new "smart" devices that can directly connect to the Internet and be controlled remotely by applications. The Internet of Things (IoT) is a network of devices and other elements with embedded sensors, electronics, software, and connectivity [2]. Together with another new technology, cloud computing [3], this has led to the creation of cloud solutions based on the Internet of Things for the development of smart homes. In the IoT, there is a shift from functionality to connectivity and sensor-based decision making, which means that a device can become more useful when connected to other devices. However, the Internet of Things is not just a collection of devices and sensors
connected to each other in a wired or wireless network; it is a tight integration of the virtual and real world, where communication between people and devices takes place [4]. With one command, the user sets the desired situation and controls the operating modes of all systems and electrical appliances. The smart home is able to independently configure and adjust the functions of the smart appliances according to various parameters, such as time of day, the person's needs, the person's location, the weather, and ambient lighting, in order to achieve a pleasant living environment for the resident.

1.1 Smart Home Devices

But how can a device become "smart"? The first argument is due to a change in its design: the design may be such that the behavior of the system looks reasonable. The second argument is due to "intellectualization" (equipping the system with devices for collecting information, processing it and making decisions). This approach allows us to provide fairly complex and "reasonable" behavior in much simpler ways than by building the appropriate design. Finally, the third argument: the behavior of the system becomes "reasonable" because it interacts with other systems. IoT technology (the Internet of Things) provides the opportunity for each element of a smart home (thing), and for the entire smart home, to access the Internet and exchange information with other smart homes and systems. Why is the third argument the most interesting? It provides many more opportunities for organizing a smart home (data from all over the Internet can be used), and it is more economical (Internet access is much cheaper than creating complex smart devices) [5]. There are several basic functions for a smart home [6]:
– Smart homes can increase the comfort, safety, convenience, and interactivity of family life, as well as optimize people's lifestyles.
– The smart home can support remote payment.
– A smart home can be monitored and interacted with via mobile phone and a remote network, with timely processing.
– A smart home realizes in real time the meter readings and the security service of the humidity, temperature, water, electricity, and gas sensors, which provides more convenient conditions for a high-quality service.
– A smart home supports perfect intelligent service.
The rest of this paper is organized as follows: In Sect. 1, we describe the typical smart home and the smart devices it requires; Sect. 2 describes the most common smart home services which have been defined in the IoT as of today; Sect. 3 focuses on an IoT-based architecture model for smart homes and describes the integration of smart home components and IoT sensors, a concept that requires further development; Sect. 4 describes the communication protocols and a comparison of wireless technologies; Sect. 5 focuses on a conceptual design of smart home security using sensor-based IoT solutions; Sect. 6 presents conclusions and future work.
2 Smart Home Services

2.1 Measuring Home Conditions
Illustration of online wireless sensor solution [6]. The use of IoT devices is increasing every day. The reason for increasing the number of IoT devices is that they provide comfort in people’s lives and perform work with better outcomes than humans. Statists has been reported that the number of IoT devices in 2018, will have more than tripled since 2012 and there will be 50 billion devices. It shows IoT devices is increasing every day. Graph chart 1 you can see the number of connected IoT devices in Internet from 2012 to 2020 [9]. Devices 60 50 40 30 20 10 0 2012
2013
2014
2015
2016
2017
2018
2019
Years
Diagram 1. Number of connected IoT devices from 2012 to 2020 [7].
2020
An Overview of IoT-Based Architecture Model for Smart Home Systems
699
2.2 Home Appliances Management Home appliances management (HAM) provides the user with an interface that the user can use to manage appliances and monitor system data. This leads us to wonder why a person needs to perform these tasks of managing and monitoring device usage data? [10]. What can be done to replace human intervention and automate the entire end-toend system, from reading and analyzing usage data to planning and taking appropriate action based on user preferences? IoT, more precisely Home appliances management (HAM) plays a very important role in this goal of automating a process where two goals are to be achieved - energy savings and user preferences. Monitoring smart devices by the user can help the user achieves the first goal - preventing energy losses during the day. On the other hand, I believe that reinforcement learning can allow us to find a solution to the second goal - user preference management. Creates the cloud service for managing home appliances which will be hosted on a cloud infrastructure. The smart home cloud service offers a simple, flexible and inexpensive way to store and access data from smart home devices. The managing service allows the user, controlling the outputs of smart actuators associated with home appliances, such as lamps and fans. Smart actuators are devices, such as valves and switches, which perform actions such as turning things on or off or adjusting an operational system. Actuators provide a variety of functionalities, such as on/off valve service, positioning to percentage open, modulating to control changes on flow conditions, emergency shutdown (ESD). To activate an actuator, a digital write command is issued to the actuator [8]. 2.3 Home Access Control Home access technologies are commonly used for public access doors. A common system uses a database with the identification attributes of authorized people. When a person is approaching the access control system, the person’s identification attributes are collected instantly and compared to the database. If it matches the database data, the access is allowed, otherwise, the access is denied. For a wide distributed institute, we may employ cloud services for centrally collecting person’s data and processing it. Some use magnetic or proximity identification cards, other uses face recognition systems, fingerprint, and RFID. In an example implementation, an RFID card and an RFID reader have been used. Every authorized person has an RFID card. The person scanned the card via an RFID reader located near the door. The scanned ID has been sent via the internet to the cloud system. The system posted the ID to the controlling service which compares the scanned ID against the authorized IDs in the database [8] (Fig. 1). “Operation of Access Control” Access control readers give access to the building based on established credentials. Things like a key card, key fob, or biometrics like fingerprints are all considered established credentials. System readers are connected to a network. Every person who needs access has a code tied to their credential and the system recognizes that they are authorized to be in the building. Software tracks who enters and exits the home and has the
700
O. Djumanazarov et al.
Fig. 1. Overall structure of smart homes [6].
ability to alert security supervisors etc. when someone enters the home after hours or there is a break-in [23]. Some Types of Access Control 1. Cloud-based access control systems. In smart homes, access control rights are stored in the cloud, not on a local server. The administrator can manage permissions at any time and place using a browser. Security Manager, supervised all facilities located in several locations [24]. 2. Mobile or smartphone-based access control systems. Mobile or smartphone access control works on the same principles. You can get better access to it with mobile access control, with a simple push of a button on a mobile device such as smartphones, including wearable devices. In addition to enhancing the ease of use and usability, the system offers the user efficient and cost-effective solutions as well as effective identity management. Developers really find this type of control system to be highly efficient [24]. 3. IoT-based access control systems. To explain the Internet of Things (IoT) based access control, I presented the smartphone as one of the most powerful Bluetooth sensors, NFC. Using the IoT for access control, all door readers are connected to the Internet and have firmware that can be updated both for security reasons and to add new features [24].
3 IoT-Based Architecture Model for Smart Homes Figure 2 shows the smart-home and IoT based, main components, and their interconnectivity. Here in the smart home environment, we can see the typical devices connected to a local area network (LAN). Nowdays it might be ZigBee, BLE, Wi-Fi or
An Overview of IoT-Based Architecture Model for Smart Home Systems
701
other proprietary RF communication as shown in Table 1. This enables communication among the sensors, actuators, and outside of it. Connected to the LAN are a server and its database. The server controls the devices, logs its activities, provides reports, answers queries, and executes the appropriate commands. For more comprehensive or common tasks, the smart home server transfers data to the cloud and remotely activate tasks in it using APIs, application programming interface processes. Besides, IoT home appliances are connected to the internet and the LAN, and so expands smart home to include IoT. The connection to the internet allows the end-user application to communicate with smart home enabling resident to get information and remotely activate tasks [8].
Fig. 2. Building an IoT-based architecture model for smart homes [8].
A smart home systems to be controlled automatically. To accomplish this, the smart home system which we designed adopts wireless communication to build a local home network in order to sense the objects or devices at home, and it uses 4G or Ethernet to connect the local home network to the Internet for communication in order to support remote control and management. For example Min Li and et al. [1] realized the home air condition and other smart appliances network by power fiber optic network interconnection. They achieved smart appliances automatically collect all information (electricity, water meters, gas meters and others), analysis and management. Soliman et al. [25] proposed and created a system architecture for smart homes. Users can control devices using Web applications. Cook et al. [26] proposed architecture made of controllers (computer servers), sensors, and actuators with wireless technology ZigBee. The program control is based on subscribe pattern. Jie et al. [27] proposed an architecture model by scalability. By using the proposed model can be added or removed (devices) to the smart home infrastructure.
702
O. Djumanazarov et al.
The proposed architecture is content into five layers: – – – – –
the resource layer (sensors, appliances); the interface layer; the agent layer; the kernel layer (management controllers); user application layer (API).
Zhou et al. [28] proposed called CloudThings cloud-based architecture, which focused on the development, and management of IoT applications. CloudThings use the CoAP protocol and direct access to the Internet. This architecture is made based on three components: – Infrastructure as a service (IaaS) – Platform as a service (PaaS) – Software as a service (SaaS)
Fig. 3. Architecture of the IoT-based smart home system
The system can be divided into three layers: sensing and actuating layer, network layer, and application layers are shown in Fig. 3. In this architecture model Fig. 3 [12] the sensors and the actuating layer is mainly responsible for data collection or commands receiving. Network layer of the system including environment sensors like
An Overview of IoT-Based Architecture Model for Smart Home Systems
703
temperature, humidity, gas, or body sensors, and so on. And then, the data are processed respectively by the terminals and transmitted to the gateway which locates in the middle of the sensing layer and network layer through a self-organizing network (wireless communication). The Getway receives and deals with the data from all the sensors, and transmits them through the Internet to the remote management platform. On the other hand, the management platform can support different applications such as health care, security, video monitoring, and entertainment. Users can log in to the platform to execute each application and management platform sends different commands or data to the gateway through the Internet. The gateway analyzes the received data and starts different actuators mechanisms [12].
4 Communication Protocols and Comparison of Wireless Technologies In a smart home environment, devices need to be interconnected to exchange information. The ways in which these devices and sensors can communicate are determined through communication protocols. These protocols, which define how information is transmitted, are developed by organizations [4]. Generally classified communication protocols into three main groups, according to the medium of propagation: wired, wireless, and hybrid [Dragos Mocrii et al.]. The choice of the right technology to use depends on the use case. Furthermore, the choice also depends on the size of the network. Some communications protocols offer longer ranges, some higher security, and others lower power consumption. Smart home wireless sensor networks (WSN), typically composed of the body area network and personal area network are susceptible to interference with higher power wireless technologies, such as Wi-Fi. As a result, WSN devices may sometimes fail to receive commands, or send sensed data, negatively affecting quality-of-service (QoS) in the smart home. In this table popular wireless technologies in the Internet of Things which we can often meet in different IoT devices in our homes. We also will consider some promising modifications of these technologies designed to expand their possibilities and make them closer to the Internet of Things. We can classify them by range, data rate, and power consumption [14]. The possible network configuration like network topology and physical size constraints is also important for designing a network. Wireless connectivity is not dominated by one single technology. In most cases, the technologies which provide low-power, low-bandwidth communication over short distances, operate on an unlicensed spectrum, have limited quality-of-service (QoS) and security requirements will be widely used for home and indoor environments [15]. Following low-power, low-bandwidth technologies are most suited for this description: ZigBee [17], WiFi HaLow [18], Bluetooth [19], BLE [20], ANT [11, 21], and Z-Wave [22]. In the table, you can see the main characteristics of considered technologies. Each technology has advantages and limitations. The most common for smart home applications which are often mentioned in the literature are the next technologies: ZigBee, WiFi, Bluetooth, and Z-Wave. ZigBee is one of the leading players in a smart home market, providing low-power, low-bandwidth mesh connectivity for both home automation and energy management applications [14].
Table 1. Comparison of wireless communication technologies [13]

Technology | ZigBee | WiFi HaLow | Bluetooth | BLE | ANT | Z-Wave
Standardization | IEEE 802.15.4 | IEEE 802.11ah | IEEE 802.15.1 | IEEE 802.15.1 | Proprietary | Proprietary
Frequency | 2.4 GHz, 868/915 MHz | 900 MHz | 2.4 GHz | 2.4 GHz | 2.4 GHz | 900 MHz