Intelligent Systems Design and Applications: 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022) Held December ... (Lecture Notes in Networks and Systems, 715) 3031355067, 9783031355066

This book highlights recent research on intelligent systems and nature-inspired computing. It presents 223 selected pape

English Pages 615 [614] Year 2023

Table of contents :
Preface
ISDA 2022—Organization
Contents
Towards Automatic Forecasting: Evaluation of Time-Series Forecasting Models for Chickenpox Cases Estimation in Hungary
1 Introduction
2 Related Work
3 Time-Series Forecasting Problem Definition
4 Exploratory Data Analysis (EDA)
5 Experimental Setup
5.1 Data Splitting
5.2 Data Normalization
5.3 Models
5.4 Evaluation Metrics
6 Benchmarking Results
7 Conclusion
References
Parameters Optimization in Hydraulically Driven Machines Using Swarm Intelligence
1 Introduction
2 Applying Swarm Intelligence Technique to Optimize Hydraulic Parameters
2.1 Firefly Algorithm
2.2 Modelling the Hydraulics
3 Developing Hydraulic Parameters Optimization Algorithm
3.1 Optimization Algorithm
3.2 Mass Lifting Cylinder
3.3 Real-System and Optimization Model
4 Results and Discussions
4.1 Optimizing the Lift Mass, Hydraulic Parameters and Characteristic Curve
5 Conclusion
References
A Machine Learning Framework for Cereal Yield Forecasting Using Heterogeneous Data
1 Introduction
2 Materials and Methodology
2.1 Study Area
2.2 The Used Dataset
2.3 Methodology
2.4 Machine Learning Methods for Cereal Yield Forecasting
3 Experiments and Results
3.1 Experiment Design
3.2 Hyper Parameters Selection
3.3 Experimental Results
4 Discussion and Conclusion
References
New Approach in LPR Systems Using Deep Learning to Classify Mercosur License Plates with Perspective Adjustment
1 Introduction
2 Related Works
3 Materials and Methods
3.1 Methods
3.2 Datasets
4 Proposed Methodology
5 Results and Discussion
6 Conclusion and Future Work
References
Ensemble of Classifiers for Multilabel Clinical Text Categorization in Portuguese
1 Introduction
2 Related Work
3 Proposal
3.1 Dataset Acquisition
3.2 Preprocessing and Labeling
3.3 Selection of Vectorization Techniques
3.4 Ensemble and Validation
4 Results and Discussions
4.1 Embeddings Definition Results
4.2 Classification Results
5 Conclusion
References
Semantic Segmentation Using MSRF-NET for Ultrasound Breast Cancer
1 Introduction
2 An Overview of MSRF-NET
2.1 The Encoder
2.2 The MSRF Sub-network
2.3 Shape Stream
2.4 The Decoder
2.5 Loss Function
3 Methodology
4 Experimental Results
5 Conclusion
References
A Hybrid Image Steganography Method Based on Spectral and Spatial Domain with High Hiding Ratio
1 Introduction
2 Related Works
3 Mathematical Background
3.1 Discrete Cosine Transform (DCT)
3.2 Substitution
4 Proposed Methodology
5 Results and Discussion
5.1 Peak Signal to Noise Ratio
5.2 Hiding Ratio (HR)
6 Conclusion
References
PU Matrix Completion Based Multi-label Classification with Missing Labels
1 Introduction
2 Related Work
3 Proposed Method
3.1 Problem Formulation
3.2 Debiased Learning with PU Asymmetric Loss
3.3 Learning Instance Dependencies by the Manifold Regularization
3.4 Learning Label Dependencies by Low-Rank Constraint
4 Optimization Algorithm
5 Experiment
5.1 Experiment Setup
5.2 Comparing Algorithms
5.3 Results and Analysis
5.4 Parameters Sensibility Study
6 Conclusion and Further Work
References
SP2P-MAKA: Smart Contract Based Secure P2P Mutual Authentication Key Agreement Protocol for Intelligent Energy System
1 Introduction
1.1 Motivation
1.2 Paper Outlines
2 Literature Survey
3 Technical Preliminaries
3.1 Lattice Based Cryptography (LBC)
4 Proposed Model
4.1 Network Model
4.2 Proposed SP2P-MAKA Protocol
5 Security Analysis
6 Conclusion and Future Work
References
Automated Transformation of IoT Systems Models into Event-B Specifications
1 Introduction
2 State of the Art
3 The Proposed Approach
4 Transformation of the UML Models of IoT System into Event-B Specifications
4.1 Structural Features
4.2 Behavioral Features
5 Case Study: IoT Based Health Monitoring of Elderly (IHME)
6 Conclusion
References
Prediction of Business Process Execution Time
1 Introduction
2 BPM: Business Process Management
3 Process Mining
4 BPETPM: Business Process Execution Time Prediction Method
5 iBPMS4PET: Intelligent Business Process Management System for Prediction of Execution Time
6 General Conclusion and Perspectives
References
Pre-processing and Pre-trained Word Embedding Techniques for Arabic Machine Translation
1 Introduction
2 Related Works
3 Methodology
3.1 Neural Machine Translation
3.2 Word Representations
3.3 Pre-processing
4 Experimental Setup
4.1 Dataset
4.2 Model Hyper-parameters
4.3 Pre-trained Word Embeddings
5 Results
5.1 Pre-processing Results
5.2 Pre-trained Word Embeddings Results
6 Conclusion
References
A Framework for Automated Abstraction Class Detection for Event Abstraction
1 Introduction
2 Related Work
3 Framework
3.1 Event Data
3.2 Constructing Preceding and Succeeding Patterns
3.3 Computing Activity Similarity
4 Evaluation
5 Conclusion
References
Using Clinical Data and Deep Features in Renal Pathologies Classification
1 Introduction
2 Related Work
3 Methodology
3.1 Clinical and Image Dataset
3.2 Evaluated Deep Features
3.3 Evaluated Classifiers
3.4 Evaluation Metrics
4 Results Assessment
5 Final Conclusions and Work Perspectives
References
An MDA Approach for Extending Functional Dimension for Sensitive Business Processes Execution
1 Introduction
2 Related Work
2.1 Business Process Management System
2.2 Business Process Execution
2.3 Business Process Engine
3 MDA Approach for the Execution of the Functional Dimension
3.1 Types of Model Transformations
3.2 Mapping Rules Between BPM4KI and Extended BPMN
3.3 Activiti BPMS Meta-Model
3.4 Implementation and Validation
4 Conclusions
References
Double Deep Reinforcement Learning Techniques for Low Dimensional Sensing Mapless Navigation of Terrestrial Mobile Robots
1 Introduction
2 Related Works
3 Theoretical Background
3.1 Deep Q-Network - DQN
3.2 Double Deep-Q Network - DDQN
4 Experimental Setup
4.1 PyTorch
4.2 ROS, Gazebo and Turtlebot
4.3 Simulation Environments
5 Methodology
5.1 Network Structure
5.2 Reward Function
6 Results
7 Conclusions
References
Review on Sentiment Analysis Using Supervised Machine Learning Techniques
1 Introduction
1.1 Machine Learning Sentiment Analysis Techniques
1.2 Process Involved in Sentiment Analysis
1.3 Sentiment Analysis Categories
1.4 Machine Learning Algorithms in Sentiment Analysis
1.5 Two Types of Supervised Learning Algorithm
2 Literature Review
2.1 Multiclass Supervised Learning Algorithms
3 Performance Metrics
4 Discussion
5 Conclusion
References
The Emotional Job-Stress of COVID-19 on Nurses Working in Isolation Centres: A Machine Learning Approach
1 Introduction
1.1 Machine Learning
2 Methodology
2.1 Data Collection
2.2 Pre-processing
2.3 Text Prediction (Classification)
2.4 Support Vector Machine
3 Results
3.1 Demography of Participants
3.2 Emotional Impact of COVID-19 on Frontline Nurses
4 Discussion
4.1 Performance of the Various Classifiers
4.2 Limitations
5 Conclusion
6 Recommended Future Studies
References
An Investigative Approach on the Prediction of Isocitrate Dehydrogenase (IDH1) Mutations and Co-deletion of 1p19q in Glioma Brain Tumors
1 Introduction
2 Literature Survey
2.1 Method/Algorithm to Exploration of Paper Related to Survey
3 Data Set
4 System Architecture
5 Methodology
5.1 Implementation Details
6 Result
7 Conclusion
References
Towards a French Virtual Assistant for COVID-19 Case Psychological Assistance Based on NLP
1 Introduction
2 Study Background
2.1 Natural Language Processing (NLP)
2.2 Speech Recognition
2.3 Related Works
3 Envisaged Approach
3.1 Message Understanding Component (MUC)
3.2 Retrieval and Storage Data Component (RSDC)
4 Implementation
5 Conclusion and Future Works
References
CoCoSL: Agricultural Solutions Using Image Processing for Coconut Farms in Sri Lanka
1 Introduction
2 Literature Review
3 Methodology
3.1 Image Pre-processing
3.2 Components
4 Evaluation
4.1 Data Set
4.2 Training and Performance
5 Conclusion
References
CIWPR: A Strategic Framework for Collective Intelligence Encompassment for Web Page Recommendation
1 Introduction
2 Related Works
3 Proposed System Architecture
4 Implementation, Results and Performance Evaluation
5 Conclusions
References
A Bioinspired Scheduling Strategy for Dense Wireless Networks Under the SINR Model
1 Introduction
2 The SINR Model and Scheduling
3 The Proposed Bioinspired Scheduling Strategy
4 Experimental Evaluation
5 Conclusion
References
Implementation Analysis for the Applications of Warehouse Model Under Linear Integer Problems
1 Introduction
2 Design and Methodology
2.1 Linear Programming Technique
2.2 Analytical Hierarchical Process
2.3 Rough Analytic Hierarchy Process
3 Framework Implementation
3.1 Implementation of Linear Programming Technique
3.2 Implementation of the AHP Model
3.3 Implementation of the Rough AHP Model
3.4 Implementation of the Rough TOPSIS Model
4 Result of Linear Programming Technique
5 Conclusion
References
Inventiveness of Text Extraction with Inspiration of Cloud Computing and ML Using Python Logic
1 Introduction
2 Discussion
2.1 Web Structure
2.2 The Social Web
2.3 The Internet of Things (IoT), Internet of Everything (IoE), and Concept of Anywhere-on-Earth (AoE)
3 Business Intelligence and Big Data Analytics
4 Data Science, Machine Learning and Web Science Approach
5 Big Data Analytics
6 Data Structures for Code Optimization and Computational Analysis
7 Use of NLTK for Text Analysis Operations
7.1 Proposed Method for Optimizing Text Extraction Process Through Python for Data Science
8 Conclusion
References
Trade Management System Using R3 Corda Blockchain
1 Introduction
1.1 Public vs Private Blockchain
2 Literature Survey
2.1 Related Work
3 R3 Corda - Preliminaries
3.1 States
3.2 Contracts
3.3 Flows
3.4 Notaries
4 Proposed Architecture for Trade Management Using R3 Corda
4.1 States and Its Working in Trade Management
4.2 Contracts and Its Working in Trade Management
4.3 Flow and Its Working in Trade Management
5 Implementation and Result Analysis
5.1 Case Study
6 Conclusion
References
An Improved GAN-Based Method for Low Resolution Face Recognition
1 Introduction
2 Proposed Method for Low Resolution Face Recognition (LRFR)
2.1 Off-line Phase
2.2 Inference Phase
3 Experimental Study
3.1 Dataset Description
3.2 Experimental Protocol
3.3 Evaluation Metrics
3.4 Experimental Results
4 Conclusion
References
The Bibliometric Global Overview of COVID-19 Vaccination
1 Introduction
2 Material and Method
2.1 Applied Bibliometric Methods in COVID-19 Vaccination
3 Results
4 Discussion
5 Conclusion
5.1 Limitations of the Study
5.2 Future Studies
References
Continuous Authentication of Tablet Users Using Raw Swipe Characteristics: Tree Based Approaches
1 Tablets
1.1 Introduction
1.2 Some Applications
1.3 Touchscreens
2 Security in Tablets
2.1 Primary Security Measures in Tablets
2.2 Secondary Security Measures for Tablets
2.3 Touch Based CA: Its Significance and Desirable Qualities
3 TCA Using Raw Swipe Vectors
3.1 Swipes
3.2 Swipe Vectors
3.3 Continuous Authentication Based on Raw Swipe Vectors
4 Related Works
5 This Work
6 Dataset Characteristics
7 The Framework
7.1 Random Forests (RF)
7.2 AdaBoost (ADB)
7.3 Extreme Gradient Boost (XGB)
7.4 Gradient Boosting Classifier (GBC)
7.5 Extra Trees Classifier (ETC)
8 Overall Measures of Performance
9 Results and Discussions
10 Conclusions
References
A Survey on Controllable Abstractive Text Summarization
1 Introduction
2 Related Work
3 Challenges
4 Conclusion
References
Explainable Decision Making Model by Interpreting Classification Algorithms
1 Introduction
2 Proposed Decision-Making Model
2.1 Mathematical Notations
2.2 Probabilistic Label Assignment
2.3 Preference Order Generation
3 Results
4 Conclusion
References
Combining Clustering and Maturity Models to Provide Better Decisions to Elevate Maturity Level
1 Introduction
2 Methods
3 Results and Discussion
4 Conclusion
References
Tourist Trajectory Data Warehouse: Event Time of Interest, Region of Interest and Place of Interest
1 Introduction
2 State of the Art
3 Running Example
4 Trajectory Data Definitions
5 Modeling Tourist Trajectory Data
6 Tourist TrDW Schema Design
7 Experimental Implementation Approach
8 Conclusion
References
Multi-level Image Segmentation Using Kapur Entropy Based Dragonfly Algorithm
1 Introduction
2 Related Work
3 Proposed Work
4 Experimental Setup
5 Data Analysis and Results
5.1 Analysis of images
6 Discussion
7 Conclusion
References
Prediction Analytics of Hyperledger Blockchain Data in B2B Finance Applications
1 Introduction
1.1 Blockchain in Business to Business (B2B) and Business to Customer (B2C)
2 Literature Survey
3 Blockchain Technology in B2B Applications
3.1 Hyperledger Platform
4 Hyperledger Working Mechanism in B2B Business Applications
4.1 Steps Involved in Business Transaction Management Using Hyperledger Fabric
5 Data Analytics and Prediction of Hyperledger Based Blockchain Data for Finance Application
5.1 Sample Analytics for Finance Application
6 Conclusion
References
Effective Connectivity of High-Frequency Oscillations (HFOs) Using Different Source Localization Techniques
1 Introduction
2 Materials and Methods
2.1 Materials
2.2 Methods
3 Results
4 Conclusion and Discussion
References
Enhanced Road Damage Detection for Smart City Surveillance
1 Introduction
2 Related Works
3 Methodology
3.1 Convolutional Neural Network
3.2 Deep Convolutional Neural Network
3.3 LeNET-5
3.4 VGG16
3.5 Inception
3.6 RESNET
4 Results and Discussion
5 Conclusion
6 Future Scope
References
A Deep Learning Based Natural Language Processing Approach for Detecting SQL Injection Attack
1 Introduction
2 Related Work
2.1 SQL Injection Attack
2.2 DL Algorithms
2.3 ML Algorithms
3 Methodology
3.1 Procedures and Optimization
3.2 Algorithms
4 Results and Discussion
4.1 Dataset Description
4.2 Configuration of Detection Parameters
4.3 Feature Set Injection
4.4 Confusion Matrix, Accuracy Visualization, and ToP
4.5 Performance Optimization
4.6 Performance Testing
5 Conclusion
References
ML Classifier Using Multiple Neural Networks Trained by Linear Programming
1 Introduction
2 Literary Survey
2.1 The HSIC Bottleneck [1]
2.2 Online Alternating Minimization with Auxiliary Variables [2]
2.3 Decoupled Neural Interfaces Using Synthetic Gradients [3]
3 Background and Related Work
3.1 Network Structure
3.2 Training Process
3.3 Propagation Process
3.4 Result Correcting Process
4 Prerequisites
4.1 Training Data
4.2 LP Solver
4.3 Tools/Libraries and Programming Language
5 Proposed Model for Classifier Using Multiple Neural Networks Trained by LP
5.1 Proposed Model Architecture
6 Algorithmic Comparison
7 Performance Analysis
7.1 Industry Dataset
7.2 Synthetic Dataset
8 Conclusion
References
Improving the Categorization of Intent of a Chatbot in Portuguese with Data Augmentation Obtained by Reverse Translation
1 Introduction
2 Related Works
3 Theoretical Foundation
4 Method Proposal
5 Experiment and Results
6 Conclusion
References
TIR-GAN: Thermal Images Restoration Using Generative Adversarial Network
1 Introduction
2 Proposed Method
2.1 Generative Sub-network
2.2 Discriminative Sub-network
3 Experimental Study
3.1 Dataset's Description
3.2 Experimental Metrics
3.3 Experimental Results
4 Conclusion
References
A Multi-level Wavelet Decomposition Network for Image Super Resolution
1 Introduction
2 State of the Art
3 Proposed Method
4 Experimental Results
5 Conclusion
References
Multi-modal Knowledge Graph Convolutional Network for Recommendation
1 Introduction
2 Related Work
3 Problem Formulation
4 Method
4.1 MKG Entity Encoder
4.2 MKGCN Layer
4.3 Model Prediction
5 Experiments
5.1 Datasets
5.2 Baselines
5.3 Experimental Setup
5.4 Experimental Results
6 Conclusions
References
Comparing SVM and Random Forest in Patterned Gesture Phase Recognition in Visual Sequences
1 Introduction
2 Theoretical Background
2.1 Gesture Phase Segmentation
2.2 Support Vector Machines
2.3 Random Forest
3 Gesture Phase Segmentation with RF
3.1 Dataset
3.2 Gesture Phase Segmentation
4 Results and Analysis
5 Conclusion
References
Machine Learning for Complex Data Analysis: Overview and a Discussion of a New Reinforcement-Learning Strategy
1 Introduction
2 Big Data Analysis Using Machine Learning
2.1 Big Data Analysis in the Transportation Field
2.2 Big Data Analysis in the Healthcare Field
2.3 Big Data Analysis in the Agriculture Field
3 Synthesis
4 A New Reinforcement Learning Strategy Based on Self-design Agents for Complex Data Analysis: Application on Road Accidents
5 Conclusion and Future Scope
References
Enhanced Zero Suffix Method for Multi-objective Bulk Transportation Problem
1 Introduction
2 Mathematical Formulation of the Problem
3 Algorithm
4 Numerical Problem
5 Result
6 Comparative Study
7 Conclusion
References
Speech Emotion Recognition Using Support Vector Machine and Linear Discriminant Analysis
1 Introduction
2 Literature Review
3 Speech Database
4 Proposed Methodology
5 Feature Selection and Extraction
5.1 Prosodic Features
5.2 Spectral Features
5.3 Temporal Features
5.4 Features Considered
6 Classifier Selections
6.1 Support Vector Machine (SVM)
6.2 Linear Discriminant Analysis
7 Results and Discussion
7.1 Feature Selection
7.2 Training Model
7.3 Performance Comparison
8 Conclusion
References
Human Activity Recognition in a Thermal Spectrum for an Intelligent Video Surveillance Application
1 Introduction
2 Proposed Method
2.1 Off-line Phase: Building the Thermal Activity Model
2.2 Inference Phase: Activity Recognition
3 Experimental Results
3.1 First Series of Experiments
3.2 Second Series of Experiments
4 Conclusion
References
Detecting Urgent Instructor Intervention Need in Learning Forums with a Domain Adaptation
1 Introduction
2 Related Works
3 Algorithms
3.1 Word and Sentence Embeddings
3.2 Multitask Training
3.3 Seed Words and Sentences
4 Results and Discussion
4.1 Data Set
4.2 Intra-domain Analysis
4.3 Cross-domain Analysis
4.4 Domain Adaptation Analysis
5 Conclusion
References
Extracting the Distinctive Features of Opinion Spams Using Logistic Regression
1 Introduction
2 State of the Art
3 Our Approach
3.1 Datasets
3.2 Pre-processing
3.3 Annotation Dictionary
3.4 Discussion
4 Experiments
4.1 Dataset and Extracted Features
4.2 Discussion
5 Conclusion
References
ILP-Based Approach for Cloud IaaS Composition
1 Introduction
2 Problem Statement
3 Related Work
4 System Model
4.1 User Request Model
4.2 Multi-cloud Model and Assumptions
5 Problem Formulation
5.1 Problem Constraints
5.2 Objective Function
6 Performance Evaluation
6.1 Simulation Environment
6.2 Simulation Results
7 Conclusion
References
A Comparative Analysis of Prediction of Brain Stroke Using AIML
1 Introduction
2 Related Survey
3 Methodology Implemented
3.1 Balancing Dataset
3.2 SMOTE Algorithm: Synthetic Minority Oversampling Technique
4 Summary
5 Conclusion and Future Work
References
Predicting Points of Interest with Social Relations and Geographical-Temporal Information
1 Introduction
2 Related Work
3 Proposed Method
3.1 Social Connections
3.2 Users' Preferences
3.3 Temporal Matrix Factorization
3.4 Users' Context Model
4 Experiments
4.1 Dataset
4.2 Evaluation Metrics
4.3 Compared Approaches
5 Discussion and Analysis of Results
6 Conclusion
References
Machine Learning Method for DDoS Detection and Mitigation in a Multi-controller SDN Environment Using Cloud Computing
1 Introduction
2 Related Works
3 Proposed Method
4 Case Study
4.1 Training ML Model in the Cloud
4.2 Model Deployment in the SDN
4.3 Real Time DDoS Detection and Mitigation in SDN
5 Conclusion
References
A Survey of Recent Techniques in Computational Drug Repurposing
1 Introduction
2 Drug Repositioning Strategies
3 Drug Repositioning Methods
3.1 Machine Learning Methods
3.2 Network-Based Methods
3.3 Text Mining Methods
4 Data Sources
5 Challenges and Opportunities
6 Conclusion
References
Face Mask Recognition Based on Two-Stage Detector
1 Introduction
2 Related Work
3 Methodology/Pipeline
3.1 Dataset
3.2 Data Preprocessing
3.3 Data Augmentation
3.4 Classification of Images Using Faster R-CNN
3.5 Building Blocks of Faster R-CNN
4 Implementation
4.1 Performance Metrics
5 Results
6 Future Work
7 Conclusion
References
Energy Efficiency of Python Machine Learning Frameworks
1 Introduction
2 Background
2.1 Machine Learning
2.2 Support Vector Machine
2.3 Deep Learning
2.4 Convolutional Neural Networks
2.5 Recurrent Neural Network
2.6 Overview of Machine Learning Frameworks
3 Related Works
4 Benchmarking Setup
4.1 Evaluation Metrics
4.2 System Setup
5 Methodology Implementations
6 Experimental Comparison and Analysis
7 Conclusion and Future Work
References
Author Index
Lecture Notes in Networks and Systems 715

Ajith Abraham · Sabri Pllana · Gabriella Casalino · Kun Ma · Anu Bajaj   Editors

Intelligent Systems Design and Applications 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022) Held December 12–14, 2022 - Volume 2

Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).

Editors
Ajith Abraham, Faculty of Computing and Data Science, FLAME University, Pune, Maharashtra, India; Machine Intelligence Research Labs, Scientific Network for Innovation and Research Excellence, Auburn, WA, USA
Sabri Pllana, Center for Smart Computing Continuum, Burgenland, Austria
Gabriella Casalino, University of Bari, Bari, Italy
Kun Ma, University of Jinan, Jinan, Shandong, China
Anu Bajaj, Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India

ISSN 2367-3370
ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-3-031-35506-6
ISBN 978-3-031-35507-3 (eBook)
https://doi.org/10.1007/978-3-031-35507-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Welcome to the 22nd International Conference on Intelligent Systems Design and Applications (ISDA’22), held on the World Wide Web. ISDA’22 is hosted and sponsored by the Machine Intelligence Research Labs (MIR Labs), USA. ISDA’22 brings together researchers, engineers, developers and practitioners from academia and industry working in all interdisciplinary areas of computational intelligence and system engineering to share their experience and to exchange and cross-fertilize their ideas. The aim of ISDA’22 is to serve as a forum for the dissemination of state-of-the-art research, development and implementations of intelligent systems, intelligent technologies and useful applications in these two fields.

ISDA’22 received submissions from 65 countries. Each paper was reviewed by at least five reviewers, and based on the outcome of the review process, 223 papers were accepted for inclusion in the conference proceedings (38% acceptance rate).

First, we would like to thank all the authors for submitting their papers to the conference and for their presentations and discussions during the conference. Our thanks go to the program committee members and reviewers, who carried out the most difficult work by carefully evaluating the submitted papers. Our special thanks go to the following plenary speakers for their exciting talks:

• Kaisa Miettinen, University of Jyvaskyla, Finland
• Joanna Kolodziej, NASK - National Research Institute, Poland
• Katherine Malan, University of South Africa, South Africa
• Maki Sakamoto, The University of Electro-Communications, Japan
• Catarina Silva, University of Coimbra, Portugal
• Kaspar Riesen, University of Bern, Switzerland
• Mário Antunes, Polytechnic Institute of Leiria, Portugal
• Yifei Pu, College of Computer Science, Sichuan University, China
• Patrik Christen, FHNW, Institute for Information Systems, Olten, Switzerland
• Patricia Melin, Tijuana Institute of Technology, Mexico

We express our sincere thanks to the organizing committee chairs for helping us to formulate a rich technical program. Enjoy reading the articles!

ISDA 2022—Organization

General Chairs
Ajith Abraham, Machine Intelligence Research Labs, USA
Andries Engelbrecht, Stellenbosch University, South Africa

Program Chairs
Yukio Ohsawa, The University of Tokyo, Japan
Sabri Pllana, Center for Smart Computing Continuum, Forschung Burgenland, Austria
Antonio J. Tallón-Ballesteros, University of Huelva, Spain

Publication Chairs
Niketa Gandhi, Machine Intelligence Research Labs, USA
Kun Ma, University of Jinan, China

Special Session Chair
Gabriella Casalino, University of Bari, Italy

Publicity Chairs
Pooja Manghirmalani Mishra, University of Mumbai, India
Anu Bajaj, Machine Intelligence Research Labs, USA

Publicity Team Members
Peeyush Singhal, SIT Pune, India
Aswathy SU, Jyothi Engineering College, India
Shreya Biswas, Jadavpur University, India

International Program Committee
Abdelkrim Haqiq, FST, Hassan 1st University, Settat, Morocco
Alexey Kornaev, Innopolis University, Russia
Alfonso Guarino, University of Foggia, Italy
Alpana Srk, Jawaharlal Nehru University, India
Alzira Mota, Polytechnic of Porto, School of Engineering, Portugal
Amit Kumar Mishra, DIT University, India
Andre Santos, Institute of Engineering, Polytechnic Institute of Porto, Portugal
Andrei Novikov, Sobolev Institute of Mathematics, Russia
Anitha N., Kongu Engineering College, India
Anu Bajaj, Thapar Institute of Engineering and Technology, India
Arjun R., Vellore Institute of Technology, India
Arun B Mathews, MTHSS Pathanamthitta, India
Aswathy S U, Marian Engineering College, India
Ayalew Habtie, Addis Ababa University, Ethiopia
Celia Khelfa, USTHB, Algeria
Christian Veenhuis, Technische Universität Berlin, Germany
Devi Priya Rangasamy, Kongu Engineering College, Tamil Nadu, India
Dhakshayani J., National Institute of Technology Puducherry, India
Dipanwita Thakur, Banasthali University, Rajasthan, India
Domenico Santoro, University of Bari, Italy
Elena Kornaeva, Orel State University, Russia
Elif Cesur, Istanbul Medeniyet University, Turkey
Elizabeth Goldbarg, Federal University of Rio Grande do Norte, Brazil
Emiliano del Gobbo, University of Foggia, Italy
Fabio Scotti, Universita’ degli Studi di Milano, Italy
Fariba Goodarzian, University of Seville, Spain
Gabriella Casalino, University of Bari, Italy
Geno Peter, University of Technology Sarawak, Malaysia
Gianluca Zaza, University of Bari, Italy
Giuseppe Coviello, Polytechnic of Bari, Italy
Habib Dhahri, King Saud University, Saudi Arabia
Habiba Drias, USTHB, Algeria
Hiteshwar Kumar Azad, Vellore Institute of Technology, India
Horst Treiblmaier, Modul University, Austria
Houcemeddine Turki, University of Sfax, Tunisia
Hudson Geovane de Medeiros, Federal University of Rio Grande do Norte, Brazil

Isabel S. Jesus, Institute of Engineering of Porto, Portugal
Islame Felipe da Costa Fernandes, Federal University of Bahia (UFBA), Brazil
Ivo Pereira, University Fernando Pessoa, Portugal
Joêmia Leilane Gomes de Medeiros, Universidade Federal e Rural do Semi-Árido, Brazil
José Everardo Bessa Maia, State University of Ceará, Brazil
Justin Gopinath A., Vellore Institute of Technology, India
Kavita Gautam, University of Mumbai, India
Kingsley Okoye, Tecnologico de Monterrey, Mexico
Lijo V. P., Vellore Institute of Technology, India
Mahendra Kanojia, Sheth L.U.J. and Sir M.V. College, India
Maheswar R., KPR Institute of Engineering and Technology, India
Marìa Loranca, UNAM, BUAP, Mexico
Maria Nicoletti, UNAM, BUAP, Mexico
Mariella Farella, University of Palermo, Italy
Matheus Menezes, Universidade Federal e Rural do Semi-Árido, Brazil
Meera Ramadas, University College of Bahrain, Bahrain
Mohan Kumar, Sri Krishna College of Engineering and Technology, India
Mrutyunjaya Panda, Utkal University, India
Muhammet Raşit Cesur, Istanbul Medeniyet University, Turkey
Naila Aziza Houacine, USTHB-LRIA, Algeria
Niha Kamal Basha, Vellore Institute of Technology, India
Oscar Castillo, Tijuana Institute of Technology, México
Paulo Henrique Asconavieta da Silva, Instituto Federal de Educação, Ciência e Tecnologia Sul-rio-grandense, Brazil
Pooja Manghirmalani Mishra, Machine Intelligence Research Labs, India
Pradeep Das, National Institute of Technology Rourkela, India
Ramesh K., Hindustan Institute of Technology and Science, India
Rasi D., Sri Krishna College of Engineering and Technology, India
Reeta Devi, Kurukshetra University, India
Riya Sil, Adamas University, India
Rohit Anand, DSEU, G.B. Pant Okhla-1 Campus, New Delhi, India
Rutuparna Panda, VSS University of Technology, India
S. Amutha, Vellore Institute of Technology, India
Sabri Pllana, Center for Smart Computing Continuum, Forschung Burgenland, Austria
Sachin Bhosale, University of Mumbai, India


Saira Varghese, Toc H Institute of Science & Technology, India
Sam Goundar, RMIT University, Vietnam
Sasikala R, Vinayaka Mission’s Kirupananda Variyar Engineering College, India
Sebastian Basterrech, VSB-Technical University of Ostrava, Czech Republic
Senthilkumar Mohan, Vellore Institute of Technology, India
Shweta Paliwal, DIT University, India
Sidemar Fideles Cezario, Federal University of Rio Grande do Norte, Brazil
Sílvia M. D. M. Maia, Federal University of Rio Grande do Norte, Brazil
Sindhu P. M., Nagindas Khandwala College, India
Sreeja M U, Cochin University of Science and Technology, India
Sreela Sreedhar, APJ Abdul Kalam Technological University, India
Surendiran B., NIT Puducherry, India
Suresh S., KPR Institute of Engineering and Technology, India
Sweeti Sah, National Institute of Technology Puducherry, India
Thatiana C. N. Souza, Federal Rural University of the Semi-Arid, Brazil
Thiago Soares Marques, Federal University of Rio Grande do Norte, Brazil
Thomas Hanne, University of Applied Sciences and Arts Northwestern Switzerland, Switzerland
Thurai Pandian M., Vellore Institute of Technology, India
Tzung-Pei Hong, National University of Kaohsiung, Taiwan
Vigneshkumar Chellappa, Indian Institute of Technology Guwahati, India
Vijaya G, Sri Krishna College of Engineering and Technology, India
Wen-Yang Lin, National University of Kaohsiung, Taiwan
Widad Belkadi, Laboratory of Research in Artificial Intelligence, Algeria
Yilun Shang, Northumbria University, UK
Zuzana Strukova, Technical University of Košice, Slovakia

Contents

Towards Automatic Forecasting: Evaluation of Time-Series Forecasting Models for Chickenpox Cases Estimation in Hungary
Wadie Skaf, Arzu Tosayeva, and Dániel T. Várkonyi

Parameters Optimization in Hydraulically Driven Machines Using Swarm Intelligence
Zhanjun Tan, Qasim Khadim, Aki Mikkola, and Xiao-Zhi Gao

A Machine Learning Framework for Cereal Yield Forecasting Using Heterogeneous Data
Noureddine Jarray, Ali Ben Abbes, and Imed Riadh Farah

New Approach in LPR Systems Using Deep Learning to Classify Mercosur License Plates with Perspective Adjustment
Luís Fabrício de F. Souza, José Jerovane da Costa Nascimento, Cyro M. G. Sabóia, Adriell G. Marques, Guilherme Freire Brilhante, Lucas de Oliveira Santos, Paulo A. L. Rego, and Pedro Pedrosa Rebouças Filho

Ensemble of Classifiers for Multilabel Clinical Text Categorization in Portuguese
Orrana Lhaynher Veloso Sousa, David Pereira da Silva, Victor Eulalio Sousa Campelo, Romuere Rodrigues Veloso e Silva, and Deborah Maria Vieira Magalhães

Semantic Segmentation Using MSRF-NET for Ultrasound Breast Cancer
Hamza Hadri, Abderahhim Fail, and Mohamed Sadik

A Hybrid Image Steganography Method Based on Spectral and Spatial Domain with High Hiding Ratio
D. Kumar and V. K. Sudha

PU Matrix Completion Based Multi-label Classification with Missing Labels
Zhidong Huang, Peipei Li, and Xuegang Hu

SP2P-MAKA: Smart Contract Based Secure P2P Mutual Authentication Key Agreement Protocol for Intelligent Energy System
Pooja Verma and Daya Sagar Gupta

Automated Transformation of IoT Systems Models into Event-B Specifications
Abdessamad Saidi, Mohamed Hadj Kacem, Imen Tounsi, and Ahmed Hadj Kacem

Prediction of Business Process Execution Time
Walid Ben Fradj and Mohamed Turki

Pre-processing and Pre-trained Word Embedding Techniques for Arabic Machine Translation
Mohamed Zouidine, Mohammed Khalil, and Abdelhamid Ibn El Farouk

A Framework for Automated Abstraction Class Detection for Event Abstraction
Chiao-Yun Li, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst

Using Clinical Data and Deep Features in Renal Pathologies Classification
Laiara Silva, Vinícius Machado, Rodrigo Veras, Keylla Aita, Semiramis do Monte, Nayze Aldeman, and Justino Santos

An MDA Approach for Extending Functional Dimension for Sensitive Business Processes Execution
Zohra Alyani and Mohamed Turki

Double Deep Reinforcement Learning Techniques for Low Dimensional Sensing Mapless Navigation of Terrestrial Mobile Robots
Linda Dotto de Moraes, Victor Augusto Kich, Alisson Henrique Kolling, Jair Augusto Bottega, Raul Steinmetz, Emerson Cassiano da Silva, Ricardo Grando, Anselmo Rafael Cuckla, and Daniel Fernando Tello Gamarra

Review on Sentiment Analysis Using Supervised Machine Learning Techniques
C. Nalini, B. Dharani, Tamilarasu Baskar, and R. Shanthakumari

The Emotional Job-Stress of COVID-19 on Nurses Working in Isolation Centres: A Machine Learning Approach
Richard Osei Agjei, Sunday Adewale Olaleye, Frank Adusei-Mensah, and Oluwafemi Samson Balogun

An Investigative Approach on the Prediction of Isocitrate Dehydrogenase (IDH1) Mutations and Co-deletion of 1p19q in Glioma Brain Tumors
Disha Sushant Wankhede and Chetan J. Shelke

Towards a French Virtual Assistant for COVID-19 Case Psychological Assistance Based on NLP
Nourchene Ouerhani, Ahmed Maalel, and Henda Ben Ghézala

CoCoSL: Agricultural Solutions Using Image Processing for Coconut Farms in Sri Lanka
Uthayakumaran Uthayarakavan, Kamalanathan Thakshayini, Sivakumar Krushi Yadushika, Ananthasiri Gowthaman, and Rrubaa Panchendrarajan

CIWPR: A Strategic Framework for Collective Intelligence Encompassment for Web Page Recommendation
H. S. Manoj Kumar, Gerard Deepak, and A. Santhanavijayan

A Bioinspired Scheduling Strategy for Dense Wireless Networks Under the SINR Model
Vinicius Fulber-Garcia, Fábio Engel, and Elias P. Duarte Jr.

Implementation Analysis for the Applications of Warehouse Model Under Linear Integer Problems
Chhavi Gupta, Vipin Kumar, and Kamal Kumar Gola

Inventiveness of Text Extraction with Inspiration of Cloud Computing and ML Using Python Logic
Rajeev Tripathi and Santosh Kumar Dwivedi

Trade Management System Using R3 Corda Blockchain
K. Anitha Kumari, S. Sangeetha, V. Rajeevan, M. Deva Dharshini, and T. Haritha

An Improved GAN-Based Method for Low Resolution Face Recognition
Sahar Dammak, Hazar Mliki, Emna Fendri, and Amal Selmi

The Bibliometric Global Overview of COVID-19 Vaccination
Richard Osei Agjei, Frank Adusei-Mensah, Oluwafemi Samson Balogun, and Sunday Adewale Olaleye

Continuous Authentication of Tablet Users Using Raw Swipe Characteristics: Tree Based Approaches
Rupanka Bhuyan and S. Pradeep Kumar Kenny

A Survey on Controllable Abstractive Text Summarization
Madhuri P. Karnik and D. V. Kodavade

Explainable Decision Making Model by Interpreting Classification Algorithms
Ramisetty Kavya, Shatakshi Gupta, Jabez Christopher, and Subhrakanta Panda

Combining Clustering and Maturity Models to Provide Better Decisions to Elevate Maturity Level
Luciano Azevedo de Souza, Mary de Paula Ferreira, and Helder Gomes Costa

Tourist Trajectory Data Warehouse: Event Time of Interest, Region of Interest and Place of Interest
Intissar Hilali, Nouha Arfaoui, and Ridha Ejbali

Multi-level Image Segmentation Using Kapur Entropy Based Dragonfly Algorithm
Shreya Biswas, Anu Bajaj, and Ajith Abraham

Prediction Analytics of Hyperledger Blockchain Data in B2B Finance Applications
D. Jeya Mala and A. Pradeep Reynold

Effective Connectivity of High-Frequency Oscillations (HFOs) Using Different Source Localization Techniques
Thouraya Guesmi, Abir Hadriche, and Nawel Jmail

Enhanced Road Damage Detection for Smart City Surveillance
Yuvaraj Natarajan, Sri Preethaa Kr, Gitanjali Wadhwa, Mathivathani Natarajan, and Lekshmipriya Saravanan

A Deep Learning Based Natural Language Processing Approach for Detecting SQL Injection Attack
Yuvaraj Natarajan, B. Karthikeyan, Gitanjali Wadhwa, S. A. Srinivasan, and A. S. Parthiv Akilesh

ML Classifier Using Multiple Neural Networks Trained by Linear Programming
Yuvraj Talukdar and Padmavathi

Improving the Categorization of Intent of a Chatbot in Portuguese with Data Augmentation Obtained by Reverse Translation
Jéferson do Nascimento Soares and José Everardo Bessa Maia

TIR-GAN: Thermal Images Restoration Using Generative Adversarial Network
Fatma Bouhlel, Hazar Mliki, Rayen Lagha, and Mohamed Hammami

A Multi-level Wavelet Decomposition Network for Image Super Resolution
Nesrine Chaibi and Mourad Zaied

Multi-modal Knowledge Graph Convolutional Network for Recommendation
Dan Ding and Yongli Wang

Comparing SVM and Random Forest in Patterned Gesture Phase Recognition in Visual Sequences
Thayanne França da Silva and José Everardo Bessa Maia

Machine Learning for Complex Data Analysis: Overview and a Discussion of a New Reinforcement-Learning Strategy
Karima Gouasmia, Wafa Mefteh, and Faiez Gargouri

Enhanced Zero Suffix Method for Multi-objective Bulk Transportation Problem
Shivani, Sudhir Kumar Chauhan, Renu Tuli, and Nidhi Sindwani

Speech Emotion Recognition Using Support Vector Machine and Linear Discriminant Analysis
J. Indra, R. Kiruba Shankar, and R. Devi Priya

Human Activity Recognition in a Thermal Spectrum for an Intelligent Video Surveillance Application
Nourane Kallel, Hazar Mliki, Ahmed Amine Ghorbel, and Achraf Bouketteya

Detecting Urgent Instructor Intervention Need in Learning Forums with a Domain Adaptation
Antonio Leandro Martins Candido and José Everardo Bessa Maia

Extracting the Distinctive Features of Opinion Spams Using Logistic Regression
Nissrine Bensouda, Sanaa El Fkihi, and Rdouan Faizi

ILP-Based Approach for Cloud IaaS Composition
Driss Riane and Ahmed Ettalbi

A Comparative Analysis of Prediction of Brain Stroke Using AIML
K. RamyaSree and P. MohanKumar

Predicting Points of Interest with Social Relations and Geographical-Temporal Information
Simin Bakhshmand, Bahram Sadeghi Bigham, and Mahdi Bohlouli

Machine Learning Method for DDoS Detection and Mitigation in a Multi-controller SDN Environment Using Cloud Computing
Ameni Chetouane, Kamel Karoui, and Ghayth Nemri

A Survey of Recent Techniques in Computational Drug Repurposing
A. S. Aruna, K. R. Remesh Babu, and K. Deepthi

Face Mask Recognition Based on Two-Stage Detector
Hewan Shrestha, Swati Megha, Subham Chakraborty, Manuel Mazzara, and Iouri Kotorov

Energy Efficiency of Python Machine Learning Frameworks
Salwa Ajel, Francisco Ribeiro, Ridha Ejbali, and João Saraiva

Author Index

Towards Automatic Forecasting: Evaluation of Time-Series Forecasting Models for Chickenpox Cases Estimation in Hungary

Wadie Skaf, Arzu Tosayeva, and Dániel T. Várkonyi

Telekom Innovation Laboratories, Data Science and Engineering Department (DSED), Faculty of Informatics, Eötvös Loránd University, Pázmány Péter stny. 1/A, Budapest 1117, Hungary
{skaf,n0ndni,varkonyid}@inf.elte.hu

Abstract. Time-Series Forecasting is a powerful data modeling discipline that analyzes historical observations to predict future values of a time-series. It has been utilized in numerous applications, including but not limited to economics, meteorology, and health. In this paper, we use time-series forecasting techniques to model and predict the future incidence of chickenpox. To achieve this, we implement and simulate multiple models and data preprocessing techniques on a Hungary-collected dataset. We demonstrate that the LSTM model outperforms all other models in the vast majority of the experiments in terms of county-level forecasting, whereas the SARIMAX model performs best at the national level. We also demonstrate that the performance of the traditional data preprocessing method is inferior to that of the data preprocessing method that we have proposed.

1 Introduction

Varicella Zoster Virus (VZV) is a member of the herpes virus family with double-stranded DNA [2]. This virus causes varicella (chickenpox), a highly contagious pediatric disease often contracted between the ages of 2 and 8 [2]. Chickenpox is generally a mild disease, although it can lead to complications that necessitate hospitalization [3,11] and, in rare cases, be fatal [20]. Although chickenpox is extremely contagious, with over 90% of unvaccinated persons becoming infected [4], and although vaccines are available [20], Hungary has no explicit prescription for chickenpox vaccination in its national immunization policy [20]. Given this, and the fact that the cases reported over the years form a time-series, the number of future cases in the country can be predicted, allowing the health system and the necessary medications to be prepared. In this paper, we examine various Time-Series Forecasting models and evaluate them on the chickenpox case forecasting use case in order to choose the model that achieves the best results at the county and country levels.

Our primary contributions are as follows: (1) Conducting a comprehensive exploratory data analysis on this relatively new dataset in order to identify underlying patterns. (2) Examining the relationship between chickenpox cases and other variables such as the population. (3) Conducting comprehensive experiments on multiple time-series models and selecting the model that produces the best results for each county and at the national level.

The paper is structured as follows. First, we list and discuss related work. We then formalize the time-series forecasting problem and perform exploratory data analysis (EDA), in which we explore the dataset and list its key characteristics. Finally, we detail the experimental setup before reporting and summarizing our major findings.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 1–10, 2023. https://doi.org/10.1007/978-3-031-35507-3_1

2 Related Work

Time-Series Forecasting research dates back to 1985 [10], and since then it has been a constantly expanding research area, especially in the past decade [1]. This growth is due to the expansion of data volumes arising from users, industries, and markets, as well as the centrality of forecasting in applications such as economics, weather, stock prices, business development, and health. As a result, numerous forecasting models have been developed, including ARIMA [19], SARIMA [19], ARIMAX [7], SARIMAX [7], N-BEATS [18], DeepAR [24], Long Short-Term Memory Neural Network (LSTM) [15], Gated Recurrent Unit Neural Networks (GRU) [29], and Temporal Fusion Transformer (TFT) [14]. These models and others have been utilized in a wide range of use cases, including but not limited to: energy and fuels [9,17,21], where accurate estimates are required to improve power system planning and operation; finance [8,26,27]; environment [6,16,30]; industry [13,22,28]; and health [5,12,25]. In this paper, we contribute to the use of time-series forecasting in the health domain by predicting the number of cases of chickenpox in Hungary.

3 Time-Series Forecasting Problem Definition

The time-series forecasting problem can be formalized as follows: given a univariate time-series, represented as a sequence of values $X = (x_1, x_2, \ldots, x_t)$, forecasting is the process of predicting the values of future observations $(x_{t+1}, x_{t+2}, \ldots, x_{t+h})$ based on historical data, where $x_i \in \mathbb{R}$ ($i \in [1, \ldots, t+h]$) is the value of $X$ at time $i$, $t$ is the length of $X$, and $h$ is the forecasting horizon.
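As a minimal sketch of this definition, the following Python snippet slices a series into (history, future) pairs, which is one common way to pose the forecasting task to a learning model. The fixed-length input `window`, the function name `make_forecast_pairs`, and the sample values are assumptions made for illustration; the definition above conditions on the full history and does not prescribe a window.

```python
from typing import List, Tuple


def make_forecast_pairs(
    series: List[float], window: int, horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Slice a univariate series into (history, future) pairs.

    Each pair holds `window` consecutive past values and the `horizon`
    values that immediately follow them.
    """
    pairs = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        history = series[start:start + window]
        future = series[start + window:start + window + horizon]
        pairs.append((history, future))
    return pairs


# Illustrative values only (not taken from the dataset).
weekly_cases = [12.0, 15.0, 9.0, 7.0, 11.0, 14.0]
pairs = make_forecast_pairs(weekly_cases, window=3, horizon=2)
```

With six observations, a window of 3, and a horizon h = 2, this yields two pairs; the second one uses the last three known values before the final two observations as its history.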

4 Exploratory Data Analysis (EDA)

The dataset used in this paper was made available by Rozemberczki et al. [23]. It consists of county-level time series depicting the weekly number of chickenpox cases reported by general practitioners in Hungary from 2005 to 2015, subdivided into 20 county-level vertices: Budapest, Pest, Borsod, Hajdu, Gyor, Jasz, Veszprem, Bacs, Baranya, Fejer, Csongrad, Szabolcs, Heves, Bekes, Somogy, Komarom, Vas, Nograd, Tolna, and Zala. The main characteristics of the data are as follows:

1. As can be seen in Fig. 2, Budapest, Hungary’s capital, has the highest average number of weekly reported cases by a significant margin. This is primarily due to the difference in population: if we instead calculate the average number of reported cases as a percentage of the population, Veszprem has the highest ratio.
2. Seasonality is evident in Fig. 3, with the greatest number of cases occurring during the winter months and the smallest number occurring between the summer and fall seasons. This is also apparent when decomposing the country-level time series (Fig. 1).
3. A downward trend can be noticed in the data (Fig. 1).
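The first observation can be illustrated with a small sketch: ranking regions by raw weekly averages and ranking them by cases as a percentage of the population may give different orders. The region names, case counts, and populations below are hypothetical placeholders, not values from the dataset.

```python
from statistics import mean

# Hypothetical weekly case counts and populations (not taken from the dataset).
weekly_cases = {"region_a": [100, 120, 80], "region_b": [30, 40, 20]}
population = {"region_a": 1_000_000, "region_b": 100_000}


def rank_by_raw_mean(cases):
    """Order regions by their average weekly case count, highest first."""
    return sorted(cases, key=lambda region: mean(cases[region]), reverse=True)


def rank_by_percent_of_population(cases, pop):
    """Order regions by average weekly cases as a percentage of population."""
    def percent(region):
        return 100.0 * mean(cases[region]) / pop[region]
    return sorted(cases, key=percent, reverse=True)


raw_order = rank_by_raw_mean(weekly_cases)
per_capita_order = rank_by_percent_of_population(weekly_cases, population)
```

In this toy input the larger region leads on raw counts while the smaller region leads once population is taken into account, which is the same effect described for Budapest and Veszprem.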

Fig. 1. Country-level Time-Series Data Decomposition

5 Experimental Setup

5.1 Data Splitting

In all of our experiments, we split the data so that 80 percent of the data is used for training and 20 percent is used for testing; accordingly, the last 20 percent of each series’ values are used for testing.

5.2 Data Normalization

We experimented with two methods of data normalization:

1. Method 1: We performed traditional data normalization so that the data would lie within the range [−1, 1].
2. Method 2: We normalized each data sample by converting it to a percentage of the population at the time the cases were reported, as shown in Eq. 1:

$x_{c,d} = \frac{s_{c,d}}{p_{c,d}} \times 100 \quad (1)$

where $s_{c,d}$ denotes the reported cases in county $c$ on date $d$, $p_{c,d}$ denotes the population of county $c$ on date $d$, and $x_{c,d}$ is the resulting normalized value.

Fig. 2. The average number of weekly reported cases per county

5.3 Models

We conducted experiments on the following models: ARIMA [19], SARIMA [19], SARIMAX [7], N-BEATS [18], DeepAR [24], Long Short-Term Memory Neural Network (LSTM) [15], Gated Recurrent Unit Neural Networks (GRU) [29], and Temporal Fusion Transformer (TFT) [14]. Throughout the experiments, each model was trained for 200 epochs using the Adam optimizer and a learning rate of α = 0.01.

5.4 Evaluation Metrics

To evaluate the performance of the models, we calculated the Root Mean Square Error (RMSE) using the equation:

$\mathrm{RMSE} = \sqrt{\frac{1}{t} \sum_{i=1}^{t} (\hat{x}_i - x_i)^2} \quad (2)$

where $\hat{x}_i$ denotes the $i$th predicted value and $x_i$ denotes the $i$th original (observed) value.
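The splitting rule of Sect. 5.1, the two normalization methods of Sect. 5.2, and the RMSE of Eq. 2 can be sketched together in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: Method 1 is assumed here to be min-max scaling fitted on the training part, and all function names and sample values are hypothetical.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Keep the first 80% of a series for training and the last 20% for testing."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def scale_to_unit_range(values, low, high):
    """Method 1, assumed here to be min-max scaling onto [-1, 1].

    `low` and `high` would normally be the minimum and maximum of the
    training part, so that no information leaks from the test part.
    """
    return [2.0 * (v - low) / (high - low) - 1.0 for v in values]


def percent_of_population(cases, population):
    """Method 2 (Eq. 1): cases as a percentage of the population on the same date."""
    return [100.0 * s / p for s, p in zip(cases, population)]


def rmse(predicted, observed):
    """Root mean square error between predictions and observations (Eq. 2)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only (not taken from the dataset).
cases = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
train, test = chronological_split(cases)
scaled_train = scale_to_unit_range(train, min(train), max(train))
percent = percent_of_population([50, 25], [1000, 1000])
error = rmse([1.0, 2.0], [1.0, 4.0])
```

Because the split is chronological, the test part always lies in the future relative to the training part, which matches the forecasting setting.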


Fig. 3. Weekly chickenpox cases reported in Hungary between 2005 and 2015, broken down by county

6 Benchmarking Results

The results of the experiments are summarized in Table 1, which lists the results for each model, separated by county and normalization method. The column labeled “Loss 1” refers to Method 1, and the column labeled “Loss 2” refers to Method 2, both of which are described in Sect. 5.2. As can be seen in Table 1, for county-level forecasting the LSTM model performs better than any of the other models in the vast majority of counties: Budapest, Bekes, Heves, Szabolcs, Veszprem, Baranya, Borsod, Jasz, Pest, Tolna, Zala, Bacs, Csongrad, Hajdu, Komarom, Somogy, and Vas, with RMSE loss values of 0.03, 0.03, 0.04, 0.04, 0.05, 0.03, 0.05, 0.03, 0.04, 0.05, 0.04, 0.04, 0.04, 0.03, 0.05, 0.03, and 0.06, respectively. The GRU model performed better in Fejer, Nograd, and Gyor, with RMSE loss values of 0.04, 0.02, and 0.03, respectively. At the national level, SARIMAX achieved the best results, with an RMSE loss value of 0.02.

The main reason the LSTM model outperformed the other models in most cases is its stronger long-term memorization ability: as can be seen in Fig. 3, the vast majority of the series do not have a consistent pattern and benefit from this long-term memorization. This also explains why SARIMAX performed better on the country-level series (Figs. 1 and 3), where all the series are summed, resulting in a more consistent pattern that does not rely heavily on long-term memorization.

In addition, the normalization approach that we have proposed (Method 2) outperformed the traditional normalization approach (Method 1) in every (model, county) pair, achieving a significant improvement in RMSE loss. Calculating the improvement in loss after applying Method 2 in comparison to Method 1 for the best-performing model in each county (as highlighted in Table 1), we see that the SARIMAX model for country-level forecasting has the highest gain with a 77.78% improvement, while the LSTM model for Vas county has the lowest gain with a 14.29% improvement, and the average improvement across all models is 51.39% (Fig. 4).
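The two extreme improvement figures can be reproduced from the Table 1 losses with a one-line formula. The sketch below assumes that improvement is measured as the relative reduction in RMSE from Method 1 to Method 2; the function name is hypothetical.

```python
def relative_improvement(loss_method_1, loss_method_2):
    """Percentage reduction in loss when moving from Method 1 to Method 2."""
    return 100.0 * (loss_method_1 - loss_method_2) / loss_method_1


# Loss values as reported in Table 1.
country_sarimax = round(relative_improvement(0.09, 0.02), 2)  # country level, SARIMAX
vas_lstm = round(relative_improvement(0.07, 0.06), 2)         # Vas county, LSTM
```

Under this assumption, `country_sarimax` evaluates to 77.78 and `vas_lstm` to 14.29, matching the highest and lowest gains quoted above.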

Fig. 4. Improvements in RMSE Loss values after applying the Normalization Method 2 in comparison to Normalization Method 1

7 Conclusion

In this paper, we presented, discussed, and highlighted the results of a series of experiments that we conducted on the chickenpox cases dataset. The purpose of these experiments was to evaluate time-series forecasting models for use in predicting the number of chickenpox cases in Hungary at the county and national levels. We demonstrated that the LSTM model performed better than the other models for the majority of county-level forecasts, except in Fejer, Nograd, and Gyor counties, while the SARIMAX model produced the most accurate results at the country level. In addition, we proposed a custom data preprocessing method for this dataset that expresses the number of cases as a percentage of the population, and demonstrated that this method outperformed conventional normalization by achieving lower RMSE loss values.


Table 1. Benchmarking Results

County         Model     Loss 1  Loss 2
Budapest       SARIMAX   0.11    0.04
               ARIMA     0.35    0.21
               SARIMA    0.24    0.14
               LSTM      0.09    0.03
               GRU       0.13    0.07
               N-BEATS   0.30    0.20
               DeepAR    0.17    0.09
               TFT       0.34    0.11
Baranya        SARIMAX   0.12    0.04
               ARIMA     0.33    0.18
               SARIMA    0.28    0.15
               LSTM      0.08    0.03
               GRU       0.12    0.06
               N-BEATS   0.22    0.05
               DeepAR    0.15    0.05
               TFT       0.23    0.09
Bacs           SARIMAX   0.12    0.07
               ARIMA     0.35    0.21
               SARIMA    0.22    0.12
               LSTM      0.09    0.04
               GRU       0.12    0.08
               N-BEATS   0.30    0.05
               DeepAR    0.14    0.08
               TFT       0.22    0.12
Bekes          SARIMAX   0.13    0.05
               ARIMA     0.35    0.20
               SARIMA    0.19    0.11
               LSTM      0.07    0.03
               GRU       0.10    0.08
               N-BEATS   0.25    0.04
               DeepAR    0.16    0.09
               TFT       0.22    0.06
Borsod         SARIMAX   0.12    0.08
               ARIMA     0.27    0.13
               SARIMA    0.15    0.11
               LSTM      0.08    0.05
               GRU       0.09    0.06
               N-BEATS   0.20    0.07
               DeepAR    0.18    0.08
               TFT       0.32    0.08
Csongrad       SARIMAX   0.13    0.07
               ARIMA     0.34    0.21
               SARIMA    0.19    0.11
               LSTM      0.09    0.04
               GRU       0.09    0.07
               N-BEATS   0.32    0.08
               DeepAR    0.14    0.07
               TFT       0.25    0.09
Fejer          SARIMAX   0.11    0.06
               ARIMA     0.35    0.21
               SARIMA    0.24    0.14
               LSTM      0.09    0.06
               GRU       0.07    0.04
               N-BEATS   0.23    0.06
               DeepAR    0.15    0.08
               TFT       0.25    0.12
Gyor           SARIMAX   0.12    0.05
               ARIMA     0.33    0.18
               SARIMA    0.28    0.15
               LSTM      0.07    0.04
               GRU       0.07    0.03
               N-BEATS   0.30    0.04
               DeepAR    0.18    0.07
               TFT       0.26    0.09
Hajdu          SARIMAX   0.12    0.07
               ARIMA     0.34    0.21
               SARIMA    0.19    0.11
               LSTM      0.08    0.03
               GRU       0.12    0.08
               N-BEATS   0.23    0.06
               DeepAR    0.15    0.07
               TFT       0.24    0.12
Heves          SARIMAX   0.12    0.05
               ARIMA     0.36    0.18
               SARIMA    0.26    0.12
               LSTM      0.09    0.04
               GRU       0.12    0.07
               N-BEATS   0.11    0.05
               DeepAR    0.13    0.06
               TFT       0.22    0.11
Jasz           SARIMAX   0.10    0.04
               ARIMA     0.35    0.22
               SARIMA    0.24    0.11
               LSTM      0.08    0.03
               GRU       0.09    0.05
               N-BEATS   0.22    0.06
               DeepAR    0.14    0.07
               TFT       0.33    0.13
Komarom        SARIMAX   0.11    0.06
               ARIMA     0.33    0.13
               SARIMA    0.21    0.12
               LSTM      0.07    0.05
               GRU       0.11    0.08
               N-BEATS   0.33    0.07
               DeepAR    0.16    0.07
               TFT       0.34    0.09
Nograd         SARIMAX   0.12    0.06
               ARIMA     0.33    0.18
               SARIMA    0.28    0.15
               LSTM      0.08    0.03
               GRU       0.09    0.02
               N-BEATS   0.24    0.04
               DeepAR    0.14    0.07
               TFT       0.24    0.11
Pest           SARIMAX   0.13    0.05
               ARIMA     0.33    0.13
               SARIMA    0.21    0.12
               LSTM      0.08    0.04
               GRU       0.11    0.06
               N-BEATS   0.33    0.05
               DeepAR    0.14    0.07
               TFT       0.23    0.11
Somogy         SARIMAX   0.12    0.04
               ARIMA     0.36    0.18
               SARIMA    0.26    0.12
               LSTM      0.06    0.03
               GRU       0.12    0.07
               N-BEATS   0.32    0.06
               DeepAR    0.13    0.06
               TFT       0.26    0.12
Szabolcs       SARIMAX   0.14    0.08
               ARIMA     0.38    0.22
               SARIMA    0.26    0.13
               LSTM      0.08    0.04
               GRU       0.09    0.07
               N-BEATS   0.23    0.06
               DeepAR    0.15    0.09
               TFT       0.33    0.09
Tolna          SARIMAX   0.11    0.06
               ARIMA     0.34    0.21
               SARIMA    0.23    0.11
               LSTM      0.08    0.05
               GRU       0.11    0.07
               N-BEATS   0.33    0.06
               DeepAR    0.19    0.08
               TFT       0.33    0.12
Vas            SARIMAX   0.12    0.07
               ARIMA     0.33    0.13
               SARIMA    0.21    0.12
               LSTM      0.07    0.06
               GRU       0.12    0.08
               N-BEATS   0.23    0.07
               DeepAR    0.18    0.08
               TFT       0.22    0.09
Veszprem       SARIMAX   0.12    0.06
               ARIMA     0.35    0.21
               SARIMA    0.22    0.12
               LSTM      0.08    0.05
               GRU       0.12    0.07
               N-BEATS   0.22    0.06
               DeepAR    0.16    0.07
               TFT       0.34    0.12
Zala           SARIMAX   0.11    0.07
               ARIMA     0.33    0.23
               SARIMA    0.35    0.21
               LSTM      0.09    0.04
               GRU       0.13    0.08
               N-BEATS   0.20    0.05
               DeepAR    0.15    0.07
               TFT       0.30    0.11
Country Level  SARIMAX   0.09    0.02
               ARIMA     0.31    0.11
               SARIMA    0.25    0.14
               LSTM      0.09    0.07
               GRU       0.08    0.06
               N-BEATS   0.23    0.03
               DeepAR    0.19    0.08
               TFT       0.33    0.12


References

1. Alsharef, A., Aggarwal, K., Kumar, M., Mishra, A.: Review of ML and AutoML solutions to forecast time-series data. Arch. Comput. Methods Eng. (2022). https://doi.org/10.1007/s11831-022-09765-0, ISSN 1886-1784
2. Arvin, A.M.: Varicella-zoster virus. Clin. Microbiol. Rev. 9(3), 361–381 (1996)
3. Bonanni, P., Breuer, J., Gershon, A., Gershon, M., Hryniewicz, W., Papaevangelou, V., Rentier, B., Rümke, H., Sadzot-Delvaux, C., Senterre, J., et al.: Varicella vaccination in Europe - taking the practical approach. BMC Med. 7(1), 1–12 (2009)
4. Breuer, J., Fifer, H.: Chickenpox, April 2011. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3275319/
5. Bui, C., Pham, N., Vo, A., Tran, A., Nguyen, A., Le, T.: Time series forecasting for healthcare diagnosis and prognostics with the focus on cardiovascular diseases. In: Vo Van, T., Nguyen Le, T., Nguyen Duc, T. (eds.) 6th International Conference on the Development of Biomedical Engineering in Vietnam (BME6). BME 2017. IFMBE Proceedings, vol. 63, pp. 809–818. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-4361-1_138, ISBN 978-981-10-4361-1
6. Chen, J., Zeng, G.Q., Zhou, W., Du, W., Lu, K.D.: Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and extremal optimization. Ener. Conv. Manag. 165, 681–695 (2018). https://doi.org/10.1016/j.enconman.2018.03.098, https://www.sciencedirect.com/science/article/pii/S0196890418303261, ISSN 0196-8904
7. Cools, M., Moons, E., Wets, G.: Investigating the variability in daily traffic counts using ARIMAX and SARIMA(X) models: assessing the impact of holidays on two divergent site locations (2008)
8. Dingli, A., Fournier, K.S.: Financial time series forecasting - a deep learning approach. Int. J. Mach. Learn. Comput. 7, 118–122 (2017)
9. Feng, Q., Qian, S.: Research on power load forecasting model of economic development zone based on neural network. Ener. Rep. 7, 1447–1452 (2021). https://doi.org/10.1016/j.egyr.2021.09.098, https://www.sciencedirect.com/science/article/pii/S2352484721009045, ISSN 2352-4847. 2021 International Conference on Energy Engineering and Power Systems
10. De Gooijer, J.G., Hyndman, R.J.: 25 years of IIF time series forecasting: a selective review. Econ. eJ. (2005)
11. Helmuth, I.G., Poulsen, A., Suppli, C.H., Mølbak, K.: Varicella in Europe - a review of the epidemiology and experience with vaccination. Vaccine 33(21), 2406–2413 (2015)
12. Hoppe, E., et al.: Deep learning for magnetic resonance fingerprinting: a new approach for predicting quantitative parameter values from time series. Stud. Health Technol. Inform. 243, 202–206 (2017)
13. Huang, X., Zanni-Merk, C., Crémilleux, B.: Enhancing deep learning with semantics: an application to manufacturing time series analysis. Procedia Comput. Sci. 159, 437–446 (2019). https://doi.org/10.1016/j.procs.2019.09.198, https://www.sciencedirect.com/science/article/pii/S1877050919313808, ISSN 1877-0509. Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 23rd International Conference KES 2019
14. Lim, B., Arık, S.Ö., Loeff, N., Pfister, T.: Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 37(4), 1748–1764 (2021)

Time-Series Forecasting Models Evaluation for Chickenpox Cases

9

15. Lindemann, B., M¨ uller, T., Vietz, H., Jazdi, N., Weyrich, M.: A survey on long short-term memory networks for time series prediction. Procedia CIRP 99, 650–655 (2021). https://doi.org/10.1016/j.procir.2021.03.088, https://www. sciencedirect.com/science/article/pii/S2212827121003796, ISSN 2212-8271. 14th CIRP Conference on Intelligent Computation in Manufacturing Engineering, 15–17 July 2020 16. Liu, H., Tian, H.Q., Liang, X.F., Li, Y.F.: Wind speed forecasting approach using secondary decomposition algorithm and ELMAN neural networks. Appl. Energy 157, 183–194 (2015). https://doi.org/10.1016/j.apenergy.2015.08.014, https:// www.sciencedirect.com/science/article/pii/S0306261915009393, ISSN 0306-2619 17. Muzaffar, S., Afshari, A.: Short-term load forecasts using LSTM networks. Energy Procedia 158, 2922–2927 (2019). https://doi.org/10.1016/j.egypro.2019.01. 952, https://www.sciencedirect.com/science/article/pii/S1876610219310008, ISSN 1876-6102. Innovative Solutions for Energy Transitions 18. Oreshkin, B.N., Carpov, D., Chapados, N., Bengio, Y.: N-BEATS: neural basis expansion analysis for interpretable time series forecasting. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. OpenReview.net (2020). https://openreview.net/ forum?id=r1ecqn4YwB 19. Box George, E.P., Jenkins, G.M.: Time Series Analysis Forescasting and Control. Holden-Day (1976) 20. Public health guidance on varicella vaccination in the Eropean union, February 2015. https://www.ecdc.europa.eu/en/publications-data/public-health-guidancevaricella-vaccination-european-union 21. Qian, K., Wang, X., Yuan, Y.: Research on regional short-term power load forecasting model and case analysis. Processes 9(9) (2021). https://doi.org/10.3390/ pr9091617, https://www.mdpi.com/2227-9717/9/9/1617, ISSN 2227-9717 22. Rashid, K.M., Louis, J.: Times-series data augmentation and deep learning for construction equipment activity recognition. Adv. Eng. 
Inform. 42, 100944 (2019) 23. Rozemberczki, B., Scherer, P., Kiss, O., Sarkar, R., Ferenci, T.: Chickenpox cases in Hungary: a benchmark dataset for spatiotemporal signal processing with graph neural networks. arXiv preprint arXiv:2102.08100 (2021) 24. Salinas, D., Flunkert, V., Gasthaus, J., Januschowski, T.: DeepAR: probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 36(3), 1181–1191 (2020). https://doi.org/10.1016/j.ijforecast.2019.07.001, https://www. sciencedirect.com/science/article/pii/S0169207019301888, ISSN 0169-2070 25. Sarafrazi, S., et al.: Cracking the “sepsis” code: asessing time series nature of EHR data, and using deep learning for early sepsis prediction. In: 2019 Computing in Cardiology (CinC), pp. 1–4 (2019) 26. Sezer, O.B., Gudelek, M.U., Ozbayoglu, A.M.: Financial time series forecasting with deep learning: a systematic literature review: 2005–2019. arXiv, abs/1911.13288 (2020) 27. Shahi, T.B., Shrestha, A., Neupane, A., Guo, W.: Stock price forecasting with deep learning: a comparative study. Mathematics 8(9) (2020). https://doi.org/10.3390/ math8091441, https://www.mdpi.com/2227-7390/8/9/1441, ISSN 2227-7390

10

W. Skaf et al.

28. Wang, Y., Zhang, D., Liu, Y., Dai, B., Lee, L.H.: Enhancing transportation systems via deep learning: a survey. Transp. Res. Part C: Emerg. Technol. 99, 144– 163 (2019). https://doi.org/10.1016/j.trc.2018.12.004, https://www.sciencedirect. com/science/article/pii/S0968090X18304108, ISSN 0968-090X 29. Zhang, X., Shen, F., Zhao, J., Yang, G.: Time series forecasting using GRU neural network with multi-lag after decomposition. In: ICONIP (2017) 30. Zhang, Y., Pan, G.: A hybrid prediction model for forecasting wind energy resources. Environ. Sci. Poll. Res. 27(16), 19428–19446 (2020). https://doi.org/ 10.1007/s11356-020-08452-6

Parameters Optimization in Hydraulically Driven Machines Using Swarm Intelligence

Zhanjun Tan1(B), Qasim Khadim2, Aki Mikkola3, and Xiao-Zhi Gao1

1 School of Computing, University of Eastern Finland, Kuopio Campus, Kuopio, Finland [email protected], [email protected]
2 Laboratory of Machine Design, University of Oulu, Oulu, Finland [email protected]
3 Laboratory of Machine Design, LUT University, Lappeenranta, Finland [email protected]

Abstract. Finding the unknown parameters of hydraulically driven machines such that the simulation model behaves similarly to the real-world system can be a cumbersome task. This can be due to a number of factors in the working cycles of a physical system. This research work presents the optimization of parameters in hydraulically driven machines using a swarm intelligence technique. To demonstrate the application, the characteristic curve of a pressure-compensated directional control valve, the lifting mass, and the hydraulic parameters of a mass-lifting hydraulic cylinder system are optimized with the Firefly algorithm. The performance of the algorithm is analyzed using the root mean square error in the system states and the cost function. The case study shows that the unknown parameters of hydraulically driven machines can be optimized efficiently, which supports digital twin applications.

1 Introduction

A hydraulic machine usually has a number of parameters that reflect its state and working performance [1]. Information about such hydraulic parameters can reveal the condition and faults of a machine [1–3]. It is worth mentioning that these unknown parameters are affected by wear and other factors during the working cycles [1]. Nevertheless, information about these parameters can be used in the digital management of product processes [4] and in digital-twin applications [5] during the working cycles. Despite these advantages, the parameters are usually difficult to measure because of cost and the difficulty of implementing sensor measurements. In certain situations, the unknown parameters can only be determined in a general manner from the product catalogues, which do not reflect the current state and working performance of an individual machine. Examples of such unknown parameters are the characteristic curve of a directional control valve [1] and the frictional parameters [6] in hydraulically driven machines.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 11–20, 2023. https://doi.org/10.1007/978-3-031-35507-3_2

In general, parameter optimization or estimation provides the essential tools to determine the parameters of simulation models [7,8]. As reported in [7,9], online or offline simulation models of the system can be applied to determine unknown parameters. The most common parameter estimation methods include orthogonal least squares [10], weighted least squares [11], Kalman filtering [12,13], and robust techniques (e.g., regression diagnostics and clustering [14]). Kalman filters have been used for parameter estimation in a wide range of engineering applications [14]. However, parameter estimation through Kalman filters [13] is challenging, since the unknown parameters of a real system may be functions of many other system variables and can be expressed only via complicated, unknown non-linear variations [1]. This contrasts with simulation experiments [15], in which parameters are generally treated as constants. The functions and derivatives defining the unknown parameters are not always available in practice, which makes the parameters difficult to estimate. Furthermore, Kalman filters [13] require the variances of the unknown parameters [1], which makes the parameter estimation process even more complex.

An alternative solution to the above problems is to optimize the parameters of a physical system in the post-processing phase. Optimization methods can generally be classified as combinatorial optimization and continuous optimization [16]. Combinatorial optimization searches for maxima or minima of parameters based on an objective function in a discrete domain [17]. Continuous optimization [18] searches for maxima or minima of parameters in the continuous domain of a population. A successful application of the Firefly Algorithm (FA), a swarm intelligence algorithm for the continuous domain [16,18], to a hydraulically driven system [19] increased our confidence that FA can resolve the present problem. FA has been compared with Accelerated Particle Swarm Optimization, Simulated Annealing, Harmony Search, and Differential Evolution in [20]. According to that comparison, FA (1) is more efficient than the compared algorithms; (2) is suitable for nonlinear and multimodal optimization problems; (3) has stronger dynamic characteristics; and (4) avoids the drawbacks linked to velocities. Therefore, FA can be used to cope effectively and efficiently with demanding optimization problems.

The intention of this study is to illustrate a swarm intelligence-based technique for optimizing the parameters of hydraulically driven machines. To this end, a metaheuristic algorithm, FA, is used [8]. As an example, the characteristic curve of a pressure-compensated directional control valve, the lift mass, and the hydraulic parameters are optimized during the working cycle of a mass-lifting hydraulic cylinder. The optimization is illustrated using simulation models of the system. The FA enables the parameters of the optimization model to follow the real system despite modelling errors and differences in the initial conditions.

2 Applying Swarm Intelligence Technique to Optimize Hydraulic Parameters

This section describes the swarm intelligence technique and the modelling of the hydraulics.

2.1 Firefly Algorithm

In tropical regions, the flashing light of fireflies is easy to observe in the summer sky. Most fireflies produce brief, rhythmic flashes, and a particular species usually displays its own unique flash pattern. The light is generated by a bioluminescence process, and the true functions of this signalling system still need more exploration. Nevertheless, two fundamental functions of the flashes are to attract mating partners (communication) and to attract potential prey. The rhythm of the flashes, the flashing rate, and the flash duration form part of the signalling system that brings both sexes together. Moreover, light intensity decreases with distance, and light is absorbed by the air. These two factors make fireflies visible only within a few hundred meters at night, which still provides an effective communication channel for fireflies.

In general, the characteristics of FA are as follows:

• All fireflies are unisex, so a firefly is attracted to other fireflies regardless of sex;
• Attractiveness is positively correlated with brightness. For any two flashing fireflies, the less bright one moves towards the brighter one. Both brightness and attractiveness decrease as the distance increases. If no firefly is brighter than a given firefly, it moves randomly;
• The brightness of a firefly is determined by the landscape of the objective function.

Using these rules, the pseudo code and the flowcharts shown in Fig. 1 (a), (b) and (c) summarize the fundamental steps of FA [16]. The light intensity $I(r)$ follows the inverse square law,

$$ I(r) = \frac{I_s}{r^2}, \qquad I = I_0 e^{-\gamma r}, \qquad I(r) = I_0 e^{-\gamma r^2}, \qquad \beta = \beta_0 e^{-\gamma r^2}, \tag{1} $$

where the four expressions are referred to, from left to right, as Eq. (1)a to Eq. (1)d, $I_s$ denotes the source intensity, $\gamma$ is the light absorption coefficient, and $I_0$ represents the original light intensity. To avoid the singularity at $r = 0$, Eq. (1)a and Eq. (1)b can be combined in the Gaussian form of Eq. (1)c. Since a firefly's attractiveness is positively correlated with its light intensity, the attractiveness $\beta$ can be determined by Eq. (1)d, where $\beta_0$ stands for the attractiveness at $r = 0$. The distance between any two fireflies $i$ and $j$, located at $x_i$ and $x_j$, is the Cartesian distance given by Eq. (2)a.


Fig. 1. Flowchart of Firefly algorithm and case example plus algorithm pseudo codes.

$$ r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} (x_{i,k} - x_{j,k})^2}, \qquad x_i = x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha \epsilon_i, \tag{2} $$

where the two expressions are referred to as Eq. (2)a and Eq. (2)b, and $x_{i,k}$ is the $k$th component of the spatial coordinate $x_i$ of the $i$th firefly. The movement of firefly $i$ towards a brighter firefly $j$ is determined by Eq. (2)b, where the second term produces the attraction. The third term describes the randomization, where $\alpha$ is the randomization parameter and $\epsilon_i$ is a vector of random numbers drawn from a uniform or a Gaussian distribution.

2.2 Modelling the Hydraulics

Figure 2 shows a hydraulic cylinder. The force $F_h$ produced by a hydraulic cylinder can be written as

$$ F_h = p_1 A_1 - p_2 A_2 - F_\mu, \tag{3} $$

where $p_1$, $p_2$, $A_1$ and $A_2$ are the pressures and areas on the piston and piston-rod side, respectively. The friction force $F_\mu$ can be calculated according to [6]. The hydraulic circuit attached to a hydraulic cylinder can be modelled by employing the lumped fluid theory [21] and the semi-empirical method [21]. In the lumped fluid theory [21], a hydraulic volume is described by an equally distributed pressure, the effective bulk modulus, and the hydraulic volume. The flow rate $Q_d$ can be modelled with the semi-empirical method [21] as

$$ Q_d = C_v U \,\mathrm{sgn}(\Delta p) \sqrt{|\Delta p|}, \tag{4} $$

where $C_v = C_d A_v \sqrt{2/\rho}$ is the semi-empirical flow rate coefficient, $C_d$ is the discharge coefficient, $A_v$ is the area of the flow port, and $\rho$ is the oil density. In Eq. (4), $U$ is the relative position of the spool, $\mathrm{sgn}(\cdot)$ is the signum function, and $\Delta p$ is the pressure difference.
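As a rough numerical illustration of Eqs. (3) and (4), the following Python sketch evaluates the cylinder force and the valve flow rate. The function names are hypothetical and not taken from the paper, and the friction force is passed in as a plain number instead of being computed from the friction model of [6].

```python
import math


def flow_coefficient(c_d: float, a_v: float, rho: float) -> float:
    """Semi-empirical flow rate coefficient C_v = C_d * A_v * sqrt(2 / rho)."""
    return c_d * a_v * math.sqrt(2.0 / rho)


def valve_flow(c_v: float, u: float, dp: float) -> float:
    """Flow rate of Eq. (4): Q_d = C_v * U * sgn(dp) * sqrt(|dp|)."""
    return c_v * u * math.copysign(1.0, dp) * math.sqrt(abs(dp))


def cylinder_force(p1: float, a1: float, p2: float, a2: float,
                   f_mu: float = 0.0) -> float:
    """Cylinder force of Eq. (3): F_h = p1*A1 - p2*A2 - F_mu.

    f_mu is supplied by the caller (assumption: no friction model here).
    """
    return p1 * a1 - p2 * a2 - f_mu
```

A call such as `valve_flow(flow_coefficient(0.6, 1e-5, 850.0), 1.0, 2e6)` would give the flow through a fully open port for an illustrative, made-up set of values in SI units. The sign of the result follows the sign of the pressure difference, so reverse flow needs no special case.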

3 Developing Hydraulic Parameters Optimization Algorithm

This section details the application of FA in the framework of hydraulics.

3.1 Optimization Algorithm

The goal is to use the states of the real system as the reference for the optimization model. The parameters of the real system are estimated from the accumulated difference between the state vectors of the real system and the optimization model. The objective function $f(x)$ is

$$ f(x) = \sqrt{\sum_{t=1}^{k} \left(x_r^t - x_o^t\right)^2}, \tag{5} $$

where $x_r^t$ and $x_o^t$ are the state vectors of the real system and the optimization model at time step $t$, respectively, and $k$ is the number of time samples. The overall structure of the optimization system is depicted in Fig. 1 (b).

3.2 Mass Lifting Cylinder

The FA is implemented on a mass lifting cylinder, shown in Fig. 2, to demonstrate the optimization of parameters. The system contains hydraulic volumes $V_1$ and $V_2$, the pressures $p_1$ and $p_2$, a 4/3 directional control valve, a constant pressure source $p_p$, and a tank $p_T$. The derivatives of the pressures $p_1$ and $p_2$ can be computed as

$$ \dot{p}_1 = \frac{B_{e1}}{V_1}\left(Q_{d1} - A_a \dot{s}\right), \qquad \dot{p}_2 = \frac{B_{e2}}{V_2}\left(A_b \dot{s} - Q_{d2}\right), \tag{6} $$

where $B_{e1}$ and $B_{e2}$ are the effective bulk moduli, and $Q_{d1}$ and $Q_{d2}$ are the flow rates of the directional control valve. The parameters used to model the system are taken from the reference studies [1,5]. The hydraulic volumes are calculated as $V_1 = V_{h1} + A_1 s$ and $V_2 = V_{h2} + A_2 (l - s)$. Here, $V_{h1}$ and $V_{h2}$ are the hydraulic volumes of the corresponding hoses, $s$ is the actuator position, and $l$ is the length of the cylinder. The actuator acceleration is calculated as $\ddot{s} = F_h / m$. The differential equations of the actuator velocity, acceleration and pressures are integrated using the fourth-order Runge-Kutta method.
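The model above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Matlab implementation, and it rests on several assumptions: the areas $A_a$ and $A_b$ in Eq. (6) are taken to equal $A_1$ and $A_2$, friction is omitted, port 1 is connected to the pressure source and port 2 to the tank for a positive spool position, and all names in the parameter dictionary are hypothetical.

```python
import math


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def _valve_flow(c_v, u, dp):
    """Eq. (4): Q_d = C_v * U * sgn(dp) * sqrt(|dp|)."""
    return c_v * u * math.copysign(1.0, dp) * math.sqrt(abs(dp))


def make_cylinder_rhs(par):
    """Build the right-hand side for the state y = [s, s_dot, p1, p2]."""
    def rhs(t, y):
        s, s_dot, p1, p2 = y
        v1 = par["Vh1"] + par["A1"] * s                 # V1 = Vh1 + A1*s
        v2 = par["Vh2"] + par["A2"] * (par["l"] - s)    # V2 = Vh2 + A2*(l - s)
        # Assumed port connection: source -> chamber 1, chamber 2 -> tank.
        qd1 = _valve_flow(par["Cv"], par["U"], par["pp"] - p1)
        qd2 = _valve_flow(par["Cv"], par["U"], p2 - par["pT"])
        p1_dot = par["Be1"] / v1 * (qd1 - par["A1"] * s_dot)   # Eq. (6)
        p2_dot = par["Be2"] / v2 * (par["A2"] * s_dot - qd2)   # Eq. (6)
        f_h = p1 * par["A1"] - p2 * par["A2"]   # Eq. (3), friction omitted
        return [s_dot, f_h / par["m"], p1_dot, p2_dot]
    return rhs
```

Repeatedly calling `rk4_step(rhs, t, y, 1e-3)` for 1000 steps would correspond to the 1 ms step and 1 s horizon reported in the results section.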


Table 1. Properties and initial conditions of the real system and optimization model. s0, ṡ0, p10 and p20 represent the initial position, velocity and pressures. m, Cd, kp and k0 are the mass and hydraulic parameters to be optimized.

Symbol   Real system   Optimization model
s0       0.26 m        0.53 m
ṡ0       0 m/s         0 m/s
p10      4.15 MPa      8.48 MPa
p20      5.60 MPa      12 MPa
m        250 kg        200 kg
Cd       1.8           1.0
kp       1600 MPa      1000 MPa
k0       5             10

Fig. 2. A double acting cylinder lifting the mass.

3.3 Real-System and Optimization Model

To apply the FA, two simulation models of the hydraulic system shown in Fig. 2 are built, named the real system and the optimization model. The real system represents an accurate simulation model whose states and parameters are treated as unknown. Note that these parameters are difficult to acquire in numerous applications. In practice, a real system may not be modelled accurately, and the model can contain errors in the force model. Thus, errors are introduced into the force model of the optimization model. Table 1 gives the errors in the initial conditions and parameters of the optimization model with respect to the real system. In Table 1, $k_p$ and $k_0$ represent the pressure flow coefficient and the flow gain, respectively [1]. The FA uses a state vector $x$ of the optimization model in the optimization process. For the system in Fig. 2, the state vector is

$$ x = \begin{bmatrix} s & \dot{s} & p_1 & p_2 & m & k_p & k_0 & C_v \end{bmatrix}^T. $$

Here, $C_v = \begin{bmatrix} C_{v_a} & C_{v_b} & C_{v_c} & C_{v_d} \end{bmatrix}$ are the semi-empirical flow rate coefficients at the respective ports of the pressure-compensated directional control valve. The parameter $C_v$ captures the non-linear nature of the characteristic curve of the pressure-compensated directional control valve. For a constant semi-empirical flow rate coefficient, $C_v$ is replaced by the parameter $C_d$. For a non-linear characteristic curve, a curve-fitting method is used to describe the opening of the ports in the pressure-compensated directional control valve. For this, a B-spline curve is constructed with the knot vector $u$ for non-uniform open splines [22] as

$$ C(u) = \sum_{i=0}^{n} B_{i,d}(u) N_i, \tag{7} $$

where $n$ is the number of control points, $d$ is the degree, $B_{i,d}(u)$ are the B-spline basis functions of degree $d$, and $N_i$ is the control point vector. As an example, for a non-linear characteristic curve, the control points can be written in terms of $U$ and $Q_{d1}$ at port 1 as

$$ N = \begin{bmatrix} U_{\min} & U_1 & \dots & U_n & U_{\max} \\ Q_{d_{\min}} & Q_{d_1} & \dots & Q_{d_n} & Q_{d_{\max}} \end{bmatrix}. $$

Here, $U_{\min}$, $U_1$, and $U_n$ represent spool positions, and $Q_{d_{\min}}$, $Q_{d_1}$, and $Q_{d_{\max}}$ are the flow rates at port 1 of the hydraulic valve. $B_{i,d}(u)$ can be written using the Cox–de Boor recursion formula [22].
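The Cox–de Boor recursion and the curve evaluation of Eq. (7) can be illustrated with a short Python sketch. The function names, the knot vector and the control points below are hypothetical; each control point is a (spool position, flow rate) pair, and the knot vector is assumed to be clamped so that the curve starts at the first control point.

```python
def bspline_basis(i, d, u, knots):
    """Cox-de Boor recursion for the basis function B_{i,d}(u)."""
    if d == 0:
        # Half-open interval: the right end of the knot span is excluded.
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left_den = knots[i + d] - knots[i]
    right_den = knots[i + d + 1] - knots[i + 1]
    left = 0.0
    right = 0.0
    if left_den != 0.0:   # terms with a zero denominator are defined as 0
        left = (u - knots[i]) / left_den * bspline_basis(i, d - 1, u, knots)
    if right_den != 0.0:
        right = ((knots[i + d + 1] - u) / right_den
                 * bspline_basis(i + 1, d - 1, u, knots))
    return left + right


def bspline_point(u, control_points, d, knots):
    """Evaluate C(u) = sum_i B_{i,d}(u) * N_i as in Eq. (7)."""
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, ctrl in enumerate(control_points):
        b = bspline_basis(i, d, u, knots)
        for k in range(dim):
            point[k] += b * ctrl[k]
    return point


# Hypothetical quadratic example: five (U, Q_d) control points, clamped knots.
knots = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0]
ctrl = [(0.0, 0.5), (0.25, 0.6), (0.5, 1.2), (0.75, 2.0), (1.0, 2.2)]
start = bspline_point(0.0, ctrl, 2, knots)
```

Because the degree-0 case uses a half-open interval, this sketch evaluates to zero exactly at the last knot; a production implementation would treat that end point separately.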

4 Results and Discussions

The Matlab implementation was run on a machine with an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz, 16 GB RAM, and Windows 10 Education 64-bit. Table 2(a) reports the average effect of different α values on FA with a population of 10, where α takes values in the range [0, 1]. Considering the computational cost and the exploration of the optimal result, 10 fireflies were selected as the size of the swarm population. The performance of FA is validated by comparing the states and parameters of the weighted optimization model against those of the real system. Both systems are actuated using the step input signal U. The resulting equations of motion are integrated with a time step of 1 ms for 1 s.
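To make the setup concrete, the sketch below shows a minimal Firefly Algorithm in Python that minimizes a cost of the form of Eq. (5) with a swarm of 10 fireflies. It is not the authors' Matlab code: the function names, the default values of `beta0`, `gamma` and `n_iter`, and the clipping to bounds are assumptions, and the brightest firefly simply stays in place instead of performing a random walk.

```python
import math
import random


def trajectory_cost(reference, simulated):
    """Eq. (5)-style cost: square root of the summed squared differences."""
    return math.sqrt(sum((r - o) ** 2 for r, o in zip(reference, simulated)))


def firefly_minimize(cost, bounds, n_fireflies=10, n_iter=50,
                     alpha=0.5, beta0=1.0, gamma=1.0, seed=0):
    """Return the best position and cost found (lower cost = brighter)."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(n_fireflies)]
    costs = [cost(x) for x in swarm]
    best = min(range(n_fireflies), key=lambda i: costs[i])
    best_x, best_cost = list(swarm[best]), costs[best]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:   # firefly j is brighter than i
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)   # Eq. (1)d
                    moved = []
                    for (lo, hi), xi, xj in zip(bounds, swarm[i], swarm[j]):
                        # Eq. (2)b with a uniform random term in [-0.5, 0.5)
                        step = xi + beta * (xj - xi) + alpha * (rng.random() - 0.5)
                        moved.append(min(max(step, lo), hi))   # keep in bounds
                    swarm[i] = moved
                    costs[i] = cost(moved)
                    if costs[i] < best_cost:
                        best_x, best_cost = list(moved), costs[i]
    return best_x, best_cost


# Hypothetical usage: recover slope and offset of a reference trajectory.
times = [0.1 * t for t in range(10)]
reference = [2.0 * t + 1.0 for t in times]
cost = lambda x: trajectory_cost(reference, [x[0] * t + x[1] for t in times])
bounds = [(0.0, 5.0), (0.0, 5.0)]
best_x, best_cost = firefly_minimize(cost, bounds)
```

In the paper's setting the toy linear model would be replaced by a simulation of the optimization model, and the reference by the states of the real system. Setting `alpha=0` removes the random term, which corresponds to the first column of Table 2(a).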

Table 2. Experiment results.

(a) Exploration of population 10 applied in FA.

Alpha (α)           0                    0.5                  1
Time interval (s)   0.2–0.4   0.6–0.8    0.2–0.4   0.6–0.8    0.2–0.4   0.6–0.8
Cost (1e+07)        6.101     6.137      6.193     6.366      6.192     6.367
Elapsed time (s)    51.21                51.52                51.41

(b) Percentage RMSE.

Parameters            Symbol   0.2–0.4 s   0.6–0.8 s
Position              s        2.79 %      0.06 %
Velocity              ṡ        0.46 %      0.46 %
Pressure              p1       0.15 %      0.07 %
Pressure              p2       0.12 %      0.07 %
Lift mass             m        0.20 %      0.20 %
Valve parameter       Cd       0.46 %      0.46 %
Hydraulic parameter   kp       0.00 %      0.00 %
Hydraulic parameter   k0       3.32 %      3.32 %

4.1 Optimizing the Lift Mass, Hydraulic Parameters and Characteristic Curve

Figure 3 shows the optimization of the system states s, ṡ, p1 and p2 and the parameters m, kp, k0 and Cd. The dashed red line indicates the optimization model, whereas the black line describes the real system (ground truth). The dashed red lines clearly follow the real system, despite the modelling errors and the differences in the initial conditions. The states and parameters start to converge to those of the real system within the first 0.05 s. The fluctuations at the start of the simulation are due to the random initialization of FA. The percentage Root Mean Square Errors (RMSE) in the states of the optimization model with respect to the real system during the working cycles are given in Table 2(b). The RMSE for the unknown parameters m, Cd, and kp remains below 0.50%. For k0, the RMSE is 3.32% in both 0.2–0.4 s and 0.6–0.8 s. The RMSE in the states of the system also stays below 0.50% in Table 2(b). This demonstrates that the unknown parameters of the real system are accurately optimized using FA in the working cycles. Moreover, the correct optimization of the parameters requires the percentage RMSE in the system states to be below 0.50%.
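The text does not spell out how the percentage RMSE is normalized. The Python sketch below shows one common choice, stated purely as an assumption: the RMSE between the real-system signal and the optimization-model signal is divided by the root mean square of the real-system signal over the same time window. The function names are hypothetical.

```python
import math


def window(times, values, t_start, t_end):
    """Select the samples whose time stamp lies in [t_start, t_end]."""
    return [v for t, v in zip(times, values) if t_start <= t <= t_end]


def percentage_rmse(reference, estimate):
    """RMSE as a percentage of the RMS of the reference (assumed definition)."""
    n = len(reference)
    rmse = math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)
    scale = math.sqrt(sum(r ** 2 for r in reference) / n)
    return 100.0 * rmse / scale
```

For the intervals used in Table 2(b), one would first apply `window(times, signal, 0.2, 0.4)` to both signals and then pass the two windows to `percentage_rmse`. Other normalizations, for example by the signal range, would give different percentages.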

Fig. 3. Optimization of states and parameters using Firefly Algorithm in the system.

The FA minimizes the cost value of the objective function presented in Eq. (5). The curve in Fig. 4a starts with a very sharp increase and quickly converges to a stable value. The cost value represents the minimum value of the objective function in the 0.2–0.4 s and 0.6–0.8 s intervals of the working cycle. The cost value in 0.6–0.8 s is slightly higher than in 0.2–0.4 s, which is related to the higher pressures p1 and p2 in 0.6–0.8 s of the working cycle. Figure 4b shows the optimization of the characteristic curve of pressure-compensated valves [23] using the FA. This valve is used to control the working cycle of forestry machines [24] and many other heavy machines. Finding the correct values of Cv to model the dynamics of the valve [23] is cumbersome. This study proposes FA to find the correct value of Cv for simulation testing in different applications. For instance, for the D module of the valve [23], only the minimum point cmin and the maximum point cmax can be found. In practice, however, these flow rates are often unclear during a working cycle. To demonstrate this practical case, a control vector $N_a = \begin{bmatrix} c_{\min} & c_1 & c_2 & c_3 & c_4 & c_{\max} \end{bmatrix}^T$ is selected as input in the optimization model. Note that c1, c2, c3, and c4 represent random points. Figure 4b demonstrates that FA has correctly optimized the characteristic curve of the pressure-compensated valve.

Fig. 4. Cost value of the case model and optimization of characteristic curve.

5 Conclusion

This paper studies the optimization of parameters in hydraulically driven machines using the firefly algorithm, which is implemented to optimize the characteristic curve of a pressure-compensated valve, the lifting mass, and the hydraulic parameters of a mass-lifting hydraulic cylinder. The system dynamics are modelled using the lumped fluid theory and the semi-empirical method. Two simulation versions of the system, the real system and the optimization model, are used in this work. Applying the optimization algorithm to this problem determines the characteristic curve of a pressure-compensated valve, the lifting mass, and the hydraulic parameters of the system. The use of optimization algorithms can be challenging, since it is rather difficult to choose their parameters for a given system. Moreover, implementing optimization algorithms in a physical system with actual measurements would require information about the unknown parameters of the real product. However, accurate modelling and simulation of an actual physical system is always non-trivial. This study offers a method to synchronize physics-based simulation models with real-world products by optimizing the unknown parameters. In the future, follow-up research can extend to the optimization of parameters in an actual product using the physics-based simulation model. This will enable an effective use of physics-based simulation models in digital twin applications.

References

1. Khadim, Q., Kiani-Oshtorjani, M., et al.: Estimating the characteristic curve of a directional control valve in a combined multibody and hydraulic system using an augmented discrete extended Kalman filter. Sensors 21(15), 5029 (2021)
2. Son, J., Zhou, S., et al.: Remaining useful life prediction based on noisy condition monitoring signals using constrained Kalman filter. Reliab. Eng. Syst. Saf. 152, 38–50 (2016)
3. Beebe, R.S., Beebe, R.S.: Predictive Maintenance of Pumps Using Condition Monitoring. Elsevier, Amsterdam (2004)
4. Ukko, J., Saunila, M., et al.: Real-Time Simulation for Sustainable Production: Enhancing User Experience and Creating Business Value. Routledge, London (2021)
5. Khadim, Q., Hagh, Y.S., et al.: State estimation in a hydraulically actuated log crane using unscented Kalman filter. IEEE Access 10, 62863–62878 (2022)
6. Andersson, S., Söderberg, A., Björklund, S.: Friction models for sliding dry, boundary and mixed lubricated contacts. Tribol. Int. 40(4), 580–587 (2007)
7. Beck, J.V., Arnold, K.J.: Parameter Estimation in Engineering and Science. James Beck, London (1977)
8. Yang, X.-S.: Nature-Inspired Metaheuristic Algorithms, 2nd edn. Luniver Press, Bristol (2010)
9. Deuerlein, J., Piller, O., et al.: Parameterization of offline and online hydraulic simulation models. Procedia Eng. 119, 545–553 (2015)
10. Billings, S.A., Jones, G.N.: Orthogonal least-squares parameter estimation algorithms for non-linear stochastic systems. Int. J. Syst. Sci. 23(7), 1019–1032 (1992)
11. Asparouhov, T., Muthén, B.: Weighted least squares estimation with missing data. Mplus Tech. Appendix 1–10, 2010 (2010)
12. Haykin, S.: Kalman Filtering and Neural Networks, vol. 47. Wiley, Hoboken (2004)
13. Simon, D.: Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. Wiley, Hoboken (2006)
14. Zhang, Z.: Parameter estimation techniques: a tutorial with application to conic fitting. Image Vis. Comput. 15(1), 59–76 (1997)
15. Khadim, Q., Kaikko, E.-P., et al.: Targeting the user experience in the development of mobile machinery using real-time multibody simulation. Adv. Mech. Eng. 12(6), 1687814020923176 (2020)
16. Reddy, M.J., Kumar, D.N.: Evolutionary algorithms, swarm intelligence methods, and their applications in water resources engineering: a state-of-the-art review. H2Open J. 3(1), 135–188 (2020)
17. Cook, W., Lovász, L., et al.: Combinatorial Optimization: Papers from the DIMACS Special Year, vol. 20. American Mathematical Society (1995)
18. Andriyenko, A., Schindler, K., et al.: Discrete-continuous optimization for multitarget tracking. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1926–1933. IEEE (2012)
19. Nedic, N., Stojanovic, V., et al.: Optimal control of hydraulically driven parallel robot platform based on firefly algorithm. Nonlinear Dyn. 07 (2015)
20. Yang, X.-S., He, X.: Why the firefly algorithm works? arXiv, abs/1806.01632 (2018)
21. Watton, J.: Fluid Power Systems: Modeling, Simulation, Analog, and Microcomputer Control. Prentice Hall, Hoboken (1989)
22. De Boor, C.: On calculating with B-splines. J. Approx. Theory 6(1), 50–62 (1972)
23. Danfoss: Actuator position sensor (2022). Danfoss Valves
24. KESLA Oy: Hydraulically driven machines (2022). KESLA Oy

A Machine Learning Framework for Cereal Yield Forecasting Using Heterogeneous Data

Noureddine Jarray1,2(B), Ali Ben Abbes1, and Imed Riadh Farah1

1 National School of Computer Science, Mannouba, Tunisia [email protected]
2 Institut des Régions Arides (IRA), Medenine, Tunisia

Abstract. The combination of Machine Learning (ML) and heterogeneous data offers agronomists an opportunity in crop yield forecasting for decision support systems. ML has recently emerged as a method to support Sustainable Development Goals (SDGs) such as SDG-2, which aims to achieve food security. This paper presents a modularized, robust, data-driven and ML-based framework designed to help decision makers forecast yield accurately in order to achieve food security. Four ML models, namely eXtreme Gradient Boost (XGBoost), Random Forest (RF), Linear Regressor (LR), and Support Vector Regression (SVR), were employed for yield forecasting. Experiments were carried out on twenty provinces in Tunisia from 2002 to 2018. The results showed that XGBoost slightly outperformed the other ML techniques. In model validation, the XGBoost model obtained a Pearson correlation r, Root-Mean-Square Error (RMSE) and Mean Absolute Error (MAE) of 0.97, 208.492, and 105.910, respectively. These results indicate that the framework can be used to address national food security challenges.

Keywords: Machine learning · heterogeneous data · forecasting cereal yield · Sustainable Development Goals (SDGs)

1 Introduction

In recent decades, the advent of publicly available heterogeneous data and the appearance of artificial intelligence (AI) have played an important role in the achievement of the SDGs [1]. They have been used to address several agricultural challenges [2–6]. The objective of the national SDG-2 is to provide people in need with food [7]. Food security can be supported by crop yield forecasting [8]. Wheat and barley are essential food crops. In addition, crop yield prediction is difficult because of the complex relationships between crop growth and environmental variables. It is hard to analyse the crop growth process and its many pertinent variables (e.g. varieties, soil management, etc.).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 21–30, 2023. https://doi.org/10.1007/978-3-031-35507-3_3


Recently, Machine Learning (ML) techniques, such as Linear regression (LR), Random Forest (RF), eXtrme gradient Boosting (XGBoost) and Support Vector Regressor (SVR), have been applied to forecast cereal yield. Their main advantage is that they treat the output of cereal yield as an implied function of the input variables [9]. As there exists a correlation between cereal yield, remote sensing and climatic data, ML has demonstrated its powerful performance in yield forecasting. Several studies use utilized either climate data or satellite data, or combined them to develop ML methods for crop yield forecasting. For instance, [10] examined the relationship between Multiple Linear Regression (MLR) and Artificial Neural Network(ANN) and introduced MLR-ANN model of crop yield prediction. They used the MLR to initialize ANN’s weights and biases in the input layer. The experiments were conducted in the province of Tamilnadu in India for 30 years (from 1983 to 2016). [11] applied different ML algorithms (MLR, Supprt Vector Machine (SVM), RF and XGboost) to predict cereal yield at the municipality level in Morocco by using remote sensing data (Normalized Difference Vegetation Index (NDVI), Vegetation Condition Index (VCI), Vegetation Health Index (VHI), and soil moisture) and climatic data (precipitation and temperature) as predictors. They forecast the cereal yield using ML techniques and they achieve good performance R2 = 0.88 and an RMSE of around 0.22 t. ha−1 . [12] applied a methodology to assess and optimize food quality and safety initiatives in the food sector. They presented a detailed review for food security applications. [13] presented an approach combined the Convolutional Neural Network (CNN) and long short-term memory (LSTM) to yield forecasting. The Surface Temperature (LST), Surface Reflectance (SR) and Land Cover derived from Moderate-Resolution Imaging Spectroradiometer (MODIS) satellite images are used input data to ML model. 
The ground yield data were used as validation data. There are not many works in Tunisia on ML-based cereal yield forecasting, and the concept of a robust framework to forecast cereal yields using heterogeneous data has not yet been studied. In this paper, we propose a modular, robust and data-driven framework based on ML techniques. This framework uses the available complex and heterogeneous data to forecast cereal yield, benefiting from the accuracy and efficiency of four different ML techniques. The objective of the present paper is the early forecasting of the cereal yield of barley, durum wheat and soft wheat. Our contributions are stated below:
– Propose a novel learning-based framework for cereal yield forecasting.
– Use four ML techniques and identify which one is best suited to forecasting cereal yield from heterogeneous data.
The rest of the paper is organized as follows. Section 2 focuses on the methodology deployed, including data collection. Section 3 shows the experimental results and discusses the effects of different features and model structures on cereal crop forecasting. Section 4 concludes the paper.

A Machine Learning Framework for Cereal Yield Forecasting

2 Materials and Methodology

In this study, the target variable is cereal yield, while the input variables are the satellite indices, weather data (rainfall and temperature), and soil moisture data. The twenty selected provinces are the most productive.

2.1 Study Area

We focus on cereal cropping regions in Tunisia (Fig. 1). Tunisia is a Mediterranean country of 163 610 km², with an altitude ranging between −17 m and 1544 m, located in northwest Africa and, more precisely, in a transition zone between the temperate European region and the desert. It covers 7° to 12° E and 33° to 37° N. It is characterized by a Mediterranean climate with sufficient rainfall in the northern regions. The choice of the study area is driven by the diversity of land use: cultivated land, urbanized land, undeveloped land, surface covered by water and forest occupy 32%, 0.5%, 50.5%, 5% and 12%, respectively.

Fig. 1. The 20 provinces of Tunisia.

2.2 The Used Dataset

To carry out this study, we collected the following data from multiple sources: cereal yield data, climate data, and remote sensing data. Several useful vegetation indices, such as NDVI, VCI and VHI, were extracted from remote sensing. Soil and climatic data were used to forecast crop yield. Table 1 shows the employed datasets and their sources. Table 2 displays the input variables derived from these raw datasets together with the period of the year considered in the model.

Remote Sensing Data. The remote sensing data were collected from the Food and Agriculture Organization of the United Nations (FAO) database at https://www.fao.org/giews/earthobservation/country/index.jsp.

A) Normalized Difference Vegetation Index. The NDVI is the most widely employed vegetation index. It was proposed to monitor the vegetation cover using multi-spectral data, based on spectral reflectance measurements acquired in the visible (red) and near-infrared bands. It has been used in several research works on crop yield forecasting [11].

B) Vegetation Condition Index. We selected the VCI, which normalizes the NDVI against its historical extremes. The VCI is linked to vegetation cover and is widely used in agricultural crop monitoring [11].

C) Vegetation Health Index. The VHI is an index composed of sub-indices and is used for vegetation cover monitoring. It is calculated from satellite images using both the Land Surface Temperature (LST) and the NDVI [11].

D) Soil moisture. Soil moisture is an important predictor of land surface environmental conditions. Soil moisture information has an essential impact in different fields, such as drought forecasting, climate and weather modeling, water resources, agriculture management, and crop production monitoring [11].

Climatic Data. The climatic data, including air temperature at 2 m and rainfall, were extracted from the daily ERA5 database. They were collected using the Google Earth Engine platform, an online platform that provides cloud processing and analysis-ready data. It hosts the remote sensing and climatic data for this study and gives convenient access to its data and processing functionality.

Table 1. Summary of the characteristics of the datasets used to forecast yield and of the yield data.

Category        Variable        Spatial resolution  Temporal resolution
Observed yield  Observed yield  Governorate level   Yearly
Remote sensing  NDVI, VCI, VHI  Governorate level   10 days
Remote sensing  SM              24 km               Daily/Governorate
Climatic data   Rainfall        24 km               Daily/Governorate
Climatic data   Temperature     24 km               Daily/Governorate
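The study downloads NDVI, VCI and VHI ready-made from the FAO database, so the following is only a minimal sketch of the commonly used definitions of these indices, not the authors' processing chain. The function names, the zero-denominator convention and the equal weighting `alpha=0.5` between VCI and the Temperature Condition Index (TCI) are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a zero denominator as "no signal"
    return (nir - red) / total


def vci(ndvi_value, ndvi_min, ndvi_max):
    """Vegetation Condition Index (%): NDVI scaled by its historical range."""
    if ndvi_max == ndvi_min:
        raise ValueError("historical NDVI range is empty")
    return 100.0 * (ndvi_value - ndvi_min) / (ndvi_max - ndvi_min)


def tci(lst_value, lst_min, lst_max):
    """Temperature Condition Index (%): high when the surface is cool."""
    if lst_max == lst_min:
        raise ValueError("historical LST range is empty")
    return 100.0 * (lst_max - lst_value) / (lst_max - lst_min)


def vhi(vci_value, tci_value, alpha=0.5):
    """Vegetation Health Index as a weighted mix of VCI and TCI."""
    return alpha * vci_value + (1.0 - alpha) * tci_value
```

For example, a pixel with near-infrared reflectance 0.5 and red reflectance 0.1 has an NDVI of about 0.67, and an NDVI halfway between its historical minimum and maximum gives a VCI of 50%.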


Table 2. Input variables utilized in the forecasting model.

Variable         Period of the year
NDVI             February–April
VCI              February–April
VHI              February–April
SM               October–November
Rainfall         October–November and January–March
Air temperature  December
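Table 2 restricts each predictor to a period of the year. The text does not state how the observations inside a period are combined, so the sketch below simply averages them; the averaging rule, the function name `seasonal_mean` and the sample values are all assumptions made for illustration.

```python
from datetime import date
from statistics import mean


def seasonal_mean(series, months):
    """Average the observations whose month falls in `months`.

    `series` is a list of (date, value) pairs for one province and season.
    """
    values = [value for day, value in series if day.month in months]
    if not values:
        raise ValueError("no observation in the requested months")
    return mean(values)


# Hypothetical 10-day NDVI observations (values are made up).
ndvi_series = [
    (date(2017, 1, 21), 0.30),
    (date(2017, 2, 1), 0.40),
    (date(2017, 3, 11), 0.50),
    (date(2017, 4, 21), 0.60),
    (date(2017, 5, 1), 0.45),
]

# NDVI feature for the February-April window of Table 2.
ndvi_feb_apr = seasonal_mean(ndvi_series, months={2, 3, 4})
```

A cumulative variable such as rainfall might be summed rather than averaged; the same filtering step would apply.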

Cereal Yield Data. We collected the cereal yield data (soft wheat, barley, and durum wheat) from 2002 to 2018 from the Open Data Portal for agriculture and water resources in Tunisia (http://www.agridata.tn/). The 20 provinces of Tunisia were selected as the study area. The yearly cereal yield of each province was calculated as the total crop production. The average cereal yield over the study period did not exceed 3000 (1000 qt), depending on the province.

2.3 Methodology

This study forecasts cereal yield using several ML techniques based on independent climate and agriculture factors. The proposed framework is presented in Fig. 2.

Fig. 2. The proposed framework.


Firstly, the input data were acquired from different sources. Then, the data that constitute the output of the introduced model were calibrated to the same scale. Subsequently, the dataset was split into two parts: one for the training step and the other for testing and validation. Afterwards, the four ML methods were trained to forecast cereal crop yield. Finally, the trained models were tested and validated to evaluate their performance.

2.4 Machine Learning Methods for Cereal Yield Forecasting

Four ML techniques were used to forecast cereal yield [11]: RF, XGBoost, LR and SVR.

A) Random Forest. The RF is an ensemble of trees, each built with a sample of the training data and a subset of its features. It is used for regression and classification tasks.

B) eXtreme Gradient Boosting. XGBoost is an ensemble learning technique that uses boosting to obtain a more efficient prediction: trees are created sequentially, and each new tree reduces the errors of the previous ones.

C) Support Vector Regression. SVR is widely used to solve regression problems. It belongs to the family of Support Vector Machines (SVM), a class of supervised learning algorithms inspired by statistical learning theory.

D) Linear Regression. LR is an ML method based on supervised learning. It predicts the target value from the independent variables and is principally used to find the relationship between the input and output variables.
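The boosting idea described for XGBoost, where each new tree corrects the errors of the previous ones, can be illustrated with a minimal sketch. It uses plain gradient boosting for squared error with one-split trees (stumps) on a single feature; it is not XGBoost itself, which adds a regularized objective and many optimizations. All names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_estimators=10, learning_rate=0.1):
    """Fit stumps sequentially, each one on the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_value(stump, x)
                       for x, p in zip(xs, predictions)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    total = model["base"]
    for stump in model["stumps"]:
        total += model["learning_rate"] * stump_value(stump, x)
    return total


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

On a toy step-shaped target, the training error shrinks as stumps are added, which is the behavior that the `n_estimators` and `learning_rate` entries of the hyper-parameter grid control.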

3 Experiments and Results

In this section, we present the data pre-processing applied to the samples and describe the data splitting strategy. Afterwards, we introduce the evaluation metrics and the model configurations considered in the experiments. Subsequently, we compare the baselines with the proposed model. Finally, we illustrate and discuss the experimental results.

3.1 Experiment Design

Most of the datasets contain missing values, which were replaced with zero. Then, the whole dataset was split into two parts: 70% was used for training and 30% for validation and testing. This split strategy was used to improve generalization and to avoid overfitting of the model. In addition, the Root Mean Squared Error (RMSE), Pearson's Correlation Coefficient (R), and Mean Absolute Error (MAE) were adopted to evaluate the performance of the different ML techniques.

3.2 Hyper Parameters Selection

Table 3 shows the grid search of the hyper parameters used in the four ML methods. The bold font denotes the best setting of each hyper parameter. The grid search was used to find the hyper parameter values that give the best estimation results for each model. Different combinations of all hyper parameters were evaluated, the performance of each combination was calculated, and the best-performing values were selected.

Table 3. Details of the grid search of the hyper-parameters of each ML method.

Methods  Hyper-parameters
RF       max_depth: [3, 9, 20, 30], min_samples_split: [1, 2, 5], n_estimators: [5, 10, 100]
XGBoost  max_depth: [3, 9, 20, 30], learning_rate: [0.1, 0.01, 0.001], n_estimators: [5, 10, 100]
SVR      regularization parameter: [0, 0.5, 1.0, 1.5, 2.0], kernel type: [linear, poly, RBF, Gaussian, Sigmoid, ANOVA radial basis], kernel coefficient: [scale, auto]
LR       learning rate: [0.1, 0.01, 0.001, 0.0001], number of iterations: [1000, 5000, 10000]

3.3 Experimental Results

In this section, we compare the ML techniques used to forecast the cereal yield in terms of precision and of correlation between the observed and the forecasted target output. Overall, XGBoost exhibited the best prediction accuracy and the lowest loss compared to the other ML techniques applied to the same dataset.

Comparison of the Accuracy of Cereal Yield Forecasting Models in Training and Testing Phase. The forecasting accuracy of the ML techniques was calculated using the RMSE, MAE, and R metrics, as shown in Table 4, which displays the results obtained by the different ML techniques. XGBoost gave the best accuracy, 0.96 and 0.95 in the training and testing phases, respectively. RF provided accuracy results equal to 0.95 in the training phase and 0.93 in the testing phase. However, the accuracy provided by SVR, using three kernels, and by the linear regression model did not exceed 0.78 in the training phase and 0.75 in the testing phase. In this study, the four ML methods were trained on the observed yields and six variables of remote sensing and climatic data collected from 2002 to 2018 at the county scale. Considering the three evaluation indicators (R, RMSE, and MAE), the RF, XGBoost, and SVR (RBF) methods showed the highest accuracy, with an R from 0.75 to 0.95, and the lowest RMSE (≤153235 qt) and MAE (≤165236 qt). The R of the other models did not exceed 0.61 and all their RMSE were ≥168635 qt, indicating a weak relationship between the predicted and the observed cereal yields and high error rates.


Table 4. The R, RMSE and MAE values for cereal crop forecasting in the training phase and the testing phase, respectively.

                 Training phase             Testing phase
Models           R     RMSE     MAE         R     RMSE     MAE
RF               0.95  246.969  151.626     0.93  210.235  233.326
XGBoost          0.97  208.492  105.910     0.95  153.235  165.236
SVR(RBF)         0.78  467.397  194.664     0.75  351.659  294.596
SVR(Linear)      0.58  607.689  373.848     0.57  743.953  420.859
SVR(Polynomial)  0.62  580.556  363.723     0.61  369.899  250.568
LR               0.64  529.628  383.973     0.58  168.635  388.469
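The three indicators reported in Table 4 follow standard definitions. The sketch below shows one way to compute them from paired lists of observed and predicted yields; the function names and the sample values are illustrative assumptions, not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean Absolute Error."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def pearson_r(observed, predicted):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Made-up yields: predictions are perfectly correlated but biased.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
scores = (pearson_r(observed, predicted),
          rmse(observed, predicted),
          mae(observed, predicted))
```

The toy values show why R alone is not sufficient: the correlation is perfect while both error measures are large. By construction, RMSE is never smaller than MAE for the same set of predictions, which is a useful sanity check when reading the testing columns of Table 4.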

Cereal Yield Forecasting. Based on the trained ML methods, cereal yields in the 20 provinces of Tunisia were forecasted (testing phase). The prediction results reveal that these regression models were acceptable in terms of accuracy. The scatter diagrams of the forecasted and the observed cereal yields of the ML techniques are shown in Fig. 3. We note that the forecasted and observed yields had a good linear fit, with an R of about 0.95.

Fig. 3. Spatial distribution of the observed and the predicted yields of cereal in 2017. Yields predicted by RF (a), SVR(RBF) (b), XGboost (c), SVR (Linear) (d), SVR (poly) (e) and Linear Regressor (f) and those observed at county scale (g).


Spatial Patterns of the Forecasted Cereal Crop Yield. The spatial pattern of the yields predicted by the four ML methods in 2017, in the testing phase, is presented in Fig. 4. A slight difference was found, especially between RF and XGBoost.

Fig. 4. Spatial distribution of the observed and the predicted yields of cereal in 2017. Yields predicted by RF (a), XGBoost (b), SVR(RBF) (c), SVR (Linear) (d), SVR (poly) (e), and LR (f) and those observed at county scale (g).

4 Discussion and Conclusion

This paper presented a modularized, process-based, data-driven machine learning framework for cereal yield forecasting. Four ML techniques, namely RF, XGBoost, SVR and LR, were exploited and analyzed for accurate forecasting. The results provided by XGBoost are similar to those given by RF. XGBoost showed good accuracy on the test data in terms of R value, slightly higher than that provided by the RF model. This shows that the use of complex and heterogeneous data improved the accuracy of cereal yield forecasting. With the continuous refinement of the forecasted cereal yield, decision makers will have data-enabled insights on crop forecasting in order to monitor the food security system. Further work will be needed to explore the potential impact of a refined spatial resolution.

Acknowledgment. The authors gratefully acknowledge the National Observatory of Agriculture (http://www.onagri.nat.tn/), Tunisia, for the cereal yield data.


References

1. Yeh, C., et al.: Sustainbench: benchmarks for monitoring the sustainable development goals with machine learning. In: Conference on Neural Information Processing Systems (2021)
2. Jarray, N., Abbes, A.B., Farah, I.R.: A novel teacher-student framework for soil moisture retrieval by combining Sentinel-1 and Sentinel-2: application in arid regions. IEEE Geosci. Remote Sens. Lett. 19, 1–5 (2022)
3. Jarray, N., Abbes, A.B., Rhif, M., Chouikhi, F., Farah, I.R.: An open source platform to estimate soil moisture using machine learning methods based on Eo-learn library. In: 2021 International Congress of Advanced Technology and Engineering (ICOTEN), pp. 1–5. IEEE, July 2021
4. Jarray, N., Abbes, A.B., Rhif, M., Dhaou, H., Ouessar, M., Farah, I.R.: SMETool: a web-based tool for soil moisture estimation based on Eo-Learn framework and Machine Learning methods. Environ. Modell. Softw. 157, 105505 (2022)
5. Balti, H., Abbes, A.B., Mellouli, N., Farah, I.R., Sang, Y., Lamolle, M.: Multidimensional architecture using a massive and heterogeneous data: application to drought monitoring. Future Gener. Comput. Syst. (2022)
6. Rhif, M., Abbes, A.B., Martínez, B., Farah, I.R.: Deep learning models performance for NDVI time series prediction: a case study on north west Tunisia. In: 2020 Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS), pp. 9–12, March 2020
7. Deléglise, H., Interdonato, R., Bégué, A., d'Hôtel, E.M., Teisseire, M., Roche, M.: Food security prediction from heterogeneous data combining machine and deep learning methods. Expert Syst. Appl. 190, 116189 (2022)
8. Tian, H., Wang, P., Tansey, K., Zhang, J., Zhang, S., Li, H.: An LSTM neural network for improving wheat yield estimates by integrating remote sensing data and meteorological data in the Guanzhong plain, PR China. Agric. For. Meteorol. 310, 108629 (2021)
9. Ferchichi, A., Abbes, A.B., Barra, V., Farah, I.R.: Forecasting vegetation indices from spatio-temporal remotely sensed data using deep learning-based approaches: a systematic literature review. Ecol. Inform. 101552 (2022)
10. Gopal, P.M., Bhargavi, R.: A novel approach for efficient crop yield prediction. Comput. Electron. Agric. 165, 104968 (2019)
11. Bouras, E.H., et al.: Cereal yield forecasting with satellite drought-based indices, weather data and regional climate indices using machine learning in Morocco. Remote Sens. 13(16), 3101 (2021)
12. Sahni, V., Srivastava, S., Khan, R.: Modelling techniques to improve the quality of food using artificial intelligence. J. Food Qual. 2021 (2021)
13. Gavahi, K., Abbaszadeh, P., Moradkhani, H.: Deepyield: a combined convolutional neural network with long short-term memory for crop yield forecasting. Expert Syst. Appl. 184, 115511 (2021)

New Approach in LPR Systems Using Deep Learning to Classify Mercosur License Plates with Perspective Adjustment

Luís Fabrício de F. Souza, José Jerovane da Costa Nascimento, Cyro M. G. Sabóia, Adriell G. Marques, Guilherme Freire Brilhante, Lucas de Oliveira Santos, Paulo A. L. Rego, and Pedro Pedrosa Rebouças Filho(B)

Laboratório de Processamento de Imagens, Sinais e Computação Aplicada, Instituto Federal do Ceará, Universidade Federal do Ceará, Fortaleza, Brazil
[email protected]
http://lapisco.ifce.edu.br

Abstract. Brazil is undergoing a gradual change in the format of its license plates from the old Brazilian model to the new Mercosur model. This study proposes a fully automatic process that uses the Detectron2 network to classify plate types, the Haar Cascade method to detect the region of interest, a perspective adjustment method to align the plate, and Tesseract-OCR for character recognition. The results show an accuracy of 95.48% for the classification of the plate type, 98.00% accuracy with the perspective adjustment method against 93.00% without the adjustment, and 87.71% and 87.46% character recognition without and with perspective adjustment, respectively. The approach is thus effective compared to the work found in the literature.

Keywords: Detectron2 · License Plate Recognition · Perspective Adjustment · Haar Cascade · Tesseract

1 Introduction

The increasing number of vehicles on streets and highways also increases the challenges for traffic management and safety in cities. The monitoring of streets and highways follows this trend, with more and more traffic control and security cameras spread over different points of a city. With the maturation of computer vision techniques, these increases in both traffic and traffic monitoring create a favorable environment for research on Automatic License Plate Recognition (ALPR), generating increasingly robust and reliable results [1]. Applying these object detection and recognition systems can bring different benefits to the increasingly chaotic traffic in cities and highways. Computer vision algorithms have the potential to help manage traffic and handle traffic violations. Different approaches can be implemented, such as detecting stolen vehicles or vehicles with administrative issues, identifying pedestrian crossings, identifying pedestrians using the crosswalk, and assisting cameras fixed in autonomous cars to detect the speed allowed on the road and other signs, among other possibilities [2].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 31–41, 2023. https://doi.org/10.1007/978-3-031-35507-3_4

ALPR systems are commonly divided into image acquisition, plate detection, character segmentation, and character recognition. These four steps have challenges that affect the accuracy of the systems, due to external factors such as weather variations and camera positioning, and internal factors such as the technique/algorithm and hardware used [3]. In Brazil, another challenge arises: the gradual transition between the old Brazilian plate model and the new Mercosur model [4]. This work proposes an ALPR model preceded by a Detectron2-based classifier that separates old Brazilian and Mercosur plate models. The main contributions of this work are:
– Detectron2 classification of old license plate models and new Mercosur license plate models.
– Mercosur license plate model detection in real and synthetic images.
– Character recognition on Mercosur model license plates at different perspective angles using Tesseract OCR.

2 Related Works

Considering the promising results of computer vision techniques applied to problems that require a degree of repeatability, and their vast capacity for process automation, mainly in image detection and classification, several works have been proposed for detecting people and traffic objects. L. Cuimei et al. [5] proposed a human face detection algorithm using Haar Cascade combined with three additional weak classifiers based on skin hue matching, eye detection, and mouth detection. First, Haar Cascade detection is applied; to handle false positives, a classifier based on skin tone hue histogram matching is applied, further reducing errors. If false positives remain, two further classifiers based on detecting eyes and mouths are used. According to the authors, the results show that the proposed method compensates for the deficiency of the Haar Cascade proposed by Viola and Jones [6]. Using a convolutional neural network (CNN) [7,8] model based on Detectron2 Mask R-CNN, [9] proposed a system that automatically detects and classifies fresh Excelsa bean grains. The model had an accuracy of 87.5%. Driven by the challenges of the increasing number of vehicles in circulation, M. Valdeos et al. [10] proposed an automatic system for detecting and recognizing license plates using the Python language, the OpenCV library, YoloV4, and Tesseract OCR for optical character recognition, achieving 100% accuracy in the training performed with a thousand images. In [11], the authors proposed an integrated system for vehicle type recognition and license plate detection in high-density areas using YoloV4. The study trained the model to identify six different vehicle types and their license plates, categorized by size and passenger capacity. The results were divided into detection using one, two, three, and four lanes of traffic, with average license plate detection above 98% for the respective traffic lanes and detection and recognition of vehicle types above 97.5%, also for the respective traffic lanes.

3 Materials and Methods

This section addresses the methods and the datasets used in this study, divided into two subsections: Methods and Datasets.

3.1 Methods

Classification - Detectron2 Network - Detectron2 is a CNN-based framework developed by Facebook AI Research (FAIR) to support new computer vision research. It builds on the Faster R-CNN and Mask R-CNN networks, allowing not only the classification of images but also the classification of objects in the image, such as regions of interest or other artifacts [12].

Detection - Haar Cascade - Haar Cascade is a tree-based cascade classifier, available in the OpenCV library, proposed by Viola and Jones for face detection [5]. It detects shapes in the region of interest with the help of wavelet and Haar theory, and can also be used to detect other types of objects.

Perspective Adjustment - Computer Vision Method - The perspective adjustment is performed with low-cost computer vision techniques: the model adjusts the perspective of the license plate image through the contours of the objects [13].

Tesseract-OCR Character Identification - Tesseract is an optical character recognition library based on machine learning algorithms, capable of recognizing text in images and transcribing it digitally through the individual classification of the characters read in the image. The tool supports more than 100 languages. It is used in this work to identify the characters of license plates.

3.2 Datasets

For this study, we use two different databases. The first is LPR-UFC [13], which contains 2,566 images of license plates in the new Mercosur format. The second is RodoSol-ALPR [4], composed of twenty thousand images divided into four sub-datasets. For this work, only the cars-me sub-dataset, containing 5 thousand plates of the new Mercosur model, was used. This work used the images from the LPR-UFC dataset both for training the Haar Cascade detection model and for the new-model class of the classification model. From the RodoSol-ALPR dataset, 2,566 images of license plates from the old Brazilian model sub-dataset (cars-br) were used, maintaining the balance between the number of license plates of the new model and the old model.

4 Proposed Methodology

Fig. 1. Methodology proposed by this study. Stage 01 - Classification of the license plate as old Brazilian model or new Mercosur model. Stage 02 - Tilt adjustment. Stage 03 - Detection training with Haar Cascade. Stage 04 - Character recognition using Tesseract OCR. Stage 05 - Heuristic adjustment algorithm.

This section presents the proposed methodology of this work, explaining the stages of classification of the different types of license plates in circulation in Brazil, plate detection, and character recognition. As shown in Fig. 1, the execution flow begins with classifying images into two classes: the old Brazilian model and the new Mercosur model. The license plates classified as being from the new Mercosur model proceed to the following stages of the process, and the images classified as being from the old Brazilian model are discarded for the rest of the experiment. The next step is to adjust the slopes that make both the detection stage and the character recognition stage difficult. For this, we apply the Hough transform. With the adjustment performed, the detection step is activated as shown in Fig. 2, steps 2 and 3. The next step after plate detection and segmentation is character recognition by Tesseract OCR. The recognition may contain letter and digit placement errors, which are corrected by the subsequent heuristic stage.

Stage 01 - License Plate Classification - This work adapted the Detectron2 convolutional network to train a classification model with two classes: 1 - old Brazilian plate model and 2 - new Mercosur model. The classification model was trained using the old Brazilian plates of the RodoSol-ALPR database and the new Mercosur plates of the LPR-UFC database. The RodoSol-ALPR database contains twenty thousand license plates divided into four sub-datasets with five thousand images each. The LPR-UFC database contains 2,566 images of license plates in the new Mercosur format, as explained in Sect. 3. To balance the databases for training, only 2,566 images from the RodoSol-ALPR dataset, sub-dataset cars-br, were used. The training was performed using 80% of the images for training and 20% for testing, with 100,000 iterations.

Stage 02 - Perspective Adjustment - In this work, we propose the perspective adjustment of license plate images with a negative inclination, for both detection and interpretation by Tesseract-OCR, using the Hough Line Transform (Fig. 2). In 1, we have the original image captured by the cameras; in 2, we crop the area of interest to show the angulation; and in 3, the license plate image with the corrected inclination. The result of processing the license plate image with this algorithm is the correction of the tilt of the plate, which helps both the detection and the recognition of the characters, according to the results shown in Tables 3 and 4 and represented graphically in Fig. 3.

Fig. 2. Figure illustrating the perspective adjustment result.
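The tilt estimation behind Stage 02 can be illustrated with a minimal, pure-Python sketch of the Hough line transform. It is only an illustration of the voting principle under simplifying assumptions (a list of edge points instead of an image, a fixed one-degree angular resolution, mathematical axes with y pointing up); the function name and parameters are hypothetical, and an actual implementation would more likely rely on the Hough line routines of the OpenCV library.

```python
import math


def dominant_line_angle(points, angle_step_deg=1.0, rho_step=1.0):
    """Return the tilt (degrees) of the line that collects the most votes.

    Each point (x, y) votes for every line x*cos(t) + y*sin(t) = rho that
    passes through it, where t is the angle of the line's normal.
    """
    votes = {}
    steps = int(round(180 / angle_step_deg))
    for i in range(steps):
        theta = math.radians(i * angle_step_deg)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = round((x * cos_t + y * sin_t) / rho_step)
            votes[(i, rho)] = votes.get((i, rho), 0) + 1
    (best_i, _), _ = max(votes.items(), key=lambda item: item[1])
    normal_deg = best_i * angle_step_deg
    return normal_deg - 90.0  # a line is perpendicular to its normal


# Hypothetical edge points lying on a plate border tilted by 10 degrees.
tilt = math.radians(10)
edge_points = [(t * math.cos(tilt), t * math.sin(tilt)) for t in range(100)]
estimated_tilt = dominant_line_angle(edge_points)
```

Rotating the image by the opposite of the estimated angle would then bring the plate border back to the horizontal before detection and OCR.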

Stage 03 - License Plate Detection - The license plate detection model was trained using 2,566 positive images from the LPR-UFC [13] dataset and 35,568 negative images [14]. Each positive image was resized to six different sizes with a Width × Height factor of 0.325, totaling 15,396 images, according to Table 1. In Fig. 2, steps 2 and 3, we show how detection, and consequently recognition, benefit from perspective adjustment. After the plate tilt correction, the detection bounding box correctly frames the plate, so that when the image is cropped all characters are intact for the next step, character recognition.

Table 1. License Plate Haar-Cascade Training.

Positive  Negative  Images size
15396     35568     60 × 19, 80 × 26, 100 × 32, 120 × 39, 140 × 45, 160 × 52
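The figures in Table 1 can be checked with a short calculation. The sketch below assumes that each height is the width multiplied by 0.325 and truncated to an integer, which is an inference from the listed sizes rather than a rule stated in the text; the names are hypothetical, and integer arithmetic is used to avoid floating-point rounding.

```python
POSITIVE_IMAGES = 2566  # LPR-UFC images used as positives
WIDTHS = [60, 80, 100, 120, 140, 160]


def plate_size(width, ratio_per_mille=325):
    """Return (width, height) with height = floor(0.325 * width)."""
    return width, width * ratio_per_mille // 1000


sizes = [plate_size(width) for width in WIDTHS]
total_positive = POSITIVE_IMAGES * len(sizes)
```

Under this assumption the six sizes match the table, and resizing each of the 2,566 positives six times yields the 15,396 positive samples reported.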

Stage 04 - License Plate Recognition - Optical character recognition (OCR) is the process of recognizing characters in an image, making it possible to obtain an editable text file [15]. In this work, the character recognition of the license plates is done by the Tesseract library. Tesseract OCR requires the input image to be converted into a binary image, which is then processed by connected component analysis, in which the text/word split is performed using character outlines. The outlined characters are then sent for recognition and processed by an adaptive classifier [16]. Before being passed to Tesseract-OCR, the license plate image cropped from the bounding box found in the detection step goes through a series of pre-processing operations provided by the OpenCV library, such as binarization, smoothing, and morphological operations. The better the image quality (size, contrast, lighting, etc.), the better the result.

Stage 05 - Heuristic - License plates in the new Mercosur format have seven characters in the following sequence: letter, letter, letter, digit, letter, digit, and digit. The result of character recognition by Tesseract-OCR may contain minor errors, with digits where a letter should be and vice versa. A heuristic algorithm is applied to correct these errors in the recognition result.
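The text does not detail the heuristic of Stage 05, so the following is only one plausible sketch of a position-based correction for the Mercosur pattern (letter, letter, letter, digit, letter, digit, digit). The table of look-alike characters and the function name `fix_plate` are assumptions made for illustration.

```python
PATTERN = "LLLDLDD"  # L = letter expected, D = digit expected

# Hypothetical look-alike pairs that OCR engines commonly confuse.
TO_DIGIT = {"O": "0", "I": "1", "Z": "2", "S": "5", "B": "8"}
TO_LETTER = {digit: letter for letter, digit in TO_DIGIT.items()}


def fix_plate(text):
    """Swap look-alike characters that sit in the wrong kind of position.

    Returns the corrected plate, or None when the text does not contain
    exactly seven alphanumeric characters.
    """
    chars = [c for c in text.upper() if c.isalnum()]
    if len(chars) != len(PATTERN):
        return None
    fixed = []
    for char, kind in zip(chars, PATTERN):
        if kind == "L" and char.isdigit():
            char = TO_LETTER.get(char, char)
        elif kind == "D" and char.isalpha():
            char = TO_DIGIT.get(char, char)
        fixed.append(char)
    return "".join(fixed)
```

For instance, an OCR output such as `A8CID2S` would be mapped to `ABC1D25`, while a string that already follows the pattern is returned unchanged. Characters without an entry in the look-alike table are left as they are.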

5 Results and Discussion

In this section, we discuss the results achieved in our experiment. For a better understanding, we have divided the discussion into three subsections: License Plate Classification, License Plate Detection, and License Plate Recognition. The work was developed and tested on a computer with the following configuration: Intel Core i7 processor at 2.9 GHz, 8 GB RAM, Ubuntu 16.04 LTS operating system.

License Plate Classification - The Detectron2 network is a convolutional network capable of detecting regions of interest and delineating the object using bounding boxes, and it can be adapted to identify the object. Once the object has been labeled, the network classifies it into one of two classes: Brazilian license plates of the old model and Brazilian license plates of the new Mercosur format. Our dataset is defined in two classes: the old and Mercosur models. For each image, the classification provides a value that indicates the accuracy with which the plate is assigned to the old or new model. The accuracy of the classification step was averaged over the values provided for each image. As a result, the model obtained 95.48% accuracy, as shown in Table 2.

Table 2. Training and test results, and k-fold cross-validation.

License Plate Classification ACC (%)
Stage     Test   Fold 1  Fold 2  Fold 3
Proposed  95.48  95.51   94.88   95.79

Once the models had been trained with the two classes above (new plate model and old plate model), tests were performed with 20% of the dataset images, covering different types of plates. The trained model obtained relevant accuracy when dealing with authentic images. We performed cross-validation by separating the dataset into three folds, averaging 855 images each. Three training/testing rounds were then run, each with two training folds and one testing fold. The metric values remain similar across the three tested folds, with training on approximately 66% and testing on approximately 33% of the dataset. The convolutional network accurately identified the characteristics of the license plates, making the model effective for distinguishing Mercosur license plates from old license plates.

License Plate Detection - In the detection model proposed by this study, a new model with 17 stages was trained on authentic images of the new Mercosur license plate model from the LPR-UFC dataset [13], resized to six different sizes according to Table 1. The trained model was tested on both the synthetic database and the real database presented by [14] and [13], with and without perspective adjustment, according to Table 1. The experiments showed relevant gains in license plate detection when the perspective adjustment was applied to the images, which makes this approach promising for improving the results. The model trained with the database of real images and applied to the synthetic images without perspective adjustment obtained 83.96% accuracy, a gain of only 0.14 percentage points over the model trained with the database of synthetic images presented by [14]. This similarity is due to the excellent quality of the images in the LPR-UFC dataset [13], whose license plate edges are as well defined as those in the synthetic images. When applied to perspective-adjusted images, the detection accuracy of the work proposed in this article increases from 90.00% to 92.61%, a gain of 2.61 percentage points.


L. F. de F. Souza et al.

When the model trained with the database of real images is applied to the test base of real images, the result improves further, going from 91.00% to 93.00% without perspective adjustment and from 97.00% to 98.00% with perspective adjustment, gains of 2.00 and 1.00 percentage points respectively. This study compared several crossings of models trained on the synthetic image database and on the real image database. It showed that using real images for model training makes detection results more promising. This is because the model learns important details present in real images, such as shadows, dirt, and the camera angle relative to the captured object, among other occlusions.
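The gains quoted in this discussion are differences between two accuracies, so they are percentage points rather than relative improvements. A small sketch that recomputes them from the accuracy values stated in the text; the names `COMPARISONS` and `gain_pp` are introduced here for illustration only.

```python
# Detection accuracies (%) quoted in the text:
# (setting, reference accuracy, proposed-model accuracy).
COMPARISONS = [
    ("synthetic test images, no perspective adjustment", 83.82, 83.96),
    ("synthetic test images, with perspective adjustment", 90.00, 92.61),
    ("real test images, no perspective adjustment", 91.00, 93.00),
    ("real test images, with perspective adjustment", 97.00, 98.00),
]


def gain_pp(reference, proposed):
    """Gain of the proposed model in percentage points, rounded to two decimals."""
    return round(proposed - reference, 2)


if __name__ == "__main__":
    for setting, reference, proposed in COMPARISONS:
        print(f"{setting}: {reference:.2f}% -> {proposed:.2f}% "
              f"({gain_pp(reference, proposed):+.2f} pp)")
```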

Table 3. License plate detection models and respective results.

No Perspective Adjustment
Method | Train | Test | ACC (%)
Proposed Method | Real Dataset | Synthetic Dataset | 83.96
Proposed Method | Real Dataset | Real Dataset | 93.00
Cyro M. G. Saboia (2022) [14] | Synthetic Dataset | Synthetic Dataset | 83.82
Luis Fabrício de F. Souza (2022) [17] | Synthetic Dataset | Real Dataset | 90.00
Cyro M. G. Saboia (2022) [13] | Real Dataset | Real Dataset | 91.00

With Perspective Adjustment
Method | Train | Test | ACC (%)
Proposed Method | Real Dataset | Synthetic Dataset | 92.61
Proposed Method | Real Dataset | Real Dataset | 98.00
Cyro M. G. Saboia (2022) [14] | Synthetic Dataset | Synthetic Dataset | 90.00
Luis Fabrício de F. Souza (2022) [17] | Synthetic Dataset | Real Dataset | 90.00
Cyro M. G. Saboia (2022) [13] | Real Dataset | Real Dataset | 97.00

Fig. 3. Graphical representation of the table of license plate detection models and respective results. [Bar chart: accuracy (%) with and without perspective adjustment for Proposed (Synthetic), Proposed (Real), 2022 [14], 2022 [17], and 2022 [13].]

License Plate Recognition - The results presented in Table 4 show that the character recognition of the proposed method performed worse than the work of [14]. This is because the synthetic images have well-defined highlights and contours, as in drawings with highlighted edges 4, with higher contrast, more precise delimitation of the plate contours, and homogeneous lighting. However, the character recognition results without and with perspective adjustment show gains of 4.70 and 4.33 percentage points, respectively, over the results presented by [17], and a gain of 1.66 and a loss of 1.02 percentage points, respectively, relative to the results given by [13]. This variation between gain and loss is directly related to the quality of the images of the detected license plates 4, since Tesseract OCR will not necessarily perform the recognition correctly. Since the new Mercosur plate model has seven characters in the order letter, letter, letter, digit, letter, digit, and digit, Stage 05 (Fig. 1) corrects eventual confusions between letters and digits. Table 4 presents different works from the state of the art that perform character recognition on the same databases used in the tests of this study. The proposed model presented relevant gains with the help of perspective adjustment.

Table 4. Character recognition models comparison using Tesseract OCR - real and synthetic images.

No Perspective Adjustment
Method | Experiments | ACC (%)
Proposed Method | Real Dataset | 87.71
Cyro M. G. Saboia (2022) [14] | Synthetic Dataset | 95.72
Luis Fabrício de F. Souza (2022) [17] | Real Dataset | 83.01
Cyro M. G. Saboia (2022) [13] | Real Dataset | 86.05

With Perspective Adjustment
Method | Experiments | ACC (%)
Proposed Method | Real Dataset | 87.46
Cyro M. G. Saboia (2022) [14] | Synthetic Dataset | 95.72
Luis Fabrício de F. Souza (2022) [17] | Real Dataset | 83.13
Cyro M. G. Saboia (2022) [13] | Real Dataset | 88.48

6 Conclusion and Future Work

The proposed study brings a new, fully automatic approach for detecting and classifying the type of Brazilian license plates (old model and new Mercosur model). The new model includes an efficient post-capture perspective adjustment of the license plate in digital images. The study compared models trained on synthetic and on real images. Using Haar Cascade for detection, the proposed model reached 98.00% accuracy with perspective adjustment, surpassing the state of the art for this type of problem on the same databases. The model was able to classify the detected vehicle license plates into old and new (Mercosur model) plates, obtaining an accuracy of 95.48%. After the classification, the model presented satisfactory results in the perspective adjustment process, maintaining the equivalence between the reading and identification results of the characters in the captured license plates, reaching 87.46% accuracy. We therefore conclude that the proposed model detects license plates accurately, classifies them efficiently, and identifies their characters effectively. For future work, we propose to address new problems, such as occlusion, night images, and different camera types.

Acknowledgement. The authors would like to thank the Ceará State Foundation for the Support of Scientific and Technological Development (FUNCAP) for the financial support (grant #6945087/2019).

References

1. Lee, Y.Y., Halim, Z.A., Ab Wahab, M.N.: License plate detection using convolutional neural network-back to the basic with design of experiments. IEEE Access 10, 22577–22585 (2022)
2. Ibadov, S., Ibadov, R., Kalmukov, B., Krutov, V.: Algorithm for detecting violations of traffic rules based on computer vision approaches. In: MATEC Web of Conferences, vol. 132, p. 05005 (2017)
3. Mukhija, P., Dahiya, P.: Challenges in automatic license plate recognition system: an Indian scenario (2021)
4. Laroca, R., Cardoso, E.V., Lucio, D.R., Estevam, V., Menotti, D.: On the cross-dataset generalization in license plate recognition. In: VISAPP, pp. 166–178 (2022)
5. Cuimei, L., Zhiliang, Q., Nan, J., Jianhua, W.: Human face detection algorithm via Haar cascade classifier combined with three additional classifiers. In: 2017 13th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), pp. 483–487 (2017)
6. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE, CVPR 2001, vol. 1, pp. I–I, April 2001
7. Raghavan, R., Verma, D.C., Pandey, D., Anand, R., Pandey, B.K., Singh, H.: Optimized building extraction from high-resolution satellite imagery using deep learning. Multimedia Tools Appl. 81, 1–15 (2022)
8. Sindhwani, N., Anand, R., Meivel, S., Shukla, R., Yadav, M.P., Yadav, V.: Performance analysis of deep neural networks using computer vision. EAI Endors. Trans. Ind. Netw. Intell. Syst. 8(29), e3–e3 (2021)
9. Yumang, A.N., Juana, M.C.M.S., Diloy, R.L.C.: Detection and classification of defective fresh excelsa beans using mask r-CNN algorithm. In: 2022 14th International Conference on Computer and Automation Engineering (ICCAE), pp. 97–102 (2022)
10. Valdeos, M., Velazco, A.S.V., Paredes, M.G.P., Velásquez, R.M.A.: Methodology for an automatic license plate recognition system using convolutional neural networks for a Peruvian case study. IEEE Latin Am. Trans. 20(6), 1032–1039 (2022)
11. Park, S.-H., Yu, S.-B., Kim, J.-A., Yoon, H.: An all-in-one vehicle type and license plate recognition system using YOLOv4. Sensors 22, 921 (2022)
12. Pham, V., Pham, C., Dang, T.: Road damage detection and classification with detectron2 and faster R-CNN. In: 2020 IEEE International Conference on Big Data (Big Data), pp. 5592–5601 (2020)
13. Sabóia, C.M.G., et al.: Fully automatic LPR method using Haar cascade and perspective adjustment for real mercosur license plates (Submitted for publication)
14. Sabóia, C.M.G., Filho, P.P.R.: Brazilian Mercosur license plate detection and recognition using Haar cascade and tesseract OCR on synthetic imagery. In: Abraham, A., Gandhi, N., Hanne, T., Hong, TP., Nogueira Rios, T., Ding, W. (eds.) Intelligent Systems Design and Applications. ISDA 2021. LNCS, vol. 418. Springer, Cham. https://doi.org/10.1007/978-3-030-96308-8_79
15. Dome, S., Sathe, A.P.: Optical charater recognition using tesseract and classification. In: 2021 International Conference on Emerging Smart Computing and Informatics (ESCI), pp. 153–158 (2021)
16. Audichya, M., Saini, J.: A study to recognize printed Gujarati characters using tesseract OCR. Eng. Technol. Appl. Sci. Res. 5, 1505–1510 (2017)
17. Souza, L.F.D.F.: New approach to the detection and recognition of Brazilian Mercosur plates using Haar cascade and tesseract OCR in real images, vol. 17, pp. 144–153 (2022)

Ensemble of Classifiers for Multilabel Clinical Text Categorization in Portuguese

Orrana Lhaynher Veloso Sousa1, David Pereira da Silva2, Victor Eulalio Sousa Campelo3, Romuere Rodrigues Veloso e Silva1,2(B), and Deborah Maria Vieira Magalhães1,2

1 Electrical Engineering Department, Federal University of Piaui, Picos, Brazil
{romuere,deborah.vm}@ufpi.edu.br
2 Information Systems Department, Federal University of Piaui, Picos, Brazil
3 Specialized Medicine Department, Federal University of Piaui, Teresina, Brazil

Abstract. The widespread adoption of medical document management has generated a large volume of unstructured data containing abbreviations, ambiguous terms, and typing errors. These factors make manual categorization an expensive, time-consuming, and error-prone task. Thus, the automatic classification of medical data into informative clinical categories can substantially reduce the cost of this task. In this context, this work aims to evaluate the use of an ensemble of classifiers of clinical texts to differentiate them into prescriptions, clinical notes, and exam requests. For this, we used the combination of N_gram+TF-IDF and BERTimbau to vectorize the text. Then, we used the classifiers Random Forest, Multilayer Perceptron, and Support Vector Machine to create the ensemble. After that, we predict the final ensemble label through a voting approach. The results are promising, reaching an accuracy of 0.99, kappa of 0.99, and F1-score of 0.99. Our approach allows automatic and accurate classification of clinical texts, achieving better categorization results than individual approaches.

Keywords: Clinical data · Ensemble · Embeddings · Classification

1 Introduction

The growing adoption of technological solutions for automating hospital processes, such as systems for the management and registration of medical documents, has produced a large amount of unstructured data [20]. This data comes in free text format and has multiple expressions describing the same clinical condition or procedure. Automatic text classification is an essential task for significantly reducing human workload through the digitization of structured information and intelligent decision support [11]. Classifying relevant medical data into clinical informational categories can reduce the cost of human labor in medical services.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 42–51, 2023. https://doi.org/10.1007/978-3-031-35507-3_5


Such data can be classified into symptoms, illnesses, and documents, such as discharge summaries, medical reports, or clinical notes. However, classifying clinical data is still a challenging task. Many records contain abbreviations, ambiguous terms, misspellings, high noise levels, sparseness, complex medical vocabularies, and grammatical errors [13]. Also, in the medical domain, creating labeled training data requires significant effort due to i) the lack of publicly available clinical corpora for privacy reasons and ii) the medical knowledge required to annotate the clinical texts [21]. Traditional machine learning methods may perform poorly on complex data, such as clinical documents. A single technique does not always guarantee high precision [3]. Ensemble methods were developed in this context, combining more than one technique to solve the same task [7,10]. In this scenario, this work aims to evaluate a classification ensemble to categorize clinical texts into prescriptions, clinical notes, and exam requests. We used different vectorization techniques for text representation. We also analyzed the Snorkel framework [15] to create weakly labeled training sets. Then, the ensemble formed by the Support Vector Machine [1], Random Forest [19], and Multilayer Perceptron [5] algorithms performs the classification. The main contributions of this work are (1) the availability of the source code and dataset of clinical documents in Portuguese labeled in the classes prescriptions, clinical notes, and exam requests (footnote 1); (2) the analysis of the Snorkel framework results in creating labels for the training set; (3) the proposal of an ensemble for clinical document classification. The rest of the article is organized as follows: Sect. 2 presents related work and Sect. 3 describes the methodology. Section 4 presents the results and a discussion of them. Finally, Sect. 5 brings final considerations and future work.

2 Related Work

Automatic text classification is an effective method to categorize files into predefined labels, such as document, sentence, and character. Thus, this technique has been used to label medical documents, facilitating the organization and extraction of information. Table 1 presents studies in the literature on the use of classification in health-related texts, especially clinical data. [2] used a weakly supervised method to detect suicidal ideation from unstructured clinical notes in electronic health record (EHR) systems. The work [4] aimed to evaluate a machine learning (ML)-based phenotyping approach to identify patients with immune-related adverse events (irAEs) from clinical notes. In [12], ML classification models were used on a set of radiological reports to assign protocols of medical imaging procedures in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) exams. The authors in [6] automated the analysis and classification of drug mechanism of action (MOA) in outpatient Alzheimer's disease (AD) treatment texts. In the work [10], the authors trained ML models and different ensembles of these models to detect medial or lateral meniscus tears in free-text MRI reports. [9] developed a classification system for the recognition of 16 comorbidities from clinical records of patients.

The works mentioned above classified datasets in English and Spanish with different purposes. On the other hand, [17] analyzed text descriptions in Brazilian Portuguese. However, the 15,479 descriptions refer to surgeries and postoperative records and are used to predict and detect infections in surgical centers. The authors in [16] automatically estimated the Charlson comorbidity index (CCI), used to predict the mortality of patients with comorbidities. As shown, deep approaches did not achieve much better results than classical techniques in the related works. In this context, our work performs the classification of routine medical texts using a Brazilian Portuguese dataset. We considered three classes: prescriptions, clinical notes, and exam requests. Inspired by the literature, this work proposes an ensemble as a classification model. It consists of two steps: (1) representation with different vectorization techniques, and (2) classification with three ML algorithms.

Table 1. Related works.

Method | Document type | Vectorization | Language
[2] | Clinical notes for detecting suicidal ideation | BOW, bigram, TF-IDF and Word2Vec | English
[4] | Clinical notes to identify immune-related adverse events (irAEs) | TF-IDF and BioWordVec | English
[12] | Radiological reports to assign medical imaging procedure protocols | Unigram + TF-IDF | Spanish
[6] | Alzheimer's treatment texts to classify the drug's mechanism of action (MOA) | TF-IDF | English
[10] | MRI reports to detect medial or lateral meniscus tear | BOW + n_gram (1, 2 and 3) | English
[9] | Clinical records for recognition of 16 comorbidities | BOW, TF-IDF, Word2Vec, GloVe, fastText and Universal Sentence Encoder | English
[17] | Descriptions of surgeries and postoperative records for infections prediction | N_gram (1, 2 and 3) + TF and TF-IDF | Portuguese
[16] | Clinical notes for predicting the Charlson comorbidity index (CCI) | Unigram and multigram + TF-IDF, LDA and Word2Vec | Portuguese
Proposed | Prescriptions, clinical exams and clinical notes | N_gram (1 and 2) + TF-IDF and BERTimbau | Portuguese

1 The source code and the dataset of our study are publicly available at https://github.com/pavic-ufpi/ISDA_Clinical_Text.

3 Proposal

This section presents the clinical text classification methodology, which is divided into the following steps: dataset acquisition, preprocessing and labeling, vectorization, ensemble, and validation, as illustrated in Fig. 1.

3.1 Dataset Acquisition

The database used in this work comprises prescriptions, referrals, certificates, reports, clinical notes, and exam requests that physicians produced during face-to-face consultations. There were 3,000 samples collected from May 10, 2010, to August 11, 2021. Each sample contains the following columns: name of the patient, name of the doctor, registration number of the professional in the Regional Council of Medicine (CRM), federal unit (UF) of the CRM, date of consultation, and text of the clinical document, which is in rich text format (RTF). We considered three classes for this research: prescriptions, clinical notes (reports, referrals, certificates, and medical notes), and exam requests. We chose these classes due to the future interest in extracting specific information from these categories by using named entity recognition (NER).

Fig. 1. Methodology for classifying medical documents.

3.2 Preprocessing and Labeling

Before vectorizing the text, we preprocess the data to maintain the confidentiality of the health professional and the patient, and to increase the quality of its representation. Most medical texts were in RTF format, so we converted the files to strings with the rtf_to_text method from the striprtf package. We then manually removed all columns containing private information, as well as additional confidential information in the samples. Next, we removed accents and excess punctuation and replaced the decimal comma with a period. In addition, we removed the whitespace characters left over from the conversion to string: \t, \n and \r. Finally, we converted all the text to lowercase. Table 2 presents examples from the dataset.

Table 2. Examples of the content of preprocessed samples by database category.

Category | Preprocessed text
Prescriptions | uso oral vertix 1 cx tomar um comp via oral a noite por 2 meses
Clinical notes | ao dentista, encaminho a paciente com suspeita de bruxismo para avaliacao.
Request for medical exams | solicitacao de exame: solicito rm do joelho direito

The first labeling was done manually by two specialists (pharmacists) and one non-specialist. After all 3,000 samples had been labeled (1,000 samples per document type), the two experts validated the manual labels. Then, we split the dataset into training and testing sets, with 2,100 and 900 samples, respectively. Since clinical data are sensitive and labeling them requires medical knowledge, weak supervision becomes relevant. Thus, we use Snorkel to create a second training set with weak labels. It allows users to generically specify multiple sources of programmatic weak supervision, such as rules and patterns over text, that can vary in accuracy and coverage and be arbitrarily correlated [14]. We created labeling functions using regex, keywords, and half of the training dataset samples with the labels validated by the experts, and used them as framework input. After that, we combined the predicted weak labels with the training dataset. In the end, we had two training datasets, one with ground-truth labels and the other with weak labels.

We used Principal Component Analysis (PCA) [22] to visualize the dataset with the BERTimbau [18] representation. Figure 2 shows the density of the clusters. The prescription class has a denser cluster than the clinical notes and exam request classes. Besides that, there is a more significant overlap between data from the clinical notes and exam request classes, while the prescription data remain in a more distant region of the representation space. In addition, there are some outliers between the clusters, mainly in the clinical note class.

Fig. 2. Distribution of classes with the PCA algorithm in BERTimbau embeddings.

3.3 Selection of Vectorization Techniques

In order to increase the variety of representations of the samples, we selected two approaches for vectorizing the text. Initially, we tested four methods and, from them, selected the two approaches. The candidate techniques were BERTimbau, the pre-trained Brazilian Portuguese version of the BERT model; Word2Vec and its extension for document representation, Doc2Vec [8]; and N_gram together with TF-IDF [23], applied to both the dataset created by Snorkel and the dataset with ground-truth labels. We used the unigram and bigram representation, resulting in a 1,400 × 5,188 array of numerical values for the N_gram + TF-IDF representation. For BERTimbau, we used the base version ('bert-base-portuguese-cased'), and the output was a 1,400 × 768 matrix. We trained the last two models on the samples for the vectorization algorithm selection step. In Word2Vec, the model used was the continuous bag-of-words, and the result has dimension 1,400 × 100, while Doc2Vec produced a 1,400 × 128 representation. For the vectorization approaches, we used 1,400 samples from the training dataset. Afterwards, we used the output of these four methods as input to the SVM classifier and employed 10-fold cross-validation. Based on the results obtained, the two methods selected were N_gram combined with TF-IDF, representing the datasets with ground-truth and weak labels, and BERTimbau.

3.4 Ensemble and Validation

In order to overcome the weaknesses of individual techniques and consolidate their strengths, we developed a classification ensemble for clinical documents. It consists of a set of individually trained classifiers whose predictions are combined to form the final prediction. We used simple average voting, that is, the final label is taken from the average of the class probabilities predicted by the models. We trained the ensemble using 2,100 samples with the following classifiers: Random Forest (RF) [19], Multilayer Perceptron (MLP) [5], and Support Vector Machine (SVM) [1]. They were selected because, first, they are widely used in the classification of clinical documents and, second, their prediction probabilities by class are easy to obtain. Each classifier received as input the three embeddings defined in the previous step: the N_gram+TF-IDF representations of the training dataset with ground-truth labels and with weak labels, with dimension 2,100 × 7,442, and the BERTimbau representation, with dimension 2,100 × 768. In SVM, we set C to 1.0 and used the radial basis function as kernel. In RF, we set the number of trees to 100 with no maximum tree depth. In MLP, we used 100 neurons in the hidden layer. We adopted 10-fold cross-validation to create the nine sets of labels used in the ensemble. After combining the labels, we used the ensemble to predict the 900 samples of the test dataset. The performance of the different stages of the work was evaluated with the metrics accuracy (Acc), kappa (Kap), precision (Prec), recall (Rec), F1-score (F1), and area under the ROC curve (AUC).
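The simple average voting described above can be sketched as follows, assuming each trained model outputs one probability per class for every sample. The function name `soft_vote`, the class identifiers, and the toy probabilities are illustrative assumptions and not the authors' code; in the paper the average would run over the nine classifier and embedding combinations.

```python
# Class names used only for this illustration (assumption).
CLASSES = ["prescription", "clinical_note", "exam_request"]


def soft_vote(prob_sets):
    """Average class probabilities over models and pick the most probable class.

    prob_sets[m][i][c] is the probability given by model m
    to class c for sample i.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    labels = []
    for i in range(n_samples):
        # Mean probability of each class across all models for sample i.
        avg = [
            sum(model[i][c] for model in prob_sets) / n_models
            for c in range(len(CLASSES))
        ]
        labels.append(CLASSES[avg.index(max(avg))])
    return labels


if __name__ == "__main__":
    # Toy outputs of three hypothetical models for two samples.
    model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
    model_b = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
    model_c = [[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]
    print(soft_vote([model_a, model_b, model_c]))
```

In the toy data, the second model prefers a different class for the first sample, but the averaged probabilities still favor the class supported by the other two, which is the behavior that motivates averaging over a single model's decision.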

4 Results and Discussions

4.1 Embeddings Definition Results

Table 3. Results obtained using combinations of embeddings with the SVM classifier.

Embedding | Acc | Kap | Prec | Rec | F1 | AUC
BERTimbau | 0.99±0.01 | 0.98±0.02 | 0.99±0.01 | 0.99±0.01 | 0.99±0.01 | 1.00±0.00
Doc2Vec | 0.73±0.02 | 0.59±0.03 | 0.74±0.02 | 0.73±0.02 | 0.73±0.02 | 0.91±0.02
N_gram + TF-IDF | 0.99±0.01 | 0.98±0.02 | 0.99±0.01 | 0.99±0.01 | 0.99±0.01 | 1.00±0.00
Snorkel + N_gram + TF-IDF | 0.96±0.03 | 0.94±0.04 | 0.96±0.03 | 0.96±0.03 | 0.96±0.03 | 0.99±0.01
Word2Vec | 0.94±0.02 | 0.91±0.03 | 0.94±0.02 | 0.94±0.02 | 0.94±0.02 | 1.00±0.00

Table 3 presents the results achieved by the vectorization techniques. BERTimbau and N_gram+TF-IDF with ground-truth and weak labels had the best performances. Word2Vec also achieved excellent results but was inferior to the approaches already mentioned, while Doc2Vec had the worst performance. Two factors explain the results obtained by Doc2Vec: (1) this model generally requires longer and more informative documents, and (2) to obtain more conclusive results, Doc2Vec needs a rich and diversified vocabulary, that is, one with a greater multiplicity of unique tokens. Our dataset has, for the most part, short documents and extensive repetition of tokens in the samples. This repetition comes from the already widespread use of a standard structure for creating these clinical texts. Thus, Word2Vec, which relies on word association, tends to have better results with the proposed dataset than models for document representation. The representations of the dataset with ground-truth labels obtained with the BERTimbau and N_gram+TF-IDF methods reached the same metric values. BERTimbau, a model pre-trained in Brazilian Portuguese, is a word embedding method that returns different vectors for the same term depending on the context of use, and can therefore create strongly contextual embeddings. Similarly, N_gram+TF-IDF can capture word associations through the sequences of tokens used, keeping the context of each sample. These models produce contextual representations and find patterns in the samples that facilitate the discrimination between classes. The results with the Snorkel framework, a methodology that uses weak labels, are encouraging since they are numerically comparable to approaches trained only on the dataset with ground-truth labels, with 0.96 ± 0.03 in accuracy, precision, recall and F1-score, 0.94 ± 0.04 in kappa and 0.99 ± 0.01 in AUC.

4.2 Classification Results

Table 4 presents the classification results obtained with the test dataset. There is significant similarity in the classification results, regardless of the approach used. With a particular structuring already widely used in its creation, the categorization of the dataset does not become a complex task. Prescription samples,for

Ensemble of Classifiers for Multilabel Clinical Text Categorization

49

Table 4. Results obtained with the prediction. Embeddings + Classifiers Acc N_gram + TF-IDF + RF 0.99 Snorkel + N_gram + TF-IDF + RF 0.97 BERTimbau + RF 0.98 N_gram + TF-IDF + MLP 0.99 Snorkel + N_gram + TF-IDF + MLP 0.97 BERTimbau + MLP 0.99 N_gram + TF-IDF + SVM 0.99 Snorkel + N_gram + TF-IDF + SVM 0.99 BERTimbau + SVM 0.98 Proposed Ensemble 0.99

Kap 0.99 0.95 0.96 0.99 0.95 0.99 0.99 0.98 0.96 0.99

Prec 0.99 0.97 0.98 0.99 0.97 0.99 0.99 0.99 0.98 0.99

Rec 0.99 0.97 0.98 0.99 0.97 0.99 0.99 0.99 0.98 0.99

F1 0.99 0.97 0.98 0.99 0.97 0.99 0.99 0.99 0.98 0.99

AUC 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 -

Table 5. Distribution of the number of samples misclassified in the prediction. Embeddings + Classifiers Prescriptions N_gram + TF-IDF + RF 1 Snorkel + N_gram + TF-IDF + RF 2 BERTimbau + RF 1 N_gram + TF-IDF + MLP 0 Snorkel + N_gram + TF-IDF + MLP 1 BERTimbau + MLP 1 N_gram + TF-IDF + SVM 0 Snorkel + N_gram + TF-IDF + SVM 0 BERTimbau + SVM 1 Ensemble 0

Notes 1 18 13 2 23 4 1 5 13 4

Exams 5 7 8 4 6 3 6 5 8 4

Total 07 27 22 06 30 08 07 10 22 08

example, follow a formative pattern with characteristics such as route of administration, dosage, and drug name. Exam requests present information such as the name of the procedure or exam and the area or location of the procedure. This standardization, combined with the context of each sample preserved by the embeddings, makes the separation between classes highly efficient. Table 5 compares the number of misclassified samples per category obtained with the ensemble and with the individual classifiers that constitute it. Numerically, the N_gram+TF-IDF + MLP approach had the lowest number of errors (6 in total), and this vectorization method produces the best performance regardless of the classifier used. Other classification approaches also produced few errors, such as BERTimbau + MLP and the weakly labeled dataset with N_gram+TF-IDF + SVM. The prescription class is the easiest to separate from the other categories (see Fig. 2). The exam-request class shows a similar number of errors regardless of the methodology. In the clinical notes class, classifiers trained with Snorkel+N_gram+TF-IDF had the worst performance, except with SVM; weak labeling explains this behavior. Snorkel had difficulty separating this class because some samples contained ambiguous elements or did not offer enough information to be identified. Even with these results, Snorkel achieved performance similar to that obtained with the validated-label dataset, which supports its effectiveness in creating weakly labeled training datasets. It is also important to emphasize the use of an ensemble as the proposed classifier. Ensembles tend to produce more accurate results than single approaches because predictions from different sources are merged, which reduces the spread or dispersion of the predictions and of model performance. In this way, ensembles yield models with greater robustness and more reliable average performance.
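To illustrate how merging predictions can cancel the errors of individual classifiers, the sketch below combines three base classifiers by majority vote. It is a minimal illustration only: the voting rule, the `majority_vote` helper, and the toy predictions are assumptions made here, since this section does not restate how the ensemble combines its members.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the base-classifier predictions.

    Ties are broken in favour of the label that appears first in the input.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Toy predictions from three hypothetical base classifiers, one list per sample.
per_sample_predictions = [
    ["prescription", "prescription", "note"],  # one classifier errs
    ["note", "exam", "note"],                  # a different classifier errs
    ["exam", "exam", "exam"],                  # all agree
]
ensemble_labels = [majority_vote(p) for p in per_sample_predictions]
```

In this toy case each isolated error is outvoted by the other two members, which is the intuition behind the reduced dispersion described above.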

5 Conclusion

In this work, we proposed an ensemble for classifying texts of prescriptions, clinical notes, and exam requests in Brazilian Portuguese. We conclude that weak supervision can be used to train machine learning models that label clinical texts with high efficiency. The dataset with ground-truth labels used in this work is available together with the source code to support the development of tools for the automatic classification of clinical text. As limitations of this work, we point out the size of the dataset and its high standardization in terms of sample formats per class. Thus, in future work, we will add samples that do not contain the typical elements of the classes and partition the classes into other categories of interest, such as industrialized and manipulated prescriptions, referrals, medical certificates, and others.


Semantic Segmentation Using MSRF-NET for Ultrasound Breast Cancer

Hamza Hadri, Abderahhim Fail, and Mohamed Sadik

NEST Research Group, LRI Lab, ENSEM of Hassan II University of Casablanca, Casablanca, Morocco
{hamza.hadri.doc21,a.fail,m.sadik}@ensem.ac.ma

Abstract. Deep learning stands out as one of the best-performing approaches in the search for reliable methods to classify or segment breast cancer. The classical approach relies on an expert to identify the type of tumour and segment the area of interest, which takes considerable time and resources. Researchers around the world design new deep learning architectures to improve quality and reduce segmentation time. One of these architectures, MSRF-NET (Multi-Scale Residual Fusion Network), demonstrates promising results, e.g., a Dice coefficient (DSC) of 0.9420 and 0.9217 on the CVC-ClinicDB and Kvasir-SEG datasets, respectively. Its main novelty resides in exchanging features of varying scale through the Dual-Scale Dense Fusion (DSDF) block, which gives the model a strong capability for segmenting objects of various sizes. This paper proposes an implementation of MSRF-NET for segmenting breast cancer in ultrasound images. We investigate the influence of batch size, number of epochs, input size, and learning rate on training and testing. The best-performing model achieved 93.95% in Dice coefficient, 96.1% in specificity, 94.2% in precision, and 93.7% in recall. Lastly, we compare the performance of MSRF-NET with that of U-net and its variants, U-net++ and V-net.

Keywords: Ultrasound · Breast cancer · MSRF-NET · Semantic segmentation

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 52–62, 2023. https://doi.org/10.1007/978-3-031-35507-3_6

1 Introduction

Breast cancer has emerged as one of the most widespread cancers among women [1, 2], owing to anatomical differences from men. Breast cancer can be identified through masses or calcifications; masses are classified as malignant or benign and are further characterized by their margin or shape. As with most cancers, early diagnosis significantly decreases the probability of fatality. Hence, imaging modalities pave the way to reliable breast cancer screening. These modalities range from the most expensive, such as computed tomography (CT) and magnetic resonance imaging (MRI), to mammography and ultrasound (US), which sits at the cost-effective end of the range. US is a non-radioactive, non-invasive modality that can achieve high resolution with real-time acquisition. All of these screening methods are complex to interpret and require trained specialists, which motivates automatic cancer classification and segmentation.

In recent years, computer-aided diagnosis and detection (CAD) systems have become powered by machine learning [3], specifically deep learning, which has improved rapidly. The classical methods include, but are not limited to, local thresholding, region growing [4], edge detection, and the Markov Random Field (MRF) technique. The first method demonstrates good stability, although its initialization parameters are crucial; the edge-based segmentation method uses the wavelet transform to capture the variation in US images. The MRF delivers reliable segmentation, but its iterations are time-consuming [5].

Deep neural networks have reached cutting-edge performance in image segmentation over the last decade, particularly for biomedical images, with the introduction of the U-net [6] architecture, which is widely considered the gold standard in image segmentation and achieves competitive performance in terms of F1-score and IoU [7]. The success of U-net inspired similar models such as DsU-net [7], and other encoder-decoder structures that combine a contracting path and an expansive path with up-sampling operators to increase the resolution of the output. The vanishing gradient is a severe problem when training a deep neural network. ResNet [8] exploits skip connections to tackle this issue and consequently improves training by further minimizing the loss function and raising the Dice coefficient score. U-net++ is an improved U-net architecture with a deeply connected encoder-decoder [9]; its redesigned skip pathways reduce the mismatch between the encoder and decoder feature maps. Another structure derived from U-net is V-net [10], designed specifically for volumetric 3D image segmentation and its complications, although it also shows strong results for 2D segmentation.

The main challenge in biomedical image segmentation resides in small datasets and variable object sizes, which lead to unstable performance or to complex models that perform differently depending on the complexity of the image features and the size of the dataset. Multi-scale fusion architectures were introduced to address these problems. MSRF-NET [11] (Multi-Scale Residual Fusion Network) is an encoder-decoder structure that adopts Dual-Scale Dense Fusion (DSDF) blocks to fuse high- and low-resolution features. The multi-scale fusion performed by the DSDF blocks improves segmentation in biomedical images. In this paper, we evaluate the performance of MSRF-NET in segmenting ultrasound breast cancer images from the dataset in [12]. The assessment concentrates on training and testing using widely employed metrics: DSC, precision, recall, and specificity. Furthermore, we compare the testing results of MSRF-NET with those of U-net, U-net++, and V-net. The rest of the paper is organized as follows: Sect. 2 provides an overview of MSRF-NET, Sect. 3 explains the implementation and evaluation metrics, Sect. 4 discusses the experimental results, and Sect. 5 concludes this work.

2 An Overview of MSRF-NET

The residual dense block (RDB) [13] increases the circulation of image features between each dense layer and the preceding ones. This flow of features improves the ability of the network to capture composite features and, consequently, the segmentation. In addition, multi-scale feature extraction helps construct a more precise feature map, and multi-scale fusion [11] facilitates the exchange of features between high and low resolutions in both directions. Overall, incorporating DSDF and multi-scale fusion reinforces the performance of MSRF-NET.

2.1 The Encoder

The encoder blocks consist of successive convolution layers followed by squeeze-and-excitation blocks, which compute the dependencies among channels: in the squeeze step, global average pooling aggregates the feature maps, while the excitation step captures the channel dependencies. A dropout layer is added for regularization.

Fig. 1. Block diagram of MSRF-NET (Encoder, Decoder, Shape Stream MSRF Sub-Network).

2.2 The MSRF Sub-network

The MSRF sub-network contains several DSDF blocks (see Fig. 2). Their arrangement increases the flow of features of various scales while preserving resolution throughout the segmentation process. In an individual DSDF block, two streams, described by Eqs. 1 and 2, process the low-resolution ($X_l$) and high-resolution ($X_h$) feature maps, respectively. Each stream operation is abbreviated as CLR (convolution and LeakyReLU): a 3 × 3 convolution filter followed by a LeakyReLU activation function (see Fig. 3). Equation 1 models the n-th operation of the low-resolution stream, and Eq. 2 does the same for the high-resolution stream.

$$M_{n,l} = \mathrm{CLR}\left(M_{n-1,h} \oplus M_{n-1,l} \oplus M_{n-2,l} \oplus \cdots \oplus M_{0,l}\right) \tag{1}$$

$$M_{n,h} = \mathrm{CLR}\left(M_{n-1,l} \oplus M_{n-1,h} \oplus M_{n-2,h} \oplus \cdots \oplus M_{0,h}\right) \tag{2}$$


Fig. 2. Description of MSRF-NET sub-network.

Here, ⊕ denotes concatenation and n indexes the CLR operations. Srivastava et al. [11] used five CLR operations with a growth factor k equal to 2, where k sets the number of features passed further into the network, i.e., the number of output channels. In addition, a scaling factor w = 0.4 is applied, where w ∈ [0, 1] is a property of local residual learning. Equations 3 and 4 give the output of a single DSDF block.

$$X_h = w \times M_{5,h} + X_h \tag{3}$$

$$X_l = w \times M_{5,l} + X_l \tag{4}$$
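The dense cross-stream connectivity and the residual scaling of a DSDF block can be sketched in a few lines. The code below is a toy illustration under strong assumptions, not the MSRF-NET implementation: it works on a single spatial position, the `clr` helper replaces the 3 × 3 convolution with an equal-weight average of the input channels, both streams are given the same number of channels (which plays the role of k), and the resampling needed to exchange features between resolutions is ignored. The function names are hypothetical.

```python
def leaky_relu(x, slope=0.01):
    """LeakyReLU on a scalar."""
    return x if x >= 0 else slope * x


def clr(channels, k):
    """Toy stand-in for CLR: average the input channels (a convolution with
    equal weights), apply LeakyReLU, and emit k output channels."""
    activated = leaky_relu(sum(channels) / len(channels))
    return [activated] * k


def dsdf_block(x_l, x_h, n_ops=5, w=0.4):
    """One DSDF block at a single spatial position.

    x_l, x_h: channel lists of the low- and high-resolution streams; their
    common length plays the role of the growth factor k (an assumption).
    """
    k = len(x_l)
    m_l, m_h = [x_l], [x_h]  # feature maps with index 0 for each stream
    for _ in range(n_ops):
        # Each stream sees the other stream's latest map plus all of its own
        # earlier maps, joined by channel concatenation.
        in_l = m_h[-1] + [c for m in reversed(m_l) for c in m]
        in_h = m_l[-1] + [c for m in reversed(m_h) for c in m]
        m_l.append(clr(in_l, k))
        m_h.append(clr(in_h, k))
    # Local residual learning: output = w * (last map) + input.
    out_l = [w * m + x for m, x in zip(m_l[-1], x_l)]
    out_h = [w * m + x for m, x in zip(m_h[-1], x_h)]
    return out_l, out_h


out_l, out_h = dsdf_block([1.0, 1.0], [1.0, 1.0])
```

With all-ones inputs every intermediate map stays at 1, so each output channel is 0.4 × 1 + 1, which makes the effect of the scaling factor w easy to check by hand.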

2.3 Shape Stream

MSRF-NET uses a gated shape stream (see Fig. 1), whose operation is given by Eqs. 5 and 6.

$$\alpha_l = \sigma\left(\mathrm{Conv}_{1\times 1}\left(S_l \oplus X\right)\right) \tag{5}$$

$$S_{l+1} = \mathrm{RB}\left(S_l \times \alpha_l\right) \tag{6}$$

Here, $\alpha_l$ is the attention map of the gated convolution. A 1 × 1 convolution is first applied to the concatenation of $S_l$ and $X$, where $S_l$ is the shape-stream feature map at layer $l$ and $X$ is the output of the sub-network; bilinear interpolation resolves any spatial dimension mismatch between $S_l$ and $X$. The sigmoid activation function σ then produces the attention map. Finally, a residual block RB with two CLR operations computes $S_{l+1}$.

2.4 The Decoder

The decoder blocks (D1–D3) employ two attention structures: the first is gated attention, and the second uses spatial and channel attention. Before the final output of each decoder block, two CLR operations are applied in succession.
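Returning to the shape stream of Sect. 2.3, the gating in Eqs. 5 and 6 can be illustrated at a single pixel. This is a minimal sketch under assumptions: the 1 × 1 convolution reduces to a weighted sum over the concatenated channels, the weights are made up for the example, and both the residual block RB and the bilinear interpolation are omitted. The `gate` helper is hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def gate(s_l, x, weights, bias=0.0):
    """Compute the attention value and the gated shape features at one pixel.

    s_l, x: channel lists of the shape stream and of the sub-network output.
    weights: one weight per concatenated channel (the 1x1 convolution).
    """
    concat = s_l + x  # channel concatenation
    if len(weights) != len(concat):
        raise ValueError("one weight per concatenated channel is required")
    alpha = sigmoid(sum(wt * c for wt, c in zip(weights, concat)) + bias)
    gated = [alpha * c for c in s_l]  # element-wise product with the gate
    return alpha, gated


# With all-zero weights the gate is exactly 0.5 and halves the shape features.
alpha, gated = gate([2.0, 4.0], [1.0, 3.0], [0.0, 0.0, 0.0, 0.0])
```

In the full network the gated features would then pass through the residual block to give the next shape-stream feature map.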


Fig. 3. The Layers of Dual-Scale Dense Fusion (DSDF) block in MSRF-NET.

2.5 Loss Function

In image segmentation tasks, the binary cross-entropy loss ($L_{BCE}$) is the usual first choice. In [11], the authors combine $L_{BCE}$ with the Dice loss ($L_{DCE}$). The resulting loss is the sum of Eqs. 7 and 8, where $\hat{y}$ is the predicted value and $y$ is the ground truth. Furthermore, deep supervision is added to improve the flow of gradients and regularization (see Fig. 1).

$$L_{BCE} = (y - 1)\log\left(1 - \hat{y}\right) - y\log\hat{y} \tag{7}$$

$$L_{DCE} = 1 - \frac{2 y \hat{y} + 1}{y + \hat{y} + 1} \tag{8}$$
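A small numerical sketch of the combined loss may help. It assumes flattened masks given as Python lists, averages the cross-entropy term over pixels, sums the Dice terms over pixels, and clips predictions by a small `eps` to avoid log(0); these choices and the function names are assumptions made for illustration rather than details taken from [11].

```python
import math


def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, averaged over pixels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
        total += (y - 1.0) * math.log(1.0 - p) - y * math.log(p)
    return total / len(y_true)


def dice_loss(y_true, y_pred, smooth=1.0):
    """Dice loss with the +1 smoothing term in numerator and denominator."""
    intersection = sum(y * p for y, p in zip(y_true, y_pred))
    return 1.0 - (2.0 * intersection + smooth) / (
        sum(y_true) + sum(y_pred) + smooth
    )


def combined_loss(y_true, y_pred):
    """Sum of the cross-entropy and Dice terms."""
    return bce_loss(y_true, y_pred) + dice_loss(y_true, y_pred)


mask = [1.0, 0.0, 1.0, 0.0]
good = combined_loss(mask, [0.9, 0.1, 0.8, 0.2])
bad = combined_loss(mask, [0.2, 0.9, 0.1, 0.8])
```

A prediction close to the mask gives a smaller value than one that contradicts it, and a perfect prediction drives both terms towards zero.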

3 Methodology

We assess the performance of the MSRF-NET architecture on an open breast ultrasound (BUS) dataset [12]. The 320 grayscale images in this dataset are first resized to 128 × 128. A radiologist with ten years of experience identified the tumour regions. The dataset is equally divided between malignant and benign tumours, and the subjects' age is 46.6 ± 14.2 years. First, the evaluation investigates the effect of various batch sizes on training and testing with two epoch settings, 100 and 200. Equally important, it examines the learning rate and the input image size and their influence on convergence and performance gain.

We implement MSRF-NET using TensorFlow v2.9.1 and the Keras API v2.9.1 on an Intel(R) Xeon(R) CPU @ 2.20 GHz and an NVIDIA Tesla T4 graphics card with 16 GB of VRAM, and we use the loss function employed in the original paper with the Adam optimizer. To ensure an objective benchmark, we use 10-fold cross-validation (80% for training, 10% for validation, and 10% for testing).

In semantic segmentation, different quantitative measures can be adopted. In general, they are all calculated from the four entries of the confusion matrix:

TP: the number of true positive pixels.
TN: the number of true negative pixels.
FP: the number of false positive pixels.
FN: the number of false negative pixels.

Our evaluation uses the F1-score, precision, recall, and specificity (see Eqs. 9, 10, 11, and 12); together, these metrics give a global quantitative measure of the performance of MSRF-NET.

1. Dice coefficient score (DSC): the first and most critical metric, also known as the F1-score, measures the similarity between two images; it can be calculated as the harmonic mean of recall and precision.

$$\mathrm{DSC} = \frac{2TP}{2TP + FP + FN} \tag{9}$$

2. Recall and precision: recall, or sensitivity, is the fraction of actual positive pixels that are correctly predicted, i.e., true positives relative to true positives plus false negatives. Precision is defined in the same way, with false negatives replaced by false positives.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{11}$$

3. Specificity: also known as the true negative rate (TNR), it measures the ability of a given model to correctly identify negative pixels, i.e., the background of the breast tumour.

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{12}$$
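The four metrics follow directly from the pixel counts. The sketch below assumes binary masks flattened into lists of 0 and 1 and non-zero denominators; the helper names and the toy masks are illustrative, not part of the evaluated implementation.

```python
def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN over two flattened binary masks."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def segmentation_metrics(truth, pred):
    """Return DSC, recall, precision, and specificity (Eqs. 9 to 12).

    Assumes every denominator is non-zero, i.e. both classes are present.
    """
    tp, tn, fp, fn = confusion_counts(truth, pred)
    return {
        "dsc": 2 * tp / (2 * tp + fp + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }


# Toy masks: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = segmentation_metrics(truth, pred)
```

On these toy masks all four metrics equal 0.75, and the DSC matches the harmonic mean of recall and precision, as stated in item 1.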

4 Experimental Results

The first step in our benchmark examines the impact of batch size on training and testing. Table 1 summarizes the results, obtained with 10-fold cross-validation, a learning rate of 0.00001, and 100 epochs with the Adam optimizer. The experiments demonstrate the effectiveness of small batch sizes: a batch size of 2 performs best with a testing DSC of 0.939, and a batch size of 32 performs worst with 0.918. Moreover, training tends to become unstable as the batch size increases. A bigger batch size also takes more time to train (e.g., a batch size of 32 takes approximately 30 min) and has difficulty generalizing (low bias, high variance). We highlighted the best results in bold.

Table 1. Mean and standard deviation over 10-fold cross-validation for various batch sizes.

| Batch size | DSC (validation) | DSC (testing) | Recall (testing) | Specificity (testing) | Precision (testing) |
|---|---|---|---|---|---|
| 2 | 0.969 ± 0.02 | 0.939 ± 0.01 | 0.941 ± 0.04 | 0.949 ± 0.03 | 0.937 ± 0.02 |
| 4 | 0.967 ± 0.03 | 0.938 ± 0.04 | 0.941 ± 0.03 | 0.951 ± 0.02 | 0.936 ± 0.04 |
| 8 | 0.957 ± 0.06 | 0.931 ± 0.06 | 0.934 ± 0.08 | 0.943 ± 0.05 | 0.929 ± 0.08 |
| 16 | 0.948 ± 0.08 | 0.927 ± 0.07 | 0.933 ± 0.10 | 0.935 ± 0.07 | 0.922 ± 0.09 |
| 32 | 0.931 ± 0.11 | 0.918 ± 0.13 | 0.912 ± 0.15 | 0.924 ± 0.11 | 0.924 ± 0.12 |

The input size is also treated as a hyperparameter. We therefore experiment with three input sizes: the original 128 × 128 and two rescaled versions at ratios of 1:2 and 2:1, i.e., 64 × 64 and 256 × 256 (see Table 2). The results show that input size has little impact on overall performance (e.g., the DSC varies by only ± 0.01), while training time grows with input size: 64 × 64 trains in 15 min and 256 × 256 in 38 min.

Table 2. Validation and testing results for different input sizes (batch size: 4, epochs: 100, learning rate: 0.00001).

| Input size | DSC (validation) | DSC (testing) | Recall (testing) | Specificity (testing) | Precision (testing) |
|---|---|---|---|---|---|
| 64 × 64 | 0.965 ± 0.02 | 0.937 ± 0.03 | 0.941 ± 0.01 | 0.953 ± 0.04 | 0.938 ± 0.02 |
| 128 × 128 | 0.967 ± 0.02 | 0.938 ± 0.02 | 0.941 ± 0.03 | 0.951 ± 0.02 | 0.936 ± 0.04 |
| 256 × 256 | 0.968 ± 0.03 | 0.938 ± 0.03 | 0.942 ± 0.05 | 0.955 ± 0.02 | 0.93 ± 0.01 |

Generally, training a network for more epochs improves performance, though with a high risk of overfitting (high variance). Table 3 presents the outcome of training the network for 200 epochs with different batch sizes. The validation DSC increased (e.g., to 0.979 ± 0.01 for a batch size of 2), while the testing DSC declined to 0.935 ± 0.02 (compare Table 1 and Table 3), indicating a high-variance problem. Moreover, the validation metric does not increase after 127 epochs.

Table 3. Testing and validation results using 200 epochs (learning rate: 0.00001, input size: 128 × 128).

| Batch size | DSC (validation) | DSC (testing) | Recall (testing) | Specificity (testing) | Precision (testing) |
|---|---|---|---|---|---|
| 2 | 0.979 ± 0.01 | 0.935 ± 0.02 | 0.940 ± 0.05 | 0.945 ± 0.02 | 0.931 ± 0.02 |
| 4 | 0.977 ± 0.02 | 0.931 ± 0.04 | 0.939 ± 0.02 | 0.942 ± 0.03 | 0.924 ± 0.05 |
| 8 | 0.971 ± 0.06 | 0.925 ± 0.06 | 0.932 ± 0.08 | 0.933 ± 0.05 | 0.918 ± 0.08 |
| 16 | 0.969 ± 0.09 | 0.921 ± 0.07 | 0.930 ± 0.11 | 0.928 ± 0.08 | 0.912 ± 0.09 |
| 32 | 0.955 ± 0.10 | 0.909 ± 0.14 | 0.916 ± 0.11 | 0.915 ± 0.11 | 0.924 ± 0.16 |

Table 4 reports the metrics of MSRF-NET for different learning rates. The learning rate plays an essential role in identifying the optimal model: raising it speeds up convergence at the risk of missing the global minimum of the cost function. With a learning rate of 0.0001, training converges after only two epochs, but the loss function cannot be minimized further; only a local minimum is reached, and the DSC attains 0.924 ± 0.06. A smaller learning rate takes substantially more iterations to converge, so we run the model for 200 epochs to ensure a fair comparison. With a learning rate of 0.000001, the validation metric converges after 147 epochs; however, the performance declines in testing (DSC of 0.934 ± 0.02 versus 0.975 ± 0.03 in validation, see Table 4). One possible reason is that the high number of iterations produces a high-variance model. To prevent the model from overfitting, we apply the early stopping technique.

Table 4. Results for two distinct learning rates (batch size = 4, input size = 128 × 128).

| Learning rate | DSC (validation) | DSC (testing) | Recall (testing) | Specificity (testing) | Precision (testing) |
|---|---|---|---|---|---|
| 0.0001 | 0.957 ± 0.08 | 0.924 ± 0.06 | 0.916 ± 0.06 | 0.935 ± 0.05 | 0.928 ± 0.07 |
| 0.000001 | 0.975 ± 0.03 | 0.934 ± 0.02 | 0.937 ± 0.04 | 0.937 ± 0.04 | 0.931 ± 0.03 |

To put the performance of MSRF-NET in context, we compare it with three widely used architectures: U-net [6], U-net++ [9], and V-net [10]. We implement these models in the same environment as MSRF-NET. For U-net, we use (32, 64, 128, 256, 512) filters together with batch-normalization layers; the same configuration applies to U-net++ and V-net. We train the three models with a batch size of 4 and a 128 × 128 input size, using the cross-entropy loss function, for 100 epochs. Table 5 outlines the results. The U-net implementation has 11,690,913 trainable parameters, U-net++ has 12,461,313, and V-net has 14,737,185, whereas MSRF-NET has 18.38 million parameters. U-net and U-net++ take approximately the same time to train, whereas MSRF-NET takes slightly longer because of its more complex loss function and larger number of parameters. U-net and U-net++ perform almost identically. V-net achieves a DSC of 0.934 ± 0.05, and MSRF-NET surpasses the three architectures with a DSC of 0.939 and a recall of 0.937 ± 0.02. In terms of inference, U-net and U-net++ take considerably more time to predict a mask (e.g., 42 ms for U-net), V-net constructs the mask in 25 ms, and the proposed implementation needs 13 ms per frame.

Fig. 4. Examples of segmenting breast cancer with the proposed implementation: (a) segmentation result, (b) ground truth, (c) original image.

Table 5. Summary of breast cancer segmentation results (batch size = 4, input size = 128 × 128), compared with three architectures.

| Model | DSC | Recall | Specificity | Precision |
|---|---|---|---|---|
| U-net | 0.929 ± 0.08 | 0.934 ± 0.01 | 0.948 ± 0.05 | 0.924 ± 0.04 |
| U-net++ | 0.931 ± 0.07 | 0.933 ± 0.03 | 0.952 ± 0.03 | 0.929 ± 0.07 |
| V-net | 0.934 ± 0.05 | 0.935 ± 0.04 | 0.961 ± 0.06 | 0.933 ± 0.02 |
| MSRF-NET | 0.939 ± 0.03 | 0.937 ± 0.02 | 0.951 ± 0.02 | 0.942 ± 0.04 |

Figure 4 shows images segmented by MSRF-NET. The fusion of high- and low-resolution features in the DSDF blocks of the MSRF sub-network (using five CLR operations), the decoder with its triple attention block, and the shape stream that processes the boundaries separately together give the architecture an edge.

5 Conclusion

In this work, we proposed an implementation of MSRF-NET for ultrasound breast cancer segmentation. The multi-scale feature fusion performed by the MSRF sub-network and its Dual-Scale Dense Fusion (DSDF) blocks enhanced the exchange of feature maps. As a result, we achieved competitive results on the evaluation metrics, outperforming U-net, U-net++, and V-net in DSC (0.939 ± 0.03), in recall and precision, and in inference time. These outcomes support extending the use of MSRF-NET to other ultrasound breast cancer datasets, improving the architecture further, and tackling the problems encountered.

Acknowledgment. This work was supported by the National Center of Scientific and Technical Research (CNRST).

References

1. Gao, F., Chia, K.-S., Ng, F.-C., Ng, E.-H., Machin, D.: Interval cancers following breast cancer screening in Singaporean women. Int. J. Cancer 101, 475–479 (2002)
2. American Cancer Society: Breast Cancer Facts and Figures 2019. American Cancer Society, Atlanta, GA, USA (2019)
3. Xu, Y., Wang, Y., Yuan, J., Cheng, Q., Wang, X., Carson, P.L.: Medical breast ultrasound image segmentation by machine learning. Ultrasonics 91, 1–9 (2019)
4. Kallergi, M., Woods, K., Clarke, L.P., Qian, W., Clark, R.A.: Image segmentation in digital mammography: comparison of local thresholding and region growing algorithms. Comput. Med. Imaging Graph. 16, 323–331 (1992)
5. Huang, Q., Luo, Y., Zhang, Q.: Breast ultrasound image segmentation: a survey. Int. J. Comput. Assist. Radiol. Surg. 12(3), 493–507 (2017). https://doi.org/10.1007/s11548-016-1513-1
6. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III, pp. 234–241. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
7. Gómez-Flores, W., de Albuquerque Pereira, W.C.: A comparative study of pre-trained convolutional neural networks for semantic segmentation of breast tumors in ultrasound. Comput. Biol. Med. 126, 104036 (2020)
8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
9. Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J.: U-Net++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 39(6), 1856–1867 (2020)
10. Dangoury, S., Sadik, M., Alali, A., Fail, A.: V-net performances for 2D ultrasound image segmentation. In: 2022 IEEE 18th International Colloquium on Signal Processing & Applications (CSPA), pp. 96–100 (2022)
11. Srivastava, A., et al.: MSRF-Net: a multi-scale residual fusion network for biomedical image segmentation. IEEE J. Biomed. Health Inform. 26(5), 2252–2263 (2022)
12. Huang, Q., Huang, Y., Luo, Y., Yuan, F., Li, X.: Segmentation of breast ultrasound image with semantic classification of super-pixels. Med. Image Anal. 61, 101657 (2020)
13. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2472–2481 (2018)

A Hybrid Image Steganography Method Based on Spectral and Spatial Domain with High Hiding Ratio

D. Kumar¹ and V. K. Sudha²

¹ P.A. College of Engineering and Technology, Pollachi 642002, India
[email protected]
² Dr. Mahalingam College of Engineering and Technology, Pollachi, India

Abstract. Global data communication over the Internet has made data transfer both simple and dangerous, and the security of data transferred over the Internet is a major concern nowadays. Steganography is a popular method for preventing unauthorized access to data. This paper proposes a three-level secured hybrid image steganography method that increases payload by embedding secret data bits in both the spectral and spatial domains. First, the cover and secret images are encrypted using the permutation order (PO) generated from the initial keys (first-level security). The cover image is then transformed using the DCT, and random DCT blocks are selected using key values to embed the secret data (second-level security). Finally, the remaining secret data bits are embedded in the LSB planes at random locations specified by the key values (third-level security). According to the performance results, the method achieves a PSNR of 35.86 dB and a hiding ratio of 50% (4 bpp).

Keywords: Spectral domain · Spatial domain · DCT · LSB substitution

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 63–70, 2023. https://doi.org/10.1007/978-3-031-35507-3_7

1 Introduction

In the modern age, the security of confidential information has grown in importance, and advances in computer security have shown cryptography and steganography to be secure approaches for protecting data. Cryptography converts secret information into a meaningless, difficult-to-understand format [1]. Steganography embeds secret information into a cover medium, such as an image, by means of a specific algorithm [2]. When secret data is embedded in a cover image, a stego image is created [3]. Similarly, when the secret data is embedded in audio, text, or video, the resulting files are stego audio, stego text, and stego video [4]. Although cryptography and steganography are both used to protect data, neither is an ideal solution on its own. Combining them can therefore provide more security against statistical attacks [3].

Kaur et al. [5] developed a steganography algorithm using the DCT and a Coupled Chaotic Map (CCM). The secret image is embedded in the DCT coefficients of the cover image at random locations provided by the CCM. That method focuses on securing secret data by deriving highly sensitive keys from the integration of logistic and sine maps, whereas the proposed method performs encryption (of the cover and secret images), decryption, and random data hiding using highly sensitive keys, giving it an advantage over existing methods.

When communicating secret data, protecting the data while maintaining a high payload capacity is now the most important task. Managing a large amount of secret data while preserving high security and image quality, however, is a significant challenge. This paper proposes a hybrid steganography method based on the spectral and spatial domains that provides both high security and high payload capacity. Furthermore, the suggested technique can hide more secret bits without degrading image quality, which makes it very useful for transmitting digital images over the Internet.

This work proposes a three-level secured hybrid image steganography method with an increased payload of 50%:

i) First, the cover and secret images are encrypted using the PO generated by the key generator structure, which provides first-layer security.
ii) Based on random values generated by the internal keys, the DCT blocks and the positions used to hide secret data are selected randomly in the spectral and spatial domains (second- and third-layer security), which raises the payload to 4 bpp with little distortion (PSNR = 35.86 dB).
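The spatial-domain part of this idea, LSB substitution at key-dependent positions, can be sketched as follows. This is a generic illustration and not the proposed algorithm: the DCT embedding and the encryption step are omitted, Python's `random.Random(key)` stands in for the key generator structure (an assumption), and the number of replaced bits per pixel is a free parameter. For reference, with 8-bit pixels a payload of 4 bpp corresponds to the 50% hiding ratio quoted above.

```python
import random


def keyed_positions(n_pixels, key):
    """Pseudo-random visiting order of pixel indices derived from the key."""
    positions = list(range(n_pixels))
    random.Random(key).shuffle(positions)
    return positions


def embed_lsb(cover, values, key, bits=2):
    """Replace the `bits` least significant bits of key-selected pixels.

    cover: flat list of 8-bit pixel values; values: integers in [0, 2**bits).
    """
    if len(values) > len(cover):
        raise ValueError("payload exceeds cover capacity")
    mask = (1 << bits) - 1
    stego = list(cover)
    for pos, value in zip(keyed_positions(len(cover), key), values):
        stego[pos] = (stego[pos] & (0xFF ^ mask)) | (value & mask)
    return stego


def extract_lsb(stego, count, key, bits=2):
    """Read back `count` embedded values using the same key."""
    mask = (1 << bits) - 1
    return [stego[pos] & mask for pos in keyed_positions(len(stego), key)[:count]]


cover = [200, 17, 255, 0, 128, 64]
secret = [2, 3, 0, 1]
stego = embed_lsb(cover, secret, key=1234)
recovered = extract_lsb(stego, len(secret), key=1234)
```

Only a receiver holding the same key visits the pixels in the same order, and each pixel changes by at most 2^bits − 1 grey levels, which is why the payload can grow only at the cost of distortion.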

2 Related Works

Patel, R. et al. [4] proposed a two-layer secured transform domain based video steganography algorithm that uses the DCT and DST to embed a secret image into the cover. The method uses a video as the cover and an image as the secret data. To conceal the secret image in the cover video, it uses the LSB of the integer component of the DCT and DST coefficients. Sharif, A. et al. [6] proposed a spatial domain based digital image steganography method based on a chaotic map. In this method, a new three-dimensional LCA chaotic map is used to select frames at random and embed the secret data. Valandar, M. et al. [7] proposed a two-level secured transform domain based image steganography technique based on a chaotic map. In this technique, the IWT is used to embed the secret data into the cover. To improve security and increase the key space, the method uses a 3D sine chaotic map. The sequence of the chaotic map is also used to select the pixels for embedding. Parah, S.A. et al. [8] proposed a three-layer secured image steganography algorithm based on Intermediate Significant Bit Plane Embedding (ISBPE). The technique uses a PN sequence generator for encryption, pixel selection, and the embedding process. Saidi et al. [2] used chaotic maps to implement a DCT-based adaptive image steganography algorithm. The cover and secret data are grayscale images in this method. Based on the sequence of the PWLCM function, the least and medium DCT coefficients of the cover image are used to embed the secret message. Syed Taqi Ali and Rajashree Gajabe [9] proposed an image steganography algorithm in the spatial domain that conceals a grayscale image inside a cover image. In this method, the grayscale image is encrypted by the baker map and the secret image is concealed inside the cover image. A blind, high-capacity 3D image steganography method in the spatial domain based on pattern identification is suggested by Thiyagarajan et al. [10]. The method uses a stego key for the formation of a triangle mesh and for embedding the secret message in the cover.

Zhou et al. [11] developed an image steganography method using the LSB technique. In this method, secret and cover images of size N × N and 4N × 4N are used, respectively. The method scrambles the secret image using bit-plane scrambling. The scrambled image is then expanded to the size of the cover image and scrambled again using the Arnold cat map. Finally, embedding is carried out using only the LSB technique. Faez and Miri [12] established an image steganography method based on the integer wavelet transform to embed the secret data into the cover. The method uses grayscale images as cover and secret image. The MSB value of each coefficient determines the volume of data to be embedded in the LSBs. AbdelRaouf [13] proposed a new image steganography method for hiding secret data, in which the LSB substitution technique is used to hide information in an image. Although the method has a large payload, the time taken to embed the secret data is high. To increase the hiding ratio and payload, Hussain et al. [14] proposed an improved steganography method using LSB and Pixel Value Differencing (PVD). The method achieves better PSNR and embedding capacity, but it concentrates solely on PSNR and payload, not on security. Attaby et al. [15] proposed a spectral domain based image steganography method using the DCT. In this method, the difference between two DCT coefficients is calculated using a modulus 3 technique to insert two secret bits into a cover image, which improves the embedding capacity and reduces distortion. However, the security of the scheme could be improved by applying a cryptographic approach.

The literature above shows that a higher PSNR means less noise introduced by the steganography algorithm, which comes with a lower hiding ratio (payload), and vice versa. There is therefore a trade-off between PSNR and hiding ratio.

3 Mathematical Background

3.1 Discrete Cosine Transform (DCT)

The DCT [15] uses cosines as basis functions and expresses an image as coefficients of different cosine frequencies. The 2-dimensional DCT of an N × N image is given by Eq. (1).

\[
C(u, v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right] \tag{1}
\]

The inverse DCT is defined by Eq. (2).

\[
f(x, y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \alpha(u)\,\alpha(v)\, C(u, v) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right] \tag{2}
\]

where u, v = 0, 1, ..., N − 1, f(x, y) is the pixel value at row x and column y, and α(u), α(v) are the normalization coefficients given by Eq. (3).

\[
\alpha(u) = \begin{cases} \sqrt{1/N} & \text{if } u = 0 \\ \sqrt{2/N} & \text{otherwise} \end{cases} \tag{3}
\]

with α(v) defined in the same way.
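As an illustration of Eqs. (1) to (3), the following sketch computes the 2-D DCT of a small block and inverts it. It is a direct, unoptimized transcription in plain Python, assuming the orthonormal normalization for an N × N block; the function names and the 4 × 4 test block are illustrative and not taken from the paper, whose experiments were run in MATLAB.

```python
import math


def alpha(k, n):
    """Normalization coefficient of Eq. (3) for an n x n block (assumed orthonormal form)."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def basis(pos, freq, n):
    """Cosine basis term shared by Eqs. (1) and (2)."""
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * n))


def dct2(block):
    """Forward 2-D DCT of a square block, following Eq. (1)."""
    n = len(block)
    return [[alpha(u, n) * alpha(v, n)
             * sum(block[x][y] * basis(x, u, n) * basis(y, v, n)
                   for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def idct2(coeffs):
    """Inverse 2-D DCT of a square coefficient block, following Eq. (2)."""
    n = len(coeffs)
    return [[sum(alpha(u, n) * alpha(v, n) * coeffs[u][v]
                 * basis(x, u, n) * basis(y, v, n)
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]


# Hypothetical 4 x 4 block of gray levels used only to exercise the round trip.
block = [[(3 * x + 5 * y) % 256 for y in range(4)] for x in range(4)]
coeffs = dct2(block)
restored = idct2(coeffs)
max_error = max(abs(restored[x][y] - block[x][y])
                for x in range(4) for y in range(4))
```

The round-trip error stays at floating-point precision, which is what makes it possible to modify coefficients and return to the pixel domain.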

3.2 Substitution

Steganographic systems use three major techniques: injection, substitution, and generation. The substitution method [14, 17] is the most widely used of the three because of its simple embedding process and high hiding capacity. In this method, the least significant bits of the cover pixels are replaced by the secret data.
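A minimal sketch of LSB substitution may help make the idea concrete. It hides one secret bit per cover pixel by overwriting the lowest bit; the helper names and the sample pixel values are hypothetical, and the paper's own scheme embeds in other bit planes and at random positions.

```python
def embed_lsb(cover, bits):
    """Return a copy of `cover` whose least significant bits carry `bits`."""
    if len(bits) > len(cover):
        raise ValueError("not enough cover pixels for the secret bits")
    stego = list(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the pixel, then write the secret bit into it.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_lsb(stego, count):
    """Read back the first `count` embedded bits."""
    return [pixel & 1 for pixel in stego[:count]]


# Hypothetical 8-bit cover pixels and secret bits.
cover = [154, 201, 37, 88, 255, 0, 19, 64]
secret = [1, 0, 1, 1, 0, 1, 0, 0]
stego = embed_lsb(cover, secret)
recovered = extract_lsb(stego, len(secret))
```

Each pixel changes by at most one gray level, which is why the distortion introduced by this technique is small.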

4 Proposed Methodology

Figure 1 depicts the block diagram of the developed spectral and spatial domain based image steganography method. The proposed system has two inputs: the cover image (CI) and the secret image (SI). First, CI and SI are encrypted in the encryption block before embedding is carried out. After encryption, the encrypted cover image (CEI) is divided into 8 × 8 blocks and the 2D-DCT is applied to each block. In the embedding block, embedding is done first in the spectral domain and then in the spatial domain. The main objective of the proposed method is to increase the hiding ratio. After all secret data is embedded, the resulting image is decrypted using the same Permutation Order (PO) to generate the stego image, which is transmitted over an open network. The entire process of the proposed method is given below.

Step 1: A logistic map is used to generate the PO, which changes the order of the pixels of both the cover and the secret image with a key.
Step 2: Apply the discrete cosine transform (DCT) to each block NB and generate the blocks NB_i given by Eq. (4).

\[
NB_i(i, j) = \mathrm{DCT}(CE_i(i, j)), \quad 1 \le i, j \le 8 \tag{4}
\]

where CE_i(i, j) denotes the pixel value at the ith row and jth column of each block.
Step 3: Convert each DCT coefficient into binary and embed half of the secret data into the 1st and 2nd ISB planes of the cover image.
Step 4: Apply the inverse DCT to the embedded block and convert the pixels into binary.
Step 5: Embed the remaining half of the secret data in the spatial domain at random positions, using the LSB substitution method in the 3rd and 4th planes, and generate the stego image.
Step 6: Decrypt the encrypted embedded stego image (ESI) using the PO generated in Step 1.
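Steps 1 and 6 rely on a key-dependent permutation order that can be undone with the same key. The sketch below shows one common way to derive such an order from a logistic map, by ranking the chaotic sequence. This construction, the key values `0.3141` and `3.99`, and all function names are assumptions for illustration; the paper does not spell out its key generator.

```python
def logistic_sequence(x0, r, length):
    """Iterate the logistic map x <- r * x * (1 - x) and collect the values."""
    x = x0
    values = []
    for _ in range(length):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def permutation_order(key_x0, key_r, length):
    """Derive a permutation by sorting indices by their chaotic value (assumed construction)."""
    seq = logistic_sequence(key_x0, key_r, length)
    return sorted(range(length), key=lambda i: seq[i])


def scramble(pixels, po):
    """Encrypt: position `dst` receives the pixel originally at po[dst]."""
    return [pixels[src] for src in po]


def unscramble(scrambled, po):
    """Decrypt: put every pixel back at its original position."""
    restored = [0] * len(po)
    for dst, src in enumerate(po):
        restored[src] = scrambled[dst]
    return restored


# Hypothetical key (initial value and map parameter) and a flattened 4 x 4 image.
po = permutation_order(0.3141, 3.99, 16)
pixels = list(range(100, 116))
encrypted = scramble(pixels, po)
decrypted = unscramble(encrypted, po)
```

Because the order depends only on the key, the receiver can regenerate it and reverse the scrambling without any side information.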

[Figure 1: block diagram. Input block: Cover Image (CI), Secret Image (SI). Encryption block: Encryption, producing CEi and SEi. Embedding block: Discrete Cosine Transform (DCT), embedding in transform domain, Inverse Discrete Cosine Transform, embedding in spatial domain. Output block: Embedded Stego Image (ESi), Decryption, stego-image for transmission.]

Fig. 1. Block diagram of proposed transform and spatial domain method

5 Results and Discussion

The proposed spectral and spatial domain based image steganography method was implemented in MATLAB R2018b. To test the method, 8-bit grayscale images, each with 256 gray levels and a size of 512 × 512, were downloaded from the USC-SIPI image database [18]. The performance has been compared for both the spectral and spatial domains in terms of Peak Signal to Noise Ratio (PSNR) and hiding ratio.

5.1 Peak Signal to Noise Ratio

The PSNR compares the quality of the cover image and the stego image after embedding and measures the amount of distortion. The higher the PSNR, with an acceptable value being greater than 30 dB [16], the better the quality of the image. Table 1 shows the PSNR values for different grayscale images. From Table 1 it is clear that the PSNR of the proposed method is greater than the 30 dB threshold, so it is difficult to distinguish the stego image from the cover image. The PSNR is calculated using Eq. (5). Figures 2 and 3 show the cover and stego images of Lena and Elaine after hiding the secret image using the proposed method. From Figs. 2 and 3 it is evident that there is no visual difference between the cover and stego images.

\[
\mathrm{PSNR} = 10 \times \log_{10}\left(\frac{255^2}{\mathrm{MSE}}\right) \tag{5}
\]

5.2 Hiding Ratio (HR)

The ability of image steganography to conceal as much data as possible within a cover image is referred to as the hiding ratio or payload. The hiding ratio is calculated using Eq. (6).

\[
\mathrm{HR} = \frac{\text{Size of embedded image}}{\text{Size of cover image}} \times 100 \tag{6}
\]
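Both metrics are simple to compute. The sketch below implements Eqs. (5) and (6) for flattened 8-bit images; the pixel values are made up for illustration, and the last line only checks that 4 embedded bits per 8-bit cover pixel corresponds to the 50% hiding ratio quoted in this paper.

```python
import math


def mse(cover, stego):
    """Mean squared error between two equally sized pixel sequences."""
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def psnr(cover, stego, peak=255):
    """PSNR in dB as in Eq. (5); infinite when the images are identical."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)


def hiding_ratio(embedded_bits, cover_bits):
    """Hiding ratio in percent as in Eq. (6)."""
    return 100.0 * embedded_bits / cover_bits


# Hypothetical pixel values for a tiny cover and stego image.
cover = [100, 120, 140, 160]
stego = [101, 119, 142, 160]
quality_db = psnr(cover, stego)

# 4 bits embedded per pixel of an 8-bit, 512 x 512 cover image.
payload_percent = hiding_ratio(4 * 512 * 512, 8 * 512 * 512)
```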


Fig. 2. Cover and secret image of Lena and their Histogram

Fig. 3. Cover and Secret Image of Elaine and their histogram.

Table 2 shows the PSNR and hiding ratio of the proposed method and of other methods, and Fig. 4 shows the same comparison as a bar chart. From Table 1 it is clear that the average PSNR of the proposed method is 35.86 dB, which is greater than the threshold of 30 dB [3]. This implies that the stego image is similar to the original image, and thus the proposed steganography method has greater security compared to other methods (Table 2).

Table 1. PSNR values of the proposed method for different grayscale secret images (cover image: Lena)

File name     Secret image           PSNR (dB)
5.2.08        Couple                 36.3159
5.2.09        Aerial                 35.3148
5.2.10        Stream and Bridge      34.9086
7.1.02        Truck                  36.5736
7.1.03        Tank                   35.2166
7.1.04        Car and APCs           36.6502
7.1.05        Truck and APCs         36.0727
7.1.07        Tank                   35.3125
gray21.512    21 level step wedge    36.1254
ruler.512     Pixel ruler            35.2152
7.1.06        Truck and APCs         35.1478
7.1.08        APC                    36.8972
7.1.09        Tank                   36.2178
5.1.12        Clock                  36.6572

Table 2. Comparison of PSNR and hiding ratio of the proposed and other methods

Method                     PSNR (dB)    Hiding ratio (%)
Sharif, A. et al. [6]      38.7540      -
Valandar et al. [7]        53.2145      -
Parah et al. [8]           37.97        25
Ranjithkumar et al. [1]    45           25
Saidi et al. [2]           30.22        6.25
Ranjithkumar et al. [3]    44.14        25
AbdelRaouf et al. [13]     43.955       26
Proposed method            35.86        50

Fig. 4. Comparison chart of PSNR and Hiding ratio

6 Conclusion

This paper proposes an image steganography method in which the secret image is embedded in both the spectral and spatial domains using the DCT and the LSB substitution technique. The main objective of the proposed method is to increase the hiding ratio and the security against statistical attacks. The experimental results show that the method achieves a PSNR of 35.86 dB, above the 30 dB threshold, together with a hiding ratio of 50% (4 bpp), which is higher than that of the compared methods.

References

1. Ranjithkumar, R., Ganeshkumar, D., Senthamilarasu, S.: Efficient and secure data hiding in video sequence with three-layer security: an approach using chaos. Multimedia Tools Appl. 80, 13865–13878 (2021)
2. Saidi, M., Hermassi, H., Rhouma, R., Belghith, S.: A new adaptive image steganography scheme based on DCT and chaotic map. Multimedia Tools Appl. 76(11), 13493–13510 (2016). https://doi.org/10.1007/s11042-016-3722-6
3. Ranjith Kumar, R., Jayasudha, S., Pradeep, S.: Efficient and secure data hiding in encrypted images: a new approach using chaos. Inf. Secur. J. Glob. Perspect. 25(4–6), 235–246 (2016). https://doi.org/10.1080/19393555.2016.1248582
4. Patel, R., Lad, K., Patel, M.: Novel DCT and DST based video steganography algorithms over non-dynamic region in compressed domain: a comparative analysis. Int. J. Inf. Technol. 14, 1649–1657 (2022). https://doi.org/10.1007/s41870-021-00788-7
5. Kaur, R., Singh, B.: A hybrid algorithm for robust image steganography. Multidimens. Syst. Signal Process. (2020). https://doi.org/10.1007/s11045-020-00725-0
6. Sharif, A., Mollaeefar, M., Nazari, M.: A novel method for digital image steganography based on a new three-dimensional chaotic map. Multimedia Tools Appl. 76(6), 7849–7867 (2016). https://doi.org/10.1007/s11042-016-3398-y
7. Valandar, M.Y., Barani, M.J., Ayubi, P., Aghazadeh, M.: An integer wavelet transform image steganography method based on 3D sine chaotic map. Multimedia Tools Appl. 78(8), 9971–9989 (2018). https://doi.org/10.1007/s11042-018-6584-2
8. Parah, S.A., Sheikh, J.A., Assad, U.I., Bhat, G.M.: Hiding in encrypted images: a three tier security data hiding technique. Multidimens. Syst. Signal Process. 28(2), 549–572 (2015). https://doi.org/10.1007/s11045-015-0358-z
9. Gajabe, R., Ali, S.T.: Secret key-based image steganography in spatial domain. Int. J. Image Graph. 22(02). https://doi.org/10.1142/S0219467822500140
10. Thiyagarajan, P., Natarajan, V., Aghila, G., et al.: Pattern based 3D image steganography. 3D Res. 4, 1 (2013). https://doi.org/10.1007/3DRes.01(2013)1
11. Zhou, R.-G., Luo, J., Liu, X., Zhu, C., Wei, L., Zhang, X.: A novel quantum image steganography scheme based on LSB. Int. J. Theor. Phys. 57(6), 1848–1863 (2018). https://doi.org/10.1007/s10773-018-3710-x
12. Miri, A., Faez, K.: An image steganography method based on integer wavelet transform. Multimedia Tools Appl. 77(11), 13133–13144 (2017). https://doi.org/10.1007/s11042-017-4935-z
13. AbdelRaouf, A.: A new data hiding approach for image steganography based on visual color sensitivity. Multimedia Tools Appl. 80(15), 23393–23417 (2021). https://doi.org/10.1007/s11042-020-10224-w
14. Hussain, M., Riaz, Q., Saleem, S., Ghafoor, A., Jung, K.-H.: Enhanced adaptive data hiding method using LSB and pixel value differencing. Multimedia Tools Appl. 80(13), 20381–20401 (2021). https://doi.org/10.1007/s11042-021-10652-2
15. Attaby, A.A., Ahmed, M.F.M.M., Alsammak, A.K.: Data hiding inside JPEG images with high resistance to steganalysis using a novel technique: DCT-M3. Ain Shams Eng. J. 9, 1966–1974 (2018). https://doi.org/10.1016/j.asej.2017.02.003
16. Siddiqui, T.J., Khare, A.: Chaos-based video steganography method in discrete cosine transform domain. Int. J. Image Graph. (2020). https://doi.org/10.1142/s0219467821500157
17. Alipour, M.C., Gerardo, B.D., Medina, R.P.: LSB substitution image steganography based on randomized pixel selection and one-time pad encryption. In: 2020 2nd International Conference on Big-Data Service and Intelligent Computation (2020). https://doi.org/10.1145/3440054.3440055
18. The USC-SIPI Image Database. http://sipi.usc.edu/database. Accessed 12 June 2017

PU Matrix Completion Based Multi-label Classification with Missing Labels

Zhidong Huang, Peipei Li, and Xuegang Hu(B)

School of Artificial Intelligence, Hefei University of Technology, Hefei, China
{huangzhidong,huxuegang,lipeipei}@hfut.edu.cn

Abstract. Multi-label classification has attracted significant interest in various domains. However, only a few positive labels are usually annotated. Treating unknown labels as negative labels introduces one-sided noise, which causes severe performance degradation. Motivated by this, we present a positive and unlabeled (PU) matrix completion based Multi-Label Classification method with Missing Labels, called PUMCML, which focuses on finding all positive labels and constructs a classifier based on the completed matrix under instance and label dependencies. More specifically, an asymmetric PU learning loss function is first used to handle the one-sided noise from missing labels. Secondly, an instance dependency assumption is adopted: similar instances in multi-label data share similar labels. Furthermore, to account for label dependencies, the completion matrix consisting of the predicted labels and the original labels is constrained by a nuclear norm. Finally, the resulting optimization problem is convex but not smooth, so the Alternating Direction Method of Multipliers (ADMM) is designed to handle it by dividing the problem into multiple subproblems. Experimental results on four benchmarks demonstrate that our method achieves a substantial improvement in performance and robustness to missing labels compared with other advanced algorithms.

Keywords: Multi-label classification · Missing labels · ADMM optimization

1 Introduction

In traditional multi-label classification problems, each instance is associated with a set of labels. Unlike in multi-class classification, different labels of an instance can coexist in the multi-label scenario. Multi-label classification has been widely used in various domains, such as pattern recognition [2], text classification [15], and audio classification [9]. For example, a novel can be labeled as both "science fiction" and "adventure". According to the popular taxonomy, there are two categories of multi-label learning approaches: Problem Transformation (PT) [22] and Algorithm Adaptation (AA). The majority of these methods are based on the assumption that all labels are observed. However, it is difficult to get complete and correct labels in the real world. Human labelers might not be able to discriminate all labels for every instance, which causes missing labels.

Recently, many works have been proposed to solve the missing-label problem in multi-label classification. Some methods handle the missing-label problem as a pre-processing step, while other works learn the multi-label classifier and recover the label matrix simultaneously, and these have made great progress. For instance, LEML [21] and LSML [6] consider label dependencies to recover the label matrix. Manifold regularization, used in [4,10], is also useful for the missing-label problem by exploiting instance dependencies. Although PU matrix completion [5] only considers a matrix completion problem, its approach of treating the problem as a special positive and unlabeled problem and using an asymmetric loss also makes sense for multi-label learning with missing labels. The illustration of instance and label dependencies is shown in Fig. 1.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 71–82, 2023. https://doi.org/10.1007/978-3-031-35507-3_8

[Figure 1: three instances with their features and labels; instances 1 and 2 are marked as similar, with annotations for instance dependency and label dependency.]

Instance    Features              Labels
1           0.9   0.8   0.1       1  0  1  1
2           0.95  0.85  0.15      ?  0  1  1
3           0.2   0.3   0.4       0  0  1  0

Fig. 1. Each label can be inferred from the same label of different instances (instance dependencies) and from different labels of the same instance (label dependencies). For example, we give three instances with three features and four labels. The first instance is similar to the second instance in feature space, so their labels will also be similar. In addition, it seems that the appearance of the fourth label causes the first label to appear. So we can predict that "?" is a latent "1".

An instance without a certain observed label may still belong to that label latently, whereas an instance with an observed label always belongs to it. Thus, the observed labels should receive more weight in classification. Furthermore, it is generally assumed [10] that similar instances should be assigned similar labels (instance dependencies) and that dependencies also exist between labels. It is therefore important to learn without bias while exploiting instance and label dependencies.

To exploit the label completion matrix under the dependencies between instances and labels, we propose a PU matrix completion based multi-label classification method for Missing Labels (PUMCML). Specifically, a PU asymmetric loss is used to adjust the weights of labels, because unlabeled entries of the label space may be latent positives. In addition, a manifold regularization term is used, based on the assumption that similar instances should be predicted to have similar labels. A nuclear norm constraint on the completed label matrix exploits the label dependencies. Because the non-smooth part is difficult to optimize directly, we use the Alternating Direction Method of Multipliers (ADMM) [11] to update the variables step by step. The main contributions of our proposed method are as follows.

• To remove the one-sided noise caused by missing labels, we first bring the PU asymmetric loss, used in PU matrix completion [5], into multi-label learning to learn without bias.
• A multi-label classifier with asymmetric loss is jointly learned under instance and label dependencies. The results predicted by the classifier are made consistent with the results deduced from the dependencies, which greatly improves the performance of the classifier.
• An efficient solution is proposed for the complex loss function with the Hadamard product introduced by the asymmetric loss.

2 Related Work

In multi-label classification with missing labels, only a partial set of labels can be observed. The performance of multi-label classification that ignores missing labels is significantly affected by the one-sided noise caused by label incompleteness. Many approaches have been proposed to alleviate the negative effects of this situation. In early work, it was common practice to perform label matrix completion as a pre-processing step, as in Maxide [20]. These two-step methods do not consider the interrelation between the outputs of the multi-label classifier and the matrix completion, which causes performance degradation. Some works, such as LEML [21] and SSWL [3], assume that the missing labels of an instance are known and construct the loss function without considering them. But in most cases, it is unknown whether a label is negative or missing. Some methods have been proposed recently for the case in which the entries of missing labels are unknown, such as LSML [6] and LRML [4]. More specifically, LSML simultaneously learns label-specific features through the ℓ1 norm and label correlation through self-representation. LRML completes the matrix under a low-rank weight-matrix constraint while optimizing the loss at the same time. DM2L [12] uses global and local structures like Glocal [23].

Weighted loss functions have attracted attention in recent years. PU matrix completion [5] uses an asymmetric loss on a matrix with missing values and a low-rank constraint on the completed matrix. Although it is designed only for the matrix completion task, matrix completion can also be treated as an important step. CE-WCE [7] uses a weighted loss function to account for the confidence in each label and can be incorporated to fine-tune a pre-trained model, but it does not take label dependencies into consideration.

3 Proposed Method

3.1 Problem Formulation

In multi-label learning, let the training data with n instances be $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ represents the d features of the ith instance and $y_i \in \{0, 1\}^l$ represents the l labels of the ith instance. In this paper, the feature matrix is denoted as $X = [x_1, x_2, ..., x_n]^T \in \mathbb{R}^{n \times d}$ and the corresponding label matrix as $Y = [y_1, y_2, ..., y_n]^T \in \{0, 1\}^{n \times l}$. Under the missing-label setting, $Y_{ij} = 1$ indicates that the ith instance has the jth label, while $Y_{ij} = 0$ indicates that the corresponding label may be either latently positive or negative. This is similar to the setting in PU learning. Following many multi-label learning works of recent years, we adopt a linear regression model with threshold fine-tuning as the classifier. Our method therefore aims to learn a weight matrix W from the incomplete label matrix to project the feature space to the real label space. We learn W from the perspective of label and instance dependencies based on the manifold assumption. The details of all techniques are as follows.

3.2 Debiased Learning with PU Asymmetric Loss

According to LEML [21], the squared loss gives the best performance for the multi-label learning task. The base regression loss is defined by:

\[
\min_{W} \|XW - Y\|_F^2 \tag{1}
\]

where W is the weight matrix. In the missing-label case, however, $Y_{ij} = 0$ means that the ith instance may still have the jth label. Treating such entries directly as negatives causes a one-sided noise problem, which biases the learned regression model. Inspired by PU matrix completion [5], we propose an asymmetric loss [16] to reduce the bias:

\[
\min_{W} \; \alpha \sum_{Y_{ij}=1} \mathcal{L}(\hat{y}_{ij}, 1) + (1 - \alpha) \sum_{Y_{ij}=0} \mathcal{L}(\hat{y}_{ij}, 0) \tag{2}
\]

where α is the cost parameter in the range (0.5, 1), $\mathcal{L}$ is the loss function, and $\hat{y}_{ij}$ is the regression prediction for the jth label of the ith instance. With the squared loss, it can be written as:

\[
\min_{W} \; \frac{\alpha}{2} \|(XW - Y) \odot Y\|_F^2 + \frac{1 - \alpha}{2} \|XW \odot (E - Y)\|_F^2 \tag{3}
\]

where ⊙ represents the Hadamard product and E is the all-ones matrix. Thus, observed positive labels get larger weights, and unknown labels, which may be latent positives treated as negatives, get smaller weights, so that the model is more consistent with the missing-label case.
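The effect of the weighting in Eq. (3) can be seen on a toy case. The sketch below evaluates the asymmetric squared loss for small list-based matrices; the function names and the numbers are illustrative assumptions, not the authors' implementation.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def pu_asymmetric_loss(X, W, Y, alpha):
    """Eq. (3): weight alpha/2 on observed positives, (1 - alpha)/2 on unlabeled entries."""
    pred = matmul(X, W)
    positive_term = 0.0
    unlabeled_term = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            if y == 1:
                positive_term += (pred[i][j] - 1.0) ** 2
            else:
                unlabeled_term += pred[i][j] ** 2
    return 0.5 * alpha * positive_term + 0.5 * (1.0 - alpha) * unlabeled_term


# Hypothetical single-entry cases with alpha = 0.8.
# Predicting 1 where the label is unobserved is penalized lightly ...
cost_unlabeled_hit = pu_asymmetric_loss([[1.0]], [[1.0]], [[0]], alpha=0.8)
# ... while predicting 0 where a positive label is observed is penalized heavily.
cost_missed_positive = pu_asymmetric_loss([[1.0]], [[0.0]], [[1]], alpha=0.8)
```

With α = 0.8, missing an observed positive costs four times as much as predicting a label that was merely unobserved, which is the intended bias correction.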

3.3 Learning Instance Dependencies by the Manifold Regularization

Exploiting the relationship between any two instances through manifold regularization for label smoothness [18] also improves performance. Following previous work [10,18], we assume that if instances $x_i$ and $x_j$ are close in the feature space, their predicted labels will also be close in the label space. Under this assumption, the label space of an instance is constrained by the labels of nearby instances. Calculating the similarity between every pair of instances would lead to better performance, but at a huge computational cost that is difficult to afford on large datasets. So the k-nearest neighbor (knn) mechanism used in CLML [10] is adopted to measure instance dependencies. Unlike CLML, which treats all k nearest neighbors with the same weight, we define a kernel weight on the knn set to calculate the instance dependency matrix D. The elements of D are defined as follows:

\[
d_{ij} = \begin{cases} k(x_i, x_j), & x_i \in \mathrm{knn}(x_j) \text{ or } x_j \in \mathrm{knn}(x_i) \\ 0, & \text{otherwise} \end{cases} \tag{4}
\]

where $\mathrm{knn}(x_i)$ denotes the set of k nearest neighbors of instance $x_i$ under the Euclidean distance and $k(x_i, x_j) = e^{-\|x_i - x_j\|_2^2 / \sigma}$. The manifold regularization term defined below is used to realize this assumption with D.

\[
\min_{W} \; \frac{1}{2} \sum_{i,j}^{n} d_{ij} \|x_i W - x_j W\|_2^2 = \operatorname{tr}\left((XW)^T L (XW)\right) \tag{5}
\]

Here L is the n × n Laplacian matrix of D, computed as L = diag(D1) − D, that is, the diagonal matrix of the row sums of D minus D.
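The construction of D and L, and the identity between the pairwise sum and the trace form, can be checked numerically. This is a minimal sketch in plain Python: the helper names are hypothetical, the feature rows reuse the three instances of Fig. 1, and k = 1 and σ = 1 are chosen only to keep the example small.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_indices(X, i, k):
    """Indices of the k nearest neighbors of instance i (excluding i itself)."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: sq_dist(X[i], X[j]))
    return set(others[:k])


def dependency_matrix(X, k, sigma):
    """Kernel-weighted knn graph of Eq. (4)."""
    n = len(X)
    nbrs = [knn_indices(X, i, k) for i in range(n)]
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (i in nbrs[j] or j in nbrs[i]):
                D[i][j] = math.exp(-sq_dist(X[i], X[j]) / sigma)
    return D


def laplacian(D):
    """Graph Laplacian: diagonal of row sums minus D."""
    n = len(D)
    return [[(sum(D[i]) if i == j else 0.0) - D[i][j] for j in range(n)]
            for i in range(n)]


def manifold_penalty(P, L):
    """tr(P^T L P) for a prediction matrix P = XW with one row per instance."""
    n = len(P)
    return sum(L[i][j] * sum(a * b for a, b in zip(P[i], P[j]))
               for i in range(n) for j in range(n))


X = [[0.9, 0.8, 0.1], [0.95, 0.85, 0.15], [0.2, 0.3, 0.4]]
D = dependency_matrix(X, k=1, sigma=1.0)
L = laplacian(D)
P = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]]  # hypothetical predictions XW
penalty = manifold_penalty(P, L)
```

The first two instances are close, so their edge weight is near 1 and any disagreement between their predictions dominates the penalty.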

3.4 Learning Label Dependencies by Low-Rank Constraint

In multi-label learning, an instance with one label is more likely to have another label if the two labels are correlated. Therefore, exploiting dependencies between labels is necessary. Motivated by MLR-GL [17], the prediction for unknown labels depends on the observed labels. To capture these dependencies, a low-rank constraint is imposed on the completion matrix. It is assumed that label correlation dependencies exist in the ground-truth label matrix [10]. We denote by C a latent label matrix. If the predicted result on unknown labels obeys these dependencies, the rank of the completion matrix Y + C should be lower than that of one predicted without label dependencies. To consider instance and label dependencies as a whole, the nuclear norm is used to constrain the completion matrix Y + C. There must also be consistency between the labels completed through the label dependencies and the labels predicted from the sample features. So the second term $\|XW \odot (E - Y)\|_F^2$ in Eq. (3) is rewritten as $\|(XW - C) \odot (E - Y)\|_F^2$. The latent positive label matrix C is sparse because of the random missing setting in Sect. 1. Therefore, we also use an ℓ1 norm to constrain it.

The PU asymmetric loss function with low-rank constraint and manifold regularization is defined as:

\[
\begin{aligned}
\min_{W} \; & \frac{\alpha}{2} \|(XW - Y) \odot Y\|_F^2 + \frac{1 - \alpha}{2} \|(XW - C) \odot (E - Y)\|_F^2 \\
& + \gamma \|C\|_1 + \beta \|Y + C\|_* + \eta \operatorname{tr}(W^T X^T L X W) \\
\text{s.t. } & C \ge 0
\end{aligned} \tag{6}
\]

where β, γ, and η are trade-off coefficients. With a learned W, we can turn the linear regression XW into a classifier by applying a threshold and make the prediction.

4 Optimization Algorithm

The optimization problem is convex but non-smooth, and it is difficult to solve directly because of the nuclear norm regularization term and the ℓ1-norm constraint. We use the Alternating Direction Method of Multipliers (ADMM) to optimize it. To separate the problem, the variable T is introduced with T = Y + C. The subproblem of minimizing the nuclear norm $\|Y + C\|_*$ is then transformed into minimizing $\|T\|_*$. In some recent multi-label learning work [19], singular value decomposition (SVD) is used to handle the nuclear norm in each iteration, which may lead to heavy computational complexity. Fortunately, the nuclear norm can be handled efficiently through the following subproblem:

\[
\min_{T} \|T\|_* = \min_{R, M} \frac{1}{2} \left(\|R\|_F^2 + \|M\|_F^2\right), \quad \text{s.t. } T = RM \tag{7}
\]

where $R \in \mathbb{R}^{n \times r}$ and $M \in \mathbb{R}^{r \times l}$. If r < min(l, n), the rank of T is constrained by this factor decomposition. To handle the constraint among the matrices T, Y, and C, the augmented Lagrangian function can be written as:

\[
\begin{aligned}
\min_{W} \; & \frac{\alpha}{2} \|(XW - Y) \odot Y\|_F^2 + \frac{1 - \alpha}{2} \|(XW - C) \odot (E - Y)\|_F^2 \\
& + \gamma \|C\|_1 + \frac{\beta}{2} \left(\|R\|_F^2 + \|M\|_F^2\right) \\
& + \langle A, Y + C - RM \rangle + \frac{\mu}{2} \|Y + C - RM\|_F^2 \\
& + \eta \operatorname{tr}(W^T X^T L X W) \\
\text{s.t. } & C \ge 0
\end{aligned} \tag{8}
\]


According to LADMAP [14], it can be rewritten as:

\[
\begin{aligned}
\min_{W} \; & \frac{\alpha}{2} \|(XW - Y) \odot Y\|_F^2 + \frac{1 - \alpha}{2} \|(XW - C) \odot (E - Y)\|_F^2 \\
& + \gamma \|C\|_1 + \frac{\beta}{2} \left(\|R\|_F^2 + \|M\|_F^2\right) \\
& + \frac{\mu}{2} \left\|Y + C - RM + \frac{A}{\mu}\right\|_F^2 + \eta \operatorname{tr}(W^T X^T L X W) \\
\text{s.t. } & C \ge 0
\end{aligned} \tag{9}
\]

where A is the Lagrange multiplier matrix and μ is the penalty parameter. Eq. (9) is a biconvex problem, which can be solved in an alternating way via the following subproblems:

1. Updating W: Although the problem with the Hadamard product admits a closed-form solution in SSWL [3], that solution has a huge memory requirement and cannot run on a big dataset. In addition, the objective is not smooth because of the ℓ1 norm. Fortunately, we can solve it by the proximal gradient descent method. With the other variables fixed, the gradient of the smooth part of f(W) is:

\[
\nabla f(W) = \alpha X^T \left((XW - Y) \odot Y\right) + (1 - \alpha) X^T \left((XW - C) \odot (E - Y)\right) + 2\eta X^T L X W \tag{10}
\]

2. Updating C: Similarly to updating W, the gradient of the smooth part of f(C) is calculated as follows:

\[
\nabla f(C) = (1 - \alpha)(C - XW) \odot (E - Y) + \mu \left(Y + C - RM + \frac{A}{\mu}\right) \tag{11}
\]

and we can update C with the non-negative constraint [6] by:

\[
C_{t+1} = \max\left(S_{\gamma}\left[C_t - \theta_t \nabla f(C_t)\right], 0\right) \tag{12}
\]

where $S_{\gamma}$ denotes the soft-thresholding operator associated with the ℓ1 norm and $\theta_t$ is the step size.

3. Updating R, M: With the other variables fixed, the optimization over R and M reduces to least-squares problems with the closed-form solutions:

\[
R = \left(\mu (Y + C) + A\right) M^T \left(\beta I + \mu M M^T\right)^{-1} \tag{13}
\]

\[
M = \left(\mu R^T R + \beta I\right)^{-1} R^T \left(\mu (Y + C) + A\right) \tag{14}
\]

4. Updating A and μ: The Lagrange multiplier matrix A and the corresponding penalty parameter μ are updated by:

\[
A_{t+1} = A_t + \mu_{t+1} (Y + C - RM) \tag{15}
\]

\[
\mu_{t+1} = \min(\mu_{\max}, \pi \mu_t) \tag{16}
\]

where π is a positive scalar. The convergence of ADMM, our optimization strategy, has been proven in [11]. Algorithm 1 shows the pseudocode of our method.

Algorithm 1: Optimization of PUMCML
Input: feature matrix X, label matrix Y, parameters α, γ, β, η
Initialization: W = 0, C = 0, A = 0, R = 0, M = 0, μ > 0
while not converged do
    Update W by Eq. (10)
    Update C by Eq. (12)
    Update R by Eq. (13) and M by Eq. (14)
    Update A by Eq. (15) and μ by Eq. (16)
    t = t + 1
end
Output: the optimal solution W* ← W_t
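The element-wise updates of Eqs. (12), (15), and (16) are easy to sketch without any matrix library. The code below follows the equations as printed, using γ as the soft-threshold level and μ_{t+1} in the multiplier update; the function names and all numeric values are hypothetical, and the gradient and residual are passed in rather than computed.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (scalar soft-thresholding)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def update_C(C, grad, step, gamma):
    """Eq. (12): gradient step, soft-thresholding, then projection onto C >= 0."""
    return [[max(soft_threshold(c - step * g, gamma), 0.0)
             for c, g in zip(c_row, g_row)]
            for c_row, g_row in zip(C, grad)]


def update_penalty(mu, pi, mu_max):
    """Eq. (16): grow the penalty parameter up to a cap."""
    return min(mu_max, pi * mu)


def update_multiplier(A, residual, mu_next):
    """Eq. (15): A + mu_{t+1} * (Y + C - RM), with the residual given."""
    return [[a + mu_next * r for a, r in zip(a_row, r_row)]
            for a_row, r_row in zip(A, residual)]


# Hypothetical current iterate, gradient, and constraint residual.
C = [[0.0, 0.5], [1.0, 0.2]]
grad = [[-1.0, 0.5], [0.0, 2.0]]
C_next = update_C(C, grad, step=0.5, gamma=0.1)

mu_next = update_penalty(1.0, pi=1.5, mu_max=10.0)
A_next = update_multiplier([[0.0, 0.0], [0.0, 0.0]],
                           [[0.2, -0.1], [0.0, 0.4]], mu_next)
```

The projection with `max(..., 0.0)` is what keeps C non-negative, and the soft-thresholding is what drives small entries of C exactly to zero.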

5 Experiment

5.1 Experiment Setup

To evaluate our proposed PU matrix completion based multi-label classification method for Missing Labels (PUMCML), we conducted experiments on four real-world datasets: RCV1sub1, LanguageLog, Stackex Chess, and Society. We select representative metrics to verify the performance of the proposed method: micro F1, macro AUC, Ranking Loss, and Coverage. For each dataset, 80% of the data is randomly selected as training data and the remaining 20% as testing data, and this is repeated 5 times. The missing rate is varied from 30% to 70% with a step size of 10%. Positive labels are dropped at random. The optimal parameters vary across datasets in real applications. We simply set k of knn to 3 to balance computational complexity and performance. The parameter α controls the asymmetric loss. Following PU matrix completion, it is set to 0.5 + missing rate/2, which leads to the best performance. The parameter β balances the label dependency loss, and γ controls the sparsity of the prediction for missing labels. Grid search is used to determine the other parameters for achieving the best performance. In practice, searching them one at a time instead of by grid search gives the same performance most of the time.
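The missing-label protocol can be simulated in a few lines. The sketch below drops a fixed fraction of positive entries at random and derives α from the missing rate as described above. Dropping an exact fraction over the whole matrix, the seed handling, and the function names are assumptions, since the text does not specify these details.

```python
import random


def drop_positive_labels(Y, missing_rate, seed=0):
    """Return a copy of Y in which a fraction of the positive entries is set to 0."""
    rng = random.Random(seed)
    positives = [(i, j) for i, row in enumerate(Y)
                 for j, y in enumerate(row) if y == 1]
    n_drop = int(round(missing_rate * len(positives)))
    dropped = set(rng.sample(positives, n_drop))
    return [[0 if (i, j) in dropped else y for j, y in enumerate(row)]
            for i, row in enumerate(Y)]


def alpha_from_missing_rate(missing_rate):
    """Cost parameter of the asymmetric loss: 0.5 + missing rate / 2."""
    return 0.5 + missing_rate / 2.0


# Hypothetical complete label matrix with 10 positive entries.
Y_full = [[1, 1, 0, 1, 0],
          [0, 1, 1, 1, 0],
          [1, 0, 1, 1, 1]]
Y_observed = drop_positive_labels(Y_full, missing_rate=0.3, seed=1)
alpha = alpha_from_missing_rate(0.3)
```

For the missing rates used in the experiments (30% to 70%), this rule gives α between 0.65 and 0.85, inside the (0.5, 1) range required by the asymmetric loss.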

5.2 Comparing Algorithms

To validate the performance of our proposed algorithm, several multi-label classification methods are selected for comparison, as follows. We use Binary Relevance (BR) [1] with a Frobenius norm and LEML [21], which do not consider label or instance dependencies and missing labels, as baselines. LSML [6], LRML [4], SSWL [3] and DM2L [12] are advanced methods. The parameters of all methods are set by grid search. Due to the reason described in Sect. 5.2, for fair comparison, we use KM2 [13] to estimate the missing rate, which is also used in Positive and Unlabeled with Selected Bias (PUSB) [8], instead of using a prior missing rate.

5.3 Results and Analysis

Fig. 2. Results on RCV1sub1

Fig. 3. Results on LanguageLog

Fig. 4. Results on Stackex chess

Fig. 5. Results on Society

From Figs. 2, 3, 4 and 5, our proposed PUMCML significantly outperforms all baselines on nearly all the evaluation metrics. Its performance on “F1” is on par with advanced classifiers. Owing to the asymmetric loss, the higher the missing rate, the better PUMCML performs relative to the other methods. BR does not consider dependencies and missing labels and gets the worst performance. LEML constructs the loss function considering the positive labels and exploits some relations by matrix decomposition. It can speed up multi-label classification and has information about which labels are missing, but its performance is only better than BR. LSML considers label dependencies and label-specific features but lacks instance dependencies. LRML considers instance and label dependencies but lacks an asymmetric loss. DM2L learns well from local and global spaces at lower missing rates, but its performance declines rapidly at higher missing rates. These methods achieve good performance but remain weaker than ours. SSWL is designed for weakly supervised learning and uses two weight matrices to predict positive and negative labels respectively. However, multi-label classification suffers from a severe imbalance problem, which makes the negative label prediction matrix difficult to train, so SSWL does not achieve the desired performance.

5.4 Parameter Sensitivity Study

Fig. 6. Sensitivity analysis of parameter α(Missing rate:60%) on RCV1sub1

Fig. 7. Sensitivity analysis of other parameters (Missing rate:60%) on RCV1sub1

In this section, we discuss how these four parameters influence performance on the real-world datasets. We conducted experiments by varying only one parameter while fixing the other three. Figures 6 and 7 report the sensitivity analysis of parameter α and of the other parameters at a missing rate of 60% on the RCV1sub1 dataset. This dataset was chosen because it has a higher cardinality, with more label dependencies to be captured. According to PU matrix completion [5], setting the parameter α to 0.5 + missing rate/2 can achieve the best performance. However, in practice, Fig. 6 shows that the accuracy is optimal when α is set to a value slightly larger than 0.5 + missing rate/2. We speculate that this may be caused by the imbalance of multi-label data. The parameter γ controls the sparsity of the completed labels. An appropriate number of completed labels improves performance. The classifier tends to perform better on label-based metrics but worse on example-based metrics. The parameters β and η also work well at proper values. The parameter λ controls the sparsity of W. When it takes too large a value, W becomes a zero matrix and nothing is learned. Conversely, if it is too small, it cannot make W sparse for label-specific features. The parameter α is more significant than the other parameters, especially at a high missing rate. This indicates that an asymmetric loss with a proper α can debias the one-sided noise caused by missing labels, yielding a better completion matrix and remarkably improved performance.
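The one-parameter-at-a-time protocol used in this study can be sketched as below. This is an illustrative sketch, not the experimental code: `sweep_one_at_a_time` is a hypothetical helper, and `toy_score` is a made-up stand-in for training PUMCML and measuring a metric, so its numbers carry no experimental meaning.

```python
def sweep_one_at_a_time(evaluate, defaults, grids):
    """For each parameter, vary it over its grid while all other
    parameters stay at their default values; record (value, score)."""
    results = {}
    for name, values in grids.items():
        scores = []
        for v in values:
            params = dict(defaults, **{name: v})
            scores.append((v, evaluate(**params)))
        results[name] = scores
    return results


def toy_score(alpha, beta, gamma, eta):
    """Hypothetical scoring function standing in for a real evaluation."""
    return (-(alpha - 0.85) ** 2 - 0.1 * (beta - 1.0) ** 2
            - 0.01 * gamma - 0.01 * eta)


defaults = {"alpha": 0.8, "beta": 1.0, "gamma": 0.1, "eta": 0.1}
grids = {"alpha": [0.7, 0.8, 0.85, 0.9], "beta": [0.1, 1.0, 10.0]}
results = sweep_one_at_a_time(toy_score, defaults, grids)
best_alpha = max(results["alpha"], key=lambda pair: pair[1])[0]
print(best_alpha)
```

Such a sweep needs only the sum of the grid sizes in evaluations, whereas a full grid search needs their product.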

6 Conclusion and Further Work

In this paper, a PU matrix completion based multi-label classification with missing labels (PUMCML) method is proposed. With a PU asymmetric loss, the squared error is rewritten into two differently weighted parts. Manifold regularization is taken into consideration for instance dependencies. Label dependencies are captured by a low-rank constraint on the completed label matrix. Instance and label dependency learning is jointly optimized. Experimental results demonstrate that the asymmetric loss significantly improves the performance of multi-label classifiers, especially at a high missing rate. In the future, we will jointly estimate the missing rate and learn from instances with missing labels.

References

1. Boutell, M.R., Luo, J., Shen, X., Brown, C.M.: Learning multi-label scene classification. Pattern Recogn. 37(9), 1757–1771 (2004)
2. Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Androutsopoulos, I.: Large-scale multi-label text classification on EU legislation. arXiv preprint arXiv:1906.02192 (2019)
3. Dong, H., Li, Y., Zhou, Z.: Learning from semi-supervised weak-label data. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), pp. 2926–2933. AAAI Press (2018)
4. Guo, B., Hou, C., Shan, J., Yi, D.: Low rank multi-label classification with missing labels. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 417–422. IEEE (2018)
5. Hsieh, C.J., Natarajan, N., Dhillon, I.: PU learning for matrix completion. In: International Conference on Machine Learning, pp. 2445–2453. PMLR (2015)
6. Huang, J., et al.: Improving multi-label classification with missing labels by learning label-specific features. Inf. Sci. 492, 124–146 (2019)
7. Ibrahim, K.M., Epure, E.V., Peeters, G., Richard, G.: Confidence-based weighted loss for multi-label classification with missing labels. In: Proceedings of the 2020 International Conference on Multimedia Retrieval, pp. 291–295 (2020)
8. Kato, M., Teshima, T., Honda, J.: Learning from positive and unlabeled data with a selection bias. In: International Conference on Learning Representations (2018)
9. Lee, J., Seo, W., Park, J.H., Kim, D.W.: Compact feature subset-based multi-label music categorization for mobile devices. Multimedia Tools Appl. 78(4), 4869–4883 (2019)
10. Li, J., Li, P., Hu, X., Yu, K.: Learning common and label-specific features for multilabel classification with correlation information. Pattern Recogn. 121, 108259 (2022)
11. Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. arXiv preprint arXiv:1109.0367 (2011)
12. Ma, Z., Chen, S.: Expand globally, shrink locally: discriminant multi-label learning with missing labels. Pattern Recogn. 111, 107675 (2021)
13. Ramaswamy, H., Scott, C., Tewari, A.: Mixture proportion estimation via kernel embeddings of distributions. In: International Conference on Machine Learning, pp. 2052–2060. PMLR (2016)
14. Ren, X., Lin, Z.: Linearized alternating direction method with adaptive penalty and warm starts for fast solving transform invariant low-rank textures. Int. J. Comput. Vision 104(1), 1–14 (2013)
15. Santos, A.M., Canuto, A.M.: Using semi-supervised learning in multi-label classification problems. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2012)
16. Scott, C.: Calibrated asymmetric surrogate losses. Electronic J. Stat. 6, 958–992 (2012)
17. Wu, B., Jia, F., Liu, W., Ghanem, B., Lyu, S.: Multi-label learning with missing labels using mixed dependency graphs. Int. J. Comput. Vision 126(8), 875–896 (2018)
18. Wu, B., Liu, Z., Wang, S., Hu, B.G., Ji, Q.: Multi-label learning with missing labels. In: 2014 22nd International Conference on Pattern Recognition, pp. 1964–1968. IEEE (2014)
19. Xie, M., Huang, S.: Partial multi-label learning. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), pp. 4302–4309. AAAI Press (2018)
20. Xu, M., Jin, R., Zhou, Z.H.: Speedup matrix completion with side information: application to multi-label learning. In: Advances in Neural Information Processing Systems, pp. 2301–2309 (2013)
21. Yu, H.F., Jain, P., Kar, P., Dhillon, I.: Large-scale multi-label learning with missing labels. In: International Conference on Machine Learning, pp. 593–601. PMLR (2014)
22. Zhang, M.L., Zhou, Z.H.: A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. 26(8), 1819–1837 (2013)
23. Zhu, Y., Kwok, J.T., Zhou, Z.H.: Multi-label learning with global and local label correlation. IEEE Trans. Knowl. Data Eng. 30(6), 1081–1094 (2017)

SP2P-MAKA: Smart Contract Based Secure P2P Mutual Authentication Key Agreement Protocol for Intelligent Energy System

Pooja Verma(B) and Daya Sagar Gupta

Rajiv Gandhi Institute of Petroleum Technology, Jais 229304, India
[email protected]

Abstract. The Intelligent Energy System (IES) is an upcoming intelligent system that attempts to maximize energy efficiency. In it, prosumers can be either energy requesters or responders, who dynamically adjust energy transfer on a real-time basis via a bi-directional channel. Even so, the widespread use of the IES raises several privacy and security concerns, such as centrally controlled administrative power and data alteration activities. Thus, the primary objective of our protocol is to address the identity authentication issues that exist in the IES. A trustworthy and reliable mutual authentication scheme is suggested by combining the Lattice-based (LB) cryptosystem with the blockchain-based smart contract concept for P2P communication in IES. The blockchain concept is mainly used to minimize the negative impact of malicious validators in a distributed system. To maintain the privacy of transferred data, a session key is agreed between mutually authenticated peers. As a result, our suggested protocol upholds the principles of integrity, security, and privacy. An informal security analysis is also performed based on the hardness assumptions of the SIS and ISIS challenges in the LB cryptosystem.

Keywords: Lattice-based Cryptography · Smart Contract · Mutual Authentication (MA) · Privacy · Security · Intelligent Energy System (IES)

1 Introduction

As intelligent technologies become more prominent and are used to automate processes, emerging computer and communication technologies are moving towards Internet of Things (IoT)-based applications [12,14]. IoT communication networking services are used throughout many industries, including intelligent household appliances, intelligent transportation, medical service institutions, and the intelligent energy system (IES). Peer-to-peer (P2P) communication in IES has started to appear pertinent and essential in a smart community due to the widespread growth in the range of distributed energy producers and consumers [1]. Energy users with the ability to both produce and consume power are called prosumers [12]. Both centralized and distributed management styles are used to regulate the trading of energy. In a centralized system, a single intermediating node manages, processes, and authenticates transaction data among the various users. This strategy raises the overall network expense and creates vulnerability to a single point of failure. Safety and privacy challenges are further limitations of a centralized system. If a viable alternative source of energy cannot be found, a major electric power breakdown inside the scheme could have terrible consequences for the buyer. Decentralized or distributed power systems are hence offered as a substitute for the centralized model, providing consumers with a comfortable distributed environment in which they can regulate their own transaction records securely. Blockchain technology (BCT) with smart contracts (SC) also offers a safe and secure environment for distributed systems. Every user inside such a network is directly connected to every other user [1]. Additionally, the single point of failure and the privacy concerns of the centralized platform are resolved by blockchain. It is helpful for several things, including verification, trustworthiness, and privacy. Participants of the system who handle security measures using guidelines to agree on a general state of the system are known as validators. As peers convey their sensitive information, preserving data privacy and security is one of the greatest challenges associated with the intelligent system. Numerous authentication protocols for IES-based networks have been suggested to address such security risks. The privacy and anonymity of distributed data cannot be achieved by MA-based strategies alone. Therefore, secure and protected end-to-end interaction is required, which can be handled easily by using smart contracts.

Supported by organization RGIPT Jais.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 83–92, 2023. https://doi.org/10.1007/978-3-031-35507-3_9

1.1 Motivation

We reviewed several AKA protocols and learned that they are based on either traditional public-key cryptography (PKC) or identity-based cryptography (IBC). Many of them seem to be vulnerable to several attacks or incur high communication and computational costs. Additionally, quantum computers will eventually threaten conventional security methods such as PKC, IBC, and ECC. Shor [13] found that such approaches are susceptible to attacks by quantum computers. Consequently, a technique that can also withstand quantum attacks is necessary. Thus, we switch to lattices, which have a complicated structure and a polynomial time bound on solving their hardest challenges at average cost [9]. We were encouraged to contribute a lightweight protocol for the intelligent energy system based on lattices. Following a security and efficiency analysis, we found that our scheme is secure based on the SIS challenge. It also provides robust security because it uses blockchain-based smart contract function calls to verify the trustworthiness of peers.

1.2 Paper Outlines

The remaining portions of this work are arranged as follows. In Sect. 2, several earlier works are noted and described. Section 3 defines some preliminary terms, such as the LB cryptosystem and q-ary lattices with their hardness assumptions, namely the SIS and ISIS challenges, based on the reviewed work. The phases of our proposed protocol, an LB-based AKA protocol using smart contracts, are described in Sect. 4. Next, in Sect. 5, we present the informal security analysis of the proposed protocol. Finally, Sect. 6 concludes the paper and outlines future work.

2 Literature Survey

Boneh et al. [2] were the first to propose a two-party AKA scheme based on the ID-based cryptography concept, in 2001. Cao et al. [3] introduced a pairing-free ID-based AKA scheme using the computational Diffie-Hellman hardness assumption. Islam et al. [11] discovered that Cao's protocol lacks the ability to protect confidentiality against key offset and impersonation attacks. Thus, they designed a new pairing-free ID-based AKA framework using ECC and conducted a security analysis against the weaknesses of [3]. A secure ID-based AKA scheme that can be used by more than two connected parties was developed by Gupta et al. [8]. Various lattice-based schemes have been suggested by researchers in recent years to tackle quantum computing. Wang et al. [15] introduced extended versions of the SIS and ISIS challenges of lattices, known as the Bi-SIS and Bi-ISIS challenges. In [6], the authors performed cryptanalysis of Wang's scheme and found that a few attacks, such as MITM attacks, are possible. Further, the authors of [7] introduced an improved version with more robust security, the LB 2P-AKA protocol, which uses the signature concept for authentication. However, this scheme suffers from high computation, communication, and storage costs. Gupta et al. [9] proposed an authenticated key exchange scheme utilizing LB cryptography for intelligent healthcare environments. A novel work [5] was also proposed for the smart grid system (SGS), which provides mutual authentication between the service provider and the smart meter to prevent falsification. Gupta et al. [10] proposed a 2P AKA for the Internet of Vehicles environment utilizing the ID-based and LB cryptosystems; this scheme requires very small storage capacity with small computational and communication costs. Nyangaresi et al. [12] also presented an AKA protocol for smart homes by employing LB cryptography.

For distributed systems, blockchain helps to provide the authenticity and integrity of records. Thus, researchers are also turning toward blockchain-based protocols and smart contracts combined with cryptographic algorithms to provide robust security. Garg et al. [4] designed an MA-based scheme by applying blockchain with ECC for peer-to-peer (P2P) communication in SGS. Aggarwal et al. [1] introduced a secure protocol for vehicle-to-grid (V2G) communication utilizing the blockchain concept with cryptographic algorithms to provide mutual authentication.

3 Technical Preliminaries

Here, we define the mathematical fundamentals of the cryptography used, namely lattice-based cryptography, along with its hard challenges.

3.1 Lattice Based Cryptography (LBC)

With the advent of quantum computing, cryptographic schemes can be hardened by relying on hardness assumptions on lattices. A lattice is a regular arrangement of points. By employing the characteristics of lattices, lattice-based cryptographic schemes provide sophisticated security mechanisms and efficient network establishment [5]. The underlying hard challenges are robust enough to withstand attacks carried out on a quantum computing system. Using the following specifications, a lattice can be defined mathematically:

1. Integer Lattices: Let $l = \{l_1, l_2, \ldots, l_n\} \subset \mathbb{R}^m$ be a set of linearly independent vectors [10], where $\mathbb{R}^m$ is the Euclidean space of dimension m. The lattice $\mathcal{L}$ generated by l is defined as:

$\mathcal{L}(l_1, l_2, \ldots, l_n) = \left\{ \sum_{k=1}^{n} z_k l_k : z_k \in \mathbb{Z} \right\}$  (1)

where the linearly independent vectors $l_1, l_2, \ldots, l_n$ are known as a basis, n is the rank and m is the dimension of the lattice $\mathcal{L}$. Moreover, every lattice has a basis. A basis is represented by the basis matrix $\mathbf{L} = [l_1, l_2, \ldots, l_n] \in \mathbb{Z}^{m \times n}$, whose columns are the basis vectors. Thus, the lattice generated in the m-dimensional Euclidean space $\mathbb{R}^m$ is written as $\mathcal{L}(\mathbf{L}) = \{\mathbf{L}z : z \in \mathbb{Z}^n\}$, where $\mathbf{L}z$ denotes the usual matrix-vector multiplication.

Definition 1: The minimum distance of $\mathcal{L}$, which corresponds to the length of the shortest non-zero vector, is defined as:

$D_{\min}(\mathcal{L}) = \min_{l \in \mathcal{L} \setminus \{0\}} \| l \|$  (2)
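Equations (1) and (2) can be made concrete with a small brute-force sketch. It is an illustration under stated assumptions only: the helper names `lattice_point` and `min_distance` are hypothetical, the 2-dimensional basis is made up, and the search is restricted to coefficients in a small box, which is adequate for this toy basis but is not a general algorithm.

```python
from itertools import product
from math import sqrt


def lattice_point(basis, coeffs):
    """Integer combination sum_k z_k * l_k of the basis vectors (Eq. 1)."""
    dim = len(basis[0])
    return tuple(sum(z * vec[d] for z, vec in zip(coeffs, basis))
                 for d in range(dim))


def min_distance(basis, bound=3):
    """Length of the shortest non-zero lattice vector among combinations
    with coefficients in [-bound, bound] (Eq. 2, restricted search)."""
    best = None
    for coeffs in product(range(-bound, bound + 1), repeat=len(basis)):
        if not any(coeffs):
            continue  # all-zero coefficients give the zero vector
        point = lattice_point(basis, coeffs)
        length = sqrt(sum(c * c for c in point))
        if best is None or length < best:
            best = length
    return best


basis = [(2, 0), (1, 2)]  # toy basis of rank 2 in R^2
print(lattice_point(basis, (1, 1)))
print(min_distance(basis))
```

The number of candidate coefficient vectors grows exponentially with the rank n, so exhaustive search of this kind is feasible only for toy lattices.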

Definition 2: Shortest Vector Challenge (SVC): Given a basis matrix $\mathbf{L} \in \mathbb{Z}^{m \times n}$ and its lattice $\mathcal{L}(\mathbf{L})$, it is difficult to find a non-zero vector $l \in \mathcal{L}$ with $\| l \| = D_{\min}(\mathcal{L})$.

Definition 3: Closest Vector Challenge (CVC): Given a basis matrix $\mathbf{L} \in \mathbb{Z}^{m \times n}$ with its lattice $\mathcal{L}(\mathbf{L})$ and a target vector $t \notin \mathcal{L}$, it is difficult to find a vector $l \in \mathcal{L}$ that is closest to t, i.e. that minimizes $\| t - l \|$.

2. q-ary Lattice: A lattice $\mathcal{L}$ that fulfils the condition $q\mathbb{Z}^n \subseteq \mathcal{L} \subseteq \mathbb{Z}^n$ for a modulus q is termed a q-ary lattice [9].

Definition 4: Given an integer matrix $M \in \mathbb{Z}_q^{m \times n}$ with modulus q, two q-ary lattices are defined as $\Lambda_q^{\perp} = \{x \in \mathbb{Z}^n : Mx = 0 \bmod q\}$ and $\Lambda_q = \{x \in \mathbb{Z}^n : x = M^{T} w \bmod q \text{ for some } w \in \mathbb{Z}^m\}$.

Two hard challenges related to q-ary lattices are employed in our proposed protocol. They are described as follows:

Definition 5: Small Integer Solution (SIS) Challenge: Given a matrix $M \in \mathbb{Z}_q^{m \times n}$ and a bound $\chi \in \mathbb{Z}^{+}$, it is hard to find a non-zero vector $x \in \mathbb{Z}^n$ satisfying $Mx = 0 \bmod q$ with $\|x\| \le \chi$.

Definition 6: In-homogeneous Small Integer Solution (ISIS) Challenge: Given a matrix $M \in \mathbb{Z}_q^{m \times n}$ and a bound $\chi \in \mathbb{Z}^{+}$, it is hard to find a non-zero vector $x \in \mathbb{Z}^n$ satisfying $x = M^{T} w \bmod q$ with $\|x\| \le \chi$.
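The SIS condition in Definition 5 is easy to check for a given candidate, even though finding one is assumed to be hard. The following sketch verifies a candidate; the function name `is_sis_solution` is hypothetical, the Euclidean norm is assumed for ||x||, and the toy matrix and modulus are made up and far too small to be secure.

```python
from math import sqrt


def is_sis_solution(M, x, q, chi):
    """Check whether x is a valid SIS solution for (M, q, chi):
    x is non-zero, M x = 0 (mod q), and the Euclidean norm of x is <= chi."""
    if not any(x):
        return False
    residues = [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) % q for row in M]
    norm = sqrt(sum(x_j * x_j for x_j in x))
    return all(r == 0 for r in residues) and norm <= chi


# Toy instance with made-up numbers: q = 7, M has 2 rows and 3 columns.
M = [[1, 2, 3],
     [4, 5, 6]]
print(is_sis_solution(M, [1, -2, 1], q=7, chi=3))  # short vector in the kernel
print(is_sis_solution(M, [7, 0, 0], q=7, chi=3))   # in the kernel mod q, but too long
```

The norm bound is what makes the problem non-trivial: without it, any multiple of q in one coordinate satisfies the congruence.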

4 Proposed Model

4.1 Network Model

For an IES framework, we designed a protocol, SP2P-MAKA, that generates a unique session key for P2P communication. Two peer ends are involved in the proposed protocol: the transmitter peer end (℘1), which initiates communication, and the transceiver peer end (℘2), which is responsible for answering the transmitter's requests. In addition, there is a Registered Authority (RA) acting as a trustworthy entity. The primary function of the RA is to retrieve the private-public key pairs as well as the signatures of the corresponding peers. Employing an offsite method for peer authentication, the RA computes the private-public key pairs with a signature using the peer's data. The real identity of any peer node is known only to the RA. All the symbols used in our scheme are defined in Table 1.

Table 1. Notations used in the proposed scheme

Notation          Definition
m                 Security parameter
n, p              Integer, prime number
M                 Matrix derived from $\mathbb{Z}_q^{m \times n}$
ID℘1, HID℘1       Identity and hashed identity of peer ℘1
ID℘2, HID℘2       Identity and hashed identity of peer ℘2
H1                Secure cryptographic hash function $H_1 : \{0,1\}^{*} \to \{0,1\}^{m}$
H2                Secure cryptographic hash function $H_2 : \mathbb{Z}_p^n \times \mathbb{Z}_p^n \times \{0,1\}^{m} \to \mathbb{Z}_p^{*}$
H3                Secure cryptographic hash function $H_3 : \mathbb{Z}_p^n \times \mathbb{Z}_p^n \times \mathbb{Z}_p^n \times \mathbb{Z}_p^n \to \{0,1\}^{m}$
x                 Master private key of the Registered Authority (RA)
RPb               Master public key of the Registered Authority (RA)
a, b              Secret keys of peers ℘1 and ℘2 respectively, where $a, b \in \mathbb{Z}_p^n$
r, s              Ephemeral keys chosen from $\mathbb{Z}_p^n$

4.2 Proposed SP2P-MAKA Protocol

The proposed model comprises four phases in total: Setup, Registration and Key Extraction, Mutual Authentication, and Session Key Agreement.

1. Setup Phase: This phase, conducted by the RA, produces the set of system components. On receiving $1^m$ as input, it produces the system variables as follows.
i. RA chooses an integer n and a prime modulus p.
ii. RA chooses a matrix $M \in \mathbb{Z}_p^{n \times n}$, where all matrix operations are performed modulo p.
iii. RA defines three secure cryptographic hash functions: $H_1 : \{0,1\}^{*} \to \{0,1\}^{m}$, $H_2 : \mathbb{Z}_p^n \times \mathbb{Z}_p^n \times \{0,1\}^{m} \to \mathbb{Z}_p^{*}$ and $H_3 : \mathbb{Z}_p^n \times \mathbb{Z}_p^n \times \mathbb{Z}_p^n \times \mathbb{Z}_p^n \to \{0,1\}^{m}$.

2. Registration and Key Extraction Phase: The trusted authority RA must produce the private-public key pairs and generate the signatures used for authentication verification. All steps of this phase are defined below:
i. First, RA chooses $x \in \mathbb{Z}_p^n$ as its master private key and generates its master public key as $RP_b = x^{T} \cdot M$.
ii. RA publishes the system parameters param: $\{M, p, RP_b, H_1, H_2, H_3\}$ and keeps the master key x secret.
iii. To hide the real identity of a peer from other communicating nodes and intruders, RA applies the hash function. For example, for the identity $ID_{\wp_1} \in \{0,1\}^{*}$ of peer ℘1, RA computes $HID_{\wp_1} = H_1(ID_{\wp_1})$.
iv. RA calls the registration function REG (with the HID of the peer as argument) in the smart contract, which generates the key pair including the signature and forwards these values to the requesting peer. The steps involved in function REG are described in Fig. 1.
v. Similarly, when peer ℘2 sends a registration request, RA returns the values computed by the function shown in Fig. 1: $U_{\wp_2} = b^{T} \cdot M$, $Sig_{\wp_2} = x^{T} + e_{\wp_2} \cdot b^{T}$, $e_{\wp_2} = H_2[HID_{\wp_2}, U_{\wp_2}, RP_b]$.
vi. Furthermore, RA validates the signature by checking whether $Sig_{\wp_1} \cdot M = RP_b + e_{\wp_1} \cdot U_{\wp_1}$ holds. Validation of the signature: $Sig_{\wp_1} \cdot M = (x^{T} + e_{\wp_1} \cdot a^{T}) \cdot M = x^{T} \cdot M + e_{\wp_1} \cdot a^{T} \cdot M = RP_b + e_{\wp_1} \cdot U_{\wp_1}$.

Fig. 1. Steps involved in the registration function of the smart contract

3. Authentication Phase: In this phase, the authentication procedure is performed when a peer ℘2 sends an energy request to peer ℘1 within the energy system. A secret key is used to authenticate, and it is used only once per transaction period. That is, once an energy transaction has been finished using the private key, the verification keys expire. A fresh authentication procedure is needed for each energy transfer.
i. At the peer ℘1 end: It chooses an ephemeral value $s \in \mathbb{Z}_p^n$ and computes $T_{\wp_1} = M \cdot s$ and $V_{\wp_1} = s \cdot Sig_{\wp_1} \cdot M$. Further, peer ℘1 sends the tuple $[T_{\wp_1}, U_{\wp_1}, HID_{\wp_1}, V_{\wp_1}, t_1]$ to ℘2 to generate the SK.
ii. Authentication of peer ℘1 at the ℘2 end: First, ℘2 verifies the received timestamp and refuses the request if the check fails. Once the timestamp is verified, ℘2 calls the signature verification function in the smart contract to check the authenticity of peer ℘1. All steps of this signature verification are shown in Fig. 2. In this function, it computes $T_{\wp_1}[H_2[HID_{\wp_1}, U_{\wp_1}, RP_b] \cdot U_{\wp_1} + RP_b]$ and checks it against $M \cdot V_{\wp_1}$. Based on the returned value, true or false, the peer decides whether to accept or reject the request, as shown in Fig. 2. After successful authentication, peer ℘2 forwards the tuple $[T_{\wp_2}, U_{\wp_2}, HID_{\wp_2}, V_{\wp_2}, t_2]$ to peer ℘1.
iii. Authentication of peer ℘2 at the ℘1 end: ℘1 verifies the received timestamp $t_2$ and terminates the process if it is not valid. Otherwise, ℘1 verifies the identity of ℘2 by calling the signature verification function, where it computes $T_{\wp_2}[H_2[HID_{\wp_2}, U_{\wp_2}, RP_b] \cdot U_{\wp_2} + RP_b]$ and cross-checks it with $M \cdot V_{\wp_2}$. If the function returns false, ℘1 terminates the request. Otherwise, ℘1 authenticates ℘2 as a legal peer, as described in the function shown in Fig. 2.

Fig. 2. Signature verification function in the mutual authentication phase

4. Session Key Agreement Phase: Both peers start generating the session key only after phase 3, i.e. after peers ℘1 and ℘2 have mutually authenticated each other successfully. During mutual authentication, the variable pairs $(T_{\wp_1}, T_{\wp_2})$ and $(V_{\wp_1}, V_{\wp_2})$ depend on the secrecy of the secret keys: the privacy of $(T_{\wp_1}, T_{\wp_2})$ depends on the ephemeral keys s, r. Similarly, $(V_{\wp_1}, V_{\wp_2})$ depends on s, r and $(Sig_{\wp_1}, Sig_{\wp_2})$, which are themselves secure due to the random secret keys a, b. Thus, SK must depend on the variables $T_{\wp_1}$, $T_{\wp_2}$, $V_{\wp_1}$ and $V_{\wp_2}$ to preserve the secrecy of the session key. As a result, both peers derive the same session key: $SK = H_3[T_{\wp_1}, T_{\wp_2}, V_{\wp_1}, V_{\wp_2}]$.
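The registration values and the signature validation $Sig \cdot M = RP_b + e \cdot U$ from the Registration and Key Extraction phase can be checked numerically with a small sketch. Everything beyond those formulas is an assumption made for illustration: the helper names (`vec_mat`, `h2`, `register`, `verify`) are hypothetical, SHA-256 reduced modulo p stands in for H1 and H2, the secret key is drawn inside `register`, and the parameters are toy-sized and offer no security.

```python
import hashlib
import secrets


def vec_mat(v, M, p):
    """Row vector times matrix, modulo p."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % p
            for j in range(len(M[0]))]


def h2(hid, U, RPb, p):
    """Hypothetical stand-in for H2: hash the inputs to a value in Z_p^*."""
    data = repr((hid, U, RPb)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (p - 1) + 1


def register(hid, x, M, RPb, p):
    """Sketch of REG: pick a secret key a, derive U, e and the signature."""
    n = len(x)
    a = [secrets.randbelow(p) for _ in range(n)]
    U = vec_mat(a, M, p)                              # U = a^T . M
    e = h2(hid, U, RPb, p)                            # e = H2[HID, U, RPb]
    sig = [(x[i] + e * a[i]) % p for i in range(n)]   # Sig = x^T + e . a^T
    return a, U, e, sig


def verify(sig, U, e, M, RPb, p):
    """Check Sig . M == RPb + e . U (mod p)."""
    lhs = vec_mat(sig, M, p)
    rhs = [(RPb[j] + e * U[j]) % p for j in range(len(U))]
    return lhs == rhs


# Toy parameters (far too small for real security).
p, n = 101, 4
M = [[secrets.randbelow(p) for _ in range(n)] for _ in range(n)]
x = [secrets.randbelow(p) for _ in range(n)]   # master private key
RPb = vec_mat(x, M, p)                         # master public key x^T . M
hid = hashlib.sha256(b"peer-1").hexdigest()    # stand-in for HID = H1(ID)
a, U, e, sig = register(hid, x, M, RPb, p)
print(verify(sig, U, e, M, RPb, p))
```

The check succeeds because $(x^{T} + e \cdot a^{T}) \cdot M$ expands to $x^{T} \cdot M + e \cdot a^{T} \cdot M$, which is exactly $RP_b + e \cdot U$ modulo p; the verifier needs only public values and never sees x or a.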

5 Security Analysis

i. Man-in-the-Middle Attack: In the suggested scheme, peer ℘1 transmits the tuple $[T_{\wp_1}, U_{\wp_1}, HID_{\wp_1}, V_{\wp_1}, t_1]$. Assume that an adversary A is present between peers ℘1 and ℘2. Let A compute $T'_{\wp_1} = M \cdot u$ and send the changed tuple $[T'_{\wp_1}, U_{\wp_1}, HID_{\wp_1}, V_{\wp_1}, t_1]$ to peer ℘2. On receiving this tuple, ℘2 validates its authenticity by calculating $T'_{\wp_1}[H_2[HID_{\wp_1}, U_{\wp_1}, RP_b] \cdot U_{\wp_1} + RP_b] = M \cdot u[e_{\wp_1} \cdot a^{T} + x^{T}] \cdot M = M \cdot u \cdot Sig_{\wp_1} \cdot M \neq M \cdot V_{\wp_1}$. As a result, peer ℘2 notifies the adversary that authentication failed. Likewise, peer ℘1 will also decline A's request. As a result, no man-in-the-middle attack is possible between peers ℘1 and ℘2.

ii. Known Provisional Information Attack: In the proposed scheme, the session key is calculated as $SK = H_3[T_{\wp_1}, T_{\wp_2}, V_{\wp_1}, V_{\wp_2}]$. Even if an opponent manages to gain the ephemeral keys s and r of the present session, it is still unable to ascertain the session key. A could only retrieve the session key after discovering the values $(T_{\wp_1}, T_{\wp_2})$ and $(V_{\wp_1}, V_{\wp_2})$. Determining the values $(r \cdot Sig_{\wp_2} \cdot M, s \cdot Sig_{\wp_1} \cdot M)$ falls under the hardness assumption of solving the SIS and ISIS challenges. Thus, the introduced scheme can prevent attacks when provisional information is disclosed.

iii. Impersonation Attacks: As in the MITM case, A wants to pose as one of the communicating peers by altering the forwarded information. Each peer verifies the identity of the other before sharing any information about the session. Thus, when A poses as a peer, the signature verification function returns false, which leads to rejection of the request. Hence, the proposed scheme protects the system against impersonation attacks.

iv. Session Key Security: The session key $SK = H_3[T_{\wp_1}, T_{\wp_2}, V_{\wp_1}, V_{\wp_2}]$ is derived only after each peer has been authenticated successfully. The final computation of SK depends fully on the variables $T_{\wp_1} = M \cdot s$, $V_{\wp_1} = s \cdot Sig_{\wp_1} \cdot M$, $T_{\wp_2} = M \cdot r$ and $V_{\wp_2} = r \cdot Sig_{\wp_2} \cdot M$. Deriving the keys s and r falls under the hardness assumptions of ISIS and SIS. Furthermore, the final computation depends on a secure one-way cryptographic hash function. Thus, our scheme also maintains the secrecy of the session key of each session.

v. Perfect Forward Secrecy: Even with access to the peers' secret keys, A is unable to retrieve prior SK information. To deduce the SK, A would first need to derive $SK = H_3[T_{\wp_1}, T_{\wp_2}, V_{\wp_1}, V_{\wp_2}]$. Even if the secret keys a and b are known, it is not possible to determine $(T_{\wp_1}, T_{\wp_2})$ and $(V_{\wp_1}, V_{\wp_2})$, as these variables also depend on the ephemeral key s, which is not easy to determine from either $T_{\wp_1} = M \cdot s$ or $V_{\wp_1} = s \cdot Sig_{\wp_1} \cdot M$ due to the hardness assumptions of the SIS and ISIS challenges. The same holds for the key r; thus our proposed scheme preserves forward secrecy.

vi. Replay Attack: Suppose A has received an authentication message in a past session and wants to reuse it now, posing as an authentic peer. The SK of our protocol depends on the freshness of the secret and ephemeral keys. Thus, during verification the message will be detected as a replay attack.

vii. Quantum Attack: Lattice-based cryptography is designed to withstand quantum computing attacks. As our proposed protocol is based on the LB cryptosystem, it can withstand quantum attacks. Due to the hardness assumptions of the SIS and ISIS challenges and the polynomial time bound set against quantum attacks, the proposed protocol prevents such attacks.
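The timestamp check that each peer performs on a received tuple in the authentication phase is one common way to reject stale or replayed messages. A minimal sketch is given below; the function name `is_fresh` and the tolerance `max_age` are assumptions, since the protocol description does not state how large the acceptance window is.

```python
import time


def is_fresh(timestamp, now=None, max_age=5.0):
    """Accept a message only if its timestamp lies within max_age seconds
    before the receiver's clock (max_age is an assumed tolerance)."""
    if now is None:
        now = time.time()
    return 0 <= now - timestamp <= max_age


# A tuple replayed long after its timestamp t1 is rejected.
t1 = 1000.0
print(is_fresh(t1, now=1002.0))  # within the window
print(is_fresh(t1, now=1060.0))  # too old, treated as a replay
```

A timestamp window alone still admits replays inside the window, so such a check is normally combined with per-session values such as the ephemeral keys used here.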

6 Conclusion and Future Work

We introduced a secure P2P mutual authentication and key agreement protocol for IES. In it, peers first verify each other, and only after mutual authentication do they start the session key generation process. Moreover, we use two functions, REG and Signature Verification, which are based on the smart contract concept of blockchain, to minimize the activities of malicious validators. An informal security analysis demonstrates that our proposed scheme is secure against various attacks, such as MITM, impersonation, replay and quantum attacks. In future, various challenges related to the distributed nature of IES need to be addressed to minimize the computational and communication costs while keeping a high security level.

References

1. Aggarwal, S., Kumar, N., Gope, P.: An efficient blockchain-based authentication scheme for energy-trading in V2G networks. IEEE Trans. Ind. Inf. 17(10), 6971–6980 (2020)
2. Boneh, D., Franklin, M.: Identity-based encryption from the Weil pairing. In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 213–229. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44647-8_13
3. Cao, X., Kou, W., Du, X.: A pairing-free identity-based authenticated key agreement protocol with minimal message exchanges. Inf. Sci. 180(15), 2895–2903 (2010)
4. Garg, S., Kaur, K., Kaddoum, G., Gagnon, F., Rodrigues, J.J.: An efficient blockchain-based hierarchical authentication mechanism for energy trading in V2G environment. In: 2019 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1–6. IEEE (2019)
5. Gupta, D.S.: A mutual authentication and key agreement protocol for smart grid environment using lattice. In: Proceedings of the International Conference on Computational Intelligence and Sustainable Technologies, pp. 239–248. Springer (2022). https://doi.org/10.1007/978-981-16-6893-7_22
6. Gupta, D.S., Biswas, G.: Cryptanalysis of Wang et al.’s lattice-based key exchange protocol. Perspect. Sci. 8, 228–230 (2016)
7. Gupta, D.S., Biswas, G.: A novel and efficient lattice-based authenticated key exchange protocol in C-K model. Int. J. Commun. Syst. 31(3), e3473 (2018)
8. Gupta, D.S., Hafizul Islam, S.K., Obaidat, M.S.: A secure identity-based three-party authenticated key agreement protocol using bilinear pairings. In: Raj, J.S., Bashar, A., Ramson, S.R.J. (eds.) ICIDCA 2019. LNDECT, vol. 46, pp. 1–11. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38040-3_1
9. Gupta, D.S., Islam, S.H., Obaidat, M.S., Karati, A., Sadoun, B.: LAAC: lightweight lattice-based authentication and access control protocol for e-health systems in IoT environments. IEEE Syst. J. 15(3), 3620–3627 (2020)
10. Gupta, D.S., Ray, S., Singh, T., Kumari, M.: Post-quantum lightweight identity-based two-party authenticated key exchange protocol for internet of vehicles with probable security. Comput. Commun. 181, 69–79 (2022)
11. Islam, S., Biswas, G.: A pairing-free identity-based authenticated group key agreement protocol for imbalanced mobile networks. Annals of Telecommunications - Annales des Télécommunications 67(11), 547–558 (2012)
12. Nyangaresi, V.O.: Lightweight key agreement and authentication protocol for smart homes. In: 2021 IEEE AFRICON, pp. 1–6. IEEE (2021)
13. Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41(2), 303–332 (1999)
14. Verma, P.: A secure gateway discovery protocol using elliptic curve cryptography for internet-integrated MANET. In: Cryptographic Security Solutions for the Internet of Things, pp. 181–210. IGI Global (2019)
15. Wang, S., Zhu, Y., Ma, D., Feng, R.: Lattice-based key exchange on small integer solution problem. Sci. China Inf. Sci. 57(11), 1–12 (2014)

Automated Transformation of IoT Systems Models into Event-B Specifications

Abdessamad Saidi, Mohamed Hadj Kacem, Imen Tounsi, and Ahmed Hadj Kacem

ReDCAD Laboratory, ENIS, University of Sfax, Sfax, Tunisia
https://www.redcad.tn

Abstract. Developing Internet of Things systems without the benefit of a standard is a difficult process. In this regard, we propose to describe Internet of Things systems using a UML meta-model. It is critical to ensure that no ambiguity, incompleteness, or misunderstanding exists in the Internet of Things systems that instantiate the meta-model. To this end, we suggest specifying these systems with the Event-B formal method, which provides a starting point for architects to conduct successful verification, and we derive the Event-B specifications from the conceptual models of the systems. The purpose of this paper is to offer a series of both behavioral and structural transformation rules for converting each model element into its Event-B equivalent notion.

Keywords: Internet of Things · Model Transformation · Transformation Rules · Structural Features · Behavioral Features · Event-B Method

1 Introduction

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 93–104, 2023. https://doi.org/10.1007/978-3-031-35507-3_10

The Internet of Things (IoT) is a concept that was first introduced in 1999 by Kevin Ashton. Two characteristics follow from the term: the IoT is a network, and the things it connects are physical entities. Because devices are continually added to and removed from the network, the IoT is dynamic. New types of devices will appear, and the IoT must be able to adapt; it should be self-adaptive, reacting to changes as they occur. The IoT can also be considered a pervasive system. IoT systems are therefore highly complex in terms of heterogeneity of devices, storage, processing, and management. This complexity is the major challenge for engineers in developing robust IoT systems. For that reason, the design, specification, and verification of IoT systems are crucial steps in their development. These steps help to avoid hardware and software failures and lead to safe systems. To this end, we proposed in [10] a standard notation for describing the Internet of Things architecture based on the UML modeling standard. Behavioral aspects are modeled using the UML sequence diagram meta-model, while a UML meta-model that extends the UML component diagram is used to build structural aspects.

It is critical to demonstrate that IoT system models conform to the anticipated behavior and that they are correct by construction. Providing a clear specification model for the planned IoT system is therefore essential. Formal approaches allow for the explicit definition of a system through mathematical notations. They enable engineers to demonstrate that the system's specification is realizable, and that its properties hold, without having to execute it in real-world scenarios. Using the Event-B method for formal verification, we therefore investigate how to automatically transform UML system models into formal models. In this research, we are interested in transforming the structural and behavioral views so that each element of the model is turned into its Event-B equivalent notion. The resulting transformation rules (TRs), which generate Event-B specifications from the IoT system models, are intended to be implemented with a model transformation language.

The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 outlines the proposed approach. Section 4 presents a set of algorithms, to be implemented using a model transformation language, that translate the IoT system models into Event-B specifications. Section 5 presents a case study that demonstrates our methodology. Section 6 summarizes the results of this work and draws conclusions.

2 State of the Art

Conceptual modeling, specification, and verification are required for complex IoT-based systems because each step offers distinct benefits. Because they deliver these benefits, Model Driven Architecture (MDA), UML profiles, and formal approaches have received a lot of attention. UML offers an extension mechanism for customizing models for the IoT. MDA permits representing an IoT system at different levels of abstraction. Formal methods make it possible to ensure the correctness of models and the safety of systems.

Some recent research has focused on the modeling of IoT systems. The authors of [8] present a model-driven approach called MontiThings that enables the development and deployment of reliable IoT applications by separating the implementation details from the business logic. In [9], the authors extend an existing model, which results in the UML profile STS4IoT. This profile makes it possible to integrate IoT applications with other information systems (data is sensed, transformed, and sent). In the industrial sector, the authors of [4] propose a meta-model to model and analyze industrial IoT systems. An industrial real-time safety system was designed and analyzed to show the effectiveness of the meta-model.

Similarly, there are various studies on the specification of IoT systems. A formal model is presented in [5] to model IoT systems using self-adaptive concepts. The model shows how nodes in the IoT network communicate, interact, and share data. The work of [6] presents an IoT-based smart irrigation system whose requirements are formalized with a formal notation. The authors of [2] verify a hybrid model of an IoT operating system, and the authors of [1] verify data encryption requirements. These works add formal semantics to models using the Event-B method, and the verification is done via the Rodin platform.

Furthermore, some papers propose techniques for automatically transforming models into Event-B specifications. The authors of [12] propose a method for using SoaML to model SOA design patterns and transforming these models into Event-B specifications. The method also supports the composition of SOA design patterns and their verification. In [3], the authors propose a set of algorithms for transforming the structural models of self-adaptive systems into Event-B contexts. The work in [7] implements a series of TRs using the XSLT transformation language. These rules transform the multi-scale description of software architecture models into Event-B contexts and machines. The approach was tested in a smart city case study. For transformations of UML diagrams into Event-B, the authors of [11,13] provide transformations of UML Activity Diagrams, as well as Sequence and Use Case Diagrams. However, these approaches do not address the automated transformation of IoT models into their specifications. The aim of this paper is to propose a set of algorithms that automatically transform IoT models into their equivalent Event-B specifications.

3 The Proposed Approach

Our approach is composed of five steps. First, we create models through a graphical editor. Second, the user can modify the model until the desired model is reached. Third, based on the XML representation of the model, we apply a set of algorithms to transform it. Fourth, the Event-B specification is generated. Finally, to ensure that the models are correct, we use the proof obligations (POs) of the Rodin platform to verify some properties.
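The third step, reading the XML representation of a model, can be illustrated with a minimal Python sketch. It assumes a hypothetical XML export in which each model element is a child tag carrying a `name` attribute; the tag names, the attribute names, and the `load_model` helper are illustrative assumptions, not the schema produced by the graphical editor.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export of a model. Tag and attribute names are
# assumptions made for this sketch, not the editor's real schema.
MODEL_XML = """
<IoTSystem name="Demo">
  <Sensor name="S1"/>
  <Sensor name="S2"/>
  <IoTGateway name="G1"/>
  <Message name="M1"/>
  <Connector name="C1" origin="S1" recipient="G1"/>
</IoTSystem>
"""


def load_model(xml_text):
    """Group element names by tag, preserving document order."""
    root = ET.fromstring(xml_text)
    model = {}
    for element in root:
        model.setdefault(element.tag, []).append(element.get("name"))
    return model


model = load_model(MODEL_XML)
print(model["Sensor"])  # names of all Sensor elements in the model
```

A dictionary of this kind is enough to drive the text-generation rules of the next section, since each rule only asks whether elements of a given kind exist and what their names are.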

4 Transformation of the UML Models of IoT System into Event-B Specifications

In a previous work [10], we proposed a meta-model to describe the IoT architecture. This section discusses the various transformation rules for transforming structural and behavioral views of Internet of Things system models into their Event-B equivalent concepts.

4.1 Structural Features

Before we start the transformation, we need to state which parts of our meta-model will be translated into Event-B contexts:

– The messages.
– The components that constitute the general IoT system.
– The ServiceInterface.
– The connections between these components.


1. The Messages TRs

This rule translates the MessageType model into a new Event-B type, MessageType, declared in the SETS clause (lines 2 to 5 of Algorithm 1). It then turns the names of all messages into constants in the CONSTANTS clause (lines 6 to 11). Finally, the AXIOMS clause (lines 12 to 19) contains the Messages Partition axiom, which uses a partition to relate the message names to the MessageType set.

Algorithm 1: Message Type transformation rules algorithm

 1 BEGIN
 2   Write ("SETS")
 3   if exist Message then
 4     Write ('MessageType')
 5   end
 6   Write ("CONSTANTS")
 7   if exist Message then
 8     for each MessageType do
 9       Write (MessageType.Name)
10     end
11   end
12   Write ("AXIOMS")
13   if exist Message then
14     Write ('Messages Partition :partition(MessageType')
15     for each MessageType do
16       Write (', {MessageType.Name}')
17     end
18     Write (')')
19   end
20 END
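As an illustration only, the text generation of Algorithm 1 could be sketched in Python as follows. The function name `message_context`, the underscore in the axiom label, and the sample message names are assumptions of this sketch; the paper targets a model transformation language rather than Python.

```python
def message_context(message_names):
    """Emit the SETS, CONSTANTS and AXIOMS lines for message types."""
    lines = ["SETS"]
    if message_names:
        lines.append("MessageType")
    lines.append("CONSTANTS")
    # One constant per message name (nothing is added for an empty list).
    lines.extend(message_names)
    lines.append("AXIOMS")
    if message_names:
        # Each message name becomes a singleton block of the partition.
        singletons = ", ".join("{" + name + "}" for name in message_names)
        lines.append(
            "Messages_Partition: partition(MessageType, " + singletons + ")"
        )
    return lines


# Hypothetical message names, used only to show the output shape.
print("\n".join(message_context(["Temperature", "Alert"])))
```

Collecting the lines in a list instead of writing them directly keeps the rule easy to check in isolation before the text is handed to a file writer.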

2. The Components TRs

This rule formally identifies the key elements used in the architecture of an IoT system that implements the proposed meta-model, such as Sensor, Actuator, IoTGateway, IoTCloudPlatform, EndDevice, and PhysicalEntity. The rule declares the two sets IoT Component and PhysicalEntity in the SETS clause (lines 2 to 4 of Algorithm 2). It also turns each component's name into a constant in the CONSTANTS clause (lines 5 to 17 of Algorithm 2). In the meta-model, the IoT Component set is made up of the following elements: Sensor, Actuator, IoTGateway, IoTCloudPlatform, and EndDevice. This is expressed explicitly in the AXIOMS clause (lines 19 to 34 of Algorithm 2). Likewise, the names of all components of a given kind make up the set for that kind, which is formally expressed as a partition.


Algorithm 2: IoT components transformation rules algorithm

 1 BEGIN
 2   Write ("SETS")
 3   Write ('IoT Component')
 4   Write ('PhysicalEntity')
 5   Write ("CONSTANTS")
 6   if exist Sensor then
 7     Write ('Sensor')
 8     for each Sensor do
 9       Write (Sensor.Name)
10     end
11   end
12   if exist IoTGateway then
13     Write ('IoTGateway')
14     for each IoTGateway do
15       Write (IoTGateway.Name)
16     end
17   end
18   Write ("AXIOMS")
19   Write ('IoT Component Partition :partition(IoT Component')
20   if exist Sensor then
21     Write (',Sensor')
22   end
23   if exist Actuator then
24     Write (',Actuator')
25   end
26   if exist IoTGateway then
27     Write (',IoTGateway')
28   end
29   if exist IoTCloudPlatform then
30     Write (',IoTCloudPlatform')
31   end
32   if exist EndDevice then
33     Write (',EndDevice')
34   end
35   if exist Sensor then
36     Write ('Sensor Partition (Sensor,')
37     for each Sensor do
38       Write (Sensor.Name)
39     end
40   end
41   if exist Actuator then
42     Write ('Actuator Partition (Actuator,')
43     for each Actuator do
44       Write (Actuator.Name)
45     end
46   end
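The same idea can be sketched for the component rules. The Python below is a hedged illustration rather than the authors' implementation: the function name `component_context`, the underscored labels, the use of singleton sets inside each partition, and the sample component names are all assumptions.

```python
def component_context(components):
    """Build context lines from a mapping of component kind to instance names."""
    # Only kinds with at least one instance are emitted
    # (the "if exist ..." tests of Algorithm 2).
    kinds = [kind for kind, names in components.items() if names]
    lines = ["SETS", "IoT_Component", "PhysicalEntity", "CONSTANTS"]
    for kind in kinds:
        lines.append(kind)
        lines.extend(components[kind])
    lines.append("AXIOMS")
    # Partition of the component set into the kinds that are present.
    lines.append(
        "IoT_Component_Partition: partition("
        + ", ".join(["IoT_Component"] + kinds) + ")"
    )
    # One partition per kind, listing its instances as singleton sets.
    for kind in kinds:
        parts = [kind] + ["{" + name + "}" for name in components[kind]]
        lines.append(kind + "_Partition: partition(" + ", ".join(parts) + ")")
    return lines


# Hypothetical model: two sensors, no actuator, one gateway.
example = component_context(
    {"Sensor": ["S1", "S2"], "Actuator": [], "IoTGateway": ["G1"]}
)
print("\n".join(example))
```

Looping over the kinds avoids repeating one `if exist` block per component type, which is the main difference from the listing above.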


For example, the set of Actuators is composed of all actuator names, as shown in Algorithm 2 from line 41 to line 46, with $\mathit{Actuator} = \{A_1, \ldots, A_n\} \land A_1 \neq A_2 \land \ldots \land A_{n-1} \neq A_n$. The same is done for the other components. Algorithm 2 presents a part of the transformation rules for the components that constitute the IoT architecture.

3. The ServiceInterface TRs

The ServiceInterface models the IoTComponent interfaces and their relationships with messages. From this diagram, we can derive the list of IoTComponents and, for each of them, the list of messages that it is able to send. This is specified by a relation called "can Transmit". This rule defines the "can Transmit" relation and initializes it. Algorithm 3 gives more details on this transformation.

Algorithm 3: Service Interface transformation rules algorithm

 1 BEGIN
 2   Write ("CONSTANTS")
 3   if exist IoT Component then
 4     Write ('can Transmit')
 5   end
 6   Write ("AXIOMS")
 7   Write ('Can Transmit Relation : can Transmit ∈ IoT Component ↔ MessageType')
 8   Write ('Can Transmit Partition : can Transmit = {')
 9   X ← IoT ComponentType:RequestPort.Name
10   Y ← ServiceInterface.Name
11   Z ← AssociationUse.Origin
12   for each IoT ComponentType do
13     Write (IoT ComponentType.name)
14     Write ('↦')
15     if (X = Y) ∧ (X = Z) then
16       Select(AssociationUse.Recipient)
17       Write (Interface.OperationInterface.Name)
18     end
19     Write ('}')
20   end
21 END
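A minimal Python sketch of the two axioms produced by this rule is given below. It assumes the lookup through request ports and associations has already been resolved into a plain dictionary, which is a simplification of Algorithm 3; the function name, the underscored identifiers, the sample data, and the ASCII spellings of the operators (`:` for ∈, `<->` for ↔, `|->` for ↦) are choices of this sketch.

```python
def can_transmit_axioms(sends):
    """sends maps each component name to the message types it may send."""
    # One maplet per (component, message) pair.
    pairs = [
        component + " |-> " + message
        for component, messages in sends.items()
        for message in messages
    ]
    return [
        "Can_Transmit_Relation: can_Transmit : IoT_Component <-> MessageType",
        "Can_Transmit_Partition: can_Transmit = {" + ", ".join(pairs) + "}",
    ]


# Hypothetical components and messages.
axioms = can_transmit_axioms(
    {"S1": ["Temperature"], "G1": ["Temperature", "Alert"]}
)
print("\n".join(axioms))
```

Building the whole set of pairs first and closing the brace once keeps the generated set literal well formed for any number of components.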

4. The Connections TRs

According to Algorithm 4 (from line 2 to line 6), this rule transforms each connection between two IoT Components (Sensor, IoTCloudPlatform, etc.) into a constant placed in the CONSTANTS clause. The rule also describes the graphical link (Provided & Required interface) by an Event-B relation between two IoTComponents: the Connector is specified as a relation between two IoTComponents (see line 16). All of the connection names make up the set of connections, which is formalized by the partition (Connectors Partition). The Domain and Range axioms produced by this rule specify the source and destination of each link.

Algorithm 4: Connector transformation rules algorithm

 1 BEGIN
 2   Write ("CONSTANTS")
 3   if exist Connector then
 4     Write ('Connector')
 5     for each Connector do
 6       Write (Connector.Name)
 7     end
 8   end
 9   Write ("AXIOMS")
10   if exist Connector then
11     Write ('Connector Partition :partition(Connector,')
12     for each Connector do
13       Write (Connector.Name)
14     end
15   end
16   Write ('Connector Relation : Connector ∈ IoT Component ↔ IoT Component')
17   for each ProviderInterface do
18     Write (ProviderInterface.Origin)
19     Write (' Domain :dom')
20     Write ((ProviderInterface.Origin))
21     Write ('=')
22     Write (ProviderInterface.Recipient)
23   end
24   for each RequireInterface do
25     Write (RequireInterface.Recipient)
26     Write (' Range :ran')
27     Write ((RequireInterface.Recipient))
28     Write ('=')
29     Write (RequireInterface.Origin)
30   end
31 END

4.2 Behavioral Features

Before we start the transformation, we need to state which parts of the UML sequence diagram meta-model will be used to produce Event-B specifications:

– The sequence diagram variables.
– The sequence diagram events.


1. The Variables TRs

In the Event-B method, a list of variables is used to define the state of an IoT system. The objective of Algorithm 5 is to extract all the variables from a sequence diagram. First, we transform the activation of a Lifeline into a variable Available in the VARIABLES clause (see lines 3 to 5 of Algorithm 5). In the INVARIANTS clause, the Available Relation invariant specifies a partial function from the IoT component type to the Boolean type that defines this activation (lines 13 and 14 of Algorithm 5). The value TRUE means that the component is ready to transmit the message; otherwise, it is not ready for transmission. Then, a transition in a sequence diagram can either connect two IoT components or be reflexive (requiring only one IoT component to process the message). In the first case, we define the source, the destination, and the message itself (lines 6 to 8 and 15 to 17 of Algorithm 5). In the second case, we define the message and the IoT component that will process it (lines 9 to 11 and 18 to 20 of Algorithm 5).

Algorithm 5: Variables transformation rules algorithm

 1 BEGIN
 2   Write ("VARIABLES")
 3   if exist IoT Component then
 4     Write ('Available')
 5   end
 6   if exist AsynchronousMessage then
 7     Write (Substring-before(AsynchronousMessage.Name, "ing"))
 8   end
 9   if exist ReflexiveMessage then
10     Write (Substring-before(ReflexiveMessage.Name, "ing"))
11   end
12   Write ("INVARIANTS")
13   if exist IoT Component then
14     Write ("Available Relation: Available ∈ IoT Component ⇸ BOOL")
15     if exist AsynchronousMessage then
16       Write ("inv1: " Substring-before(AsynchronousMessage.Name, "ing") " ∈ Connector ↔ MessageType")
17     end
18     if exist ReflexiveMessage then
19       Write ("inv1: " Substring-before(ReflexiveMessage.Name, "ing") " ∈ IoT ComponentType ↔ MessageType")
20     end
21   end
22 END
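The variable and invariant rules can be sketched in Python as below. This is an illustration under assumptions: `substring_before` mirrors the XPath-style `substring-before` function (empty result when the marker is absent), invariant labels are derived from the variable name instead of reusing `inv1`, operators are spelled in ASCII (`+->` for the partial function, `<->` for the relation), and the message names are hypothetical.

```python
def substring_before(text, marker):
    """Part of text before the first marker, or "" if the marker is absent."""
    head, found, _ = text.partition(marker)
    return head if found else ""


def variables_and_invariants(has_components, async_messages, reflexive_messages):
    """Return the VARIABLES names and the INVARIANTS lines."""
    async_vars = [substring_before(m, "ing") for m in async_messages]
    reflexive_vars = [substring_before(m, "ing") for m in reflexive_messages]
    variables = (["Available"] if has_components else [])
    variables += async_vars + reflexive_vars
    invariants = []
    if has_components:
        invariants.append("Available_Relation: Available : IoT_Component +-> BOOL")
        # Messages between two components relate a connector to a message type.
        for var in async_vars:
            invariants.append(f"inv_{var}: {var} : Connector <-> MessageType")
        # Reflexive messages relate the processing component to a message type.
        for var in reflexive_vars:
            invariants.append(f"inv_{var}: {var} : IoT_ComponentType <-> MessageType")
    return variables, invariants


variables, invariants = variables_and_invariants(True, ["Sending"], ["Processing"])
print(variables)
print("\n".join(invariants))
```

Note that stripping from the first occurrence of "ing" gives the intended stem only for names such as "Sending"; a name like "Transmitting" would yield "Transmitt", so a real implementation would need a naming convention or a different stemming rule.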


2. The Events TRs

An Event-B machine contains a default event called INITIALISATION. This event is responsible for initialising the variables used (lines 3 to 9 of Algorithm 6). Lines 10 to 18 of Algorithm 6 show how to transform a transition in the sequence diagram model into an event in the Event-B machine.

Algorithm 6: Events transformation rules algorithm

 1 BEGIN
 2   Write ("EVENTS")
 3   Write ("INITIALISATION")
 4   VAR1 ← Lifeline[1].Name
 5   ...
 6   VARn ← Lifeline[n].Name
 7   Write ("available := VAR1 ↦ TRUE, ..., VARn ↦ TRUE")
 8   Write ("inv1 := ∅")
 9   Write ("inv2 := ∅")
10   for each SynchronousMessage.Name do
11     VAR1 ← Substring-after(Substring-before(SynchronousMessage.Origin, "/@A"), "//")
12     VAR2 ← Substring-after(Substring-before(SynchronousMessage.Origin, "/@A"), ".")
13     VAR3 ← document(IoTStackholderDiagram.xml)Assemblage.Connector[VAR2].NameConnector
14     VAR4 ← Substring-after(Substring-before(SynchronousMessage.Name, ")"), "(")
15     Write ("VAR1 ∈ dom(available) ∧ available(VAR1) = TRUE")
16     Write ("Transmit := Transmit ∪ VAR3 ↦ VAR4")
17   end
18 END

5 Case Study: IoT Based Health Monitoring of Elderly (IHME)

We applied our approach to an IoT healthcare system that visualizes the vital signs of elderly people. Each elderly person is equipped with a sensor that measures heartbeat and oxygen saturation. The gathered data is transmitted through the home's router to an IoT cloud platform. The system also has a buzzer to generate an alert. The doctor can visualize these measurements through an Android application. As a first step, we model the components that constitute the system as well as their connections. To attribute formal semantics to these structural models, we transform them into a context named IHME Context. We chose the components and connectors as an example to illustrate our method.

1. Components Transformation

Applying Algorithm 2 transforms each component of the IoT system into a constant in the generated context file (see the CONSTANTS clause in Fig. 1). In addition, all IoT components are grouped into a partition called System Partition (see the first axiom in the AXIOMS clause in Fig. 1). Each kind of component also has its own partition; for example, we have a set of sensors (see the other axioms in the AXIOMS clause in Fig. 1).

Fig. 1. IHME components and connectors Specifications

2. Connections Transformation

To transform each connection into the Event-B specification, we apply Algorithm 4. Each connection's name in the model is formalized as a constant in the CONSTANTS clause of the context (see the CONSTANTS clause in Fig. 1). According to Algorithm 4, a connection is formalized as a relation between two IoT components (see Fig. 1). As shown in Fig. 1, we group all connectors into one partition called Connectors Partition. Finally, we specify the two components linked by each connector using the dom and ran operators.
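One possible reading of these connector rules is sketched below in Python. The component names MAX30100 and GlobalNet come from the case study; the connector name `Sensor_To_Router`, the function name, the axiom labels, and the singleton-set form of the dom and ran axioms are assumptions, and the sketch covers only the partition and the dom/ran axioms. Whether the generated text type-checks in Rodin is outside its scope.

```python
def connector_context(connectors):
    """connectors is a list of (name, origin, recipient) triples."""
    names = [name for name, _, _ in connectors]
    constants = ["Connector"] + names if names else []
    axioms = []
    if names:
        singletons = ", ".join("{" + name + "}" for name in names)
        axioms.append(f"Connector_Partition: partition(Connector, {singletons})")
    for name, origin, recipient in connectors:
        # Source of the link (domain) and destination of the link (range).
        axioms.append(f"{name}_Domain: dom({name}) = {{{origin}}}")
        axioms.append(f"{name}_Range: ran({name}) = {{{recipient}}}")
    return constants, axioms


constants, axioms = connector_context(
    [("Sensor_To_Router", "MAX30100", "GlobalNet")]
)
print(constants)
print("\n".join(axioms))
```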


3. Variable and Event Transformation

To illustrate the transformation of the behavioral view of our system, we chose as an example the transmission of the oxygen saturation by the MAX30100 sensor to the GlobalNet router. Applying Algorithm 5 results in the Available and Transmit variables (as shown in the VARIABLES and INVARIANTS clauses of Fig. 2). Because no message has been transmitted at the start, the variable Transmit is initialised to the empty set. Action 1 of the INITIALISATION event (see Fig. 2) means that the sensor is ready for transmission. As shown in Fig. 2, the behavior of transmitting the oxygen saturation of the elderly person is specified by the Transmitting SpO2 event.
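The shape of the generated INITIALISATION actions and of a transmission event can be sketched as follows. This Python is a hedged illustration, not the content of Fig. 2: the connector name `Sensor_To_Router`, the message name `SpO2`, the dictionary layout of an event, the braces around the maplet, and the ASCII operator spellings (`|->`, `&`, `\/`, `{}`) are assumptions of the sketch.

```python
def initialisation(lifelines, relation_variables):
    """Actions of INITIALISATION: every lifeline ready, relations empty."""
    ready = ", ".join(f"{name} |-> TRUE" for name in lifelines)
    actions = ["Available := {" + ready + "}"]
    actions += [f"{variable} := {{}}" for variable in relation_variables]
    return actions


def transmit_event(event_name, sender, connector, message):
    """Guard and action of one transition of the sequence diagram."""
    return {
        "name": event_name,
        # The sender must be known and ready to transmit.
        "guards": [f"{sender} : dom(Available) & Available({sender}) = TRUE"],
        # The transmission is recorded as a (connector, message) pair.
        "actions": [f"Transmit := Transmit \\/ {{{connector} |-> {message}}}"],
    }


init = initialisation(["MAX30100"], ["Transmit"])
event = transmit_event("Transmitting_SpO2", "MAX30100", "Sensor_To_Router", "SpO2")
print(init)
print(event["guards"][0])
print(event["actions"][0])
```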

Fig. 2. IHME Variable and Event Specifications

6 Conclusion

In this work, we introduced a series of algorithms for automatically generating Event-B specifications from the conceptual models of an Internet of Things system that instantiates our proposed meta-model. We presented a series of transformation rules for transforming structural and behavioral views of the IoT system models into Event-B specifications. In the future, we will implement these algorithms using an M2T (Model to Text) transformation language, which takes the XML representation of models as input and generates the corresponding specifications as output. Furthermore, we will load these specifications into the Rodin platform in order to verify properties such as security.

Acknowledgement. This work was partially supported by the LABEX-TA project MeFoGL: "Méthodes Formelles pour le Génie Logiciel".


References

1. Abbassi, I., Sliman, L., Graiet, M., Gaaloul, W.: On the verification of data encryption requirements in internet of things using event-B. In: Jallouli, R., Bach Tobji, M.A., Bélisle, D., Mellouli, S., Abdallah, F., Osman, I. (eds.) ICDEc 2019. LNBIP, vol. 358, pp. 147–156. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30874-2_11
2. Guan, Y., Guo, J., Li, Q.: Formal verification of a hybrid IoT operating system model. IEEE Access 9, 59171–59183 (2021)
3. Hachicha, M., Ben Halima, R., Hadj Kacem, A.: Translation of UML models for self-adaptive systems into event-B specifications. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds.) ISDA 2018. AISC, vol. 941, pp. 421–430. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-16660-1_42
4. Ihirwe, F., Di Ruscio, D., Mazzini, S., Pierantonio, A.: Towards a modeling and analysis environment for industrial IoT systems. In: CEUR Workshop Proceedings, vol. 2999, pp. 90–104 (2021)
5. Jarrar, A., Gadi, T., Balouki, Y.: Modeling the internet of things system using complex adaptive system concepts. In: Proceedings of the 2nd International Conference on Computing and Wireless Communication Systems, pp. 1–6 (2017)
6. Karmakar, R., Sarkar, B.B.: A prototype modeling of smart irrigation system using event-B. SN Comput. Sci. 2(1), 1–9 (2021). https://doi.org/10.1007/s42979-020-00412-8
7. Khlif, I., Hadj Kacem, M., Eichler, C., Drira, K., Hadj Kacem, A.: A model transformation approach for multiscale modeling of software architectures applied to smart cities. Concurr. Comput. Pract. Exp. 34(7) (2022)
8. Kirchhof, J.C., Rumpe, B., Schmalzing, D., Wortmann, A.: MontiThings: model-driven development and deployment of reliable IoT applications. J. Syst. Softw. 183, 111087 (2022)
9. Plazas, J.E., et al.: Sense, transform & send for the Internet of Things (STS4IoT): UML profile for data-centric IoT applications. Data Knowl. Eng. 101971 (2022)
10. Saidi, A., Hadj Kacem, M., Tounsi, I., Hadj Kacem, A.: A meta-modeling approach to describe internet of things architectures. In: Proceedings of the Tunisian-Algerian Joint Conference on Applied Computing (TACC 2021), Tabarka, Tunisia, 18–20 December 2021. CEUR Workshop Proceedings, vol. 3067, pp. 25–36. CEUR-WS.org (2021)
11. Siyuan, H., Hong, Z.: Towards transformation from UML to event-B. In: 2015 IEEE International Conference on Software Quality, Reliability and Security-Companion, pp. 188–189. IEEE (2015)
12. Tounsi, I., Hrichi, Z., Hadj Kacem, M., Hadj Kacem, A., Drira, K.: Using SoaML models and event-B specifications for modeling SOA design patterns. In: ICEIS 2013 - Proceedings of the 15th International Conference on Enterprise Information Systems, Angers, France, 4–7 July 2013, vol. 2, pp. 294–301. SciTePress (2013)
13. Weixuan, S., Hong, Z., Yangzhen, F., Chao, F.: A method for the translation from UML into event-B. In: 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), pp. 349–352. IEEE (2016)

Prediction of Business Process Execution Time

Walid Ben Fradj and Mohamed Turki

MIRACL Laboratory, ISIMS, University of Sfax, P.O. Box 242, 3021 Sfax, Tunisia

Abstract. Process Mining makes it possible to detect trends in the immense amounts of data produced by the execution of applications based on business processes. Human decision-making is enriched by Machine Learning in order to avoid bottlenecks, improve efficiency, and highlight potential process improvements. In this research article, we present a method (BPETPM) for the predictive monitoring of business processes. This method predicts the execution time of a business process according to the path followed by the process instance: it predicts whether a process instance will finish on time or late. We follow the CRISP-DM approach, well known in Data Science, to carry out our method. The input data for learning is extracted from the event logs that record the execution traces of the workflow engine of a BPMS. We start by cleaning the data, adding additional attributes, and encoding categorical variables. Then, at the modelling level, we test six classification algorithms: KNN, SVM (kernel=linear), SVM (kernel=rbf), Decision Tree, Random Forest, and Logistic Regression. Finally, using the BPETPM method, we create an intelligent process management system (iBPMS4PET). This system is applied to a process for managing incoming mail in the mutual health sector.

Keywords: Business Process · BPMS · Workflows · Event Logs · Process Mining · Artificial Intelligence · Machine Learning · Classification · CRISP-DM

1 Introduction

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 105–114, 2023. https://doi.org/10.1007/978-3-031-35507-3_11

In a variety of application areas, Process Mining offers a new means of process improvement. It allows organizations to diagnose problems based on facts. The majority of information systems record execution traces and information on the executed activities in event logs, which can be used to audit workflow management systems. According to [1], Process Mining comprises four kinds of analysis: three are offline (process discovery, conformance checking, and enhancement) and one is online (operational support techniques). According to [2], the difficulty arises with online approaches because of modifications and possible drifts. By nature, predicting the evolution of a running case is difficult because of the uncertainty attached to each instance and especially because of the human factor.

Following a review of the literature, we notice that there is no specific procedure to follow for carrying out a Process Mining approach. In our work, we follow the CRISP-DM procedure, well known in Data Science, for the predictive monitoring of business processes. The event logs used in existing Process Mining work record business data, whereas our approach focuses on the event logs that record the execution data of workflow engines. We also notice that transitions are generally represented by annotations or Petri nets. In our approach, we extract the different transitions and enrich our dataset with columns that represent all the paths followed by tasks (human or automatic).

In this research article, our general approach consists in exploiting event logs that record the execution data of the workflow engines of BPMS software. We apply Process Mining techniques to create an intelligent system based on machine learning for the predictive monitoring of business processes. Our approach predicts the execution time of the instances of a process.

The second section of this article presents BPM (Business Process Management) and its evolution towards iBPMS (Intelligent Business Process Management System). The third section presents Process Mining and existing work on the predictive monitoring of business processes from the temporal perspective. The fourth section is devoted to our method, BPETPM, for predicting the execution time of business processes using the execution data of a BPMS workflow engine. In the fifth section, based on the BPETPM method, we present iBPMS4PET, an intelligent business process management system for the prediction of execution times. We end with a conclusion and perspectives.
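The idea of enriching the dataset with one column per transition, together with an on-time or late label, can be sketched in Python as follows. This is a minimal illustration under assumptions: the event log layout, the task names, the `build_rows` helper, and the 24-hour deadline are hypothetical and are not taken from the BPMS data used in the paper.

```python
from datetime import datetime

# Hypothetical event log rows: (case id, task, completion timestamp).
EVENT_LOG = [
    ("c1", "Register", "2022-01-03T09:00"),
    ("c1", "Review", "2022-01-03T11:00"),
    ("c1", "Archive", "2022-01-03T12:00"),
    ("c2", "Register", "2022-01-04T09:00"),
    ("c2", "Archive", "2022-01-06T09:00"),
]


def build_rows(event_log, deadline_hours):
    """Return one feature row per case: transition indicators plus a label."""
    traces = {}
    for case_id, task, stamp in event_log:
        traces.setdefault(case_id, []).append((datetime.fromisoformat(stamp), task))

    # Transitions taken by each case, in chronological order of its events.
    taken = {}
    for case_id, events in traces.items():
        events.sort()
        tasks = [task for _, task in events]
        taken[case_id] = {a + "->" + b for a, b in zip(tasks, tasks[1:])}

    # One column per transition observed anywhere in the log.
    columns = sorted(set().union(*taken.values()))
    rows = {}
    for case_id, events in traces.items():
        hours = (events[-1][0] - events[0][0]).total_seconds() / 3600
        row = {column: int(column in taken[case_id]) for column in columns}
        row["late"] = int(hours > deadline_hours)  # 1 = late, 0 = on time
        rows[case_id] = row
    return rows


rows = build_rows(EVENT_LOG, deadline_hours=24)
for case_id, row in rows.items():
    print(case_id, row)
```

Rows of this form can be passed to any standard classifier, with the `late` column as the target and the transition indicators as features.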

2 BPM: Business Process Management

BPM is a business process-centric approach. It provides a global view of the functioning of an organization and of the associated collaborators. It makes it possible to effectively monitor the progress of activities within the company with the aim of improving overall performance. Consequently, according to [3], BPM provides more real-time traceability of the various human and computer exchanges. BPMN (Business Process Model and Notation) offers a common, accessible, and intuitive language that homogenizes process representations. It also offers standard specifications to support the implementation and execution of business processes. A BPMS (Business Process Management System) facilitates the process improvement life cycle. It allows users to model processes, build and run applications, configure functionality, create custom reports, and monitor process results. iBPMS is the combination of business process management (BPM) with artificial intelligence (AI) to create dynamic workflow experiences on a cloud-based platform with low-code tools. It brings together people, machines, and the Internet of Things (IoT) to provide business process intelligence and support. iBPMS is considered the evolution of BPM, merging it with predictive analytics, process intelligence, and new technologies.

3 Process Mining

According to [4], Process Mining sits at the intersection of data science and process science, and aims to understand and improve business processes. It is a form of event log analysis that supports process discovery [5], compliance checking [6] and performance analysis [7], and provides operational support techniques that place today's massive data in a process context [8]. According to [9], techniques for the predictive monitoring of process instances have appeared. These techniques are based on data mining, machine learning, quality-of-service aggregation and predictive event processing.

Predictive monitoring of business processes from the temporal perspective was first addressed by [10]. The proposed approach extracts a transition system from the event log and enriches it with temporal information. This approach served as a baseline for [11–14]. In [11], the transition system uses decision trees to predict the completion time as well as the next activity of a process instance. In [12,13], the approach of [10] is extended by clustering the traces of the event log according to context characteristics. In [14], the approach is extended by adding Naive Bayes and Support Vector Regression models for transition system annotation; the additional attributes improved the prediction quality. The disadvantage of these methods is that they assume that the event log used for training contains all possible behaviors of the process, an assumption that does not generally hold in reality. The two approaches proposed by [11,15] are similar: they present a general framework for enriching the event log with derived information and for discovering correlations between process characteristics using decision trees. In [16], two probabilistic methods based on the hidden Markov model (HMM) are proposed. These methods predict the probability of future activities, which gives information on the progress of the process. The approach proposed by [17] uses a generic model based on decision trees that provides decision criteria according to the real objectives of the process.

Other approaches from different fields have been proposed for delay prediction. In [18], process mining is based on queues: the approach builds an annotated transition system and uses nonlinear regression algorithms to predict delays. In [19], the remaining time is predicted using expressive probabilistic models and workflow information only. A predictive model based on decision trees is proposed by [20]; it estimates the probability that a user-defined constraint is satisfied by running instances. Then, approaches based on deep neural networks


W. Ben Fradj and M. Turki

appeared. The approach of [21] uses an LSTM-based neural network to predict the next activity and its execution time. The approach of [9] relies on artificial neural networks (ANN) to predict whether a process instance will run out of time. [22] compared two Machine Learning models (Random Forest and SVM) with two Deep Learning models (LSTM and DNN); the results showed that the LSTM model performed best.

From the works cited above, we find no standard scientific approach for carrying out a Process Mining project. The event logs used in the different approaches record business data, whereas there are also event logs that record execution data (engine data) from workflow engines. Transition systems generally take the form of annotations or graphical representations (Petri nets); these transitions are not integrated into the databases used for machine learning. For delay prediction, we find no approach that evaluates the performance of several models in order to justify the choice of one model over the others.

4 BPETPM: Business Process Execution Time Prediction Method

Following the CRISP-DM approach [23], we present in this section a new approach to the predictive monitoring of business processes. We adopt the temporal perspective in order to predict the execution time of business process instances. The basic data is extracted from event logs generated by workflow engines. In our approach, we propose a method capable

Fig. 1. Experimental procedure of predictive process monitoring approach

Prediction of Business Process Execution Time


of extracting the transitions per task (human and automatic) and adding them as new columns in the database. This can reveal correlations between these transitions and the data recorded in the event logs. For time prediction, we use classification to determine whether a process instance will complete on time or late. Fig. 1 shows the different stages of this approach.

Phase 1. Business understanding: Our approach is centered on the company's business processes and the flow of activities, with the aim of improving overall performance and therefore results. A BPMS is a solution for managing these processes.

Phase 2. Data understanding: We are interested in event logs, in CSV format, which store the execution data of workflow engines.

Phase 3. Data preparation: Having a log file that stores the execution data of a workflow engine does not mean that it is ready for modeling. This file may contain redundancies, missing values, useless values, categorical variables that require encoding, etc. It also requires the extraction and addition of further attributes before it is ready for modeling. In our treatment, we proceed as follows:

– Delete empty columns.
– Remove indexing columns because they are useless for modeling.
– Remove redundant columns.
– Sort data by date column.
– Add a column containing task durations.
– Add a column containing the durations of the instances.
– For each instance, determine the path followed by user.
– For each instance, determine the path followed by task (human or automatic).
– Add a column containing the path followed by task/user for each instance.
– Calculate the third quartile Q3 of the durations of the instances.
– Add a “description” column containing the value “late” if the duration of the instance is greater than Q3 and the value “in time” otherwise.
– Add all the columns created previously to a Data Frame.
– Convert this Data Frame to a CSV file.
– Decompose the dataset into independent variables X and dependent variable y.
– Check the type of the variables and encode the categorical ones using LabelEncoder.
– Divide the dataset (X; y) into a training set (X train; y train) and a test set (X test; y test).
– Standardize the dataset values X train and X test.
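The core of the preparation steps above (instance durations, paths by task and user, and the Q3-based label) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy event log, the field layout and the path separator are hypothetical, and the quartile uses linear interpolation (`method="inclusive"`), which is an assumption about how Q3 is computed.

```python
from collections import defaultdict
from datetime import datetime
from statistics import quantiles

FMT = "%Y-%m-%d %H:%M"

# Hypothetical task-level log: (instance id, task, executor, start, end).
events = [
    ("i1", "register", "alice",  "2022-01-03 08:00", "2022-01-03 08:30"),
    ("i1", "validate", "system", "2022-01-03 08:30", "2022-01-03 09:00"),
    ("i2", "register", "bob",    "2022-01-03 09:00", "2022-01-03 10:00"),
    ("i2", "validate", "system", "2022-01-03 10:00", "2022-01-03 11:00"),
    ("i3", "register", "alice",  "2022-01-04 08:00", "2022-01-04 11:00"),
    ("i4", "register", "bob",    "2022-01-04 08:00", "2022-01-04 12:00"),
    ("i4", "validate", "carol",  "2022-01-04 12:00", "2022-01-04 18:00"),
]

# Group the task records by process instance.
by_instance = defaultdict(list)
for inst, task, user, start, end in events:
    by_instance[inst].append(
        (datetime.strptime(start, FMT), datetime.strptime(end, FMT), task, user)
    )

rows = {}
for inst, steps in by_instance.items():
    steps.sort()  # chronological order (sorted by start date first)
    last_end = max(step[1] for step in steps)
    duration_h = (last_end - steps[0][0]).total_seconds() / 3600
    # Path followed by task and user, kept as one extra attribute.
    path = " > ".join(f"{task}:{user}" for _, _, task, user in steps)
    rows[inst] = {"duration": duration_h, "path": path}

# Third quartile of the instance durations (linear interpolation assumed).
q3 = quantiles([r["duration"] for r in rows.values()], n=4, method="inclusive")[2]

# "late" if the duration exceeds Q3, "in time" otherwise.
for r in rows.values():
    r["description"] = "late" if r["duration"] > q3 else "in time"
```

With this rule, roughly one quarter of the instances end up labelled “late” by construction, so the two classes are imbalanced.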

Phase 4. Modeling: The first step in modeling is to determine the type of problem. In our case, we seek to predict whether a process instance terminates on time or late. This is the prediction of a qualitative variable, so our problem is a supervised classification problem. In our approach


we test six classification models: SVM (kernel=linear), SVM (kernel=rbf), KNN, Logistic Regression, Decision Tree and Random Forest.

Phase 5. Evaluation: To evaluate each model, we first predict the outcome y pred for the test independent variables X test. We then use the Accuracy metric for the evaluation.

Phase 6. Result released: At the end of our work, we obtain a predictive monitoring method that predicts the execution times of the instances of a business process.

5 iBPMS4PET: Intelligent Business Process Management System for Prediction of Execution Time

Business process management requires the implementation of a powerful business process management system that facilitates the design, modeling, implementation and measurement of workflows. The purpose of this system may be to reduce inefficiencies, human errors or miscommunications. In this section, we apply the BPETPM method to a real case and use the proposed iBPMS4PET system.

Phase 1. Business understanding: The framework chosen is the intelligenceWay group (https://iway-tn.com/). We are interested in the field of mutual health, and more particularly in the incoming mail management process. Fig. 2 shows the model produced with BonitaSoft BPMS.

Fig. 2. Mail-in process

Phase 2. Data understanding: In our work, we use two CSV event logs extracted from a MySQL database configured by the I-WAY group. They store the execution data of a BonitaSoft BPMS workflow engine. The first log file, “BN PROC INST.csv”, stores the execution traces per process instance, and the second file, “BN ACT INST.csv”, stores the execution traces per task.


Phase 3. Data preparation: In our treatment, we proceed as follows:

1. Import the necessary Python libraries: pandas, numpy, matplotlib and seaborn.
2. Import the two files “BN ACT INST.csv” and “BN PROC INST.csv”.
3. Create two Data Frames: “dfACT”, containing the records of the event log “BN ACT INST.csv”, and “dfPROC”, containing the records of the event log “BN PROC INST.csv”.
   The columns of the “dfACT” Data Frame are: [DBID, ACT INST UUID, PROCESS UUID, INST UUID, SUB INST UUID, ROOT INST UUID, ACTIVITY STATE, READY DATE, START DATE, END DATE, LAST UPD, START BY, END BY, EXPEN DATE, IS HUMAN, ACT DEF UUID, TYPE, PRIORITY, NAME, DESCR, LABEL, DYN DESCR, DYN LABEL, DYN EXECUTION SUMMARY, USERID, ITERATION ID, ACT INST ID, LOOP ID, INSTANCE DBID].
   The columns of the “dfPROC” Data Frame are: [DBID, NB, PROCESS UUID, INST UUID, ROOT INST UUID, PARENT INST UUID, PARENT ACT INST UUID, ARCHIVED, LAST UPD, XP READ, NB OF ATTACHMENTS, START DATE, END DATE, START BY, END BY, INST STATE, ROOT EXEC].
4. For the “dfPROC” Data Frame:
   – Remove the empty columns [PARENT INST UUID, PARENT ACT INST UUID, ROOT EXEC].
   – Remove the index columns [DBID, NB], which are unnecessary for modeling.
   – Delete the column [ROOT INST UUID] because it has the same content as the [INST UUID] column.
5. For the “dfACT” Data Frame:
   – Delete the empty columns [SUB INST UUID, DESCR, DYN DESCR, DYN LABEL, DYN EXECUTION SUMMARY].
   – Delete the index column [DBID].
   – Delete the [ROOT INST UUID] column (same content as the [INST UUID] column).
   – Delete the [IS HUMAN] column (same role as the [TYPE] column).
   – Delete the [LABEL] column (same content as the [NAME] column).
   – Delete the [ACT INST UUID] column (same content distributed between [NAME, ITERATION ID, ACT INST ID, LOOP ID]).
   – Delete the [ACT DEF UUID] column (same content distributed between [PROCESS UUID, NAME]).
   – Delete the columns [START BY, END BY] (same content as [USERID]).
   – Delete the [INSTANCE DBID] column (same role as the [INST UUID] column).
6. In “dfPROC”, add a column [taskuserpath] which represents the path followed by task and user.


7. In “dfPROC”, add a column [duration INST] which represents the duration of the instance.
8. Calculate the third quartile Q3 of the column [duration INST] in “dfPROC”.
9. In “dfPROC”, add a “description” column containing “late” if the value of [duration INST] is greater than Q3, or “in time” otherwise.
10. In a new Data Frame “DFW”, add the columns created previously.
11. Convert the “DFW” Data Frame to a CSV file named “BFW.csv”. The result is a CSV file whose columns are: [PROCESS UUID, INST UUID, START BY, END BY, user act0, user act1, user act2, . . ., useract nb, duration INST, description]. BFW.csv contains 4718 rows and 15 columns.
    – PROCESS UUID: the process identifier.
    – INST UUID: the identifier of the instance.
    – START BY: the user who started the process instance.
    – END BY: the user who terminated the process instance.
    – user act i: the executor (person or machine) of activity number i.
    – duration INST: the execution duration of the instance.
    – Description: the delay description of the instance.
12. Decompose the dataset into independent variables X=[PROCESS UUID, INST UUID, START BY, END BY, user act0, . . ., useract nb] and dependent variable y=[Description].
13. Check the type of the variables and encode the categorical ones using LabelEncoder.
14. Divide the dataset (X; y) into a training set (X train; y train) and a test set (X test; y test).
15. Standardize the dataset values X train and X test. The new X train s and X test s values are between -1 and 1.

Phase 4. Modeling: Import the following classifiers: KNeighborsClassifier, DecisionTreeClassifier, RandomForestClassifier, SVM (kernel=linear), SVM (kernel=rbf) and LogisticRegression. For each model, train with the training variables “X train s” and “y train”, then test with the test variables “X test s” and keep the prediction result in a variable “y pred”.

Phase 5. Evaluation: For each model, import “classification report” from the “sklearn.metrics” library and display the full classification report. In terms of the Accuracy metric, the results obtained are as follows: 78% for KNN, 81% for RandomForest, 81% for SVM (kernel=linear), 84% for SVM (kernel=rbf) and 80% for LogisticRegression.

Phase 6. Result released: The result to be published is iBPMS4PET, an intelligent business process management system for the prediction of execution time.
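Steps 13 to 15 and Phases 4 and 5 can be sketched with scikit-learn as follows. This is only an illustrative sketch: the data is a small synthetic stand-in for BFW.csv (not the paper's data), the 75/25 split ratio, the random seeds and the default hyper-parameters are assumptions, and the accuracies it produces are unrelated to those reported above.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

random.seed(0)
users = ["alice", "bob", "system"]

# Synthetic stand-in for BFW.csv (hypothetical values): two categorical
# executor columns and a "late" / "in time" label.
rows = [(random.choice(users), random.choice(users)) for _ in range(200)]
labels = ["late" if first == "bob" else "in time" for first, _ in rows]

# Step 13: encode each categorical column, and the target, with LabelEncoder.
columns = [LabelEncoder().fit_transform(list(col)) for col in zip(*rows)]
X = [list(values) for values in zip(*columns)]
y = LabelEncoder().fit_transform(labels)

# Step 14: train/test split (the 75/25 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 15: standardize, fitting the scaler on the training set only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Phase 4: the six classifiers named in the text, with default settings.
models = {
    "KNN": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVM(kernel=linear)": SVC(kernel="linear"),
    "SVM(kernel=rbf)": SVC(kernel="rbf"),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Phase 5: accuracy of each model on the held-out test set.
accuracy = {}
for name, model in models.items():
    y_pred = model.fit(X_train_s, y_train).predict(X_test_s)
    accuracy[name] = accuracy_score(y_test, y_pred)
```

Fitting the scaler on the training set only is a design choice that keeps test-set statistics out of training; the text does not say which variant was used.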

6 General Conclusion and Perspectives

In this article, we have addressed the issue of predictive monitoring of business processes. This issue is a subject of current research in the field of BPM and


occupies an important place in process-oriented organizations. The contributions of this work can be summarized in the following points:

– Following the CRISP-DM approach, well known in Data Science, to build a Process Mining method.
– Applying process mining techniques to event logs that record the workflow engine execution data of a BPMS.
– Extracting all paths followed by a business process from event logs and using these paths as additional attributes alongside the data to be analyzed.
– Predicting the execution times of a business process according to the paths followed by its instances.

As future work, it would be possible to use other types of event logs, to combine the business data stored in relational or NoSQL databases with the execution data of workflow engines in order to predict execution times and detect bottlenecks, and to develop a monitoring system for visualizing performance indicators related to business processes.

References

1. Van der Aalst, W., Pesic, M., Song, M.: Beyond Process Mining: From the Past to Present and Future. In: Pernici, B. (ed.) CAiSE 2010. LNCS, vol. 6051, pp. 38–52. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13094-6_5
2. Ceci, M., Lanotte, P.F., Fumarola, F., Cavallo, D.P., Malerba, D.: Completion time and next activity prediction of processes using sequential pattern mining. DS, pp. 49–61 (2014)
3. Dumas, M., La Rosa, M., Mendling, J., Reijers, H.A.: Fundamentals of Business Process Management. Second Edition (2018). https://doi.org/10.1007/978-3-662-56509-4
4. Van der Aalst, W., et al.: Process Mining Manifesto. In: Daniel, F., Barkaoui, K., Dustdar, S. (eds.) BPM 2011. LNBIP, vol. 99, pp. 169–194. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28108-2_19
5. Carmona, J., Cortadella, J., Kishinevsky, M.: A Region-Based Algorithm for Discovering Petri Nets from Event Logs. In: Dumas, M., Reichert, M., Shan, M.C. (eds.) BPM 2008. LNCS, vol. 5240, pp. 358–373. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85758-7_26
6. Van der Aalst, W., Adriansyah, A.: Replaying History on Process Models for Conformance Checking and Performance Analysis. Wiley Interdisciplinary Reviews Data Min. Knowl. Discov. 2, 182–192 (2012)
7. Van der Aalst, W.: Process Mining: Data Science in Action. Second Edition. Springer, Heidelberg New York Dordrecht London, pp. 85–88 (2016). https://doi.org/10.1007/978-3-662-49851-4
8. Van der Aalst, W., Bichler, M., Heinzl, A.: Robotic Process Automation. Bus. Inf. Syst. Eng. 60(4), 269–272 (2018). https://doi.org/10.1007/s12599-018-0542-4
9. Teinemaa, I., Dumas, M., La Rosa, M., Maggi, F.M.: Outcome-oriented predictive process monitoring: review and benchmark. ACM Trans. Knowl. Discov. Data 13(2), 17 (2019)


10. Van der Aalst, W., Schonenberg, H., Song, M.: Time prediction based on process mining. Ulsan National Institute of Science and Technology, 100 Banyeon-RI, Ulsan, South Korea, pp. 689–798 (2011)
11. Ceci, M., Lanotte, P.F., Fumarola, F., Cavallo, D.P., Malerba, D.: Completion time and next activity prediction of processes using sequential pattern mining. DS, pp. 49–61 (2014)
12. Folino, F., Greco, G., Guzzo, A., Pontieri, L.: Mining usage scenarios in business processes: outlier-aware discovery and run-time prediction. Data Knowl. Eng. 70(12), 1005–1029 (2011)
13. Folino, F., Guarascio, M., Pontieri, L.: Discovering context-aware models for predicting business process performances. In: Confederated International Conferences: CoopIS, DOA-SVI, and ODBASE 2012, Rome, Italy, September 10–14, 2012, Proceedings, Part I. Lecture Notes in Computer Science, vol. 7565, pp. 287–304. Springer (2012). https://doi.org/10.1007/978-3-642-33606-5_18
14. Polato, M., Sperduti, A., Burattin, A., de Leoni, M.: Data-Aware Remaining Time Prediction of Business Process Instances. In: IJCNN (WCCI) (2014)
15. de Leoni, M., van der Aalst, W., Dees, M.: A General Framework for Correlating Business Process Characteristics. In: Sadiq, S., Soffer, P., Völzer, H. (eds.) BPM 2014. LNCS, vol. 8659, pp. 250–266. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10172-9_16
16. Lakshmanan, G.T., Shamsi, D., Doganata, Y.N., Unuvar, M., Khalaf, R.: A Markov prediction model for data-driven semi-structured business processes. Knowl. Inf. Syst. 42(1), 97–126 (2013). https://doi.org/10.1007/s10115-013-0697-8
17. Ghattas, J., Soffer, P., Peleg, M.: Improving business process decision making based on past experience. Decis. Support Syst. 59, 93–107 (2014)
18. Senderovich, A., Weidlich, M., Gal, A., Mandelbaum, A.: Queue mining for delay prediction in multiclass service processes. Inf. Syst. 53, 278–295 (2015)
19. Rogge-Solti, A., Weske, M.: Prediction of business process durations using non-Markovian stochastic Petri nets. Inf. Syst. 54, 1–14 (2015)
20. Maggi, F.M., Di Francescomarino, C., Dumas, M., Ghidini, C.: Predictive monitoring of business processes. In: Jarke, M., et al. (eds.) CAiSE Proceedings, Thessaloniki, pp. 457–472 (2014)
21. Tax, N., Verenich, I., La Rosa, M., Dumas, M.: Predictive Business Process Monitoring with LSTM Neural Networks. In: CAiSE (2017)
22. Kratsch, W., Manderscheid, J., Röglinger, M.: Machine Learning in Business Process Monitoring: a Comparison of Deep Learning and Classical Approaches Used for Outcome Prediction. Bus. Inf. Syst. Eng. 63(3), 261–276 (2020)
23. Zhang, Y.: Sales Forecasting of Promotion Activities Based on the Cross-Industry Standard Process for Data Mining of E-commerce Promotional Information and Support Vector Regression. J. Comput. 32(1), 212–225 (2021)

Pre-processing and Pre-trained Word Embedding Techniques for Arabic Machine Translation

Mohamed Zouidine1(B), Mohammed Khalil1, and Abdelhamid Ibn El Farouk2

1 LMCSA, FSTM, Hassan II University of Casablanca, Casablanca, Morocco
2 LLEC, FLSH, Hassan II University of Casablanca, Casablanca, Morocco

Abstract. In this paper, we systematically compare the impact of different pre-processing techniques and different pre-trained word embeddings on the translation quality of a neural machine translation model that translates from Arabic to English. For the pre-processing, we compare Arabic segmentation, Arabic normalization, and English lower-casing. For the pre-trained embeddings, we use embeddings trained with three context-independent models: Word2Vec, GloVe, and FastText. Our experimental results show that pre-processing techniques help to improve the translation quality, with a BLEU gain of up to +1.91 points. Furthermore, we find that the impact of pre-trained word embeddings strictly depends on the training data size.

Keywords: Machine translation · Word embeddings · Word2vec · Glove · FastText · BLEU

1 Introduction

Machine Translation (MT) is a sub-field of Natural Language Processing (NLP) that aims to develop computer-based translation systems. MT is the process of automatically transforming a given text from one language to another while preserving its meaning and style. Since 2014, Neural Machine Translation (NMT) models [3,18] have attracted more attention due to their robust improvements in the quality of MT systems. These models follow an encoder-decoder architecture. The encoder reads the source sentence token by token and encodes it into a fixed-length vector, called the context vector. Then, the decoder outputs the translation in the target language, token by token, from the context vector. This architecture has been improved by introducing the attention mechanism [3], which helps to translate and align words jointly.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 115–125, 2023. https://doi.org/10.1007/978-3-031-35507-3_12


Both the encoder and the decoder use Recurrent Neural Networks (RNN). RNNs take the word representations of the sentences as input. Word representations, or Word Embeddings (WE), represent each word in the sequence by a fixed-size vector of continuous real numbers; this representation ensures that words with similar contexts are represented by nearby vectors. Pre-trained word embeddings have proven their usefulness in NLP tasks such as text classification [19,21] and machine translation [14].

One challenge for Arabic MT is the linguistic complexity of the Arabic language [20]. This challenge can be overcome by using pre-processing techniques. Morphological tokenization and orthographic normalization are two pre-processing techniques that have proven their efficiency in improving the performance of Arabic translation [2]. Therefore, the objective of our work is to study the impact of pre-processing and of pre-trained WE on the final translation quality of Arabic to English translation using a neural machine translation model [3]. We use pre-trained WE built with three different word representation techniques, namely Word2Vec [10], GloVe [13], and FastText [9].

The rest of this paper is structured as follows. In Sect. 2, we present the related works. In Sect. 3, we introduce the background of this work. We describe the experimental setup in Sect. 4. We list and discuss the results in Sect. 5. Finally, we conclude our work in Sect. 6.

2 Related Works

After the success of neural machine translation for different European languages [3,17], several works have been proposed based on neural machine translation for the Arabic language. The authors of [2] presented the first results of NMT for Arabic/English translation. In both directions, they compared an attention-based NMT system [3] against a phrase-based system [8] with different pre-/post-processing techniques. They found that NMT performs comparably to phrase-based translation, and that tokenization and normalization improve the NMT performance. The authors of [11] studied several pre-processing techniques as well as their impact on the translation quality for Arabic-English Statistical MT (SMT) and NMT. Their results show that the translation performance of NMT is particularly sensitive to the data size. In [4], the authors used four types of RNN with an attention-based NMT model [3], namely LSTM, GRU, BiLSTM, and BiGRU. Furthermore, they studied the impact of pre-processing on Arabic MT. They found that the best combination of the four RNN types is BiGRU for the encoder and BiLSTM for the decoder. The authors of [22] proposed a training objective based on the combination of the cross-entropy loss and the policy gradient loss from deep reinforcement learning. They trained two Seq2Seq models [18] for the case of Arabic to English


translation. One model was trained based on the classical cross-entropy loss, while the other model was trained based on their proposed training objective. Their experimental results showed that this training objective improved the performance of the Seq2Seq model.

3 Methodology

3.1 Neural Machine Translation

We use an attention-based sequence-to-sequence model [3] that consists of an encoder, a decoder, and an attention mechanism. Consider a source sentence $X = \{x_1, x_2, x_3, \ldots, x_T\}$ and a target sentence $Y = \{y_1, y_2, y_3, \ldots, y_{T'}\}$, where $x_j$ and $y_j$ are the word representations of the source and target words at time-step $j$, and $T$ and $T'$ are respectively the maximum source and target sequence lengths.

The encoder is implemented as a Bidirectional RNN (BiRNN) [15]. A BiRNN uses a forward RNN to read the sentence from left to right and a backward RNN to read it from right to left. The forward RNN reads the input sentence $X$ from $x_1$ to $x_T$ and encodes it into a sequence of forward hidden states $(\overrightarrow{h^e_1}, \overrightarrow{h^e_2}, \ldots, \overrightarrow{h^e_T})$, where $\overrightarrow{h^e_j}$ is calculated by:

$$\overrightarrow{h^e_j} = f\left(\overrightarrow{h^e_{j-1}}, x_j\right) \tag{1}$$

Here $f$ is a nonlinear function; in this work we use the Gated Recurrent Unit (GRU) [6] for $f$. On the other hand, the backward RNN reads the input sentence $X$ from $x_T$ to $x_1$ and encodes it into a sequence of backward hidden states $(\overleftarrow{h^e_1}, \overleftarrow{h^e_2}, \ldots, \overleftarrow{h^e_T})$. The annotation for each word representation $x_j$ is obtained by concatenating the forward and the backward hidden states $\overrightarrow{h^e_j}$ and $\overleftarrow{h^e_j}$:

$$h^e_j = \begin{bmatrix} \overrightarrow{h^e_j} \\ \overleftarrow{h^e_j} \end{bmatrix} \tag{2}$$

The decoder is implemented as an RNN. Its goal is to generate a translation $Y$ by calculating the probability given by:

$$p\left(y_i \mid y_1, \ldots, y_{i-1}, X\right) = g\left(y_{i-1}, h^d_i, c_i\right) \tag{3}$$

where $h^d_i$ is the decoder hidden state at time-step $i$, calculated by

$$h^d_i = f\left(h^d_{i-1}, y_{i-1}, c_i\right) \tag{4}$$

and $c_i$ is a distinct context vector for each word $y_i$. The context vector $c_i$ is calculated by summing the annotations $h^e_j$ multiplied by their corresponding weights $\alpha_{ij}$:

$$c_i = \sum_{j=1}^{T} \alpha_{ij} h^e_j \tag{5}$$

and the weight $\alpha_{ij}$ is calculated as follows:

$$\alpha_{ij} = \frac{\exp\left(e_{ij}\right)}{\sum_{k=1}^{T} \exp\left(e_{ik}\right)} \tag{6}$$
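As a small numerical illustration of Eqs. (5) and (6), the sketch below turns a list of energies for one target position into attention weights and then into a context vector. It is a toy example in plain Python, not the model's implementation: the energies and the two-dimensional annotations are hypothetical values, and subtracting the maximum energy before exponentiation is a standard numerical-stability trick that does not change the result.

```python
import math

def attention_weights(energies):
    """Eq. (6): softmax over the energies e_i1..e_iT of one target position i."""
    m = max(energies)  # subtracting the max avoids overflow, result is unchanged
    exps = [math.exp(e - m) for e in energies]
    total = sum(exps)
    return [v / total for v in exps]

def context_vector(weights, annotations):
    """Eq. (5): sum of the encoder annotations h^e_j weighted by alpha_ij."""
    dim = len(annotations[0])
    return [sum(w * h[k] for w, h in zip(weights, annotations)) for k in range(dim)]

# Hypothetical energies for a source sentence of T = 3 words.
energies = [0.0, 0.0, math.log(2.0)]
# Hypothetical 2-dimensional annotations h^e_1, h^e_2, h^e_3.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

alpha = attention_weights(energies)      # approximately [0.25, 0.25, 0.5]
c = context_vector(alpha, annotations)   # approximately [0.75, 0.75]
```

The weights always sum to one, so the context vector is a convex combination of the annotations; the third source word, which has the highest energy, contributes most.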

$e_{ij}$ is an energy that scores how well the output at position $i$ matches the inputs around position $j$. It is calculated by applying an alignment model $a$ to the decoder hidden state at time-step $i-1$, $h^d_{i-1}$, and the $j$-th annotation $h^e_j$:

$$e_{ij} = a\left(h^d_{i-1}, h^e_j\right) \tag{7}$$

The alignment model (or attention mechanism) is parameterized as a feedforward neural network. It is jointly trained with all the other parameters of the model. Its goal is to calculate the probability that a target word $y_i$ is aligned to a source word $x_j$ [3].

3.2 Word Representations

Let us assume that the input sentence consists of $T$ words $\{w_1, w_2, w_3, \ldots, w_T\}$. The word representation uses pre-trained word embeddings to convert each token $w_i$ to a $d$-dimensional word vector $x_i \in \mathbb{R}^d$. The whole input sequence is converted to a list of word vectors $X = \{x_1, x_2, x_3, \ldots, x_T\}$. In this work, we focus only on context-independent embeddings, which learn a unique vector for each word, whatever its context. The most popular context-independent embeddings are Word2Vec [10], GloVe [13], and FastText [9].

Word2Vec [10]: There are two main algorithms in Word2Vec, continuous bag-of-words (CBOW) and continuous skip-gram (SG) [10]. Both algorithms are applications of unsupervised learning using a neural network that consists of an input layer, a projection layer, and an output layer. CBOW predicts the current word given the context, while SG predicts the context given the current word.

GloVe [13]: A word representation technique that takes advantage of two model families: local context window and global matrix factorization. Starting from the idea that the statistics of word occurrences in a corpus have the potential to encode some meaning, GloVe learns a relationship between words based on the global word-word co-occurrence counts.

FastText [9]: In this approach, the word representation incorporates sub-word information. Each word is represented by a sequence of character n-grams. A CBOW model [10] is then used to learn an embedding vector for each character n-gram. The vector representation of a given word is obtained as the sum of the vector representations of the character n-grams composing it.
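The FastText idea of building a word vector as the sum of its character n-gram vectors can be sketched as follows. This is a toy illustration, not the FastText library: the n-gram vectors are hypothetical two-dimensional values, the choice of n = 3 is arbitrary, and the `<` and `>` word-boundary markers are an assumption borrowed from the usual FastText convention rather than something stated in the text.

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word, with '<' and '>' as boundary markers (assumed)."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def word_vector(word, ngram_vectors, dim, n=3):
    """Sum the vectors of the word's n-grams; unknown n-grams contribute zeros."""
    vec = [0.0] * dim
    for gram in char_ngrams(word, n):
        for k, value in enumerate(ngram_vectors.get(gram, [0.0] * dim)):
            vec[k] += value
    return vec

# Hypothetical 2-dimensional n-gram embeddings.
ngram_vectors = {
    "<ca": [1.0, 0.0],
    "cat": [0.0, 2.0],
    "at>": [1.0, 1.0],
}

grams = char_ngrams("cat")                  # ['<ca', 'cat', 'at>']
vec = word_vector("cat", ngram_vectors, 2)  # [2.0, 3.0]
```

Because the vector is assembled from sub-word units, a word that never appeared in training can still receive a representation as long as some of its n-grams are known.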

3.3 Pre-processing

Because text data usually contains unwanted special formats, pre-processing is an essential block of any NLP system. In machine translation, we apply pre-processing techniques to both the source (Arabic) and target (English) languages. The first pre-processing stage is a cleaning step, where we remove all special characters and keep only letters, digits, and punctuation marks.

Arabic Pre-processing. In addition to the cleaning step, we use other Arabic pre-processing methods: segmentation using the Farasa [1] segmenter, and normalization.

– Segmentation: One challenge for machine translation from/to Arabic is the morphology of the Arabic language. For example, the single Arabic token meaning ‘and to his school’ is formed by the clitics meaning ‘and’, ‘to’, ‘school’, and ‘his’. This characteristic of the Arabic language complicates the word-level alignment between Arabic and another language (English in our case). To overcome this challenge, we can use segmentation, whose goal is to break a word into its constituent clitics. In this work, we use the Farasa segmenter [1]; applying it to the previous example yields the segmented form of the token.
– Normalization: Since the major sources of orthographic ambiguity in Arabic are the letters Alif, Ya, and Ta Marbota, we convert all the variants of Hamzated Alif to Alif, and we replace the Alif Maqsura with Ya and the Ta Marbota with Ha. We also remove the diacritics.

English Pre-processing. For the English language, in addition to the cleaning step, we apply lower-casing, which consists of converting all uppercase characters to lowercase.

In addition to the above pre-processing steps, a final step called tokenization is applied. The goal of tokenization is to split a given sequence into a list of tokens. We use the Moses tokenizer for both the source and the target languages.
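The normalization and lower-casing rules described above can be sketched with the Python standard library. This is an illustrative sketch, not the authors' code: the exact set of Hamzated Alif variants and the diacritic range (U+064B to U+0652) are assumptions, and Farasa segmentation and Moses tokenization are external tools that are not reproduced here.

```python
import re

# Arabic diacritics: tanween, short vowels, shadda and sukun (assumed range).
DIACRITICS = re.compile("[\u064B-\u0652]")
# Hamzated Alif variants: Alif with madda, hamza above, hamza below (assumed set).
HAMZATED_ALIF = re.compile("[\u0622\u0623\u0625]")

def normalize_arabic(text):
    """Apply the orthographic normalization rules to an Arabic string."""
    text = DIACRITICS.sub("", text)            # remove the diacritics
    text = HAMZATED_ALIF.sub("\u0627", text)   # Hamzated Alif -> bare Alif
    text = text.replace("\u0649", "\u064A")    # Alif Maqsura -> Ya
    text = text.replace("\u0629", "\u0647")    # Ta Marbota -> Ha
    return text

def lowercase_english(text):
    """English lower-casing."""
    return text.lower()

# 'school' ends in Ta Marbota, which becomes Ha after normalization.
school = "\u0645\u062F\u0631\u0633\u0629"
normalized_school = normalize_arabic(school)
```

Mapping several written forms to one reduces the vocabulary size, which is the effect measured for Arabic in the results section.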

4 Experimental Setup

4.1 Dataset

We evaluate the performance of each model on the IWSLT Evaluation Campaign [5]. It includes translation data based on TED talks. The original training data contains 224126 training examples. We only keep sequences with lengths less than 30 tokens; that reduces the training data size to 195885 samples. We concatenate ‘dev2010’, ‘tst2011’, and ‘tst2013’ sub-sets to create a validation data set with 2803 translation pairs. Also, we concatenate ‘tst2010’, ‘tst2012’, and ‘tst2014’ sub-sets to create a test data set with 3941 samples.
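The length filter mentioned above can be sketched as follows. The text does not say how tokens are counted or whether the limit applies to both sides of a pair, so this sketch assumes whitespace tokens and applies the limit to source and target alike; the sentence pairs are placeholders.

```python
MAX_LEN = 30  # the text keeps sequences with fewer than 30 tokens

def keep_pair(source, target, max_len=MAX_LEN):
    """Keep a pair only if both sides have fewer than max_len whitespace tokens."""
    return len(source.split()) < max_len and len(target.split()) < max_len

# Placeholder sentence pairs built from a repeated dummy token.
pairs = [
    ("w " * 5, "w " * 6),
    ("w " * 30, "w " * 10),   # source has exactly 30 tokens: dropped
    ("w " * 29, "w " * 29),   # 29 tokens on both sides: kept
]
filtered = [pair for pair in pairs if keep_pair(*pair)]
```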

4.2 Model Hyper-parameters

We experiment with the model described in Sect. 3.1. The encoder is a two-layer BiGRU with 256 hidden units. The decoder is a single-layer GRU with 256 hidden units. In both cases, we set the word embedding vector size to 300, and to avoid over-fitting we use a dropout rate of 0.5 [17]. As the optimization algorithm, we use Adam [7] with a learning rate $\alpha = 10^{-3}$. We train the model for 10 epochs with mini-batches of 64 training examples. During inference, we use greedy search to generate translations. The translation quality is then measured using the BLEU [12] score.

4.3 Pre-trained Word Embeddings

Our aim in this work is to investigate the impact of pre-trained word embeddings on the Arabic to English NMT performance. This can be achieved by initializing the embedding layer with some pre-trained word embeddings and analyzing the final translation quality. Below we list more information about the per-trained embeddings used in our experiments: – Baseline: We randomly initialize a trainable embedding layer with a dimension of 300 for both the encoder and the decoder. The weights of this layer are learned from scratch in parallel with the other model parameters. – Word2Vec: We initialize the embeddings layer of the encoder (Arabic side) with AraVec [16]. It is a Word2Vec model per-trained on Arabic data from Twitter, World Wide Web, and Wikipedia1 . For the decoder (English side), we use the publicly available Word2Vec vectors pre-trained on the Google News dataset (about 100 billion words)2 . – GloVe: For Arabic, we use publicly distributed word vectors learned with the GloVe model3 . It contains 256-dimensional vectors for 1.5 million words pretrained on an Arabic corpus of about 1.9 billion words. For the English side, we initialize the embeddings layer with weights obtained from the original English version of the GloVe model [13]. We use the 300-dimension GloVe embedding pre-trained on Wikipedia and Gigaword data (6 billion tokens)4 . – FastText: Pre-trained FastText embeddings are available for many languages5 . For Arabic, we choose the 300-dimension Arabic version. While for English, we choose the 300-dimension English version.

1 http://github.com/bakrianoo/aravec.
2 http://www.code.google.com/archive/p/word2vec/.
3 http://www.github.com/tarekeldeeb/GloVe-Arabic.
4 https://nlp.stanford.edu/projects/glove/.
5 https://fasttext.cc/docs/en/crawl-vectors.html.
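The initialization strategy described in Sect. 4.3 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_embedding_matrix`, the toy vectors, and the random fallback range for words missing from the pre-trained table are all assumptions made for the example.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return one vector per vocabulary word and the number of pre-trained hits.

    Words found in `pretrained` copy their vector; all other words receive a
    small random vector (the range used here is an assumption).
    """
    rng = random.Random(seed)
    matrix, hits = [], 0
    for word in vocab:
        vector = pretrained.get(word)
        if vector is not None and len(vector) == dim:
            matrix.append(list(vector))  # copy the pre-trained vector
            hits += 1
        else:
            # Word not covered by the pre-trained table: random initialization.
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix, hits


# Toy 3-dimensional vectors; hypothetical values for illustration only.
toy_pretrained = {"book": [0.1, 0.2, 0.3], "read": [0.4, 0.5, 0.6]}
toy_vocab = ["<unk>", "book", "read", "pizza"]
matrix, hits = build_embedding_matrix(toy_vocab, toy_pretrained, dim=3)
```

The resulting matrix would then be loaded into the trainable embedding layer of the encoder or decoder, while the Baseline setting corresponds to passing an empty `pretrained` table.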

Pre-processing and Pre-trained Word Embedding Techniques

5 Results

5.1 Pre-processing Results

Table 1 reports the results obtained on the test set in terms of BLEU scores with different pre-processing techniques. We examine all possible combinations of segmentation, normalization, and lower-casing. In all cases, we first apply the cleaning method. We evaluate the results against a baseline model (first row of Table 1) where no pre-processing technique was applied except for the cleaning method.

Table 1. Results on the test set in terms of BLEU scores with different pre-processing routines. Seg: Segmentation, Norm: Normalisation, Lower: Lower-casing, #Voca: Vocabulary size.

| Arabic Seg | Arabic Norm | Arabic #Voca | English Lower | English #Voca | BLEU  |
|------------|-------------|--------------|---------------|---------------|-------|
| –          | –           | 75115        | –             | 34833         | 14.08 |
| ✓          | –           | 28176        | –             | 34833         | 15.23 |
| –          | ✓           | 67302        | –             | 34833         | 14.05 |
| ✓          | ✓           | 25789        | –             | 34833         | 15.47 |
| –          | –           | 75115        | ✓             | 32766         | 14.45 |
| ✓          | –           | 28176        | ✓             | 32766         | 15.91 |
| –          | ✓           | 67302        | ✓             | 32766         | 14.43 |
| ✓          | ✓           | 25789        | ✓             | 32766         | 15.99 |

Pre-processing directly impacts the vocabulary size. For Arabic, segmentation decreases the vocabulary size from 75115 to 28176, and to 25789 when normalization is added. For English, lower-casing decreases the vocabulary size from 34833 to 32766.

Table 1 also shows that the BLEU score improves as more pre-processing is applied. Arabic pre-processing is clearly important. With Arabic segmentation, the model gains +1.15 BLEU points over the baseline. The improvement is larger when both segmentation and normalization are applied, with a gain of +1.39 BLEU points. However, normalization alone does not improve the translation quality (−0.03). For English, lower-casing helps with a gain of +0.37 BLEU. The best result is obtained when all the pre-processing techniques are applied, with an improvement of +1.91 BLEU. The best pre-processing routine is therefore to use all the discussed techniques. We use this result as the baseline for the next series of experiments.
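The gains quoted above follow directly from Table 1. The short check below recomputes them from the reported BLEU scores and vocabulary sizes; the variable names are ours, and the relative vocabulary reductions are derived here rather than reported in the paper.

```python
# BLEU scores from Table 1; the baseline applies cleaning only.
baseline_bleu = 14.08
bleu = {
    "Seg": 15.23,
    "Norm": 14.05,
    "Seg+Norm": 15.47,
    "Lower": 14.45,
    "Seg+Norm+Lower": 15.99,
}

# Absolute BLEU gain of each routine over the baseline.
gains = {name: round(score - baseline_bleu, 2) for name, score in bleu.items()}

# Relative vocabulary reduction (derived, not stated in the text).
arabic_reduction = 1 - 25789 / 75115   # segmentation + normalization
english_reduction = 1 - 32766 / 34833  # lower-casing

for name, gain in gains.items():
    print(f"{name:>15}: {gain:+.2f} BLEU")
print(f"Arabic vocabulary reduction:  {arabic_reduction:.1%}")
print(f"English vocabulary reduction: {english_reduction:.1%}")
```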


M. Zouidine et al.

5.2 Pre-trained Word Embeddings Results

In the second set of experiments, we investigate the efficacy of pre-trained embeddings obtained with different word representation techniques (Sect. 3.2). Table 2 lists the results. They show that pre-trained word embeddings do not generally improve translation quality. On the contrary, the majority of pre-trained embeddings decrease the BLEU score to some degree. Only a small gain was observed when using FastText for the Arabic side and Word2Vec for the English side (+0.19).

Table 2. BLEU score on the test set with different pre-trained word embeddings. The rows represent the Arabic pre-trained embeddings, while the columns represent the English ones. Rand is the case where we randomly initialize the embedding layer (no pre-trained embedding is used).

| Arabic \ English | Rand  | Word2Vec | GloVe | FastText |
|------------------|-------|----------|-------|----------|
| Rand             | 15.99 | 15.63    | 15.83 | 15.89    |
| Word2Vec         | 13.73 | 12.47    | 13.22 | 13.01    |
| GloVe            | 10.47 | 10.50    | 10.28 | 10.81    |
| FastText         | 14.29 | 16.18    | 14.67 | 14.30    |
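To make the comparison in Table 2 explicit, the sketch below recomputes the difference of every Arabic/English embedding pair against the randomly initialized baseline (Rand/Rand). The dictionary layout and variable names are ours; the scores are those reported in Table 2, with rows for the Arabic side and columns for the English side.

```python
english = ["Rand", "Word2Vec", "GloVe", "FastText"]

# Rows: Arabic embedding; columns: English embedding (BLEU from Table 2).
table2 = {
    "Rand":     [15.99, 15.63, 15.83, 15.89],
    "Word2Vec": [13.73, 12.47, 13.22, 13.01],
    "GloVe":    [10.47, 10.50, 10.28, 10.81],
    "FastText": [14.29, 16.18, 14.67, 14.30],
}

baseline = table2["Rand"][0]

# BLEU difference of each (Arabic, English) pair relative to Rand/Rand.
deltas = {
    (arabic, eng): round(score - baseline, 2)
    for arabic, row in table2.items()
    for eng, score in zip(english, row)
}

best_pair = max(deltas, key=deltas.get)
improving_pairs = [pair for pair, delta in deltas.items() if delta > 0]

print("Best pair:", best_pair, f"({deltas[best_pair]:+.2f} BLEU)")
print("Pairs that beat the baseline:", improving_pairs)
```

Only the FastText (Arabic) and Word2Vec (English) combination ends up above the baseline, which matches the +0.19 reported in the text.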

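Table 3. BLEU score on the test set with reduced training data. (a) 100%. (b) 25%. (c) 10%. Rows represent the Arabic embeddings and columns the English ones.

(a) 100%

| Arabic \ English | Rand  | Word2Vec |
|------------------|-------|----------|
| Rand             | 15.99 | –        |
| FastText         | –     | 16.18    |

(b) 25%

| Arabic \ English | Rand  | Word2Vec |
|------------------|-------|----------|
| Rand             | 11.54 | –        |
| FastText         | –     | 12.37    |

(c) 10%

| Arabic \ English | Rand | Word2Vec |
|------------------|------|----------|
| Rand             | 5.95 | –        |
| FastText         | –    | 7.78     |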

These results can be explained by two hypotheses:

– The quality of the pre-trained word embeddings: By quality, we mean the capability of the word embedding model to generate adequate word representations. This capability is generally linked to the size, quality, source, and diversity of the pre-training data. It is clear from Table 2 that both Arabic Word2Vec and Arabic GloVe are of poor quality. With Arabic Word2Vec, we lose between 2.26 and 3.52 BLEU points, while with Arabic GloVe, we lose between 5.18 and 5.71 points.
– The training data size: When the model has access to enough data to learn word representations from scratch, pre-trained word embeddings seem to add no benefit [14]. To check this hypothesis, we retrain the model with reduced training data (25% and 10% of the original training data). We only compare two models: the first uses random initialization for the embedding layer, while the Arabic and English embedding layers of the second are initialized with FastText and Word2Vec, respectively. Table 3 shows the results obtained with 100%, 25%, and 10% of the original training data.

The translation quality depends strongly on the training data size. However, pre-trained word embeddings can further improve it: as the training data size goes down, the gain from using pre-trained embeddings goes up (+0.19, +0.83, and +1.83 for 100%, 25%, and 10% of the training data, respectively). These findings support the latter hypothesis.
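The trend reported for Table 3 can be verified with a few lines of arithmetic. The sketch below recomputes the absolute gains and, as an additional view that is derived here and not reported in the paper, the gains relative to the randomly initialized model. Variable names are ours.

```python
# Share of training data (%) -> (BLEU with random init, BLEU with FastText + Word2Vec),
# taken from Table 3.
table3 = {100: (15.99, 16.18), 25: (11.54, 12.37), 10: (5.95, 7.78)}

# Absolute BLEU gain from pre-trained embeddings at each data size.
absolute_gain = {
    share: round(pretrained - rand, 2) for share, (rand, pretrained) in table3.items()
}

# Relative gain in percent of the random-init score (derived, not in the paper).
relative_gain = {
    share: round(100 * (pretrained - rand) / rand, 1)
    for share, (rand, pretrained) in table3.items()
}

# The gain grows as the share of training data shrinks.
gain_grows_as_data_shrinks = (
    absolute_gain[100] < absolute_gain[25] < absolute_gain[10]
)

for share in table3:
    print(f"{share:>3}% data: {absolute_gain[share]:+.2f} BLEU ({relative_gain[share]:+.1f}%)")
```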

6 Conclusion

In this work, we studied the impact of different pre-processing techniques on Arabic/English neural machine translation. Moreover, we investigated the benefits of initializing the embedding layer with different pre-trained word embeddings. Experimental results showed that pre-processing techniques improve the translation quality and that pre-trained word embeddings have a clear positive impact when less training data is available. In future work, we plan to use contextual word embeddings such as ELMo (Embeddings from Language Models) and BERT (Bidirectional Encoder Representations from Transformers).

References

1. Abdelali, A., Darwish, K., Durrani, N., Mubarak, H.: Farasa: a fast and furious segmenter for Arabic. In: Proceedings of the Demonstrations Session, NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, 12–17 June 2016, pp. 11–16. The Association for Computational Linguistics (2016)
2. Almahairi, A., Cho, K., Habash, N., Courville, A.: First result on Arabic neural machine translation. ArXiv abs/1606.02680 (2016)
3. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015, Conference Track Proceedings (2015)
4. Bensalah, N., Ayad, H., Adib, A., Ibn El Farouk, A.: LSTM vs. GRU for Arabic machine translation. In: Abraham, A., et al. (eds.) SoCPaR 2020. AISC, vol. 1383, pp. 156–165. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-73689-7_16
5. Cettolo, M., Girardi, C., Federico, M.: WIT3: web inventory of transcribed and translated talks. In: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, EAMT 2012, Trento, Italy, 28–30 May 2012, pp. 261–268. European Association for Machine Translation (2012)
6. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, 25–29 October 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 1724–1734. ACL (2014)
7. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015, Conference Track Proceedings (2015)
8. Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003, Edmonton, Canada, 27 May – 1 June 2003. The Association for Computational Linguistics (2003)
9. Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A.: Advances in pre-training distributed word representations. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018) (2018)
10. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
11. Oudah, M., Almahairi, A., Habash, N.: The impact of preprocessing on Arabic-English statistical and neural machine translation. In: Proceedings of Machine Translation Summit XVII Volume 1: Research Track, MTSummit 2019, Dublin, Ireland, 19–23 August 2019, pp. 214–221. European Association for Machine Translation (2019)
12. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318. ACL 2002, Association for Computational Linguistics, USA (2002)
13. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
14. Qi, Y., Sachan, D., Felix, M., Padmanabhan, S., Neubig, G.: When and why are pre-trained word embeddings useful for neural machine translation? In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 529–535. Association for Computational Linguistics (2018)
15. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
16. Soliman, A.B., Eissa, K., El-Beltagy, S.R.: AraVec: a set of Arabic word embedding models for use in Arabic NLP. In: Third International Conference on Arabic Computational Linguistics, ACLING 2017, 5–6 November 2017, Dubai, United Arab Emirates. Procedia Computer Science, vol. 117, pp. 256–265. Elsevier (2017)
17. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
18. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, vol. 27. Curran Associates, Inc. (2014)
19. Wang, C., Nulty, P., Lillis, D.: A comparative study on word embeddings in deep learning for text classification. In: NLPIR 2020: 4th International Conference on Natural Language Processing and Information Retrieval, Seoul, Republic of Korea, 18–20 December 2020, pp. 37–46. ACM (2020)
20. Zakraoui, J., Saleh, M., Al-Maadeed, S., AlJa’am, J.M.: Evaluation of Arabic to English machine translation systems. In: 2020 11th International Conference on Information and Communication Systems (ICICS), pp. 185–190. IEEE (2020)
21. Zouidine, M., Khalil, M.: A comparative study of pre-trained word embeddings for Arabic sentiment analysis. In: 46th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2022, Los Alamitos, CA, USA, 27 June – 1 July 2022, pp. 1243–1248. IEEE (2022)
22. Zouidine, M., Khalil, M., Farouk, A.I.E.: Policy gradient for Arabic to English neural machine translation. In: Lazaar, M., Duvallet, C., Touhafi, A., Al Achhab, M. (eds.) Proceedings of the 5th International Conference on Big Data and Internet of Things. BDIoT 2021. Lecture Notes in Networks and Systems, vol. 489, pp. 469–480. Springer, Cham (2021). https://doi.org/10.1007/978-3-031-07969-6_35

A Framework for Automated Abstraction Class Detection for Event Abstraction

Chiao-Yun Li¹˒²(B), Sebastiaan J. van Zelst¹˒², and Wil M.P. van der Aalst²

¹ Fraunhofer FIT, Birlinghoven Castle, Sankt Augustin, Germany
{chiao-yun.li,sebastiaan.van.zelst}@fit.fraunhofer.de
² RWTH Aachen University, Aachen, Germany
[email protected]

Abstract. Process mining enables companies to extract insights into the process execution from event data. Event data that are stored in information systems are often too fine-grained. When process mining techniques are applied to such system-level event data, the outcomes are often overly complex for human analysts to interpret. To address this issue, numerous (semi-)automated event abstraction techniques that “lift” event data to a higher level have been proposed. However, most of these techniques are supervised, i.e., the knowledge of which low-level event data or activities need to be abstracted into a high-level instance or concept is explicitly given. In this paper, we propose a fully unsupervised event abstraction framework for partially ordered event data, which further enables arbitrary levels of abstraction. The evaluation shows that the proposed framework is scalable and allows for discovering a more precise process model.

Keywords: Process mining · Event abstraction · Partial orders

1 Introduction

Nowadays, organizations execute their processes with the support of information systems. Historical records of the process execution are stored as event data in these systems. Process mining [1] provides techniques to extract knowledge from such data to enhance the performance and the compliance of said processes. For example, process discovery techniques detect the relations between activities in a process, i.e., well-defined process steps, and represent the identified behavior as a process model [3,6].

Most process mining techniques are applied directly to event data as recorded in information systems. Figure 1 shows how a human analyst translates system-level event data into activities performed at the business level. Such system-level event data are often too fine-grained to understand a process at the business level. For a large or flexible process, this complexity makes process mining results even harder to interpret. Therefore, the principle of abstraction is often applied to lift event data to a higher level. This is referred to as event abstraction, which aggregates the activities or low-level event data to which process mining techniques are applied.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 715, pp. 126–136, 2023. https://doi.org/10.1007/978-3-031-35507-3_13

A Framework for Automated Abstraction Class Detection


Fig. 1. A motivating example of interpreting sensor data, i.e., system-level event data, as business-level executions. The same shapes of event data at the system level reflect the executions performed as time intervals, e.g., door 1 is opened from door1 (open) until door1 (close). The activities with the same pattern define a concept at the business level.

Existing event abstraction techniques assume that event data are recorded and ordered chronologically. Moreover, most of the techniques operate in a semi- or fully supervised manner. In practice, both assumptions often do not hold. Activities are often executed over time intervals and are thus partially ordered. For example, an office worker can drink coffee during a meeting after a break. Furthermore, the abstraction level required in real-life applications varies, often depending on scenarios and stakeholders. One can apply the abstraction iteratively to construct a hierarchy of abstractions; however, this requires a technique that can deal with interleaving intervals, as shown in the business-level executions in Fig. 1, where the break overlaps with the meeting in time. Moreover, a (semi-)supervised approach requires domain knowledge of the event data to abstract; yet such information is often not available, and labeling event data takes a tremendous amount of time and effort. If a different level of abstraction is required, further domain knowledge must be provided and the event data need to be relabeled.

In this paper, we propose a framework that tackles the two aforementioned issues, i.e., a fully unsupervised framework for partially ordered event data. The support for partial orders inherently allows the framework to be applied iteratively to construct a hierarchy of abstractions for various stakeholders. The framework detects concepts at the higher level, namely abstraction classes, which are defined by activities based on their observed execution context. The framework is compatible with the generic partial-order-based abstraction framework proposed in [9], and we further extend and define such context. The framework is evaluated using event data based on two real-life processes as in [9]. Additionally, we compare the proposed framework with two other event abstraction techniques. The experiments show that the proposed framework is the most applicable in practice owing to its scalability, robustness against infrequent activities, and effectiveness of abstraction, which is evaluated quantitatively and qualitatively based on the process models discovered.


C.-Y. Li et al.

The rest of the paper is structured as follows. In Sect. 2, we discuss existing abstraction techniques in the field of process mining. We introduce the framework in Sect. 3, which is evaluated in Sect. 4. Finally, Sect. 5 summarizes the framework and the evaluation and concludes the paper with future work.

2 Related Work

This section reviews the applicability of existing event abstraction techniques from two perspectives: the level of domain knowledge required and whether a technique supports partially ordered event data.

Most existing event abstraction techniques require explicit domain knowledge as input. In general, the techniques expect either behavioral knowledge of the activities in a process [11,15] or a mapping of activities to high-level concepts [5,7,14]. In [11], the authors explore all the possible behavior between activities, so-called local process models (LPMs), and extract event data at a higher level based on the ones chosen by domain experts. In [15], the authors search for instances of predefined patterns with a focus on the education field. Leemans et al. discover hierarchical process models based on the assumption that a hierarchical mapping of low-level instances to high-level concepts is annotated in event data [7]. In [5] and [14], statistical models are trained using labeled event data to predict concepts at a higher level of abstraction.

Unsupervised approaches group low-level concepts or instances by identifying their relationships in event data and allow for exploration through parameter tuning. Alharbi et al. discover statistical models to learn activities at a higher level [2]. Nguyen et al. decompose all activities in a process into sets of activities by maximizing the modularity measure [12]. In [8], the authors apply classical clustering techniques by extracting features using frequency- or duration-based encoding in a session, i.e., a segment of a sequence of event data.

Unsurprisingly, the more domain knowledge is provided, the more precise the abstraction is; however, this limits the practical applicability of a technique. Meanwhile, since activities are often performed over time intervals in practice, it is crucial that a technique supports partially ordered event data. Although partially ordered event data may be transformed into sequences of executions by introducing life cycle information, applying existing abstraction techniques to such sequences may produce unreasonable outcomes. For example, the start of drink coffee is considered to be part of a meeting, while the completion of drink coffee is considered an activity performed during a break. The method proposed in this paper is the first unsupervised technique focusing on abstracting partially ordered event data.
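The life cycle transformation mentioned above can be illustrated with a small sketch. It is not taken from the paper: the `ActivityInstance` class, the `to_lifecycle_sequence` helper, the timestamps (minutes since midnight), and the tie-breaking rule that places start events before complete events at equal timestamps are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityInstance:
    activity: str
    start: int     # hypothetical timestamp, minutes since midnight
    complete: int  # hypothetical timestamp, minutes since midnight


def to_lifecycle_sequence(instances):
    """Flatten interval-based activity instances into one totally ordered sequence."""
    events = []
    for inst in instances:
        # Rank 0/1 orders a start before a complete when timestamps are equal.
        events.append((inst.start, 0, f"{inst.activity}+start"))
        events.append((inst.complete, 1, f"{inst.activity}+complete"))
    events.sort()
    return [label for _, _, label in events]


# Coffee starts during the meeting and completes during the break.
instances = [
    ActivityInstance("meeting", 540, 600),
    ActivityInstance("drink coffee", 590, 615),
    ActivityInstance("break", 595, 620),
]
sequence = to_lifecycle_sequence(instances)
print(sequence)
```

In the flattened sequence, the start and the completion of drink coffee are separated by the completion of meeting. A sequence-based technique that segments the trace at that point would therefore assign the two halves of one activity instance to different high-level concepts, which is the problem described above.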

3 Framework

In this section, we present the proposed framework to detect concepts at a higher level, namely abstraction classes, from event data. Figure 2 presents an overview of the framework. First, event data is provided in the format of an event log $L$. Next, we extract a preceding and a succeeding context as categorical distributions for every activity. The similarity of the activities is calculated accordingly, and the activities that are similar enough define an abstraction class. Finally, an event log L containing only the identified abstraction classes is extracted, and the framework can be applied iteratively to the output to construct a hierarchy of abstractions.

Fig. 2. Schematic overview of the framework. The abstraction classes are defined by activities with similar preceding and succeeding contexts. An event log at the higher level is then extracted and can be applied as input for the next iteration.

First, we introduce the mathematical notation and define event data in a way that is more generally applicable in practice. Then, we show how a context is constructed and how abstraction classes are identified.

Notation. Given an arbitrary set $X$, $\mathcal{P}(X) = \{X' \mid X' \subseteq X\}$ denotes the powerset of $X$. A multiset is a generalization of a set where an element of $X$ may appear multiple times. For example, $[a, b, b] = [a, b^2]$ is a multiset. $\mathcal{M}(X)$ denotes all the possible multisets over $X$. A sequence $\sigma$ of length $n$ over $X$ is a function $\sigma : \{1, \ldots, n\} \to X$, denoted as $\sigma = \langle x_1, \ldots, x_n \rangle$, where $x_i = \sigma(i)$ for $1 \le i \le n$. We write $|\sigma|$ to denote the length of a sequence.

3.1 Event Data

A case is a process instance defined by a set of activity instances, i.e., records of the execution of activities. The collection of cases defines an event log. We define an activity instance, a case, and an event log as follows.

Definition 1 (Activity Instance). An activity instance records the execution of an activity and is described by a set of attributes. $\mathcal{A}$ is the universe of activity instances, $\mathcal{N}$ is the universe of attribute names, and $\mathcal{U}_n$ denotes all the possible values of $n \in \mathcal{N}$. $\mathcal{U}_{\mathcal{N}} = \bigcup_{n \in \mathcal{N}} \mathcal{U}_n$ denotes the universe of the values of all the attributes. Given $n \in \mathcal{N}$ and $a \in \mathcal{A}$, we define the projection function $\pi_n : \mathcal{A} \nrightarrow \mathcal{U}_n$, where $\pi_n(a) \in \mathcal{U}_n$ if $a$ has a value for $n$, else $\pi_n(a) = \bot$. $\mathcal{U}_{act}$ denotes the universe of activities. Given $a \in \mathcal{A}$, the following attributes are assumed to be always defined: $\pi_{act}(a) \in \mathcal{U}_{act}$ for the activity of $a$; $\pi_{st}(a) \in \mathbb{R}^{+}$ for the start time of $a$; and $\pi_{ct}(a) \in \mathbb{R}^{+}$ for the complete time of $a$, where $\pi_{st}(a) \le \pi_{ct}(a)$.

Definition 2 (Case, Event Log). A case is the collection of activity instances executed in the context of a process instance. Let $c \subseteq \mathcal{A}$ be a case. Given $a, a' \in c$, we


Table 1. An example event log $L_0$. An activity instance is represented in a row and is described by a set of attributes and the case it belongs to.

| Case ID | Activity Instance ID | Activity (Abbreviation) | Date   | Start    | Complete | Resource |
|---------|----------------------|-------------------------|--------|----------|----------|----------|
| 1       | 1                    | receive order (ro)      | Sep. 6 | 14:32:21 | 14:32:21 | Peter    |
| 1       | 2                    | pay in cash (pi)        | Sep. 6 | 14:34:02 | 14:34:02 | Mike     |
| 1       | 3                    | confirm payment (cp)    | Sep. 6 | 14:35:40 | 14:36:14 | Peter    |
| 1       | 4                    | bake pizza (bp)         | Sep. 6 | 14:46:21 | 15:03:56 | Sara     |
| 1       | 5                    | make salad (ms)         | Sep. 6 | 14:49:32 | 15:00:11 | Sara     |
| 1       | 6                    | pack & deliver (pd)     | Sep. 6 | 15:15:16 | 15:27:48 | Ethan    |
| 2       | 7                    | receive order (ro)      | Sep. 7 | 15:08:00 | 15:08:00 | Peter    |
| 2       | 8                    | pay by credit card (pb) | Sep. 7 | 15:09:27 | 15:09:27 | Ben      |
| 2       | 9                    | confirm payment (cp)    | Sep. 7 | 15:10:29 | 15:10:41 | Ethan    |
| 2       | 10                   | bake pizza (bp)         | Sep. 7 | 15:30:38 | 15:49:40 | Sara     |
| 2       | 11                   | make salad (ms)         | Sep. 7 | 15:47:56 | 15:57:10 | Alex     |
| 2       | 12                   | pack & deliver (pd)     | Sep. 7 | 15:55:02 | 16:17:59 | Ethan    |

write $a \prec a'$ if and only if $\pi_{ct}(a)$