Intelligent Systems and Applications: Proceedings of the 2022 Intelligent Systems Conference (IntelliSys) Volume 1 (Lecture Notes in Networks and Systems, 542) 3031160711, 9783031160714

This book is a remarkable collection of chapters covering a wide domain of topics related to artificial intelligence and


English Pages 831 [832] Year 2022


Table of contents:
Editor’s Preface
Contents
Anomaly-Based Risk Detection Using Digital News Articles
1 Introduction
2 Background
2.1 Elliptic Envelope
2.2 Local Outlier Factor
2.3 Isolation Forest
2.4 Seasonal and Trend Decomposition Using Loess
2.5 Natural Language Processing
3 Approach
3.1 Data and Preprocessing
3.2 Anomaly Detection
3.3 News-Cluster Risk Analysis
3.4 Matching Anomalies and News-Cluster Risks
4 Evaluation
5 Results
6 Related Work
7 Conclusion
References
Robust Rule Based Neural Network Using Arithmetic Fuzzy Inference System
1 Introduction
2 Distending Function and Weighted Dombi Operator
3 Proposed Network Structure
3.1 Feed-Forward Calculations
3.2 Feedback Calculations
4 Training
5 Simulation Results and Discussion
5.1 Selling Price Prediction of Used Cars
5.2 IRIS Flower Species Classification
5.3 Wine Quality Dataset
6 Conclusion
References
A Hierarchical Approach for Spanish Sign Language Recognition: From Weak Classification to Robust Recognition System
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Data Collection
3.2 Followed Pipeline
4 Experimental Results
5 Discussion
6 Conclusion and Future Work
References
Deontic Sentence Classification Using Tree Kernel Classifiers
1 Introduction
2 Methodology
2.1 Akoma Ntoso
2.2 LegalRuleML
2.3 Tree Kernels and Tree Representation
3 Related Works
4 Data
5 Experiment Settings and Results
6 Conclusions
References
Sparse Distributed Memory for Sparse Distributed Data
1 Purpose
2 Background
3 Model
4 Results
5 Conclusions
References
Quantum Displacements Dictated by Machine Learning Principles: Towards Optimization of Quantum Paths
1 Introduction
2 The Quantum Mechanics Machinery
2.1 Discrete Case
2.2 Continuous Case
2.3 Applications
2.4 Extraction of Propagator
2.5 Choice of Hamiltonian
2.6 Iterations Involving Propagators
2.7 Errors at the Energy Measurement
3 The Quantum Mechanics Machinery
4 Applications
4.1 Numerical Applications
5 Conclusion
References
A Semi-supervised Vulnerability Management System
1 Introduction
2 Related Work
3 Contributions of This Work
4 Problem Formulation
5 Algorithm
6 Experiments and Results
6.1 Database
6.2 Experiments
6.3 Preprocessing
6.4 Results
7 Analysis of Results
7.1 Improvement in Results
7.2 Word2Vec Embedding Space
8 Implementation Details
9 Model Deployment
9.1 Architecture
10 Conclusion
References
Evaluation of Deep Learning Techniques in Human Activity Recognition
1 Introduction
1.1 Motivation
1.2 Objectives
2 Background
2.1 Deep Learning
2.2 Deep Learning Architectures
2.3 Internet of Things
3 Methods
3.1 Deep Learning Models
3.2 Dataset
4 Results and Discussion
5 Conclusions
References
Siamese Neural Network for Labeling Severity of Ulcerative Colitis Video Colonoscopy: A Thick Data Approach
1 Introduction
2 The UC Video Scoring Method
3 Discussion and Conclusions
References
Self-supervised Contrastive Learning for Predicting Game Strategies
1 Introduction
2 Related Works
2.1 Convolutional Neural Networks
2.2 Self-supervised Contrastive Learning
3 Proposed Framework
3.1 Dataset
3.2 Modified Momentum Contrast
4 Experiments
4.1 Training
4.2 Evaluation
5 Conclusions
References
Stochastic Feed-forward Attention Mechanism for Reliable Defect Classification and Interpretation
1 Introduction
2 Related Works
3 Proposed Method
3.1 Feature Extraction
3.2 Feed-Forward Attention
3.3 Uncertainty Quantification
4 Experiments
4.1 Display Manufacturing Facility Datasets
4.2 Results
5 Conclusions
References
Bregman Divergencies, Triangle Inequality, and Maximum Likelihood Estimates for Normal Mixtures
1 Introduction
2 Bregman Divergencies
3 Normal Mixtures
4 Reproducing Kernel Hilbert Space, Clustering, and Normal Mixtures
4.1 RKHS
4.2 Clustering
4.3 Clustering and Maximum Likelihood Estimates for Normal Mixtures
5 Conclusion
References
Self-supervised Learning for Predicting Invisible Enemy Information in StarCraft II
1 Introduction
2 Related Works
3 Proposed Method
4 Experiments
4.1 Data and Experiments Setting
4.2 Results
5 Conclusions and Future Works
References
Measuring Robot System Agility: Ontology, System Theoretical Formulation and Benchmarking
1 Introduction
2 Related Work
2.1 Ontology of Robotic Systems and Components
2.2 Robot Autonomy
2.3 Standard for Measuring Robot Agility
3 Mathematical Framework for Measuring Agility
3.1 Systems Theoretical Approach
3.2 System Model
3.3 Formal Robot System Agility Definition
3.4 Time and Cost Constraints
3.5 Adaptability, Reactivity and Cost-Efficiency
4 Agility Evaluation
4.1 Exact Agility
4.2 Estimated Agility
4.3 Utility of Agility
4.4 Cost of Agility
5 Benchmarking
5.1 Different Types of Challenges
5.2 Adapting to Challenges
5.3 Benchmarking Procedures
6 Example Use Case: Robotic Arm Pick Operation
6.1 Robotic System
6.2 Challenges and Performance
6.3 Agility Evaluation
7 Discussion on Further Directions
References
A New Adoption of Cloud Computing Model for Saudi Arabian SMEs (ACCM-SME)
1 Introduction
2 Literature Review
3 Adoption Theories and Frameworks
4 Conceptual Framework
4.1 The Technological Context
4.2 The Organisational Context
4.3 The Environmental Context
4.4 The Social Context
5 Conclusion and Future Work
References
A New Approach for Optimal Selection of Features for Classification Based on Rough Sets, Evolution and Neural Networks
1 Introduction
2 Materials and Methods
2.1 Rough Set Theory
2.2 Neural Networks
2.3 Evolutionary Strategy
2.4 MNIST Data Set
2.5 Proposed Method
2.6 Experimental Setup
3 Results and Discussion
3.1 Performance Evaluation
3.2 State of the Art-Based Comparison
4 Conclusions and Future Work
References
An Unexpected Connection from Our Personalized Medicine Approach to Bipolar Depression Forecasting
1 Introduction
2 Methods
3 Results
4 Discussion
References
Real-Time Parallel Processing of Vibration Signals Using FPGA Technology
1 Introduction
2 The Architecture of the Acquisition System
2.1 Case 1: FFT Processing at the FPGA Circuit
2.2 Case 2: FFT Processing at the Microprocessor Level RT
2.3 Case 3: Processing at the PC Level
3 Software Development of Processing Application
3.1 Case 1: FFT Processing at FPGA Level
3.2 Case 2: FFT Processing at the Microprocessor Level RT
3.3 Case 3: FFT Processing at the PC Level
4 Experimental Results
4.1 Time Performance
4.2 Spectra View Obtained in Various Engine Scenarios
5 Future Work
6 Conclusions
References
Robust Control Design Solution for a Permanent Magnet Synchronous Generator of a Wind Turbine Model
1 Introduction
1.1 Related Literature Review
2 Wind Turbine Model
3 Control Scheme Implementation
4 Perturb and Observe Algorithm
5 Simulation Results
6 Conclusion
References
Knowledge Translator for Clinical Decision Support in Treating Hearing Disorders
1 Introduction
1.1 Tinnitus - A Medical “Enigma”
1.2 Tinnitus Retraining Therapy (TRT)
1.3 Clinical Decision Support System eTRT
1.4 eTRT Knowledge Base
2 Methods
2.1 Association Rules
2.2 Action Rules
2.3 Knowledge Translating Procedure
3 Experiments and Results
3.1 Graphical User Interface
3.2 Results
3.3 Discussion
4 Conclusions
References
Framework for Modelling of Learning, Acting and Interacting in Systems
1 Introduction
2 Definitions and Notations
3 Acting and Learning
3.1 Motivation
3.2 Stochastic Approach
3.3 Reductions to Games
4 Algorithms and Architectures
4.1 ε-greedy Algorithm
4.2 UCB Algorithm
4.3 Fictitious Play
4.4 From Classifier to Player
4.5 Neural Network
5 Further Work
References
How to Reduce Emissions in Maritime Ports? An Overview of Cargo Handling Innovations and Port Services
1 Introduction
2 Background
3 Methodology
3.1 Scope
3.2 Focus
3.3 Sources
4 Overview of Means to Facilitate Energy Transition in Ports
4.1 Berth
4.2 Transport for Cargo Operations and Allocation
4.3 Storage Yard
4.4 Port Gates
4.5 Port Administrative
4.6 Summary
5 Discussion
5.1 The Concept of a New Port
5.2 Extension of Port Orientation
5.3 Energy Transition
5.4 Limitations
6 Conclusion
References
Bridging the Domain Gap for Stance Detection for the Zulu Language
1 Introduction
2 Related Work
2.1 Domain Generalization, Adaptation, Randomization
2.2 Stance Detection
2.3 Explicit and Implicit Transfer Learning for NLP
3 Architecture, Methodology and Dataset
3.1 Step 1: Build the Training Dataset
3.2 Step 2: Build the Training Pipeline
3.3 Dataset
3.4 Baselines
4 Evaluation and Results
4.1 Domain Randomization
4.2 Domain Adaptation
5 Conclusions
References
Arcface Based Open Set Recognition for Industrial Fault
1 Introduction
2 Related Work
3 Method
3.1 Proposed Fault Classification Approach
3.2 Proposed Open Set Recognition Method
3.3 Arcface Loss Function
4 Experiments
4.1 Data Description
4.2 Experimental Design
4.3 Comparison Method
4.4 Experimental Result
5 Conclusion
References
Sensitivity of Electrophysiological Patterns in Level-K States as Function of Individual Coordination Ability
1 Introduction
2 Materials and Methods
2.1 Individual Coordination Ability (iCA)
2.2 Experimental Design
3 Results and Discussion
3.1 EEG Preprocessing Scheme
3.2 Classifying EEG Segments into their Corresponding Level-K
3.3 EEG Patterns Sensitivity Analysis
4 Conclusions and Future Work
References
A Machine Learning Approach for Discovery of Counter-Air Defense Tactics for a Cruise Missile Swarm
1 Introduction
2 A Framework for Evaluating Mission Effectiveness
3 SAM Engagement Geometry
4 ML Swarm Agent Design
4.1 Observation Function
4.2 Action Function
4.3 Behavior Function
5 Training Methodology
6 Baseline Simulation Scenario
7 Analytic Prediction of Mission Effectiveness
8 Simulation Results
8.1 Non-reactive CM Attack
8.2 Non-reactive Reduced RCS CM Attack
8.3 Autonomous ML Agent-Controlled CM Attack
8.4 Autonomous ML Agent-Controlled Reduced RCS CM Attack
9 Results Summary
10 Conclusions
References
Tackling Marine Plastic Littering by Utilizing Internet of Things and Gamifying Citizen Engagement
1 Introduction
2 Proposal Objectives and Challenges
3 The Proposed Solution
4 Research Methodology
5 Added Value and Impact
6 Conclusion
References
HiSAT: Hierarchical Framework for Sentiment Analysis on Twitter Data
1 Introduction
2 Related Work
3 Proposed Solution: HiSAT
3.1 Data Preprocessing
3.2 Word Cloud Visualization
3.3 Feature Extraction
3.4 Model Training with Batch of Classifiers
4 Implementation and Evaluation of Classifiers
4.1 Evaluation Metrics
4.2 Logistic Regression Classifier
4.3 Random Forest Classifier
4.4 Naïve Bayes Classifier
4.5 Comparison and Discussion
5 Conclusions and Future Work
References
Characterization of Postoperative Pain Through Electrocardiogram: A First Approach
1 Introduction, Motivation and Goals
2 Materials and Methods
2.1 Setup and Data Collection
2.2 ECG Processing
3 Results and Discussion
3.1 Results
3.2 Discussion
4 Conclusions and Future Work
References
Deep Learning Applied to Automatic Modulation Classification at 28 GHz
1 Introduction
2 System Model and Problem Description
2.1 RF Signal Description
2.2 Modulation Constellations
2.3 Statistical Features for the GRF
3 Classification Method
3.1 System Structure
3.2 Graphical Representation of Features (GRF)
3.3 Deep Learning Networks
4 Evaluated Performance
5 Conclusion
References
An Exploration of Students and Lecturers’ Insights of Online University Learning Implemented During the COVID‐19 Contagion
1 Introduction
2 Literature Review
3 Methodology
3.1 Data Collection and Analysis
3.2 Participants
4 Discussion of Results and Future Work
5 Conclusion
References
Combining Rule-Based System and Machine Learning to Classify Semi-natural Language Data
1 Introduction
2 Commands' Similarities
3 Related Work
4 Architecture
5 Data
5.1 Graphs
6 Combination of Rule-Based and ML Systems
7 Rule-Based System to Classify Commands
7.1 Set of Rules
7.2 Transforming Data to Binary Classes
8 Machine Learning Systems to Classify Commands
8.1 Document Classifier: Logistic Regression
8.2 Document Classifier: Using Transformers
8.3 Sentence-Pair Classifier: Using Transformers
9 Results
9.1 Flexibility of ML System
9.2 Testing on Unseen Data
10 Discussion
11 Conclusion
12 Future Work
References
ASDLAF: A Novel Autism Spectrum Disorder Learning Application Framework for Saudi Adults
1 Introduction
2 Related Work
3 ASDLAF Framework
3.1 Intermediate Factors
3.2 Technological Factors
3.3 Cultural Factors
3.4 Pedagogical Factors
4 Conclusion and Future Work
References
A Comprehensive eVTOL Performance Evaluation Framework in Urban Air Mobility
1 Introduction
2 Literature Review
3 Problem Statement
4 Proposed Framework
4.1 UTM Simulator
4.2 Dilation
4.3 Performance Evaluation
5 Results
6 Discussion
7 Conclusion
References
A New Arabic Online Consumer Reviews Model to Aid Purchasing Intention (AOCR-PI)
1 Introduction
2 Online Review Platforms
3 Theoretical Frameworks of OCRs Influence
3.1 Elaboration Likelihood Model
3.2 Hali’s Cultural Model
3.3 Hofstede’s Cultural Dimensions Framework
4 Conceptual Framework
5 Hypotheses Development
5.1 Review Depth and Purchase Intention
5.2 Review Valence and Purchase Intention
5.3 Review Readability and Purchase Intention
5.4 Review Images and Purchase Intention
5.5 Review Volume and Purchase Intention
5.6 Reviewer Identity Disclosure and Purchase Intention
5.7 Reviewer Reputation and Purchase Intention
5.8 Reviewer Experience and Purchase Intention
6 Conclusion and Future Work
References
Trustworthy Artificial Intelligence for Cyber Threat Analysis
1 Overview
1.1 Literature Review
1.2 Bias in Existing AI/ML Algorithms
2 Using Adversarial ML Model to Discover Bias
2.1 Notation of Bias and Mitigation
3 AI Bias and ML-Based Threat Analytics
4 Programming and Experiments
References
Entropy of Shannon from Geometrical Modeling of Covid-19 Infections Data: The Cases of USA and India
1 Introduction
2 Motivation of Geometry-Based Models
3 Trapezoid Model of Pandemic
4 Entropy of Shannon
5 Discussion of Results from Entropic Modeling
5.1 United States of America
5.2 India
6 Conclusion
References
Optimization of the BANK’s Branch Network Using Machine Learning Methods
1 Introduction
1.1 Overview of Related Works
1.2 Description of Task
2 Data
3 Machine Learning Model for Customer Flow
3.1 Algorithm Choice
3.2 Model Tuning
3.3 Model Results
4 Conclusion
References
An Adaptive Hybrid Active Learning Strategy with Free Ratings in Collaborative Filtering
1 Introduction
2 Related Work
3 Background
4 Our Proposed Method
4.1 Adaptive Hybridization of Personalized and Non-personalized AL Strategies
4.2 Rating Elicitation with Free Ratings
5 Experimental Setup
5.1 Dataset
5.2 Base Recommender System
5.3 Comparison Methods
5.4 Evaluation Measure
6 Results and Discussion
6.1 Personalized vs Non-personalized
6.2 Adaptive Hybrid Strategy
6.3 Free Ratings
7 Conclusion and Future Work
References
Service Response Time Estimation in Crowdsourced Processing Chain
1 Introduction
2 Related Work
3 Methodology
3.1 Map-Matching Service
3.2 Response Time Estimation
4 Experiment
4.1 Experimental Setup
4.2 Datasets
4.3 Preprocessing Data
4.4 Evaluation
4.5 Experimental Methodology
5 Results and Discussion
5.1 Data Distribution Fitting
5.2 Response Time Estimation per Data Point
5.3 Accuracy Assessment
6 Conclusions
References
How to Build an Optimal and Operational Knowledge Base to Predict Firefighters' Interventions
1 Introduction
2 Related Work
3 Database Design
3.1 The Targets
3.2 The Features
4 Experimental Results
4.1 Experimental Protocol
4.2 Obtained Results
5 Conclusion
References
Explainable Black Box Models
1 The Quest for Explainability in Artificial Intelligence Models
1.1 Introduction
1.2 Purpose of the Paper
2 Are Inherently Explainable Models Needed?
2.1 Black Box Models Versus Explainable Models
2.2 Explanations that Are Not Inherent in the Model Are Not Faithful to What the Original Model Computes
2.3 Completing Explanations
3 Explainable Black Box Models
3.1 Explanations as Output of a Black Box Model
3.2 Variance of Outcomes of Explainable Black Box Models
4 The Use of Explainable Black Box Models
4.1 Explainable Black Box Models Are More Application-Oriented Than Explainable Models
4.2 Rigorous Analysis of Explanations
4.3 Simulations
5 Conclusion
References
Vehicle Usage Extraction Using Unsupervised Ensemble Approach
1 Introduction
2 Related Work
3 Data Representation
4 Problem Formulation
5 Proposed Methodology
5.1 Data Integration
5.2 Data Segmentation
5.3 Ensemble Clustering
5.4 Pattern Extraction
6 Results
6.1 Ensemble Clustering Evaluation
6.2 Cluster Analysis
6.3 Utility Function
6.4 Vehicle Usage Style
7 Discussion and Conclusion
References
Experimental Design of a Quantum Convolutional Neural Network Solution for Traffic Sign Recognition
1 Introduction
2 Designing a Solution
2.1 Quantum Limitations and Dataset Reduction
2.2 Transforming Data into Qubits
2.3 Applying Convolutional and Pooling Layers
2.4 Quantum Principle Component Analysis
3 Testing and Evaluation
3.1 Program Optimization
3.2 Program Results with Optimised Parameters
4 Conclusions and Future Work
References
Impact of Image Sensor Output Data on Power Consumption of the Image Processing System
1 Introduction
2 Related Work
3 Methodology
3.1 Power Measurement Setup of the MIPI Communication
3.2 Energy Measurement Setup of the System
4 MIPI Power Measurement Results
5 System Measurement Results
6 Conclusion and Future Work
References
IAI-CGM: A Framework for Intention to Adopt IoT-Enabled Continuous Glucose Monitors
1 Introduction
2 Related Work
2.1 Diabetes Testing
2.2 Diabetes Monitoring
2.3 Role of CGM in T1DM Primary Care
2.4 IoT-CGM Adoption Concerns
3 Proposed Adoption Framework: IAI-CGM
3.1 Human Factors
3.2 Interpersonal Influence
3.3 Innovativeness
3.4 Trustworthiness
3.5 Self-efficacy
3.6 Attitude Towards Wearable Devices
3.7 Health Interest
3.8 Perceived Value
3.9 IAI-CGM Dimensions
4 Research Hypothesis
5 Conclusion and Future Work
References
Artificial Vision Algorithm for Behavior Recognition in Children with ADHD in a Smart Home Environment
1 Introduction
2 Related Work
3 Methodology
3.1 Hardware Customizability
3.2 Classification Algorithm
3.3 Experiment
4 Discussion of the Results
5 Conclusions
References
Spread of Fake News About Covid: The Ecuadorian Case
1 Introduction
2 Methodology Followed for This Case Study
3 Results Obtained
4 Discussion/Opinions
5 Conclusion
References
Data Augmentation Methods for Electric Automobile Noise Design from Multi-Channel Steering Accelerometer Signals
1 Introduction
2 Proposed Method
3 Experiments
3.1 Data Collection and Experimental Setting
3.2 Results
4 Conclusions and Future Works
References
Real-Time Student Attendance System Using Face Recognition
1 Introduction
2 Proposed System
3 Conclusions
References
The Unfortunate Footnote: Using the Affective Reasoner to Generate Fortunes-of-Others Emotions in Story-Morphs
1 Introduction and Motivation
2 Background Theory
3 How Story Morphing Works
3.1 The Formal Discrete-Sequenced Plot Derived from an Original Text Narrative
3.2 Goals
3.3 Goals that Q Believes that R has (Goals of Others)
3.4 Goals that Q Believes that R Believes that S has (Goals of Others of Others)/Relationship that Q Believes R has Toward S
3.5 Standards/Principles
3.6 Standards that Q Believes that R has (Standards of Others)
3.7 Standards that Q Believes that R Believes that S has (Standards of Others of Others)/Relationship that Q Believes R has Toward S
3.8 Preferences
3.9 Friendship, Animosity and Cognitive Unit Relationships
3.10 Personalities
3.11 Story-Morphs
4 Implementation
5 Conclusions and Summary
References
Application of the Solution for Analysis of IT Systems Users Experience on the Example of Internet Bank Usage
1 Introduction
2 Background and Related Work
2.1 The Scope of UX Analysis
2.2 Trends in UX Analysis
3 Approach Used to Obtain and Visualise UX Characteristics from Audit Log Data
4 User Study
4.1 The Most/Least Popular or Achieved vs. Unachieved Goals
4.2 Analysis of User Journeys
4.3 Analysis of Individual Actions of User Journeys
5 Conclusions and Future Research
References
Real-World Computer Vision for Real-World Applications: Challenges and Directions
1 Introduction
2 Challenges of Real-World Computer Vision
2.1 Privacy-Awareness
2.2 Latency Constraints and Online Execution
2.3 Visual Robustness and Resiliency
2.4 Scalability and Generalization
3 Background and Related Work
3.1 End-to-End Computer Vision
3.2 Robustness Against Noisy Data
3.3 Domain Adaptation and Generalization
4 Proposed Algorithmic Shifts for Real-World Computer Vision
4.1 Privacy-Aware Identity-Neutral Vision Pipeline
4.2 Semantic Feedback with Spatial/Temporal Graph to Enable End-to-End Robustness
5 Early Results and Evaluation
5.1 Case Study: Smart Health Monitoring System
5.2 Experimental Setup
5.3 System Validation
5.4 Person Trajectory Prediction
5.5 Action Detection
5.6 Health Monitoring
6 Conclusions
References
Proactive Chatbot Framework Based on the PS2CLH Model: An AI-Deep Learning Chatbot Assistant for Students
1 Introduction
2 Literature Review
3 The Proposed Proactive Chatbot Framework for Students Based on PS2CLH’s Model
3.1 Wide-ranging Extended Chatbot
3.2 Educational Chatbot Ecosystem
4 Test the Proactive Chatbot Framework and Extending BERT Chatbot, Results
5 Conclusion and Future Works
References
MR-VDENCLUE: Varying Density Clustering Using MapReduce
1 Introduction
2 Challenges Of Implementing a Distributed VDENCLUE Algorithm
3 MR-VDENCLUE Algorithm
3.1 Step 1: LSH Partitioning
3.2 Step 2: Local Clustering
3.3 Step 3: Attractors Partitioning
3.4 Step 4: Merging
3.5 Step 5: Relabelling
4 Implementation Using MapReduce
5 Experimental Evaluation
5.1 Selection of k Parameter
5.2 Computational Cost
6 Conclusion
References
A Case for Modern Build Automation for Intelligent Systems
1 Introduction
2 Comparison: Our Approach
2.1 Reducing the Subjects for Comparison
2.2 Analysis of the Research Subjects
2.3 Development of Comparison Criteria
2.4 Development of a Reference Scenario
2.5 Concluding Comparison of the Analyzed Subjects
3 CI/CD Tool Evaluation: GitLab
3.1 Internals
3.2 Pipelines
3.3 Additional Features
3.4 Platforms
4 Evaluation Summary
5 Conclusion
References
The Self-discipline Learning Model with Imported Backpropagation Algorithm
1 Introduction
2 Basic Theories
2.1 Probability Scale Self-organization
2.2 Probability Space Distance
2.3 CNN Backpropagation
3 Model Based on the Combination of CNN and SDL
4 Data and Preprocessing
5 Analysis and Results
6 Conclusion
References
Author Index

Lecture Notes in Networks and Systems 542

Kohei Arai   Editor

Intelligent Systems and Applications Proceedings of the 2022 Intelligent Systems Conference (IntelliSys) Volume 1

Lecture Notes in Networks and Systems Volume 542

Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas— UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).

More information about this series at https://link.springer.com/bookseries/15179

Kohei Arai Editor

Intelligent Systems and Applications Proceedings of the 2022 Intelligent Systems Conference (IntelliSys) Volume 1


Editor Kohei Arai Saga University Saga, Japan

ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-031-16071-4 ISBN 978-3-031-16072-1 (eBook) https://doi.org/10.1007/978-3-031-16072-1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Editor’s Preface

It gives me immense pleasure and privilege to present the proceedings of the Intelligent Systems Conference (IntelliSys) 2022, which was held in a hybrid mode on 1 and 2 September 2022. IntelliSys was designed and organized in such a wonderful manner in Amsterdam, the Netherlands, that in-person and online delegates shared their valuable research in engaging discussions, and hence we were able to take advantage of the best that the two modalities can offer.

IntelliSys is a prestigious annual conference on artificial intelligence and aims to provide a platform for discussing the issues, challenges, opportunities and findings of its applications to the real world. This conference was hugely successful in discussing past approaches, current research and future areas of study in the field of intelligent systems. The contributions offered workable solutions to many intriguing problems faced across different fields, covering deep learning, data mining, data processing, human–computer interaction, natural language processing, expert systems, robotics and ambient intelligence, to name a few. They also let us glimpse what the future would look like if artificial intelligence were entwined in our lives.

One of the meaningful and valuable dimensions of this conference is the way it brings together researchers, scientists, academicians and engineers from different countries on one platform. The aim was to further increase the body of knowledge in this specific area by providing a forum to exchange ideas, discuss results and build international links.

Authors from 50+ countries submitted a total of 494 papers to be considered for publication. Each paper was reviewed on the basis of originality, novelty and rigorousness. After the reviews, 193 were accepted for presentation, out of which 176 papers (including 8 posters) are finally being published in the proceedings.

We would like to extend our gratitude to all the learned guests who participated on site as well as online to make this conference extremely fruitful and successful, and also a special note of thanks to the technical committee members and reviewers for their efforts in the reviewing process.


We sincerely believe this event will help to disseminate new ideas and inspire more international collaborations. We kindly invite all to continue to support future IntelliSys conferences with the same enthusiasm and fervour. Kind Regards, Kohei Arai

Contents

Anomaly-Based Risk Detection Using Digital News Articles . . . . . . . . . . . . 1
Andreas Pointner, Eva-Maria Spitzer, Oliver Krauss, and Andreas Stöckl

Robust Rule Based Neural Network Using Arithmetic Fuzzy Inference System . . . . . . . . . . . . 17
József Dombi and Abrar Hussain

A Hierarchical Approach for Spanish Sign Language Recognition: From Weak Classification to Robust Recognition System . . . . . . . . . . . . 37
Itsaso Rodríguez-Moreno, José María Martínez-Otzeta, and Basilio Sierra

Deontic Sentence Classification Using Tree Kernel Classifiers . . . . . . . . . . . . 54
Davide Liga and Monica Palmirani

Sparse Distributed Memory for Sparse Distributed Data . . . . . . . . . . . . 74
Ruslan Vdovychenko and Vadim Tulchinsky

Quantum Displacements Dictated by Machine Learning Principles: Towards Optimization of Quantum Paths . . . . . . . . . . . . 82
Huber Nieto-Chaupis

A Semi-supervised Vulnerability Management System . . . . . . . . . . . . 97
Soumyadeep Ghosh, Sourojit Bhaduri, Sanjay Kumar, Janu Verma, Yatin Katyal, and Ankur Saraswat

Evaluation of Deep Learning Techniques in Human Activity Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 Tiago Mendes and Nuno Pombo Siamese Neural Network for Labeling Severity of Ulcerative Colitis Video Colonoscopy: A Thick Data Approach . . . . . . . . . . . . . . . . . . . . . 124 Jinan Fiaidhi, Sabah Mohammed, and Petros Zezos


Self-supervised Contrastive Learning for Predicting Game Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Young Jae Lee, Insung Baek, Uk Jo, Jaehoon Kim, Jinsoo Bae, Keewon Jeong, and Seoung Bum Kim Stochastic Feed-forward Attention Mechanism for Reliable Defect Classification and Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 Jiyoon Lee, Chunghyup Mok, Sanghoon Kim, Seokho Moon, Seo-Yeon Kim, and Seoung Bum Kim Bregman Divergencies, Triangle Inequality, and Maximum Likelihood Estimates for Normal Mixtures . . . . . . . . . . . . . . . . . . . . . . 159 Bernd-Jürgen Falkowski Self-supervised Learning for Predicting Invisible Enemy Information in StarCraft II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 Insung Baek, Jinsoo Bae, Keewon Jeong, Young Jae Lee, Uk Jo, Jaehoon Kim, and Seoung Bum Kim Measuring Robot System Agility: Ontology, System Theoretical Formulation and Benchmarking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 Attila Vidács, Géza Szabó, and Marcell Balogh A New Adoption of Cloud Computing Model for Saudi Arabian SMEs (ACCM-SME) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 Mohammed Alqahtani, Natalia Beloff, and Martin White A New Approach for Optimal Selection of Features for Classification Based on Rough Sets, Evolution and Neural Networks . . . . . . . . . . . . . 211 Eddy Torres-Constante, Julio Ibarra-Fiallo, and Monserrate Intriago-Pazmiño An Unexpected Connection from Our Personalized Medicine Approach to Bipolar Depression Forecasting . . . . . . . . . . . . . . . . . . . . . 226 Milena B. Čukić, Pavel Llamocca, and Victoria Lopez Real-Time Parallel Processing of Vibration Signals Using FPGA Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236 Bogdan Popa, Dan Selișteanu, and Ion Marian Popescu Robust Control Design Solution for a Permanent Magnet Synchronous Generator of a Wind Turbine Model . . . . . . . . . . . . . . . . 258 Silvio Simani and Edy Ayala Knowledge Translator for Clinical Decision Support in Treating Hearing Disorders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271 Katarzyna Tarnowska and Jordan Conragan


Framework for Modelling of Learning, Acting and Interacting in Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281 Artur Popławski How to Reduce Emissions in Maritime Ports? An Overview of Cargo Handling Innovations and Port Services . . . . . . . . . . . . . . . . . . . . . . . . 295 Sergey Tsiulin and Kristian Hegner Reinau Bridging the Domain Gap for Stance Detection for the Zulu Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312 Gcinizwe Dlamini, Imad Eddine Ibrahim Bekkouch, Adil Khan, and Leon Derczynski Arcface Based Open Set Recognition for Industrial Fault . . . . . . . . . . . 326 Jeongseop Yoon, Donghwan Kim, and Daeyoung Kim Sensitivity of Electrophysiological Patterns in Level-K States as Function of Individual Coordination Ability . . . . . . . . . . . . . . . . . . . . . 336 Dor Mizrahi, Inon Zuckerman, and Ilan Laufer A Machine Learning Approach for Discovery of Counter-Air Defense Tactics for a Cruise Missile Swarm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348 Joshua Robbins and Laurie Joiner Tackling Marine Plastic Littering by Utilizing Internet of Things and Gamifying Citizen Engagement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367 Stavros Ponis, George Plakas, Eleni Aretoulaki, and Dimitra Tzanetou HiSAT: Hierarchical Framework for Sentiment Analysis on Twitter Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376 Amrutha Kommu, Snehal Patel, Sebastian Derosa, Jiayin Wang, and Aparna S. Varde Characterization of Postoperative Pain Through Electrocardiogram: A First Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393 Raquel Sebastião Deep Learning Applied to Automatic Modulation Classification at 28 GHz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 Yilin Sun and Edward A. Ball An Exploration of Students and Lecturers’ Insights of Online University Learning Implemented During the COVID‐19 Contagion . . . 415 Mohammed Yahya Alghamdi Combining Rule-Based System and Machine Learning to Classify Semi-natural Language Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424 Zafar Hussain, Jukka K. Nurminen, Tommi Mikkonen, and Marcin Kowiel


ASDLAF: A Novel Autism Spectrum Disorder Learning Application Framework for Saudi Adults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 Yahya Almazni, Natalia Beloff, and Martin White A Comprehensive eVTOL Performance Evaluation Framework in Urban Air Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459 Mrinmoy Sarkar, Xuyang Yan, Abenezer Girma, and Abdollah Homaifar A New Arabic Online Consumer Reviews Model to Aid Purchasing Intention (AOCR-PI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475 Ahmad Alghamdi, Natalia Beloff, and Martin White Trustworthy Artificial Intelligence for Cyber Threat Analysis . . . . . . . . 493 Shuangbao Paul Wang and Paul A. Mullin Entropy of Shannon from Geometrical Modeling of Covid-19 Infections Data: The Cases of USA and India . . . . . . . . . . . . . . . . . . . . 505 Huber Nieto-Chaupis Optimization of the BANK’s Branch Network Using Machine Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 Dorzhiev Ardan An Adaptive Hybrid Active Learning Strategy with Free Ratings in Collaborative Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531 Alireza Gharahighehi, Felipe Kenji Nakano, and Celine Vens Service Response Time Estimation in Crowdsourced Processing Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546 Jorge Rodríguez-Echeverría, Casper Van Gheluwe, Daniel Ochoa, and Sidharta Gautama How to Build an Optimal and Operational Knowledge Base to Predict Firefighters’ Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . 558 Christophe Guyeux, Abdallah Makhoul, and Jacques M. Bahi Explainable Black Box Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573 Wim De Mulder Vehicle Usage Extraction Using Unsupervised Ensemble Approach . . . . 588 Reza Khoshkangini, Nidhi Rani Kalia, Sachin Ashwathanarayana, Abbas Orand, Jamal Maktobian, and Mohsen Tajgardan Experimental Design of a Quantum Convolutional Neural Network Solution for Traffic Sign Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . 605 Dylan Cox Impact of Image Sensor Output Data on Power Consumption of the Image Processing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618 Gernot Fiala, Johannes Loining, and Christian Steger


IAI-CGM: A Framework for Intention to Adopt IoT-Enabled Continuous Glucose Monitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637 Hamad Almansour, Natalia Beloff, and Martin White Artificial Vision Algorithm for Behavior Recognition in Children with ADHD in a Smart Home Environment . . . . . . . . . . . . . . . . . . . . . . 661 Jonnathan Berrezueta-Guzman, Stephan Krusche, Luis Serpa-Andrade, and María-Luisa Martín-Ruiz Spread of Fake News About Covid: The Ecuadorian Case . . . . . . . . . . 672 Lilian Molina, Gabriel Arroba, Xavier Viteri-Guevara, Sonia Tigua-Moreira, Arturo Clery, Lilibeth Orrala, and Ericka Figueroa-Martínez Data Augmentation Methods for Electric Automobile Noise Design from Multi-Channel Steering Accelerometer Signals . . . . . . . . . . . . . . . 679 Yongwon Jo, Keewon Jeong, Sihu Ahn, Eunji Koh, Eunsung Ko, and Seoung Bum Kim Real-Time Student Attendance System Using Face Recognition . . . . . . . 685 Ahmad Aljaafreh, Wessam S. Lahloub, Mohamed S. Al-Awadat, and Omar M. Al-Awawdeh The Unfortunate Footnote: Using the Affective Reasoner to Generate Fortunes-of-Others Emotions in Story-Morphs . . . . . . . . . . . . . . . . . . . 690 Clark Elliott Application of the Solution for Analysis of IT Systems Users Experience on the Example of Internet Bank Usage . . . . . . . . . . . . . . . 708 Oksana Ņikiforova, Vitaly Zabiniako, Jurijs Kornienko, Madara Gasparoviča-Asīte, and Amanda Siliņa Real-World Computer Vision for Real-World Applications: Challenges and Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727 Hamed Tabkhi Proactive Chatbot Framework Based on the PS2CLH Model: An AI-Deep Learning Chatbot Assistant for Students . . . . . . . . . . . . . . 751 Arlindo Almada, Qicheng Yu, and Preeti Patel MR-VDENCLUE: Varying Density Clustering Using MapReduce . . . . . 771 Ghazi Al-Naymat, Mariam Khader, Mohammed Azmi Al-Betar, Raghda Hriez, and Ali Hadi A Case for Modern Build Automation for Intelligent Systems . . . . . . . . 789 Alexander Grunewald, Moritz Lange, Kim Chi Tran, Arne Koschel, and Irina Astrova


The Self-discipline Learning Model with Imported Backpropagation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 800 Zecang Gu, Xiaoqi Sun, and Yuan Sun Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817

Anomaly-Based Risk Detection Using Digital News Articles

Andreas Pointner, Eva-Maria Spitzer, Oliver Krauss, and Andreas Stöckl
University of Applied Sciences Upper Austria, Hagenberg i. Mühlkreis, Austria
{andreas.pointner,eva-maria.spitzer,oliver.krauss,andreas.stockl}@fh-hagenberg.at

Abstract. Enterprise risk management is a well-established methodology used in industry. This area relies heavily on risk owners and their expert opinion. In this work, we present an approach to semi-automated risk detection for companies using anomaly detection. We present various anomaly detection algorithms and show how to apply them to multidimensional data sources such as news articles and stock data in order to automatically extract possible risks. To do so, NLP methods, including sentiment analysis, are used to extract numeric values from news articles, which are needed for the anomaly analysis. The approach is evaluated by conducting interview questionnaires with domain experts. The results show that the presented approach is a useful tool that helps risk owners and domain experts find and detect potential risks for their companies.

Keywords: Anomaly detection · Risk management · Predictive analytics

1 Introduction

In this work, we propose a methodology to support risk managers in detecting and identifying possible risks by mining anomalies in data sources. Due to the increasing amount of data being produced, it is nearly impossible for risk managers to analyze the wealth of information manually. To overcome this problem, we developed a concept to aggregate data sources and enable the automatic detection of risks across them by utilizing anomaly detection algorithms. We created a prototype dashboard (see Fig. 1) that combines datasets ranging from news articles to stock values. We tested it with data from five different companies, identified risks from anomalies, and analyzed the results with domain experts.

Risk analysis has been performed for hundreds of years across various epochs in human history [7]. Enterprise risk management (ERM) has gained more and more significance in companies. Resource-Based View (RBV) theory classifies ERM as a strategic asset that can lead to competitive advantages. Saeidi et al. [17] prove that theory and show that ERM leads to a competitive advantage. In large companies, there is a large number of factors to consider when identifying risks. Besides internal data and company-related values, it is necessary to keep track of competitors, suppliers, and customers. Upcoming issues must be found as quickly as possible so that countermeasures can be taken.

Fig. 1. Dashboard view for a single company. The top visualization shows the various data sources like number of news entries, positive and negative number of entries as well as the stock combined with the detected anomalies. The bottom visualization shows news clusters with detected possible risks for a selected news cluster. The slider on the top allows regulating the sensitivity of the anomaly detection for the top visualization.

The goal of this work is to automatically detect possible risks from various key figures that are extracted from news sources, company-internal data, or publicly available information outlets. In order to obtain a large amount of up-to-date data, our approach focuses on analyzing recent news articles. Additionally, we use key performance indicators available in real time, such as stock data. As all of these values have in common that each data point carries a certain timestamp, it is possible to treat them as time series data. This allows performing time series anomaly detection on them to find possible risks in the data.

To be able to analyze the data, it is necessary to convert it from text into numerical information. For the news data, we use the news count per day. Via sentiment analysis, the counts of positive and negative news entries are also used as features. Our approach allows combining these different data points into a single multivariate scenario. This means that the anomaly detection approaches must be able to operate on multidimensional data. If an anomaly is found in the data, this does not automatically indicate that a risk is present. In order to identify a possible risk from anomalies in data, we utilize news clusters and analyze the news articles to check whether they contain specific keywords or are associated with specific subjects. These keywords and subjects are then mapped to specific risks.

In summary, this paper makes the following contributions:

– Showing the relevance of anomaly detection strategies for risk detection.
– Presenting an approach for extracting possible risks from news articles, stock data, and other KPIs such as internal sales data.

The remainder of this work gives an overview of anomaly detection algorithms as well as seasonal decomposition and sentiment analysis in Sect. 2. Section 3 describes our approach to finding risks in news articles utilizing anomaly detection on time series data. Our approach was evaluated by interviewing domain experts in the field of risk management, as described in Sect. 4. Following that, the results of the interviews are presented in Sect. 5. In Sect. 6, the related work on risk detection and analysis is discussed. Finally, Sect. 7 concludes the paper with some final remarks and possible future work.

2 Background

Anomaly detection algorithms are required to find relevant data points in time series data that could contain potential risks. All algorithms perform unsupervised anomaly detection. As Goldstein et al. [10] state, it is the most flexible setup and does not require any labels on the data. Since our work targets multiple datasets and is extensible to proprietary data sources, we cannot guarantee the presence of labels. Thus, unsupervised learning is an important aspect, and the algorithms selected are from this domain. In addition, we describe the concept of natural language processing, which is used to preprocess news articles. It allows us to extract sentiment scores from news articles, which can be used to filter for positive or negative news articles. Especially negative news may contain possible risks.

2.1 Elliptic Envelope

Elliptic Envelope (EE) is a commonly used approach to detect outliers in data. Using this algorithm, it is assumed that the data follows a specific distribution (e.g., Gaussian). Based on this assumption, the shape of the data is used to detect outliers. In the training phase, the EE creates an imaginary ellipse around the data, whereby regular data points are inside the ellipse and outliers outside. In the test phase, all points that are outside the shape are outliers and all points inside the shape are considered regular data points. The Elliptic Envelope makes use of the FAST-Minimum Covariance Determinant (FastMCD) algorithm [16], which supports the EE in learning an optimized ellipse boundary while keeping most of the data.
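
The paper does not name a concrete implementation. As an illustration only, scikit-learn's EllipticEnvelope follows exactly this scheme; the sketch below uses synthetic two-dimensional daily features, and the data, feature choice, and contamination value are assumptions, not values from the paper.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(42)
# Two illustrative daily features: number of news articles and stock closing price.
X = np.column_stack([
    rng.normal(50, 5, 200),    # news count per day
    rng.normal(100, 2, 200),   # stock closing price
])
X[37] = [120.0, 80.0]          # inject one artificial anomaly

# Fit an ellipse around the bulk of the data; contamination is the expected outlier share.
ee = EllipticEnvelope(contamination=0.02, random_state=0).fit(X)
labels = ee.predict(X)             # +1 = regular point, -1 = outlier
print(np.where(labels == -1)[0])   # should include index 37
```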

2.2 Local Outlier Factor

The Local Outlier Factor (LOF) is a density-based outlier detection approach [2]. This algorithm takes the density of the neighbors of a particular sample into account when determining outliers. The calculation compares the density of a specific sample with the reachability distance of the k nearest neighbor samples. All points whose score is above a specified threshold are outliers. The concept behind this approach is that outliers come from a low-density area. Thus, the ratio will be larger for outliers than for regular data points.
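
Analogously, a hedged sketch with scikit-learn's LocalOutlierFactor; the neighbourhood size, contamination value, and data are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(50, 5, 200), rng.normal(100, 2, 200)])
X[120] = [95.0, 85.0]  # artificial point far from the dense region

# Compare each point's local density with that of its k nearest neighbours.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X)             # +1 = inlier, -1 = outlier
scores = lof.negative_outlier_factor_   # the more negative, the more anomalous
print(np.where(labels == -1)[0], scores[120])
```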

2.3 Isolation Forest

The Isolation Forest is an outlier detection method with similarities to the random forest classifier [1]. The term isolation originates from "separating the data into isolated areas", meaning that the data is partitioned iteratively [14]. Generally, the isolation forest consists of a set of isolation trees. For each tree in the forest, an attribute (= feature) is first selected randomly to split the data in the initial node. The split is made at a random point between the min and max range of the attribute. The initial node is then split into two children. These steps are repeated until each node contains only one element [1]. Points in denser regions require more splits to be isolated than points in sparse regions [1]. Points in sparse regions are likely to be anomalies, as they require fewer splits. The number of splits made from the root node to the leaf corresponds to the outlier score. Since an isolation forest contains many isolation trees, the average of the scores of the respective trees is taken. According to Liu et al. [14], sub-sampling enables the algorithm to be more efficient by (a) better isolating examples of anomalies, as the data size is controlled, and (b) being applied to both anomalies and non-anomalies.
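
Again as a sketch only, scikit-learn's IsolationForest implements this idea; the parameters and synthetic data below are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(50, 5, 200), rng.normal(100, 2, 200)])
X[7] = [10.0, 130.0]  # a point in a sparse region, easy to isolate

# Points that are isolated with few random splits receive low scores and are flagged.
iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=0).fit(X)
labels = iso.predict(X)             # +1 = regular, -1 = anomaly
scores = iso.decision_function(X)   # lower = more anomalous
print(np.where(labels == -1)[0], scores[7])
```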

2.4 Seasonal and Trend Decomposition Using Loess

Seasonal and Trend decomposition using Loess (STL) is an algorithm for decomposing time series data into trend and seasonal components. The method was developed by Cleveland et al. [5]. It consists of three parts, namely the loess regression curve as well as the inner and outer loops. The loess regression is used to provide a smoothing of the input curve, as well as to transform it from discrete to continuous data. The inner loop updates the seasonal and trend components in each iteration, whereas the outer loop computes the robust weights, which are in turn used for the next inner-loop iterations. The overall process leads to a decomposition of the input data into three parts: the trend, the seasonal component, and the remainder, which is also often called the residuals. In the remainder of this work, STL refers to this algorithm and the corresponding implementation in the statsmodels library (https://www.statsmodels.org/v0.12.1/generated/statsmodels.tsa.seasonal.STL.html). STL can, in combination with an anomaly detection algorithm, be used for detecting anomalies in seasonal data sources. For this, the anomaly detection is performed on the residuals of the decomposition [18].
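
A minimal sketch of this residual-based detection with the statsmodels STL implementation referenced above; the synthetic weekly-seasonal series and the three-standard-deviation threshold are illustrative assumptions, not choices stated in the paper.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=120, freq="D")
# Illustrative daily news counts with a weekly seasonal pattern and one injected spike.
values = 50 + 10 * np.sin(2 * np.pi * np.arange(120) / 7) + rng.normal(0, 2, 120)
values[60] += 40
series = pd.Series(values, index=idx)

# Decompose into trend, seasonal component and residuals.
res = STL(series, period=7, robust=True).fit()
resid = res.resid

# Flag residuals that lie far outside the typical residual spread.
threshold = 3 * resid.std()
anomalies = resid[resid.abs() > threshold]
print(anomalies.index.tolist())  # likely flags 2021-03-02 (the injected spike)
```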

2.5 Natural Language Processing

Natural language processing (NLP) is a field of computer science that deals with processing human language [4]. It can be used to analyze human-written text and extract information from it. A common subfield of NLP is sentiment analysis, which is used to determine the opinion, sentiment, or attitude expressed in written language [9]. In this work, we utilize sentiment analysis to evaluate whether news articles are negative, neutral, or positive. For our approach, we use germansentiment (https://pypi.org/project/germansentiment/) for German articles, as well as VADER from nltk (https://pypi.org/project/nltk/) for English articles. Germansentiment is a model that was trained by Guhr et al. [11] using the language model BERT [8]. VADER was developed by Hutto et al. [12] and is a rule-based model for performing sentiment analysis of social media texts.
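
As a sketch of this preprocessing step for English text, the VADER analyzer from nltk can be used as described; the example sentences and the ±0.05 label thresholds are a common convention, not values from the paper, and the German pipeline with germansentiment is used analogously.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

def label(text: str, pos: float = 0.05, neg: float = -0.05) -> str:
    """Map VADER's compound score to a positive/neutral/negative label."""
    score = sia.polarity_scores(text)["compound"]
    if score >= pos:
        return "positive"
    if score <= neg:
        return "negative"
    return "neutral"

print(label("Company X reports record profits and a strong outlook."))
print(label("Regulators open an investigation into Company X after a data breach."))
```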

3 Approach

In this section, we present our risk detection approach. The approach is split into multiple steps: (1) obtaining data, (2) merging multiple data sources, (3) detecting anomalies in time series, (4) finding possible risks in news clusters, and finally (5) combining the results of (3) and (4).

3.1 Data and Preprocessing

We first describe the various data sources that can be utilized to obtain data. Following that, we explain the preprocessing steps that are necessary to work with the described data.

The main data source is a news API, providing an intelligence platform to access NLP-enriched news entries. The API enables the retrieval of news stories for different search queries, which allows filtering for text, language, and publication timestamps, to name a few. Beside the ability to query news, it also offers endpoints to list trends, collect clusters, or list news counts. We collect the news count, the news stories, as well as the trends for a given search query. These search queries highly depend on the use case and can focus on a single company, a specific topic or profile, or a complete market environment, including competitors, suppliers, and customers. Based on the trend results, we can dig deeper into the information space and collect associated news clusters. News clusters are collections of news stories that are related to the same real-world event.

More data is obtained from different sources, depending on the type of the company. For joint stock companies, the stock data is another key performance indicator. We utilize a stock API that allows us to retrieve stock data for various companies. The endpoint used in this work returns the open, close, high, and low value of the stock for each day in the selected period. For companies where we have insight into internal data, we import only data that contains a timestamp for its entries, as only such data is applicable to time series analysis. This allows including internal key figures like sales figures.

When querying various data sources, it is necessary to convert them into one multidimensional time series. As all the data points contain a timestamp, it is necessary to correlate the various timestamps. To do so, one data source is used as the base source, and all other data are merged onto its timestamps. Different approaches are possible for merging the data into the correct slots, and our approach offers different merging strategies to fulfill that requirement. We differentiate between three strategies: latest, closest, and earliest. There are different scenarios for when to use which one. For the next few examples, we will use two data sources and call them source and merge. Merge is the data source that is going to be added to the data points of source, using the timestamps of source. The strategy latest can be used when a real-world event influences merge later than it does source. For example, consider seismological data and stock values: an earthquake will influence the seismological data directly, but the stock values of affected companies will probably change only on the next day. Earliest is the inverse of latest. Finally, closest merges the data points at the best matching timestamp. This strategy is used when there is no known correlation between source and merge.

In addition to these three strategies, we have to distinguish between the numbers of data points: whether the source has more points than the data being merged, or vice versa. When merging a data source with fewer points, values are duplicated to fill empty timestamps between the available data points. When multiple data points are merged to one timestamp, we need to reduce multiple values to a single one. In order not to lose important information, we are not only averaging the value but also adding additional columns, including the number of values as well as the min and max value. These two different scenarios, as well as the three different merging strategies, are shown in Fig. 2.


Fig. 2. The different merging strategies latest, closest and earliest from top to bottom. The visualization on the left is showing the case when a data source with fewer data points is merged into one with more data points. The image on the right shows more data points being combined into fewer timestamps.
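To make the temporal alignment concrete, the following minimal sketch shows how such merging strategies could be expressed with pandas' merge_asof. The frame and column names are illustrative, and the exact mapping of the three directions to the strategies described above depends on which data source is taken as the base; this is not the actual implementation.

```python
import pandas as pd

source = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"]),
    "news_count": [12, 40, 18],
})
merge = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-03-01 14:00", "2021-03-02 09:30"]),
    "stock_close": [101.4, 99.2],
})

# take the most recent merge value at or before each source timestamp
backward = pd.merge_asof(source, merge, on="timestamp", direction="backward")
# take the next merge value at or after each source timestamp
forward = pd.merge_asof(source, merge, on="timestamp", direction="forward")
# take the merge value with the smallest time distance ("closest")
nearest = pd.merge_asof(source, merge, on="timestamp", direction="nearest")
```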

For the remainder of this work, we use two types of data elements, numeric and non-numeric, as they represent the data most commonly used in our use cases. On the one hand, we have numeric data, which consists of the number of news entries, the number of positive and negative news entries, and the stock value. Each of these four data elements is present once per day. In order to find out which news articles are positive or negative, the news texts are preprocessed using sentiment analysis. These data elements are combined using the merging approach described above, resulting in a single time series dataset where the x-axis represents the timestamp with a step width of one day and the y-axis represents the corresponding value. These datasets are also shown in Fig. 4. On the other hand, we have non-numeric data, namely the news articles themselves, or, to be more precise, news clusters. Each of these clusters combines a set of news articles that are related to the same real-world event. In addition to the set of articles, the clusters also provide statistical information, including the start and end date of the event, based on the timestamps of the news articles, as well as a representative article. The news clusters are retrieved for the same time period as the numeric data above, which allows the information of both to be combined in a later step. An example of such clusters is shown in the top visualization of Fig. 5. In the following, a running example shows how the data could be retrieved and which numeric and non-numeric data are extracted.


Figure 3 shows an example of eight different news articles, each labelled with a date and a sentiment, that are clustered into three clusters. From these articles, the numerical data can be retrieved; the results are shown in Table 1. In addition, each cluster's start and end date can be determined by using the earliest and the latest news article in the respective cluster. All of these news articles also contain a news text, from which keywords are extracted. This data is omitted in the example.
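As a worked illustration, the per-day counts in Table 1 can be reproduced from the eight labelled articles with a few lines of pandas; this is only a sketch of the aggregation step, not the actual pipeline.

```python
import pandas as pd

# the eight articles of the running example (date, sentiment)
articles = pd.DataFrame({
    "date": ["03.03.2021", "02.03.2021", "21.02.2021", "03.03.2021",
             "08.03.2021", "08.03.2021", "08.03.2021", "11.03.2021"],
    "sentiment": ["positive", "negative", "neutral", "positive",
                  "negative", "negative", "negative", "neutral"],
})
articles["date"] = pd.to_datetime(articles["date"], format="%d.%m.%Y")

counts = (
    articles.groupby("date")["sentiment"]
    .value_counts()
    .unstack(fill_value=0)                                   # one column per sentiment
    .reindex(columns=["positive", "neutral", "negative"], fill_value=0)
)
counts["news_count"] = counts.sum(axis=1)                    # total articles per day
```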


Fig. 3. Running example of eight news articles, which are each labelled with a date (timestamp) and a sentiment. They are clustered into three news clusters.

Table 1. Results from the running example from Fig. 3 showing the cumulated number of articles for a given day, as well as the number of positive, neutral and negative articles.

| Date       | News count | Positive count | Neutral count | Negative count |
|------------|------------|----------------|---------------|----------------|
| 21.02.2021 | 1          | 0              | 1             | 0              |
| 02.03.2021 | 1          | 0              | 0             | 1              |
| 03.03.2021 | 2          | 2              | 0             | 0              |
| 08.03.2021 | 3          | 0              | 0             | 3              |
| 11.03.2021 | 1          | 0              | 1             | 0              |

3.2 Anomaly Detection

We developed an architecture that enables the execution of various algorithms on time series data. For detecting anomalies we support algorithms such as Elliptic Envelope, Local Outlier Factor and Isolation Forest. In addition to these state-of-the-art anomaly detection algorithms, a manual threshold, where lower and upper boundaries can be defined, as well as an automatic threshold [19], where the boundaries are determined using the interquartile range (IQR), were developed. The first three algorithms allow multivariate anomaly detection, whereas the two latter ones are limited to univariate datasets. Additionally, a concept for detecting anomalies in seasonal data was developed. This uses the STL algorithm to decompose univariate data into trend, seasonal and residual components. The residuals are then used as input for one of the above anomaly detection algorithms. The developed architecture allows the dynamic application of each of these algorithms with respect to the use-case specific data sources and their limitations, such as uni- vs. multivariate data. The selected algorithm returns a result for each data point in the dataset, indicating whether the data point is an outlier or not. As each of our data points is assigned to a timestamp, we can also conclude the date and time of the anomalies. The anomaly detection for the key figures is performed on the numeric data only, meaning it includes the total number of articles, the number of positive and negative articles, and the stock value. The detected anomalies for a given example dataset can be seen in Fig. 4.

Fig. 4. Example visualization of the multidimensional data set containing number of news articles, number of positive and negative articles as well as the stock value of a company. Additionally, the detected anomalies are displayed with red circles.
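The following sketch illustrates two of the detectors named above, the automatic IQR threshold and the STL-based seasonal variant. The parameter values (period, contamination, seed) are illustrative assumptions and not the settings used in the evaluation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from statsmodels.tsa.seasonal import STL

def iqr_outliers(values):
    """Automatic threshold via the interquartile range (IQR), cf. [19]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (values < lower) | (values > upper)

def stl_isolation_forest(series, period=7, contamination=0.05):
    """Decompose a univariate series with STL and run Isolation Forest on the residuals."""
    residual = STL(series, period=period).fit().resid
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(np.asarray(residual).reshape(-1, 1))
    return labels == -1  # True marks an anomalous data point
```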

3.3 News-Cluster Risk Analysis

Based on the anomaly results alone, it is not possible to detect specific risks. To overcome this issue, we utilize the news cluster endpoint of the news API. In order to extract possible risks from a news cluster, we check it against specific keywords. This is done by taking the set of news entries from the cluster and extracting the relevant keywords from each of the news entries. These keywords are then compared with a predefined mapping between keywords and risks. Based on the number of matching keywords, we calculate a score for a specific risk. If the score exceeds a certain threshold, we consider that the risk may be contained in the cluster. This process is repeated for every cluster that we selected for a specific scenario. It allows us to enrich each cluster with the possible risks that may occur, related to the event that happened in that specific cluster. The clusters, plus the risks for a specific selected cluster, are displayed in Fig. 5.

Fig. 5. Example of clustered news articles on a timeline with their detected risks. The risks are listed at the bottom of the image, with the keywords that were responsible for marking it as a potential risk shown in round brackets.
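The keyword-based scoring amounts to counting overlaps with the predefined mapping. The sketch below shows the idea; the risk names, keywords and threshold are made-up examples, not the actual mapping used in the system.

```python
# hypothetical mapping between risks and keywords
RISK_KEYWORDS = {
    "supply chain disruption": {"shortage", "delay", "logistics", "supplier"},
    "political unrest": {"protest", "strike", "riot", "sanctions"},
}

def score_cluster(cluster_keywords, threshold=2):
    """Return the risks whose keyword overlap with the cluster reaches the threshold."""
    found = {}
    for risk, keywords in RISK_KEYWORDS.items():
        score = len(keywords & set(cluster_keywords))
        if score >= threshold:
            found[risk] = score
    return found
```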

3.4 Matching Anomalies and News-Cluster Risks

In Subsect. 3.2 we showed how to find timestamps that contain anomalies within the given datasets. Subsection 3.3 introduced an approach for finding possible risks inside the news clusters. These two concepts are now combined to identify "real" risks. To do so, we combine the results of the two previous steps and overlay the timestamps of the anomaly detection with the timeline of the news clusters. This results in four scenarios:

1. Clusters that contain possible risks and overlap with an anomaly
2. Clusters that contain possible risks, but do not overlap with an anomaly
3. Clusters that do not contain possible risks, but overlap with an anomaly
4. Clusters that do not contain possible risks and also do not overlap with an anomaly

For our prototype, we decided that (1) is the most relevant scenario, as these clusters may be considered "real" risks. Therefore, these clusters are marked in red in the prototype. Scenario (2) may not be as relevant as (1), as the numeric data does not show an anomaly at that particular timestamp, but the clusters still contain keywords that may be connected to a risk. Thus, these clusters are marked in yellow in the prototype. Finally, scenarios (3) and (4) are considered to be irrelevant, as (3) may be connected to anomalies but cannot be mapped to any risk, and (4) neither contains risks nor is it connected to anomalies. These clusters are shaded in gray in the prototype. An example of the marked news clusters and possible risks for a specific news cluster is shown in Fig. 6.

Fig. 6. Example for matching anomalies with news-clusters. The left image shows different key figures, which were used to determine anomalies. The right image shows the news-clusters from the same time period. Both images are annotated with vertical red lines at the timestamp where an anomaly was detected. The time of the anomaly (left) is then matched with clusters containing risk keywords (right).
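The color assignment described above reduces to a simple overlay test per cluster. The following sketch illustrates it with hypothetical field names; it is not the prototype's actual code.

```python
def classify_cluster(cluster, anomaly_timestamps):
    """cluster: dict with 'start', 'end' and a list of detected 'risks'."""
    overlaps = any(cluster["start"] <= t <= cluster["end"] for t in anomaly_timestamps)
    has_risk = bool(cluster["risks"])
    if has_risk and overlaps:
        return "red"      # scenario (1): potential "real" risk
    if has_risk:
        return "yellow"   # scenario (2): risk keywords, but no anomaly
    return "gray"         # scenarios (3) and (4): considered irrelevant
```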

4 Evaluation

To test our approach, we developed a dashboard that visualizes the various data points over time. The goal of the evaluation is to analyze the usefulness of our approach via the user interface when evaluating risks. The user interface is not intended to replace a risk owner. Instead, it has the purpose of supporting risk owners in their work, allowing them to be more efficient and helping them to identify risks they have not been aware of before. The interface consists of two parts:

Time history of news data: This part shows the distribution of positive and negative news entries, the number of total news entries, and the stock value for a given period. In addition, the visualization includes anomalies detected by the implemented algorithms. This graph is shown in Fig. 4. A sensitivity slider is used to change the detection rate of the anomaly detection algorithms. It ranges from 0.01 to 0.25, where a lower value means fewer anomalies are detected and a higher value means more anomalies, respectively.

Grouping of news articles: Similar news entries form groups that are shown on a timeline on our interface. The visualization enables the risk owner to analyze whether possible risks came up. An example visualization of this is shown in Fig. 5.

The evaluation is conducted with five companies. The datasets for this study are from the period Q1 2021 to the end of Q2 2021 and vary slightly for each company. For the evaluation part, we use the concept of a focus group with some modifications. Focus groups are group interviews where multiple people can participate and discuss questions [20]. One moderator leads the participants through the discussion using a guideline. We decided to use this type of evaluation because the focus of the concept is to support risk managers in assessing large amounts of information and to help them identify previously unknown risks. Thus, it is necessary to see if the responsible individuals understand the provided information and if it actually helps them to improve their risk findings. In total, we conducted interviews for five companies. For confidentiality reasons, the names of the companies will not be divulged. However, in the following, we describe their fields of operation. One of the companies operates in the aerospace industry (Company 1). Company 2 has its business in the field of risk management and risk analysis. Company 3 operates as a public utility service provider. The fourth company (Company 4) is an international company which operates in the field of medical and safety technology. The last company, Company 5, is an internationally acting gas and oil company. We designed our study as follows:

– Short introduction of the project.
– Discussion about risks in general in the evaluated timeline.
– Introduction of the prototype and hands-on work.
– Discussion round with questionnaire.

The questionnaire consists of eight questions, inquiring whether the prototype allows exemplifying potential risks and whether it is of assistance in identifying them. The questionnaire is semi-standardized and consists of open and closed questions. The results of the study can be found in Sect. 5 (Results). Open questions are summarized in terms of content, and for the closed questions the average values of the answers are presented. We selected various settings and algorithms in advance to conduct the study. Based on an analysis by Coleman et al. [6], Isolation Forest performs best out of our three main multivariate anomaly detection strategies. Therefore, we selected this algorithm for our evaluation scenarios. The settings of the algorithm use the default values from the scikit-learn framework, except for setting a random seed to ensure the same results over multiple runs. Additionally, the contamination can be configured on the various dashboard pages. The contamination parameter specifies the expected proportion of outliers in the dataset. Since this value strongly depends on the dataset, this parameter is configurable.
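The evaluation setup described above amounts to very little code. The following sketch shows the assumed configuration; the seed value and the direct mapping of the sensitivity slider to the contamination parameter are assumptions on our part.

```python
from sklearn.ensemble import IsolationForest

def build_detector(sensitivity=0.05):
    # sensitivity corresponds to the dashboard slider (0.01 - 0.25)
    return IsolationForest(contamination=sensitivity, random_state=42)
```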

5 Results

We present the results of the expert interviews. The results of all eight questions are summarized and a conclusion for each is provided.


1. In your opinion, are the plotted anomalies "real" anomalies? Out of the five questionnaires, three answers were a clear "Yes". The other two made it dependent on various factors. One mentioned that it depends on the sensitivity settings, and that even the lowest setting results in more anomalies than expected, some of them being irrelevant peaks. Another concluded that for some scenarios the results are superb and that there are risks the company was not even aware of, but also that there are scenarios where the anomalies are not reasonable. This may be due to an unoptimized query for the news data source.

2. Are there anomalies that are not detected correctly? In general, the feedback shows that most of the detected anomalies are correct. Again, the participants mentioned that this is highly dependent on the sensitivity settings. Nevertheless, there are a few anomalies that do not match the expert opinions of the risk owners in the corresponding companies. Furthermore, there are sometimes issues with the timestamp of the risk, because it is sometimes one or two days late. Others mentioned that additional filtering would be helpful to make the anomalies more distinguishable.

3. What additional anomalies would you add? Four out of the five participants did not find further anomalies in the specified time period. One added a specific event that occurred in the period, which they would have liked to see as an anomaly in the data.

4. Which sensitivity values provide good results for you? This question showed quite interesting results, as all participants gave different answers. The first participant mentioned that even using 0.01 as sensitivity there are still cases with too many anomalies, and that overall 0.01 to 0.05 seems to be a good range. The second one used values between 0.05 and 0.10, which provided sufficient results; everything above this threshold led to too many anomalies, according to this participant. Another participant partially agreed with the previous statements and also mentioned that 0.25 results in far too many peaks. The final two both agreed on using sensitivities starting from 0.20, as they want to see as many anomalies as possible and decide on their own which of them are really relevant for their company. They concluded that 0.01 and 0.05 remove too many anomalies and that usable results start above 0.15. From all these statements, we conclude that the sensitivity slider is highly use-case and user dependent and cannot be fixed.

5. How do you assess the identified risks in the risk groups? This question was answered by four out of the five participants. One of them rated the risk groups as very relevant, without any additional statements. Two others ticked somewhat relevant and mentioned that the risk groups give a good overview of the overall state, but that it is hard to say whether risks can directly be derived from them. The last one said that they are less relevant, as there are clusters with up to 3,000 news articles, which does not help, as this does not allow identifying specific risks. The last issue was caused by a news query with too little granularity, which resulted in too many news articles.


6. Would your expert opinion identify the same potential risks? Four out of the five participants answered this question with "No" and explained that there are many risks they would not have noticed without the use of this tooling. The fifth did not answer the question.

7. Were potential risks not correctly identified (which risks)? Overall, the participants found a few risks that are not directly related to them as a company, but are rather seen as general risks, e.g., COVID-19. In addition, some news results led to the "political unrest" risk, which was not really seen as a company-related risk by the experts.

8. What additional groups do you see as risky (date and risk designation)? Only one participant provided an answer to this question, concluding that, due to the high amount of information provided, it would be too time-consuming to analyze everything in detail. No results can be drawn from this question.

Overall, our approach and the prototype dashboard were perceived as helpful to risk managers. Each of the risk experts noted that this software adds great value to risk identification. The main negative comments were mostly related to bad queries for the news API, which resulted in uninteresting news entries that did not allow a correct detection of risks for a specific company.

6 Related Work

Several approaches for risk detection are identified in the literature. They primarily depend on the specific area they are applied to. An application of risk management using machine learning is presented by Paltrinieri et al. [15]. In the paper, the authors point out some main challenges of industrial risk assessment and a possible approach to overcome these challenges. They applied a Deep Neural Network (DNN) to assess risks in an oil and gas drilling rig to avoid potential damage during drilling operations. To compare the DNN results, they applied a multilinear regression model to the data, whose results are less accurate and less flexible in handling unforeseen events than the DNN. Khandani et al. [13] present forecasting models consisting of machine-learning techniques for detecting consumer risks. They evaluate generalized classification and regression trees on supervised consumer data. The study results in an accurate credit forecasting model with an R2 of 85% for monthly predictions. Chandrinos et al. [3] deal with the application of machine learning models in risk management. They developed an artificial intelligence risk management system consisting of decision trees and artificial neural networks. Their goal is to predict future trades in the financial industry. The result of the case study shows that the developed risk management tool could lead to a significant improvement when classifying signals produced by the trading strategy. Compared to the work of Chandrinos et al., our approach supports detecting risks in unsupervised data.


The latest publications dealing with this topic apply machine learning approaches to the data. However, current approaches lack a simple and understandable provisioning of the results for risk owners, such as visualizations. In contrast, our approach allows risk managers to analyze possible risks through a user interface. The visualized data and the results of the applied machine learning approaches allow risks to be easily identified. To the best of our knowledge, our work is the first approach that uses anomaly detection for risk detection.

7 Conclusion

In this work, we showed an approach for applying anomaly detection to multidimensional time series data and combined it with news clusters derived from news articles in order to automatically detect possible risks for specific companies. To do so, we collected data such as the number of articles, the number of positive and negative articles, and the stock value from both a news API and a stock API. These datasets were then combined using a data merging approach, which resulted in a single time series dataset. On this dataset, we then performed anomaly detection to find relevant outliers. In parallel, we retrieved news clusters containing sets of news articles that are related to a single real-world event. For each of those clusters, we extracted relevant keywords and compared them to a predefined list. The keywords in that predefined list are linked to specific risks, allowing us to find possible risks inside the clusters. Finally, we combined the results of the two previous steps and overlaid the timestamps of the anomalies with the clusters' timespans. If a cluster that contains a risk overlaps with an anomaly timestamp, we conclude that the risk may be considered a "real" risk. We then evaluated our work by conducting interviews with domain experts from five companies. These resulted in mostly positive feedback, but also showed that the approach is highly dependent on the quality of the queried news data. Finally, we compared our work to various state-of-the-art technologies for anomaly detection as well as risk identification. It shows that risk analysis is an important topic with an active research community focusing on machine learning approaches. Our approach is the first to use anomaly detection to make risks identifiable for risk owners. In the future, we plan to extend the approach with trend analysis to identify the likelihood of risks occurring. In addition, the system will be extended with an alert mechanism that allows an early detection of possible risks and should give the risk owner the possibility to take countermeasures.

References 1. Aggarwal, C.C.: Outlier Analysis. Springer, Cham (2017). https://doi.org/10. 1007/978-3-319-47578-3


2. Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LoF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104 (2000) 3. Chandrinos, S.K., Sakkas, G., Lagaros, N.D.: AIRMS: a risk management tool using machine learning. Expert Syst. Appl. 105, 34–48 (2018) 4. G.G.: Natural language processing. Annu. Rev. Inf. Sci. Technol. 37(1), 51–89 (2003) 5. Cleveland, R.B., Cleveland, W.S., McRae, J.E., Terpenning, I.: STL: a seasonaltrend decomposition. J. Off. Statis. 6(1), 3–73 (1990) 6. Coleman, J., Kandah, F., Huber, B.: Behavioral model anomaly detection in automatic identification systems (AIS). In: 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0481–0487. IEEE (2020) 7. Covello, V.T., Mumpower, J.: Risk analysis and risk management: an historical perspective. Risk Anal. 5(2), 103–120 (1985) 8. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018) 9. Feldman, R.: Techniques and applications for sentiment analysis. Commun. ACM 56(4), 82–89 (2013) 10. Goldstein, M., Uchida, S.: A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE 11(4), e0152173 (2016) 11. Guhr, O., Schumann, A.-K., Bahrmann, F., B¨ ohme, H.J.: Training a broadcoverage German sentiment classification model for dialog systems. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 1627–1632 (2020) 12. Hutto, C., Gilbert, E.: Vader: a parsimonious rule-based model for sentiment analysis of social media text. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 8 (2014) 13. Khandani, A.E., Kim, A.J., Lo, A.W.: Consumer credit-risk models via machinelearning algorithms. J. Bank. Financ. 34(11), 2767–2787 (2010) 14. Liu, F.T., Ting, K.M., Zhou, Z.-H.: Isolation forest. In: 2008 Eighth Ieee International Conference on Data Mining, pp. 413–422. IEEE (2008) 15. Paltrinieri, N., Comfort, L., Reniers, G.: Learning about risk: machine learning for risk assessment. Saf. Sci. 118, 475–486 (2019) 16. Rousseeuw, P.J., Van Driessen, K.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41(3), 212–223 (1999) 17. Saeidi, P., Saeidi, S.P., Sofian, S., Saeidi, S.P., Nilashi, M., Mardani, A.: The impact of enterprise risk management on competitive advantage by moderating role of information technology. Comput. Stand. Interfaces 63, 67–82 (2019) 18. Sezen, I., Unal, A., Deniz, A.: Anomaly detection by STL decomposition and extended isolation forest on environmental univariate time series. In: EGU General Assembly Conference Abstracts, p. 18471 (2020) 19. Vinutha, H.P., Poornima, B., Sagar, B.M.: Detection of outliers using interquartile range technique from intrusion dataset. In: Satapathy, S.C., Tavares, J.M.R.S., Bhateja, V., Mohanty, J.R. (eds.) Information and Decision Sciences. AISC, vol. 701, pp. 511–518. Springer, Singapore (2018). https://doi.org/10.1007/978-981-107563-6 53 20. Wilkinson, S.: Focus group research. Qual. Res. Theory Method Pract. 2, 177–199 (2004)

Robust Rule Based Neural Network Using Arithmetic Fuzzy Inference System József Dombi and Abrar Hussain(B) Institute of Informatics, University of Szeged, Szeged 6720, Hungary [email protected]

Abstract. Deep Neural Networks (DNNs) are currently one of the most important research areas of Artificial Intelligence (AI). Various types of DNNs have been proposed to solve practical problems in various fields. However, the performance of all these types of DNNs degrades in the presence of feature noise. Expert systems are also a key area of AI that are based on rules. In this work we wish to combine the advantages of these two areas. Here, we present Rule-Based Neural Networks (RBNNs), where each neuron is a Fuzzy Inference System (FIS). An RBNN can be trained to learn various regression and classification tasks. It has relatively few trainable parameters. It is robust to (input) feature noise and it provides good prediction accuracy even in the presence of large feature noise. The learning capacity of the RBNN can be enhanced by increasing the number of neurons, the number of rules and the number of hidden layers. The effectiveness of RBNNs is demonstrated by learning real-world regression and classification tasks. Keywords: Rule-Based Neuron (RBN) · Deep Neural Network (DNN) · Selling price prediction of used cars · IRIS flower species classification · Wine quality dataset

1 Introduction

Deep Neural Networks (DNNs) have numerous applications in almost every area of life, such as speech recognition [2], computer vision [25], synthetic data generation [17], business [20] and several other subfields [1,24]. However, there are some limitations of these DNNs: 1) The interpretation of their functionality is very difficult; 2) The performance of the trained model degrades very rapidly with an increase in the input feature noise; 3) A huge number of parameters is required. Besides DNNs, Fuzzy Inference Systems (FISs) can solve real-world engineering problems where classical mathematical techniques cannot produce accurate solutions, such as the diagnosis of crop diseases [32], the detection of cross-site scripting in Web applications [4] and the optimization of combustion engine maintenance [33]. FISs are based on expert knowledge and they use linguistic variables to form rules. Each variable corresponds to a specific fuzzy set. The fuzzy sets are represented by membership functions. The inference using an FIS


generally consists of: 1) Mapping the numerical input values to the membership functions (fuzzification); 2) Calculating the firing strengths of the rules; 3) Evaluating the fuzzy output using the firing strengths; and 4) Defuzzification to get a single numerical value. The benefits of using an FIS are: 1) A good interpretation of the results because of the linguistic rules; 2) Robustness to noisy data; 3) Proper handling of uncertainties in the knowledge. There are, however, some limitations of the FIS: 1) Expert knowledge is required to make the rules; 2) The FIS lacks generalization capability. An FIS cannot provide solutions to problems outside the domain of its rules; if the situation changes, new rules have to be incorporated to accurately solve the problem. DNNs and FISs seem to complement each other. It is natural to expect that a combination of these two approaches will have the advantages of both. On the one hand, FISs can benefit from the computational learning procedures of DNNs. On the other hand, the DNN can take advantage of the benefits offered by an FIS. These benefits include a better interpretation of the results and robustness to noisy/uncertain data. This natural combination appears in Neuro-Fuzzy Systems (NFSs). Different structures have been proposed for NFSs in the last three decades. All these structures are divided into three main categories: cooperative neuro-fuzzy systems, concurrent neuro-fuzzy systems and hybrid neuro-fuzzy systems. Czogala et al. proposed cooperative NFSs, in which the DNNs are just used in the initial phase to determine the initial sub-block of the FIS [7]. Ozkan presented concurrent NFSs, where the ANN and FIS are used in parallel [28]. Hybrid NFSs are well known and are the most widely used. These hybrid NFSs utilize the learning capability of the ANN to tune the parameters of the FIS. The important variants among hybrid NFSs are FALCON [23], ANFIS [18], GARIC [5], NEFCON [26], SONFIN [19] and Fuzzy Nets [13]. Most of these variants have a special fixed structure (usually five layers). The tuning procedure of the parameters usually involves a forward and a backward pass equipped with a suitable optimization technique, usually a gradient-based optimization method. These hybrid NFSs have several applications in various fields such as agriculture [21], cyber-security [3] and control systems [31]. DNNs are extremely sensitive to input feature noise and their accuracy decreases very rapidly with the increase in attribute noise [9,15,27]. In [30], Su et al. have shown that a change in one pixel value can significantly affect the accuracy of a DNN. To overcome these shortcomings, several efforts have been made. One of these involves the incorporation of fuzzy logic with a DNN. This has led to the development of deep fuzzy neural networks (DFNNs). A lot of structures for DFNNs have been proposed in the past decade. Das et al. suggested that these structures can be categorized into two main types [8]: 1) Ensemble DFNNs and 2) Integrated DFNNs. In ensemble DFNNs, fuzzy logic is used in a parallel or sequential manner [22]. Xiaowei Gu introduced a multi-layer ensemble learning model to tackle high-dimensional complex problems [16]. In integrated DFNNs, the fuzzy logic is an integral part of the training process. A single hybrid architecture is obtained by fusing a DNN with fuzzy logic. For


example, when Pythagorean fuzzy numbers are used as weights [34], this leads to the development of a Pythagorean fuzzy deep Boltzmann machine. Acknowledging the importance of all these efforts, here we try to generalize the concept of DFNNs by presenting a Rule-Based Neural Network (RBNN). The aim is to combine the advantages of a DNN and an FIS. The layers contain Rule-Based Neurons (RBNs). Each RBN has a built-in fuzzy inference system (FIS). An arithmetic-based FIS is used because it has low computational complexity [11]. This FIS is based on parametric rules. The parameter values of these rules are tuned during the training phase. Stochastic gradient (SG), batch gradient descent (BGD) and Levenberg-Marquardt (LM) optimization methods are used in the parameter updates of back propagation. Here, we propose the training algorithms of an RBNN for regression and classification tasks. The number of neurons, the number of hidden layers and the number of rules inside each neuron are hyper-parameters of the RBNN. Compared to a DNN, the RBNN is robust to noise, i.e., the prediction accuracy is only slightly affected by noisy features (inputs). The number of RBNN parameters is quite small (only several hundred). The rest of the paper is organized as follows. In Sect. 2, we briefly introduce the distending function and the weighted Dombi operator. These are the key elements of the FIS deployed in each neuron. In Sect. 3, we explain the proposed neural network structure and the feed-forward and feedback computations. In Sect. 4, we describe the training process of an RBNN. In Sect. 5, we present benchmark results on one regression and two classification tasks. The performance is also compared with a DNN on the classification tasks. In Sect. 6, we summarize our results and present possible directions for future research.

2 Distending Function and Weighted Dombi Operator

Each neuron in an RBNN uses an arithmetic-based inference system. The distending function (DF) and the weighted Dombi operator are the key elements of this inference system. First, we will briefly describe these two elements and then we will explain the network structure and training procedure. The distending function [11] is a general parametric function based on the Kappa function derived from the Dombi operator [12]. It is a function that semantically models soft equality. The symmetric form of the DF is defined as

\delta_{\varepsilon,\nu}^{(\lambda)}(x - c) = \frac{1}{1 + \frac{1-\nu}{\nu}\left|\frac{x-c}{\varepsilon}\right|^{\lambda}}.    (1)

Here \nu \in (0, 1) is the threshold, \varepsilon > 0 is the tolerance, \lambda \in (1, +\infty) is the sharpness and c \in \mathbb{R} is the peak value parameter. The DF can have different shapes (trapezoid, triangular, Gaussian) depending on its four parameters (see Fig. 1). The DF has its peak value at x = c. In the rest of the paper, we will denote \delta_{\varepsilon,\nu}^{(\lambda)}(x - c) by \delta_s(x) for the sake of simplicity. The motivation behind the use of the DF is to approximate various membership functions by varying its parameters.


Fig. 1. Various shapes of symmetric distending function (here c = 0).
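Written out directly, Eq. (1) is a one-line function. The following is a minimal sketch for experimentation, not the authors' implementation; the default parameter values are illustrative.

```python
import numpy as np

def distending(x, c=0.0, eps=0.25, nu=0.4, lam=2.0):
    """Symmetric DF of Eq. (1): peaks at x = c and equals nu at |x - c| = eps."""
    return 1.0 / (1.0 + (1.0 - nu) / nu * np.abs((x - c) / eps) ** lam)
```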

The weighted Dombi operator (WDO) [10] is a continuously valued logical operator, defined as

O_D^{(\alpha)}(\mathbf{x}) = \frac{1}{1 + \left( \sum_{i=1}^{n} w_i \left( \frac{1 - x_i}{x_i} \right)^{\alpha} \right)^{1/\alpha}},    (2)

where w_i is the weight of the input x_i. Here, \alpha determines the nature of the logical operation. If \alpha > 0, the operator is conjunctive and if \alpha < 0 then the operator is disjunctive. So, we can perform conjunction and disjunction operations between various inputs using a single operator. This is the main reason for using the WDO in the parametric rules.
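Equation (2) can likewise be sketched in a few lines; the inputs are assumed to lie strictly in (0, 1), and the example values are illustrative.

```python
import numpy as np

def weighted_dombi(x, w, alpha):
    """Eq. (2): conjunctive for alpha > 0, disjunctive for alpha < 0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    s = np.sum(w * ((1.0 - x) / x) ** alpha)
    return 1.0 / (1.0 + s ** (1.0 / alpha))
```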

3 Proposed Network Structure

The proposed network structures for regression and classification tasks are shown in Figs. 2 and 3. The structures are similar to the multilayer perceptron (MLP) model, but there are no weights on the connections and no bias terms. Each node is a rule-based neuron (RBN) and the network is fully connected. Next, we describe in detail the feed-forward and feedback calculations for these RBNNs.

3.1 Feed-Forward Calculations

The input layer performs the normalization function by transforming the feature values into the [0, 1] interval. Any existing normalization technique can be applied. Let x1 , x2 , . . . , xn be the normalized feature values generated by the input layer. The number of rules in each RBN of the hidden and output layer is fixed and is a hyper-parameter. Let us now consider the feed forward calculations


Fig. 2. RBNN structure for regression tasks.

in a single RBN unit; this generalizes to all the RBNs in the network. Each RBN contains L rules and each rule has the following form:

if x_1 is f_{j1} and ... and x_n is f_{jn}, then y_j is o_j.    (3)

The part between if and then is called the antecedent and the rest is called the consequent. Here, x_1, \dots, x_n are the inputs and f_{j1}, \dots, f_{jn} are the n distending functions in the jth rule. y_j is the consequent of the jth rule and its value is o_j. A unique f_{ji} is associated with the input x_i in the jth rule. If there are L rules in an RBN unit, then there are L DFs for each input (x_1, \dots, x_n), which implies that a single RBN unit contains a total of nL DFs. Each rule is evaluated and it results in a single numerical value. This is the rule strength and it is given by

u_j = \frac{1}{1 + \left( \sum_{i=1}^{n} w_{ji} \left( \frac{1 - f_{ji}(x_i)}{f_{ji}(x_i)} \right)^{\alpha} \right)^{1/\alpha}},    (4)


Fig. 3. RBNN structure for classification tasks.

where i = 1, \dots, n runs over the inputs and j = 1, \dots, L over the rules. Here,

f_{ji}(x_i) = \frac{1}{1 + \frac{1-\nu_{ji}}{\nu_{ji}} \left| \frac{x_i - c_{ji}}{\varepsilon_{ji}} \right|^{\lambda_{ji}}}.

Putting this into Eq. (4) gives

u_j = \frac{1}{1 + \left( \sum_{i=1}^{n} w_{ji} \left( \frac{1-\nu_{ji}}{\nu_{ji}} \left| \frac{x_i - c_{ji}}{\varepsilon_{ji}} \right|^{\lambda_{ji}} \right)^{\alpha} \right)^{1/\alpha}}.    (5)

L rules are evaluated using Eq. (5) to get L rule strengths (u_1, u_2, \dots, u_L). Then the output of the RBN unit is a single numerical value calculated using

y = \frac{u_1 o_1 + \cdots + u_L o_L}{u_1 + \cdots + u_L} = \frac{\sum_{j=1}^{L} u_j o_j}{\sum_{j=1}^{L} u_j},    (6)

where o_j is the consequent value in the jth rule.


Using Eqs. (5) and (6), the outputs of all the RBN units in a given layer are calculated. Next, these values are propagated as input values to the next hidden layer. These feed-forward calculations are performed for all the layers. In the case of a regression RBNN, the output value \hat{y} is the predicted value of the desired variable y. In the case of a classification RBNN, the k output values y_{o_1}, \dots, y_{o_k} are passed through a SOFTMAX layer and we get the probability values (\hat{y}_{o_1}, \dots, \hat{y}_{o_k}) of the k classes using the expression

\hat{y}_{o_i} = \frac{e^{y_{o_i}}}{\sum_{i=1}^{k} e^{y_{o_i}}}.    (7)

3.2 Feedback Calculations

Here, we will concentrate only on regression RBNNs. In the case of classification networks, most of the calculations are the same with a few minor changes, which are described at the end of this section. The squared loss function is

J = \frac{1}{2n} \sum_{k=1}^{n} (\hat{y}_k - y_k)^2,    (8)

where \hat{y}_k is the predicted output and y_k is the label in the database. Here n denotes the number of training samples. The gradient of J with respect to the predicted output \hat{y} is calculated via

\frac{\partial J}{\partial \hat{y}} = \frac{1}{n} \sum_{k=1}^{n} (\hat{y}_k - y_k).    (9)

The gradient of J with respect to the consequent parameter o_j in the jth rule can be calculated using the chain rule

\frac{\partial J}{\partial o_j} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial o_j}.    (10)

Using Eq. (6),

\frac{\partial \hat{y}}{\partial o_j} = \frac{u_j}{\sum_{j=1}^{L} u_j},

where j = 1, \dots, L is the rule number in the RBN. From Eq. (10),

\frac{\partial J}{\partial o_j} = \frac{1}{n} \sum_{k=1}^{n} (\hat{y}_k - y_k) \frac{u_j}{\sum_{j=1}^{L} u_j}.    (11)

In a similar way, the gradient of J with respect to the jth rule strength u_j can be calculated using the chain rule and it results in

\frac{\partial J}{\partial u_j} = \frac{o_j \sum_{j=1}^{L} u_j - \sum_{j=1}^{L} u_j o_j}{\left( \sum_{j=1}^{L} u_j \right)^2}.    (12)


For the ith distending function in the jth rule, the gradient of J is given by

\frac{\partial J}{\partial f_{ji}} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial u_j} \frac{\partial u_j}{\partial f_{ji}},

and from Eq. (4)

\frac{\partial u_j}{\partial f_{ji}} = \frac{\gamma_j^{\frac{1}{\alpha_j}-1}\, w_{ji}\, \beta_{ji}^{\alpha_j}}{\left( \gamma_j^{\frac{1}{\alpha_j}} + 1 \right)^2 f_{ji} (1 - f_{ji})},    (13)

where

\beta_{ji} = \frac{1 - f_{ji}}{f_{ji}}, \qquad \gamma_j = \sum_{i=1}^{n} w_{ji} (\beta_{ji})^{\alpha_j}.

In the same way, the gradient of J with respect to the weights of the Dombi operator is given by

\frac{\partial u_j}{\partial w_{ji}} = \frac{\gamma_j^{\frac{1-\alpha_j}{\alpha_j}}\, w_{ji}\, \beta_{ji}^{\alpha_j - 1}}{\left( \gamma_j^{\frac{1}{\alpha_j}} + 1 \right)^2}.    (14)

Lastly, the gradients of J with respect to the parameters of the DF can also be calculated using

\frac{\partial f_{ji}}{\partial \varepsilon_{ji}} = \frac{\lambda}{\varepsilon_{ji}} (1 - f_{ji}) f_{ji},    (15)

\frac{\partial f_{ji}}{\partial x_i} = \frac{-\lambda}{x_i - c_{ji}} (1 - f_{ji}) f_{ji},    (16)

\frac{\partial f_{ji}}{\partial \nu_{ji}} = \frac{1}{\nu_{ji} (1 - \nu_{ji})} (1 - f_{ji}) f_{ji},    (17)

\frac{\partial f_{ji}}{\partial c_{ji}} = \frac{\lambda}{x_i - c_{ji}} (1 - f_{ji}) f_{ji}.    (18)

Once the gradients of the output RBN are available, the gradients for the hidden layers can be derived using the chain rule and back-propagating the \partial J / \partial x term. This term acts as the loss gradient for the hidden layer neurons that are connected directly to the output RBN. This method can be extended for the feedback calculations throughout the network. There are a few minor changes in the case of a classification RBNN. The loss function is the categorical cross-entropy loss J given by

J = - \sum_{k=1}^{\text{no. of classes}} y_{o_k} \log(\hat{y}_{o_k}),    (19)


where y_{o_k} is the label vector obtained from the database and \hat{y}_{o_k} is the class probabilities vector obtained from the softmax layer,

\hat{y}_{o_k} = \frac{e^{y_{o_k}}}{\sum_{k=1}^{\text{no. of classes}} e^{y_{o_k}}}.    (20)

Also, for the classification network,

\frac{\partial J}{\partial y_{o_k}} = \hat{y}_{o_k} - y_{o_k}.    (21)

The rest of the calculations are the same as in a regression network, and Eqs. (11) to (18) can be used to calculate the gradients.

4 Training

The training process is similar to the training of a classical MLP, with some minor differences. A few meta parameters have to be chosen before starting the training process. These meta parameters are: 1) The number of hidden layers; 2) The number of RBN units in each hidden layer; 3) The number of rules in each RBN unit; 4) The learning rate; 5) The values of some fixed parameters of the DF. For low computational complexity, it is recommended to start the training using a small number of hidden layers (preferably one), a small number of RBN units in each layer (preferably two) and a few rules (preferably two) in each RBN unit. If the desired accuracy is not achieved after training, then increase the number of rules; in the next step increase the number of RBNs; if this is still insufficient, then increase the number of hidden layers. As the learning capacity of an RBN is high, a large number of tasks can be learnt with good accuracy using only a single hidden layer with 3 RBN units, each having 2 to 5 rules. There are 4 parameters of the DF, namely \nu, \varepsilon, \lambda and c. Usually \lambda and \varepsilon are fixed (\lambda = 2, \varepsilon < 0.3). The other two parameters (\nu and c) are tunable and they are varied during the optimization process. Stochastic Gradient Descent (SGD), Mini-batch Gradient Descent (MBGD), Full-batch Gradient Descent (FBGD) and Levenberg-Marquardt (LM) optimization methods can be used for the gradient updates. In the case of LM optimization, the parameters are updated using the expression

P_{i+1} = P_i - \left( \left( \frac{\partial J}{\partial P_i} \right)^2 + \mu \right)^{-1} \frac{\partial J}{\partial P_i}.    (22)


Here \mu is the damping factor and its value is adaptive. Initially a large value of \mu is chosen; later its value is gradually decreased as the value of the loss function decreases during the training phase. P_i is the value of a tunable parameter at the ith training iteration. The tunable parameters are randomly initialized after setting proper values for the meta parameters.
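The per-parameter update of Eq. (22) is straightforward to sketch; the gradients are assumed to come from Eqs. (9)-(18), and the damping schedule shown in the comment is only an illustrative choice.

```python
def lm_step(params, grads, mu):
    """One update of Eq. (22) applied element-wise to scalar parameters."""
    return [p - g / (g * g + mu) for p, g in zip(params, grads)]

# illustrative adaptive damping: shrink mu while the loss keeps decreasing, e.g.
# if loss_new < loss_old: mu = max(0.5 * mu, 1e-6)
```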

5 Simulation Results and Discussion

Next, the learning capability and performance of the proposed RBNN are demonstrated using various machine learning tasks. Here we present one regression and two classification tasks.

5.1 Selling Price Prediction of Used Cars

The used cars selling price prediction data is an open-source Kaggle dataset [6]. It consists of 8 features including the selling price. The RBNN is trained to predict the selling price from the remaining features. Figure 4 shows the header of the dataset. A new feature, Car Age, was generated from the difference between the current year and the year of the car. Data augmentation was used to increase the size of the dataset to 7000 records. The dataset is divided into three sets for cross validation, i.e., a training set with 4500 records, a validation set with 1500 records and a test set with the remaining 1000 records. Categorical features were converted to numerical ones using the standard Python pandas library. The heat map of the normalized features is shown in Fig. 5. Various RBNN regression models were trained using different optimization methods. Table 1 summarizes the various configurations and the results obtained. It shows that the RBNN trained using SGD has the lowest MSE. Figure 6 shows the loss function (Eq. 8) plots for RBNNs trained using the SGD, BGD and LM optimization methods.

Fig. 4. Header of used cars price prediction dataset
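A minimal sketch of the preparation steps described above; the file name and column names are assumptions and should be adapted to the actual header shown in Fig. 4.

```python
import pandas as pd
from datetime import date

df = pd.read_csv("used_cars.csv")                         # hypothetical path
df["Car_Age"] = date.today().year - df["year"]            # derived feature (assumed column name)
categorical = df.select_dtypes(include="object").columns  # e.g. fuel or seller type
df = pd.get_dummies(df, columns=list(categorical))        # categorical -> numerical
```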


Fig. 5. Heat map of the car price features

Table 1. Configurations and the performance of various RBNNs on the used cars price prediction task.

| Config. no. | Optimizer        | No. of hidden RBN units | Rules per RBN | DF fixed params  | MSE     | R2 coefficient |
|-------------|------------------|-------------------------|---------------|------------------|---------|----------------|
| 1           | SGD (lr = .01)   | 2                       | 3             | λ = 2, ε = 0.25  | 0.00573 | 0.8935         |
| 2           | SGD (lr = .01)   | 2                       | 3             | λ = 2, ε = 0.3   | 0.00865 | 0.8931         |
| 3           | BGD (lr = .0001) | 3                       | 5             | λ = 2, ε = 0.15  | 0.0209  | 0.7459         |
| 4           | LM (μ = 5000)    | 2                       | 3             | λ = 2, ε = 0.25  | 0.0185  | 0.7841         |

5.2 IRIS Flower Species Classification

The IRIS flower species dataset is available in the UCI ML repository [14]. It has four features, namely sepal length, sepal width, petal length and petal width. Each sample belongs to one of three classes (iris setosa, iris versicolour, iris virginica). It is a small dataset with only 150 records. The dataset is fully balanced, with 50 records for each class. An RBNN is trained to correctly identify the classes from the feature vectors. Table 2 summarizes the performance and meta parameter values for various RBNN configurations.


Fig. 6. Loss curves of used car price prediction RBNN models

Table 2. Configurations and performance of various RBNNs on the IRIS flower classification task.

| Config. no. | Optimizer       | No. of hidden layers | No. of hidden RBN units | Rules per RBN | DF fixed params          | Accuracy on test dataset |
|-------------|-----------------|----------------------|-------------------------|---------------|--------------------------|--------------------------|
| 1           | SGD (lr = .01)  | 1                    | 3                       | 5             | λ = 2, ε = 0.25          | 93.33%                   |
| 2           | SGD (lr = .005) | 1                    | 3                       | 2             | λ = 2, ε = 0.25, ν = 0.3 | 96.67%                   |
| 3           | BGD (lr = .005) | 1                    | 3                       | 3             | λ = 2, ε = 0.25          | 90.0%                    |
| 4           | LM (μ = 1000)   | 2                    | 3 + 3 = 6               | 3             | λ = 2, ε = 0.15          | 90.0%                    |

A fully connected DNN (layers detailed in Fig. 7) was trained for comparison purposes. The performance of the relatively small RBNN using few tunable parameters is better than that of the DNN. Figure 9 shows the loss curves for various RBNN configurations and the trained DNN. Confusion matrices (an error matrix used to describe the performance of a classification network) of the RBNN and DNN are shown in Fig. 8.


Fig. 7. The layers configuration of the DNN for the IRIS flower classification task. The DNN achieved an accuracy of 90.0% on the test dataset.

Fig. 8. Confusion matrices of configuration 2 of a RBNN (Left) and DNN (right) on the IRIS flower species classification task.

Impact of Feature Noise. To find the classification accuracy for noisy features, Gaussian noise of various amplitudes was added to the test data. Table 3 shows the performance reduction of the RBNN and DNN in the presence of feature noise. It is evident that the classification performance of the DNN is severely degraded by noise. In comparison, the RBNN accuracy decreases only slightly in the presence of feature noise. Among the various RBNNs, those trained using SGD are the most robust to noise. Figure 10 shows the confusion matrices of the RBNN and DNN for a noisy feature test set.


Fig. 9. Loss curves of various RBNNs and DNN for the IRIS flower species classification task.

Table 3. Effect of a noisy test set on the prediction accuracy of the DNN and RBNN on the IRIS flower species classification task. σ is the standard deviation of the Gaussian noise.

| S. no. | Noise magnitude | DNN test dataset accuracy | RBNN (config. no. 2) test dataset accuracy |
|--------|-----------------|---------------------------|--------------------------------------------|
| 1      | [−σ/3, σ/3]     | 76.67%                    | 96.67%                                     |
| 2      | [−σ/2, σ/2]     | 73.33%                    | 96.67%                                     |
| 3      | [−σ, σ]         | 66.67%                    | 93.33%                                     |

5.3 Wine Quality Dataset

The wine quality dataset is available on Kaggle [29]. It has 12 features and two output classes, i.e., red and white wine. It is an unbalanced dataset, as it contains 75% white wines and 25% red wines. The feature correlation heatmap is shown in Fig. 11. The database comprises 6498 records. It was divided into three parts for cross validation, i.e., a training set (4000), a validation set (1000) and a test set (1498). Various configurations of RBNNs were trained to correctly classify the wine class (red wine/white wine). Table 4 summarizes the various configurations along with their classification accuracy scores. A fully connected DNN was also trained for this task. Its layers, structure and parameters are


Fig. 10. Confusion matrices of configuration 2 of RBNN (left) and DNN (right) on the IRIS flower species classification task in the presence of feature noise in the interval [−σ, σ]

listed in Fig. 13. The accuracy score achieved by this DNN on the test set is 99% after 200 epochs with a batch size of 500. Figure 12 shows the confusion matrices of this DNN and the RBNN on the test dataset. Figure 14 shows the training and validation loss curves of the DNN and RBNN.

Table 4. Configurations and performance of various RBNNs on the wine quality classification task.

| Config. no. | Optimizer       | No. of hidden layers | No. of hidden RBN units | Rules per RBN | DF fixed parameters      | Total trainable parameters | Accuracy on test dataset |
|-------------|-----------------|----------------------|-------------------------|---------------|--------------------------|----------------------------|--------------------------|
| 1           | SGD (lr = .001) | 1                    | 3                       | 3             | λ = 2, ε = 0.3           | 333                        | 99.9%                    |
| 2           | SGD (lr = .001) | 2                    | 3 + 2 = 5               | 2             | λ = 2, ε = 0.15, ν = 0.3 | 262                        | 99.79%                   |
| 3           | LM (μ = 200)    | 1                    | 3                       | 2             | λ = 2, ε = 0.15          | 198                        | 99.07%                   |

Impact of Feature Noise. Random Gaussian noise with various maximum amplitudes was added to the test dataset. The noise robustness of the trained DNN and RBNNs (Table 5) was investigated by making predictions on the noisy test dataset. Table 5 shows the degradation in classification accuracy for these trained networks. The best robustness is shown when


Fig. 11. Heat map of the wine quality dataset features.

Fig. 12. Confusion matrices of configuration 1 of the RBNN (left) and the DNN (right) on the wine quality dataset.

the RBNN was trained using the SGD. In the case of a noisy test dataset, the DNN prediction accuracy is decreased considerably compared to the RBNN. Figure 15 shows the confusion matrices of the DNN and RBNN obtained using the noise-corrupted test dataset.


Fig. 13. Layers configuration and performance of the DNN trained on the wine quality classification task.

Fig. 14. Loss curves of the RBNN and DNN on the wine quality classification task.

Table 5. Effect of a noisy-features test dataset on the prediction accuracy of the DNN and RBNN on the wine quality classification task. σ is the standard deviation of the Gaussian noise.

| S. no. | Noise magnitude | DNN test dataset accuracy | RBNN (config. no. 1) test dataset accuracy |
|--------|-----------------|---------------------------|--------------------------------------------|
| 1      | [−σ/2, σ/2]     | 98.34%                    | 99.0%                                      |
| 2      | [−σ, σ]         | 93.15%                    | 95.63%                                     |
| 3      | [−2σ, 2σ]       | 81.41%                    | 83.37%                                     |


Fig. 15. Confusion matrices for configuration 1 of the RBNN (left) and DNN (right) on the wine quality classification task for a noise corrupted test dataset

6 Conclusion

In this paper, we presented a new type of learning network called a rule-based neural network (RBNN). It consists of rule-based neurons (RBNs), arranged in layers, and the layers are stacked together to form a fully connected network. Each RBN consists of parametric rules, and the parameters of these rules are updated using the stochastic gradient descent (SGD), batch gradient descent (BGD) or Levenberg-Marquardt (LM) optimization method. We presented the algorithm for training an RBNN for solving regression and classification problems. The number of trainable parameters is small compared with a DNN. We obtained high prediction accuracy scores for the regression and classification tasks. The RBNN is robust to noisy features and provides an excellent accuracy even in the case of a noise-corrupted test dataset. In the future, we would like to incorporate continuous logic and more versatile rules to develop interpretable AI models. Acknowledgment. The study was supported by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program. The research was also funded from the National Research, Development and Innovation Fund of the Ministry of Innovation and Technology of Hungary under the TKP2021-NVA (Project no. TKP2021-NVA-09) funding scheme.

References 1. Abiodun, O.I., Jantan, A., Omolara, A.E., Dada, K.V., Mohamed, N.A.E., Arshad, H.: State-of-the-art in artificial neural network applications: a survey. Heliyon 4(11), e00938 (2018) 2. Alam, M., Samad, M.D., Vidyaratne, L., Glandon, A., Iftekharuddin, K.M.: Survey on deep neural networks in speech and vision systems. Neurocomputing 417, 302– 321 (2020)


3. Altaher, A.: An improved android malware detection scheme based on an evolving hybrid neuro-fuzzy classifier (EHNFC) and permission-based features. Neural Comput. Appl. 28(12), 4147–4157 (2017) 4. Ayeni, B.K., Sahalu, J.B., Adeyanju, K.R.: Detecting cross-site scripting in web applications using fuzzy inference system. J. Comput. Netw. Commun. 2018 (2018) 5. Berenji, H.R., Khedkar, P., et al.: Learning and tuning fuzzy logic controllers through reinforcements. IEEE Trans. Neural Netw. 3(5), 724–740 (1992) 6. Birla, Nehal.: Vehicle dataset from Cardekho. Accessed 21 Dec 2020 7. Czogala, E., Leski, J.: Fuzzy and Neuro-fuzzy Intelligent Systems, vol. 47. Springer, Cham (2012). https://doi.org/10.1007/978-3-7908-1853-6 8. Das, R., Sen, S., Maulik, U.: A survey on fuzzy deep neural networks. ACM Comput. Surv. (CSUR) 53(3), 1–25 (2020) 9. Dodge, S., Karam, L.: Understanding how image quality affects deep neural networks. In: 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6. IEEE (2016) 10. Dombi, J.: The generalized dombi operator family and the multiplicative utility function. In: Balas, V.E., Fodor, J., V´ arkonyi-K´ oczy, A.R. (eds.) Soft Computing Based Modeling in Intelligent Systems. SCI, vol. 196, pp. 115–131. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00448-3 6 11. Dombi, J., Hussain, A.: A new approach to fuzzy control using the distending function. J. Process Control 86, 16–29 (2020) 12. Dombi, J., J´ on´ as, T.: Kappa regression: an alternative to logistic regression. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 28(02), 237–267 (2020) 13. Figueiredo, M., Gomide, F.: Design of fuzzy systems using neurofuzzy networks. IEEE Trans. Neural Netw. 10(4), 815–827 (1999) 14. Fisher, R.A.: Iris data set. Accessed 21 Oct 2020 15. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014) 16. Xiaowei, G.: Multilayer ensemble evolving fuzzy inference system. IEEE Trans. Fuzzy Syst. 29(8), 2425–2431 (2021) 17. Gui, J., Sun, Z., Wen, Y., Tao, D., Ye, J.: A review on generative adversarial networks: algorithms, theory, and applications. arXiv preprint arXiv:2001.06937 (2020) 18. Jang, R.: Neuro-fuzzy modelling: architectures. Analysis and applications. Ph.D. thesis, University of California, Berkley (1992) 19. Juang, C.-F., Lin, C.-T.: An online self-constructing neural fuzzy inference network and its applications. IEEE Trans. Fuzzy Syst. 6(1), 12–32 (1998) 20. Kietzmann, J., Lee, L.W., McCarthy, I.P., Kietzmann, T.C.: DeepFakes: trick or treat? Bus. Horiz. 63(2), 135–146 (2020) 21. Kisi, O., Azad, A., Kashi, H., Saeedian, A., Hashemi, S.A.A., Ghorbani, S.: Modeling groundwater quality parameters using hybrid neuro-fuzzy methods. Water Resour. Manag. 33(2), 847–861 (2019) 22. Li, M., Feng, L., Zhang, H., Chen, J.: Predicting future locations of moving objects with deep fuzzy-LSTM networks. Transportmetrica A: Transp. Sci. 16(1), 119–136 (2020) 23. Lin, C.-T., Lee, C.S.G.: et al.: Neural-network-based fuzzy logic control and decision system. IEEE Trans. comput. 40(12), 1320–1336 (1991) 24. Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., Alsaadi, F.E.: A survey of deep neural network architectures and their applications. Neurocomputing 234, 11–26 (2017)


25. Mane, D.T., Kulkarni, U.V.: A survey on supervised convolutional neural network and its major applications. In: Deep Learning and Neural Networks: Concepts, Methodologies, Tools, and Applications, pp. 1058–1071. IGI Global (2020) 26. Nauck, D., Kruse, R.: Neuro-fuzzy systems for function approximation. Fuzzy Sets Syst. 101(2), 261–271 (1999) 27. Nettleton, D.F., Orriols-Puig, A., Fornells, A.: A study of the effect of different types of noise on the precision of supervised learning techniques. Artif. Intell. Rev. 33(4), 275–306 (2010) 28. Ilker Ali Ozkan: A novel basketball result prediction model using a concurrent neuro-fuzzy system. Appl. Artif. Intell. 34(13), 1038–1054 (2020) 29. Parmar, R.: Wine quality data set. Accessed 02 Dec 2020 30. Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 23(5), 828–841 (2019) 31. Teklehaimanot, Y.K., Negash, D.S., Workiye, E.A.: Design of hybrid neuro-fuzzy controller for magnetic levitation train systems. In: Mekuria, F., Nigussie, E., Tegegne, T. (eds.) ICT4DA 2019. CCIS, vol. 1026, pp. 119–133. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26630-1 10 32. Toseef, M., Khan, M.J.: An intelligent mobile application for diagnosis of crop diseases in Pakistan using fuzzy inference system. Comput. Electron. Agric. 153, 1–11 (2018) ˇ ak, L., Vintr, Z.: Application of fuzzy inference system for analysis of 33. Valiˇs, D., Z´ oil field data to optimize combustion engine maintenance. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 233(14), 3736–3745 (2019) 34. Zheng, Y.-J., Chen, S.-Y., Xue, Y., Xue, J.-Y.: A pythagorean-type fuzzy deep denoising autoencoder for industrial accident early warning. IEEE Trans. Fuzzy Syst. 25(6), 1561–1575 (2017)

A Hierarchical Approach for Spanish Sign Language Recognition: From Weak Classification to Robust Recognition System

Itsaso Rodríguez-Moreno, José María Martínez-Otzeta, and Basilio Sierra

Department of Computer Science and Artificial Intelligence, University of the Basque Country (UPV/EHU), Donostia-San Sebastián, Spain
[email protected]
http://www.sc.ehu.es/ccwrobot/

Abstract. Approximately 5% of the world's population has hearing impairments, and this number is expected to grow in the coming years due to demographic ageing and the amount of noise we are exposed to. A significant fraction of this population has had to endure severe impairments since childhood, and sign languages are an effective means of overcoming this barrier. Although sign languages are quite widespread among the deaf community, there are still situations in which the interaction with hearing people is difficult. This paper presents the sign language recognition module from an ongoing effort to develop a real-time Spanish sign language recognition system that could also work as a tutor. The proposed approach focuses on the definitions of the signs, first performing the classification of their constituents and then recognizing full signs. Although the performance of the classification of the constituents can be quite weak, good user-independent sign recognition results are obtained.

Keywords: Sign language recognition · Spanish sign language · Hidden Markov Model

1 Introduction

Currently, about 1,500 million people live with some degree of hearing loss. Around 430 million people have a disabling hearing loss, which is equivalent to approximately 5% of the world's population. Of those affected, 32 million are children, according to the World Health Organization (WHO) (https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss). Hearing loss can be due to different causes, such as complications in childbirth, certain infectious diseases, exposure to loud sounds or ageing. Due to the continuous exposure of young people to loud noises, it is estimated that by 2050 there will be almost 2,500 million people with some degree of hearing loss and that at least 700 million will require rehabilitation. This suggests that the population of sign language users will grow. Specifically, in Spain there are more than a million people with hearing impairments and around 70,000 of them use sign languages to communicate.

Sign language is not universal and there are more than 300 different sign languages around the world. Since 2007, two sign languages have been officially recognized in Spain: Spanish and Catalan. Not all people who communicate in sign language have hearing impairments, and not all people with hearing impairments communicate using a sign language. Usually, sign languages are used by people with hearing impairments, their family members, professionals and people who have difficulties in communicating through oral languages. However, sign languages are not yet widespread among hearing people, leading to situations where people with hearing impairments may find it difficult to communicate without an interpreter.

Therefore, in this paper an approach for the automatic recognition of some signs of the Spanish Sign Language (SSL) is presented. The recognition of this language is not trivial since, like oral languages, sign languages have their own structure, grammar and vocabulary; they are visual and manual languages with their own grammar that fulfill the same functions as any other language. The presented system takes the definitions of the selected signs, extracted from [5], as the basis for the recognition task. Furthermore, we suggest a hierarchical approach where signs are recognized based on a previously trained model which classifies their constituents. We show that even with weak constituent models such a hierarchical approach can achieve good performance.

The rest of the paper is organized as follows. First, in Sect. 2 some related works are described in order to introduce the topic. In Sect. 3 the proposed approach is presented, explaining the process that has been carried out. Then, in Sect. 4 the obtained results are shown, which are discussed in Sect. 5. Finally, in Sect. 6 the conclusions extracted from this work are presented and future work is pointed out.

2 Related Work

In recent years, sign language recognition (SLR) has drawn increasing attention from the research community [4,13,15]. Due to the complex grammar and semantics of sign languages, their recognition and posterior translation to text is not trivial. This is mainly due to the temporal dimension of the signs, which adds difficulty on top of the extraction of relevant features at a given moment. These features may come from images recorded by cameras or from other kinds of data obtained from some wearable device. Many studies have made use of data gloves to extract the features to perform SLR. These sensors can be invasive for the signer, so there is great interest in studies that use vision-based systems to collect the data. Most SLR systems presented so far are user dependent and focus on recognizing isolated signs. An SLR system must combine pattern matching, computer vision, natural language processing and linguistics in order to recognize the signs that are being performed and give a correct translation. These systems would be very useful in services such as hotels, stations or banks to facilitate interaction with people with hearing impairments.

Different solutions have been proposed to address this task. The authors of [11] propose a deep learning-based approach for hand SLR. First, they extract 3D hand keypoints from frames of 2D input videos using a Convolutional Neural Network (CNN) architecture and connect them to get the hand skeleton. The 3D hand skeleton is projected to three view surface images and the heatmap image of the detected keypoints is extracted. In order to obtain spatio-temporal features, a 3DCNN is applied where the pixel-level information, multi-view hand skeleton and heatmap features are used as input. The obtained features are finally fed into a Long Short-Term Memory (LSTM). Kratimenos et al. [6] extract 3D body shape, face and hands information from a single image using the SMPL-X [10] model. The classification is done with a Recurrent Neural Network (RNN) consisting of one Bi-LSTM layer of 256 units and a Dense layer. They also extract skeleton information with OpenPose [2] and use these features to train the RNN in order to make a comparison, demonstrating the superiority of their approach. The use of SMPL-X holistic 3D reconstruction also obtains higher accuracy than a state-of-the-art I3D network [3] fed by raw RGB images and their optical flow.

The authors of [1] present an approach to perform Continuous Sign Language Recognition (CSLR). They introduce a Sign Language Recognition Transformer, an encoder transformer model to predict sign gloss sequences, which uses spatial embeddings of sign videos to learn spatio-temporal representations. Then, an autoregressive transformer decoder model, called Sign Language Translation Transformer (SLTT), is trained to predict words and generate the corresponding spoken language sentence. In order to perform both recognition and translation, a Connectionist Temporal Classification (CTC) loss is used. On the other hand, Ma et al. [9] propose a system for SLR using WiFi, called SignFi. In their approach, wireless signal features of sign gestures are captured by collecting Channel State Information (CSI) measurements from WiFi packets. After pre-processing the raw CSI measurements to remove noise, these are used to train a 9-layer CNN to perform the classification. Specifically, the amplitude and phase of the pre-processed CSI signals are used to feed the network.

Regarding SSL recognition, the authors of [14] use the skeleton-based MS-G3D [7] architecture with the idea of retaining a more reliable semantic connection between hands and body parts, as this is one important characteristic of sign languages. The MS-G3D architecture consists of stacking blocks of spatial-temporal graph convolutional networks (ST-GCN) composed of a unified spatial-temporal graph convolution module called G3D, used to unify spatial and temporal features. The ST-GCNs are followed by an average layer and a softmax classifier. They also use transfer learning over an SSL dataset. In the work presented in [12] a Spanish alphabet training system is presented. A data glove, which includes an accelerometer connected to each finger, is used to acquire data. They use the LabVIEW development environment (https://www.ni.com/es-es/shop/labview.html) to create an interface for data acquisition. A J48 decision tree, sequential minimal optimization (SMO) and a multilayer perceptron (MLP) are used for classification. After learning the signs, the system is able to confirm if the user is performing them correctly.

The recognition of sign languages, due to the aforementioned difficulties, is a complicated task in which there is still much room for improvement.

3 Proposed Approach

The signs which compose the Spanish Sign Language (SSL), apart from the body position and facial expression, are defined by four main elements involving the hands:

– Hand position: the position where the hand (or hands) is located. If there are contact points (part(s) of the active hand in contact with a body part), this is also indicated. The initial position and final position might be different.
– Hand configuration: the shape of the hand. The initial configuration and final configuration might be different.
– Hand movement: the trajectory and/or movement performed by the hand.
– Hand orientation: the orientation of the palm of the hand with respect to the body of the signer. During the execution of the sign the orientation might change.

From these four elements, we propose the use of hand configurations to perform the recognition of different signs. A hierarchical approach for Spanish Sign Language recognition is presented, based on the decomposition of signs into hand configurations in order to perform the classification. In this section, the proposed approach and the followed pipeline are explained step by step.

3.1 Data Collection

As a first approach, the five different signs of the SSL presented in Table 1 have been selected: well (bien), happy (contento), woman (mujer), man (hombre) and listener (oyente). These sign definitions have been obtained from the SSL database presented in [5], where each sign is defined with the elements mentioned above, including the configurations and the numbers associated to them. The selected signs are one-handed and all the recorded people are right-handed. The recognition of the signs is based on the classification of the configurations that compose these signs. For that purpose, two different datasets have been built from video sequences recorded with a webcam:

– Signs dataset: it is composed of videos corresponding to the five signs, performed by five people, with a total of 875 videos. Each video has 25 frames.
– Configurations dataset: it is composed of images of the eight configurations that are necessary to perform the selected signs. Each image refers to one configuration. Six people have been captured and the database contains a total of 9463 images.


Table 1. Definition of the selected signs and number of instances used to create the databases.

SIGN | NUMBER OF VIDEOS | INITIAL HAND CONFIGURATION | NUMBER OF IMAGES | FINAL HAND CONFIGURATION | NUMBER OF IMAGES
Well (Bien) | 175 | (image) | 961 | (image) | 1019
Happy (Contento) | 176 | (image) | 875 | (image) | 900
Man (Hombre) | 174 | (image) | 915 | (image) | 915
Woman (Mujer) | 175 | (image) | 938 | (image) | 958
Listener (Oyente) | 175 | (image) | 991 | (image) | 991
TOTAL VIDEOS FOR SIGNS DATASET: 875 | TOTAL IMAGES FOR CONFIGURATIONS DATASET: 9463

The exact numbers of instances for each sign and configuration are shown in Table 1.

In order to obtain the relevant information from the hand, since this is the body part on which we focus for this first approximation, we have used the MediaPipe Hands tracking solution [16] from MediaPipe [8]. MediaPipe Hands, as its name indicates, performs the tracking of the hand position; more precisely, it returns twenty-one hand landmarks for each hand. Each keypoint is represented by three coordinates (x, y, z), obtaining 21 × 3 = 63 values for each hand. The videos are processed frame by frame, obtaining the landmarks for every frame composing the video. An example of the solution obtained by the MediaPipe algorithm is shown in Fig. 1. As the selected signs are one-handed, and all the signers who recorded the database are right-handed, only the right-hand information is saved. This way the created dataset shapes are (num_videos, 25, 21, 3) for signs and (num_images, 21, 3) for configurations, preserving the anonymity of all participants, as the original images are not recorded.
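A minimal sketch of this landmark-extraction step is shown below. It is not the authors' original code: the video path handling, the 25-frame length check and the single-hand assumption are illustrative choices, and only the (x, y, z) coordinates of the 21 landmarks are kept, as described above.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

def video_to_landmarks(video_path, num_frames=25):
    """Return an array of shape (num_frames, 21, 3) with hand landmarks,
    or None if the hand is not detected in some frame."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB images
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if not results.multi_hand_landmarks:
                return None
            lm = results.multi_hand_landmarks[0].landmark
            frames.append([[p.x, p.y, p.z] for p in lm])  # 21 x 3 values
    cap.release()
    if len(frames) != num_frames:
        return None
    return np.asarray(frames)  # (25, 21, 3); no original images are stored

# signs dataset: stacking these arrays gives shape (num_videos, 25, 21, 3)
# videos = [video_to_landmarks(p) for p in sign_video_paths]  # hypothetical paths
```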

3.2 Followed Pipeline

The recognition is performed by decomposing the signs into constituents, and using the learned models of these constituents (the hand shape in this case) and the movement from one constituent to another to be able to classify different signs. In Fig. 2, the followed pipeline is shown graphically.


Fig. 1. Example of the hand landmarks extracted using MediaPipe.

Briefly, the method can be divided into two parts. The former is focused on the recognition of the configuration in static images, while the latter predicts the signs performed in a video or live feed, using the previously trained configuration model as a basis to train Hidden Markov Models able to recognize the signs that are being performed in real time. As shown in Fig. 2, after applying MediaPipe, the feature selection is performed and the configuration models are trained. For videos, after selecting the features, the data are transformed using the configuration model. This way, each frame is converted to a prediction probability vector. These predictions are then used to train the HMMs, which are finally used to recognize the sign that is being performed. The details of the full process are explained next.

Training Configurations. The first part focuses on the recognition of the hand configurations, performing static image classification. On the one hand, the positions of the hand landmarks provided by MediaPipe are used as features, which include the (x, y, z) values for each keypoint shown in Fig. 3a. In addition to these landmarks, it has been decided to use some additional information as features, specifically the distances between some of the keypoints. The added distances are shown in Fig. 3b and Fig. 3c, where the distances between the thumb tip and the rest of the fingertips and the distances between contiguous fingertips are computed, respectively. The distances are independent of the spatial location of the hand and are, therefore, expected to be useful when performing configuration classification.

Thus, the shape of the feature vectors differs according to the features selected. The features are handled in the three groups shown in Fig. 3 (3a, 3b, 3c) and the models for all possible combinations have been trained. In Table 2 the shape of the feature vector f of instance i is indicated for each combination of features, taking into account that the feature vectors are flattened.
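The sketch below illustrates, under our reading of this description, how such feature vectors could be assembled from the 21 × 3 landmark array. The landmark indices (4, 8, 12, 16, 20 for the fingertips) follow the standard MediaPipe hand numbering; the function names and flags are illustrative, not the authors' code.

```python
import numpy as np

THUMB_TIP = 4
FINGERTIPS = [8, 12, 16, 20]                     # index, middle, ring, pinky tips
CONTIGUOUS = [(4, 8), (8, 12), (12, 16), (16, 20)]

def thumb_distances(lm):
    """4 distances between the thumb tip and the other fingertips (Fig. 3b)."""
    return np.array([np.linalg.norm(lm[THUMB_TIP] - lm[t]) for t in FINGERTIPS])

def tip_distances(lm):
    """Distances between contiguous fingertips (Fig. 3c)."""
    return np.array([np.linalg.norm(lm[a] - lm[b]) for a, b in CONTIGUOUS])

def feature_vector(lm, use_landmarks=True, use_thumb=True, use_tips=True):
    """Flattened feature vector for one frame; lm has shape (21, 3)."""
    parts = []
    if use_landmarks:
        parts.append(lm.reshape(-1))              # 63 values
    if use_thumb:
        parts.append(thumb_distances(lm))         # 4 values
    if use_tips:
        d = tip_distances(lm)
        # the thumb-index distance appears in both groups; drop it once (Table 2)
        parts.append(d[1:] if use_thumb else d)   # 3 (or 4) values
    return np.concatenate(parts)
```

With these choices the resulting vector lengths match Table 2: 63, 67, 70, 7 or 4 values depending on the selected feature groups.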


Fig. 2. Pipeline.

Fig. 3. Features extracted from hand joints used to train different models: (a) finger landmarks, (b) thumb distances, (c) tip distances.


Table 2. Shape of the feature vector f of instance i according to the selected features (*). As the THUMB–INDEX distance is calculated in both distance types, when both are selected one is removed to avoid repeated features.

Selected features | fi shape
Hand landmarks | (21 ∗ 3) = (63)
Hand landmarks + one distance | (21 ∗ 3 + 4) = (67)
Hand landmarks + both distances* | (21 ∗ 3 + 4 + 3) = (70)
Both distances* | (4 + 3) = (7)
One distance | (4)

Once each frame is converted to a feature vector fi, different classifiers have been trained: Random Forest and SVM with both polynomial and Radial Basis Function (RBF) kernels. The Random Forest maximum depth is set to ten, the degree of the polynomial kernel to three and the decision function shape of the SVM to "one-vs-one".

Signs. As has been mentioned, we propose a hierarchical approach where the models trained to classify single frames containing constituents of the signs (configurations) are used to train the model which performs the classification of different SSL signs. In order to be able to use the configuration models, the first step is to transform the signs dataset to the input format of the models. From each frame of the video the same features used for the configuration model have to be selected. For instance, if the configuration model has been trained just with the finger landmarks, the feature vector of each frame has to be formed by the values of the finger landmarks extracted for that frame, with shape (21 ∗ 3) = 63. So far, the preprocessing of both databases is the same.

However, when training the models for sign recognition another step is needed. After representing each image in the appropriate feature space, the configuration model that we have already trained is used to get the prediction for each of the images of each video. This way, each frame is transformed into a vector of predicted probabilities, indicating the probability of the image corresponding to each of the eight possible configurations, as predicted by the configuration model. As we impose that all our training videos are 25 frames long, each instance V is converted to a 25 × 8 matrix as the one shown in Eq. 1, where P_{i,j} refers to the probability that frame i corresponds to configuration j, which is one of the eight configurations presented in Table 1 (columns 3 and 5).

V = \begin{pmatrix} P_{1,1} & P_{1,2} & \cdots & P_{1,8} \\ P_{2,1} & P_{2,2} & \cdots & P_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ P_{25,1} & P_{25,2} & \cdots & P_{25,8} \end{pmatrix}    (1)
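A compact sketch of this stage with scikit-learn is given below, assuming feature matrices built as in the previous snippets. The hyper-parameters (max_depth=10, degree=3, "one-vs-one") are those stated in the text, while the variable names and the data-loading step are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# X_conf: (n_images, n_features) configuration frames; y_conf: configuration labels
classifiers = {
    "RF": RandomForestClassifier(max_depth=10),
    "SVM-poly": SVC(kernel="poly", degree=3, decision_function_shape="ovo", probability=True),
    "SVM-rbf": SVC(kernel="rbf", decision_function_shape="ovo", probability=True),
}

def probability_matrix(video_features, conf_model):
    """Turn a (25, n_features) video into the 25 x 8 matrix V of Eq. 1."""
    return conf_model.predict_proba(video_features)

# conf_model = classifiers["SVM-rbf"].fit(X_conf, y_conf)
# V = probability_matrix(video_feats, conf_model)   # shape (25, 8)
```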


These data, formed by the predicted probabilities of the configuration model, are used as input to train a group of Hidden Markov Models. A total of five HMMs are trained, one per sign. When training the HMM for each class, just the instances corresponding to that sign are used. According to the definition of the five chosen signs, a maximum of two configurations are used in a sign, moving from the initial configuration to the final configuration. Therefore, the HMMs are defined with two states (maximum of two configurations per sign). For example, in the case of the well (bien) sign, these two states would correspond to configurations 58 and 59. However, for this to happen, the models of the configurations should achieve very high accuracy. Even if this is not the case, the aim is to be able to clearly differentiate two clusters corresponding to the change from one configuration to another. The defined prior distribution of the transition matrix is shown in Eq. 2 and the prior distribution of the initial population probability in Eq. 3, considering that there is no coming back from the second state and that the signs start from the first state.

transmat_prior = \begin{pmatrix} 0.5 & 0.5 \\ 0 & 1 \end{pmatrix}    (2)

startprob_prior = \begin{pmatrix} 1 & 0 \end{pmatrix}    (3)

In Fig. 4 two HMM graph examples are shown. These graphs have two states, as indicated in the definition, and the probabilities of moving from one state to the other. The graph on the left (Fig. 4a) represents the HMM before training, with the indicated number of states and the probabilities assigned in the prior transition matrix. On the other hand, the graph on the right (Fig. 4b) corresponds to a trained HMM. As can be seen, the probabilities of moving between states have been adjusted to the data used for training. This example belongs to the class woman (mujer), so the states would correspond to configurations 73 and 74 respectively, if the models used to predict configurations were perfect.

Fig. 4. HMM graph examples: (a) HMM graph with the prior transition matrix; (b) an example of a trained HMM graph.
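A possible realization of this step with the hmmlearn library is sketched below; the priors follow Eqs. 2–3, while the choice of a Gaussian emission model, the number of EM iterations and the variable names are assumptions made for illustration.

```python
import numpy as np
from hmmlearn import hmm

SIGNS = ["well", "happy", "man", "woman", "listener"]

def train_sign_hmms(train_videos, train_labels):
    """Train one 2-state HMM per sign on the 25 x 8 probability matrices."""
    models = {}
    for sign in SIGNS:
        seqs = [v for v, y in zip(train_videos, train_labels) if y == sign]
        X = np.concatenate(seqs)                  # stacked observation vectors
        lengths = [len(v) for v in seqs]          # 25 frames per video
        model = hmm.GaussianHMM(
            n_components=2,                                      # two configurations
            transmat_prior=np.array([[0.5, 0.5], [0.0, 1.0]]),   # Eq. 2
            startprob_prior=np.array([1.0, 0.0]),                # Eq. 3
            n_iter=50,
        )
        model.fit(X, lengths)
        models[sign] = model
    return models

def classify_video(V, models):
    """Pick the sign whose HMM gives the highest log-likelihood to matrix V."""
    return max(models, key=lambda s: models[s].score(V))
```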

To classify a new video, all the frames have to be transformed into the vectors of selected features and classified by the configuration model. Once the probability matrix (Eq. 1) is obtained, it is used as input for all the trained HMMs. The HMM with the highest score determines the class of the sign performed in the video.

Real-Time Prediction. After training both the configuration model and the HMMs, real-time classification can be performed. This is done in two steps: first a set of tentative signs is predicted with temporal sliding windows and then a final sign is predicted from the mode of all the tentative signs. The HMMs have been trained with videos of 25 frames length, therefore a sliding window of length 25 and step 1 is established to recognize a tentative sign. The feature vector of each frame is obtained as explained before: first the selected hand pose information is extracted and then these features are converted to a probability vector after applying the configuration classification model. When a new tentative sign is classified (25 frames compose a video), the sign corresponding to the HMM which gets the highest score for those 25 frames is predicted. Furthermore, another sliding window of length 10 and step 1 is defined to give the final prediction, which is the mode of the last 10 predicted tentative signs. In Fig. 5 a graphical representation of the real-time classification process is presented, where the mentioned sliding windows are shown.
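The sliding-window logic described above can be written, for instance, as follows; the window lengths 25 and 10 come from the text, while the frame source and helper names are placeholders.

```python
import numpy as np
from collections import Counter, deque

def realtime_predictions(frame_stream, frame_to_proba, models):
    """Yield a final sign prediction for every new frame once enough data is seen."""
    window = deque(maxlen=25)      # probability vectors of the last 25 frames
    tentative = deque(maxlen=10)   # last 10 tentative sign predictions
    for frame in frame_stream:
        window.append(frame_to_proba(frame))   # 8-dim configuration probabilities
        if len(window) == 25:
            V = np.asarray(window)
            tentative.append(max(models, key=lambda s: models[s].score(V)))
        if len(tentative) == 10:
            # final prediction: mode of the last 10 tentative signs
            yield Counter(tentative).most_common(1)[0][0]
```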

4 Experimental Results

This section presents the results obtained for the trained models to get an estimation of their performance. The validation has been carried out through a Leave-One-Person-Out cross validation, using 5-folds since five different people have participated in the creation of both databases. The notation of each of the trained models represents which features have been used to train it. In Table 3, for each name, the features used to train that model are marked.
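In scikit-learn terms, this person-wise validation corresponds to grouping instances by signer, e.g. as in the sketch below; the arrays are dummy stand-ins, not data from the paper.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# Dummy stand-ins: X holds per-frame feature vectors, y the configuration labels
# and person_ids the signer of each instance (five people -> five folds).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 63))
y = rng.integers(0, 8, size=100)
person_ids = rng.integers(0, 5, size=100)

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=person_ids):
    model = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
print(np.mean(scores))  # mean accuracy over the person-wise folds
```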

Fig. 5. Real-time prediction explanation. Each Si refers to a predicted sign. The 25 frames used to predict each tentative sign are colored in blue, while the predictions used each time to give the final prediction by calculating the mode are colored in green.


Table 3. Used features according to the name of each model.

Model name | Finger landmarks (WRIST, MCP, PIP, DIP, TIP) | Thumb distances | Tip distances
DIP MCP PIP TIP WRIST | X | |
DIP DisTHUMB DisTIP MCP PIP TIP WRIST | X | X | X
DIP DisTIP MCP PIP TIP WRIST | X | | X
DIP DisTHUMB MCP PIP TIP WRIST | X | X |
DisTHUMB DisTIP | | X | X
DisTIP | | | X
DisTHUMB | | X |

Following the notation presented in Table 3, Table 4 displays the accuracy values obtained for each of the trained configuration models. Twenty-one different models have been trained, using seven groups of feature vectors and three types of classifiers. On the one hand, the best result is 0.6940, obtained by the SVM with RBF kernel using the finger landmarks and both distance types as features. On the other hand, the worst accuracy value is 0.5231, achieved by the SVM classifier with polynomial kernel and with just the tip distances as features.

The range of the obtained accuracy values is not too large but, even so, certain differences are observed between the results obtained with different features and classifiers. Regarding the feature vectors, the best results are obtained using all features together. However, when only the hand landmarks are used (or in combination with any of the distances), the difference is not very notable either. The worst results are obtained using just the distances, and especially when only one of the distances is selected. Although in these cases the results are worse, it should be taken into account that the latter uses only 4 values for each instance, thus reducing the resources needed to train the models. Concerning the classifiers, in general the SVM with RBF kernel obtains higher accuracy values.

Table 4. Configuration models accuracies.

Features | RF | SVM-poly | SVM-rbf
DIP MCP PIP TIP WRIST | 0.5992 | 0.6734 | 0.6805
DIP DisTHUMB DisTIP MCP PIP TIP WRIST | 0.6653 | 0.6727 | 0.6940
DIP DisTIP MCP PIP TIP WRIST | 0.6520 | 0.6647 | 0.6889
DIP DisTHUMB MCP PIP TIP WRIST | 0.6585 | 0.6797 | 0.6929
DisTHUMB DisTIP | 0.6215 | 0.5918 | 0.6292
DisTIP | 0.5234 | 0.5231 | 0.5293
DisTHUMB | 0.6018 | 0.5454 | 0.5969


In Table 5 the accuracy values obtained with the trained HMM models for sign classification are shown. As mentioned before, five HMMs are trained for each configuration model (one per sign). Each instance is evaluated with every HMM and the predicted output is the label of the HMM which gets the best score for the input instance. The accuracy values are calculated applying a Leave-One-Person-Out cross validation to the presented video dataset. When training the sign recognition model a previously trained configuration model is used. In order to perform the validation correctly, for each test person a configuration model trained without the instances belonging to that person is used. This way the complete evaluation is carried out on a person unknown to the model.

The best accuracy value for the trained HMMs, 0.9843, is obtained using the configuration model trained with the SVM classifier with RBF kernel and hand landmark features. The HMMs trained using the data predicted by the model trained with the SVM classifier with polynomial kernel and thumb distance features obtained the lowest accuracy value, 0.8355. Every HMM achieves better accuracy values than the underlying configuration model. There is a correspondence with the previously obtained results, as the worst values are also obtained when using the models that performed worst when classifying configurations.

Table 5. Sign models accuracies.

Features | RF | SVM-poly | SVM-rbf
DIP MCP PIP TIP WRIST | 0.9015 | 0.9398 | 0.9843
DIP DisTHUMB DisTIP MCP PIP TIP WRIST | 0.9630 | 0.9198 | 0.9729
DIP DisTIP MCP PIP TIP WRIST | 0.9815 | 0.9484 | 0.9786
DIP DisTHUMB MCP PIP TIP WRIST | 0.9244 | 0.9127 | 0.9757
DisTHUMB DisTIP | 0.9399 | 0.8711 | 0.8483
DisTIP | 0.8984 | 0.8625 | 0.9056
DisTHUMB | 0.8685 | 0.8355 | 0.8469

There is a significant difference between the accuracy values obtained when classifying configurations and signs. This is discussed in Sect. 5.

5 Discussion

In this section the obtained results are discussed in order to shed light on them. Several difficulties are presented and analyzed in order to explain the performance of the trained models and to be able to improve them in the future.

Configurations. As shown in Table 4, the results obtained with the models trained for the classification of configurations are not as good as might be expected. Although different factors are involved in these results, first of all the input data has to be analyzed, namely the accuracy of the hand information obtained with MediaPipe. Although MediaPipe is a great technology for pose estimation, it still has some weaknesses and this could lead to incorrect data collection. In Fig. 6, two examples of hand landmarks obtained with MediaPipe for configuration 77 are shown. In Fig. 6a, a correct estimation is shown, where the middle finger is flexed towards the thumb. However, in Fig. 6b, the output of MediaPipe indicates that the index finger is flexed, leading to incorrect data.

Fig. 6. Examples of hand landmarks obtained with MediaPipe: (a) correct hand landmarks; (b) wrong hand landmarks.

Some of the configurations which form the selected signs are quite similar. For example, at first sight, 73–78 and 74–77 configurations might be misidentified. Furthermore, in the example showed in Fig. 6 of incorrect data obtained by MediaPipe, the hand landmarks of Fig. 6b which belong to class 77 could easily be considered to be an instance of class 74. In order to analyze the most misidentified classes, some confusion matrices are shown in Fig. 7. The two matrices shown correspond to two different subjects. On the left (7a), it can be seen that for this person the most misidentified classes are those already mentioned 73-74-77-78 and a square is clearly perceived where these labels are found (bottom right). On the other hand, for the person on the right (7b), although the 73–74 and 77–78 labels are also prone to confusion, mainly 50-58-59 classes are erroneously predicted. Taking into account the similarity of these classes and the possible erroneous data from MediaPipe, the accuracy values obtained for the configuration models are quite coherent.


Fig. 7. Examples of obtained confusion matrices: (a) the 73-74-77-78 configurations are mostly confused; (b) the 50-58-59 configurations are mostly confused.

It has been decided to also perform a 10-fold cross validation over each of the people who participated in the creation of the datasets. That is, for each person only the instances corresponding to that person are used when training the model. This is done in order to analyze the degree of repeatability of each person, i.e., to see whether the configurations are classified well when it is the same person who is performing them all the time.

Table 6. Configuration models accuracies: 10-fold CV over each person.

Features | RF | SVM-poly | SVM-rbf
DIP MCP PIP TIP WRIST | 0.8334 | 0.8326 | 0.8556
DIP DisTHUMB DisTIP MCP PIP TIP WRIST | 0.8875 | 0.8564 | 0.8939
DIP DisTIP MCP PIP TIP WRIST | 0.8662 | 0.8414 | 0.8708
DIP DisTHUMB MCP PIP TIP WRIST | 0.8711 | 0.8517 | 0.8775
DisTHUMB DisTIP | 0.8212 | 0.8109 | 0.8109
DisTIP | 0.7573 | 0.7294 | 0.7511
DisTHUMB | 0.7605 | 0.7599 | 0.7475

In Table 6, the mean accuracy values of the five different trained models are shown. As expected, the results obtained are better than in the general case, since the classification is simpler when it is done over a single person. The results in Table 4 show that the ability of generalization of our models is limited. Still, even when single-person models are trained and evaluated, there are instances that are incorrectly predicted. To verify if the misclassified configurations coincide with the conclusions drawn previously, one of the obtained confusion matrices is shown in Fig. 8.


Fig. 8. Example of confusion matrix obtained using just one person of the dataset.

In this case, it is clearly perceived that the aforementioned classes 77–78 are the most misclassified configurations. So even though the training data is favorable, there are clusters of configurations that are confusing, which may be due to data collection failures.

Signs. On the other hand, regarding sign recognition, the classification is much better. High accuracy values are obtained for most of the trained HMMs, as presented in Table 5. This may seem odd, since in the proposed hierarchical approach the sign recognition models use the models trained to recognize configurations as a basis, and the results of the latter are much lower. Specifically, the predictions of configuration probabilities are used to train the HMMs. Although these predictions do not yield a high accuracy when selecting the configuration with the highest probability, they are useful for training the HMMs and recognizing the sign that is being performed. Thus, the probability distribution of different people for the same sign is more similar than the probability distribution of the same person for different signs; otherwise, the recognition would be much less effective. Therefore, it is concluded that a weak classification model can lead to a powerful classification model in the domain of sign language recognition, when employing a hand configuration classifier as the basis for a sign classifier in a hierarchical sign recognition model.

6 Conclusion and Future Work

This paper presents a hierarchical approach for the recognition of some signs of the Spanish Sign Language. The selected signs are decomposed into constituents, in this case the shape of the hand (also called configuration), and the recognition of the signs is based on the classification of these constituents. To this end, different models have been trained to classify the configurations, using different features extracted by MediaPipe and several classifiers. Finally, Hidden Markov Models have been used to recognize in real time a sign performed in a video or live feed. These HMMs have been trained using the predictions of the configuration models as input. The results show that a robust recognition system can be achieved from weaker classification models.

As future work we intend to analyze the source and patterns of the weaknesses of our system; in particular, the estimation errors of the hand landmarks should be reduced. In addition, the use of more features has to be considered in order to make it possible for our system to tell the difference between the most commonly misclassified configurations.

Acknowledgment. This work has been partially funded by the Basque Government, Spain, grant number IT900-16, and the Spanish Ministry of Science (MCIU), the State Research Agency (AEI), the European Regional Development Fund (FEDER), grant number RTI2018-093337-B-I00 (MCIU/AEI/FEDER, UE) and the Spanish Ministry of Science, Innovation and Universities (FPU18/04737 predoctoral grant). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

References

1. Camgoz, N.C., Koller, O., Hadfield, S., Bowden, R.: Sign language transformers: joint end-to-end sign language recognition and translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10023–10033 (2020)
2. Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2019)
3. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)
4. Elakkiya, R.: Machine learning based sign language recognition: a review and its research frontier. J. Ambient. Intell. Humaniz. Comput. 12(7), 7205–7224 (2021)
5. Gutierrez-Sigut, E., Costello, B., Baus, C., Carreiras, M.: LSE-sign: a lexical database for Spanish sign language. Behav. Res. Methods 48(1), 123–137 (2016)
6. Kratimenos, A., Pavlakos, G., Maragos, P.: Independent sign language recognition with 3D body, hands, and face reconstruction. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4270–4274. IEEE (2021)
7. Liu, Z., Zhang, H., Chen, Z., Wang, Z., Ouyang, W.: Disentangling and unifying graph convolutions for skeleton-based action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 143–152 (2020)
8. Lugaresi, C., et al.: MediaPipe: a framework for building perception pipelines. arXiv preprint arXiv:1906.08172 (2019)
9. Ma, Y., Zhou, G., Wang, S., Zhao, H., Jung, W.: SignFi: sign language recognition using WiFi. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2(1), 1–21 (2018)
10. Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10975–10985 (2019)
11. Rastgoo, R., Kiani, K., Escalera, S.: Hand sign language recognition using multi-view hand skeleton. Expert Syst. Appl. 150, 113336 (2020)
12. González, G.S., Sánchez, J.C., Díaz, M.M.B., Ata Pérez, A.: Recognition and classification of sign language for Spanish. Computación y Sistemas 22(1), 271–277 (2018)
13. Sincan, O.M., Junior, J., Jacques, C.S., Escalera, S., Keles, H.Y.: ChaLearn LAP large scale signer independent isolated sign language recognition challenge: design, results and future research. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3472–3481 (2021)
14. Vazquez-Enriquez, M., Alba-Castro, J.L., Docio-Fernandez, L., Rodriguez-Banga, E.: Isolated sign language recognition with multi-scale spatial-temporal graph convolutional networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3462–3471 (2021)
15. Wadhawan, A., Kumar, P.: Sign language recognition systems: a decade systematic literature review. Arch. Comput. Meth. Eng. 28(3), 785–813 (2021)
16. Zhang, F., et al.: MediaPipe Hands: on-device real-time hand tracking. arXiv preprint arXiv:2006.10214 (2020)

Deontic Sentence Classification Using Tree Kernel Classifiers

Davide Liga and Monica Palmirani

CIRSFID-AI, Alma Mater Research Institute for Human-Centered Artificial Intelligence, Alma Mater Studiorum - University of Bologna, Bologna, Italy
University of Luxembourg, Esch-sur-Alzette, Luxembourg
{davide.liga2,monica.palmirani}@unibo.it
https://www.unibo.it/sitoweb/

Abstract. The aim of this work is to employ Tree Kernel algorithms to classify natural language in the legal domain (i.e. deontic sentences and rules). More precisely, an innovative way of extracting labelled legal data is proposed, which combines the information provided by two famous LegalXML formats: Akoma Ntoso and LegalRuleML. We then applied this method on the European General Data Protection Regulation (GDPR) to train a Tree Kernel classifier on deontic and non-deontic sentences which were reconstructed using Akoma Ntoso, and labelled using the LegalRuleML representation of the GDPR. To prove the nontriviality of the task we reported the results of a stratified baseline classifier on two classification scenarios.

Keywords: Deontic · NLP · Legal AI · GDPR · LegalRuleML · Akoma Ntoso

1 Introduction

This paper belongs to the field of Artificial Intelligence and Law (AI&Law) [1], and its goal is the automatic detection and classification of deontic sentences and rules, an NLP task which has rarely been tackled by the scientific community. With a certain degree of approximation [20], we can define deontic sentences as those sentences which express obligation and permission rules (and also prohibition rules, which are negative obligations).

This task of Deontic Sentence Classification (DSC) is of crucial interest and has strong implications not only for the legal domain (where it may facilitate long-term goals such as automatic reasoning), but also for any domain where regulations and rules are involved or required to make important decisions [23]. Moreover, deontic NLP is a crucial point of connection between the NLP community and many other communities where deontic knowledge is involved (e.g., Logic, Argumentation, Multi-Agent Systems, Legal AI, Automatic Reasoning, and so on). Since we think that future achievements in AI can be facilitated by cross-domain tasks, where multiple scientific communities can share ideas and common goals, we also argue that tasks like DSC deserve more effort and attention. Hopefully, following the increasing successes in NLP, the extraction and classification of deontic rules from natural language will be conveniently explored to facilitate the development of its many potential applications and uses.

One of the reasons why DSC has rarely been explored is the scarcity of available datasets designed for the natural language processing of deontic rules. Designing this kind of dataset can be, in fact, particularly expensive because of the need for domain experts, and because of the need to select features capable of expressing the multilayered complexity of deontic data (which involves the logical sphere, the normative sphere, as well as the linguistic sphere). In order to tackle at least part of these issues, this work proposes a methodology designed to avoid the need to engineer complex features by leveraging, instead, the structural features of deontic language and by exploiting the knowledge provided by two famous LegalXML formats: Akoma Ntoso [17] and LegalRuleML [2].

2 Methodology

The approach proposed in this work is twofold. Firstly, a combined use of Akoma Ntoso and LegalRuleML is presented to show how to extract robust and reliable symbolic legal knowledge and generate a labelled dataset of deontic sentences and rules, on which some Machine Learning classifiers have then been trained and evaluated. To the best of our knowledge, it is the first time that a combined approach like this is proposed. Secondly, as far as the Machine Learning classifiers are concerned, this work shows that Tree Kernel classifiers [14] can be an appropriate choice of classification algorithm for DSC and, more generally, for the detection of rules within natural language. In fact, Tree Kernel classifiers are similarity measures based on the structural features of natural language sentences, and one of the domains where the structure of language is arguably more crucial (and able to represent target classes located at higher levels of abstraction) is, indeed, the legal and normative domain. In this sense, as observed in other studies [11], Tree Kernels can be a way to avoid highly engineered features, shifting the attention towards the representation of language structures.

In short, the methodology of this paper offers a hybrid AI approach [9,19]. On the one side, it describes a novel method to extract deontic knowledge leveraging LegalXML while, on the other side, it suggests an innovative classification algorithm (Tree Kernels) which, to the best of our knowledge, has never been tested on the detection of deontic sentences or rules.

2.1 Akoma Ntoso

Akoma Ntoso is the most famous LegalXML standard for representing legal documents. Among the main features of Akoma Ntoso, there are at least three levels that we want to point out here:


– It models the natural language of legal documents.
– It models all the structures (e.g. lists, tables of contents, signatures, images, etc.) of legal documents.
– It models the meta-data of legal documents (e.g. temporal information, references and connections to other legal documents).

These three dimensions (natural language, internal structures, meta-data) make Akoma Ntoso able to express nearly any information concerning a specific legal document (including information about how the document is related to other documents) and permit to capture when and where the legal document is applicable, so as to identify several important parameters of the deontic operators1. As far as our work is concerned, we used Akoma Ntoso as the source of the natural language sentences to classify and, since Akoma Ntoso is designed to reproduce the structures of legal documents, it allowed us to recompose (whenever needed) the sentences which were split into different components (as in some lists). For example, let the following fictitious article be an example of the deontic structure of a fictitious legal document:

Article 1
Principles and Rules of Subject X
1. Subject X:
   (a) shall do action α;
   (b) is allowed to do action β;
   (c) is not Y;

This will have an Akoma Ntoso representation which can be written as:

<article eId="art_1">
   <num>Article 1</num>
   <heading>Principles and Rules of Subject X</heading>
   <paragraph eId="art_1__para_1">
      <num>1.</num>
      <list eId="art_1__para_1__list_1">
         <intro>
            <p>Subject X:</p>
         </intro>
         <point eId="art_1__para_1__list_1__point_a">
            <num>(a)</num>
            <content><p>shall do action α;</p></content>
         </point>
         <point eId="art_1__para_1__list_1__point_b">
            <num>(b)</num>
            <content><p>is allowed to do action β;</p></content>
         </point>
         <point eId="art_1__para_1__list_1__point_c">
            <num>(c)</num>
            <content><p>is not Y;</p></content>
         </point>
      </list>
   </paragraph>
</article>
[. . . ]

1 Another important benefit of using AKN is the use of metadata for collocating the deontic operators in the correct temporal sequence. If we have a suspension, a sunset rule or an exception, we can use this important information to better understand the relationship between different parts of the discourse, so as to recompose the deontic operator correctly.

The reconstruction of this list is a crucial aspect, and we empirically experienced in our experiments how detrimental it can be to skip this step and train classifiers just on segmented portions of sentences, instead of using the complete recomposition of the sentences. Thanks to the structure and information provided by Akoma Ntoso, it is easy to recompose the previous segments into the following three sentences:

– Subject X shall do action α
– Subject X is allowed to do action β
– Subject X is not Y

This is important because in LegalRuleML, as we will see, the deontic information can target specific portions of a sentence (e.g. a specific point of a list).

2.2 LegalRuleML

While Akoma Ntoso represents legal documents' content, structures and meta-data, LegalRuleML is designed to represent the logical-deontic sphere of legal documents. Although we will describe this thoroughly in our experimental settings in the next sections, what is important to point out now is that LegalRuleML is able to connect each portion of the Akoma Ntoso structures to any logical formulae that their natural language may encompass. In this regard, an interesting point is that LegalRuleML can adopt different logical formalizations, allowing for a direct connection between natural language and logic. Although we will not exploit this powerful connection in this work, it is worth mentioning that it opens a range of exciting experimental scenarios for the development of methods to unlock automatic reasoning from natural language (a big long-term goal for Artificial Intelligence).

For the fictitious example described above, LegalRuleML would assign an obligation rule to the Akoma Ntoso portion related to point a and a permission rule to point b. In this way, we can finally associate the sentences (or better, the legal provisions, which may sometimes consist of multiple sentences) that we find and recompose from Akoma Ntoso with a deontic label from LegalRuleML, as synthesized in Fig. 1.

Fig. 1. Knowledge extraction from Akoma Ntoso and LegalRuleML. Note that each extracted instance refers to an atomic normative provision (generally contained in paragraphs or points), and may sometimes consist of more than one sentence.
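As an illustration of this extraction step, the sketch below recomposes the text of each point of an Akoma Ntoso list (prefixing the intro of the list) and attaches a label coming from an external mapping from element identifiers to deontic classes. It is only a sketch: the namespace URI, the eId-to-label mapping and the file names are assumptions, and the actual pairing with the LegalRuleML (DAPRECO) associations would require navigating that file's own structure.

```python
from lxml import etree

# Assumed Akoma Ntoso namespace; adjust to the one declared in the actual file.
AKN = {"akn": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"}

def recomposed_provisions(akn_path):
    """Map each <point> eId to its recomposed sentence (list intro + point text)."""
    root = etree.parse(akn_path).getroot()
    provisions = {}
    for point in root.iter("{%s}point" % AKN["akn"]):
        intro = point.getparent().find("akn:intro", AKN)
        prefix = " ".join(intro.itertext()).strip() if intro is not None else ""
        body = " ".join(t.strip() for t in point.itertext() if t.strip())
        provisions[point.get("eId")] = (prefix + " " + body).strip()
    return provisions

# deontic_labels: {eId: "obligation" | "permission" | "constitutive"} extracted
# beforehand from the LegalRuleML associations (placeholder mapping here).
# dataset = [(text, deontic_labels.get(eid, "none"))
#            for eid, text in recomposed_provisions("gdpr_akn.xml").items()]
```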

2.3 Tree Kernels and Tree Representation

Regarding the classification algorithm, we selected Tree Kernels because they leverage the structural information of natural language sentences. This can be useful in contexts where the structure of text can have a predominant role, like in legal natural language.


To use Tree Kernel algorithms, input data (for example, natural language sentences) must be converted into tree structures. This allows Tree Kernel functions to calculate the similarity between different portions of these tree-structured data representations. A common and famous example of the conversion of text into a tree structure is the generation of the dependency tree of a sentence, as well as the generation of the constituency tree. In the following part, we shortly describe the tree representations that have been employed in this study. They can be considered as particular kinds of dependency trees which combine grammatical functions, lexical elements and Part-of-Speech tags in different ways [6], and for this reason they are very powerful data representations.

GRCT. The Grammatical Relation Centered Tree (GRCT) representation is a very rich data representation [5]. It involves grammatical, syntactical and lexical elements together with Part-of-Speech tags and lemmatized words. For example, Fig. 2a shows a GRCT representation for the simple sentence "Personal data shall be processed lawfully". In this representation, after the root there are syntactical nodes (grammatical relations), then Part-of-Speech nodes and finally lexical nodes. In other words, a tree of this kind is balanced around the grammatical nodes, which determine the structure of dependencies.

LCT. Lexical Centered Tree (LCT) representations also involve grammatical, lexical and syntactical elements, along with Part-of-Speech tags. However, as can be seen from Fig. 2b, the structure of the tree is different. In fact, it is "centered" over lexical nodes, which are at the second level, immediately after the root. Part-of-Speech nodes and grammatical function nodes are equally children of the lexical elements.

LOCT. The Lexical Only Centered Tree (LOCT) representation can be seen in Fig. 2c. It basically contains just the lexical elements. Also in this case, the figure shows the sentence "Personal data shall be processed lawfully". Intuitively, the contribution of the LOCT representation can be particularly determinant whenever the task to be achieved mostly depends on lexical elements.

Apart from the type of tree structure, the second important aspect is the type of kernel function to use. In this work, three types of Tree Kernel functions have been used, and each of them involves different portions of the trees in the calculation of the similarity. Before describing these three functions, we need to briefly describe the mathematical rationale behind the Tree Kernel family. A kernel function can be considered as a similarity measure that performs an implicit mapping ϕ : X → V, where X is an input vector space and V is a high-dimensional space. A general kernel function can be represented as:

k(x, x') = \langle \varphi(x), \varphi(x') \rangle_V    (1)

Fig. 2. The GRCT, LCT and LOCT representations for the sentence "Personal data shall be processed lawfully": (a) GRCT, (b) LCT, (c) LOCT.


Importantly, the ⟨·, ·⟩_V in the above formula must necessarily be an inner product, while x and x' belong to X and represent the labelled and unlabelled input, respectively. If we consider, for example, a binary classification task with a training dataset D = {(x_i, y_i)}_{i=1}^{n} composed of n examples, where y ∈ {c_1, c_2} (with c_1 and c_2 being the two possible outputs of a binary classification), the final classifier ŷ ∈ {c_1, c_2} can be calculated in the following way:

\hat{y} = \sum_{i=1}^{n} w_i y_i k(x_i, x') = \sum_{i=1}^{n} w_i y_i \, \varphi(x_i) \cdot \varphi(x')    (2)

where the weights w_i are learned by the trained algorithm. When using Tree Kernels, the function must be adapted to allow the calculations over tree nodes. In this regard, a general Tree Kernel function can be calculated as follows [14]:

K(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2)    (3)

In the above equation, T_1 and T_2 are the two trees involved in the calculation of the similarity, while N_{T_1} and N_{T_2} are their respective sets of nodes and Δ(n_1, n_2) is the number of common fragments in node n_1 and node n_2. Importantly, Δ(n_1, n_2) can be seen as a function counting the common fragments between trees. Depending on how this function is configured (i.e. which fragments are involved in the calculation of the similarity), different Tree Kernels can be obtained. Therefore, given that our data is tree-structured, the second important element is the definition of which fragments must be involved when calculating the similarity between trees. Defining which fragments to involve also means defining the Tree Kernel function, because the names of the Tree Kernel functions usually derive from the fragment definition. In the following part, some famous Tree Kernel functions are shortly described; each of them defines, in a different way, which fragments should be involved in the calculation of the similarity.

STK. In a SubTree Kernel (STK) [21], a fragment is any subtree, i.e. any node of the tree along with all its descendants (see Fig. 3b). As described in previous studies [15], we can consider the set of fragments F = {f_1, f_2, ...} and the indicator function I_i(n), which is equal to 1 if the target f_i is rooted at node n and 0 if not. From Eq. 3 we can consider Δ(n_1, n_2) equal to \sum_{i=1}^{|F|} I_i(n_1) I_i(n_2). Furthermore, in the case of STKs, Δ can be calculated in the following ways:

– Δ(n_1, n_2) = 0 if the grammar production rules at n_1 and n_2 are different;
– Δ(n_1, n_2) = λ if the grammar production rules at n_1 and n_2 are the same, and n_1 and n_2 have only leaf children;

Fig. 3. A parse tree with its Subtrees (STs), Subset Trees (SSTs), and Partial Trees (PTs) for the sentence "Europe created a currency": (a) the parse tree of the sentence, (b) the six subtrees of the sentence, (c) some of the subset trees of the sentence, (d) some of the partial trees of the sentence.

– \Delta(n_1, n_2) = \lambda \prod_{j=1}^{nc(n_1)} \Delta(c_{n_1}^j, c_{n_2}^j) if the grammar production rules at n_1 and n_2 are the same, and n_1 and n_2 are not pre-terminals;

where λ is the decay factor, which penalizes the trees depending on their length, while nc(n_1) is the number of children of the node n_1 and c_{n_1}^j is the j-th child of n_1. Since the grammar production rules must be the same, nc(n_1) = nc(n_2).

SSTK. A SubSetTree Kernel (SSTK) [4] considers as fragments the so-called subset trees, i.e. it considers any node along with its partial descendancy. For example, as can be seen from Fig. 3c, the 2nd, 4th, 5th and 6th subset trees shown are clearly partial descendants. Note that even if the descendancy can be incomplete in depth, no partial productions are allowed. This means that production rules cannot be broken: for each node we can consider either all its children or none of them. In this case, Δ(n_1, n_2) can be computed as:

– \Delta(n_1, n_2) = 0 if the productions at n_1 and n_2 are different;
– \Delta(n_1, n_2) = \lambda if the productions at n_1 and n_2 are the same, and n_1 and n_2 have only leaf children;
– \Delta(n_1, n_2) = \lambda \prod_{j=1}^{nc(n_1)} (1 + \Delta(c_{n_1}^j, c_{n_2}^j)) if the productions at n_1 and n_2 are the same, and n_1 and n_2 are not pre-terminals.

Since in SSTKs the only constraint is not to break grammar production rules, and since fragments' leaves can also be non-terminal symbols, they can be considered a more general representation compared to the previously mentioned STKs.

PTK. A Partial Tree Kernel (PTK) [14] is a convolution kernel that considers partial trees as fragments. Similarly to SSTKs, a partial tree is a fragment of a tree which considers a node and its partial descendancy. However, partial trees also allow partial grammar production rules. For example, the 4th, 5th, 6th, 8th and 9th partial trees in Fig. 3d clearly break the grammatical productions of the Noun Phrase (NP) node, considering the determinant (D) without the noun (N) or vice versa. The fact that production rules can be broken (i.e. partial) makes PTs even more general than SSTs. This is the reason why PTKs generally provide a higher ability to generalize. Their Δ(n_1, n_2) can be computed as Δ(n_1, n_2) = 0 if the node labels of n_1 and n_2 are different. Otherwise:

\Delta(n_1, n_2) = \mu \Big( \lambda^2 + \sum_{J_1, J_2,\, l(J_1)=l(J_2)} \lambda^{d(J_1)+d(J_2)} \prod_{i=1}^{l(J_1)} \Delta(c_{n_1}[J_{1i}], c_{n_2}[J_{2i}]) \Big)    (4)

In Eq. 4, there is a vertical decay factor μ (which penalizes the height of the trees: because matching a large tree means also matching all its subtrees) and a horizontal decay factor λ (which penalizes fragments built on child sub-sequences containing gaps). Figure 4 shows the general overview of all experimental settings of this work, including the chosen Tree Kernel functions and tree representations.
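To make the recursion concrete, the sketch below implements Eq. 3 with the Δ definitions of the STK and SSTK on nltk parse trees; it is an illustrative toy implementation (the decay value λ = 0.4 and the example trees are arbitrary), not the kernel code used in the experiments.

```python
from nltk import Tree

LAMBDA = 0.4  # decay factor (illustrative value)

def production(node):
    """Node label plus the ordered labels of its children."""
    return (node.label(),
            tuple(c.label() if isinstance(c, Tree) else c for c in node))

def is_preterminal(node):
    """A node whose children are all leaves (words)."""
    return all(not isinstance(c, Tree) for c in node)

def delta(n1, n2, subset=False):
    """Δ(n1, n2) for the STK (subset=False) or the SSTK (subset=True)."""
    if production(n1) != production(n2):
        return 0.0
    if is_preterminal(n1):
        return LAMBDA
    prod = LAMBDA
    for c1, c2 in zip(n1, n2):
        if isinstance(c1, Tree):
            d = delta(c1, c2, subset)
            prod *= (1.0 + d) if subset else d
    return prod

def tree_kernel(t1, t2, subset=False):
    """Eq. 3: sum Δ over all node pairs of the two trees."""
    return sum(delta(a, b, subset)
               for a in t1.subtrees() for b in t2.subtrees())

t1 = Tree.fromstring("(S (N Europe) (VP (V created) (NP (D a) (N currency))))")
t2 = Tree.fromstring("(S (N Europe) (VP (V created) (NP (D a) (N market))))")
print(tree_kernel(t1, t2), tree_kernel(t1, t2, subset=True))
```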


Fig. 4. General overview

3 Related Works

The first studies which tackled the classification of deontic elements actually focused on deontic elements as part of a wider range of targets. Among these first attempts to classify obligations (among other targets) from legal texts there is [10], which focused on the regulations of Italy and the US. Their method employed word lists, grammars and heuristics to extract obligations among other targets such as rights and constraints. Another work which tackled the classification of deontic statements is [22], which focused on the German tenancy law and classified 22 classes of statements (among these classes there were also prohibitions and permissions). The method used active learning with Multinomial Naive Bayes, Logistic Regression and Multi-layer Perceptron classifiers, on a corpus of 504 sentences. In [8], the authors used Machine Learning to extract classes of normative relationships such as prohibitions, authorizations, sanctions, commitments and powers. Perhaps the first study which directly addressed the deontic sphere is [16]. This work focused on financial legislation to classify legal sentences using a Bi-LSTM architecture, with a training dataset containing 1,297 instances (596 obligations, 94 prohibitions, and 607 permissions). The work also inspired [3], which introduced a hierarchical Bi-LSTM with self-attention to extract sentence embeddings, with the goal of detecting contractual obligations and prohibitions. To the best of our knowledge, while these are the only two studies which addressed the detection of deontic classes, there is only one study which employed a Tree Kernel methodology [14] in the legal domain: this study presented CLAUDETTE [12], a platform which can classify sentences as being compliant or not with specific policies. We have no knowledge of other studies using Tree Kernels on legal data, nor of other studies employing Tree Kernels to classify deontic sentences. Our work presents a Tree Kernel approach to Machine Learning, which combines the symbolic information of LegalXML formats with different Tree Kernel calculations (SubTree Kernels, SubSet Tree Kernels, Partial Tree Kernels) and different tree representations (GRCT, LCT and LOCT), providing a new benchmark of results that can be compared to other approaches in the future.


Moreover, thanks to the use of the biggest LegalRuleML knowledge base, we managed to set two different scenarios of classification:

1. Rule vs Non-rule
2. Deontic vs Non-deontic

4 Data

The data used in this study are 707 atomic normative provisions² extracted from the European General Data Protection Regulation (GDPR). To extract this dataset we used the DAta Protection REgulation COmpliance (DAPRECO) Knowledge Base [18], which is the LegalRuleML representation of the GDPR and the biggest knowledge base in LegalRuleML [2], as well as the biggest knowledge base formalized in Input/Output Logic [13]. The current version of DAPRECO³ includes 966 formulae in reified Input/Output logic: 271 obligations, 76 permissions, and 619 constitutive rules. As explained in [18], the number of constitutive rules is much higher than that of permissions and obligations because constitutive rules are needed to trigger special inferences for the modelled rules. In other words, constitutive rules are an indicator of the existence of a rule, without specifying its properties as being deontic or not. Importantly, DAPRECO also contains the connections between each formula and the corresponding structural element (paragraphs, points, etc.) in the Akoma Ntoso representation of the GDPR⁴. In other words, using a LegalRuleML knowledge base like DAPRECO and the corresponding Akoma Ntoso representation, it is possible to connect the logical-deontic sphere of legal documents (in this case the 966 Input/Output formulae provided by DAPRECO) to the natural language statements in the legal text (provided by the Akoma Ntoso representation of the GDPR). Importantly, this combination of Akoma Ntoso and LegalRuleML also facilitates the reconstruction of the exact target in terms of natural language. For example, many obligations of legal texts are split into lists, and Akoma Ntoso is useful to reconstruct those pieces of natural language into a unique span of text. For example, Article 5 of the GDPR⁵ states:

² In Akoma Ntoso, atomic normative provisions can be contained in different structures (e.g. in paragraphs or list points), and may sometimes be composed of more than one sentence. We extracted these provisions from the body of the GDPR (the sentences of the preamble and conclusions are thus excluded).
³ The DAPRECO knowledge base can be freely downloaded from its repository: https://github.com/dapreco/daprecokb/blob/master/gdpr/rioKB GDPR.xml.
⁴ The Akoma Ntoso representation of the GDPR is currently accessible from https://github.com/guerret/lu.uni.dapreco.parser/blob/master/resources/akn-act-gdprfull.xml, where it can be freely downloaded.
⁵ https://eur-lex.europa.eu/eli/reg/2016/679/oj#d1e1807-1-1.


Article 5
Principles relating to processing of personal data
1. Personal data shall be:
(a) processed lawfully, fairly and in a transparent manner in relation to the data subject (‘lawfulness, fairness and transparency’);
(b) collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes (‘purpose limitation’);
(c) adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’);
[...]

As can be seen, paragraph 1 of Article 5 is a list composed of an introductory part (“Personal data shall be:”) and different points. To be concise, only the first three points of paragraph 1 are reported here, namely, point a, point b and point c. From the point of view of the natural language, each deontic sentence is split between the introductory part (which contains the main deontic verb “shall”) and the text of each point. While the introductory part contains the main deontic verb, the actual deontic information is contained within each point. The Akoma Ntoso formalization for this part of the GDPR is:

Article 5

Principles relating to processing of personal data

1.

Personal data shall be:



(a)


processed lawfully, fairly and in a transparent manner in relation to the data subject (‘lawfulness, fairness and transparency’);



[...]

In DAPRECO, which uses the LegalRuleML formalization6 , a series of elements can be found, which contain the structural portion where the deontic formulas are located, referenced by using the Akoma Ntoso naming convention7 . For example, the reference of the above mentioned point a can be found in DAPRECO as:

Here the “refersTo” attribute indicates the internal ID of the reference, and the “refID” attribute indicates the external ID of the reference using the Akoma Ntoso naming convention. The prefix “GDPR” stands for the Akoma Ntoso URI of the GDPR, namely “/akn/eu/act/regulation/2018-05-25/eng@2018-05-25/!main#”. In turn, this element is then associated to its target group of logical statements, which collects the group of logical formulas related to this legal reference (so, in this case, related to point a of the first paragraph of Article 5). Such association is modelled as follows:



⁶ https://docs.oasis-open.org/legalruleml/legalruleml-core-spec/v1.0/legalruleml-core-spec-v1.0.html.
⁷ https://docs.oasis-open.org/legaldocml/akn-nc/v1.0/csprd01/akn-nc-v1.0-csprd01.html.


Where the attribute “keyref” of the target connects to the collection of statements whose “key” attribute is “statements1”:



[...] [...]



[...] [...]



Importantly, each statement in natural language can have more than one formula in the logical sphere. This is the reason why the element here shows a collection of two logical formulae. To finally associate the portions of natural language sentences extracted from Akoma Ntoso to a class related to the logical sphere, the identification keys of the two formulae can be tracked into the element.

[...]

[...]


As can be seen, the first formula is associated with the ontological class “obligationRule”, while the second formula is associated with the ontological class “constitutiveRule”. In other words, the portion of natural language expressed in point a of the first paragraph of Article 5 of the GDPR is represented in the logical sphere as both a constitutive rule and an obligation rule.
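The extraction procedure just described can be illustrated with a short Python sketch. The element and attribute names (paragraph, intro, point, eId, the refID-to-class mapping) follow the prose above, but the namespace handling, the file layout and the stand-in mapping are simplifying assumptions; this is an illustration of the procedure, not the code used by the authors.

```python
# Schematic sketch of reconstructing list-split provisions from an Akoma Ntoso file and
# labelling them via a LegalRuleML-style reference-to-class mapping. Names and structure
# are simplified assumptions, not the authors' implementation.
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit('}', 1)[-1]

def text_of(elem):
    return ' '.join(' '.join(elem.itertext()).split())

def reconstruct_points(akn_root):
    """Yield (point_id, text): the intro of the enclosing paragraph joined with each point."""
    for para in akn_root.iter():
        if local(para.tag) != 'paragraph':
            continue
        intro = next((c for c in para.iter() if local(c.tag) == 'intro'), None)
        intro_text = text_of(intro) if intro is not None else ''
        for point in (c for c in para.iter() if local(c.tag) == 'point'):
            yield point.get('eId'), (intro_text + ' ' + text_of(point)).strip()

# The classes would come from the DAPRECO LegalRuleML associations (refID -> formula classes);
# the dictionary below is a hand-made stand-in purely for illustration.
ref_to_classes = {'art_5__para_1__point_a': {'obligationRule', 'constitutiveRule'}}

def label(point_id):
    classes = ref_to_classes.get(point_id, set())
    return 'deontic' if {'obligationRule', 'permissionRule'} & classes else 'non-deontic'
```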

5 Experiment Settings and Results

The process of extraction resulted in 707 labelled spans of text, which have been reconstructed using the structural information provided by Akoma Ntoso whenever they were split into different parts (as previously mentioned with regard to lists). The labels of these sentences are the same as those provided by DAPRECO, with the addition of a ‘none’ category:

1. obligationRule;
2. permissionRule;
3. constitutiveRule;
4. none.

The class “obligationRule” is assigned to those sentences which have at least one obligation in their related formulae. The class “permissionRule” is assigned to those sentences which have at least one permission in their related formulae. The class “constitutiveRule” is assigned to those sentences which have at least one constitutive rule in their related formulae. The class “none” is assigned to those sentences which do not belong to any of the previous categories. Using these labels we generated two different experimental settings, as shown in Table 1:

Table 1. Number of Instances per Class per Scenario.
Scenario 1: rule 260 instances; non-rule 447 instances
Scenario 2: deontic 204 instances; non-deontic 503 instances

Scenario 1 is a binary classification task and aims at discriminating between rule and non-rule instances. In this scenario, all labels other than “none” are considered rule, while “non-rule” is just an alias for “none”. Scenario 2 focuses on a binary classification between deontic instances (i.e. any sentence labelled as either “obligationRule” or “permissionRule”) and non-deontic instances (i.e. all instances labelled as “constitutiveRule” or “none”).


To show the non-triviality of this classification we used different baseline methods and selected the best one, namely a basic stratified baseline which reflects the class distribution of the dataset. This baseline was applied to both Scenario 1 and Scenario 2. As can be seen from Table 2, the baseline shows quite low scores for both.

Table 2. Results for the two stratified baselines applied to Scenario 1 and 2. Within the brackets, the number of instances is reported. P = Precision; R = Recall; F1 = F1-Score; Acc. = Accuracy; Macro = Macro Average of the F1 Scores.

Baseline Scenario 1: rule (39): P .37, R .33, F1 .35 | non-rule (67): P .63, R .67, F1 .65 | Acc. .55 | Macro .50
Baseline Scenario 2: deontic (30): P .23, R .23, F1 .23 | non-deontic (76): P .70, R .70, F1 .70 | Acc. .57 | Macro .47
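A stratified baseline of this kind can be reproduced, in spirit, with scikit-learn's DummyClassifier. The sketch below is illustrative only: the 75/25 stratified split matches the setting described later, but the data loading is an assumption and this is not the authors' code.

```python
# Sketch of a stratified (class-distribution) baseline, analogous to the one in Table 2.
# Dataset loading and split details are assumptions, not the authors' exact setup.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def run_stratified_baseline(labels, seed=42):
    X = np.zeros((len(labels), 1))            # features are ignored by the dummy model
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, stratify=labels, random_state=seed)
    baseline = DummyClassifier(strategy="stratified", random_state=seed)
    baseline.fit(X_tr, y_tr)                  # predicts labels according to class frequencies
    print(classification_report(y_te, baseline.predict(X_te), digits=2))

# e.g. Scenario 1: 260 'rule' vs 447 'non-rule' instances
run_stratified_baseline(np.array(['rule'] * 260 + ['non-rule'] * 447))
```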

Regarding the experimental settings of the Tree Kernel classifiers, the dataset was divided into 75% for the training phase and 25% for the test, using the “class invariant” option to preserve the original distribution of the classes. The Tree Kernel classifiers have been trained using KeLP [7] and, as far as reproducibility is concerned, the λ value and the μ value have both been set to 0.1 for all classifiers, while for the classifiers using PTKs a value of 3 has been assigned to the terminal factor⁸. The final classification on top of the kernel calculations is a binary Support Vector Machine classification. The scores of the testing phase for Scenario 1 are reported in Table 3, where it can be seen that the best performing classifiers are the SSTK classifiers. Although GRCT achieved slightly higher scores, it seems that the use of different tree representations does not generate any significant difference. The final results for Scenario 2, reported in Table 4, show that the best performing classifier is the one which combines STKs with the GRCT representation. Also in this scenario, results show that PTKs are less performative than STKs and SSTKs. Overall, the final results are quite encouraging in both Scenario 1 and 2, achieving a macro average of .87 (and an accuracy up to .89 and .90, respectively).

⁸ These values are used within KeLP to specify the degree of sensitivity of the Tree Kernel algorithms in terms of the vertical and horizontal depth of sentences.
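The final binary SVM classification on top of the kernel calculations is performed within KeLP (Java). The same idea can be sketched in Python with an SVM over a precomputed kernel matrix, where the kernel values would come from a tree-kernel function such as the SSTK sketch shown earlier; this is an illustrative sketch and does not reproduce KeLP's specific options.

```python
# Sketch (not KeLP): binary SVM on top of precomputed tree-kernel similarities.
import numpy as np
from sklearn.svm import SVC

def svm_on_tree_kernel(trees_train, trees_test, y_train, kernel_fn):
    """kernel_fn(t1, t2) -> similarity, e.g. an STK/SSTK/PTK implementation."""
    K_train = np.array([[kernel_fn(a, b) for b in trees_train] for a in trees_train])
    K_test = np.array([[kernel_fn(a, b) for b in trees_train] for a in trees_test])
    clf = SVC(kernel="precomputed")
    clf.fit(K_train, y_train)
    return clf.predict(K_test)
```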


Table 3. Results for Scenario 1, showing the nine combinations of three tree kernel functions and three tree representations. P = Precision; R = Recall; F1 = F1-Score; Acc. = Accuracy; Macro = Macro Average of the F1 Scores.

SSTK+GRCT: rule P .88, R .75, F1 .81 | non-rule P .89, R .95, F1 .92 | Acc. .89 | Macro .87
SSTK+LCT: rule P .91, R .74, F1 .82 | non-rule P .88, R .97, F1 .92 | Acc. .89 | Macro .87
SSTK+LOCT: rule P .91, R .74, F1 .82 | non-rule P .88, R .97, F1 .92 | Acc. .89 | Macro .87
STK+GRCT: rule P .86, R .75, F1 .80 | non-rule P .89, R .94, F1 .91 | Acc. .88 | Macro .86
STK+LCT: rule P .87, R .68, F1 .77 | non-rule P .86, R .96, F1 .91 | Acc. .87 | Macro .84
STK+LOCT: rule P .89, R .68, F1 .78 | non-rule P .86, R .96, F1 .91 | Acc. .87 | Macro .85
PTK+GRCT: rule P .84, R .72, F1 .77 | non-rule P .87, R .93, F1 .90 | Acc. .86 | Macro .84
PTK+LCT: rule P .75, R .77, F1 .76 | non-rule P .89, R .87, F1 .88 | Acc. .84 | Macro .82
PTK+LOCT: rule P .75, R .77, F1 .76 | non-rule P .89, R .87, F1 .88 | Acc. .84 | Macro .82

Table 4. Results for Scenario 2, showing the nine combinations of three tree kernel functions and three tree representations. P = Precision; R = Recall; F1 = F1-Score; Acc. = Accuracy; Macro = Macro Average of the F1 Scores.

SSTK+GRCT: deontic P .90, R .61, F1 .73 | non-deontic P .88, R .98, F1 .92 | Acc. .88 | Macro .83
SSTK+LCT: deontic P .90, R .59, F1 .71 | non-deontic P .87, R .98, F1 .92 | Acc. .88 | Macro .82
SSTK+LOCT: deontic P .90, R .59, F1 .71 | non-deontic P .87, R .98, F1 .92 | Acc. .88 | Macro .82
STK+GRCT: deontic P .94, R .67, F1 .79 | non-deontic P .90, R .99, F1 .94 | Acc. .90 | Macro .87
STK+LCT: deontic P .90, R .57, F1 .69 | non-deontic P .86, R .98, F1 .92 | Acc. .87 | Macro .81
STK+LOCT: deontic P .90, R .57, F1 .69 | non-deontic P .86, R .98, F1 .92 | Acc. .87 | Macro .81
PTK+GRCT: deontic P .82, R .68, F1 .73 | non-deontic P .89, R .95, F1 .92 | Acc. .88 | Macro .83
PTK+LCT: deontic P .71, R .70, F1 .70 | non-deontic P .89, R .90, F1 .90 | Acc. .85 | Macro .80
PTK+LOCT: deontic P .71, R .70, F1 .70 | non-deontic P .90, R .90, F1 .90 | Acc. .85 | Macro .80

6 Conclusions

This work employed a Tree Kernel Machine Learning algorithm for the classification of deontic sentences and rules. First of all, we described a method to extract legal knowledge from the combined information provided by Akoma Ntoso and LegalRuleML; more precisely, we used as case study the well-known European General Data Protection Regulation (GDPR), using its Akoma Ntoso representation to extract the natural language structures (recomposing them when needed) and its LegalRuleML representation (i.e. the DAPRECO knowledge base). This study is probably the first work which applies Tree Kernel algorithms to classify deontic data, and one of the few that classify legal sentences. A limitation of this paper is that it performed only binary classifications. In the future, we will create a multiclass scenario to detect obligations, permissions and constitutive rules separately. Moreover, we would like to improve the results by applying new approaches, such as Transfer Learning methods, which have proved to be remarkably performative in many NLP tasks during the last few years.


It is worth mentioning that in this paper we have just started an exploration of how LegalXML formats can be successfully combined with NLP methodologies. In this exploration many other interesting tasks can be performed, which we did not discuss here. For example, since DAPRECO also contains formulas (formalized using reified Input/Output logic), these formulae can be used to connect the sphere of natural language to the formal sphere of logic, by providing a direct connection between portions of rules within natural language sentences and portions of rules in the formal logical domain. Since we were not aiming at identifying the internal elements of the logical formulae, in this paper we just focused on the binary classification of deontic sentences. However, in the future we want to create classifiers that directly address these internal logical components within each formula, trying to find a match between portions of natural language and portions of logic. The ability to connect each internal component (or at least some) of the deontic formulae contained within DAPRECO directly to the relative portion of natural language where the component expresses its deontic meaning can be a crucial step towards the long-sought goal of filling the gap between natural language and the logical sphere. Filling this gap means being able to unlock automatic reasoning, a big step towards general Artificial Intelligence.

References 1. Ashley, K.D.: Artificial Intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age. Cambridge University Press, Cambridge (2017) 2. Athan, T., Boley, H., Governatori, G., Palmirani, M., Paschke, A., Wyner, A.: Oasis legalruleml. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law, pp. 3–12 (2013) 3. Chalkidis, I., Androutsopoulos, I., Michos, A.: Obligation and prohibition extraction using hierarchical RNNs. arXiv preprint arXiv:1805.03871 (2018) 4. Collins, M., Duffy, N.: Convolution kernels for natural language. In: Advances in Neural Information Processing Systems, pp. 625–632 (2002) 5. Croce, D., Moschitti, A., Basili, R.: Semantic convolution kernels over dependency trees: smoothed partial tree kernel. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 2013–2016 (2011) 6. Croce, D., Moschitti, A., Basili, R.: Structured lexical similarity via convolution kernels on dependency trees. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1034–1046 (2011) 7. Filice, S., Castellucci, G., Croce, D., Basili, R.: Kelp: a kernel-based learning platform for natural language processing. In: Proceedings of ACL-IJCNLP 2015 System Demonstrations, pp. 19–24 (2015) 8. Gao, X., Singh, M.P.: Extracting normative relationships from business contracts. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pp. 101–108 (2014) 9. Gomez-Perez, J.M., Denaux, R., Garcia-Silva, A.: Hybrid Natural Language Processing: An Introduction, pp. 3–6. Springer, Cham (2020). https://doi.org/10. 1007/978-3-030-44830-1 1


10. Kiyavitskaya, N., et al.: Automating the extraction of rights and obligations for regulatory compliance. In: Li, Q., Spaccapietra, S., Yu, E., Oliv´e, A. (eds.) ER 2008. LNCS, vol. 5231, pp. 154–168. Springer, Heidelberg (2008). https://doi.org/ 10.1007/978-3-540-87877-3 13 11. Liga, D.: Argumentative evidences classification and argument scheme detection using tree kernels. In: Proceedings of the 6th Workshop ArgMining, pp. 92–97 (2019) 12. Lippi, M., et al.: Claudette: an automated detector of potentially unfair clauses in online terms of service. Artif. Intell. Law, pp. 1–23 (2018) 13. Makinson, D., Van Der Torre, L.: Input/output logics. J. Philos. Logic 29(4), 383– 408 (2000) 14. Moschitti, A.: Efficient convolution kernels for dependency and constituent syntactic trees. In: European Conference on Machine Learning, pp. 318–329 (2006) 15. Moschitti, A.: Making tree kernels practical for natural language learning. In: 11th Conference of the European Chapter of ACL (2006) 16. O’Neill, J., Buitelaar, P., Robin, C., O’Brien, L.: Classifying sentential modality in legal language: a use case in financial regulations, acts and directives. In: Proceedings of the 16th Edition of AI and Law, pp. 159–168 (2017) 17. Palmirani, M., Vitali, F.: Akoma-Ntoso for legal documents. In: Sartor, G., Palmirani, M., Francesconi, E., Biasiotti, M. (eds.) Legislative XML for the Semantic Web. Law, Governance and Technology Series, vol. 4, pp. 75–100. Springer, Dordrecht (2011). https://doi.org/10.1007/978-94-007-1887-6 6 18. Robaldo, L., Bartolini, C., Lenzini, G.: The DAPRECO knowledge base: representing the GDPR in LegalRuleML. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 5688–5697 (2020) 19. Rodr´ıguez-Doncel, V., Palmirani, M., Araszkiewicz, M., Casanovas, P., Pagallo, U., Sartor, G.: Introduction: a hybrid regulatory framework and technical architecture for a human-centered and explainable AI. In: Rodr´ıguez-Doncel, V., Palmirani, M., Araszkiewicz, M., Casanovas, P., Pagallo, U., Sartor, G. (eds.) AICOL/XAILA 2018/2020. LNCS (LNAI), vol. 13048, pp. 1–11. Springer, Cham (2021). https:// doi.org/10.1007/978-3-030-89811-3 1 20. Rubino, R., Rotolo, A., Sartor, G.: An owl ontology of norms and normative judgements. In: Biagioli, C., Francesconi, E., Sartor, G. (szerk.) Proceedings of the V Legislative XML Workshop, pp. 173–187. Citeseer (2007) 21. Vishwanathan, S.V.N., Smola, A.J., et al.: Fast kernels for string and tree matching. Kernel Methods Comput. Biol. 15, 113–130 (2004) 22. Waltl, B., Muhr, J., Glaser, I., Bonczek, G., Scepankova, E., Matthes, F.: Classifying legal norms with active machine learning. In: URIX, pp. 11–20 (2017) 23. Wyner, A., Peters, W.: On rule extraction from regulations. In: Legal Knowledge and Information Systems, pp. 113–122. IOS Press (2011)

Sparse Distributed Memory for Sparse Distributed Data

Ruslan Vdovychenko(B) and Vadim Tulchinsky

V.M. Glushkov Institute of Cybernetics of National Academy of Sciences of Ukraine, 40 Acad. Glushkov Avenue, Kyiv 03150, Ukraine
[email protected]

Abstract. Sparse Distributed Memory (SDM) and Binary Sparse Distributed Representations (BSDR) are phenomenological models of different aspects of biological memory. SDM as a neural network represents the functioning of noise- and damage-tolerant associative memory. BSDR represents methods of encoding holistic (structural) information in binary vectors. The idea of SDM-BSDR integration appeared long ago. However, SDM is inefficient in the role of BSDR cleaning memory. We can fill the gap between BSDR and SDM using the results of a third theory related to sparse signals: Compressive Sensing (CS). An integrated semantic storage model is presented in this paper. It is called CS-SDM since it uses a new CS-based SDM design for cleaning memory applied to BSDR. The CS-SDM implementation is based on GPU. The model’s capacity and denoising capabilities are significantly better than those of classical SDM designs.

Keywords: Compressive sensing · Sparse distributed memory · Binary sparse distributed representations · Neural networks · Associative memory · GPU

1 Purpose

The phenomenon of memory has been studied by many neurobiologists. Among the methodological approaches used, the phenomenological one is of high practical interest. It means building functional memory designs that simulate the characteristics of the objects while abstracting from the implementation details, thus leading to technically affordable solutions and applications. The ability of high-performance implementation on Graphics Processing Units (GPU) is a benefit for such a model. There are numerous types of memory known in nature for both animals and humans, including motor, sensory, short-term and long-term, immune, and so on. None of them work like electronic device memory: old information is not overwritten by the new one, malfunctioning of a small number of brain neurons does not affect the overall system’s functionality, and the “addresses” and “data” are not explicitly separated. The neural network memory models that can efficiently work with semantically rich (hierarchical, structured, holistic) data are particularly interesting and remain a challenge. The main contribution of this work is to propose a new hybrid memory construction for semantic memory. Our design is efficiently implemented on GPU. It was evaluated on millions of synthetic vectors and showed better capacity and denoising ability compared to traditional designs.


2 Background

There are three theoretical sources in the background of our integrated memory model: Sparse Distributed Memory (SDM), Binary Sparse Distributed Representations (BSDR) and Compressive Sensing or Sampling (CS). SDM, proposed by Pentti Kanerva [1, 2] in 1986, has many characteristics of natural memory, including generalization ability, sequence memorization, associativity, and even the ability to make mistakes. Lewis Jaeckel published the SDM hyperplane construction [3, 4] a few years later. Because of the reduced number of comparisons, it consumes less memory and performs reading/writing operations faster. Jaeckel’s SDM was later discovered to be more compatible with the functioning of the mammalian cerebellum (according to the designs by David Marr in 1969 and James Albus in 1975 [5]). Kanerva’s original design is more akin to immune memory in operation (according to the model of Derek Smith, Stephanie Forrest, and Alan Perelson, 1996 [6]). BSDR [7–9] is a subclass of Vector Symbolic Architecture (VSA) that emerged around the turn of the millennium [10, 11]. VSA’s general requirements include local error tolerance and the ability to process sophisticated relationships (symbolic sequences, key-value, hierarchical, etc.) distributed in long codevectors. In the case of BSDR, the codevectors are binary and s-sparse: the fraction of non-zeros s (fixed for all codevectors) is much smaller than the codevector length. The techniques for converting real vectors and holistic data into binary sparse codevectors have advanced significantly [12]. An important element of BSDR is a cleaning memory that recovers codevectors from their incomplete (more sparse) instances obtained within the decoding operations (unbinding). In a narrow sense, our goal is an SDM modification for serving as the BSDR cleaning memory. Traditional SDM designs perform poorly for processing sparse vectors: physical memory is used inefficiently, the majority of the address space is not used, and the reading operation is unstable and challenging even with known a priori probability. CS was proposed in 2004–2006 by Emmanuel Candes, Terence Tao and their colleagues [13, 14]. CS focuses on underdetermined systems of linear algebraic equations with normalized coefficients: it either solves them for the known number of non-zero components s or finds the sparsest solution. The system coefficients create a matrix called a dictionary. The dictionary rows (that are shorter than the columns) are called samples. The CS condition of problem solvability is called the Restricted Isometry Property (RIP) [13, 15]. An informal definition of the s-RIP condition is that any set of s dictionary samples contains roughly orthogonal vectors. In our model, CS algorithms are used to reconstruct the s-sparse BSDR vectors from significantly shorter noisy dense vectors from SDM. Among the CS algorithms, we examined reconstruction by linear programming [16] and CoSaMP [17]. CoSaMP is a greedy algorithm derived from Orthogonal Matching Pursuit (OMP) [18], specially modified to match CS problems. Although linear programming often provides superior results, it takes a lot more time to compute, hence it has not been applied for extensive experiments. (However, in smaller tests, we utilized both.)


3 Model

The general architecture of the associative cleaning memory we propose for s-sparse binary codevectors is presented in Fig. 1. Its main components are a CS coder and decoder for the transformation of sparse data to and from dense form, together with an SDM of slightly modified Jaeckel design for storing the dense vectors. Because of these components, we call the memory construction CS-SDM. Storing data in encoded form for different purposes is not a new idea ([19] is an example). But the combination of CS encoding with SDM makes CS-SDM especially fitted for storing binary codevectors of fixed sparsity, with highly improved density and recognition ability for addresses corrupted by noise. Because of the ability to recover insufficient features (missed 1s), CS-SDM can be proposed for the BSDR cleaning memory implementation.

[Figure 1 components: semantic and structured data; Binary Sparse Distributed Representation; Compressive Sensing (random matrix, neural network); Sparse Distributed Memory (GPU implementation); structure unravelling; memory; the whole pipeline labelled CS-SDM]

Fig. 1. General architecture of CS-SDM.

Let us consider the CS-SDM construction step by step. The semantic or structured data encoded into an M-dimensional s-sparse binary codevector x is first transformed by multiplication with a random dictionary matrix Φ ∈ {±1}^(m×M). Within the construction, the dictionary matrix elements are computed as pseudorandom uniformly distributed binary values (0 is replaced by −1). The multiplication resembles combining positive and negative signals from neuron input axons and sending the combined information to the neuron output axons. In other words, the CS encoder is just the first layer of a multilayer neural network. The encoder decreases the vector length from M to m.
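A minimal numpy sketch of this encoding step is given below: an s-sparse binary codevector of length M is compressed to a dense length-m vector by a random ±1 dictionary. The dimensions and seed are arbitrary illustrative choices, and the recovery step (e.g. by CoSaMP or linear programming, as discussed in Sect. 2) is not shown.

```python
# Minimal sketch of the CS encoding step: dense measurement of an s-sparse binary codevector
# with a random {+1, -1} dictionary. Dimensions and seed are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
M, m, s = 10_000, 256, 20                     # codevector length, measurement length, sparsity

# random dictionary Phi in {+-1}^(m x M): 0/1 coin flips with 0 mapped to -1
Phi = rng.integers(0, 2, size=(m, M)) * 2 - 1

# an s-sparse binary codevector x
x = np.zeros(M)
x[rng.choice(M, size=s, replace=False)] = 1

y = Phi @ x                                   # dense vector stored in the SDM layer
print(y.shape, int(np.abs(y).max()))
```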


L > q, then from Eq. (45) one gets:

E_{[J−q]⇒[J−(q+1)]} = m²(x_{J−q} − x_{J−(q+1)})² / [2m (t_{J−q} − t_{J−(q+1)})²],   (46)

E_{[J−k]⇒[J−(k+1)]} = m²(x_{J−k} − x_{J−(k+1)})² / [2m (t_{J−k} − t_{J−(k+1)})²],   (47)

and with this the generalized difference can be written as:

ΔE(q, k) = (m²/2m) [ (x_{J−q} − x_{J−(q+1)})² / (t_{J−q} − t_{J−(q+1)})² − (x_{J−k} − x_{J−(k+1)})² / (t_{J−k} − t_{J−(k+1)})² ].   (48)

Now consider a detector that can continuously measure the difference above. Because it is a pure kinetic energy, the accuracy of the measurements is given by the resolution of the measured coordinates. Thus no error is attributed to the time slots, since it is assumed that the time measurement is carried out by a perfect clock free of systematic errors. Consider Eq. (17) again; then one has, for an arbitrary value of ℓ:

⟨Ψ(t_{J−ℓ})|Ψ(t_{J−ℓ})⟩ = ∫ dx_{J−ℓ} dx_{J−(ℓ+1)} Ψ(x, t) √( im / [2π(t_{J−ℓ} − t_{J−(ℓ+1)})] ) Exp[ −(im/2) (x_{J−ℓ} − x_{J−(ℓ+1)})² (t_{J−ℓ} − t_{J−(ℓ+1)}) / (t_{J−ℓ} − t_{J−(ℓ+1)})² ] Ψ(x_{J−(ℓ+1)}, t_{J−(ℓ+1)}).   (49)

2.7 Errors at the Energy Measurement

A possible deviation of the Hamiltonian, H̃ ⇒ H + δH, would have an effect on the updated wave function:

H̃ |Ψ(t + δt)⟩ = (E + δE) |Ψ(t + δt)⟩,   (50)

with δE the error of the measured energy at a subsequent time t + δt. In this manner it is plausible to define the efficient Hamiltonian H, in the sense that it exhibits the efficient value of energy, and it can be written as:

H = (1/L) Σ_{ℓ=0}^{L} ĥ_ℓ F_ℓ,   (51)

by which the Hamiltonian ĥ_ℓ satisfies the eigenvalue equation

ĥ_ℓ |E_ℓ⟩ = E_ℓ |E_ℓ⟩,   (52)

with the efficient energy defined as:

F_ℓ = E_{ℓ+1} / (E_{ℓ+1} + E_ℓ).   (53)

Because of Eq. (50) and Eq. (51), the efficient Hamiltonian of Eq. (51) can be written as:

H = (1/L) Σ_{ℓ=0}^{L} ĥ_ℓ (E_{ℓ+1} + δE_{ℓ+1}) / (E_{ℓ+1} + δE_{ℓ+1} + E_ℓ + δE_ℓ).   (54)

In this manner, the construction of the Hamiltonian in Eq. (54) can lead to the definition of a general evolution operator written as:

U(t − t₀) = Exp[ −i (t − t₀) (1/L) Σ_{ℓ=0}^{L} ĥ_ℓ (E_{ℓ+1} + δE_{ℓ+1}) / (E_{ℓ+1} + δE_{ℓ+1} + E_ℓ + δE_ℓ) ].   (55)

3 The Quantum Mechanics Machinery

Fig. 2. Sketch of the space-time trajectory of a charged particle in the presence of a system composed of point-like charged particles. The particle experiences either attraction or repulsion depending on the sign of the charge.

4 Applications

Consider Fig. 2, where the space-time trajectory of a single particle is sketched. Here attraction and repulsion are expected, as seen in the oscillatory trajectory. Clearly one finds that the Coulomb interaction governs along the trajectory. Now it is assumed that the interactions can only take place if the energy fits well with the expected ones. Thus the system clearly aims to employ the best trajectory, the one that minimizes its energy. In this manner the Green's function can be written as:

G(x, t; x₀, t₀) = ⟨x| Exp[ −i (P²/2M + q Σ_{k}^{K} U_k(X)) (t − t₀) / ℏ ] |x₀⟩,   (56)

with the potential energy explicitly defined as:

V(X) = q Σ_{k}^{K} U_k(X) = Σ_{k}^{K} qQ_k / |X(t) − X_k(t)|.   (57)

Clearly it is desired to find the optimal trajectory in the sense that, along the space-time history, one must keep, for all available values of k, the requirement |X(t) − X_k(t)| ≈ d. Therefore an external control can be asked to keep this value constant, which in principle might establish a system whose energy is optimized. In this way Eq. (57) acquires a simple form given by:

V(d_k) = Σ_{k}^{K} qQ_k / d_k.   (58)


With this, Eq. (56) can be written as:

G = ⟨x| Exp[ −i (P²/2M + Σ_{k}^{K} qQ_k/d_k) (t − t₀) / ℏ ] |x₀⟩.   (59)

A more convenient form of Eq. (59) is:

G = ⟨x| Exp[ −i E(t − t₀)/ℏ ] Exp[ −i Σ_{k}^{K} qQ_k (t − t₀) / (ℏ d_k) ] |x₀⟩.   (60)

Despite the fact that E can be controlled by an external agent through the tuning of the velocity, the potential energy plays a different role in this respect. The reader should remember that d_k is the desired distance. In order to illustrate the theory presented above, the potential energy for a chain of K charges can be written as:

Σ_{k}^{K} qQ_k / d_k = KqQ / x̂ ≡ W,   (61)

with x̂ the distance operator. Now Eq. (60) is written with the insertion of the identity operator, and it reads:

G = ⟨x| Exp[ −i E(t − t₀)/ℏ ] ∫ dx₁ |x₁⟩⟨x₁| Exp[ −i W(t − t₀)/ℏ ] |x₀⟩
  = ∫ dx₁ ⟨x| Exp[ −i E(t − t₀)/ℏ ] |x₁⟩ ⟨x₁| Exp[ −i W(t − t₀)/ℏ ] |x₀⟩
  = ∫ dx₁ δ(x − x₁) Exp[ −i E(t − t₀)/ℏ ] δ(x₁ − x₀) Exp[ −i W(x₁)(t − t₀)/ℏ ].

It should be noted the usage of the eigenvalue equation x̂|x₁⟩ = x₁|x₁⟩; for this reason W now depends on x₁. With x = x₁ and with the integration over the Dirac delta function one gets:

G = Exp[ −i E(t − t₀)/ℏ ] Exp[ −i KqQ(t − t₀) / (ℏ x₁) ].   (62)

Because of the presence of negative and positive charges, one might expect that the test charge q acquires an oscillatory dynamics along its space-time trajectory. On the other hand, one still has control over the integer number K denoting the effective number of charges. Thus the following approximations are made:

E(t − t₀)/ℏ = β_A Sinθ_A,   (63)
KqQ(t − t₀)/(ℏ x₁) = β_B Sinθ_B.   (64)

With this, Eq. (62) can be rewritten as:

G = Exp[−i β_A Sinθ_A] Exp[−i β_B Sinθ_B] = Σ_ℓ J_ℓ(β_A) Exp(−i ℓθ_A) Σ_r J_r(β_B) Exp(−i rθ_B),   (65)

G = Σ_{ℓ,r} J_ℓ(β_A) J_r(β_B) Exp[−i(ℓθ_A + rθ_B)],   (66)

with J_ℓ and J_r integer-order Bessel functions. The imaginary exponential can be resolved through the assumption ℓθ_A = −rθ_B. Therefore the propagator G can be written as:

G = Σ_{ℓ,r} J_ℓ( E(t − t₀) / (ℏ Sinθ_A) ) J_r( KqQ(t − t₀) / (ℏ x₁ Sinθ_B) ).   (67)

4.1 Numerical Applications

In order to illustrate Eq. (67), it is assumed, for instance from experience, that the kinetic energy is greater than the electric interactions. In the case of a single particle or a few particles crossing bunches of ions, the Coulomb interactions are not enough to stop the particle. Thus one can establish that the electric potential is a fraction λ of the kinetic energy:

KqQ / x₁ = λE,   (68)

and with ℓ/r = −1, and hence θ_A = θ_B, one arrives at:

G = Σ_{ℓ,r} J_ℓ( E(t − t₀) / (ℏ Sinθ_A) ) J_{−ℓ}( λE(t − t₀) / (ℏ Sinθ_A) ).   (69)

Finally, working at small angles with Sinθ_A ≈ θ_A and Sinθ_B ≈ θ_B, one gets:

G = Σ_{ℓ,r} J_ℓ( E(t − t₀) / (ℏ θ_A) ) J_{−ℓ}( λE(t − t₀) / (ℏ θ_A) ).   (70)


Fig. 3. The Green’s function for two values of λ = 0.7 (up) and 0.9 (down), showing the oscillatory behavior of the particle when it crosses a chain of charged particles. Note the decrease of the function in time, indicating the loss of stability of the particle.

In Fig. 3 the behavior of the Green’s function along the time of interaction is displayed. Although the up and down panels, with values of λ = 0.7 and λ = 0.9, exhibit only a small difference, one can see in both cases that the Green’s function displays a decreasing behavior due to the multiple interactions with the chain of charged particles. One can anticipate that for a long time the Green’s function would acquire a negligible value, since the particle has depleted its energy travelling along the bunch of charged particles or ions. The arrows indicate the optimal values of the Green’s function, in the sense that the particle would have to stop at those values of time. The corresponding Mitchell’s criteria can be written below as:


Algorithm: The Mitchell’s Criteria

0  Task: Choice of λ and calculate energies
1  for k = 1 to K ions
2    Define the time of interaction
3    for k = 1 to K Trajectories
4      Performance: usage of the Green’s function
5      if G < 0.5 then
6        Discard trajectory
7        Assess Experience of Calculations Done
8        Go to Line 3
9      endif
10   end for
11 end for
end

Above are listed the main lines of a pseudo code whose central purpose is the calculation of experience through the estimation of the Green’s function. Line 5 reads the comparison of the Green’s function with 0.5, interpreted as the 50% needed to pass from performance to experience. When it fails, the step is discarded in favour of a different trajectory. This is actually the meaning of Fig. 3, where the arrows were imposed at the times when the Green’s function cannot overcome the minimal requirement to continue with the trajectory. In this manner, this requirement needed by Machine Learning can be written as:

|G(x, x₀; t, t₀)| = | ⟨x| Exp[ −i H(t − t₀)/ℏ ] |x₀⟩ | ≥ 0.5.   (71)

On the other hand, one can take the time derivative on both sides, yielding:

i d/dt |G(x, x₀; t, t₀)| = ⟨x| H Exp[ −i H(t − t₀)/ℏ ] |x₀⟩,   (72)

and inserting the closure identity ∫ dx |x⟩⟨x| = I:

i d/dt |G(x, x₀; t, t₀)| = ⟨x| H ∫ dx |x⟩⟨x| Exp[ −i H(t − t₀)/ℏ ] |x₀⟩,   (73)

i d/dt |G(x, x₀; t, t₀)| = ∫ dx ⟨x| H |x⟩ G(x, x₀; t, t₀).   (74)

Thus Eq. (74) expresses the fact that the time evolution of the Green’s function is the convolution of the mean value of the Hamiltonian with the Green’s function. Clearly this integration can increase iteratively.
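The acceptance rule of Eq. (71) and the pseudo code above can be illustrated with a toy Python sketch. The damped oscillatory |G| used below is only a stand-in mimicking the behaviour of Fig. 3, not the Bessel-series expression of Eq. (70); the threshold 0.5 follows line 5 of the pseudo code.

```python
# Toy sketch of the Mitchell-style acceptance rule |G| >= 0.5 used in the pseudo code above.
# The decaying oscillatory |G| below is an illustrative stand-in, not Eq. (70).
import math

def g_magnitude(t, lam=0.9, omega=2.0, tau=5.0):
    """Illustrative |G(t)|: damped oscillation mimicking the behaviour shown in Fig. 3."""
    return abs(math.cos(omega * t) * math.cos(lam * omega * t)) * math.exp(-t / tau)

def accepted_times(times, threshold=0.5):
    """Keep the trajectory only while the criterion holds; stop at the first failure."""
    kept = []
    for t in times:
        if g_magnitude(t) < threshold:        # line 5 of the pseudo code: discard and stop
            break
        kept.append(t)
    return kept

times = [0.1 * k for k in range(100)]
print(accepted_times(times))
```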

5 Conclusion

In this paper the principles of Machine Learning, based on Mitchell’s criteria, were implemented to develop a theory consisting of the search for the optimal trajectory. Thus, in the example presented, in which a charged particle crosses a space-time region, one might have the capability to discard quantum mechanical trajectories [18,19] so that the entering particle avoids depleting its total energy. Although these ideas are theoretical, in a next paper the case of high energy colliders shall be studied.


References 1. Feynman, R.P.: [1942/1948]. In: Brown, L.M (ed.). Feynman’s Thesis: A New Approach to Quantum Theory (2005) 2. Wheeler, J.A., Feynman, R.P.: Classical electrodynamics in terms of direct interparticle action. Rev. Modern Phys. 21(3), 425–433 (1949) 3. Feynman, R.P.: The theory of positrons. Phys. Rev. 76(6), 749–759 (1949) 4. Feynman, R.P.: Theory of Fundamental Processes. Addison Wesley. ISBN 0-80532507-7 (1961) 5. Higgs, P.: Broken symmetries and the masses of gauge bosons. Phys. Rev. Lett. 13(16), 508–509 (1964) 6. Graudenz, D., Spira, M., Zerwas, P.M.: QCD corrections to Higgs-boson production at proton-proton colliders. Phys. Rev. Lett. 70, 1372 (1993) 7. ATLAS Collaboration: Observation of a new particle in the search for the standard model Higgs boson with the ATLAS detector at the LHC. Phy. Lett. B 716, (1), 1–29 (2012) 8. Volkov, D.M.: Zeitschrift fur Physik 94, 250 (1935) 9. Djouadi, A., Spira, M., Zerwas, P.M.: Phys. Lett. B 264, 440 (1991) 10. Spira, M., Djouadi, A., Graudenz, D., Zerwas, P.M.: Nucl. Phys. B 453, 17 (1995) 11. Ravindran, V., Smith, J., van Neerven, W.L.: Nucl. Phys. B 665, 325 (2003) 12. Mitchell, T.M.: Version Spaces: An Approach to Concept Learning, Ph.D. Dissertation. Electrical Engineering Department, Stanford University (1978) 13. Mitchell, T.M.: Machine Learning. McGraw Hill, New York (1997). ISBN 0-07042807-7. OCLC 36417892 14. Mitchell, T.M., Schwenzer, G.M.: Applications of artificial intelligence for chemical inference XXV. a computer program for automated empirical 13C-NMR rule formation. Org. Magnet. Resonan. 11(8), 378–384 (1978) 15. Nieto-Chaupis, H.: Theory of machine learning based on nonrelativistic quantum mechanics. Int. J. Quant. Inf. 19(4), 2141004 (2021) 16. Nieto-Chaupis, H.: The quantum mechanics propagator as the machine learning performance in space-time displacements. In: 2021 IEEE Fourth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE) 17. Sakurai, J.J, Napolitano, J.: Modern Quantum Mechanics. 2nd edn,. Cambridge University Press, Cambridge (2017) 18. Chao-Hua, Yu., Gao, F., Chenghuan Liu, D., Huynh, M.R., Wang, J.: Quantum algorithm for visual tracking. Phys. Rev. A 99 (2019) 19. Griffiths, R.B.: Consistent interpretation of quantum mechanics using quantum trajectories, Phys. Rev. Lett. 70, 2201 (1993) - Published 12 April 1993

A Semi-supervised Vulnerability Management System

Soumyadeep Ghosh(B), Sourojit Bhaduri, Sanjay Kumar, Janu Verma, Yatin Katyal, and Ankur Saraswat

AI Garage, Mastercard, Gurgaon, India
{soumyadeep.ghosh,sourojit.bhaduri,sanjay.kumar3,janu.verma,yatin.katyal,ankur.saraswat}@mastercard.com

Abstract. With the advent of modern network security advancements, computational resources of an organization are always under threat from external entities. Such entities may be represented by hackers or miscreants who might cause significant damage to data and other software or hardware resources of an organization. A Vulnerability is a general way of representing a weakness in the software or hardware resources of the computational infrastructure of an organization. Such vulnerabilities may be either minor software issues, or in some cases may expose vital computational resources of the organization to external threats. The first step is to scan the entire computational infrastructure for such vulnerabilities. Once they are ascertained, a patching process is carried out to mitigate the threats. In order to perform effective mitigation, the most serious vulnerabilities should be given a higher priority. In order to create this priority list, a scoring mechanism is required for all scanned vulnerabilities. We present an end to end deployed vulnerability management system which can score these vulnerabilities using a natural language description of the same.

Keywords: Vulnerability · Classification · Semi-supervised · Self-training · Cybersecurity

1 Introduction

A vulnerability is defined as a weakness in computational logic utilized in hardware or software which, when exploited, may result in the loss of confidentiality, integrity, or availability of any computational infrastructure. Some vulnerabilities may be minor issues in network software or the operating system which can be mitigated by a software or hardware update in most cases. On the other hand, some of them can be exploited by external entities, such as an attacker who can cause serious damage to the computational resources of an organization. The damage or harm caused may be limited to the blocking or malfunctioning of a website or may be as serious as a significant loss of data and computational resources. Large corporations have invested a significant amount of their resources and manpower on detecting and mitigating such vulnerabilities.


Fig. 1. Proposed semi-supervised training algorithm for the vulnerability management system. The events refer to those vulnerabilities which are serious and exploitable in nature and the non-events are others. The events are given by an internal labelling of the data by the organization and the non-events are mined from a global database to increase the pool of labelled data.

The first step of managing such vulnerabilities is to identify them. There are several commercially available tools which may be utilized to identify them. These tools internally scan the entire computer and network infrastructure for possible symptoms of vulnerabilities. These symptoms are then matched with global vulnerability databases (e.g. NIST National Vulnerability Database) and then the detection process is completed. The global vulnerability databases have descriptions of these vulnerabilities in natural language, along with other details of those vulnerabilities. The description of these vulnerabilities in these databases has enough information to train a classifier to score the seriousness of a vulnerability in terms of the damage it may cause to the physical and software infrastructure of an organization. The National Vulnerability Database (NVD) contains several score metrics known as the Common Vulnerability Scoring System (CVSS), in addition to the description of the vulnerability. Such metrics refer to the characteristics and general severity of the vulnerabilities. While these metrics convey useful information about the severity of the vulnerabilities, they may fail to do the same in the context of a specific organization. In such cases one of the options is to train a model on the natural language description of the vulnerabilities from the NVD. Since we are concerned with the context of a


specific organization, this data needs to be labelled manually by subject matter experts of that organization. It is worthwhile to mention that manually labelling data is expensive and time consuming. Thus abundant manually labelled data is difficult to obtain. In such cases, in addition to a small set of labelled data, the rest of the vulnerabilities from the database may be utilized to train the model. This additional data may be incorporated into the training process in a semi-supervised learning paradigm, which is expected to give us an improved model compared to what can be achieved by training on the limited manually labelled data. We propose an AI based solution to a situation where limited labelled vulnerability data is available, so that a robust classifier may be trained for scoring a vulnerability into either a serious or a non-serious external/internal security threat, in the context of a specific organization. Thus, this paper presents an end to end machine learning solution for creating such a solution. We also present a detailed overview of the deployment of this solution and outline critical steps for maintaining and improving the solution over the course of time.

2 Related Work

There has been some work on the analysis of vulnerabilities using AI and network security. The work in [2] proposed a neural network based classification method for classifying vulnerabilities; however, only TF-IDF based features were considered in this study, which showed that the neural network based classifier performed better than Naive-Bayes and SVM models on this task. The work in [4] proposed a text mining based technique to classify network vulnerabilities, although this study was limited to the vulnerabilities reported during the last three years by CERT. The work in [1] proposed a classification method for vulnerabilities by exploiting an underlying graph structure, considering the relationships among them. In [6], a method for attack capability transfer was presented, which can aggregate vulnerabilities with the same exploitation attributes that satisfy some constraints, in order to simplify further analysis. Finally, [3] presented a review of several methods for vulnerability classification.

3 Contributions of This Work

Most of the methods published on this topic have either considered naive methods and features for the classification of vulnerabilities, or have focused on specific types of vulnerabilities. The proposed method is much more general, and we introduce the utilization of modern deep learning advancements for training models for vulnerability classification. The salient contributions of this work are as follows:

1. We present a deep learning based vulnerability classification system. We also present comparisons with a naive logistic regression model and show that the proposed system produces much better results.


2. We present a semi-supervised learning framework, which removes the dependence on limited manually labelled data. This allows our system to work with any organization-dependent labelling of vulnerabilities. Since manually labelling data is time consuming and costly, this method allows the system to produce much more accurate results. Thus we exploit several modern AI advancements in the development of this system, which not only makes it robust and accurate, but also does away with the dependence on limited labelled data.
3. The system not only takes care of the present organization-dependent context, but also has a built-in online learning mechanism. This allows the system to update itself over time by incorporating user feedback.

4 Problem Formulation

We consider a collection of n examples X := {x₁, x₂, . . . , x_l, x_{l+1}, . . . , x_n} with x_i ∈ χ. The first l examples X_L := {x₁, x₂, . . . , x_l} are labeled as Y_L := {y₁, y₂, . . . , y_l}, where y_i ∈ C, a discrete set over c classes, i.e. C := {1, 2, . . . , c}. The remaining examples x_i for i ∈ U := {l + 1, l + 2, . . . , n} are unlabeled. Denote by X_U the unlabeled set; then X is the disjoint union of the two sets, X = X_L ∪ X_U. In supervised learning, we use the labeled examples with their corresponding labels (X_L, Y_L) to train a classifier that learns to predict class-labels for previously unseen examples. The guiding principle of semi-supervised learning is to leverage the unlabeled examples as well to train the classifier.

Supervised Learning: We assume a deep convolutional neural network (DCNN) based classifier trained on the labeled set of examples (X_L, Y_L), which takes an example x_i ∈ χ and outputs a vector of class-label probabilities, i.e. f_θ : χ → R^c, where θ are the parameters of the model. The model is trained by minimizing the supervised loss

L_s(X_L, Y_L, θ) = Σ_{i=1}^{l} l_s(f_θ(x_i), y_i).   (1)

A typical choice for the loss function l_s in classification is the cross-entropy l_s(ŷ, y) = −y log(ŷ). The DCNN can be thought of as the composition of two networks: a feature extraction network which transforms an input example to a vector of features, φ_θ : χ → R^d, and a classification network which maps the feature vector to the class vector. Let v_i := φ_θ(x_i) be the feature vector of x_i. The classification network is usually a fully-connected layer on top of φ_θ. The output of the network for x_i is f_θ(x_i), and the final prediction is the class with the highest probability score, i.e.

ŷ_i := argmax_j (f_θ(x_i))_j.   (2)

A trained classifier (at least the feature generator network) is the starting point of most of the semi-supervised learning techniques, including the studies performed in this work.


Semi-Supervised Learning (SSL): There are two main schools of SSL approaches for classification.

– Consistency Regularization: An additional loss term called the unsupervised loss is added, for either all samples or only the unlabeled ones, which encourages consistency under various transformations of the data:

L_u(X; θ) = Σ_{i=1}^{n} l_u(f_θ(x_i), f_θ(x'_i)),   (3)

where x'_i is a transformation of x_i. A choice for the consistency loss is the squared Euclidean distance.

– Pseudo-labeling: The unlabeled examples are assigned pseudo-labels, thereby expanding the label set to all of X. A model is then trained on this labeled set (X_L ∪ X_U), (Y_L ∪ Ŷ_U) using the supervised loss for the true-labeled examples plus a similar loss for the pseudo-labeled examples:

L_p(X_U, Ŷ_U, θ) = Σ_{i=l+1}^{n} l_s(f_θ(x_i), ŷ_i).   (4)

The current work fits in the realm of the latter school, where we study the effect of iteratively adding pseudo-labeled examples for self-training.

Self Training Using Student-Teacher Models: This class of methods [7] for SSL iteratively uses a trained (teacher) model to pseudo-label a set of unlabeled examples, and then re-trains the model (now a student) on the labelled plus the pseudo-labelled examples. Usually the same model assumes the dual role of the student (as the learner) and the teacher (it generates labels, which are then used by itself as a student for learning). A model f_θ is trained on the labelled data X_L (using the supervised loss, Eq. 1), and is then employed for inference on the unlabeled set X_U. The prediction vectors f_θ(x_i) for all x_i in a subset X'_U ⊂ X_U are converted to one-hot vectors. These examples X'_U, along with their corresponding (pseudo-)labels Ŷ'_U, are added to the original labelled set. This extended labelled set X_L ∪ X'_U is used to train another (student) model f_θ'. This procedure is repeated, and the current student model is used as a teacher in the next phase to get pseudo-labels for training another (student) model f_θ'' on the set X_L ∪ X'_U ∪ X''_U. Now, conventional self-training methods use the entire unlabeled set X_U in every iteration. However, as mentioned above, the most general form of self-training can have different sets of unlabeled data (X'_U, X''_U and so on) in every iteration. The method of selecting X'_U from X_U can come from any utility function, the objective of which would be to use the most appropriate unlabeled data samples in each iteration. Some methods even use weights for each (labelled/unlabeled) data sample, which are updated in every iteration, similar to the process followed in Transductive Semi-Supervised Learning [5] methods, which is borrowed from the traditional concept of boosting used in statistics.
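A schematic Python sketch of such a self-training loop is given below. The model choice, confidence-based selection of X'_U and number of rounds are illustrative assumptions; the paper's actual selection strategy and models differ.

```python
# Schematic self-training (student-teacher) loop as described above; model choice,
# confidence threshold and number of rounds are illustrative assumptions.
import numpy as np
from sklearn.base import clone

def self_train(model, X_lab, y_lab, X_unlab, rounds=3, conf=0.9):
    """X_lab, X_unlab: 2-D numpy feature arrays; y_lab: 1-D label array."""
    X, y = X_lab.copy(), y_lab.copy()
    for _ in range(rounds):
        teacher = clone(model).fit(X, y)
        proba = teacher.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= conf          # select X'_U: confident pseudo-labels
        if not confident.any():
            break
        pseudo_y = teacher.classes_[proba.argmax(axis=1)[confident]]
        X = np.vstack([X, X_unlab[confident]])
        y = np.concatenate([y, pseudo_y])
        X_unlab = X_unlab[~confident]
    return clone(model).fit(X, y)                      # final student on the extended set
```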

5 Algorithm

In this section we illustrate the algorithm that was used for training the vulnerability classification model, as outlined in Fig. 1. The initial model, which is referred to as the baseline model (shown as model 1 in Fig. 1) in the later sections, is built on limited manually labelled training data. The labelling was performed in the context of a specific organization, which we cannot reveal due to legal constraints. This baseline model is then improved by incorporating additional data from the NVD, and the next phase of training is performed using the self-training student-teacher paradigm for semi-supervised learning. The baseline model is used to pseudo-label data from the NVD, and these pseudo-labels are treated as true labels of this data. This helps us increase the amount of labelled training data for training a supervised model. The enriched training data, containing both the original labelled samples as well as the pseudo-labelled examples, is now used to train the next-iteration classifier. Since the training data contains both manually labelled data as well as pseudo-labelled data, our loss function has two components: the supervised loss L_s(X_L, Y_L, θ) using cross-entropy, as defined in Eq. 1, for the hard-labelled examples, and the unsupervised loss L_u(X; θ) using the squared Euclidean distance, as defined in Eq. 3, for the soft-labelled examples. The full model is trained using a loss which is the linear combination of these two losses, i.e.

L_SSL = L_s + λ L_u,   (5)

where λ is a hyperparameter to be chosen. If there is no pseudo-labelled data, the loss term reduces to the supervised loss only. This process is repeated for several iterations, until the model ceases to improve its accuracy on a held-out validation set.
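The combined loss of Eq. (5) can be written, for instance, in PyTorch as in the sketch below. The batch layout (hard-labelled vs pseudo-labelled examples) and the value of λ are assumptions for illustration; this is not the deployed training code.

```python
# Sketch of the combined loss in Eq. (5): cross-entropy on hard-labelled examples plus a
# squared-distance consistency term on soft-labelled ones (lambda is a hyperparameter).
import torch
import torch.nn.functional as F

def ssl_loss(logits_lab, targets_lab, logits_unlab, soft_targets_unlab, lam=0.5):
    l_sup = F.cross_entropy(logits_lab, targets_lab)
    l_unsup = F.mse_loss(F.softmax(logits_unlab, dim=1), soft_targets_unlab)
    return l_sup + lam * l_unsup

# usage: logits come from the classifier; soft targets are the teacher's probability vectors
logits_lab = torch.randn(8, 2)
targets_lab = torch.randint(0, 2, (8,))
logits_unlab = torch.randn(16, 2)
soft_targets = torch.softmax(torch.randn(16, 2), dim=1)
print(ssl_loss(logits_lab, targets_lab, logits_unlab, soft_targets).item())
```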

6 Experiments and Results

6.1 Database

The NVD is the U.S. government repository of standards-based vulnerability management data represented using the Security Content Automation Protocol (SCAP). This data enables automation of vulnerability management, security measurement, and compliance. The NVD includes databases of security checklist references, security-related software flaws, misconfigurations, product names, and impact metrics. This data is collectively called Common Vulnerabilities and Exposures (CVE). Each CVE in the NVD database has a unique id associated with it. This id is called a CVE-ID. Each CVE-ID is accompanied by a textual description of the CVE. A CVE also has a vulnerability score associated with it. This vulnerability score determines the seriousness of the vulnerability. This score is provided by the Common Vulnerability Scoring System (CVSS). CVSS is an open framework

Semi-supervised Vulnerability Management

103

Fig. 2. Baseline accuracies for LSTM (a) ROC curves for training and testing sets and (b) precision-recall curves for train and test sets respectively.

for communicating the characteristics and severity of software vulnerabilities. CVSS consists of three metric groups: Base, Temporal, and Environmental. The Base metrics produce a score ranging from 0 to 10, which can then be modified by scoring the Temporal and Environmental metrics. A CVSS score is also represented as a vector string, a compressed textual representation of the values used to derive the score. Thus, CVSS works as a general measurement system for industries, organizations, and governments that need some vulnerability severity scores. Two common uses of CVSS are calculating the severity of vulnerabilities discovered on one’s systems and as a factor in prioritization of vulnerability remediation activities. The National Vulnerability Database (NVD) provides CVSS scores for almost all known vulnerabilities. The NVD supports both CVSS v2.0 and v3.X standards. NVD provides CVSS ‘base scores’ which represent the innate characteristics of each vulnerability. NVD does not currently provide ‘temporal scores’ (metrics that change over time due to events external to the vulnerability) or ‘environmental scores’ (scores customized to reflect the impact of the vulnerability on your organization). However, the NVD does supply a CVSS calculator for both CVSS v2 and v3 to allow you to add temporal and environmental score data.
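As an illustration of how the CVE descriptions used throughout this work can be obtained, the following is a small sketch that queries the public NVD JSON API for a single CVE-ID. The endpoint and response fields reflect the NVD API 2.0 as we understand it and are an assumption of this sketch, not part of the original system described in the paper.

import requests

def fetch_cve_description(cve_id):
    # Query the public NVD REST API (v2.0) for one CVE record.
    url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
    resp = requests.get(url, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    vuln = resp.json()["vulnerabilities"][0]["cve"]
    # Return the English textual description (the model input in this paper).
    return next(d["value"] for d in vuln["descriptions"] if d["lang"] == "en")

if __name__ == "__main__":
    print(fetch_cve_description("CVE-2021-44228"))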


Fig. 3. Baseline accuracies for logistic regression (a) ROC curves for training and testing sets and (b) precision-recall curves for train and test sets respectively.

6.2 Experiments

In this subsection we outline the experiments that we performed and the protocol followed.

1. Baseline: The baseline experiment is performed on the limited manually labelled training data. This is referred to as model 1 in Fig. 1. We trained logistic regression and bidirectional LSTM classifiers on this data separately.
2. Semi-supervised Model: In this experiment we generate pseudo-labelled data and add it to the limited set of manually labelled data. We used the Word2Vec algorithm for features and trained a bidirectional LSTM model for classification. We iterate over this process multiple times, as explained in the previous section.

6.3 Preprocessing

We use a few text preprocessing algorithms before feeding the CVE description as an input to our model. The following algorithms were used for preprocessing.


Fig. 4. Proposed semi-supervised model results. (a) ROC curves for training and testing sets and (b) precision-recall curves for train and test sets respectively.

Table 1. Logistic regression Kolmogorov-Smirnov test results on test set.

Events | Count | Binned | Lower limit | Upper limit | Non events | Events dist | Non events dist | Events cumm | Non events cumm | KS
131 | 201 | (0.56, 0.878] | 0.56 | 0.88 | 70 | 0.32 | 0.04 | 0.32 | 0.04 | 27.88
111 | 200 | (0.379, 0.56] | 0.38 | 0.56 | 89 | 0.27 | 0.06 | 0.60 | 0.10 | 49.65
81 | 200 | (0.25, 0.379] | 0.25 | 0.38 | 119 | 0.20 | 0.07 | 0.80 | 0.17 | 62.15
35 | 200 | (0.16, 0.25] | 0.16 | 0.25 | 165 | 0.09 | 0.10 | 0.88 | 0.28 | 60.44
19 | 200 | (0.104, 0.16] | 0.10 | 0.16 | 181 | 0.05 | 0.11 | 0.93 | 0.39 | 53.78
16 | 201 | (0.0716, 0.104] | 0.07 | 0.10 | 185 | 0.04 | 0.12 | 0.97 | 0.51 | 46.14
4 | 200 | (0.0409, 0.0716] | 0.04 | 0.07 | 196 | 0.01 | 0.12 | 0.98 | 0.63 | 34.85
0 | 87 | (0.0286, 0.0409] | 0.03 | 0.04 | 87 | 0.00 | 0.05 | 0.98 | 0.68 | 29.41
9 | 514 | (0.01339, 0.0286] | 0.01 | 0.03 | 505 | 0.02 | 0.32 | 1.00 | 1.00 | 0.00

1. Lemmatization: refers to preprocessing with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma. If confronted with the token saw, lemmatization would attempt to return either see or saw depending on whether the token was used as a verb or a noun.
2. Stopword removal: some extremely common words which appear to be of little value in natural language processing tasks are excluded from the vocabulary entirely. These words are called stop words; examples are the, a, an, in. These words are dropped from the corpus.
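A minimal sketch of this preprocessing step is shown below. It assumes NLTK is used for tokenization, stop-word removal and lemmatization, which is an illustrative choice on our part since the paper does not name the library.

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads of the required NLTK resources.
for pkg in ("punkt", "stopwords", "wordnet"):
    nltk.download(pkg, quiet=True)

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(description):
    # Tokenize the CVE description, drop stop words, and lemmatize each token.
    tokens = word_tokenize(description.lower())
    return [lemmatizer.lemmatize(t) for t in tokens if t.isalpha() and t not in stop_words]

print(preprocess("A buffer overflow in the parser allows remote attackers to execute arbitrary code."))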


Table 2. Baseline bidirectional LSTM Kolmogorov-Smirnov test results on test set.

Events | Count | Binned | Lower limit | Upper limit | Non events | Events dist | Non events dist | Events cumm | Non events cumm | KS
101 | 109 | (0.94, 0.999] | 0.94 | 1.00 | 8 | 0.28 | 0.01 | 0.28 | 0.01 | 26.96
89 | 108 | (0.76, 0.94] | 0.76 | 0.94 | 19 | 0.25 | 0.03 | 0.53 | 0.04 | 49.07
73 | 108 | (0.43, 0.76] | 0.43 | 0.76 | 35 | 0.20 | 0.05 | 0.73 | 0.09 | 64.55
49 | 111 | (0.233, 0.43] | 0.23 | 0.43 | 62 | 0.14 | 0.09 | 0.87 | 0.17 | 69.66
25 | 108 | (0.0825, 0.233] | 0.08 | 0.23 | 83 | 0.07 | 0.11 | 0.94 | 0.28 | 65.22
6 | 32 | (0.0367, 0.0825] | 0.04 | 0.08 | 26 | 0.02 | 0.04 | 0.95 | 0.32 | 63.32
11 | 186 | (0.0111, 0.0367] | 0.01 | 0.04 | 175 | 0.03 | 0.24 | 0.98 | 0.56 | 42.37
3 | 109 | (0.001, 0.0111] | 0.00 | 0.01 | 106 | 0.01 | 0.15 | 0.99 | 0.71 | 28.66
2 | 109 | (0.0002, 0.001] | 0.00 | 0.00 | 107 | 0.01 | 0.15 | 1.00 | 0.85 | 14.54

Table 3. Semi-supervised bidirectional LSTM Kolmogorov-Smirnov test results on train set.

Events | Count | Binned | Lower limit | Upper limit | Non events | Events dist | Non events dist | Events cumm | Non events cumm | KS
391 | 392 | (0.99999, 1.0] | 1.0 | 1.00 | 1 | 0.33 | 0.00 | 0.33 | 0.00 | 33.24
391 | 391 | (0.99991, 0.99999] | 1.0 | 1.00 | 0 | 0.33 | 0.00 | 0.67 | 0.00 | 66.52
379 | 392 | (0.84471, 0.99991] | 0.8 | 1.00 | 13 | 0.32 | 0.00 | 0.99 | 0.01 | 98.30
14 | 390 | (6.6966e−05, 0.84471] | 0.0 | 0.84 | 376 | 0.01 | 0.14 | 1.00 | 0.14 | 85.77
0 | 392 | (6.9737e−06, 6.6966e−05] | 0.0 | 0.00 | 392 | 0.00 | 0.14 | 1.00 | 0.29 | 71.47
0 | 391 | (1.1623e−06, 6.9737e−06] | 0.0 | 0.00 | 391 | 0.00 | 0.14 | 1.00 | 0.43 | 57.21
0 | 389 | (2.0862e−07, 1.1623e−06] | 0.0 | 0.00 | 389 | 0.00 | 0.14 | 1.00 | 0.57 | 43.01
0 | 296 | (8.9407e−08, 2.0862e−07] | 0.0 | 0.00 | 296 | 0.00 | 0.11 | 1.00 | 0.68 | 32.21
0 | 883 | (−1e−05, 8.9407e−08] | 0.0 | 0.00 | 883 | 0.00 | 0.32 | 1.00 | 1.00 | 0.00

6.4 Results

We performed experiments on three different models. For the first model, where tf-idf features were created and a logistic regression model was fit on the data, the train AUC, accuracy, precision and recall values are as follows. The train AUC is 0.96 and the train accuracy is 93%. The precision and recall values for non-events (no threat) are 0.98 and 0.93, respectively; for events (threats) they are 0.72 and 0.92, respectively. The test AUC, accuracy, precision and recall values are displayed in Table 5. The test AUC is 0.87 and the test accuracy is 83%. The precision and recall values for non-events are 0.95 and 0.86, respectively; for events they are 0.39 and 0.65, respectively. The confusion matrix on the test set is presented in Table 6. Out of the total of 242 events, the algorithm correctly predicted 158. The Kolmogorov-Smirnov (KS) statistic on the test set is 62.15, as presented in Table 1. The ROC and precision-recall curves for both the train and test sets are presented in Fig. 3.

For the second model, where word2vec embeddings are fed as input to a bidirectional LSTM, the train AUC, accuracy, precision and recall values are displayed in Table 7. The train AUC is 0.96 and the train accuracy is 89%. The precision and recall values for non-events are 0.94 and 0.90, respectively; for events they are 0.78 and 0.86, respectively. The test AUC, accuracy, precision and recall values are displayed in Table 8. The test AUC is 0.91 and the test accuracy is 85%. The precision and recall values for non-events are 0.93 and 0.86, respectively; for events they are 0.7 and 0.82, respectively. The confusion matrix on the test set is presented in Table 9. Out of 306 events, the algorithm correctly predicted 252. The KS statistic on the test set is 69.66, as presented in Table 2. The ROC and precision-recall curves for both the train and test sets are presented in Fig. 2.

Table 4. Semi-supervised bidirectional LSTM Kolmogorov-Smirnov test results on test set.

Events | Count | Binned | Lower limit | Upper limit | Non events | Events dist | Non events dist | Events cumm | Non events cumm | KS
97 | 97 | (0.99999, 1.0] | 1.00 | 1.00 | 0 | 0.33 | 0.00 | 0.33 | 0.00 | 32.99
93 | 99 | (0.99958, 0.99999] | 1.00 | 1.00 | 6 | 0.32 | 0.01 | 0.65 | 0.01 | 63.75
80 | 98 | (0.30863, 0.99958] | 0.31 | 1.00 | 18 | 0.27 | 0.03 | 0.92 | 0.03 | 88.34
19 | 98 | (0.000276, 0.30863] | 0.00 | 0.31 | 79 | 0.06 | 0.12 | 0.98 | 0.15 | 83.28
3 | 98 | (1.3262e−05, 0.00027654] | 0.00 | 0.00 | 95 | 0.01 | 0.14 | 0.99 | 0.29 | 70.46
2 | 98 | (1.1802e−06, 1.3262e−05] | 0.00 | 0.00 | 96 | 0.01 | 0.14 | 1.00 | 0.43 | 57.14
0 | 96 | (2.3842e−07, 1.1802e−06] | 0.00 | 0.00 | 96 | 0.00 | 0.14 | 1.00 | 0.57 | 43.15
0 | 94 | (5.9605e−08, 2.3842e−07] | 0.00 | 0.00 | 94 | 0.00 | 0.14 | 1.00 | 0.71 | 29.45
0 | 202 | (−1e−05, 5.9605e−08] | 0.00 | 0.00 | 202 | 0.00 | 0.29 | 1.00 | 1.00 | 0.00

Table 5. Logistic regression testing results. This table shows the AUC, accuracy, precision and recall values on the test set.

Test AUC - 0.87 | Test accuracy - 83% | Threshold - 0.5
Testing | Precision | Recall
Non event | 0.95 | 0.86
Event | 0.39 | 0.65

For the third model, where word2vec embeddings are fed as input to a bidirectional LSTM and a semi-supervised training approach is incorporated, the train AUC, accuracy, precision and recall values are displayed in Table 10. The train AUC is 0.99 and the train accuracy is 99%. The precision and recall values for non-events are 0.99 and 1, respectively; for events they are 0.99 and 0.99, respectively. The KS statistic on the train set is 98.30, as presented in Table 3. The test AUC, accuracy, precision and recall values are displayed in Table 11. The test AUC is 0.98 and the test accuracy is 95%. The precision and recall values for non-events are 0.97 and 0.96, respectively; for events they are 0.90 and 0.94, respectively. The confusion matrix on the test set is presented in Table 12. Out of 284 events, the algorithm correctly predicted 266. The KS statistic on the test set is 88.34, as presented in Table 4. The ROC and precision-recall curves for both the train and test sets are presented in Fig. 4.

7 Analysis of Results

In this section we analyse the results obtained above.

7.1 Improvement in Results

It may be observed that the semi-supervised model outperforms the baseline models (both the logistic regression and the LSTM models). If we compare Tables 10, 11 and 12 with Tables 5, 6, 7, 8 and 9, we can see that we achieve better accuracies with the semi-supervised model than with the baseline models. This is due to the added training data obtained through the self-training based pseudo-labelling process.

Table 6. Logistic regression confusion matrix on the test set.

Test confusion matrix | True non event | True event
Predicted non event | 1513 | 84
Predicted event | 248 | 158

Table 7. Baseline bidirectional LSTM training results. This table shows the AUC, accuracy, precision and recall values on the train set.

Train AUC - 0.96 | Train accuracy - 89% | Threshold - 0.5
Training | Precision | Recall
Non event | 0.94 | 0.90
Event | 0.78 | 0.86

7.2 Word2Vec Embedding Space

The word2vec model was trained using the vulnerability dataset. We tried to visualise the words in our dataset in the embedding space. Every word in the CVE description is assigned a vector, which is an embedding of the word. In this embedding space, words that occur together in a context (or have a similar meaning) should be close to each other.

We analyze the closest words in the embedding space to openssl. OpenSSL is a toolkit for the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols, as well as a general-purpose cryptography library. The top words similar to openssl in the embedding space include moduli, nntplib, sapscore, cospace, poplib and DCB. Moduli is a file that is a component of the OpenSSH library; it contains prime numbers and generators for use by sshd in the Diffie-Hellman Group Exchange key exchange method. The nntplib module defines the class NNTP, which implements the client side of the Network News Transfer Protocol and can be used to implement a news reader or poster, or automated news processors. Sapscore is a SAP product that is affected by vulnerabilities that exploit authorization checks. Cospace is a component of the Cisco Meeting Server. WEBCUIF is the SAP Web UI Framework. The poplib module defines a class, POP3, which encapsulates a connection to a POP3 server and implements the protocol. Data center bridging (DCB) is a collection of standards developed by a task force within the Institute of Electrical and Electronics Engineers (IEEE) 802.1 Working Group to create a converged data center network infrastructure using Ethernet as the unified fabric. This cluster of embeddings mainly comprises network protocols and authentication checks, and as a result these embeddings are close together.

We analyze the closest words in the embedding space to ImageMagick. ImageMagick is a free and open-source cross-platform software suite for displaying, creating, converting, modifying, and editing raster images. The top words similar to ImageMagick in the embedding space include graphicsmagick, libtiff, ffmpeg, jasper, libdwarf, pillow, tcpdump, libarchive and potrace. GraphicsMagick is a fork of ImageMagick, emphasizing stability of both the programming API and the command-line options; it is now a separate utility from ImageMagick. LibTIFF is a library used for reading and writing TIFF images on 32-bit and 64-bit machines. FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video. JasPer is a collection of software (i.e. a library and application programs) for the coding and manipulation of images. Libdwarf is a library and a set of command-line tools for reading and writing DWARF2 and later debugging information; it handles the details of the actual format so coders can focus on the content. The Python Imaging Library (PIL) is an image processing package for the Python language, and Pillow is a fork of PIL. Tcpdump is a data-network packet analyzer library that runs under a command-line interface. Libarchive is a multi-format archive and compression library, which is also useful for compressing images. Potrace is a tool for tracing a bitmap, i.e. transforming a bitmap into a smooth, scalable image; the input is a bitmap (PBM, PGM, PPM, or BMP format), and the output is one of several vector file formats. A typical use is to create SVG or PDF files from scanned data, such as company or university logos and handwritten notes. This cluster of embeddings mainly comprises image processing tools, and it is evident that these are close together.

Table 8. Baseline bidirectional LSTM testing results. This table shows the AUC, accuracy, precision and recall values on the test set.

Test AUC - 0.91 | Test accuracy - 85% | Threshold - 0.5
Testing | Precision | Recall
Non event | 0.93 | 0.86
Event | 0.7 | 0.82

Table 9. Baseline bidirectional LSTM confusion matrix on the test set.

Test confusion matrix | True non event | True event
Predicted non event | 675 | 54
Predicted event | 108 | 252


Table 10. Semi-supervised bidirectional LSTM training results. This table shows the AUC, accuracy, precision and recall values on the train set.

Train AUC - 0.99 | Train accuracy - 99% | Threshold - 0.5
Training | Precision | Recall
Non event | 0.99 | 1
Event | 0.99 | 0.99

Table 11. Semi-supervised bidirectional LSTM testing results. This table shows the AUC, accuracy, precision and recall values on the test set.

Test AUC - 0.98 | Test accuracy - 95% | Threshold - 0.5
Testing | Precision | Recall
Non event | 0.97 | 0.96
Event | 0.90 | 0.94

We analyze the closest words in the embedding space to cipher. Ciphers, also called encryption algorithms, are systems for encrypting and decrypting data. The top words similar to cipher in the embedding space are bleichenbacher, pasting, encrypting, encryptupdate, xattr and readablestream. The Bleichenbacher attack is applicable when the key exchange takes place using the RSA algorithm and the padding used is PKCS#1 v1.5. In pasting, also called cut-and-paste, one part of a ciphertext is replaced by another ciphertext with known (or at least, known legible) plaintext, so that the resulting message has a different meaning to the receiver of the encrypted message; it should be avoided by using authenticated encryption. Encrypting, or encryption, is the process of encoding information; it converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Encryptupdate is a module in SafeNet ProtectToolkit that is used to encrypt data. Xattr refers to extended attributes: name-value pairs associated permanently with files and directories, similar to the environment strings associated with a process. If a Linux kernel does not perform bounds checking on xattr in some situation, a local attacker could possibly use this to expose sensitive information (kernel memory). ReadableStream is an interface of the Streams API that represents a readable stream of byte data. This cluster of embeddings mainly comprises cipher and encryption methods, and these words are therefore close together in the embedding space.
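The kind of nearest-neighbour inspection described above can be reproduced with a few lines of code. The sketch below assumes the gensim library and a list of pre-tokenized CVE descriptions; neither is specified in the paper, so treat it as an illustration rather than the authors' implementation.

from gensim.models import Word2Vec

# Each element is one tokenized, preprocessed CVE description (stand-in data).
corpus = [
    ["openssl", "allows", "remote", "attackers", "bypass", "cipher", "checks"],
    ["imagemagick", "buffer", "overflow", "crafted", "tiff", "image"],
    ["libtiff", "heap", "overflow", "crafted", "image", "denial", "service"],
    ["cipher", "padding", "oracle", "attack", "rsa", "key", "exchange"],
]

# Train a small skip-gram Word2Vec model on the CVE descriptions.
model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

# Inspect the embedding neighbourhood of a term of interest.
for word, score in model.wv.most_similar("cipher", topn=5):
    print(f"{word}\t{score:.3f}")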

8 Implementation Details

The manually labelled train dataset contains 2726 samples and the test dataset contains 1363 samples. This test dataset is kept constant across all iterations of the semi-supervised learning process. The models are implemented in the TensorFlow 2.4 framework. The Word2vec + LSTM model is optimized using the Adam optimizer with an initial learning rate of 1e−3, reducing the learning rate by a factor of 10 when the test loss has stopped improving for 3 consecutive epochs. The model is trained for 50 epochs with a batch size of 32. During pseudo-labelling, we sort the pseudo-labelled data by the model's confidence for non-events (less serious vulnerabilities), select the top 1000 non-events in every iteration, and repeat the process for 4 iterations. After that, we could not observe any further improvement in accuracy, so we terminate the process at that point.

Table 12. Semi-supervised bidirectional LSTM confusion matrix on the test set.

Test confusion matrix | True non event | True event
Predicted non event | 668 | 18
Predicted event | 28 | 266
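A minimal sketch of the training configuration described in this section is given below. The embedding dimension, sequence length and layer sizes are not stated in the paper, so the values used here are placeholders, and the ReduceLROnPlateau callback is our reading of the "reduce by a factor of 10 after 3 epochs without improvement" rule.

import tensorflow as tf

MAX_LEN, EMBED_DIM, VOCAB = 200, 100, 20000   # placeholder hyperparameters

def build_bilstm():
    # Bidirectional LSTM over word2vec-style embeddings, binary output.
    inputs = tf.keras.Input(shape=(MAX_LEN,))
    x = tf.keras.layers.Embedding(VOCAB, EMBED_DIM)(inputs)   # could be initialized with word2vec weights
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="binary_crossentropy",
        metrics=["AUC", "accuracy"],
    )
    return model

# Reduce the learning rate by a factor of 10 when the validation loss
# has not improved for 3 consecutive epochs, as described above.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=3)

model = build_bilstm()
# model.fit(X_train, y_train, validation_data=(X_test, y_test),
#           epochs=50, batch_size=32, callbacks=[reduce_lr])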

9 Model Deployment

The model is deployed as a REST API service. The client side can issue a POST request with the CVE-IDs as the parameter. The server fetches the description of each CVE from a locally stored copy of the NVD database. The model then scores each CVE after pre-processing the input features (the description of the CVE) and returns a CSV file containing the CVE-IDs and their respective scores. Since the pre-processing and model training code is developed in Python 3.7, the API is developed in the Django 3.2.7 framework for code re-usability.
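The following is a hedged sketch of what such an endpoint could look like. The view name, request field, CSV layout and the two helper stubs are illustrative assumptions, not the authors' actual Django code.

import csv
import io
from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt

def lookup_description(cve_id):
    # Stand-in for a query against the locally stored copy of the NVD database.
    return "placeholder description for " + cve_id

def model_score(description):
    # Stand-in for preprocessing plus the trained BiLSTM model.
    return 0.5

@csrf_exempt
def score_cves(request):
    # Expects a POST request with a comma-separated list of CVE IDs in the "cve_ids" field.
    if request.method != "POST":
        return JsonResponse({"error": "POST required"}, status=405)
    cve_ids = [c.strip() for c in request.POST.get("cve_ids", "").split(",") if c.strip()]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["cve_id", "score"])
    for cve_id in cve_ids:
        writer.writerow([cve_id, f"{model_score(lookup_description(cve_id)):.4f}"])
    response = HttpResponse(buffer.getvalue(), content_type="text/csv")
    response["Content-Disposition"] = "attachment; filename=cve_scores.csv"
    return response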

9.1 Architecture

Figure 5 illustrates the deployed system, which has three major parts: 1) the local NVD database, 2) the dashboard, and 3) online learning.

Local NVD Database. The system maintains a local copy of the NVD database in a relational database containing the CVE-IDs and their descriptions. The NVD database is updated every two hours with new vulnerabilities, so the system updates the local database every day.

Dashboard. A web app is designed for the input data pipeline. The user can provide a comma-separated list of CVE-IDs or upload a CSV file containing the list of CVE-IDs. The system returns a CSV file containing the CVE-IDs and their respective scores in descending order of score. The system also shows a dashboard of the CVE-IDs in decreasing order of score and allows the vulnerability management team to collaborate and distribute patching work among team members. The dashboard also allows the vulnerability management team to flag a predicted vulnerability score as incorrect, and this feedback label is used for online learning of the model.


Online Learning. From the dashboard, the vulnerability management team can supply a list of CVE-IDs that were incorrectly classified. The system stores these CVE-IDs in a database and uses them for online learning. Once we have collected a sufficient amount of user feedback, we do two things: 1) re-label the data if the user feedback contains any CVE-ID that overlaps with the training data, and 2) update the model by retraining it on the extra data obtained. This way, we are able to automatically improve the model over time without any manual intervention. We also generate an internal report to account for the user feedback, which gives us an idea of the performance of our vulnerability management system.

Fig. 5. System architecture

10 Conclusion

In this paper we presented a vulnerability management system which takes the natural language description of a vulnerability and outputs a score reflecting the seriousness of the issue and its exploitability by an external actor. After the vulnerabilities are ascertained by internal scanning of the computational infrastructure of the organization, our system takes in the CVE-IDs of the vulnerabilities returned by the scan. The system then runs them through the deep learning model and returns the scores to the user on the web app dashboard that we have deployed. Our deployed system also contains the NVD database and updates it from time to time to reflect the new vulnerabilities that are regularly added to the NVD. In addition, the system contains a module to incorporate user feedback on the scores, so that the model can be updated automatically without any manual intervention. In the future we will explore opportunities to deploy the system in the real world and assess its performance, and also distribute the system to other organizations, helping them prioritize vulnerability patching to protect their computing infrastructure from external entities that may harm it.

References

1. Bagga, K.S., Beineke, L.W., Pippert, R.E., Lipman, M.J.: A classification scheme for vulnerability and reliability parameters of graphs. Math. Comput. Model. 17(11), 13–16 (1993)
2. Huang, G., Li, Y., Wang, Q., Ren, J., Cheng, Y., Zhao, X.: Automatic classification method for software vulnerability based on deep neural network. IEEE Access 7, 28291–28298 (2019)
3. Jin, S., Wang, Y., Cui, X., Yun, X.: A review of classification methods for network vulnerability. In: IEEE International Conference on Systems, Man and Cybernetics, pp. 1171–1175. IEEE (2009)
4. Liu, C., Li, J., Chen, X.: Network vulnerability analysis using text mining. In: Asian Conference on Intelligent Information and Database Systems, pp. 274–283 (2012)
5. Shi, W., Gong, Y., Ding, C., Tao, Z.M., Zheng, N.: Transductive semi-supervised deep learning using min-max features. In: European Conference on Computer Vision (ECCV), pp. 299–315 (2018)
6. Wang, Y., Yun, X., Zhang, Y., Jin, S., Qiao, Y.: Research of network vulnerability analysis based on attack capability transfer. In: 2012 IEEE 12th International Conference on Computer and Information Technology, pp. 38–44 (2012)
7. Xie, Q., Luong, M.-T., Hovy, E., Le, Q.V.: Self-training with noisy student improves imagenet classification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10687–10698 (2020)

Evaluation of Deep Learning Techniques in Human Activity Recognition

Tiago Mendes2 and Nuno Pombo1,2(B)

1 Instituto de Telecomunicações, Covilhã, Portugal
2 Universidade da Beira Interior, Covilhã, Portugal
[email protected]

Abstract. The increasing acceptance of Internet of Things (IoT) devices contributes to the accumulation of massive amounts of data, which calls for the adoption of techniques capable of handling it. This paper first presents an overview of deep learning and IoT principles. After that, it uses a human activity recognition scenario to evaluate two DL models: the Convolutional Neural Network (CNN) and the Recurrent Neural Network. Finally, a benchmark against the state-of-the-art is presented. The main findings evidence the suitability of the proposed model; the CNN achieved a mean accuracy rate of 93%, and is therefore likely to be suitable for embedding in an IoT device. There is room for improvement, namely the ability to recognize additional human activities and the inclusion of more testing scenarios.

Keywords: Deep learning · Internet of things · Human activity recognition · Prediction · Classification

1 Introduction

1.1 Motivation

IoT devices have seen rapid growth in recent years across a plethora of applications and domains, usually under the umbrella of smart or intelligent systems (e.g. smart home, smart city, smart agriculture, smart industry, smart energy, ...). The increasing sophistication of IoT solutions raises challenges in multiple dimensions such as security, reliability, efficiency, and data analysis [9,24], just to mention a few. These devices contribute to the accumulation of massive amounts of data, which complicates their analysis and the subsequent knowledge inference [17]. Hence, the inclusion of deep learning (DL) technologies becomes mandatory to address the impact of the huge and heterogeneous datasets generated by IoT devices [16,24].

1.2 Objectives

The main objective of this project is to achieve results that help practitioners monitor their patients, or people monitor the lives of relatives in need, through the implementation of DL and IoT for human activity recognition.


The contributions of this paper are summarized below:

– We identified and highlighted the key concepts of DL and IoT.
– We designed a use case scenario, activity recognition, where DL and IoT could be a potential solution.
– We evaluated and analysed the performance of the presented system.
– We benchmarked the main findings against the state-of-the-art.

This paper is structured as follows: Sect. 2 introduces DL and IoT. Section 3 provides the proposed model and details the key parts of the system, while Sect. 4 presents and discusses the experimental results and the performance evaluation. Finally, Sect. 5 concludes the present study.

2 Background

2.1 Deep Learning

In recent years, DL has attracted many researchers and organizations, compared to traditional machine learning approaches [19]. Among the reasons for this recognition of DL are its ability to learn more abstract features and to handle large amounts of data, along with its support for transfer learning and its accuracy [20]. Moreover, DL may be considered bio-inspired, since it consists of many layers of artificial neural networks containing activation functions that can produce non-linear outputs, as happens with the neuron structure of the human brain.

2.2 Deep Learning Architectures

DL is a subset of machine learning which may accommodate four learning techniques: (1) supervised, (2) unsupervised, (3) semi-supervised, and (4) reinforcement learning. Supervised learning requires that the data used to train the architecture is fully labelled. On the contrary, the absence of labelled data may lead to the adoption of unsupervised learning models, which aim to come up with a structure by extracting useful information [17]. In addition, when the data contains both labelled and unlabelled samples, semi-supervised techniques gain traction. Finally, reinforcement learning offers the ability to learn from the environment (by means of actions and rewards), enabling the model to act and behave smartly. On the one hand, supervised learning methods are generally supported by discriminative models, whereas generative models are adequate for unsupervised learning approaches [13]. In other words, generative models are able to separate one class from another in a dataset using probability estimates and likelihood; thus, the presence of outliers in the dataset is prone to reduce the model's accuracy. On the other hand, discriminative models use conditional probability to make predictions and to learn boundaries among classes in a dataset. Congruently, hybrid approaches combine discriminative and generative models.


– Convolutional Neural Network (CNN): a type of Deep Neural Network (DNN) which incorporates the back-propagation algorithm for learning the receptive fields of simple units [13]. The CNN is explained in detail in the Methods section.
– Deep Belief Network (DBN): also a type of DNN, which incorporates multiple layers of hidden units. In this model there are connections between the layers but not between the units within each layer [22].
– Long Short-Term Memory (LSTM): incorporates memory blocks in the recurrent hidden layer. These memory cells are capable of storing the temporal states of the network, which, together with the gates, offers the ability to control the flow of information. The LSTM is further detailed in the Methods section.
– Recurrent Neural Network (RNN): an extension of the Feed Forward Neural Network (FFNN). The recurrence lies in the fact that the RNN performs the same task for each element of a sequence, where the output depends on previous computations [18]. Complementary information on the RNN is provided in the Methods section.

2.3 Internet of Things

The IoT enables smart environments as a result of the linkage and connections among billions of different objects over the Internet. The past few decades have witnessed the prosperity and development of the IoT, in which numerous related studies and concepts have emerged one after another. The IoT therefore encompasses heterogeneous platforms and devices and is present in a multitude of scenarios such as education, industry, agriculture, healthcare, transportation, smart homes, and smart cities. As depicted in Fig. 1, the authors in [7] presented multiple features required for a system or thing to be considered an IoT device, including: interconnection, programmability, self-configurability, connection to the Internet, embedded intelligence, unique identifier, interoperability, sensing/actuation, and ubiquity. Furthermore, the correlation between IoT and DL has been broadly addressed by both researchers and practitioners. The authors in [13] surveyed the application of DL to different IoT domains, describing analytic models to infer knowledge from the processed data. In addition, the authors of [25] studied the use of DL in mobile and wireless environments. In [26] the authors reviewed the literature covering DL techniques for big data feature learning, highlighting several techniques such as the DBN, CNN, and RNN. In [5], the authors provided a survey focused on the use of DL on smart city data, presenting multiple scenarios and use cases.

3 Methods

Our experiments are focused on human activity recognition by means of DL principles. In line with this, a data handler was programmed in the Python language. This program implements a three-step algorithm on the original dataset (as described in the Dataset subsection) which includes: loading, labelling, and segmentation into train and test sets. The rationale for this approach lies in the fact that this study focuses merely on a subset of the information existing in the original dataset, namely, the part related to either the accelerometer or the gyroscope. Since these sensors are present in every modern smartphone, a scenario which includes their data is expected to be adequate to simulate and evaluate a ubiquitous computerised model for human activity recognition.

Fig. 1. Suggested features of an IoT device. Adapted from [7].

3.1 Deep Learning Models

Our experiments focus on the application of two DL models, namely the CNN and the RNN. Since the CNN is capable of automatically learning features from sequence data and supports multi-variate data, it may represent a feasible method for time series problems. In fact, the CNN has been increasingly adopted for image classification [21], clustering [14], and object recognition within images [10], [23]. In addition, its classification ability paved the way for it to be applied to labeling data [11] and knowledge discovery [28]. The CNN is a common neural network architecture which learns features by applying a convolution operation to the embedded input representation. Thus, given a sentence S = {w1, w2, ..., wn}, the embedded input representation is Z = {z1, z2, ..., zn}. The CNN contains a set of convolution filters applied to m continuous words to generate a new feature based on m-grams. For a given window of words zi:i+m−1, a new feature ci is obtained by a convolution filter defined as follows [27]:

ci = f(W · zi:i+m−1 + b)    (1)

where b is a bias parameter and f is a nonlinear function. In line with this, a feature map c = {c1, c2, ..., cn−m+1} is generated through the sequential application of the convolution filter to the windows {z1:m, z2:m+1, ..., zn−m+1:n}. In this study the ReLU [4] activation function was adopted, and the CNN was implemented using three fully-connected layers, which contain two convolutional layers and one pooling layer. The outputs produced by the first convolutional layer represent the input of the pooling layer; congruently, the output generated by this layer represents the input of the subsequent convolutional layer. As a result, a feature vector is generated, which represents the cornerstone for the classification.

The RNN is a temporal neural network that captures long sequences of inputs using internal memory [27]. In this study, the RNN was implemented following the LSTM model. Its architecture may be described by a cell state ck, which serves as memory and information carrier. The information to add to or remove from the cell state is carefully regulated by structures called gates, namely the forget gate fk, the input gate ik, and the output gate ok, at time k. Let hk be the hidden state at time k; then:

fk = σ(W^f · [hk−1, ik] + b^f)    (2)
ik = σ(W^i · [hk−1, ik] + b^i)    (3)
ck = fk ⊙ ck−1 + ik ⊙ tanh(W^c · [hk−1, ik] + b^c)    (4)
ok = σ(W^o · [hk−1, ik] + b^o)    (5)
hk = ok ⊙ tanh(ck)    (6)

W and b are learnable parameters, where the superscripts f, i, and o denote the corresponding gates [8], which are usually composed of a sigmoid neural network layer. The symbol ⊙ denotes element-wise multiplication. Finally, we implemented the hyperbolic tangent as the activation function in the non-linear neural network layers. The notion behind the LSTM is that it uses a set of gates (input, forget, output) to supervise access to the memory cells, protecting them from perturbation by unrelated inputs [3]. In addition, the RNN uses loops in its connection layer and extracts features from inputs stored earlier; outputs are then obtained through the recurrent connections existing between hidden layers. Compared with the CNN, the main difference is the ability to process temporal information, i.e. data that comes in sequences.
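The following sketch shows how the two architectures described above could be expressed in Keras for windowed accelerometer/gyroscope signals. The window length, number of channels and layer widths are placeholders chosen for illustration, since the paper does not report them.

import tensorflow as tf

WINDOW, CHANNELS, N_CLASSES = 128, 6, 6   # placeholder input shape (accel + gyro) and the 6 activities

def build_cnn():
    # Convolutional layers with ReLU, a pooling layer, and fully-connected layers,
    # in the spirit of the CNN described in Sect. 3.1.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(WINDOW, CHANNELS)),
        tf.keras.layers.Conv1D(64, 5, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 5, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
    ])

def build_rnn_lstm():
    # Recurrent model with LSTM memory cells over the same windows.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(WINDOW, CHANNELS)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
    ])

for model in (build_cnn(), build_rnn_lstm()):
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])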

3.2 Dataset

Both DL models were coded in Python using the TensorFlow and Keras libraries. The UCI Human Activity Recognition Using Smartphones Data Set [2] (https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones) was used in our experiments. This dataset presents six different classes of daily human activities: sitting, laying, standing, walking, and walking downstairs or upstairs. Two sensor signals were used: the accelerometer and the gyroscope. The dataset was resized to 10,000 samples, of which 70%, i.e. 7,000 samples, composed the training set, and 3,000 samples were selected as the test set.
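A minimal sketch of the resizing and 70/30 split described above is given below; the arrays are stand-ins, since in practice they would be read from the HAR archive files, which actually ship with their own train/test partition.

import numpy as np
from sklearn.model_selection import train_test_split

# X holds windowed accelerometer + gyroscope signals, y the activity labels (0-5).
X = np.random.rand(10000, 128, 6).astype("float32")
y = np.random.randint(0, 6, size=10000)

# Split the 10,000 samples 70% / 30% into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)   # (7000, 128, 6) (3000, 128, 6)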

4 Results and Discussion

During our experiments, the number of epochs, the initial batch size, and the number of neurons were set to 30, 16, and 32, respectively. The criteria for tuning the number of neurons may rely on a multitude of features such as the size and nature of the data sets, the data types, the validation methods, and the accuracy. Several different tests were made, in which the most successful were the ones where the number of epochs increased significantly. In addition, in each iteration we monitored the processing time and the accuracy.

Table 1 shows the confusion matrix for the CNN prediction segmented by classes: sitting (S), laying (L), standing (ST), walking (W), and walking downstairs (WD) or upstairs (WU). Congruently, Table 2 presents the confusion matrix for the RNN+LSTM prediction. On the one hand, as presented in Table 1, sitting is recognized correctly 82% of the time, laying 98%, standing 94%, walking 94%, walking downstairs 98%, and walking upstairs 90%. On the other hand, as presented in Table 2, sitting is recognized correctly 86% of the time, laying 95%, standing 77%, walking 92%, walking downstairs 41%, and walking upstairs 94%. Therefore, the proposed system performs well, with accuracy rates ranging between 82% and 98%, except for the recognition of walking downstairs (41%) and standing (77%) using the RNN+LSTM prediction.

Table 1. Confusion matrix for the CNN prediction, over the classes S, L, ST, W, WD and WU. [The individual cell values could not be reliably recovered from the extracted layout; the per-class accuracies are reported in the text above.]

In addition, Fig. 2 provides the performance benchmark between the CNN and RNN+LSTM predictions.

Table 2. Confusion matrix for the RNN+LSTM prediction.

Activity | S | L | ST | W | WD | WU
S | 422 | 7 | 62 | 0 | 0 | 0
L | 0 | 510 | 12 | 0 | 0 | 15
ST | 121 | 0 | 407 | 4 | 0 | 0
W | 0 | 0 | 2 | 455 | 19 | 20
WD | 0 | 0 | 0 | 0 | 173 | 247
WU | 0 | 0 | 0 | 12 | 15 | 444

Fig. 2. System accuracy: comparison between CNN and RNN+LSTM.

We compared our system with four proposals identified in the literature. The authors in [12] proposed to recognize human activities using acceleration data generated by a mobile phone; their system was evaluated under real-world conditions, reaching an overall accuracy of 91%. In addition, the authors in [6] evaluated their system on multiple datasets, obtaining an accuracy of 98%. Moreover, in [15] the authors obtained an overall accuracy of 92%. Finally, in [12] the authors proposed a model based on three accelerometers and two microphones placed on different body locations, obtaining an accuracy of 84%. In line with this, the CNN prediction, with a mean accuracy rate of 93%, outperforms the observed studies with the exception of [6]. Conversely, the RNN+LSTM prediction, with a mean accuracy rate of 81%, underperforms the observed literature. Note that the purpose of these experiments is not to obtain a system that outperforms the human activity recognition state-of-the-art, but to show that the proposed computerised model may improve the recognition performance with respect to the baseline. In addition, the proposed model was designed keeping in mind its suitability for being coupled with an IoT device. Despite its contributions, this study has certain limitations. On the one hand, this study limits its scope to DL only and does not discuss traditional machine learning algorithms with respect to IoT. On the other hand, the benchmark of the proposed system against the state-of-the-art is not representative; this comparison must be interpreted with some caution, since the included studies differ in their methods and experimental settings. The main goal of this study was nevertheless achieved, since our experiments revealed the suitability of the proposed method for human activity recognition.

5 Conclusions

The goal of this work is to propose and evaluate a computerised system based on DL and IoT capable of supporting practitioners in monitoring patients and/or elders by means of human activity recognition. In line with this, two DL models, the CNN and the RNN combined with the LSTM, were designed and evaluated. We conclude that the performance of the proposed system matches the approaches existing in the literature. As future work we are going to enhance the proposed system with (1) the ability to recognize additional human activities, (2) the embedding of the computerised system into an IoT device, and (3) more testing scenarios to support personalised recognition according to the users' own daily living environments.

Acknowledgment. This work is supported by Centro-01-0247-FEDER-072632 "NomaVoy - Nomad Voyager", co-financed by the Portugal 2020 Program (PT 2020), in the framework of the Regional Operational Program of the Center (CENTRO 2020) and the European Union through the Fundo Europeu de Desenvolvimento Regional (FEDER).

References

1. Towards a definition of the internet of things (IoT) (2015)
2. Anguita, D., Ghio, A., Oneto, L., Parra, X., Reyes-Ortiz, J.L.: Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine. In: Bravo, J., Hervás, R., Rodríguez, M. (eds.) IWAAL 2012. LNCS, vol. 7657, pp. 216–223. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35395-6_30
3. Atitallah, S.B., Driss, M., Boulila, W., Ghézala, H.B.: Leveraging deep learning and IoT big data analytics to support the smart cities development: review and future directions. Comput. Sci. Rev. 38, 100303 (2020)
4. Boob, D., Dey, S.S., Lan, G.: Complexity of training ReLU neural network. Discrete Optim. 100620 (2020)
5. Chen, Q., et al.: A survey on an emerging area: deep learning for smart city data. IEEE Trans. Emerg. Top. Comput. Intell. 3(5), 392–410 (2019)
6. Dohnalek, P., Gajdoš, P., Peterek, T.: Human activity recognition: classifier performance evaluation on multiple datasets. J. Vibroeng. 16(3), 1523–1534 (2014)
7. Eceiza, M., Flores, J.L., Iturbe, M.: Fuzzing the internet of things: a review on the techniques and challenges for efficient vulnerability discovery in embedded systems. IEEE Internet Things J. 8(13), 10390–10411 (2021)
8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
9. Inácio, P.R.M., Duarte, A., Fazendeiro, P., Pombo, N. (eds.): 5th EAI International Conference on IoT Technologies for HealthCare. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-30335-8
10. Li, C., Qu, Z., Wang, S., Liu, L.: A method of cross-layer fusion multi-object detection and recognition based on improved faster R-CNN model in complex traffic environment. Pattern Recogn. Lett. 145, 127–134 (2021)
11. Kolisnik, B., Hogan, I., Zulkernine, F.: Condition-CNN: a hierarchical multi-label fashion image classification model. Expert Syst. Appl. 182, 115195 (2021)
12. Lukowicz, P., et al.: Recognizing workshop activity using body worn microphones and accelerometers. In: Ferscha, A., Mattern, F. (eds.) Pervasive 2004. LNCS, vol. 3001, pp. 18–32. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24646-6_2
13. Mohammadi, M., Al-Fuqaha, A., Sorour, S., Guizani, M.: Deep learning for IoT big data and streaming analytics: a survey. IEEE Commun. Surv. Tutor. 20(4), 2923–2960 (2018)
14. Nguyen, C.T., Khuong, V.T.M., Nguyen, H.T., Nakagawa, M.: CNN based spatial classification features for clustering offline handwritten mathematical expressions. Pattern Recogn. Lett. 131, 113–120 (2020)
15. Olguín, D., Pentland, S.: Human activity recognition: accuracy across common locations for wearable sensors, January 2006
16. Pombo, N., Bousson, K., Araújo, P., Viana, J.: Medical decision-making inspired from aerospace multisensor data fusion concepts. Inform. Health Soc. Care 40(3), 185–197 (2014)
17. Pombo, N., Garcia, N., Bousson, K.: Machine learning approaches to automated medical decision support systems. In: Pandian, V. (ed.) Handbook of Research on Artificial Intelligence Techniques and Algorithms, pp. 183–203. IGI Global, Hershey (2015)
18. Shao, H.: Delay-dependent stability for recurrent neural networks with time-varying delays. IEEE Trans. Neural Netw. 19(9), 1647–1651 (2008)
19. Wang, C., Dong, S., Zhao, X., Papanastasiou, G., Zhang, H., Yang, G.: SaliencyGAN: deep learning semisupervised salient object detection in the fog of IoT. IEEE Trans. Industr. Inf. 16(4), 2667–2676 (2020)
20. Wang, J., Chen, Y., Hao, S., Peng, X., Hu, L.: Deep learning for sensor-based activity recognition: a survey. Pattern Recogn. Lett. 119, 3–11 (2019)
21. Wang, W., Yang, Y.: Development of convolutional neural network and its application in image classification: a survey. Opt. Eng. 58(04), 1 (2019)
22. Wang, Z., Zeng, Y., Liu, Y., Li, D.: Deep belief network integrating improved kernel-based extreme learning machine for network intrusion detection. IEEE Access 9, 16062–16091 (2021)
23. Hao, W., Bie, R., Guo, J., Meng, X., Zhang, C.: CNN refinement based object recognition through optimized segmentation. Optik 150, 76–82 (2017)
24. Xu, L., Pombo, N.: Human behavior prediction though noninvasive and privacy-preserving internet of things (IoT) assisted monitoring. In: 2019 IEEE 5th World Forum on Internet of Things (WF-IoT), pp. 773–777, April 2019
25. Zhang, C., Patras, P., Haddadi, H.: Deep learning in mobile and wireless networking: a survey. IEEE Commun. Surv. Tutor. 21(3), 2224–2287 (2019)
26. Zhang, Q., Yang, L.T., Chen, Z., Li, P.: A survey on deep learning for big data. Inf. Fusion 42, 146–157 (2018)
27. Zhang, Y., Lin, H., Yang, Z., Wang, J., Sun, Y., Bo, X., Zhao, Z.: Neural network-based approaches for biomedical relation classification: a review. J. Biomed. Inform. 99, 103294 (2019)
28. Zhang, Y., Qiao, S., Zeng, Y., Gao, D., Han, N., Zhou, J.: CAE-CNN: predicting transcription factor binding site with convolutional autoencoder and convolutional neural network. Expert Syst. Appl. 183, 115404 (2021)

Siamese Neural Network for Labeling Severity of Ulcerative Colitis Video Colonoscopy: A Thick Data Approach

Jinan Fiaidhi1(B), Sabah Mohammed1, and Petros Zezos2

1 Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
{jfiaidhi,mohammed}@lakeheadu.ca
2 Northern Ontario School of Medicine, Thunder Bay, ON, Canada
[email protected]

Abstract. Research on automatically learning medical image descriptors requires very large training samples along with complex deep learning models. This is a challenging requirement for many medical specialties. However, new research trends indicate that a Siamese neural network can be trained with small samples and still provide acceptable accuracy, but this has yet to be demonstrated for medical practices like identifying ulcerative colitis severity in video colonoscopy. In this research paper, we introduce a Siamese neural model that uses a triplet loss function, which enables the gastroenterologist to inject anchor images that correctly identify the ulcerative colitis severity classes; for this purpose we use the Mayo Clinic Ulcerative Colitis Endoscopic Scoring scale. The Python prototype demonstrates a performance accuracy of 70% on average when the model is trained with only one video of 75 frames along with 24 anchor images. This research is part of our ongoing effort to employ more thick data techniques for enhancing the accuracy and interpretability of deep learning analytics by incorporating more heuristics from the experts. We are following up this attempt with other validation methods, including YOLO visual annotation and additive image augmentations.

Keywords: Siamese neural network · Triplet loss function · Video colonoscopy · Few shots learning · Thick data analytics

1 Introduction

Colonoscopy is a widely performed procedure that provides gastroenterologists with more accurate, effective, and reliable diagnoses, as well as facilitating therapeutic endoscopic interventions when needed. However, endoscopic video navigation of gastrointestinal tract disorders like ulcerative colitis and other gastrointestinal tract diseases and lesions is largely a manual process requiring a highly experienced gastroenterologist. Because of the experience gap, considerable suspicious lesions can be missed, and thus there is a great need to explore new methods for identifying these lesions automatically and, if possible, in real time [1]. Some new methods suggest using additional modalities including contrast-enhancement techniques, external trackers and magnifying observation technologies [2]. Nevertheless, these methods have yet to prove their usefulness as a low-cost solution for the next generation of context-aware navigated endoscopy. Although normal (e.g. white-light) colonoscopy is affected by a high miss rate for certain lesions among individual endoscopists, techniques that use machine learning (ML) and artificial intelligence (AI) have been shown to reduce this performance variability [3, 4]. The smart detection algorithms of the added AI/ML components usually use a neural network trained on remarks annotated by the expert, which is able to detect and locate suspicious lesions such as those listed in Table 1.

Table 1. Suspicious lesions as detected by AI/ML algorithms.

Lesion type | Sample image
Pedunculated Polyp | (image)
Depressed Polyp | (image)
Sessile serrated lesion | (image)
Ulcerative colitis | (image)

Although the application of ML/AI algorithms to collected colonoscopy videos has provided useful computer-aided lesion detection tools, the neural networks used for this purpose require huge annotated samples of videos and lesion images to reach robust detection accuracy. Examples of such neural networks used in video frame classification are Faster R-CNN with VGG16, DarkNet 53, ImageNet and ResNet 152 [5–8]. The VGG-16, for example, uses 1.2 million training images, 50,000 validation images, and 150,000 testing images to achieve 92.7% top-5 accuracy on the ImageNet repository [5]. However, the availability of extremely large sets of annotated images is a common challenge in the medical domain [9]. Several approaches have been proposed in the last decade to boost accuracy given small training samples. Figure 1 provides an overview of the existing few-shot methods. Optimization-based methods attempt to reduce the number of parameters needed to optimize the learning network in only a few steps; however, they do not necessarily reduce the amount of training data required for image classification. Examples of optimization-based algorithms are Model-Agnostic Meta-Learning and the LSTM [10]. With image embedding methods, the learning network is modeled as a Message Passing Neural Network where the nodes of the graph are the images to be classified; the algorithms attempt to enhance the labeled nodes by associating the image embeddings using different techniques like the UNet, SqueezeNet and the Inception Network [11]. In image augmentation, the methods attempt to create an auto-labeling encoder and augment the training space with newly labeled images [12]. The thick data methods, in contrast, attempt to strengthen the image classification process for a given small training sample by treating the small sample as anchor images and letting a Siamese Neural Network adjust the weights. The weights can then be used by a similar neural network to predict the classes of new images. The accuracy of classification can be enhanced by injecting more heuristics about the image classes as well as by using transfer learning to adjust the weights [13].

Fig. 1. Notable few shots learning methods applied to image classification.


In this article, we propose a fully automated system based on thick data methods to predict ulcerative colitis endoscopic severity according to the Mayo Endoscopic Score from raw colonoscopy videos, based on a multi-label Siamese neural network. Our proposed method mimics the assessment done in practice by a gastroenterologist, that is, traversing the whole colonoscopy video, identifying visually informative regions and computing an overall Mayo Clinic Endoscopic Score. Figure 2 illustrates the basic building blocks of our proposed system. Moreover, we used the YOLOv3 algorithm [14] trained on the annotated UC frames that were labeled according to the Mayo endoscopic score. The YOLO (You Only Look Once) algorithm is a visual object detection algorithm that differs from region-based algorithms in that a single neural network is employed to predict the bounding boxes and the class probabilities for these boxes.

Fig. 2. Thick data approach for classifying colonoscopy videos based on few annotated video shots.

2 The UC Video Scoring Method

Our method uses the Mayo endoscopic score (MES, or MC for short) to annotate the colonoscopy video frames, as it remains the most common endoscopic index used in clinical trials and in clinical practice [15]. The MC scoring provides a simple heuristic for rating the endoscopic severity of ulcerative colitis on a scale of 4, as illustrated by Fig. 3. However, incorporating the MC heuristic for differentiating between the ulcerative colitis severity types, and learning it from a small sample of colon frames, requires a special sort of neural network such as the Siamese network (SNN). An SNN can use three different loss functions:


• Contrastive Loss - requires two input frames (anchor frame and test frame).
• Triplet Loss - requires 3 inputs (anchor frame, positive frame and negative frame).
• Quadruplet Loss - requires 4 inputs (anchor frame, positive frame and two negative frames).

Using the contrastive loss is sensitive to calibration, and it is quite impractical for distinguishing between image types with slight differences, such as those for ulcerative colitis [16]. However, the calibration requirement can be minimized by providing reference or anchor images vetted by the expert, in addition to the small labeled sample that is used for training. In this case triplets for training seem more appropriate, as the quadruplet loss gives more bias to the negative samples. Figure 4 illustrates the triplet loss SNN chosen for learning the UC severity classes.

Fig. 3. Mayo clinic ulcerative colitis scoring (MC0: normal, MC1: mild, MC2: moderate and MC3: severe).


Fig. 4. SNN for learning UC severity classes.

The triplet loss function used for ranking the UC severity classes can be defined as follows:

L = max(d(a, p) − d(a, n) + margin, 0)

where a is the anchor image and p and n are the positive and negative images, d represents the similarity score, and the margin defines how far apart the dissimilarities should be. By using this ranking loss function we can calculate the gradients and update the weights and biases of the SNN. However, before defining our SNN and the prototype in Python, it is important to decide on the training dataset that will be used to train the SNN; among UC video datasets, the notable KVASIR1 dataset stands out for its resolution quality and the focused annotations provided. The KVASIR video data is collected using endoscopic equipment at Vestre Viken Health Trust (VV) in Norway, and the frames are carefully annotated by one or more medical experts from VV and the Cancer Registry of Norway (CRN) [17]. In prototyping our method, we decided to use TensorFlow 1.14 and its Keras deep learning API, as they provide the primitives and utilities for defining our SNN. However, our approach starts learning with few shots, which means we can start with one annotated video and a few anchor images provided by the expert. In this direction we first need to segment the training video into frames. This can be done with a code segment along the following lines:
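(The original code figure is not reproduced in this text; the following is a minimal sketch of how the training video could be segmented into frames with OpenCV. The video file path and output folder are placeholders.)

import os
import cv2

def video_to_frames(video_path, out_dir, every_n=1):
    # Read the colonoscopy video and write every n-th frame as a JPEG image.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Placeholder paths for illustration.
# video_to_frames("F://VideoAnalysis/uc_training_video.mp4", "F://VideoAnalysis/frames")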

1 https://datasets.simula.no/hyper-kvasir/


Moreover, we need to store and read the training annotation of each frame of the video and the anchor images, for example as data = pd.read_csv("F://VideoAnalysis/KV_Mapping.csv"). We also need to construct a few utility functions such as the following:
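The utility functions themselves are not reproduced in the source; the sketch below illustrates the kind of helpers that are meant (loading a frame and sampling a triplet). The image size, CSV contents, and grouping of frame paths by MC class are illustrative assumptions.

```python
# Hedged sketch of possible utility functions; sizes and data layout are assumptions.
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def load_frame(path, size=(105, 105)):
    """Load a single frame, resize it, and scale pixel values to [0, 1]."""
    return img_to_array(load_img(path, target_size=size)) / 255.0

def make_triplet(frames_by_class, rng):
    """Sample (anchor, positive, negative): anchor and positive share one MC class."""
    classes = list(frames_by_class)
    pos_cls = rng.choice(classes)
    neg_cls = rng.choice([c for c in classes if c != pos_cls])
    anchor, positive = rng.choice(frames_by_class[pos_cls], size=2, replace=False)
    negative = rng.choice(frames_by_class[neg_cls])
    return load_frame(anchor), load_frame(positive), load_frame(negative)
```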

The SNN model is a sequential one that has the following structure:
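The concrete layer stack is not shown in the source; the following Keras sketch is one plausible sequential embedding network of the kind described, with filter counts, kernel sizes, and embedding dimension chosen purely for illustration.

```python
# Hedged sketch of a sequential convolutional embedding network for the SNN twins.
from tensorflow.keras import Sequential, layers

def build_embedding_model(input_shape=(105, 105, 3), embedding_dim=128):
    return Sequential([
        layers.Conv2D(64, (10, 10), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(128, (7, 7), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, (4, 4), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(256, (4, 4), activation="relu"),
        layers.Flatten(),
        layers.Dense(embedding_dim, activation="sigmoid"),
    ])

embedding_model = build_embedding_model()
```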


The SNN layers are followed by one more layer that computes the induced distance metric between each pair of Siamese twin frames; this distance is then fed to a single sigmoid output unit. The MC_DistanceLayer code snippet illustrates how this layer can be set up:
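The MC_DistanceLayer snippet is not reproduced in the source; below is one hedged way to realize the described layer in Keras, using the element-wise absolute difference of the twin embeddings as the induced distance and a single sigmoid output unit. Here `embedding_model` refers to the sequential network sketched above.

```python
# Hedged sketch of the distance layer and sigmoid head described in the text.
import tensorflow.keras.backend as K
from tensorflow.keras import Input, Model, layers

def build_siamese_scorer(embedding_model, input_shape=(105, 105, 3)):
    frame_a = Input(shape=input_shape)
    frame_b = Input(shape=input_shape)
    emb_a = embedding_model(frame_a)
    emb_b = embedding_model(frame_b)
    # MC_DistanceLayer: element-wise |emb_a - emb_b| between the Siamese twin embeddings
    mc_distance = layers.Lambda(lambda t: K.abs(t[0] - t[1]))([emb_a, emb_b])
    similarity = layers.Dense(1, activation="sigmoid")(mc_distance)  # single sigmoid unit
    return Model(inputs=[frame_a, frame_b], outputs=similarity)

siamese_model = build_siamese_scorer(embedding_model)
```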


All that remains to finish our prototyping is to save the trained model and test its prediction performance. The following code snippet illustrates this part of the process:
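Again, the original listing is not reproduced in the source; a minimal hedged sketch of this step could look as follows, where `siamese_model`, `test_a`, `test_b`, and `test_labels` are placeholders for the trained network and a held-out set of frame pairs.

```python
# Hedged sketch: persist the trained model and evaluate it on held-out frame pairs.
# Assumes the model was compiled with an accuracy metric before training.
from tensorflow.keras.models import load_model

siamese_model.save("uc_snn_model.h5")                 # architecture + trained weights
restored = load_model("uc_snn_model.h5", compile=False)

loss, accuracy = siamese_model.evaluate([test_a, test_b], test_labels, verbose=0)
print(f"Validation accuracy: {accuracy:.2%}")
```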

The prediction performance results are shown in Fig. 5, where the validation accuracy reached 70%. This result is encouraging given that the model was trained with one video of 75 frames and provided with a set of six anchor frames for each MC category.


Fig. 5. The accuracy of the proposed triplet loss SNN on predicting UC severity classes.

3 Discussion and Conclusions The SNN model and the triplet loss function for UC endoscopic severity scoring have been introduced in this paper. The model describes a new way to inject the heuristics of the gastroenterologist through the anchor images provided. The model shows promising accuracy when trained on one video with 75 frames and 24 anchor images and tested on a small sample of unseen image frames, with an average accuracy of 70%. We are conducting more intensive testing experiments that will include all of the 8,000 UC image frames provided by KVASIR as well as a variety of annotated colon videos, and we intend to publish the results in a follow-up journal article. Moreover, we started to incorporate the

Fig. 6. An attempt to use the YOLO algorithm for classifying UC severity.


YOLO object detection algorithm trained on the KVASIR frames, following the approach suggested by [18]. Our current implementation annotates only the UC severity class of a video frame, and we are working to extend it to identify other sub-regions of interest. Figure 6 illustrates the first version of our YOLO model for UC severity classification. It is also important to mention that this research is part of ongoing work to incorporate more thick data and embed more expert heuristics in data analytics. We previously used this approach for classifying CT scans of COVID-19 and achieved considerably more robust accuracy than other techniques [19]. We are considering incorporating transfer learning to refine the weights of our SNN model and the enhanced YOLO model when additional training data become available; in this direction we are in the process of adapting the model proposed by Heidari et al. [20]. In addition, we are investigating a variety of meaningful image augmentation techniques that can contribute to the segmentation of UC, such as thresholding and contouring (see Fig. 7 for an example).

Fig. 7. Applying two image augmentation techniques for UC detection (thresholding and contouring).

Acknowledgments. This research is funded by the first author's NSERC grant DDG-2020-00037 and the first and second authors' MITACS Accelerate Grant IT22305 of 2021.

References 1. Yamada, M., et al.: Development of a real-time endoscopic image diagnosis support system using deep learning technology in colonoscopy. Sci. Rep. 9(1), 14465 (2019) 2. Fu, Z., et al.: The future of endoscopic navigation: a review of advanced endoscopic vision technology. IEEE Access 9, 41144–41167 (2021) 3. Xiaobei, L., et al.: Artificial intelligence− enhanced white-light colonoscopy with attention guidance predicts colorectal cancer invasion depth. Gastrointest. Endosc. 94(3), 627–638.e1 (2021)


4. Hassan, C., et al.: New artificial intelligence system: first validation study versus experienced endoscopists for colorectal polyp detection. Gut 69(5), 799–800 (2020) 5. Guo, X., Zhang, N., Guo, J., Zhang, H., Hao, Y., Hang, J.: Automated polyp segmentation for colonoscopy images: a method based on convolutional neural networks and ensemble learning. Med. Phys. 46(12), 5666–5676 (2019) 6. Li, K., et al.: Colonoscopy polyp detection and classification: dataset creation and comparative evaluations. PLOS ONE 16(8), e0255809 (2021) 7. Urban, G., et al.: Deep learning localizes and identifies polyps in real time with 96% accuracy in screening colonoscopy. Gastroenterology 155(4), 1069–1078 (2018) 8. Yang, Y.J., et al.: Automated classification of colorectal neoplasms in white-light colonoscopy images via deep learning. J. Clin. Med. 9(5), 1593 (2020) 9. Medela, A., et al.: Few shot learning in histopathological images: reducing the need of labeled data on biological datasets. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 1860–1864. IEEE (2019) 10. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning, PMLR, pp. 1126–1135. (2017) 11. Srinivasan, A., Bharadwaj, A., Sathyan, M., Natarajan, S.: Optimization of image embeddings for few shot learning. arXiv preprint arXiv:2004.02034 (2020) 12. Jadon, S.: An overview of deep learning architectures in few-shot learning domain. arXiv preprint arXiv:2008.06365 (2020) 13. Fiaidhi, J., Zezos, P., Mohammed, S.: Thick data analytics for rating ulcerative colitis severity using small endoscopy image sample. In: IEEE Big Data 2021 Conference 15–19 Dec 2021 14. Du, J.: Understanding of object detection based on CNN family and YOLO. J. Phys.: Conf. Ser. 1004(1), 012029 (2018) 15. Sharara, A.I., Malaeb, M., Lenfant, M., Ferrante, M.: Assessment of endoscopic disease activity in ulcerative colitis: is simplicity the ultimate sophistication? Inflamm. Intest. Dis. 7(1), 7–12 (2022) 16. Hoffer, E., Ailon, N.: Deep metric learning using triplet network. In: Feragen, A., Pelillo, M., Loog, M. (eds.) SIMBAD 2015. LNCS, vol. 9370, pp. 84–92. Springer, Cham (2015). https:// doi.org/10.1007/978-3-319-24261-3_7 17. Pogorelov, K., et al.: Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In: Proceedings of the 8th ACM on Multimedia Systems Conference, pp. 164–169 (2017) 18. Anton, M.: TrainYourOwnYOLO: Building a Custom Object Detector from Scratch, Github Repo. https://github.com/AntonMu/TrainYourOwnYOLO (2019) 19. Sawyer, D., Fiaidhi, J., Mohammed, S.: Few shot learning of covid-19 classification based on sequential and pretrained models: a thick data approach. In: 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 1832–1836. IEEE (2021) 20. Heidari, M., Fouladi-Ghaleh, K.: Using Siamese Networks with Transfer Learning for Face Recognition on Small-Samples Datasets. In: 2020 International Conference on Machine Vision and Image Processing (MVIP), pp. 1–4. IEEE (2020)

Self-supervised Contrastive Learning for Predicting Game Strategies Young Jae Lee, Insung Baek, Uk Jo, Jaehoon Kim, Jinsoo Bae, Keewon Jeong, and Seoung Bum Kim(B) Industrial and Management Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea {jae601,insung_baek01,ukjo,jhoon0418,wlstn215,dia517, sbkim1}@korea.ac.kr

Abstract. Many games enjoyed by players primarily consist of a matching system that allows the player to cooperate or compete with other players with similar scores. However, the method of matching only the play score can easily lose interest because it does not consider the opponent’s playstyle or strategy. In this study, we propose a self-supervised contrastive learning framework that can enhance the understanding of game replay data to create a more sophisticated matching system. We use actor-critic-based reinforcement learning agents to collect many replay data. We define a positive pair and negative examples to perform contrastive learning. Positive pair is defined by sampling from the frames of the same replay data, otherwise negatives. To evaluate the performance of the proposed framework, we use Facebook ELF, a real-time strategy game, to collect replay data and extract data features from pre-trained neural networks. Furthermore, we apply k-means clustering with the extracted features to visually demonstrate that different play patterns and proficiencies can be clustered appropriately. We present our clustering results on replay data and show that the proposed framework understands the nature of the data with consecutive frames. Keywords: Game matching system · Reinforcement learning · Self-supervised contrastive learning

1 Introduction A game is a system consisting of various contents with specific rules for winning or losing that capture the attention of players [1]. Many games are structured so that players can cooperate or compete with other players, and they usually rely on a matching system to let players enjoy the various contents of a game. For example, the real-time strategy (RTS) game StarCraft II [2] and soccer games [3] use a matching system to team up or compete with others. Existing matching systems are commonly based on the game scores that players have earned so far. However, a system that matches only on game scores cannot properly capture the opponent's playstyle or strategy. A system that considers only a limited set of game elements makes it difficult to achieve


the desired play because there can be large differences in level between players even with similar scores. Therefore, to play the game fairly, elements such as play style and strategy must be included in the matching system. Recently, RTS games continue to arouse interest as a popular research area among artificial intelligence (AI) researchers. In particular, RTS games, which are complex, and partially observable with dynamic environments, have been actively studied to analyze games and model players with various AI approaches [4]. Facebook ELF is a representative RTS game on a lightweight and flexible research platform that applies deep reinforcement learning (DRL) algorithms to model players. This game allows us to readily change the play pattern and difficulty of the opponent (e.g., computer). Under these conditions, one can model multiple RL players by changing the opponent’s build commands, strategies, and difficulty. In addition, we can collect replay data from trained RL players of different levels. Therefore, if we can fully utilize replay data, we can create a new matching system that considers not only game scores, but also the play pattern, strategy and proficiency. In this study, we propose a framework based on self-supervised contrastive learning that can enhance the understanding of game replay data to create a sophisticated matching system. We apply an actor-critic-based RL algorithm to construct multiple scenarios that consider different play patterns, strategies, and proficiencies in the Facebook ELF environment. We collect replay data from trained RL agents of different levels. The collected replay data are then trained by a modified momentum contrast (MoCo) approach using a convolutional neural network (CNN) encoder. We define a positive pair and negative examples to perform contrastive learning. Positive pair is defined by sampling from the frames of the same replay data, otherwise negatives. To evaluate the performance of the proposed framework, we extract data features from a pre-trained CNN encoder and perform k-means clustering. We provide a visualization of the different play patterns and proficiencies to check the appropriateness of clustering results. We hypothesize that if we learn a good representation of the collected replay data, the k-means clustering performance would improve as well. The clustering results on replay data show that the proposed framework properly reflect the nature of the data with consecutive frames. The remainder of this paper is organized as follows. Section 2 provides a review of recent advances research. Section 3 illustrates the details of the proposed framework. Sections 4 and 5 describe the results of clustering experiments on different replay data, and present conclusions and future research directions.

2 Related Works 2.1 Convolutional Neural Networks In recent years, deep learning has become a standard learning architecture in the field of computer vision and natural language processing. In particular, CNN models achieve excellent performance in image-based tasks such as image classification [5], object detection [6], instance segmentation [7], and video prediction [8]. The core of CNNs is to learn hierarchical representations of features while preserving spatial and local information from raw input data. Convolutional layers can reduce data size by applying filtering techniques to transform the local part of the data into a single representative


scalar value. That is, a stack of convolutional layers builds high-level features from low-level features such as image pixels. CNNs, which can retain spatial information, are actively used in image-based game AI research. For example, the classic game Atari 2600 outperformed humans by training CNN-based agents based on DRL [9]. In StarCraft II, with DRL, CNN-based agents trained a policy to play multiple mini-games or full games [2, 10]. Even Facebook ELF, which is lighter than StarCraft II, used several DRL algorithms to model CNN-based agents. These studies are the basis for extending CNN applications from static raw images to multi-frame images. 2.2 Self-supervised Contrastive Learning In the field of computer vision, self-supervised learning has been introduced as an alternative to learning proper representations for high-dimensional images without human annotations [11–13]. Self-supervised learning methods have proven their effectiveness using pre-trained CNN encoders on vast amounts of unlabelled data for downstream tasks such as image classification, object detection, and video prediction. Recently, contrastive learning approaches were introduced to learn representations by discriminating similarities between instances obtained by applying data augmentation [14–17]. The key idea of contrastive learning is to define a positive pair and negative examples from instances obtained through different data augmentations. Positive pair is defined as instances obtained by applying data augmentation on the same image, otherwise negative examples. Contrastive learning is also used in image-based game AI research. [18] and [19] are based on data augmentation and contrastive learning to pre-train the representation space using a CNN encoder in a reward-free environment. After the pre-training phase, the CNN encoder is applied to downstream tasks to maximize the agent’s task-specific reward. These methods collect a dataset consisting of expert demonstrations from each environment to pre-train a CNN encoder. Further, they explore unknown areas in the taskagnostic environment to gather new data. In the context of these studies, the matching system can be further refined by collecting replay data for various situations or by pre-training the representation space using a CNN encoder.

3 Proposed Framework In this paper, we propose a self-supervised contrastive learning framework to learn good representations of game replay data for various situations. Figure 1 shows an overview of the proposed contrastive learning framework. First, we collect a sufficient amount of game replay data before performing contrastive learning. Then we define a positive pair and negative examples to perform contrastive learning with the collected data. The core of the proposed framework is to collect replay data from trained RL agents of different levels, and to define a positive pair by sampling from some frames of the same replay data. The goal of the proposed framework is to learn the patterns of replay data for various situations in the training phase based on unsupervised learning, and to cluster the frames according to play patterns and proficiencies in the testing phase.


Fig. 1. Overview of the proposed contrastive learning framework

3.1 Dataset We use the replay data from plays of trained RL agents in the Facebook ELF environment. Collecting replay data for various situations requires trained RL agents at different levels. To obtain trained RL agents of different levels, we construct six scenarios that consider various play patterns and difficulties in the environment. Table 1 provides details about the scenarios. We train five RL agents of different proficiency by applying the actor-critic-based RL algorithm to each scenario. RL agents trained by competing against the computer have limitations in growing into highly skilled agents. Therefore, to obtain highly skilled agents, we apply the self-play method, in which an agent clones itself and competes against its own copy. Figure 2 shows the overall process for training RL agents of different proficiencies.

Table 1. Details about the six scenarios

Scenario | Computer play pattern | Computer difficulty
1        | AI SIMPLE             | Upper
2        | AI SIMPLE             | Middle
3        | AI SIMPLE             | Lower
4        | HIT AND RUN           | Upper
5        | HIT AND RUN           | Middle
6        | HIT AND RUN           | Lower

We collect a total of 6,000 replay data by selecting one agent out of the 30 trained RL agents and having it compete 1,000 times for each scenario. Figure 3 shows the components of the replay data we collected. Replay data consist of consecutive frames, where each frame is represented by a 20 × 20 feature map with 22 channels of information about the game situation. A frame of size 20 × 20 × 22 has multiple channels filled entirely with zero values. Therefore, we


Fig. 2. A process for training RL agents of various proficiency

remove 11 channels composed of zero values and finally reconstructed them into 11 channels of nine categorical and two continuous values. The final dataset for training consists of unlabeled x, where x represents the input feature maps for the game situation.

Fig. 3. Components of replay data

3.2 Modified Momentum Contrast To learn good representations for replay data, we use MoCo among the contrastive learning methods. Because the replay data consist of consecutive frames, we modify parts of the original MoCo by constructing a 3D-CNN encoder that takes the time order into account [20]. Figure 4 shows the overall architecture of the modified MoCo for training the 3D-CNN encoder. The 3D-CNN encoder receives x, a 20 × 20 × 11 × 20 feature


map, as input and performs convolutional operations to extract meaningful embedding features. We customize our 3D-CNN encoder in ResNet style [21] with spatial pooling. All convolution layers apply a rectified linear unit (ReLU) activation function [22]. We flatten the feature map output from the 3D-CNN encoder and feed it into a multilayer perceptron (MLP) head. Contrastive learning learns representations using the feature vectors from the MLP head. The goal of our modified MoCo is to make q relatively closer to k+ than to the other keys in K, given a query q and look-up dictionary keys K = {k_0, k_1, ..., k_N} in the feature space. The query q and the look-up dictionary keys K are mapped into the feature space by embedding with the 3D-CNN encoders f_q and f_k. We define a positive pair (q and k+) and negative examples to perform contrastive learning. In particular, we define a positive pair by sampling two sets of frames of time length 20 from different time zones of the same replay data; all other pairs are treated as negative examples. The following InfoNCE loss [14] is applied to reflect the similarity relations between q and the look-up dictionary keys K:

\mathcal{L}_{\mathrm{InfoNCE}}(q_i, K) = -\log\frac{\exp(q_i \cdot k_{+}/\tau)}{\exp(q_i \cdot k_{+}/\tau) + \sum_{j=1}^{N}\mathbb{1}_{[j \neq +]}\exp(q_i \cdot k_{j}/\tau)}    (1)

where \mathbb{1}_{[j \neq +]} ∈ {0, 1} is an indicator function equal to 1 if j ≠ +, and τ is a temperature parameter for scaling. The 3D-CNN encoders f_q and f_k share parameters.
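The paper does not include an implementation of Eq. (1); the following NumPy sketch illustrates the computation for a single query with one positive key and a dictionary of negatives. The embedding dimension, dictionary size, temperature value, and L2 normalization of the embeddings are illustrative assumptions.

```python
# Illustrative NumPy computation of the InfoNCE loss in Eq. (1) for one query.
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def info_nce(q, k_pos, k_neg, tau=0.07):
    """q, k_pos: (d,) query and positive-key embeddings; k_neg: (N, d) negative keys."""
    pos_logit = np.dot(q, k_pos) / tau
    neg_logits = (k_neg @ q) / tau
    denominator = np.exp(pos_logit) + np.exp(neg_logits).sum()
    return -(pos_logit - np.log(denominator))

rng = np.random.default_rng(0)
q = l2_normalize(rng.normal(size=128))
k_pos = l2_normalize(rng.normal(size=128))
k_neg = l2_normalize(rng.normal(size=(8192, 128)))   # look-up dictionary of negatives
print(info_nce(q, k_pos, k_neg))
```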

Fig. 4. Architecture of the modified MoCo based on 3D-CNN encoder

4 Experiments 4.1 Training We applied the actor-critic-based RL algorithm to each scenario in the Facebook ELF environment to obtain five trained RL agents with different win rates per scenario. We then applied the self-play method to upgrade the proficiency of the 30 agents we obtained. We collected a


total of 6,000 replay data by selecting one agent out of the 30 trained RL agents and having it compete 1,000 times for each scenario. We preprocessed the collected replay data and performed contrastive learning to learn good representations. We trained the model for 5,000 epochs with a batch size of 128 and kept the model obtained at the end of training. A feature map of 20 × 20 × 11 × 20 was used as the input data, and the time length was fixed at 20. When performing contrastive learning, a positive pair was defined by sampling two sets of frames from different time zones of the same replay data, and the 8,192 examples stored in the look-up dictionary keys were used as negatives. Policy gradient [23] was used with Adam [24], fixing the learning rate at 0.01. Experiments were conducted on a single machine with an Intel Xeon 5122 CPU and two NVIDIA TITAN RTX GPUs. 4.2 Evaluation The replay data were collected by selecting one agent out of the 30 trained RL agents, so we need specific criteria to select that agent. Therefore, we present a quantitative assessment method for the evaluation of trained RL agents. Figure 5 shows the quantitative assessment of trained RL agents. Each agent is evaluated by the average score over 1,000 competition outcomes per scenario. We evaluated all 30 trained RL agents by applying the presented quantitative assessment method. Table 2 shows the quantitative assessment results for the 30 RL agents. We first selected the agents with the highest average score among the five agents of each scenario. Our final chosen agent is the 62% Scenario 2 agent, with an average score of 66%.

Fig. 5. An example of quantitative assessment for trained RL agents

We conducted contrastive learning using the 6,000 replay data collected with the 62% Scenario 2 RL agent. We performed a qualitative assessment by applying a k-means clustering method to confirm that the proposed framework has a good understanding of the characteristics of the data. Figure 6 shows the architecture of k-means clustering using the 3D-CNN encoder trained with the modified MoCo. First, we generated summarized feature vectors for each replay with the trained 3D-CNN encoder. We then clustered the summarized feature vectors through k-means clustering. The number of clusters was determined to be four based on the silhouette coefficients [25].
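As a hedged illustration of this step (not the authors' code), the snippet below clusters the summarized feature vectors with k-means and selects the number of clusters by the silhouette coefficient; the feature file name and the candidate range of k are assumptions.

```python
# Hedged sketch: choose k by silhouette coefficient and cluster the encoder features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

features = np.load("replay_features.npy")        # assumed shape: (6000, feature_dim)

best_k, best_score = None, -1.0
for k in range(2, 9):                             # assumed candidate range of k
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    score = silhouette_score(features, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"selected k = {best_k} (silhouette = {best_score:.3f})")
cluster_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(features)
```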


Table 2. Quantitative assessment results for 30 trained RL agents (all values are scores in %)

Agent (score %) | AI SIMPLE Upper | AI SIMPLE Middle | AI SIMPLE Lower | HIT AND RUN Upper | HIT AND RUN Middle | HIT AND RUN Lower | Avg. score
Scenario 1 RL agents
62 | 44 | 56 | 56 | 37 | 42 | 48 | 47
64 | 45 | 54 | 58 | 45 | 51 | 52 | 51
66 | 57 | 66 | 67 | 56 | 61 | 63 | 62
68 | 47 | 58 | 64 | 46 | 57 | 60 | 55
70 | 46 | 53 | 58 | 42 | 50 | 56 | 51
Scenario 2 RL agents
56 | 44 | 53 | 57 | 44 | 50 | 55 | 50
58 | 49 | 63 | 69 | 52 | 63 | 67 | 61
60 | 55 | 66 | 70 | 47 | 59 | 63 | 60
62 | 60 | 69 | 73 | 57 | 66 | 68 | 66
64 | 59 | 65 | 70 | 59 | 67 | 70 | 65
Scenario 3 RL agents
56 | 47 | 53 | 59 | 45 | 52 | 56 | 52
58 | 38 | 45 | 49 | 38 | 49 | 52 | 45
60 | 41 | 48 | 51 | 43 | 50 | 53 | 48
62 | 44 | 52 | 57 | 52 | 63 | 64 | 55
64 | 28 | 36 | 41 | 53 | 63 | 64 | 48
Scenario 4 RL agents
56 | 35 | 40 | 43 | 45 | 52 | 57 | 45
58 | 31 | 35 | 39 | 52 | 58 | 64 | 47
60 | 27 | 31 | 35 | 48 | 58 | 61 | 43
62 | 36 | 45 | 49 | 52 | 60 | 66 | 51
64 | 42 | 46 | 52 | 44 | 52 | 54 | 48
Scenario 5 RL agents
56 | 27 | 30 | 34 | 49 | 52 | 55 | 41
58 | 37 | 43 | 49 | 41 | 51 | 57 | 46
60 | 37 | 47 | 55 | 47 | 58 | 63 | 51
62 | 39 | 46 | 54 | 43 | 51 | 60 | 49
64 | 43 | 49 | 55 | 42 | 50 | 55 | 49
Scenario 6 RL agents
56 | 28 | 33 | 33 | 34 | 37 | 43 | 35
58 | 32 | 36 | 42 | 36 | 40 | 43 | 38
60 | 20 | 29 | 30 | 47 | 55 | 59 | 40
62 | 36 | 39 | 47 | 44 | 55 | 62 | 47
64 | 39 | 44 | 52 | 40 | 47 | 52 | 46

Figure 7 shows a visualization of the clustering results. We checked the game videos by sampling 30 feature vectors from each cluster for the qualitative assessment. Cluster 1 (blue) had a play pattern in which the RL agent immediately attacks when the enemy spawns tank 2. Cluster 2 (red) showed a pattern in which the RL agent immediately attacks the enemy base after creating two tank 1 units. Cluster 3 (yellow) showed a pattern in which the RL agent immediately attacks the enemy base after spawning one to three tank 1 units. Cluster 4 (green) had a play pattern in which the RL agent attacks the enemy barracks when the enemy spawns tank 2. The proficiency of each cluster was 60%, 47%, 30%, and 47%, respectively, based on average win rates. This visual qualitative assessment indicates that the proposed framework learns good representations of the data.

Fig. 6. Architecture of k-means clustering using trained 3D-CNN encoder


Fig. 7. Clustering results for trained replay data

5 Conclusions We propose a framework based on self-supervised contrastive learning that can enhance the understanding of game replay data to create a sophisticated matching system. We construct multiple scenarios that consider different play patterns, strategies, and proficiencies in the Facebook ELF environment and apply an actor-critic-based RL algorithm. We collect replay data from the RL agent selected through a quantitative assessment. The replay data are used to train the proposed framework and are visually evaluated with a k-means clustering method. The k-means clustering results show that each cluster has distinct play patterns and proficiency, implying that a good representation is obtained through self-supervised contrastive learning. In the present study, the evaluation of the proposed framework was qualitative, which may reduce confidence in the results. We plan to conduct surveys and have them evaluated through a specialized survey agency. In addition, we plan to come up with a way to directly connect the proposed framework to the matching system.


Acknowledgments. This research was supported by the Agency for Defense Development (UI2100062D).

References 1. Yusfrizal, Adhar, D., Indriani, U., Panggabean, E., Sabir, A., Kurniawan, H.: Application of the fisher-yates shuffle algorithm in the game matching the world monument picture. In: 2020 2nd International Conference on Cybernetics and Intelligent System (ICORIS), pp. 1–6. IEEE, Manado, Indonesia (2020) 2. Vinyals, O., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019) 3. Kurach, K., et al.: Google research football: a novel reinforcement learning environment. arXiv preprint arXiv:1907.11180 (2019) 4. Ravari, Y.N., Snader, B., Pieter, S.: Starcraft winner prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, vol. 12, no. 1 (2016) 5. Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015) 6. Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inform. Process. Syst. 28, 91–99 (2015) 7. Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481– 2495 (2017) 8. Oprea, S., et al.: A review on deep learning techniques for video prediction. IEEE Trans. Pattern Analysis Mach. Intell. 44(6), 2806–2826 (2020) 9. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518, 529– 533 (2015) 10. Vinyals, O., et al.: Starcraft ii: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782 (2017) 11. Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction. In: Proceedings of the IEEE international conference on computer vision (2015) 12. Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles. In: European Conference on Computer Vision. Springer, Cham (2016) 13. Gidaris, S., Singh, P., Komodakis, N.: Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728 (2018) 14. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018) 15. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) 16. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning. PMLR (2020) 17. Misra, I., van der Maaten, L.: Self-supervised learning of pretext-invariant representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) 18. Liu, H., Abbeel, P.: Unsupervised active pre-training for reinforcement learning (2020) 19. Stooke, A., Lee, K., Abbeel, P., Laskin, M.: Decoupling representation learning from reinforcement learning. In: International Conference on Machine Learning. PMLR (2021)


20. Qian, R., Meng, T., Gong, B., Yang, M.-H., Wang, H., Belongie, S., Cui, Y.: Spatiotemporal contrastive video representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021) 21. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) 22. Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th International Conference on International Conference on Machine Learning (ICML’10), pp. 807–814. Omnipress, Madison, WI, USA (2010) 23. Silver, D., et al.: Deterministic policy gradient algorithms. In: International Conference on Machine Learning. PMLR (2014) 24. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv: 1412.6980 (2014) 25. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)

Stochastic Feed-forward Attention Mechanism for Reliable Defect Classification and Interpretation Jiyoon Lee, Chunghyup Mok, Sanghoon Kim, Seokho Moon, Seo-Yeon Kim, and Seoung Bum Kim(B) Industrial & Management Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea {jiyoonlee,mokch,dawonksh,danny232,joanne2kim,sbkim1}@korea.ac.kr

Abstract. Defect analysis in manufacturing systems has been crucial for reducing product defect rates and improving process management efficiency. Recently, deep learning algorithms have been widely used to extract significant features from intertwined and complicated manufacturing systems. However, typical deep learning algorithms are black-box models in which the prediction process is difficult to understand. In this study, we propose a stochastic feed-forward attention network that consists of input-feature-level attention. The stochastic feed-forward attention network allows us to interpret the model by identifying the input features that are dominant for the prediction. In addition, the proposed model uses variational inference to yield uncertainty information, which is a measure of the reliability of the interpretations. We conducted experiments on data from a display electrostatic chuck fabrication process to demonstrate the effectiveness and usefulness of our method. The results confirmed that our proposed method performs better and can reflect important input features. Keywords: Feed-forward attention mechanism · Bayesian neural network · Explainable artificial intelligence · Defect prediction · Electrostatic chuck fabrication process

1 Introduction

As the performance of electronic devices improves, the demand for the display industry, an essential part of electronic devices, is increasing. The display manufacturing process consists of several fabrication processes, including a large ultra-fine process, a cell process that cuts the produced display board to a size suitable for the product application, and a module process that attaches a display to electronic equipment. Furthermore, each process consists of several production facilities. In the display manufacturing process, an electrostatic chuck method of attaching and detaching a display panel with electrostatic force transfers the display panel to a production facility without physical damage. Because accurate and rapid transfer using the electrostatic chuck is directly related to the


productivity of the display manufacturing process, it is important to identify defects of electrostatic chuck devices in advance and take preemptive measures [1,2]. However, engineers qualitatively identified defects from electrostatic chuck equipment in most cases. This approach is inefficient in that it requires a lot of time and cost. Therefore, it is important to develop artificial intelligence (AI) method that can classify defects in advance and explain the causes of defects using data collected from manufacturing facilities. A proper AI method can save time and money in determining product condition and provide appropriate actions for defect [3]. Several attempts have been conducted to utilize AI in the display industry. The author in [4] classified multiple defect classes by applying a support vector machine algorithm to facility data such as width, size, and eccentricity of defects collected from the liquid crystal display (LCD) process. The author in [5] proposed an algorithm to classify normal and defect circuit images using convolutional neural networks (CNNs) that can effectively extract features of large-area defect images. Although these methods successfully classify the condition and defects of the display panel, they cannot provide an interpretation for the model. To address this issue, we propose a stochastic feed-forward attention network that can interpret the input features important for prediction and provide uncertainty information for each interpretation. In this study, we conducted experiments on the electrostatic chuck equipment failure in the fabrication process, the primary display production process, to classify the defects and derive their causes. Furthermore, the proposed stochastic feed-forward attention network grafts Bayesian neural networks (BNNs) to provide quantified uncertainty for interpretation. The remainder of this paper is structured as follows. Section 2 introduces the related work on attention mechanisms and Bayesian neural networks. Section 3 describes the proposed method in detail. Section 4 presents experimental results on display manufacturing facility datasets to prove the applicability of the proposed method. Finally, in Sect. 5, the conclusions and future research directions are presented.

2 Related Works

Deep learning algorithms are actively used in various domains in that they derive more accurate performance by utilizing large amounts of data with the advancement of technology. Moreover, as the performance of deep learning algorithms has become more sophisticated and stable to some extent, interest in the reliability of the decision-making derived from the deep learning algorithm is increasing to apply it to practical application problems such as smart factories. However, existing deep learning algorithms have the black box characteristic of not knowing the internal decision-making process. In response, various studies are being conducted on explainable artificial intelligence to make deep neural networks more transparent to understand them better.


Explainable Artificial Intelligence (XAI) is a way to understand better and explain the output of the deep learning model used in the analysis. We can identify important features through XAI that have influenced the prediction process of the model and quantitatively calculate each contribution degree to provide an intuitive explanation of the prediction results. According to the method applied to the prediction model, XAI can be divided into a model-specific approach that can be applied only to a specific model and a model-agnostic approach that does not depend on the structure of the model. In this study, we utilized the attention mechanism, a representative model-specific XAI. The attention mechanism has been proposed to alleviate the long-term dependency problem, a chronic limitation of recurrent neural networks (RNNs) used in sequence-to-sequence learning [6]. The attention mechanism at each decoding time step allows the model to pay more attention to the relevant time steps of the input with respect to the output rather than paying attention to only a single context vector which is the last hidden state of the input. The attention mechanism outperforms the traditional RNNs encoder-decoder model by solving the long-term dependency problem. The initial attention mechanism was used in the machine translation, which aims to guide the prediction model by focusing on the critical words while minimizing other words. Thereafter, the attention mechanism has been applied to different tasks, including document classification [7] and semantic segmentation [8]. In the process of applying to various industrial tasks, the attention mechanism has numerous variations. While the attention mechanisms were proposed to improve the RNNs structure, Some variation methods utilize attention without recurrence to calculate the context vector in the model [9]. Because these methods are performed in parallel, they require less computational cost than RNN-based models. In this study, we propose a stochastic feed-forward attention structure that can adequately analyze manufacturing facility data. The proposed method is an uncertainty-aware explainable neural network that provides an interpretation of a specific instance through an attention mechanism. There are attempts to utilize the results derived from deep learning algorithms reliably. It is representative to obtain uncertainty information, which is a reliability indicator for the result as well as accurate prediction performance. Bayesian neural networks (BNNs), which can quantify uncertainty, are a representative method for estimating the distribution of model parameters and for describing the stochastic properties of deep neural networks. It has recently been demonstrated that applying L2 regularization and Monte Carlo dropout to all layers of deep neural networks architecture is equivalent to a variational approximation for BNN [10,11]. In addition, the uncertainty quantification method has been proposed in which the value of the last-layer output of the deep neural network is variably decomposition and subdivided according to each purpose [12]. Uncertainty in BNNs consists of aleatoric and epistemic uncertainty depending on the purpose [13]. First, aleatoric uncertainty, which means uncertainty about the data, is caused by inherent noise and randomness of the data. This

Stochastic Feed-forward Attention Mechanism

151

inherent noise is measured due to malfunctions in the measurement system itself or an error in the sensor equipment. Therefore, aleatoric uncertainty is perceived as irreducible uncertainty. Next, epistemic uncertainty, known as model uncertainty, occurs when the model parameters lack the knowledge to recognize the data. The more difficult it is to reliably understand the data given to the model, the more uncertain the model parameters. Therefore, the learning comprehension of the model can be grasped through the epistemic uncertainty, and this uncertainty can be sufficiently reduced when learning with a large amount of data and understanding various characteristics of the data. The purpose of this study is to identify the important input features for the defect classification and to quantify the uncertainty of each interpretation. To achieve this goal, the existing deterministic feed-forward attention mechanism should be replaced with the stochastic feed-forward attention mechanism.

3 Proposed Method

We propose a stochastic feed-forward attention network (SFAN) to effectively perform defect analysis and interpretation in manufacturing systems. Figure 1 shows an overall structure of the proposed method. The model consists of feature extraction, feed-forward attention, and defect classification. In feature extraction, the model extracts the input features through a large number of one-dimensional convolution filters. These features are then fed into the feed-forward attention module. The model then focuses on important features through the feed-forward attention module, which computes the feature context vector. Finally, the model conducts the downstream task of defect classification based on the feature context vector.

3.1 Feature Extraction

The feature extraction module aims to properly summarize each feature and identify defects of electrostatic chuck devices in advance. When the input variable x of the given tabular data has P dimensions and N samples are given, it can

Fig. 1. An overall structure of the stochastic feed-forward attention network for tabular data.


Fig. 2. A feature extraction module of the SFAN.

be expressed as X ∈ R^{N×P}. The target value y ∈ R^C is one-hot encoded for C classes. The first layer of the model encodes the p-th input variable x^p as a feature-extracted vector h^p ∈ R^h through a one-dimensional convolution, where h is the number of convolution filters. Next, layer normalization and the activation function are applied as follows:

h^{p} = \mathrm{LReLU}\big(\mathrm{LN}(W_1^{p} x^{p} + b_1^{p})\big)    (1)

where W_1^{p} ∈ R^h is the filter weight of the one-dimensional convolution layer and b_1^{p} ∈ R^h is a bias. We use the leaky ReLU (LReLU) activation and layer normalization (LN). As shown in Fig. 2, h^p is the feature-extracted vector.
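As a small illustration of Eq. (1), written for this note rather than taken from the paper, the NumPy sketch below encodes one scalar input variable with h = 64 filters; the weights are random placeholders.

```python
# NumPy illustration of Eq. (1): h^p = LReLU(LN(W1^p x^p + b1^p)) for one input variable.
import numpy as np

def layer_norm(v, eps=1e-5):
    return (v - v.mean()) / np.sqrt(v.var() + eps)

def leaky_relu(v, alpha=0.01):
    return np.where(v > 0, v, alpha * v)

def extract_feature(x_p, W1_p, b1_p):
    return leaky_relu(layer_norm(W1_p * x_p + b1_p))

rng = np.random.default_rng(0)
h = 64                                          # number of one-dimensional convolution filters
W1_p, b1_p = rng.normal(size=h), np.zeros(h)
h_p = extract_feature(x_p=0.7, W1_p=W1_p, b1_p=b1_p)   # feature-extracted vector, shape (64,)
```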

3.2 Feed-Forward Attention

In the feed-forward attention module, our model utilizes a variant of the feed-forward attention [14] to classify the defect by weighting the relevant input features. As shown in Fig. 3, given the P extracted features h^p, the feed-forward attention is used to produce the feature context vector z. The attention mechanism computes the following probabilities, which are the attention scores:

o^{p} = f^{p}(h^{p})    (2)

\alpha^{p} = \frac{\exp(o^{p})}{\sum_{k=1}^{P}\exp(o^{k})}    (3)


Fig. 3. A feature attention uncertainty module of the SFAN.

where f^p is a function that derives the logit o^p from h^p, and \alpha^p is the attention probability derived by applying the softmax to the logits. We use a linear function f^p(h) = W_2^{p} h + b_2^{p} to produce the logits. The attention probabilities are then used to compute the feature context vector. After the weighted sum of the extracted feature vectors is calculated, the activation function is applied as follows:

z^{p} = \mathrm{LReLU}\big(\mathrm{LN}(\alpha^{p} h^{p})\big)    (4)

The feature context vector is fed into the final layer of the model to produce class logits, and a softmax activation is then applied to obtain the class probabilities as follows:

q = f^{d}(W_3 z + b_3)    (5)

p^{c} = \frac{\exp(q^{c})}{\sum_{k=1}^{C}\exp(q^{k})}    (6)
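The following NumPy sketch (an illustration under assumed shapes, not the authors' code) walks through Eqs. (2)-(6): per-feature logits, softmax attention scores, the attended per-feature context vectors, and the class probabilities. How the per-feature contexts are combined before the final layer is an assumption here (they are simply flattened).

```python
# NumPy walk-through of the feed-forward attention and classification head, Eqs. (2)-(6).
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def layer_norm(v, eps=1e-5):
    return (v - v.mean()) / np.sqrt(v.var() + eps)

def leaky_relu(v, alpha=0.01):
    return np.where(v > 0, v, alpha * v)

rng = np.random.default_rng(0)
P, h, C = 57, 64, 2                           # input features, filters, classes (assumed)
H = rng.normal(size=(P, h))                   # extracted feature vectors h^p
W2, b2 = rng.normal(size=(P, h)), np.zeros(P)

o = np.einsum("ph,ph->p", W2, H) + b2         # Eq. (2): per-feature logits o^p
alpha = softmax(o)                            # Eq. (3): attention scores
Z = np.stack([leaky_relu(layer_norm(alpha[p] * H[p])) for p in range(P)])  # Eq. (4)

z = Z.reshape(-1)                             # assumption: flatten the per-feature contexts
W3, b3 = rng.normal(size=(C, P * h)), np.zeros(C)
q = W3 @ z + b3                               # Eq. (5): class logits
p_class = softmax(q)                          # Eq. (6): predicted class probabilities
```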

where q ∈ R^C is the logit vector with components q^k, and p^c represents the predicted probability of class c. Given a training dataset {X_i, y_i}_{i=1}^{n}, we use the following categorical cross-entropy loss with L2 regularization:

L(X, y) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C} W_i^{c}\, y_i^{c} \log(p_i^{c}) + \lambda \sum_{j=1}^{L}\big(\lVert W_j\rVert_2^2 + \lVert b_j\rVert_2^2\big)    (7)

where W_i^{c} is a weight that accounts for the number of data points in each class to alleviate the class imbalance, and p_i^{c} is the probability of the input x_i being predicted as class c. The neural network parameters W_1, b_1, W_2, b_2, W_3, b_3 can be estimated


by a backpropagation approach with a mini-batch stochastic gradient descent method. The proposed method has the advantage that it only requires a dataset consisting of the input variables and their class information: the method learns to identify important input features without any explicit information on which variables to focus on.

3.3 Uncertainty Quantification

We aim to quantify the attention given to important features and the uncertainty information associated with each attention score. To achieve this goal, we use the attention score α̂^p derived from the feed-forward attention module to define each decomposed uncertainty. The uncertainty for the calculated interpretation is as follows:

\text{input feature uncertainty} = \underbrace{\frac{1}{N}\sum_{n=1}^{N}\big(\hat{\alpha}_n^{p} - (\hat{\alpha}_n^{p})^2\big)}_{\text{aleatoric}} + \underbrace{\frac{1}{N}\sum_{n=1}^{N}\big(\hat{\alpha}_n^{p} - \bar{\alpha}^{p}\big)^2}_{\text{epistemic}}    (8)






where N is the number of stochastic forward passes, and α̂_n^p is the attention probability of the p-th feature (after the softmax) obtained in the n-th stochastic forward pass. In addition, ᾱ^p denotes the average of α̂_n^p over the stochastic forward passes.
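As a hedged numerical illustration of Eq. (8) (not from the paper), the attention score of one feature is sampled over N = 100 Monte Carlo dropout forward passes and decomposed into its aleatoric and epistemic parts; the sampled values here are synthetic stand-ins.

```python
# NumPy illustration of Eq. (8): decomposed uncertainty of one feature's attention score.
import numpy as np

rng = np.random.default_rng(0)
alpha_samples = rng.uniform(0.0, 0.2, size=100)   # N = 100 stochastic passes (synthetic values)

aleatoric = np.mean(alpha_samples - alpha_samples ** 2)            # mean of alpha(1 - alpha)
epistemic = np.mean((alpha_samples - alpha_samples.mean()) ** 2)   # variance across passes
input_feature_uncertainty = aleatoric + epistemic
print(aleatoric, epistemic, input_feature_uncertainty)
```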

4 Experiments

4.1 Display Manufacturing Facility Datasets

The dataset used in this study is the electrostatic chuck facility data collected from the fabrication process of the display company in South Korea. A total of 4,767 samples were collected from March 2021 to June 2021, of which 175 cases of defect data were included. There are 57 input features that describe facility information such as process time, production path, type of voltage applied power, pin position, internal pressure, alignment degree, application area, and strength of static electricity. The dataset was split into the training, validation, and test sets at rates of 63%, 7%, and 30%, respectively. Each partitioned dataset consists of normal and defective data. Because the manufacturing facility datasets are imbalanced (i.e. few defect data compared to the normal data), we used a cost-sensitive learning method that assigns a high cost to the misclassification of the minority class to mitigate imbalance problems [15].

4.2 Results

The proposed method was constructed with the number of one-dimensional convolution filters h = 64. We set the mini-batch size to 64 and used the Adam optimizer with learning rate η = 10^{-4}. In addition, we applied Monte Carlo dropout with a ratio of 0.01 to make the model parameters, including the attention mechanism, stochastic. The number of stochastic forward passes N used to quantify the attention score and uncertainty was 100. We defined the maximum number of epochs as 500 with early stopping based on the validation loss calculated on the validation set. To compare the performance between the proposed method and other methods in terms of defect classification, we used the accuracy and F1 score as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (9)
Precision = TP / (TP + FP)    (10)
Recall = TP / (TP + FN)    (11)
F1-score = (2 · Precision · Recall) / (Precision + Recall) = 2TP / (2TP + FP + FN)    (12)

where true positive (TP) is the number of defects that are predicted to be defective, and false negative (FN) is the number of defects that are falsely predicted to be normal. Conversely, true negative (TN) is the number of normal samples that are predicted to be normal, and false positive (FP) is the number of normal samples that are falsely predicted to be defects. Regarding the identification of features important to prediction, we had no information on which features were truly important for defect classification; therefore, we validated the interpretation qualitatively. First, an ablation study was conducted to investigate the effect of each module constituting the proposed model. The results can be seen in Table 1. The model variants used in the experiments are a vanilla deep neural network (DNN), a model that additionally applies a feed-forward attention network (FAN), and our proposed SFAN. FAN, with the attention mechanism applied, performs better than the DNN structure without the attention mechanism because it focuses more on the features that are important for making predictions. In addition, FAN provides attention scores that make interpretation possible. Moreover, SFAN derived better attention scores than

Table 1. Ablation analysis of the proposed SFAN to examine the effect of a feed-forward attention network and stochastic attention on the manufacturing facility dataset.

Model                   | DNN    | FAN    | SFAN
Classification Accuracy | 0.9874 | 0.9902 | 0.9944
F1 Score                | 0.8944 | 0.9288 | 0.9586


Table 2. Importance scores and uncertainties for the top 10 features responsible for classifying normal and defective products. Normal Features

Defect Importance score

Aleatoric uncertainty

Epistemic uncertainty

Features

Feature 58 0.40401

0.04984

0.00508

Feature 59 0.34341

0.05061

0.00414

Feature 51 0.24223

0.05211

Feature 73 0.00403

Importance score

Aleatoric uncertainty

Epistemic uncertainty

Feature 73 0.08750

0.00801

0.13685

Feature 70 0.08684

0.00993

0.14686

0.00305

Feature 74 0.08650

0.01960

0.10619

0.04830

0.00446

Feature 69 0.08626

0.00560

0.00027

Feature 68 0.00314

0.05085

0.00303

Feature 72 0.08438

0.00167

0.00018

Feature 72 0.00074

0.04996

0.00378

Feature 68 0.08410

0.00092

0.00002

Feature 65 0.00035

0.04950

0.00429

Feature 66 0.08337

0.00122

0.00001

Feature 71 0.00031

0.04942

0.00293

Feature 71 0.08138

0.00102

0.00002

Feature 69 0.00029

0.04843

0.00358

Feature 67 0.07883

0.00087

0.00001

Feature 70 0.00025

0.00508

0.00267

Feature 83 0.01472

0.00127

0.00001

FAN because it quantified attention scores that reflect uncertainty information by applying a stochastic attention mechanism. For this reason, SFAN has clear advantages in terms of better classification performance and more reliable important-feature information, including uncertainty. As a result, the classification accuracy and F1 score of the proposed method on the test dataset were 0.9944 and 0.9586, implying that the proposed method successfully classified the defects. The proposed method yields an importance score for each feature of each sample. Table 2 shows the average importance scores and uncertainties for the top 10 input features in order of importance. Important features identified by the model include lag time, the electrostatic chuck driving environment, and the application area, and these results were confirmed by domain experts. Table 3 shows the results of comparative experiments in terms of classification accuracy and F1 score against XGBoost [16] and TabNet [17], which are widely used for tabular data and capable of identifying important features. The results show that our proposed SFAN outperformed XGBoost and TabNet by achieving the highest classification accuracy and F1 score.

Table 3. Comparison between the proposed SFAN and the TabNet and XGBoost methods in terms of accuracy and F1 score on the manufacturing facility dataset.

Metric   | TabNet | XGBoost | SFAN (proposed)
Accuracy | 0.9832 | 0.9930  | 0.9944
F1 Score | 0.8726 | 0.9510  | 0.9586

5 Conclusions

In this study, we introduce a stochastic feed-forward attention network for equipment defect classification work in the display electrostatic chuck manufacturing process. Our proposed stochastic feed-forward attention network consists of feature extraction and feed-forward attention modules. The Feature extraction module effectively summarizes features of tabular data from various perspectives through one-dimensional convolution filters, and the feed-forward attention module quantifies which features are primarily applied to the prediction results. Furthermore, the model can provide uncertainty information to facilitate the interpretation of the results more reliably. The explainability of our model is particularly useful for interpreting complex relationships between data features collected from manufacturing facilities. We verified the superiority of our method with display manufacturing facility datasets. The experimental results showed that our model could correctly recognize important features while deriving excellent classification performance. Acknowledgments. This research was supported by the Brain Korea 21 FOUR and the IITP grant funded by the MSIT (No.2021–0-00034).

References 1. Sahno, J., Shevtshenko, E.: Quality improvement methodologies for continuous improvement of production processes and product quality and their evolution. In: 9th International DAAAM Baltic Conference “Industrial Engineering”, pp. 181-186 (2014) 2. Stojanovic, L., Dinic, M., Stojanovic, N., Stojadinovic, A.: Big-data-driven anomaly detection in industry (4.0): an approach and a case study. In: 2016 IEEE International Conference Big Data, IEEE, pp. 1647–1652 (2016) 3. Cho, Y.S., Kim, S.B.: Quality-discriminative localization of multisensor signals for root cause analysis. IEEE Trans. Syst. Man, Cybern. Syst. 52(7), 4374–4387 (2021) 4. Kang, S.B., Lee, J.H., Song, K.Y., Pahk, H.J.: Automatic defect classification of TFT-LCD panels using machine learning. In: 2009 IEEE International Symposium on Industrial Electronics, pp. 2175-2177. IEEE (2009) 5. Ryu, J.-H., Heo, M.-O., Zhang, B.-T.: CNN-based classification methods for filtering of defect pixel circuit images. J. Comput. Sci. Eng. 907-909 (2017) 6. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014) 7. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: Proceedings 2016 Conference North American Chapter Association Computational Linguistics: Human Language Technologies, pp. 1480-1489 (2016) 8. Zhang, H., Zhang, H., Wang, C., Xie, J.: Co-occurrent features in semantic segmentation. In: Proceedings IEEE Conference Computer Vision Pattern Recognition, pp. 548-557 (2019) 9. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998-6008 ( 2017) 10. Gal, Y.: Uncertainty in deep learning, University of Cambridge, 1 (2016)


11. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference Machine Learning, pp. 1050-1059 (2016) 12. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision?. In: Advanced Neural Information Processing Systems, pp. 5574-5584 (2017) 13. Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? Does it matter? Struct. Saf. 31, 105–112 (2009) 14. Raffel, C., Ellis, D.P.W.: Feed-forward networks with attention can solve some long-term memory problems. arXiv preprint arXiv:1512.08756 (2015) 15. Elkan, C.: The foundations of cost-sensitive learning. In: International Joint Conference Artificial Intelligence, pp. 973-978. Lawrence Erlbaum Associates Ltd (2001) 16. Chen, T., Guestrin, C.: Xgboost: a scalable tree boosting system. In: Proceedings 22nd ACM SIGKDD International Conference Knowledge Discovery Data Mining, pp. 785-794 (2016) 17. Arık, S.O., Pfister, T.: TabNet: attentive interpretable tabular learning. arXiv preprint arXiv:1908.07442 (2020)

Bregman Divergencies, Triangle Inequality, and Maximum Likelihood Estimates for Normal Mixtures Bernd-Jürgen Falkowski(B) Fachhochschule für Ökonomie und Management FOM, Arnulfstrasse 30, 80335 München, Germany [email protected]

Abstract. The concepts of distance and angle and their algebraic realizations in the form of a scalar product are known to lead to kernel versions of sophisticated clustering algorithms. Here the more recently utilized Bregman Divergencies are treated. They possess all the properties of a metric apart from satisfying the triangle inequality. However, they can be suitably modified. En passant, an apparent gap in a former paper is eliminated by exploiting an old proof concerning (conditionally) positive definite kernels. In addition, an explicit isometric embedding of the modified Bregman Divergence in a Reproducing Kernel Hilbert Space is described. On a practical level, recalling some basic facts on normal mixtures, the well-known connection between the parameter estimation problem for these mixtures and clustering algorithms is shown to hold in this abstract setting as well, thus providing a more flexible approach to maximum likelihood estimates.

Keywords: Bregman Divergencies · Triangle inequality · Normal mixtures

1 Introduction

Clustering algorithms have recently received much attention. On the one hand they are attractive for commercial purposes as exemplified in recommender systems. On the other hand they are interesting from a theoretical point of view, creating a connection between maximum likelihood methods in statistics and certain artificial intelligence aspects. Of course, clustering algorithms have been known for quite some time. However, during the last decade many improvements have been effected through the use of kernel methods. Here a particular kind of kernel is considered that initially does not satisfy the triangle inequality. In the first part of this paper certain definitions from [1] are recalled. In particular the so-called Jensen Bregman (JB) is defined. It is observed that the JB possesses all properties of a metric apart from the triangle inequality. Following Schoenberg, [9], the triangle property can be verified by showing that the JB can be embedded in a Hilbert Space iff the JB is conditionally positive definite. The proof seems enlightening since the result is obtained from an abstract distance function satisfying the symmetry and positivity requirements by a simple computation. Slightly modifying a result due to Parthasarathy and Schmidt, [8], p. 4, a connection to infinitely divisible positive definite functions comes to light. This then shows the important role of certain cohomology groups of abelian groups. In particular the special form of certain conditionally positive kernels is obtained via the so-called Levy-Khinchine formula, [8], p. 93. Slightly modifying a result due to Parthasarathy and Schmidt, cf. [8], p. 3, provides the possibility of applying the Reproducing Kernel Hilbert Space (RKHS) in order to construct an explicit isometric imbedding of the modified Bregman divergence. In the second part of the paper, for the benefit of the reader, Duda et al.'s results concerning maximum likelihood estimates for normal mixtures are recalled [2]. It is well known that there is a close connection between likelihood estimation problems for normal mixtures and clustering algorithms, see e.g. [2], p. 526, and [5]. Thus in the last part of the paper it is shown how a kernel version of Duda's iterative k-means algorithm [4] can be used to obtain approximate solutions for the maximum likelihood parameters of normal mixtures applying the JB. This is appealing from a practical point of view since the original algorithm can be made rather more flexible and thus provides for a wide range of applications.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. K. Arai (Ed.): IntelliSys 2022, LNNS 542, pp. 159–166, 2023. https://doi.org/10.1007/978-3-031-16072-1_12

2 Bregman Divergencies

For the convenience of the reader some definitions from [1] have to be recalled, where some technical details will be omitted. For these the reader is referred to the cited reference.

Definition 1 (Bregman Divergence). Let φ be a convex function of Legendre type defined on a suitable domain. For any x, y the Bregman divergence corresponding to φ is defined as

$d_\phi(x, y) = \phi(x) - \phi(y) - \langle x - y, \nabla\phi(y)\rangle$   (1)

In [1] there is also a symmetrized Bregman divergence defined in order to end up with an isometric embedding in Hilbert space. Let φ be a convex function of Legendre type and $d_\phi(\cdot,\cdot)$ be the corresponding Bregman divergence. Then the Jensen Bregman (JB) is defined as

Definition 2 (Jensen Bregman).

$\Delta_\phi(x, y) = \tfrac{1}{2}\, d_\phi\!\left(x, \tfrac{x+y}{2}\right) + \tfrac{1}{2}\, d_\phi\!\left(y, \tfrac{x+y}{2}\right) = \tfrac{1}{2}\phi(x) + \tfrac{1}{2}\phi(y) - \phi\!\left(\tfrac{x+y}{2}\right)$   (2)
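To make Definitions 1 and 2 concrete, the following short Python sketch (ours, not taken from [1]) evaluates $d_\phi$ and the JB for the example choice $\phi(x) = \|x\|^2$, for which the Bregman divergence is the squared Euclidean distance and the JB equals one quarter of it.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """d_phi(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>  (Definition 1)."""
    return phi(x) - phi(y) - np.dot(x - y, grad_phi(y))

def jensen_bregman(phi, grad_phi, x, y):
    """JB(x, y) = 1/2 d_phi(x, (x+y)/2) + 1/2 d_phi(y, (x+y)/2)
                = 1/2 phi(x) + 1/2 phi(y) - phi((x+y)/2)      (Definition 2)."""
    m = 0.5 * (x + y)
    return (0.5 * bregman_divergence(phi, grad_phi, x, m)
            + 0.5 * bregman_divergence(phi, grad_phi, y, m))

# Example Legendre-type function phi(x) = ||x||^2 (our choice for illustration):
phi = lambda v: float(np.dot(v, v))
grad_phi = lambda v: 2.0 * v

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(bregman_divergence(phi, grad_phi, x, y))   # ||x - y||^2 = 13.0
print(jensen_bregman(phi, grad_phi, x, y))       # ||x - y||^2 / 4 = 3.25
```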

Note that the definition of the JB ensures that apart from the triangle inequality all properties of a metric are satisfied. This shortcoming, however, is remedied by the following observation due to v. Menger and Schoenberg, [9], [10].


Theorem 1. Given a separable space X and a distance function d satisfying

1. d(x, y) = d(y, x) ≥ 0
2. d(x, x) = 0

for arbitrary x, y ∈ X. Then a necessary and sufficient condition that X be imbeddable in a real Hilbert space H is that for any n+1 points (n ≥ 1)

$\sum_{i=1}^{n}\sum_{k=1}^{n}\left(d(x_0, x_i)^2 + d(x_0, x_k)^2 - d(x_i, x_k)^2\right) a_i a_k \ge 0$   (3)

Summing over the three terms separately and setting $a_0 = -\sum_{i=1}^{n} a_i$, this simplifies to

$\sum_{i=0}^{n}\sum_{k=0}^{n} d(x_i, x_k)^2\, a_i a_k \le 0 \quad \text{subject to} \quad \sum_{i=0}^{n} a_i = 0$   (4)

This then leads to the following definition that is needed in the sequel.

Definition 3. Given a topological space X and a continuous function K : X × X → R where R denotes the real numbers. Then K is called a positive definite (p.d.) kernel if it satisfies

a) K(x, y) = K(y, x) for all x, y ∈ X
b) $\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j K(x_i, x_j) \ge 0$ for all $(a_i, x_i) \in R \times X$

If a) above holds and b) above holds under the additional condition
c) $\sum_{i=1}^{n} a_i = 0$
then K is called conditionally positive definite (c.p.d.).

Clearly every p.d. kernel is c.p.d. whilst in general not every c.p.d. kernel is p.d. Moreover it is seen that the negative of the above distance function must be a c.p.d. kernel in order to possess the triangle property that is required for it to be a metric. In fact one has the following corollary.

Corollary 1. $\Delta_\phi(\cdot,\cdot)^{1/2}$ is a metric iff $-\Delta_\phi(\cdot,\cdot)$ is a c.p.d. kernel.

Corollary 2. If $\Delta_\phi(\cdot,\cdot)^{1/2}$ is a metric, then the kernel defined by $K(x, y) = \phi\!\left(\frac{x+y}{2}\right)$ is c.p.d.

Proof. $-\Delta_\phi(x, y)$ is c.p.d. Hence let $\{a_1, a_2, ..., a_n\}$ be given with $\sum_{i=1}^{n} a_i = 0$ and let $x_1, x_2, ..., x_n \in X$ be arbitrary. Then $\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \left(-\Delta_\phi(x_i, x_j)\right) \ge 0$. But this reduces to $\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\, \phi\!\left(\frac{x_i + x_j}{2}\right) \ge 0$ since the other summands vanish. To see this note that $\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \phi(x_i) = \sum_{j=1}^{n} a_j \sum_{i=1}^{n} a_i \phi(x_i) = 0$. A similar argument goes through for the other summand.

In order to obtain further consequences of Theorem 1 another lemma is needed whose proof has been obtained by slightly modifying a result from [8], p. 3.
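The conditions of Definition 3 can be checked numerically on a finite sample. The sketch below is an illustration added by us (not part of the paper): it tests conditional positive definiteness by restricting the Gram matrix to zero-sum coefficient vectors, and uses the negative squared Euclidean distance as an example of a kernel that is c.p.d. but not p.d.

```python
import numpy as np

def is_pd(K_mat, tol=1e-9):
    """Positive (semi-)definiteness of a symmetric Gram matrix via its spectrum."""
    return bool(np.all(np.linalg.eigvalsh((K_mat + K_mat.T) / 2) >= -tol))

def is_cpd(K_mat, tol=1e-9):
    """Conditional positive definiteness: a^T K a >= 0 whenever sum(a) = 0.
    Equivalently, K must be p.s.d. on the hyperplane orthogonal to the
    all-ones vector, tested here by projecting with P = I - (1/n) ones."""
    n = K_mat.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return is_pd(P @ K_mat @ P, tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
print(is_pd(-D2), is_cpd(-D2))                        # False True
```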


Lemma 1. Let X be a set and K be a symmetric kernel on X × X. Then the following conditions are equivalent:

1. K is c.p.d.
2. For all fixed x0 ∈ X the kernel $K_{x_0}(x, y) := K(x, y) - K(x, x_0) - K(x_0, y) + K(x_0, x_0)$ is p.d. for all x, y ∈ X
3. For every t > 0, $\exp(tK)$ is p.d.

Proof. (1) implies (2): Let $x_1, x_2, ..., x_n \in X$ and $a_1, a_2, ..., a_n \in R$ and set $a_0 = -\sum_{j=1}^{n} a_j$; then Definitions 3 b), c) imply that $\sum_{j=0}^{n}\sum_{k=0}^{n} a_j a_k K(x_j, x_k) \ge 0$. Rewriting this inequality one obtains

$\sum_{j=1}^{n}\sum_{k=1}^{n}\left(K(x_j, x_k) - K(x_j, x_0) - K(x_0, x_k) + K(x_0, x_0)\right) a_j a_k \ge 0$   (5)

This shows that (1) implies (2).
(2) implies (3): Suppose now that (2) holds. Then it follows by a lemma of Schur, cf. [11], and since linear combinations of p.d. kernels are p.d., that $\exp(tK_{x_0})$ is a p.d. kernel for all t > 0. Since any kernel of the form K(x, y) = k(x)k(y) is p.d., it follows once again from Schur's lemma that $\exp(tK(x, y)) = \exp(-tK(x_0, x_0)) \cdot \exp\!\big(t(K(x, x_0) + K(x_0, y))\big) \cdot \exp(tK_{x_0}(x, y))$ is p.d. Hence (2) implies (3).
(3) implies (1): If (3) holds then the kernel $t^{-1}(\exp(tK) - 1)$ is c.p.d. Letting t → 0 it is seen that K is c.p.d.

Remark: In fact an interesting derivation of the Levy-Khinchine formula, that is closely connected to c.p.d. kernels, can be found in [8], p. 93. This is obtained from an abstract result on the 1-Cohomology of abelian groups. There the Levy-Khinchine formula is obtained for hermitian c.p.d. kernels. However, together with a result in [3] the real valued version can easily be constructed. For further information and explicit results on the Levy-Khinchine formula see also [6], p. 69, and [7].

3 Normal Mixtures

In order to prepare the way for showing how the clustering algorithm in [4] can be applied to obtain an approximate solution for the maximum likelihood parameters for normal mixtures using JB the reader is invited to consult some general results from [2], p. 521, concerning three equations for maximum likelihood estimates.


These general results obtained by Duda et al. will be applied to normal mixtures of the form $p(x|\omega_i, \theta_i) \sim N(\mu_i, \Sigma_i)$, where all parameters are assumed to be unknown, see [2], p. 524, for technical details and notation. Considering

$\ln p(x_k|\omega_i, \theta_i) = \ln\frac{|\Sigma_i^{-1}|^{1/2}}{(2\pi)^{d/2}} - \frac{1}{2}\langle \Sigma_i^{-1}(x_k - \mu_i), (x_k - \mu_i)\rangle$   (6)

it is possible to obtain three equations from the three equations cited above for the local maximum likelihood estimates. The resulting equations are given as follows:

$\hat P(\omega_i) = \frac{1}{n}\sum_{k=1}^{n} \hat P(\omega_i|x_k, \hat\theta)$   (7)

$\hat\mu_i = \frac{\sum_{k=1}^{n} \hat P(\omega_i|x_k, \hat\theta)\, x_k}{\sum_{k=1}^{n} \hat P(\omega_i|x_k, \hat\theta)}$   (8)

$\hat\Sigma_i = \frac{\sum_{k=1}^{n} \hat P(\omega_i|x_k, \hat\theta)\,(x_k - \hat\mu_i)(x_k - \hat\mu_i)^t}{\sum_{k=1}^{n} \hat P(\omega_i|x_k, \hat\theta)}$   (9)

where

$\hat P(\omega_i|x_k, \hat\theta) = \frac{p(x_k|\omega_i, \hat\theta_i)\,\hat P(\omega_i)}{\sum_{j=1}^{c} p(x_k|\omega_j, \hat\theta_j)\,\hat P(\omega_j)}$   (10)

and this reduces to

$\hat P(\omega_i|x_k, \hat\theta) = \frac{|\hat\Sigma_i|^{-1/2}\exp\!\left(-\frac{1}{2}\langle \hat\Sigma_i^{-1}(x_k - \hat\mu_i), (x_k - \hat\mu_i)\rangle\right)\hat P(\omega_i)}{\sum_{j=1}^{c} |\hat\Sigma_j|^{-1/2}\exp\!\left(-\frac{1}{2}\langle \hat\Sigma_j^{-1}(x_k - \hat\mu_j), (x_k - \hat\mu_j)\rangle\right)\hat P(\omega_j)}$   (11)
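A minimal numpy sketch of the resulting fixed-point iteration is given below; it is our illustration of Eqs. (7)–(10), not the authors' implementation. The common factor $(2\pi)^{-d/2}$ is omitted since it cancels in Eq. (10).

```python
import numpy as np

def e_step(X, priors, means, covs):
    """Responsibilities P_hat(omega_i | x_k, theta_hat), Eqs. (10)/(11)."""
    n, _ = X.shape
    c = len(priors)
    resp = np.zeros((n, c))
    for i in range(c):
        diff = X - means[i]
        inv = np.linalg.inv(covs[i])
        mahal = np.einsum('kj,jl,kl->k', diff, inv, diff)   # squared Mahalanobis
        resp[:, i] = priors[i] * np.linalg.det(covs[i]) ** -0.5 * np.exp(-0.5 * mahal)
    return resp / resp.sum(axis=1, keepdims=True)

def m_step(X, resp):
    """Parameter updates of Eqs. (7)-(9)."""
    n, _ = X.shape
    nk = resp.sum(axis=0)                          # effective cluster sizes
    priors = nk / n                                # Eq. (7)
    means = (resp.T @ X) / nk[:, None]             # Eq. (8)
    covs = []
    for i in range(resp.shape[1]):                 # Eq. (9)
        diff = X - means[i]
        covs.append((resp[:, i, None] * diff).T @ diff / nk[i])
    return priors, means, covs
```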

4 Reproducing Kernel Hilbert Space, Clustering, and Normal Mixtures

4.1 RKHS

In order to use the kernel version of Duda's iterative algorithm for clustering, cf. [4], the c.p.d. kernel $K(x, y) = -\Delta_\phi(x, y)$ cannot be used. Instead one has to look for a p.d. kernel that provides an isometric imbedding for $\Delta_\phi(x, y)$. However, the following result follows easily from Lemma 1.

Lemma 2. The kernel $\tilde K_{x_0}(\cdot,\cdot)$ defined by

$\tilde K_{x_0}(x, y) = \frac{1}{2}\left(\phi\!\left(\frac{x+y}{2}\right) - \phi\!\left(\frac{x+x_0}{2}\right) - \phi\!\left(\frac{y+x_0}{2}\right) + \phi(x_0)\right)$   (12)

for arbitrary x0 ∈ X is p.d. and leads to an isometric imbedding for $\Delta_\phi(x, y)$.


Proof. Note first that (from Corollary 2) $\frac{1}{2}\phi\!\left(\frac{x+y}{2}\right)$ must be c.p.d. since $\phi\!\left(\frac{x+y}{2}\right)$ is. Moreover, from Lemma 1 applied to $\frac{1}{2}\phi\!\left(\frac{x+y}{2}\right)$ it follows that $\tilde K_{x_0}(\cdot,\cdot)$ is p.d. Finally, $\tilde K_{x_0}(\cdot,\cdot)$ is isometric to $K(x, y) = \Delta_\phi(x, y)$ since

$\tilde K_{x_0}(x, x) + \tilde K_{x_0}(y, y) - 2\tilde K_{x_0}(x, y) = \frac{1}{2}\phi(x) + \frac{1}{2}\phi(y) - \phi\!\left(\frac{x+y}{2}\right)$   (13)

Using this kernel one may now apply the Reproducing Kernel Hilbert Space to obtain an explicit description of the corresponding Hilbert space. It is defined as follows:

Definition 4. Given a p.d. kernel on a set $S := \{x_1, x_2, ..., x_n\}$, an imbedding η of S in a vector space $H = R^S$ (the space of functions from S to R) may always be constructed by setting $\eta(x) := K(\cdot, x)$, considering functions $f = \sum_{i=1}^{n} \alpha_i K(\cdot, x_i)$, and defining addition of such functions and multiplication of such a function by a scalar pointwise. If the inner product is defined by $\langle \eta(x), \eta(y)\rangle_{RKHS} := K(x, y)$ and extended by linearity, then a Hilbert space H (the Reproducing Kernel Hilbert Space, RKHS, sometimes also called feature space) is obtained by completion as usual, see e.g. [13]. This then gives an explicit isometric imbedding if the kernel $\tilde K_{x_0}$ is taken as the RKHS kernel.
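The following short sketch (ours) evaluates the kernel of Eq. (12) for the example $\phi(v) = \|v\|^2$ and verifies the isometry (13) numerically.

```python
import numpy as np

def jb(phi, x, y):
    """Jensen-Bregman divergence: 1/2 phi(x) + 1/2 phi(y) - phi((x+y)/2)."""
    return 0.5 * phi(x) + 0.5 * phi(y) - phi(0.5 * (x + y))

def k_tilde(phi, x0, x, y):
    """Modified p.d. kernel of Eq. (12), built from base point x0 via Lemma 1."""
    return 0.5 * (phi(0.5 * (x + y)) - phi(0.5 * (x + x0))
                  - phi(0.5 * (y + x0)) + phi(x0))

phi = lambda v: float(np.dot(v, v))          # example convex function ||v||^2
x0 = np.zeros(2)                             # arbitrary base point
x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])

# Eq. (13): K~(x,x) + K~(y,y) - 2 K~(x,y) equals the JB divergence.
lhs = k_tilde(phi, x0, x, x) + k_tilde(phi, x0, y, y) - 2 * k_tilde(phi, x0, x, y)
print(np.isclose(lhs, jb(phi, x, y)))        # True
```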

4.2 Clustering

Fortunately enough the explicit version of the feature map is not needed for the kernel version of Duda's iterative k-means algorithm. The problem considered there may briefly be described as follows. Given a set of n samples $S := \{x_1, x_2, ..., x_n\}$, then these samples are to be partitioned into exactly k sets $S_1, S_2, ..., S_k$. Each cluster is to contain samples more similar to each other than they are to samples in other clusters. To this end one defines a target function that measures the clustering quality of any partition of the data. The problem then is to find a partition of the samples that optimizes the target function. Now the generalized concept of mean is given as m(S) by

$m(S) := \frac{1}{n}\sum_{i=1}^{n} \eta(x_i).$

Hence it becomes clear that the explicit use of the feature map can be avoided if the target function is defined to be the sum of squared errors in feature space. This follows by observing that

$\|\eta(x) - m\|^2 = K(x, x) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} K(x_i, x_j) - \frac{2}{n}\sum_{j=1}^{n} K(x, x_j)$

holds. Similarly it can be computed that tentative updates of the Si as required for iterative improvement of the target function can also be effected without explicit use of the feature map. For technical details see [4,12].
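As an illustration (not the implementation of [4]), the identity above can be coded directly on the Gram matrix, so that one assignment sweep of kernel k-means only needs K and never the feature map itself.

```python
import numpy as np

def sq_dist_to_cluster_mean(K, cluster_idx, x_idx):
    """||eta(x) - m(S_c)||^2 from the Gram matrix only:
       K(x,x) + (1/n^2) sum_{i,j in S_c} K(x_i,x_j) - (2/n) sum_{j in S_c} K(x,x_j)."""
    idx = np.asarray(cluster_idx)
    n = len(idx)
    return (K[x_idx, x_idx]
            + K[np.ix_(idx, idx)].sum() / n ** 2
            - 2.0 * K[x_idx, idx].sum() / n)

def kernel_kmeans_assign(K, clusters):
    """One assignment sweep of the sum-of-squared-errors target in feature space:
    each sample is moved to the cluster whose generalized mean is nearest."""
    labels = np.empty(K.shape[0], dtype=int)
    for x_idx in range(K.shape[0]):
        d = [sq_dist_to_cluster_mean(K, c, x_idx) for c in clusters]
        labels[x_idx] = int(np.argmin(d))
    return labels
```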

4.3 Clustering and Maximum Likelihood Estimates for Normal Mixtures

ˆ is large if the From equation (11) it is clear that the probability Pˆ (ωi |xk , θ) −1 ˆ (xk − μˆj ), (xk − μˆj ) > is small. In contrast squared Mahalanobis distance < Σ to Duda et al. here the squared Mahalanobis distance is approximated by the JB. ˆ = δim Hence one finds the mean μˆm nearest to xk and approximates Pˆ (ωi |xk , θ) where δ is just the Kronecker symbol. The iterative application of equation (8) then leads to the algorithm sketched out above and described in detail in [4]. This kernel version of Duda’s iterative algorithm appears particularly elegant. Of course, it must be admitted that quite a few technical details had to be suppressed here for lack of space: This particularly concerned mean updates in terms of the kernel. It proved somewhat difficult to avoid using the feature map explicitly. However, the use of indicator functions as suggested in [12] proved helpful.

5 Conclusion

Following Schoenberg's arguments the JB was successfully modified to obtain the triangle inequality. During that process the role of conditionally positive definite and positive definite kernels became clear. Moreover a connection to the Levy-Khinchine formula was exhibited. On an abstract level the importance of the 1-cohomology of Abelian groups in this context came to light. The concept of Reproducing Kernel Hilbert Space was helpful. It provided an explicit isometric imbedding of the modified Bregman divergence. Moreover it was shown by several examples how an explicit use of the feature map could be avoided. However, this of course was not just l'art pour l'art. There are some practical consequences of these abstract considerations. After recalling some basic facts about normal mixtures it was shown how, employing a kernel version of Duda's elegant iterative clustering algorithm, approximate solutions for likelihood parameters for normal mixtures could be obtained. Here the use of the JB provided extra flexibility. There is rather a wide range of possible applications. One of the more esoteric ones could be in biological/genetic data for genome-wide association studies [14]. As far as possible CPU time problems are concerned, preliminary results using graphics cards were promising. Of course, more systematic experiments are needed to validate the approach. In addition it must be admitted that the shortcomings of a hill-climbing algorithm (getting stuck in local extrema) are still present. Moreover it is not clear how optimal initializations should be chosen.

References

1. Acharyya, S., Banerjee, A., Boley, D.: Bregman divergences and triangle inequality. In: Proceedings of the 2013 SIAM International Conference on Data Mining, Austin, US (2013)
2. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, Reprinted (2017)
3. Ehm, W., Genton, M., Gneiting, T.: Stationary covariances associated with exponentially convex functions. Bernoulli 9(4), 607–615 (2003)
4. Falkowski, B.-J.: A kernel iterative K-Means algorithm. In: Borzemski, L., Wilimowska, Z. (eds.) ISAT 2019. AISC, vol. 1051, pp. 221–232. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-30604-5_20
5. Falkowski, B.-J.: Maximum likelihood estimates and a kernel k-Means iterative algorithm for normal mixtures. In: USB Proceedings IECON 2020, The 46th Annual Conference of the IEEE Industrial Electronics Society, 18–21 Oct 2020, Singapore (2020)
6. Jacob, N., Schilling, R.: An analytic proof of the Levy-Khinchine formula on Rn. Publ. Math. Debrecen 53 (1998)
7. Karpushev, S.I.: Conditionally positive definite functions on locally compact groups and the Levy-Khinchine formula. J. Math. Sci. 28, 489–498 (1985). https://doi.org/10.1007/BF02104978
8. Parthasarathy, K.R., Schmidt, K.: Positive Definite Kernels, Continuous Tensor Products, and Central Limit Theorems of Probability Theory. LNM, vol. 272. Springer, Heidelberg (1972). https://doi.org/10.1007/BFb0058340
9. Schoenberg, I.J.: Metric spaces and positive definite functions. Trans. AMS 41, 522–536 (1938)
10. Schoenberg, I.J.: On certain metric spaces arising from Euclidean spaces by a change of metric and their imbedding in Hilbert space. Ann. Math. 38, 787 (1937)
11. Schur, I.: Bemerkungen zur Theorie der beschränkten Bilinearformen mit unendlich vielen Veränderlichen. J. für die reine und angewandte Mathematik 140, 1–28 (1911)
12. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
13. Wahba, G.: Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV. Adv. Kernel Methods Support Vector Learn. 6, 69–87 (1999)
14. Stepwise iterative maximum likelihood clustering approach. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4995791/. Accessed 21 Apr 2021

Self-supervised Learning for Predicting Invisible Enemy Information in StarCraft II Insung Baek, Jinsoo Bae, Keewon Jeong, Young Jae Lee, Uk Jo, Jaehoon Kim, and Seoung Bum Kim(B) Industrial and Management Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea {insung_baek01,wlstn215,dia517,jae601,ukjo,jhoon0418, sbkim1}@korea.ac.kr

Abstract. In real-time strategy games such as StarCraft II, players gather resources, make buildings, produce various units, and create strategies to win the game. In particular, accurately predicting enemy information is essential to victory in StarCraft II because the enemy situation is obscured by the fog of war. However, it is challenging to predict the enemy information because the situation changes over time, and various strategies are used. Also, previous studies for predicting invisible enemy information in StarCraft do not use self-supervised learning, which can extract effective feature spaces. In this study, we propose a deep learning model combined with contrastive self-supervised learning to predict invisible enemy information and improve the model performance. The effectiveness of the proposed method is demonstrated both quantitatively and qualitatively.

Keywords: StarCraft II · Deep learning model · Contrastive learning · Self-supervised learning

1 Introduction

The gaming industry is gaining in popularity because it is recognized as a new competitive category called eSports [1]. In particular, real-time strategy games such as StarCraft and StarCraft II have been leading the growth of the gaming industry for over 20 years. StarCraft game players collect resources such as minerals and gas, construct buildings, produce various units, compose an army, and combat opponent players [2]. To win the game, players devise a winning strategy by considering their own situation and the enemy's situation at the same time. However, in StarCraft it is challenging to know the enemy's situation because it is hidden in the fog of war. "Fog of war" means a partially observed environment where players can see only their own situations [3, 4]. Another challenging point of StarCraft is that the situation changes over time, and opponent players use various strategies. Therefore, a machine learning model that can predict the invisible enemy information is required.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 K. Arai (Ed.): IntelliSys 2022, LNNS 542, pp. 167–172, 2023. https://doi.org/10.1007/978-3-031-16072-1_13


However, previous StarCraft studies mainly predicted game results [2, 5, 6]. Although some studies predict the hidden enemy information [3, 7, 8], no study applies self-supervised learning algorithms to extract adequate feature spaces in the game. Self-supervised learning could improve the performance of deep learning models because it can extract a good feature space by summarizing various information in the game. In this study, we propose using contrastive self-supervised learning combined with a convolutional encoder-decoder to improve the predictive accuracy of the opponent's hidden information. We use partially visible and noisy data to construct the encoder-decoder model that predicts the locations of the enemy's units and buildings. We use StarCraft II replay data containing various information for both players' and opponents' information for training the prediction model. To demonstrate the effectiveness of our approach, we provide visualization that compares the predicted enemy information with the actual enemy information. The remainder of the paper is organized as follows. Section 2 consists of a review of the studies on StarCraft II win-loss prediction models and invisible enemy information prediction models. Section 3 illustrates the details of the proposed invisible enemy prediction model using self-supervised learning, while Sect. 4 presents the qualitative and quantitative experimental results. Finally, Sect. 5 contains our concluding remarks and directions for future research.

2 Related Works

StarCraft II is a well-suited testbed for human-level artificial intelligence (AI) because it has a dynamic environment with many features. Various previous studies have been conducted based on a supervised learning approach: game win-loss prediction [2, 5, 6] and hidden enemy information prediction [3, 7, 8]. Wu et al. proposed predicting game outcomes by using recurrent neural networks (RNN) with gated recurrent unit (GRU) cells in the last two layers [5]. Lin et al. suggested dividing the feature information in StarCraft II based on whether it is associated with the game player or his/her enemy [6]. They compared the performance of neural processes and support vector machine models. Lee et al. proposed a combat winner prediction model considering unit combination and battlefield information [2]. In the StarCraft series, the defogger problem is one of the challenging research topics because the "fog of war" is a fundamental problem of the StarCraft series. Synnaeve et al. combined encoder-decoder neural networks and recurrent networks and predicted the enemy units and visualized them after five and 30 s [7]. Kahng and Kim used a convolutional neural network model to predict the opponent's information [3]. Jeong et al. proposed a defogger model using generative adversarial networks (GAN) based on a pyramid reconstruction loss [8]. All of these studies attempted to predict hidden opponents' units based on supervised learning approaches. The main contribution of the present study is to use contrastive self-supervised learning to improve the performance of the supervised model for defogging problems.


3 Proposed Method

We use a contrastive self-supervised learning method to extract an effective feature space and improve the model's performance. In this study, we use the momentum contrast (MoCo) algorithm, one of the representative contrastive self-supervised learning methods [9]. Figure 1 shows an overview of self-supervised learning based on MoCo. MoCo uses both positive and negative examples. The loss function of MoCo minimizes the distance between positive samples, while maximizing the distance between negative and positive samples. Here we define positive and negative samples suitable for our StarCraft II replay dataset. After splitting adjacent five-frame data into three and two, the positive key and query with each different encoder are generated. Other data except for the adjacent five-frame data are considered as the negative key.

Fig. 1. An overview of self-supervised learning based on MoCo for StarCraft II replay data.
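A possible reading of this pair construction is sketched below in plain Python. The function names, the 3/2 split of the window, and the InfoNCE form of the loss are our assumptions based on the description above and on [9]; the embeddings q, k_pos, and k_negs would be produced by the query and momentum encoders, respectively.

```python
import numpy as np

def make_moco_pairs(frames, t, window=5, split=3):
    """Build a query/positive-key pair from one five-frame window and treat
    frames outside that window as the pool of negative keys."""
    win = frames[t:t + window]                      # adjacent five frames
    query_frames = win[:split]                      # first three -> query encoder
    pos_key_frames = win[split:]                    # last two -> momentum encoder
    neg_pool = np.concatenate([frames[:t], frames[t + window:]])
    return query_frames, pos_key_frames, neg_pool

def info_nce(q, k_pos, k_negs, temperature=0.07):
    """Contrastive (InfoNCE) loss on L2-normalised embedding vectors."""
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    k_negs = k_negs / np.linalg.norm(k_negs, axis=1, keepdims=True)
    logits = np.concatenate([[q @ k_pos], k_negs @ q]) / temperature
    logits -= logits.max()                          # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```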

We use the DeepLabv3+ model that includes dilated separable convolution composed of a depthwise convolution and a pointwise convolution [10, 11]. The model backbone is Xception, and the number of layers is reduced to half of the original DeepLabv3+ model to suit the StarCraft II dataset. In summary, the pre-trained encoder generated by MoCo is used for training the convolutional encoder-decoder network. Figure 2 shows the overall process of the proposed convolutional encoder-decoder network that combines MoCo.


Fig. 2. Architecture of the proposed convolutional encoder-decoder network that combines MoCo

4 Experiments

4.1 Data and Experiments Setting

We collected 448 actual replay data from StarCraft II players from Battle.net. To simplify our experimental scenarios, we limited the matchups to those between Terran and Protoss. Using the StarCraft II learning environment (SC2LE) package developed by DeepMind, we extracted variables in image form (https://github.com/deepmind/pysc2). To utilize the past state information, we used input data with five time points, including the present information. Consequently, we acquired a 4D tensor with 5 × 32 × 32 × 49 as the input and 1 × 32 × 32 × 28 as the output. The last dimension in the 4D tensor represents unit and building counts (28 for Terran and 28 for Protoss). In addition, based on the extracted data, we created additional data that represent the positions of enemies and allies. Finally, we obtained a total of 96,492 game frames. Among the total of 96,492 game frames, 76,850 (80%) frames were used as training data, 9,682 (10%) frames were used as validation data, and the remaining 9,960 (10%) were used as testing data. It is worth noting that the frames from the same replay fall into only one of the training, validation, and test sets to ensure temporal independence. We trained 100 epochs with a batch size of 64 and saved the models with an early stopping rule. The early stopping rule triggers when the validation loss does not drop continuously for five epochs. The learning rate was 0.0001, and the weight decay was 0.00005 with an AdamW (adaptive moment estimation considering weight decay) optimizer.

4.2 Results

We present results for quantitative and qualitative evaluations. Table 1 shows quantitative results with contrastive self-supervised learning. All evaluations in Table 1 were conducted on a pixel-by-pixel basis. It can be seen that the recall and F1-score values were improved when using a pre-trained MoCo encoder, while maintaining accuracy.


For further evaluation, we visualized the model predictions on 32 × 32 grids. We assumed Terran as allies and Protoss as enemies in all our experiments. Figure 3 shows the prediction results for Protoss units and buildings on the heatmap. The first column of Fig. 3 represents the input data (i.e., incomplete information). The second column represents the ground truth. The third and fourth columns represent prediction results with and without the MoCo pre-trained encoder. When using the MoCo pre-trained encoder, the results were closer to the ground truth. Specifically, we confirmed that the templar archive is accurately predicted by interacting with other input data although the building is invisible in the input data. We also confirmed that the pylon and probe, which are frequently appearing units and structures, were accurately predicted when MoCo was used.

Table 1. Accuracy, Recall, and F1-score of 3D-Deeplab V3 without and with MoCo

Model | Accuracy | Recall | F1-Score
3D-Deeplab V3 without MoCo | 0.9972 | 0.4803 | 0.5262
3D-Deeplab V3 with MoCo | 0.9971 | 0.5008 | 0.5337

Fig. 3. Prediction results for Protoss units and buildings without and with the MoCo encoder


5 Conclusions and Future Works

We propose a convolutional encoder-decoder architecture combined with self-supervised learning that can predict an invisible enemy's units and buildings in StarCraft II. The effectiveness of the proposed method is demonstrated both quantitatively and qualitatively. The problem of "fog-of-war" is a fundamental yet relatively unexplored research topic in StarCraft II. We plan to apply more advanced architectures from the field of computer vision to improve model performance. In addition, we plan to explore reinforcement learning to design StarCraft II agents to further improve model performance.

Acknowledgments. This research was supported by the Agency for Defense Development (UI2100062D).

References 1. Reitman, J.G., Anderson-Coto, M.J., Wu, M., Lee, J.S., Steinkuehler, C.: Esports research: a literature review. Games and Culture 15(1), 32–50 (2020) 2. Lee, D., Kim, M.J., Ahn, C.W.: Predicting combat outcomes and optimizing armies in StarCraft II by deep learning. Expert Syst. Appl. 185, 115592 (2021) 3. Ontanón, S., Synnaeve, G., Uriarte, A., Richoux, F., Churchill, D., Preuss, M.: A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transact. Computat. Intelli. AI in games 5(4), 293–311 (2013) 4. Kahng, H., Kim, S.B.: Opponent modeling under partial observability in starcraft with deep convolutional encoder-decoders. In: Proceedings of SAI Intelligent Systems Conference, pp. 751–759. Springer, Cham (2019) 5. Wu, H., Zhang, J., Huang, K.: Msc: A dataset for macro-management in starcraft ii. arXiv preprint arXiv:1710.03131 (2017) 6. Lin, M., et al.: An uncertainty-incorporated approach to predict the winner in StarCraft II using neural processes. IEEE Access 7, 101609–101619 (2019) 7. Synnaeve, G., et al.: Forward modeling for partial observation strategy games-a starcraft defogger. arXiv preprint arXiv:1812.00054 (2018) 8. Jeong, Y., Choi, H., Kim, B., Gwon, Y.: Defoggan: predicting hidden information in the starcraft fog of war with generative adversarial nets. In Proceedings of the AAAI Conference on Artificial Intelligence 34(04), 4296–4303 (2020) 9. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) 10. Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 833–851. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_49 11. Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021)

Measuring Robot System Agility: Ontology, System Theoretical Formulation and Benchmarking Attila Vidács1(B), Géza Szabó2, and Marcell Balogh1 1 High-Speed Networks Laboratory, Department of Telecommunications and Media Informatics, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Budapest, Hungary [email protected] 2 Ericsson Research, Ericsson, Hungary [email protected]

Abstract. This paper introduces a concise mathematical framework for measuring robot agility. The framework is built on definitions taken from the robotics and automation domain for naming the robotic systems and components. Standardized definitions related to robot autonomy are taken into account. Robot agility is defined as an emergent system property in complex robotic systems. Based on the introduced system theoretic model and the related mathematical framework, agility evaluation methods are presented. Besides theoretical formulae, practical benchmarking methods are also considered—at least as a desirable target. The cost of agility is also discussed, as being an important factor when different systems are compared and evaluated. A simple but practical example use case of robot pick&place is examined, where all proposed agility metrics and the benchmarking procedure are explained and evaluated.

Keywords: Robot agility · Benchmarking · Systems theory · Emergence

1 Introduction

Today's robots are adept at doing the same thing over and over very effectively and efficiently. However, when it comes to being agile enough to respond to varied and unpredictable needs, unexpected or unforeseen circumstances, robots often struggle. To prepare for unforeseen situations, agile capabilities can be added to the system. For example, the ability to "see" with the aid of sensing and perception capabilities. As a result, the system will be more robust and more adaptive to changing needs and emerging challenges. However, currently there is no way of telling how agile a given robotic system is. But the need for having such a metric is clearly identified already. Quantifiable agility metrics would allow customers to purchase robots best suited to their use cases. Manufacturers ranging in size need robotic systems to deal with high mix, low volume production runs based on customer demands. System integrators need to be able to pick and choose which robot(s) provide the best option for their customers, which means being able to easily swap between robot manufacturers while providing the same level of functionality. Academic institutions doing research with robots need a quantifiable metric for measuring the agility of robots as they go through the design choices for developing new robotic technologies [8].

In this paper, the concept of robot system agility is elaborated. Section 2 introduces the concepts and ontologies used in the robotics and automation domain, and examines robot autonomy—a concept closely related to robot agility. It also lists related research results and directions in the area, including the newly formed IEEE standardization activity aiming at measuring robot agility. Section 3 presents our proposed mathematical framework for measuring robot agility. A System Theoretical approach was chosen to tackle robot agility defined as an emergent system property. The system model and formal definitions of agility and related concepts are presented. Section 4 deals with the practically important agility evaluation problem, examining the exact and estimated agility, and the cost and utility of agility as well. Section 5 proposes the steps for a general benchmarking procedure to measure robot system agility. In Sect. 6 an example use case of a robotic arm performing a pick operation is investigated in detail. The goal is to demonstrate how the theoretical framework can be used in practice by giving tangible solutions for measuring and benchmarking agility. Finally, Sect. 7 concludes the paper.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. K. Arai (Ed.): IntelliSys 2022, LNNS 542, pp. 173–191, 2023. https://doi.org/10.1007/978-3-031-16072-1_14

2 Related Work

2.1 Ontology of Robotic Systems and Components

The IEEE Standard Ontologies for Robotics and Automation [1] aims to provide a common vocabulary along with clear and concise definitions from the robotics and automation (R&A) domain. The following definition are all cited from [1]. A robotic environment consists of a physical environment equipped with a robotic system formed by one or more robots. A robot is an agentive device in a broad sense, purposed to act in the physical world in order to accomplish one or more tasks (see Fig. 1). Any device that is attached to the robot and serves in the functioning of the robot is called a robot part. The robot interface is composed by the devices that play the roles of sensing parts, actuating parts, and communicating parts. Through the interface, the robot can sense and act on the environment as well as communicate with other agents. Therefore, the robot interface can be viewed as way to refer to all the devices that allow the robot to interact with the world. In some cases, the actions of a robot might be subordinated to actions of other agents, such as software agents (bots) or humans. Robots might form social groups, where they interact to achieve a common goal.


Fig. 1. Environment and components of a robotic system.

2.2 Robot Autonomy

An automated robot performs a given task in which the robot acts as an automaton, not adapting to changes in the environment and/or following scripted plans. The robot is remote-controlled while performing a given task in which the human operator controls the robot on a continuous basis, from a location off the robot, via only her/his direct observation. In this mode, the robot takes no initiative and relies on continuous or nearly continuous input from the human operator. The robot is teleoperated robot when performing a given task in which a human operator, using sensory feedback, either directly controls the actuators or assigns incremental goals on a continuous basis, from a location off the robot. A teleoperated robot will complete its last command after the operator stops sending commands, even if that command is complex or time-consuming. A semi-autonomous robot performs a given task in which the robot and a human operator plan and conduct the task, requiring various levels of human interaction, while a fully autonomous robot performs the task in which the robot solves the task without human intervention while adapting to operational and environmental conditions [1]. It is important to note that a given robot can play multiple roles in different processes at the same time. For example, a robotic rover exploring a planet can assume the semi-autonomous role in the process of planet exploration, but it can be, at the same time, fully autonomous in the process of navigation. Autonomous robotic systems will play an essential role in many applications across diverse domains. In [5] the authors survey and discuss Artificial Intelligence (AI) techniques including navigation and mapping, perception, knowledge


representation and reasoning as enablers for long-term robot autonomy (see the extensive list of references in [5]). In [7] the authors review projects that use ontologies to support robot autonomy. The study has shown that a wide range of cognitive capabilities is already covered by existing frameworks, but often only in a prototypical form. They conclude that ontologies could become essential in topics such as explainable robots, ethics-aware robots, and robust and safe human-robot interaction as they are a tool to relax the programming effort required to deliver robots that are adaptive. Measuring the performance of autonomous robots operating in complex and changing environment is a real challenge. There is no known metric to measure quantitatively and precisely how a robot system satisfies the expectations. In [6] the authors present an approach aiming at qualifying robot autonomy. Their method is based on a measure of robot performance in achieving a given task related to environment complexity. To illustrate their approach, simulation results of a navigation mission are presented, and robot performance is compared when algorithm parameters change. Levels of robot autonomy, ranging from teleoperation to fully autonomous systems, influence the way in which humans and robots may interact with one another [3]. The framework proposed by the authors defines a process for determining a robot’s autonomy level, by categorizing autonomy along a 10-point taxonomy. 2.3

Standard for Measuring Robot Agility

Agility is a compound notion of reconfigurability and autonomy as opposed to the typical use of automated robots with rigid pre-programmed tasks. Based upon the perceived state of the world, known robot capabilities, and desired goal, the robot should be able to generate plans on its own. This allows for a more autonomous system in the face of errors that may not have been predicted [8]. Agility should involve a dynamic component as well to be able to realize this potential in a timely and cost-efficient manner. We refer to robot agility as follows: Given the challenges the robotic system has to respond, robot system agility is the ability of the robotic system to adapt its state to satisfy the new challenges promptly and with little effort. The upcoming IEEE P2940 Standard for Measuring Robot Agility [8] provides a listing of desirable traits of robotic systems under the umbrella of agility. In particular, it describes a set of quantitative test methods and metrics for assessing the following ten aspects: hardware reconfigurability, software reconfigurability, communications, task representation, sensing, reasoning, perception, planning, tasking, and execution [8]. The so-called agility score will be maximized as the robotic system tends toward ideal situations, that is to say, when: – the physical components of the system are fully interchangeable, e.g. every mechanical connection is the same;

Measuring Robot Agility

177

– the software components of the robot controller can be dynamically changeable and reconfigurable to adapt the controller to new tasks or operational conditions derived from faults and disturbances; – ultra-reliable low latency communication is used, the required bandwidth is minimal (e.g. with onboard processing of the data on the sensor), the range is maximal and the jitter and packet loss are non-existent; – the robot understands all the tasks it can be assigned, all the actions it can carry out and all the reachable states of the environment; – the robot system possesses suitable sensors to collect useful information about every component of the environment state with minimal lag and maximum refresh rate; – the robot maps every possible tasks to desired states of the environment; – given the available data from sensors, the robot preserves its world representation up-to-date in every reachable state of the environment with minimal lag and maximum refresh rate; – automated plan generation can be carried out based on the task representation in minimal time and with minimal computational resources; – teaching or programming the robot to perform new tasks takes minimal time and effort from the end-users, and knowledge about the tasks can be gathered autonomously or with minimal human-robot interaction; and – actions altering the environment can be carried out in a whole range of situations, in minimal time, with a maximal probability of success, and with appropriate failure identifications and recovery strategies. This standard is intended to help stakeholders enable agility of robot systems and to develop the measurement science to assess and assure the agility performance of their robot systems.

3

Mathematical Framework for Measuring Agility

Emergent properties represent one of the most significant challenges for the engineering of complex systems [4]. An example is system reliability: it depends on component reliability but system failures often occur because of unforeseen inter-relationships between components. Other such examples include system safety, performance and security. We argue that agility is an emergent property in robotic systems. 3.1

Systems Theoretical Approach

Systems Theory focuses on systems taken as a whole, not on parts taken separately. Emergence refer to the appearance of higher-level properties and behaviours of a (complex) system that comes from the collective dynamics of that system’s components. Emergent properties are properties of the “whole” that are not possessed by any of the individual parts. Thus, from systems theoretical point of view, agility can be seen as an emergent system property of a

178

A. Vid´ acs et al.

robotic system. Since an emergent property is a consequence of the relationships between system components, it can therefore only be assessed and measured once the components have been integrated into a system. The following arguments are in line with [2] where network flexibility is discussed. Here we use agility as a system property instead of flexibility, and communication networks are replaced by robotic systems. 3.2

System Model

We consider a robotic system that can be described by a system state s ∈ S, where S contains all possible states that the system can realize. We define a challenge set Ω that captures challenges posed to the robotic system. Each challenge di ∈ Ω is associated with a set of valid states V(di ) ⊆ S in which the challenge is addressed. Over time, the challenges will emerge and the system will adapt its state in order to cope with the new challenges. Each challenge dj ∈ Ω demands that the system is adapted to sj where sj ∈ V(dj ). (Note, that a new challenge does not necessarily induce a state change if the actual state also satisfies the challenge.) We define system implementation X ∈ X where X is the set of possible implementations, which is bound to specific hardware and software configurations, algorithms, etc. Due to its nature, X can realize any system state out of a set SX ⊆ S. Consequently, for all di ∈ Ω there is a set of valid states, VX (di ) = SX ∩ V(di ), that can be achieved by implementation X. Each system implementation X is associated with a set of addressed challenges AX in which the challenge di ∈ AX can be addressed by that implementation, i.e., VX (di ) = ∅. 3.3

Formal Robot System Agility Definition

Agility is related to the amount of challenges that a system can support. Definition 1. Given a system implementation X with the set of addressable challenges AX , the robot system agility of X is defined as μ(AX ) where μ is an appropriate measure on Ω. According to Definition 1, agility can be quantified by a measure μ on AX , which reflects the size of the addressed challenge set. For example, μ can be the counting measure, i.e., μ(A) is simply the number of challenges the system is able to cope with. 3.4

Time and Cost Constraints

Agility is related not only to the amount of challenges that a system can cope with, but to the time scale at which it can adapt to it, as well as to the effort associated with it.

Measuring Robot Agility

179

In general, the time needed for the system to adapt is described by the action time T : Ω → IR+ , which maps each challenge to its appropriate time value (e.g., time elapsed until the system is reconfigured). Furthermore, each challenge is associated with an action cost, which is described by the mapping C : Ω → IR+ and reflects the effort of adapting to the posed challenge. Definition 2. The set of addressed challenges by the considered system implementation X under given action time constraint T and action cost constraint C is defined as: AX (T, C) = {di ∈ Ω; VX (di ) = ∅; T (di ) ≤ T ; C(di ) ≤ C)}.

(1)

Definition 3. (Robot system agility): Given a system implementation X with the set of addressed challenges AX (T, C) with respect to time and cost constraints T and C, the robot system agility of X is defined as μ(AX (T, C)) where μ is an appropriate measure on Ω. 3.5

Adaptability, Reactivity and Cost-Efficiency

Consider two system implementations X, Y with associated sets AX (T, C) and AY (T, C) , respectively. Then: Definition 4. We say that implementation X is at least as reactive as Y if ∀T : μ(AY (T, ∞)) ≤ μ(AX (T, ∞)),

(2)

that X is at least as cost-efficient as Y if ∀C : μ(AY (∞, C)) ≤ μ(AX (∞, C)),

(3)

and that X is at least as adaptive as Y if μ(AY (∞, ∞)) ≤ μ(AX (∞, ∞)),

(4)

The reactivity property states that, disregarding cost, implementation X can react to challenges at least as fast as implementation Y . Analogy holds for cost-efficiency with respect to cost. Adaptability states that, independent of time and cost, implementation X can react to at least as many challenges as implementation Y .

4

Agility Evaluation

Note that, with Definition 3 agility is by no means a unique metric, since the use of different measure types will result in different agility values. Next we discuss normalizations that can be achieved empirically.

180

4.1

A. Vid´ acs et al.

Exact Agility

Given a robotic system under test whose agility shall be calculated, we select an infinite length challenge sequence D = {di1 , di2 , di3 , . . .}, which may contain arbitrary challenges. We argue that challenging the system with this sequence and observing its reaction for each posed challenge, the agility measure can be evaluated. We measure: μD (A) = E{1 | di ∈ A} = Pr{di ∈ A | D}.

(5)

E{·} therein is the expectation value and Pr{·} the probability of an event occurring. That is, agility corresponds to the probability that the system can adapt to the challenges in sequence D. An important design choice for evaluating agility is the selection of challenge sequence D. 4.2

Estimated Agility

In practice, we cannot generate an infinite length sequence. Thus, we will have only a finite length challenge sequence, where the agility evaluation boils down to estimating the event probability from Eq. 5. Consider a challenge sequence D = {di1 , di2 , . . . , diN } of finite length N , with elements randomly chosen out of Ω with uniform distribution. By using the empiric mean as estimator for the expectation, an estimate for μD (A) can be: N 1  1{dik ∈ A}. μ ˆD (A) = N

(6)

k=1

1 therein is the indicator function, which is one if the logical statement is true, and zero otherwise. This estimate becomes arbitrarily precise for N → ∞. The intuitive meaning of this estimated agility can be reduced to μ ˆD (A) =

# of supported challenges . # of posed challenges

(7)

A measure defined this way will always be out of the interval [0, 1]. However, in this case it is not guaranteed that 100% agility is reachable, because D might contain challenges that the system is not able to cope with. 4.3

Utility of Agility

In the agility evaluation discussed so far, every challenge was considered equally important (i.e. the elements of D were chosen out of Ω with uniform distribution). However, this might not be true in real scenarios: coping with some challenges may be crucial, whereas fulfilling others might have no effect. We

Measuring Robot Agility

181

can intuitively measure this “importance” with an utility function. Given a utility function u that reflects how valuable the ability to cope with a challenge is assumed, we define the utility of agility that can be evaluated as: μuD (A) = E{u(di ) | di ∈ A}.

(8)

When agility is estimated based on a finite challenge sequence, the utility can be taken into account while designing that sequence. If the challenge distribution was not uniform, we would have a weighted agility value with more emphasis on the challenges that occur more often. In that case the probability that a particular challenge is selected into the challenge sequence should relate to its utility value. 4.4

Cost of Agility

Intuitively, a more agile system may lead to a better performance, hence reducing cost. On the other hand, increasing agility requires additional resource usage, which can lead to higher costs. As a consequence, agility and its cost (i.e. the price we pay for it) is a trade-off that needs to be taken into account. Next, we define the different agility-related cost components, which can be used to calculate the cost of an agile robotic system. Action Cost. Earlier we formally defined the action cost C as a function mapping each challenge to a cost value: C(di ) : Ω → R+ . The cost associated to a challenge reflects all the additional resource usage which the system incurs during adaptation. Preparation Cost. Apart from the action cost, which reflects the resource usage when a challenge handled, we have to consider another cost component: the preparation cost K. This is the cost of deploying and operating an agile system, even if no challenges occur. Intuitively, we expect that an agile system, that is, a system that can dynamically adapt to multiple challenges, may lead to higher deployment and operation costs.

5

Benchmarking

As discussed before, agility can only be estimated based on a selected, finitelength challenge sequence. The measured agility highly depends on the generated sequence D. For some systems, there are well established benchmarking procedures, specified by standardisation bodies, which propose sets of specific tests to evaluate the devices under test. The question is how a meaningful challenge sequence can be generated so that it challenges the system to cover a significant part of the state space.

182

5 Benchmarking

Different Types of Challenges

The robotic system has to meet various requirements. These requirements can be broken down to multiple types of different challenges, i.e. new demands, physical environmental changes, robot part failures, etc. These challenges may be based upon system parameters that can be modified by end users (such as tasks), environmental conditions that are out of control from the user’s perspective (such as changes in the physical environment), or they can be unpredictable events (such as hardware failures). In general, requirements leading to challenges can be regarded as the instantaneous state change of the robotic environment, in which the robot system operates, and can be modeled as a set of (possibly many) different parameters. This leads to a potentially high variability in the challenge space Ω when performing agility analysis. 5.2

Adapting to Challenges

Besides the challenge set, we have to measure somehow whether the system is able to adapt to a given challenge or not. Considering some high-level KPI measure that captures system performance, we can say that the system could adapt to the given challenge if there was no significant degradation of performance. On the other hand, the system is not able to cope with the posed challenge when the performance degradation is unacceptable. 5.3

Benchmarking Procedures

In general, the following steps are needed in order to define and perform a benchmarking procedure to measure robot system agility: 1. Identify possible challenges for the robotic system under test, and form the challenge set Ω. 2. Define the utility function u that reflects how valuable the ability to cope with a challenge. 3. Define the finite length challenge sequence D consisting of challenges from the challenge set while taking into consideration their utility. 4. Define one or more high-level performance metric(s) (KPIs) that are used to evaluate whether the system was able to adapt, together with threshold values to decide what performance degradation is acceptable. 5. Define acceptable time (T ) and cost (C) constraints. 6. Select an appropriate measure μ that can reflect the size of the addressable challenge set A. 7. Select the system implementation(s) to be tested. 8. Perform the benchmarking tests and evaluate system agility.

Measuring Robot Agility

6

183

Example Use Case: Robotic Arm Pick Operation

Our chosen example use case is a simple pick operation performed by a robotic arm. Our intention is to show how the previously proposed system modeling approach and agility formulae can be used in practice. We give the benchmarking procedure and its evaluation for two system implementations, taking into account three different challenges that the system has to cope with. 6.1

Robotic System

Our simple robotic system consists of a single robotic arm. Its robot interface is composed by a robotic gripper and—depending on the actual implementation— an optional camera mounted on the arm. The task is to pick up a product part from a given location. System State. The state of the robotic system includes—among many other system and environmental state variables—the actual joint states s = (s1 , s2 , ..., sn ) of the robotic arm (e.g., for a 6-DoF arm the set of possible joint states is S = {si } where si = (si1 , si2 , ..., si6 ) must be valid. (A state is considered valid if it can be performed by the robotic arm without colliding to itself or to its environment.) Thus, a state change is necessary whenever the robotic arm is positioned above the part to be gripped. The new joint states must be calculated from the part location, taking into account the kinematic model of the robot as well as all environmental constraints, which is the classical inverse kinematic problem: s(x) = IK(x). System Implementations. In our example we consider the following two system implementations (X = {X, Y }). Implementation X (w/o Camera). The robotic arm is equipped with a gripper. The robot is always picking up the part from a fixed pre-programmed position X0 . This implementation basically realizes an automated robot that performs its task, not adapting to changes in the environment. Implementation Y (with Camera). Here we assume that the robotic arm is equipped with a camera (see Fig. 2). With the help of sensing and perception the robot is now capable to detect and estimate the location of the product part to be picked. That is, the robot performs its picking task fully autonomously by adapting to changing operational and environmental conditions. System Performance. The performance of the robot is measured by the success rate, i.e. the probability that any given pick operation is performed successfully. The pick operation is only successful if the gripper is positioned above the product with a position error () less than d (see Fig. 3). We assume that the


Fig. 2. Robotic arm equipped with a camera (implementation Y ).

We assume that the robot gripper can be positioned accurately above the commanded target location (x) without any position error. Thus, the operation can only fail if the robot is commanded to a wrong target position instead of the actual part location (x0), i.e. |x − x0| > d. This can happen if the part is misplaced, and/or the part location is incorrectly measured or estimated.

Fig. 3. Successful (left two) and unsuccessful (right) pick operations.
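For illustration only, the success criterion and the two target-selection policies described above can be written down in a few lines; the function names below are ours, not the paper's.

```python
# Sketch of the pick-success criterion and the two target-selection policies.
def pick_succeeds(commanded_x, part_x, d):
    """The pick fails exactly when the commanded position misses the part by more than d."""
    return abs(commanded_x - part_x) <= d

def target_without_camera(X0, part_x):
    # Implementation X: always move to the fixed, pre-programmed position X0.
    return X0

def target_with_camera(estimate_location, part_x):
    # Implementation Y: move to the (possibly noisy) camera estimate of the part location.
    return estimate_location(part_x)
```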

6.2 Challenges and Performance

The following three different challenges are taken into account in our example: (1) the part to be picked is placed at location x0, (2) the part is misaligned (i.e. its actual location can differ from the expected target), and (3) the lights in the physical environment are turned off. The set of challenges is given by: Ω = {{at x0}, {misaligned}, {lights out}}.

Challenge: Part at Fixed Location. This case is not a real challenge but rather the normal operation when the task is to pick up the part from a fixed location known in advance (d = {at x0}).


Implementation X (w/o Camera). If the parts are always placed at the fixed known location for picking, without any location error, the robot will never miss, and the success rate (KPI) is 1. Thus, the challenge is addressed by this configuration, i.e., {at x0} ∈ AX.

Implementation Y (with Camera). When using a camera, the estimated part location can differ from the actual position by the error of the estimation. We model the estimated part location as a random variable Y with conditional distribution function fY|X(y|X0), where the condition is that the part is placed exactly at location X0. The probability of a successful grip can now be calculated by Eq. 9 (see Fig. 4).

$p(X_0) = \int_{X_0 - d}^{X_0 + d} f_{Y|X}(y \mid X_0)\, dy \equiv p_0$   (9)

Fig. 4. Conditional probability distribution function of a successful grip event.

Assuming that the camera performance (but not the actual estimate!) does not depend on the part's location, we have Eq. 10, with fc(y) being the probability density function of the location estimation error.

$f_{Y|X}(y \mid X_0) = f_c(y - X_0), \quad \forall y, X_0$   (10)

The probability of successfully picking up the part from the fixed position X0 is given by Eq. 11.

$\mathrm{KPI} = p(X_0) = \int_{-d}^{d} f_c(y)\, dy \equiv p_c$   (11)

Note here that pc ≤ 1 is constant and depends only on the camera's sensing and perception capabilities and on the gripper's tolerance. The success rate can only be one if the estimation error is always smaller than the grip tolerance limit d.
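As a purely illustrative example, if the camera's location-estimation error were zero-mean Gaussian with standard deviation σ (an assumption, not something measured in the paper), Eq. 11 reduces to an error-function expression that is easy to evaluate:

```python
# p_c under an assumed zero-mean Gaussian estimation error:
# p_c = P(|error| <= d) = erf(d / (sigma * sqrt(2))).  Numbers are made up.
from math import erf, sqrt

def grip_success_rate(d, sigma):
    """Probability that the location-estimation error stays within the tolerance d."""
    return erf(d / (sigma * sqrt(2)))

print(grip_success_rate(d=5.0, sigma=2.0))  # ~0.988: precise camera, p_c close to one
print(grip_success_rate(d=5.0, sigma=5.0))  # ~0.683: poor camera, many missed picks
```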


If the precision of the implemented machine vision solution is high enough, and thus the success rate is close to one, we can say that implementation Y is able to address the challenge, i.e. {at x0} ∈ AY.

Challenge: Part Misaligned. Next, we assume that the actual part location can differ from the expected target position (d = {misaligned}). Let X denote the actual part location, modeled as a random variable with probability density function f(x).

Implementation X (w/o Camera). The robot will try to pick the part precisely from location X0. The probability of a successful grip, given that the actual location of the part equals x, is (see Eq. 12 and Fig. 5):

$p(x) = \Pr\{\text{success} \mid X = x\} = \begin{cases} 1, & \text{if } x \in [X_0 - d, X_0 + d] \\ 0, & \text{otherwise} \end{cases}$   (12)

Fig. 5. Probability of successful grip (w/o Camera).

Since the actual part location is modeled as a random variable, the measured KPI can be the expected value of the grip success rate as given by Eq. 13:

$\mathrm{KPI} = E\{p(X)\} = \int_{A} p(x) f(x)\, dx = \int_{X_0 - d}^{X_0 + d} f(x)\, dx$   (13)

Whether the expected success rate is close to one or near zero depends solely on the distribution of X. On the one hand, if the distribution is centered around X0 with a small deviation, the success rate can be close to one. On the other hand, if the actual position is a fixed but biased location that is farther away from the expected X0 by more than d, the pick operation will always fail. We assume the latter, so the challenge cannot be addressed by this implementation in the general case, i.e. {misaligned} ∉ AX.
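A quick Monte Carlo check of Eq. 13 illustrates the two regimes; the part-location distribution used below (a Gaussian around X0 plus a fixed bias) is an assumption chosen only for this illustration.

```python
# Monte Carlo estimate of Eq. 13 for implementation X under an assumed
# part-location distribution X ~ Normal(X0 + bias, sigma).
import random

def expected_success_rate_no_camera(X0, d, bias, sigma, n=100_000, seed=1):
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if abs(rng.gauss(X0 + bias, sigma) - X0) <= d)
    return hits / n

print(expected_success_rate_no_camera(X0=0.0, d=5.0, bias=0.0, sigma=2.0))   # ~0.99: centered around X0
print(expected_success_rate_no_camera(X0=0.0, d=5.0, bias=20.0, sigma=2.0))  # ~0.0: biased beyond d, always misses
```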


Implementation Y (with Camera). Since we assumed earlier that the camera performance does not depend on the part location, Eq. 11 can be generalized to any location x as given by Eq. 14.

$\mathrm{KPI} = p(x) = \int_{-d}^{d} f_c(y)\, dy \equiv p_c, \quad \forall x \in [d_{\min}, d_{\max}]$   (14)

Hence, the grip success rate will be pc whenever the part is located within a reasonable range, i.e. the part holder is able to accommodate parts between dmin and dmax and the robot is physically able to reach them (see Fig. 6). We conclude that this challenge is addressed by implementation Y for parts with moderate misalignment (meaning that the part is still within reach of the robotic arm), i.e. {misaligned} ∈ AY.

Fig. 6. Probability of successful grip (with camera).

Challenge: Working in the Dark. This challenge represents a situation when an external environmental parameter changes; in particular, all lights are turned off (d = {lights out}) and the robot has to work in pitch darkness.

Implementation X (w/o Camera). Since this implementation does not include a camera, the robot performance does not depend on the challenge. We can say that the system is able to cope with the challenge as there is no performance degradation, i.e. {lights out} ∈ AX.

Implementation Y (with Camera). Assuming that the robot camera needs some ambient light source to operate, it will not be able to perform in the dark at all. Hence, {lights out} ∉ AY.

Addressable Challenges. As a summary, the addressable challenge sets for the two implementations of our system, when taking into consideration the challenge set Ω = {{at x0}, {misaligned}, {lights out}}, are given by Eq. 15.

$A_X = \{\{\text{at } x_0\}, \{\text{lights out}\}\}, \qquad A_Y = \{\{\text{at } x_0\}, \{\text{misaligned}\}\}$   (15)

6.3 Agility Evaluation

Robot System Agility. According to Definition 1, with μ(·) being the simple counting measure, the robot system agility is the same for both implementations: μ(AX) = μ(AY) = 2.

Exact Agility. Consider a challenge sequence D of infinite length with elements randomly chosen out of Ω with uniform distribution. The probability that the system is able to adapt to a given challenge is given by Eqs. 16 and 17 for implementations X and Y, respectively.

$\Pr\{d_i \in A_X\} = \begin{cases} 1, & \text{if } d_i = \{\text{at } x_0\} \\ p_0, & \text{if } d_i = \{\text{misaligned}\} \\ 1, & \text{if } d_i = \{\text{lights out}\} \end{cases}$   (16)

$\Pr\{d_i \in A_Y\} = \begin{cases} p_c, & \text{if } d_i = \{\text{at } x_0\} \\ p_c, & \text{if } d_i = \{\text{misaligned}\} \\ 0, & \text{if } d_i = \{\text{lights out}\} \end{cases}$   (17)

The exact agility can be calculated based on Eq. 5, and is given by Eqs. 18 and 19 for the two implementations. Remember that p0 is the probability that the misaligned part happens to be located (nearly) exactly at X0 by chance (which is practically zero), and pc is the probability that the part is successfully detected and located by the camera (which should be close to one).

$\mu_D(A_X) = (2 + p_0)/3$   (18)

$\mu_D(A_Y) = (2 p_c)/3$   (19)

Estimated Agility. The robot system agility can be estimated by putting together a finite challenge sequence by randomly choosing challenges from the challenge set, performing the trials for all implementations (see Eq. 6) and counting the successful picks. In our example use case (assuming p0 ≈ 0 and pc ≈ 1) it would result in μ̂D(AX) ≈ μ̂D(AY) ≈ 2/3.

Note that the agility metric for implementation X (w/o camera) is at least as high as the agility of implementation Y (with camera), i.e., μD(AX) ≥ μD(AY). This is surprising at first, but from a practical point of view there are problems with our assumptions.

Utility of Agility. Firstly, the challenge {at x0} is not a challenge, rather the baseline operation of an automated robot. Secondly, the real challenge is when the part is misaligned. And thirdly, {lights out} is not a challenge that should be taken into account in most real use cases. (For example, it is not likely that the lights go out in an industrial shop floor during production, and when they do, it is most likely that other electronic devices go down as well, and production is halted.)


To take all of these concerns into account, we drop {at x0} from the challenge set, and set the challenge utilities as u({lights out}) = 0 and u({misaligned}) = 1. According to Eq. 8, the utility of agility for the two implementations is μ^u_D(AX) = p0 and μ^u_D(AY) = pc, where pc ≫ p0 (a small numeric sketch of these quantities follows the benchmarking steps below). This result is more in line with our expectations, namely that implementation Y (with camera) is more agile than implementation X (w/o camera). Or, more emphatically, the robot arm with camera is highly agile when considering displaced parts, while the robot without the camera is basically not agile at all.

Time and Cost of Agility. As stated earlier, besides the number of challenges that the system can cope with, agility is related to the time scale at which the system can adapt, as well as to the associated cost. The time needed to adapt (i.e. action time) in our use case is nothing more than the time needed for image recognition (sensing and perception) and for driving the robot arm to the new position (planning and execution). The cost associated with a challenge (i.e. action cost) reflects all the additional resource usage which the system incurs during adaptation. In our example, the sensing and perception phases must be performed on some (local or remote) resources, and some additional resources (software and communication) are needed for trajectory planning of the robotic arm. The preparation cost involves the cost of the additional hardware (camera) and the necessary extra resources listed earlier. For example, when image recognition and trajectory planning are to be performed remotely (e.g. in the cloud), cloud storage and compute resources have to be provided together with communication capabilities. The cost of deployment and operation of the underlying technologies belongs to this category as well. Not to mention the hardware cost of the robotic arm if a more flexible product is needed to be able to pick from arbitrary positions (e.g. a 6-DoF robotic arm instead of a 3-DoF arm that is able to pick only from a fixed position).

Benchmarking Procedure. Based on the above considerations, the (simplified) benchmarking procedure for our example use case can have the following steps:

1. The challenge set: Ω = {{misaligned}, {lights out}}.
2. The utilities: u({misaligned}) = 1 and u({lights out}) = 0.
3. The finite challenge sequence: a series of N independent pick locations drawn from a random distribution.
4. The performance metric: pick success rate. (No threshold is given since this is a binary value: either the pick succeeds or it fails.)
5. "Reasonable" cost and time constraints are assumed.
6. The measure μ is the simple counting measure.
7. Implementations: robot arm without camera (X) and with camera (Y).
8. See Eq. 6.
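To tie the numbers together, here is a small numeric sketch of the exact agility of Eqs. 18 and 19, its finite-sequence estimate, and the utility-weighted agility discussed above; the values p0 = 0 and pc = 0.95 are assumptions made purely for illustration.

```python
# Numeric sketch of exact, estimated and utility-weighted agility (assumed values).
import random

p0, pc = 0.0, 0.95  # assumed: misaligned part never lands at X0; camera succeeds 95% of the time
adapt_prob = {      # per-challenge adaptation probabilities, as in Eqs. 16 and 17
    "X": {"at_x0": 1.0, "misaligned": p0, "lights_out": 1.0},
    "Y": {"at_x0": pc,  "misaligned": pc, "lights_out": 0.0},
}

# Exact agility over the full challenge set with uniformly drawn challenges (Eqs. 18-19).
exact = {impl: sum(probs.values()) / len(probs) for impl, probs in adapt_prob.items()}
print(exact)      # X: (2 + p0)/3 ~ 0.667,  Y: 2*pc/3 ~ 0.633

# Estimated agility from a finite random challenge sequence.
rng = random.Random(0)
D = [rng.choice(list(adapt_prob["X"])) for _ in range(1000)]
estimated = {impl: sum(rng.random() < adapt_prob[impl][c] for c in D) / len(D)
             for impl in adapt_prob}
print(estimated)  # close to the exact values above

# Utility-weighted agility with u(misaligned)=1, u(lights_out)=0: only misalignment counts.
utility = {"misaligned": 1.0, "lights_out": 0.0}
weighted = {impl: sum(utility[c] * adapt_prob[impl][c] for c in utility) / sum(utility.values())
            for impl in adapt_prob}
print(weighted)   # X ~ p0 = 0.0,  Y ~ pc = 0.95
```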

7 Discussion on Further Directions

When a robotic system is designed and operated, the best performance can be achieved when everything is known in advance about the robotic system as well as the environment, and all system parameters can be kept under control during production. However, when an unforeseen situation (agility challenge) emerges, the performance of the system can drop drastically. To prepare for unforeseen situations, agile capabilities can be added to the system. The question is how to measure and decide how agile the improved system is when considering different types of agility challenges. In this paper, a mathematical framework for measuring robot system agility was proposed, together with benchmarking solutions that can be applied to different robot systems in practice. It should be noted that agility is measured by evaluating "classical" performance measures. The key is the generated challenge sequence, which ultimately comes down to the benchmarking procedure.

However, agility has a price as well. The added capabilities increase the complexity of the system, giving space to possible vulnerabilities and newly emerging challenges. For example, the agility challenge of a sensor blackout only affects systems where sensors are employed. Furthermore, agile solutions can even decrease system performance in particular situations when compared to the best-case specialized solution.

One of the interesting key questions when designing a benchmarking procedure is how to select the proper agility challenges. As the name implies, the agility challenge is for challenging the system under test, to see how the system reacts to an unexpected and unknown new situation. If the challenge were known in advance, the system would certainly be prepared for it. The real challenge is handling the "unknown unknowns", that is, producing an adequate system response that was not planned well in advance. For example, in the example use case presented earlier, what exactly does the {misaligned} challenge mean? If it were known when and how the parts could be misaligned, the challenge would not be completely unknown anymore (e.g. stochastic modeling for known distributions could be taken into account, maximum likelihood techniques could be used, etc.).


A New Adoption of Cloud Computing Model for Saudi Arabian SMEs (ACCM-SME)

Mohammed Alqahtani1,2(B), Natalia Beloff2, and Martin White2

1 Department of Information Systems, University of Bisha, Bisha, Saudi Arabia
2 Department of Informatics, University of Sussex, Brighton, UK

{ma2251,N.Beloff,M.white}@sussex.ac.uk

Abstract. The significant developments and contributions that Cloud computing technology has achieved recently have made it widespread and the focus of attention for many Small and Medium Enterprises (SMEs) around the world, helping them to take advantage of this technology to facilitate and improve business. In the last decade, many SMEs have utilised Cloud computing technology in an effective manner and benefited from its services. This has contributed to accelerating the success and sustainability of the business sector. However, the adoption rate of Cloud computing services by Saudi SMEs is still low due to several influencing factors. This paper examines the four main contexts of influencing factors that impact Cloud computing adoption by SMEs in Saudi Arabia: the technological context (data confidentiality, data integrity, data security, relative advantages, service cost, and service quality), the organisational context (organisation regulations, organisation size, technology readiness, and top management support), the environmental context (technology infrastructure, Cloud provider, legal framework, and enterprise competitiveness), and the social context (culture, awareness, and trust). This paper will use both quantitative (survey) and qualitative (semi-structured interview) methods to collect the data and evaluate the proposed model. This paper proposes a new comprehensive model called the adoption of Cloud computing model by Saudi Arabian SMEs (ACCM-SME), a framework created to explore the factors influencing the adoption of Cloud computing services by Saudi SMEs, which is the contribution of this paper. This is therefore a position paper, to be followed by collecting the data, analysing them and evaluating the proposed model in order to increase the Saudi SMEs' intention to adopt Cloud computing services.

Keywords: Cloud computing · Technology adoption models · SMEs within Saudi Arabia

1 Introduction

With the growth and expansion of the Internet in recent decades, Cloud computing has become a critical Industry 4.0 technology, allowing organisations to instantly access a shared pool of computer resources such as storage, applications, networks, servers, and


services. Cloud computing provides a contemporary economic and technological pathway to shift from the conventional ways of providing IT services to a more contemporary paradigm. Cloud computing has become more widespread and provides various technical services and resources. The shift from storing data in internal sources to external sources is central, along with other factors such as the low cost, self-service nature, and flexible network access, all of which have prompted the public and private sectors to switch to Cloud computing services [1].

The advantages of adopting Cloud computing services are that they facilitate business services and improve the efficiency and quality of operations, along with having a high flexibility that enables the user to choose the appropriate service as needed without wasting associated resources. Cloud computing services can be provided by service providers to the end user with the capacity required and without any human intervention [2]. Thus, these services, such as storage, network access, processing, etc., are provided automatically. In contrast, the disadvantages of Cloud computing services that have been raised by organisations are that they depend entirely on the Internet and require third-party access to control the data and to engage in data distribution [3]. Thus, a more detailed description of Cloud computing is that it is a combination of distributed computing, grid computing and parallel computing that has been developed to operate as a single system under a single term that provides a set of services solving the traditional problems existing in old systems [4].

However, many Saudi SMEs have not yet trusted Cloud services due to their concerns about security issues, the Cloud providers' control, and the low speed of the internet, all of which will result in high technical costs for operating their systems, and low productivity due to their preoccupation with building, monitoring, and protecting their conventional systems. As some researchers have argued regarding this matter, implementing Cloud Enterprise Resource Planning (ERP) has cost 15% less than a conventional system, with a reduction of 50% in the time needed for implementation. In addition, SMEs are at risk of losing a lot of their customers' data over the years because of weaknesses in their conventional systems when compared to Cloud computing systems. Thus, using Cloud services that offer a back-up service, like Amazon's S3, would prevent such losses [5].

Using Cloud computing resources has become like using utility services such as electricity, water and gas, all of which are available upon request with payment for use [6]. The Cloud offers a means to store and process data through a group of devices deployed to provide services on time, without interruption, and with continuous assurance that the stored data will not be lost [7]. This is because one of the advantages of the Cloud is that it continuously updates the items stored in it when any modification is made by multiple users, as happens in Dropbox, keeping the Cloud active [8].

In this position paper's context, this study will focus on understanding and analysing the factors that influence the adoption of Cloud computing services by SMEs in Saudi Arabia by creating a new comprehensive framework and model, which is the contribution of this paper, to investigate and examine said factors.
The objectives of this study are: to review the current literature in order to explore the barriers and challenges that influence the adoption of Cloud computing services by Saudi SMEs, then investigate the impact of the adoption of Cloud computing services and identify the most influential factors that impact the adoption of Cloud computing services from the perspective of the customers,


IT managers, top managers and employees, and validate the proposed framework for understanding the adoption of Cloud computing services by Saudi SMEs. This paper will discuss the knowledge gap in more detail at the end of Sect. 2, after a thorough investigation of the literature. The remainder of this paper is structured as follows. Section 2 reviews the literature in the context of developed and developing countries, particularly Saudi Arabia. Section 3 discusses the adoption theories and models. Section 4 proposes the conceptual framework. Section 5 outlines the conclusion and future work.

2 Literature Review

A number of previous studies, including [9–13], have reported on the current state of the adoption process and the associated effects of Cloud computing adoption by SMEs. Furthermore, Cloud computing adoption in many developing nations, including Saudi Arabia, continues to face various hurdles and impediments, thus lowering the adoption rate [14, 15].

For example, a study in Australia explored the adoption of Cloud computing services by Australian SMEs using the quantitative method. The results found that the size of the organisation played a critical role in the intention to adopt Cloud computing services [9]. Similarly, another study in the UK investigated the adoption of Cloud computing services by SMEs using 300 UK SMEs as the study sample. It became obvious that using Cloud computing services helped to reduce the cost ratio by 45.5% [10]. Moreover, Klug and Bai focused on the factors that support the adoption of Cloud computing in organisations in the United States of America and Canada. Their study revealed that large organisations often had a slow and complex process for adopting Cloud computing services and products because they possessed sufficient human and financial resources to use their own products.

From the perspective of developing countries, including Saudi Arabia, other researchers using the qualitative method have conducted interviews with a number of CEOs in the United Arab Emirates (UAE) to investigate the extent to which SMEs in the UAE are adopting Cloud computing services through the implementation of the Cloud ERP model. Their framework provided the answer in terms of factors such as 'ease of use', relatively low 'cost', the 'convenience of control and management', the 'flexibility of access' from any web server, and the absence of 'maintenance costs' [12]. Another study investigated the variables (factors) influencing the intention to adopt Cloud computing services by SMEs in Lebanon. The study's findings helped to identify six major barriers, namely 'lack of top management support', 'poor technology experience', a 'lack of IT infrastructure', 'complexity', a 'lack of government initiatives', and 'security' and 'privacy' concerns [13].

In Saudi Arabia, [16] conducted semi-structured interviews with 16 Information Technology (IT) managers from SMEs in Saudi Arabia to investigate the challenges that have prevented enterprises from adopting Cloud computing services. The results revealed that security was considered to be the most significant challenge. Albelaihi and Khan examined the obstacles and advantages of adopting Cloud computing services by SMEs in Saudi Arabia and found that the benefits of adopting Cloud computing lay mostly in reducing the costs that were spent on traditional systems, and its effective contribution to promoting innovations, accelerating decision-making,


and supporting communication with the customers [15]. However, in the last decade, the Saudi government has prioritised enhancing the country's information and communications technology (ICT) infrastructure and has worked to develop it. This has helped to speed up the adoption process. Whereas the number of Internet users at the end of 2000 was around 200,000, internet penetration increased steadily until it reached 97.80% of the total population of Saudi Arabia in 2020 [17] (see Fig. 1).

Fig. 1. Growth of internet users in Saudi Arabia (drawn by the author using the data by [17])

The literature reviewed above covers the adoption of Cloud computing services in the context of developed and developing countries, along with relevant studies from Saudi Arabia. In each context, we found a specific knowledge base, culture, economy, ICT infrastructure, and internet penetration rate. The factors therefore change according to the diversity of the surrounding environment, and although each context needs to be studied separately to identify the specific factors influencing Cloud computing adoption, it is beneficial to gain an overview from a range of studies and to discern any patterns that emerge from such a review. Cloud technology in Saudi Arabia is still in its early stages [18], and further research is required.

Most of the previous researchers discussed the barriers faced by SMEs, but they focused on specific factors and did not include all influencing factors; a number of them focused on security and privacy concerns, such as [14, 15]. Moreover, the majority of the previous studies proposed frameworks and models which highlighted the cost of adopting Cloud computing services and its impact on the expenses of the enterprises, along with the benefits of shifting to the Cloud, without assessing all the factors influencing the adoption of Cloud computing services, such as [19, 20]. However, the cost factors, security factors, and other factors (e.g., size of the organisation, culture, physical location, etc.) all have an impact on the process of transition from traditional systems and can influence the adoption of Cloud computing services, so to investigate one factor and neglect the others does not provide the full picture; it is necessary to research all the factors. To the best of the researcher's knowledge, no prior research has explored the full range of factors


that could affect the adoption of Cloud computing services by Saudi SMEs. Therefore, having investigated the literature in depth and identified the knowledge gap, the author will attempt to fill that gap by proposing a new comprehensive framework (see Sect. 4) for the adoption of Cloud computing services, to investigate the most influential factors hindering the adoption of Cloud services by Saudi Arabian SMEs. The following section reviews the theories and frameworks that have been used to investigate the factors preventing the adoption of Cloud computing services, in order to determine which ones are appropriate for use in this research.

3 Adoption Theories and Frameworks

A variety of adoption theories and frameworks have been established and refined to test and examine the factors influencing the adoption of current technology in a certain sector. This research conducted a comprehensive literature review of the papers related to the research context to find the theories that have been used to test Cloud computing adoption by both governmental and private organisations. The results of this research found that the most utilised theories, used either individually or integrated with other theories, are as follows:

• Technology Acceptance Model (TAM) [21, 22]
• Technology Organisation and Environment (TOE) framework [23, 24]
• Diffusion of Innovation (DOI) [25, 26]
• TAM integrated with the TOE framework [27–29]
• DOI integrated with TOE [18, 30–32]

After a comprehensive review of the previous studies, it was found that many researchers have successfully blended DOI theory with the TOE framework with regard to Cloud computing adoption. Often they used the original form of the model or added additional factors to investigate a particular factor hindering the adoption of Cloud computing services [33]. Therefore, this research aims to employ the DOI theory that was developed by Rogers in 1995 [34] along with the TOE framework that was developed by Tornatzky et al. in 1990 [35]. The DOI theory and TOE framework are combined in this research by adding a new main construct and some new sub-factors to increase the SMEs' 'intention to adopt Cloud computing services'; combining the two strengthens the factor investigation further. The DOI focuses on evaluating innovations based on adopter characteristics, and on the premise that innovations offering the greatest advantage are more likely to be used [36]. The TOE is consistent with the DOI as it investigates the organisation's intention to adopt Cloud computing services from various aspects covered by the technological, organisational, and environmental contexts [37]; this research extends it by adding a new social context construct.

4 Conceptual Framework

This study has resulted in the development of a model known as the adoption of Cloud computing model by Saudi Arabian SMEs (ACCM-SME) to investigate and analyse


the factors that influence the adoption of Cloud computing services by SMEs in Saudi Arabia. ACCM-SME was developed after careful analysis of the previous studies related to the adoption of Cloud computing in SMEs. This is in addition to theories and models that have been widely used in measuring the influence of factors in adopting modern technologies. ACCM-SME also integrates some of the critical factors that have been identified by the previous research on DOI theory with the TOE framework to offer a comprehensive model that guides this research to find the factors that affect the adoption of Cloud computing. The framework then assesses the impacts and makes a contribution to improving the image of Cloud computing by increasing the intention of SMEs in Saudi Arabia to adopt Cloud computing services. This framework will focus on four main constructs to represent the essence of the Cloud computing challenges that will be investigated in this research (see Fig. 2):

1. Technological context
2. Organisational context
3. Environmental context
4. Social context

Fig. 2. The framework for the adoption of cloud computing model by Saudi Arabian SMEs (ACCM-SME)

The four essential constructs in this research reflect how the independent variables in the technological, organisational, environmental, and social contexts can be sensitive issues for contemporary SME practices, and how they influence Saudi SMEs' adoption of Cloud computing services. The following sub-sections discuss each of the factors that hinder Saudi SMEs from adopting Cloud computing services, examining each factor separately.


4.1 The Technological Context

Data Confidentiality
The significance of investigating the confidentiality of Cloud computing practices stems from the potential risks that can occur in the involved processes [38]. Relevant concerns have been investigated regarding external parties' involvement in providing Cloud services and having control over an organisation's sensitive information, which poses a significant risk to both privacy and information confidentiality [39–41]. In fact, most users of traditional systems, whether companies or individuals, have very serious concerns about switching to Cloud computing services regarding the confidentiality of their information when it is shared with the third party that manages the services. These remaining concerns about the confidentiality of data in Cloud computing have caused delays in its adoption by many organisations. Perhaps the creation of new, high-quality data encryption models will be a supportive solution for maintaining data confidentiality, which would reflect positively on organisations' rapid adoption of Cloud computing services [42]. Therefore, based on the previous discussion, hypothesis (H1) is designed as:
H1: Increasing the data confidentiality of Cloud computing increases the Saudi SMEs' intention to adopt Cloud computing services.

Data Integrity
Data is currently considered to be a precious treasure whose integrity must be maintained, especially with developed and developing countries now relying almost entirely on digital systems. The National Institute of Standards and Technology (NIST) defined data integrity as "… the property that data has not been altered in an unauthorised manner, and data integrity covers data in storage, during processing, and while in transit" [43]. Data integrity has been studied by various scholars in terms of the possible risks that might occur in Cloud servicing, such as preserving the data, finding effective solutions to the risks, and establishing a sustainable, secure work environment [44–47]. The integrity of both data and information is a crucial matter, and it should meet the global standards that ensure that users' data is not tampered with. A loss of integrity in Cloud computing services means that the services will no longer be considered viable. Therefore, based on the previous discussion, hypothesis (H2) is designed as:
H2: Increasing the data integrity of Cloud computing increases the Saudi SMEs' intention to adopt Cloud computing services.

Data Security
Several researchers have discussed security as the most influential factor in an organisation's intention to adopt Cloud computing services. Security is important in any system or organisation wishing to protect its information from theft or destruction. Data has a life cycle that is organised according to certain stages from creation to destruction, and all of these stages must be highly secured in order to provide a secure environment for the data lifecycle [48]. Data security refers to the means by which


all information stored in the system is protected [49]. Wahsh and Dhillon defined data security related to Cloud computing as the extent to which the provider can secure the customers' information and maintain it [50]. A lack of security creates an insecure environment for organisations and raises concerns about the transition to Cloud computing services. Therefore, techniques such as authentication, authorisation, and cryptography should be considered for use in a Cloud computing system to achieve a high level of data security. Therefore, based on the previous discussion, hypothesis (H3) is designed as:
H3: Increasing the data security of Cloud computing increases the Saudi SMEs' intention to adopt Cloud computing services.

Relative Advantages
Relative advantages contribute to changing the negative image of modern technologies in many enterprises, resulting in the recognition of the benefits of new technologies. Relative advantage has been defined as "…the degree to which an innovation is perceived as being better than the idea it supersedes" [51]. The relative advantage supports Cloud computing and catalyses its use, allowing for the utilisation of pooled resources, improving productivity, lowering costs, boosting sales, and saving time [52]. When SMEs perceive the relative advantage of Cloud computing, they will be encouraged to adopt Cloud computing services in all of their processes. According to Khayer et al., SMEs are more inclined to incorporate Cloud computing in their operations and procedures when they consider Cloud computing as providing a relative advantage [53]. Therefore, it is important to scientifically investigate relative advantage and its impact on whether or not Cloud computing services are adopted by SMEs. Therefore, based on the previous discussion, hypothesis (H4) is designed as:
H4: The perceived relative advantages of Cloud computing have a positive effect that increases the Saudi SMEs' intention to adopt Cloud computing services.

Service Cost
The concept of cost is fundamental and must be discussed because the cost is proportional to the services provided by the Cloud Service Provider (CSP). The service cost factor is a general term; in fact, the cost of services is impacted by sub-factors such as server cost, software cost, network cost, support and maintenance cost, power cost, cooling cost, facilities cost, and real-estate cost [54]. All of these costs (carried by the CSP) will result in saving a lot of money and time for SMEs; thus, in order to achieve a cost reduction, SMEs should use and adopt Cloud computing services [55]. SMEs are always interested in the actual cost of the transformation process to Cloud computing services. The actual cost depends on several variables such as the resources that will be used, the period of use, the basic specifications of the resources used, and the geographical location of the resources used. Therefore, based on the previous discussion, hypothesis (H5) is designed as:
H5: Decreasing the cost of Cloud computing increases the Saudi SMEs' intention to adopt Cloud computing services.


Service Quality
Logically, service quality is a very influential factor when adopting any new service in any organisation, whether technical or otherwise, because it plays a vital role in the success of the adopted new service. A set of studies in the scope of technology has been conducted to measure the quality of service through several different criteria. Pham et al. discussed the quality of service in e-learning in Vietnam and its impact on student satisfaction. They found that the quality of the service provided is a factor affecting student satisfaction, and that there is a positive relationship between them [56]. Other researchers have noted that the better the quality of the service, the greater the willingness to accept Cloud computing services, keeping in mind that quality of service has been identified as an influential element in encouraging the management of enterprises to adopt Cloud services [18]; thus, paying attention to the quality of service in Cloud computing pushes organisations forward to adopt its services without delay. Therefore, based on the previous discussion, hypothesis (H6) is designed as:
H6: Increasing the service quality of Cloud computing increases the Saudi SMEs' intention to adopt Cloud computing services.

4.2 The Organisational Context

Organisation Regulation
This refers to the rules and procedures implemented by businesses in relation to the adoption of IT innovations [26]. An organisation that wants to adopt Cloud computing services needs unambiguous internal regulations that define and identify the responsibilities of the organisation with respect to the Cloud data, keeping in mind that these regulations are strict regarding privacy laws [57]. In addition, external regulations cannot be neglected. There must be education about, and awareness of, the regulations of the countries in which the data centres of the organisation's Cloud computing service providers are located [58]. This factor is at the forefront of the challenges facing Saudi SMEs when seeking to move to Cloud computing services, due to the currently insufficient support from the General Authority of SMEs regarding their Cloud data. This means that there must be clear and publicised standards that support the transformation process. Therefore, based on the previous discussion, hypothesis (H7) is designed as:
H7: Increasing and updating the organisational regulations increases the Saudi SMEs' intention to adopt Cloud computing services.

Organisation Size
The size of the organisation refers to the size of the investments in that organisation and its market value, number of employees, the size of its annual returns, and the number of branches. Organisation size is a crucial and influential factor that plays a powerful part in Cloud computing services adoption [52]. A study conducted in 2017 found that large organisations have a strong desire to adopt many modern innovations due to the robustness of their foundation and their ability and flexibility to bear the risks due


to the adoption of Cloud computing services [59]. In contrast, another study in 2017 found that the size of the firm does not significantly affect the intention to adopt Cloud computing services [60]. In fact, SMEs are in need of Cloud computing services due to their limited infrastructure, unlike large companies that have more ability and flexibility to deal with any problems. Therefore, based on the previous discussion, hypothesis (H8) is designed as:
H8: A smaller organisation size is more likely to increase the Saudi SMEs' intention to adopt Cloud computing services.

Technology Readiness
Technology readiness is defined as the level of technology infrastructure and human resources inside the organisation required to facilitate Cloud computing adoption [61]. Technology readiness is one of the variables that can influence Cloud computing services adoption, as organisations with high technology readiness have a better opportunity to adopt Cloud computing services [62]. Technology readiness measurement is based on the organisation's infrastructure and the extent to which it will accept the transition to Cloud computing services. Organisations that have a strong, ready technical infrastructure will be positively disposed to adopt Cloud computing services [63]. However, SMEs in Saudi Arabia do not pay enough attention to technical infrastructure, as they still treat it as secondary to business success, while relying on free technical resources for their employees such as Gmail, Google Forms, Google Drive, etc. [64]. Therefore, based on the previous discussion, hypothesis (H9) is designed as:
H9: Increasing technology readiness increases the Saudi SMEs' intention to adopt Cloud computing services.

Top Management Support
The decision makers who impact the adoption of an innovation are referred to as the top management [65]. Top management support is a critical factor that has a significant influence on the process of adopting Cloud computing services. The conviction of the top management about the advantages of Cloud computing contributes to the potential of Cloud computing services adoption [66]. The greater the top management support received, the greater the adoption of Cloud computing services [19]. If the adoption of Cloud computing services is supported by the top management, this will have a big impact on the transformation process. Conversely, if the top management does not approve of Cloud computing, this will be a serious barrier to adoption. Therefore, based on the previous discussion, hypothesis (H10) is designed as:
H10: Increasing the top management support of Cloud computing increases the Saudi SMEs' intention to adopt Cloud computing services.


4.3 The Environmental Context

Technology Infrastructure
Cloud computing provides its services completely through the Internet [67]. The lack of a good infrastructure thus reflects negatively on Cloud computing services. Regarding this factor, one study revealed that the providers of Cloud computing services face problems when it comes to providing their services to companies located in places where the technical infrastructure is poor and does not meet the necessary requirements for Cloud computing services [68]. Any technical system that depends on the Internet needs a very strong technical infrastructure to ensure the continuity of its service provision. Based on our literature review regarding information and communication technology (ICT) infrastructure for adopting Cloud computing services, we find that in the contemporary era of the Internet and mobile communications, the current economic growth index is related to the ICT infrastructure [69]. Now that the coronavirus pandemic has hit the world, countries that have a good technical infrastructure have, for the most part, been able to continue their work without interruption, whether in education or otherwise. Having a good technical infrastructure nationwide means the presence of renewable and continuously updated systems that keep pace with the information technology revolution. Therefore, based on the previous discussion, hypothesis (H11) is designed as:
H11: Obtaining a high level of technology infrastructure increases the Saudi SMEs' intention to adopt Cloud computing services.

Cloud Provider
The Cloud computing service provider plays a vital role in the success of the process of adopting Cloud computing services, as it contributes to overcoming the challenges that face the consumers [70]. The success of the customer adoption process is linked to the possibility of developing and improving the services of the Cloud provider itself. As a result, by providing an appropriate data storage audit trail, providers may be able to aid their clients in regulatory and legal matters [71]. In addition, Service Level Agreements (SLAs) may help the provider to achieve success and avoid violations [72]. However, the management of Cloud service providers is usually confronted with issues when it comes to modern management techniques such as agile project management [73]. Moreover, the physical location of the Cloud provider is an influential matter regarding where the data is stored by the service providers. As some providers store their clients' data in many countries, this can make the clients more concerned about their data. Therefore, based on the previous discussion, hypothesis (H12) is designed as:
H12: Increasing the number of Cloud providers within Saudi Arabia increases the Saudi SMEs' intention to adopt Cloud computing services.

Legal Framework
The legal framework is considered to be a very important factor due to the problems that can occur in this field. This is because sometimes the service provider is in one


country, the beneficiary is in another, and the data centre is in a third. This means that the rules governing a particular service may differ significantly. Flexible and effective rules that can meet global standards are still required. It is very important to look for a legal framework that is suitable for both providers and clients, and that can meet the local requirements as well [74]. However, both public and private laws are affected by the changing legal environment, and the protection of consumers' and service providers' rights is a challenge for Cloud computing. For all sides, the laws and policy must be effective and equitable [75]. From this study's point of view, the adoption of Cloud computing services must follow the regulations of the country that hosts the services in order to be safe, and there must be trust in the supervision of the legal organisation. Therefore, based on the previous discussion, hypothesis (H13) is designed as:
H13: Obtaining an organised legal framework for Cloud computing increases Saudi SMEs' intention to adopt Cloud computing services.

Enterprise Competitiveness
Competitiveness is defined as the ability to perform distinctive commercial practices through competing products [76]. From this study's viewpoint, the concept of competitiveness means that the enterprise is ready to compete strongly in a specific market with the intention of being the first. However, in order to achieve the goals of the enterprise and to compete strongly among its peers, it is necessary to maintain technological competence [77]. Often, some competitiveness is positive in that it pushes competitors to adopt technological innovations that meet the aspirations of the field in which they work [78]. Therefore, competing enterprises might see the advantages of adopting Cloud computing services early on, to be able to gain the maximum benefit and to obtain better operational efficiency. Therefore, based on the previous discussion, hypothesis (H14) is designed as:
H14: Increasing the enterprise competitiveness increases the Saudi SMEs' intention to adopt Cloud computing services.

4.4 The Social Context

Culture
Culture is an influencing factor as it reflects the attitude of the community toward the adoption of Cloud computing. More specifically, technological culture has been defined as "… the increase in the efficiency and complexity of tools and techniques in human populations over generations" [79]. Culture will influence the enterprises' intention to adopt new technologies, whether positively or negatively, especially if the enterprise provides a service related to the community members [80]. The current literature regarding the influence of culture on adopting Cloud computing services suggests that Saudi Arabia's culture has a major influence on the private sector's adoption of Cloud computing services, and that traditional cultural values have a negative and indirect impact on a firm's decision to embrace Cloud computing services [52]. Thus, any understanding of


the barriers to the adoption of Cloud computing services by Saudi SMEs necessitates an investigation of the impact of culture. Therefore, based on the previous discussion, hypothesis (H15) is designed as:
H15: Increasing the technology culture of customers, IT managers, top managers, and employees increases the Saudi SMEs' intention to adopt Cloud computing services.

Awareness
People's understanding of technology, as well as the accessibility of internet technology, is referred to as awareness [81]. In the context of Cloud computing, awareness is the step that helps organisations benefit from adopting Cloud services [82]. Community awareness plays an important role in accepting new technologies for several different reasons, the most important of which is that people cannot adopt new services without being educated about them and fully aware of their benefits. One study investigated the reasons impeding Cloud computing adoption by organisations of different sizes in North America, Europe, Asia, and Africa using the quantitative method. It was found that 40 per cent of participants did not believe in adopting Cloud computing services, and of these, 15 per cent did not understand what Cloud computing was. Although awareness of the Cloud has increased since then, awareness is always considered to be one of the main barriers to adopting new technologies. Thus, SMEs must first educate the beneficiaries of their service before adopting Cloud computing services, so that they have sufficient knowledge and know-how about the services. Therefore, based on the previous discussion, hypothesis (H16) is designed as:
H16: Increasing the awareness of the customers, IT managers, top managers and employees increases the Saudi SMEs' intention to adopt Cloud computing services.

Trust
Trust is generally defined as "… the expectations about individual relationships and behaviours" [83]. However, most of the research reviewed indicates that in the IT field, 'trust' has a more specific definition. Thus, trust from the perspective of Cloud computing is an estimate of a Cloud resource's capacity to complete a job, based on variables such as resource processing power, dependability, and availability [84]. In Cloud computing, trust is considered to be a crucial issue, and a lack of trust is an influential factor that may contribute to reducing the adoption of Cloud computing services. SMEs are concerned about how to avoid the lack of transparency and control over the data in Cloud computing services, to keep their reputation at a high level and to maintain people's trust in their organisation [85]. Therefore, a lack of trust does not simply refer to the service provider; it also refers to a lack of faith in the technology and its capacity to provide a decent service without data loss. Therefore, based on the previous discussion, hypothesis (H17) is designed as:
H17: Increasing the trust of the customers, IT managers, top managers and employees increases the Saudi SMEs' intention to adopt Cloud computing services.


5 Conclusion and Future Work

To collect the data for this study, both quantitative (survey) and qualitative (semi-structured interview) methods will be used. This research combines both methods to get a comprehensive result, which is a widely adopted practice in the informatics field. The target sample size in this study is intended to be at least 400 participants, which is above the minimum recommended number of responses [86]. Structural Equation Modelling (SEM), along with descriptive statistics, will be used to analyse the collected data and to validate the model. The questionnaire will be distributed electronically among the SMEs in Saudi Arabia after obtaining ethical approval and running a pilot study to validate both the quantitative (questionnaire) and qualitative (interview) instruments. The researcher intends to formally apply to the General Authority of SMEs in Saudi Arabia to conduct the study through them, which will motivate the enterprises to participate and cooperate with this study.

The questionnaire (survey) will be the main method, designed in accordance with the research objectives and hypotheses to answer the research questions and to examine the hypotheses that have been formulated. Researchers who use questionnaires normally take into consideration a number of stages that should be carried out to obtain reliable and accurate results and outcomes. The second method is the qualitative approach, which is an accredited investigation method in many different academic disciplines. Qualitative researchers aim to deepen their understanding of practical behaviours and the factors or causes that govern and drive these behaviours; for example, why and how decisions are made, not just what, where and when they are made. There is a need to focus on small rather than large samples. Therefore, due to the Covid-19 restrictions, the interviews will be face-to-face if possible, or conducted through Zoom. They will be conducted with the top managers of at least ten SMEs in Saudi Arabia, as well as with the information technology managers in the same enterprises.

Cloud computing in Saudi Arabia is still in its early stages and is facing challenges in its adoption. To find out the challenges specifically facing Saudi SMEs, this research investigated all the elements that may impact the conversion to Cloud computing services. Moreover, this position paper developed a new comprehensive model that can identify the most influential factors hindering the adoption of Cloud services. This study focused on explaining the model and discussing all of the factors that may potentially affect the decisions of Saudi SMEs about adopting Cloud computing services. The next stage of the study will be to collect the data and analyse them, to be able to demonstrate the most crucial factors currently affecting the Saudi SMEs' intention to adopt Cloud computing services.

References

1. Pibul, A.N.: How is user trust in cloud computing affected by legal problems relating to data protection in cloud computing, and how can user trust in cloud computing be built? (2018)
2. Diaby, T., Rad, B.B.: Cloud computing: a review of the concepts and deployment models. Int. J. Inf. Technol. Comput. Sci. 9, 50–58 (2017)
3. Abdalla, P.A., Varol, A.: Advantages to disadvantages of cloud computing for small-sized business. In: 2019 7th International Symposium on Digital Forensics and Security (ISDFS), pp. 1–6. IEEE (2019)


4. Huang, H., Wang, L.: P&P: a combined push-pull model for resource monitoring in cloud computing environment. In: 2010 IEEE 3rd International Conference on Cloud Computing, pp. 260–267. IEEE (2010) 5. You, P., Peng, Y., Liu, W., Xue, S.: Security issues and solutions in cloud computing. In: 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 573–577. IEEE (2012) 6. Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J., Brandic, I.: Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Futur. Gener. Comput. Syst. 25, 599–616 (2009) 7. Alam, T., Benaida, M.: CICS: cloud–internet communication security framework for the internet of smart devices. Tanweer Alam. Mohamed Benaida.“ CICS Cloud–Internet Commun. Secur. Framew. Internet Smart Devices.“. Int. J. Interact. Mob. Technol. 12 (2018) 8. Yuan, J., Yu, S.: Efficient public integrity checking for cloud data sharing with multi-user modification. In: IEEE INFOCOM 2014-IEEE Conference on Computer Communications, pp. 2121–2129. IEEE (2014) 9. Senarathna, I., Wilkin, C., Warren, M., Yeoh, W., Salzman, S.: Factors that influence adoption of cloud computing: An empirical study of Australian SMEs. Australas. J. Inf. Syst. 22 (2018) 10. Sahandi, R., Alkhalil, A., Opara-Martins, J.: Cloud computing from SMEs perspective: a survey based investigation. J. Inf. Technol. Manag. 24, 1–12 (2013) 11. Klug, W., Bai, X.: Factors affecting cloud computing adoption among universities and colleges in the United States and Canada. Issues Inf. Syst. 16 (2015) 12. Alsharari, N.M., Al-Shboul, M., Alteneiji, S.: Implementation of cloud ERP in the SME: evidence from UAE. J. Small Bus. Enterp. Dev. (2020) 13. Skafi, M., Yunis, M.M., Zekri, A.: Factors influencing SMEs’ adoption of cloud computing services in lebanon: an empirical analysis using TOE and contextual theory. IEEE Access. 8, 79169–79181 (2020) 14. Alsafi, T., Fan, I.-S.: Investigation of cloud computing barriers: a case study in Saudi Arabian SMEs. J. Inf. Syst. Eng. Manag. 5, em0129 (2020) 15. Albelaihi, A., Khan, N.: Top benefits and hindrances to cloud computing adoption in saudi arabia: a brief study. J. Inf. Technol. Manag. 12, 107–122 (2020) 16. Alassafi, M.O., Alharthi, A., Walters, R.J., Wills, G.B.: Security risk factors that influence cloud computing adoption in Saudi Arabia government agencies. In: 2016 International Conference on Information Society (i-Society). pp. 28–31. IEEE (2016) 17. CITC: Background. https://www.citc.gov.sa/en/AboutUs/Pages/History.aspx last accessed 06 August 2021 18. Al Mudawi, N., Beloff, N., White, M.: Cloud computing in government organizationstowards a new comprehensive model. In: 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp. 1473–1479. IEEE (2019) 19. Alshamaileh, Y.Y.: An empirical investigation of factors affecting cloud computing adoption among SMEs in the north east of England declaration. (2013) 20. Carroll, M., Van Der Merwe, A., Kotze, P.: Secure cloud computing: Benefits, risks and controls. In: 2011 Information Security for South Africa, pp. 1–9. IEEE (2011) 21. Arpaci, I.: Antecedents and consequences of cloud computing adoption in education to achieve knowledge management. Comput. Human Behav. 70, 382–390 (2017) 22. 
Ali, Z., Gongbing, B., Mehreen, A.: Understanding and predicting academic performance through cloud computing adoption: a perspective of technology acceptance model. J. Comput. Educ. 5, 297–327 (2018)


23. Tashkandi, A., Al-Jabri, I.: Cloud computing adoption by higher education institutions in Saudi Arabia: Analysis based on TOE. In: 2015 International Conference on Cloud Computing (ICCC), pp. 1–8. IEEE (2015) 24. Low, C., Chen, Y., Wu, M.: Understanding the determinants of cloud computing adoption. Ind. Manag. data Syst. (2011) 25. Al Ajmi, Q., Arshah, R.A., Kamaludin, A., Sadiq, A.S., Al-Sharafi, M.A.: A conceptual model of e-learning based on cloud computing adoption in higher education institutions. In: 2017 International Conference on Electrical and Computing Technologies and Applications (ICECTA), pp. 1–6. IEEE (2017) 26. Mohammed, F., Ibrahim, O., Nilashi, M., Alzurqa, E.: Cloud computing adoption model for e-government implementation. Inf. Dev. 33, 303–323 (2017) 27. Gangwar, H., Date, H., Ramaswamy, R.: Understanding determinants of cloud computing adoption using an integrated TAM-TOE model. J. Enterp. Inf. Manag (2015) 28. Chiniah, A., Mungur, A.E.U., Permal, K.N.: Evaluation of cloud computing adoption using a hybrid TAM/TOE model. In: Information Systems Design and Intelligent Applications, pp. 257–269. Springer (2019) 29. Stewart, H.: The hindrance of cloud computing acceptance within the financial sectors in Germany. Inf. Comput. Secur. (2021) 30. Alkhalil, A., Sahandi, R., John, D.: An exploration of the determinants for decision to migrate existing resources to cloud computing using an integrated TOE-DOI model. Journal of Cloud Computing 6(1), 1–20 (2017). https://doi.org/10.1186/s13677-016-0072-x 31. Hiran, K.K., Henten, A.: An integrated TOE–DoI framework for cloud computing adoption in the higher education sector: case study of Sub-Saharan Africa. Ethiopia. Int. J. Syst. Assur. Eng. Manag. 11, 441–449 (2020) 32. Alkhater, N., Wills, G., Walters, R.: Factors influencing an organisation’s intention to adopt cloud computing in Saudi Arabia. In: 2014 IEEE 6th international conference on cloud computing technology and science, pp. 1040–1044. IEEE (2014) 33. Karunagaran, S., Mathew, S.K., Lehner, F.: Differential cloud adoption: a comparative case study of large enterprises and SMEs in Germany. Inf. Syst. Front. 21(4), 861–875 (2017). https://doi.org/10.1007/s10796-017-9781-z 34. Rogers, E.M.: Diffusion of innovations. Simon and Schuster (2010) 35. Tornatzky, L.G., Fleischer, M., Chakrabarti, A.K.: Processes of technological innovation. Lexington books (1990) 36. Al-Rahmi, W.M., et al.: Integrating technology acceptance model with innovation diffusion theory: an empirical investigation on students’ intention to use E-learning systems. IEEE Access. 7, 26797–26809 (2019) 37. Oliveira, T., Thomas, M., Espadanal, M.: Assessing the determinants of cloud computing adoption: an analysis of the manufacturing and services sectors. Inf. Manag. 51, 497–510 (2014). https://doi.org/10.1016/J.IM.2014.03.006 38. AlSudiari, M.A.T., Vasista, T.G.K.: Cloud computing and privacy regulations: an exploratory study on issues and implications. Adv. Comput. 3, 159 (2012) 39. Lorünser, T., et al.: Towards a new paradigm for privacy and security in cloud services. In: Cyber Security and Privacy Forum, pp. 14–25. Springer (2015) 40. Sahmim, S., Gharsellaoui, H.: Privacy and security in internet-based computing: cloud computing, internet of things, cloud of things: a review. Procedia Comput. Sci. 112, 1516–1522 (2017) 41. Rao, B.T.: A study on data storage security issues in cloud computing. Procedia Comput. Sci. 92, 128–135 (2016) 42. 
El Makkaoui, K., Beni-Hssane, A., Ezzati, A.: Speedy Cloud-RSA homomorphic scheme for preserving data confidentiality in cloud computing. J. Ambient. Intell. Humaniz. Comput. 10(12), 4629–4640 (2018). https://doi.org/10.1007/s12652-018-0844-x


43. NIST: data integrity - Glossary | CSRC. https://csrc.nist.gov/glossary/term/data_integrity last accessed 23 December 2021 44. Gaetani, E., Aniello, L., Baldoni, R., Lombardi, F., Margheri, A., Sassone, V.: Blockchainbased database to ensure data integrity in cloud computing environments (2017) 45. Aldossary, S., Allen, W.: Data security, privacy, availability and integrity in cloud computing: issues and current solutions. Int. J. Adv. Comput. Sci. Appl. 7, 485–498 (2016) 46. Yu, Y., et al.: Identity-based remote data integrity checking with perfect data privacy preserving for cloud storage. IEEE Trans. Inf. Forensics Secur. 12, 767–778 (2016) 47. Kumari, P., Paul, R.K.: A Study for Authentication and Integrity of Data Files in Cloud Computing. Smart Moves J. Ijoscience 2 (2016) 48. Kumar, P.R., Raj, P.H., Jelciana, P.: Exploring data security issues and solutions in cloud computing. Procedia Comput. Sci. 125, 691–697 (2018) 49. Kosseff, J.: New York’s Financial Cybersecurity Regulation: Tough, Fair, and a National Model (2016) 50. Wahsh, M.A., Dhillon, J.S.: An investigation of factors affecting the adoption of cloud computing for E-government implementation. In: 2015 IEEE Student Conference on Research and Development (SCOReD), pp. 323–328. IEEE (2015) 51. Rogers, E.: Attributes of Innovations and Their Rate of Adoption. Libr. Congr. Cat. Data. 219 (1995) 52. Alkhater, N., Walters, R., Wills, G.: An empirical study of factors influencing cloud adoption among private sector organisations. Telemat. Informatics. 35, 38–54 (2018). https://doi.org/ 10.1016/J.TELE.2017.09.017 53. Khayer, A., Talukder, M.S., Bao, Y., Hossain, M.N.: Cloud computing adoption and its impact on SMEs’ performance for cloud supported operations: A dual-stage analytical approach. Technol. Soc. 60, 101225 (2020) 54. Li, X., Li, Y., Liu, T., Qiu, J., Wang, F.: The method and tool of cost analysis for cloud computing. In: 2009 IEEE International Conference on Cloud Computing, pp. 93–100. IEEE (2009) 55. Hentschel, R., Leyh, C., Petznick, A.: Current cloud challenges in Germany: the perspective of cloud service providers. Journal of Cloud Computing 7(1), 1–12 (2018). https://doi.org/ 10.1186/s13677-018-0107-6 56. Pham, L., Limbu, Y.B., Bui, T.K., Nguyen, H.T., Pham, H.T.: Does e-learning service quality influence e-learning student satisfaction and loyalty? Evidence from Vietnam. Int. J. Educ. Technol. High. Educ. 16(1), 1–26 (2019). https://doi.org/10.1186/s41239-019-0136-3 57. Marston, S., Li, Z., Bandyopadhyay, S., Zhang, J., Ghalsasi, A.: Cloud computing—The business perspective. Decis. Support Syst. 51, 176–189 (2011) 58. Morgan, L., Conboy, K.: Factors affecting the adoption of cloud computing: an exploratory study (2013) 59. Al-Sharafi, M.A., Arshah, R.A., Abu-Shanab, E.A.: Factors influencing the continuous use of cloud computing services in organization level. In: Proceedings of the International Conference on Advances in Image Processing, pp. 189–194 (2017) 60. Loukis, E., Arvanitis, S., Kyriakou, N.: An empirical investigation of the effects of firm characteristics on the propensity to adopt cloud computing. IseB 15(4), 963–988 (2017). https://doi.org/10.1007/s10257-017-0338-y 61. Senyo, P.K., Effah, J., Addae, E.: Preliminary insight into cloud computing adoption in a developing country. J. Enterp. Inf. Manag. (2016) 62. Gutierrez, A., Boukrami, E., Lumsden, R.: Technological, organisational and environmental factors influencing managers’ decision to adopt cloud computing in the UK. J. Enterp. Inf. Manag. 
(2015) 63. Alhammadi, A., Stanier, C., Eardley, A.: The determinants of cloud computing adoption in Saudi Arabia (2015)


64. Lakshminarayanan, R., Kumar, B., Raju, M.: Cloud computing benefits for educational institutions. arXiv Prepr. arXiv1305.2616 (2013) 65. Lai, H.-M., Lin, I.-C., Tseng, L.-T.: High-level managers’ considerations for RFID adoption in hospitals: an empirical study in Taiwan. J. Med. Syst. 38, 1–17 (2014) 66. Yigitbasioglu, O.M.: The role of institutional pressures and top management support in the intention to adopt cloud computing solutions. J. Enterp. Inf. Manag. (2015) 67. Amairah, A., Al-tamimi, B.N., Anbar, M., Aloufi, K.: Cloud computing and internet of things integration systems: a review. In: International Conference of Reliable Information and Communication Technology, pp. 406–414. Springer (2018) 68. Sabi, H.M., Uzoka, F.-M.E., Langmia, K., Njeh, F.N.: Conceptualizing a model for adoption of cloud computing in education. Int. J. Inf. Manage. 36, 183–191 (2016) 69. Pradhan, R.P., Mallik, G., Bagchi, T.P.: Information communication technology (ICT) infrastructure and economic growth: a causality evinced by cross-country panel data. IIMB Manag. Rev. 30, 91–103 (2018) 70. Alghamdi, B., Potter, L.E., Drew, S.: Validation of architectural requirements for tackling cloud computing barriers: cloud provider perspective. Procedia Comput. Sci. 181, 477–486 (2021) 71. Morgan, L., Conboy, K.: Key factors impacting cloud computing adoption. Computer (Long. Beach. Calif) 46, 97–99 (2013) 72. Singh, S., Chana, I., Buyya, R.: STAR: SLA-aware autonomic management of cloud resources. IEEE Trans. Cloud Comput. (2017) 73. Oesterle, S., Jöhnk, J., Keller, R., Urbach, N., Yu, X.: A contingency lens on cloud provider management processes. Bus. Res. 13(3), 1451–1489 (2020). https://doi.org/10.1007/s40685020-00128-8 74. El-Gazzar, R., Hustad, E., Olsen, D.H.: Understanding cloud computing adoption issues: A Delphi study approach. J. Syst. Softw. 118, 64–84 (2016) 75. Hourani, H., Abdallah, M.: Cloud computing: legal and security issues. In: 2018 8th International Conference on Computer Science and Information Technology (CSIT), pp. 13–16. IEEE (2018) 76. Krugman, P.: Competitiveness: a dangerous obsession. Foreign Aff. 73 (1994) 77. Saini, L., Kaur, H.: Role of cloud computing in education system. Int. J. Adv. Res. Comput. Sci. 8, 345–347 (2017) 78. Hermundsdottir, F., Aspelund, A.: Sustainability innovations and firm competitiveness: a review. J. Clean. Prod. 280, 124715 (2021) 79. Osiurak, F., Reynaud, E.: The elephant in the room: what matters cognitively in cumulative technological culture. Behav. Brain Sci. 43 (2020) 80. Alqahtani, F.N.: Identifying the critical factors that impact on the Development of Electronic Government using TOE Framework in Saudi E-Government Context: A Thematic Analysis (2016) 81. Mofleh, S.I., Wanous, M.: Understanding factors influencing citizens’ adoption of egovernment services in the developing world: Jordan as a case study. Infocomp J. Comput. Sci. 7, 1–11 (2008) 82. Chou, D.C.: Cloud computing: A value creation model. Comput. Stand. Interfaces. 38, 72–77 (2015) 83. Shockley-Zalabak, P., Ellis, K., Winograd, G.: Organizational trust: What it means, why it matters. Organ. Dev. J. 18, 35 (2000) 84. Hassan, H., El-Desouky, A.I., Ibrahim, A., El-Kenawy, E.-S.M., Arnous, R.: Enhanced QoSbased model for trust assessment in cloud computing environment. IEEE Access. 8, 43752– 43763 (2020)


85. Khan, K.M., Malluhi, Q.: Establishing trust in cloud computing. IT Prof. 12, 20–27 (2010) 86. Wolf, E.J., Harrington, K.M., Clark, S.L., Miller, M.W.: Sample size requirements for structural equation models: An evaluation of power, bias, and solution propriety. Educ. Psychol. Meas. 73, 913–934 (2013)

A New Approach for Optimal Selection of Features for Classification Based on Rough Sets, Evolution and Neural Networks

Eddy Torres-Constante1(B), Julio Ibarra-Fiallo1, and Monserrate Intriago-Pazmiño2

1 Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito, Cumbayá, Ecuador
[email protected], [email protected]
2 Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Quito, Ecuador
[email protected]

Abstract. In number recognition, one of the challenges is to deal with the high dimensionality of data, which affects the performance of algorithms. On the other hand, pattern recognition allows establishing fundamental properties among sets of objects. In this context, Rough Set Theory applies the concept of super-reducts in order to find subsets of attributes that preserve the capability of the entire set to distinguish objects that belong to different classes. Nevertheless, finding these reducts for large data sets has exponential complexity due to the number of objects per class and attributes per object. This paper proposes a new approach for dealing with this complex problem in real data sets in order to obtain a discriminator that is close enough to minimal. It takes advantage of the theoretical background of Rough Set Theory, especially super-reducts of minimal length. In the literature, there is an algorithm for finding these minimal-length reducts; it performs well for a small sampling of objects per class of the entire data set. An evolutionary algorithm is performed to extend it over a huge data set, taking a subset of the entire list of super-reducts as the initial population. The proposed discriminator is evaluated and compared against state-of-the-art algorithms and against the performance reported for different models on the data set. Keywords: Pattern recognition · Exponential complexity · Handwritten number classification · Super-reducts · Neural networks · Accuracy · Evolutionary strategy · Minimal length

1 Introduction

Handwritten number recognition implies great challenges due to the huge quantity of information that is required for training a classification model [1,2]. Moreover, the wide variation between inputs and the distinct ways in which humans interpret
a handwritten text suggest that our brain considers only a reduced amount of all the information received by our senses. In this research work, a new approach for the optimal selection of features for handwritten number recognition is proposed, in order to find a subset of attributes, close enough to minimal, that preserves the capability of the entire set of attributes to distinguish objects that belong to different classes. For this purpose, Rough Set Theory, artificial neural networks and evolutionary algorithms play the key roles. For machine learning models, a high dimensionality of data causes multiple problems in time complexity and in performance for accurate object recognition or classification. In fact, training a model with a large number of features becomes difficult as the number of operations rises exponentially [3,4]. When reducing the dimensionality, it is crucial to ensure that the essential information is preserved, considering every aspect of the classes that form the entire set. Since the performance of algorithms can be degraded by data sets that contain a large number of attributes, reducing the number of attributes lowers the complexity of model training, which is especially useful for image recognition, text mining or big data [5–8]. Rough Set Theory (RST) handles feature reduction while ensuring that the ability to discern between sets of objects is preserved. Another key point of this theory is that every aspect of these objects is considered [9,10]. Following this technique, it is ensured that all potentially useful information is retained. In RST, a reduct is a subset of attributes that preserves the discernibility capacity of the entire set of attributes [11]. Hence, by definition, it is a natural candidate for feature selection in machine learning models. Several algorithms have been developed in recent years to compute a single reduct, such as [12,13] or [14], and others to compute the entire set of reducts, such as [15,16]. In our case, we focus our attention on an algorithm that searches for reducts of minimal length, such as [17]. However, finding these reducts over an entire data set is an NP-hard problem; for this reason, we use artificial neural networks and an evolutionary strategy to extend the discerning capability of a reduct found over a sample of the entire data set. In the literature, there exist algorithms for dimensionality reduction such as PCA [18], LDC [19] or GDA [20], as well as algorithms that keep the most important features, such as backward elimination [21] or Random Forest [22]. Nevertheless, their theoretical background does not guarantee strong classification performance in machine learning models, nor do these methods ensure that a minimal quantity of features is preserved such that prediction can be performed accurately. These are, mainly, the questions that we want to answer: does there exist a subset of features that preserves the essential information? If so, is there a way to ensure that it is minimal? In this paper, we propose a new approach to build a subset of features, close enough to minimal, that preserves the prediction accuracy for handwritten digits. Its importance relies on the future use of this subset of features, which will reduce the amount of information that needs to be processed, improving the time and complexity efficiency of any model that uses only the selected information.


The rest of this paper is organized as follows. In Sect. 2 we formally describe a reduct for a boolean matrix, as well as the artificial neural networks and the evolutionary strategy used to achieve a subset of features that is minimal or close enough to minimal. The MNIST data set is used to perform all the calculations, compute the assessment metrics and present the results. Then we detail our method and experimental configuration. Section 3 gives the results and the analysis of the study. Finally, some conclusions and future work are presented in Sect. 4.

2 Materials and Methods

In this section, the theoretical background is introduced in order to understand every step of the proposed algorithm. Like the first three subsections, the algorithm is divided into three key components: Rough Set Theory, neural networks and evolutionary strategy. The following subsection describes the data set used, and finally all the components are combined to explain the proposed method in detail.

2.1 Rough Set Theory

Let U be a finite non-empty collection of objects and A a finite non-empty set of attributes. For every attribute a in A there exists a set Va, called the value set of a, and a mapping a : U → Va. The attributes of A are divided into decision attributes D and condition attributes C such that A = C ∪ D and C ∩ D = ∅. Let B be a subset of the condition attributes of A. The Indiscernibility Relation is defined as:

IND(B|D) = {(x, y) ∈ U² | [α(x) = α(y) ∀α ∈ B] ∨ [δ(x) = δ(y)]}

where α(x) is the attribute value defined previously and δ(x) is the value of the decision attribute. Hence, the indiscernibility relation for B contains all pairs of objects belonging to different classes that cannot be distinguished by the attributes of B, together with the pairs of elements of the same class. The concept of a decision reduct is important, as it is defined in terms of this indiscernibility relation. In a decision system DS, a decision reduct allows us to distinguish between objects that belong to different classes.

Definition 1. Let D be the set of decision attributes and C be the set of condition attributes of a decision system DS. The set B ⊆ C is a decision reduct of DS if:
1. IND(B|D) = IND(C|D)
2. ∀b ∈ B, IND(B − {b}|D) ≠ IND(C|D)

For simplicity, decision reducts will simply be called reducts. A binary table whose rows represent comparisons of pairs of objects of different decision classes and whose columns are condition attributes is called a Binary Discernibility Matrix DM. Each discernibility element dmij ∈ {0, 1}; dmij = 0 and dmij = 1 mean that the objects of the pair denoted by i are, respectively, similar or different in attribute j.


Definition 2. Let DM be a discernibility matrix and rk be a row of DM. rk is a superfluous row of DM if there exists a row r in DM such that ∃i | (r[i] < rk[i]) ∧ ∀i | (r[i] ≤ rk[i]), where r[i] is the i-th element of the row r.

There is a related concept in Testor Theory, where the matrix obtained by removing every superfluous row is called the Basic Matrix [23]. For simplicity, we will refer to this reduced Binary Discernibility Matrix as the basic matrix. Recall from [24] that the reducts of a decision system can be calculated from this basic matrix, which is an important fact for the development of the algorithm. A super-reduct is a subset of features that discerns between objects that belong to different classes.

Definition 3. Let BM be a basic matrix and L be an ordered list of condition attributes. L is associated to a super-reduct if and only if the sub-matrix of BM that considers only the attributes in L contains no zero row (a row with only zeros).

Proposition 1. Let BM be a basic matrix and L be an ordered list of attributes. If ∃cx ∈ L such that emL ∧ cmcx = (0, ..., 0), then L is not associated to a reduct. Here emL and cmcx are the exclusion mask and the cumulative mask, respectively, defined in [17].

Proposition 1 ensures that no superfluous attributes are present in the reduct; it is used to evaluate whether a super-reduct is a reduct. The minReduct algorithm [17] supports most of the theoretical background needed. In our work we make use of this algorithm and only state those definitions and propositions that are explicitly used.
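For illustration, the following minimal NumPy sketch (our own, not the minReduct implementation of [17]) shows how Definition 2 and Definition 3 can be checked on a small basic matrix; the toy matrix and function names are assumptions made for the example.

```python
import numpy as np

def remove_superfluous_rows(dm: np.ndarray) -> np.ndarray:
    """Definition 2: drop every row rk for which some other row r satisfies
    r <= rk element-wise with at least one strict inequality."""
    keep = []
    for k, rk in enumerate(dm):
        dominated = any(j != k and np.all(r <= rk) and np.any(r < rk)
                        for j, r in enumerate(dm))
        keep.append(not dominated)
    return dm[np.array(keep)]

def is_super_reduct(bm: np.ndarray, attrs) -> bool:
    """Definition 3: the sub-matrix restricted to `attrs` must contain no zero row,
    i.e. every pair of objects from different classes is discerned."""
    return bool(np.all(bm[:, list(attrs)].sum(axis=1) > 0))

# Toy discernibility matrix: 4 pair comparisons over 3 condition attributes.
dm = np.array([[1, 0, 1],
               [0, 1, 1],
               [1, 1, 0],
               [1, 1, 1]])          # dominated by every other row -> superfluous
bm = remove_superfluous_rows(dm)     # keeps the first three rows only
print(is_super_reduct(bm, [0, 1]))   # True: attributes {0, 1} cover every kept row
print(is_super_reduct(bm, [2]))      # False: the third kept row has a 0 in attribute 2
```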

2.2 Neural Networks

Feed-forward back-propagation neural networks have gained their reputation through their high usage rate over time [25]. They are present in several fields, such as image recognition and prediction [26,27], medicine [28], chemistry [29], the oil and gas industry [30], and water level prediction [31]. Their impact and widespread usage make them the best candidate for exploiting their properties for feature selection. The theoretical background is built on the concept of neurons: each neuron is the composition of a weighted sum of its inputs, a bias and an activation function. These neurons are arranged in layers and their connections are known as weights. The first layer is the input layer and the last one is known as the output layer; all the layers in between are called hidden layers [32]. The back-propagation algorithm propagates the output error from the output layer through the hidden layers back to the input layer, so that the connections between the neurons can be recurrently recalculated during training in order to minimize the loss function in each training iteration; with enough data and training we are thus able to classify and predict [33]. Accuracy is defined as the number of correct predictions divided by the total number of predictions, and its value lies in the interval [0, 1].

2.3 Evolutionary Strategy

In an evolutionary strategy, the main idea follows this behaviour: from a population of individuals placed in an environment with limited resources, a competition for those resources is performed, so that survival of the fittest, as in natural selection, plays its role. From generation to generation the fitness of the population is increased. Given a metric with which to evaluate the quality of an individual, it is treated as a function to be maximised. The initial population can be initialized randomly in the domain of this function. After that, we apply the metric as an abstract way of measuring how fit an individual is, where a higher value means better. We must ensure that only some of the better candidates are chosen to seed the next generation. This is performed by applying mutation and/or recombination to them. Mutation is applied to an individual by altering some of its attributes, resulting in a new individual. Recombination is performed on two or more selected individuals, called parents, producing one or more new individuals, called children. By executing these operations on the parents we end up with the next generation, called the offspring. This new generation retains at least all the best of the previous one, and the variation operators are the way to increase fitness in further generations. This process has a stop criterion, so that the creation of new generations is iterated until an individual that satisfies the metric, at a defined level, is found or a computational iteration limit is reached [34,35].

2.4 MNIST Data Set

For a real testing purpose, a widely used data set in machine learning is required. MNIST is a data set of handwritten digits which has been used for several classification and prediction models, and all their results are officially reported. Handwriting recognition is a difficult task, as mentioned, due to the high number of attributes (pixels in the case of images), and what makes this data set especially difficult is the huge number of images for training (60,000) and testing (10,000). These are centered grayscale images of 28 × 28 pixels [36].
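As a quick orientation, the data set can be loaded directly with Keras (which is also used later in this work); flattening each image to 784 values matches the pixel-per-attribute view used throughout the paper. The variable names below are ours.

```python
from tensorflow import keras
import numpy as np

# 60,000 training and 10,000 test grayscale images of 28 x 28 pixels,
# flattened so that each of the 784 pixels becomes one attribute.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(len(x_train), -1)   # shape (60000, 784)
x_test = x_test.reshape(len(x_test), -1)      # shape (10000, 784)
print(x_train.shape, x_test.shape, np.unique(y_train))  # classes 0..9
```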

2.5 Proposed Method

In this section we introduce the method proposed for reaching a subset of attributes, close enough to minimal, that is able to distinguish between elements of different classes. For this purpose we divide the algorithm into two stages. The first stage is to find a subset of reducts. As dealing with the entire set of 60,000 training images is impossible due to the memory and time complexity of this NP-hard problem, we decided to randomly select a sample of 10 objects of each class. Since in the MNIST data set each pixel is in the range [0, 255], we set the threshold to 100, so that every pixel with a value higher than 100 was set to 1 and everything else to 0 (a small sketch of this binarization and pair-comparison step is given after Fig. 1). This number of elements and this threshold were chosen by experimentation, as
the minReduct algorithm performed better for the basic matrix generated from these objects. The second step is to sort in lexicographical order, as detailed in minReduct, but with the difference that we move the columns so as to place the ones in each row as close to the left as possible, forming an upper-triangular representation. Note that this matrix is not necessarily square, so we only aim to get as close to this triangular representation as we are able. All these column changes must be stored in order to translate back to the original indexes after the algorithm finishes. This is shown in Fig. 1, where the top indexes are the original ones.

Fig. 1. Reduct list search performed over the rearranged basic matrix
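The binarization and pair-comparison step described above can be sketched as follows (our own simplified version; the exact pair encoding used by the authors may differ). Superfluous rows would then be removed with a dominance check such as the one shown in Sect. 2.1.

```python
import numpy as np

def build_discernibility_matrix(images, labels, threshold=100):
    """Binarize the sampled images (pixel > threshold -> 1) and build the binary
    discernibility matrix: one row per pair of objects from different classes,
    with a 1 wherever the two binarized images differ in a pixel."""
    binary = (images.reshape(len(images), -1) > threshold).astype(np.uint8)
    rows = []
    for i in range(len(binary)):
        for j in range(i + 1, len(binary)):
            if labels[i] != labels[j]:
                rows.append(binary[i] ^ binary[j])  # 1 where the pair is discerned
    return np.array(rows)

# Example with the 10-objects-per-class sample drawn from MNIST:
# dm = build_discernibility_matrix(sample_images, sample_labels)
# bm = remove_superfluous_rows(dm)   # basic matrix fed to the modified minReduct
```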

The third step is to choose the desired number of reducts to find; this number is required because it represents the number of individuals in a population when the evolution is performed. A variation of minReduct is used to do this: we first search for a maximum length limit and, once it is found, every time a super-reduct is found we evaluate Proposition 1 to decide whether it is a reduct or not. Once the desired number of reducts has been found, we are able to proceed to the next stage of the algorithm, where neural networks and the evolutionary strategy are combined. Figure 2 shows the relation between the reducts found and the neural networks. For the evolutionary strategy, as the initial population is the list of reducts found, we make sure to mutate each candidate once to allow enough variation to begin with. Then we save the best candidates and use the Univariate Marginal Distribution Algorithm (UMDA) [37] for next-generation creation. The fitness function is defined as the accuracy of the feed-forward back-propagation neural network model. To ensure that we reach a subset of attributes close to minimal, we only perform mutation at a 0.5% rate. Finally, we must define a prediction-accuracy threshold as the stop criterion, or a maximum number of generations, so that the algorithm is able to finish.
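The loop just described can be summarised in the following sketch (an illustration under our own assumptions about population handling, not the authors' implementation); `fitness` stands for the neural-network accuracy of Sect. 2.6, and the initial population is built from the binary masks of the reducts found by the modified minReduct.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 784            # MNIST pixels
MUTATION_RATE = 0.005       # 0.5%, as stated in the text

def mutate(individual):
    """Flip roughly 0.5% of the feature bits of one individual."""
    child = individual.copy()
    child[rng.random(N_FEATURES) < MUTATION_RATE] ^= 1
    return child

def next_generation(parents, n_children):
    """UMDA-style sampling: estimate the marginal probability of each feature
    among the selected parents, sample new individuals from it, then mutate."""
    probs = parents.mean(axis=0)
    children = (rng.random((n_children, N_FEATURES)) < probs).astype(np.uint8)
    return np.array([mutate(c) for c in children])

def evolve(population, fitness, target_accuracy, max_generations=100):
    """Iterate until an individual reaches the accuracy threshold or the
    generation limit (100 in this work) is exhausted."""
    best, best_score = None, -1.0
    for _ in range(max_generations):
        scores = np.array([fitness(np.flatnonzero(ind)) for ind in population])
        if scores.max() > best_score:
            best, best_score = population[scores.argmax()], scores.max()
        if best_score >= target_accuracy:
            break
        parents = population[np.argsort(scores)[len(population) // 2:]]
        population = np.vstack([parents,
                                next_generation(parents, len(population) - len(parents))])
    return best, best_score
```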

2.6 Experimental Setup

For the sampled subset of features we decided to randomly choose 10 objects of each class. A class is the group of all digit images labeled with the same decision attribute, one class for each digit from 0 to 9.

Fig. 2. Individual relation to neural networks

With the threshold established at 100, we proceed to binarize the matrix of comparisons of each pair of selected elements, removing all the superfluous rows. After this process, the lexicographical ordering is performed and the matrix is rearranged by moving all the ones to the left, searching for an upper-triangular representation. All the indexes are stored so that they can be translated back at the end. Once all this has been performed, the minReduct algorithm is executed, but with some changes. The first step is to search for a maximum length bound. To accomplish this, multiple maximum lengths are evaluated over a period of time; once this period is reached, the minimum found is considered the general maximum length. With this value, minReduct runs again, but every time it finds a super-reduct it evaluates whether it is a reduct, so that it can be appended to the solution list. A maximum number of desired reducts is declared beforehand, so that when the solution list reaches that quantity the algorithm finishes. If not, it searches until the end and returns all of the reducts found. This solution list is translated to the original indexes and mutated at a rate of 0.5% (three random features) in order to create the initial population for the evolutionary strategy. For next-generation creation, it uses the marginal probability in order to keep the key features, and mutation is performed at the same rate in order to grow slowly, adding only the minimum number of features from generation to generation. This ensures that when we reach a solution, the biggest possible variation from another one is at most the mutation rate of 0.5%. For the fitness function, the neural networks play their role. The topology used has an input layer with the same number of neurons as the number of selected features for the current individual to be tested, with relu as the activation function. There are two hidden layers, the first with 52 neurons and the second with 26 neurons, both using the relu activation function. The output layer uses softmax as the activation function and has 10 response neurons, as there are 10 classes in our data set. We also use Sparse Categorical Cross-Entropy as the loss function,
use 10 epochs, and set the batch size equal to 1/5 of the training samples. All definitions are described in [25,38]. The stop criterion is based on the performance over the whole data set when all the attributes are considered: we set our accuracy threshold to the maximum reported accuracy minus 0.04. Once a subset that satisfies this accuracy is found, it is reported as a solution and the algorithm finishes. In case it is not found, a maximum number of generations is declared at the beginning of the evolution; for us this value was set to 100. With the same topology declared previously, the model is trained with some variations when a solution is found. Stratified K-Fold cross-validation is performed with 5 folds over the whole training set, each fold with 20 epochs and the same batch size. To evaluate the performance of the solution, one-vs-all multi-class classification metrics are calculated. To achieve this, over the same declared topology, Stratified K-Fold cross-validation is used for model training with only two folds [39]. In general terms, this neural network model is evaluated by accuracy vs. epochs, loss vs. epochs, and multi-class precision. We also present one-vs-all ROC curves and AUC scores [40], and one-vs-all precision vs. recall metrics [41]. Moreover, the subset of features is evaluated by using it on some of the models declared in the documentation of the MNIST data set. Python, language version 3.7.10, was used to implement all the source code [42]. The Scikit-learn (SKlearn) library [43] and Keras [44] were also used.
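A minimal Keras sketch of the fitness network described above is shown below; the optimizer is not specified in the text, so adam is assumed, and the data-handling details and names are ours.

```python
from tensorflow import keras

def build_fitness_model(n_selected_features: int) -> keras.Model:
    """Topology described in the text: input sized to the candidate feature
    subset, two relu hidden layers (52 and 26 neurons), softmax output for the
    10 digit classes, Sparse Categorical Cross-Entropy loss."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_selected_features,)),
        keras.layers.Dense(52, activation="relu"),
        keras.layers.Dense(26, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",                      # optimizer assumed
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def fitness(selected_cols, x_train, y_train, x_val, y_val) -> float:
    """Validation accuracy of the network trained only on the selected pixel columns."""
    model = build_fitness_model(len(selected_cols))
    xs, xv = x_train[:, selected_cols], x_val[:, selected_cols]
    model.fit(xs, y_train, epochs=10, batch_size=len(xs) // 5, verbose=0)
    return model.evaluate(xv, y_val, verbose=0)[1]
```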

3 Results and Discussion

In this section, the assessment metrics calculated for the proposed model with the selected subset of features are discussed. The initial population had a length of 13 attributes per individual, and a total of 20 individuals per population was used. The subset obtained has a length of 152 attributes, which represents 19.38% of the total 784 pixels per image.

3.1 Performance Evaluation

To evaluate the accuracy and loss metrics, the first step was to reduce the training and test sets by considering only those 152 attributes (columns). With this new data set, the training process was performed, and testing over unknown data (10,000 images) returned the following results. Figure 3 shows how, on each fold, the accuracy of the model increases to values really close to 1. In fact, the reported accuracy for the model was 99.36% on training and 97.45% on validation. In terms of loss, 0.0297 was reported on training and 0.0860 on validation. We can also see in Fig. 4 that the model is not overfitted. It is also clear that training and validation converge after some epochs, in spite of the curves being slightly different at the beginning. This can be interpreted as a good reason to state that the model fits the data and that any variation is not going to be statistically significant.


Fig. 3. Accuracy vs Epochs

To analyze the precision of the predictions, we present the one-vs-all ROC curves with their corresponding area under the curve (AUC). In Fig. 6 we observe an AUC of approximately 1 for every one-vs-all case. This implies that the model has a strong performance in distinguishing between all classes. Hence, we are allowed to conclude that the points chosen by the subset of attributes are able to discern, clearly classify, and predict between all classes, which is also confirmed by Fig. 5, whose interpretation justifies the performance of the classification. Higher precision and recall scores are also related to a better performance of the model. As presented in Fig. 7, on average the precision value is close to 0.99, so the ability of the model to predict each of the classes is confirmed. We would like to emphasize that, following this approach, a reduction of more than 80% of the features is able to keep at least 97% of the accuracy while fully discerning between objects and classes. In comparison with other models, we do not create uncorrelated variables as in PCA, nor do we categorize as in SVM; we use the same data and mainly select the most important features in it. For this reason, the discriminator found can also be used in other models, as we detail in Table 1. Furthermore, with any other model we can still ensure a subset of features close enough to minimal, which is another advantage of using our method.
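For reference, the one-vs-all AUC and the averaged precision and recall discussed here can be computed with scikit-learn along the following lines (a sketch with our own variable names; `y_prob` holds the softmax outputs of the trained model):

```python
from sklearn.metrics import roc_auc_score, precision_recall_fscore_support

def one_vs_all_metrics(y_true, y_prob):
    """One-vs-all AUC plus macro-averaged precision/recall for a 10-class model.
    y_true: integer labels 0-9; y_prob: softmax outputs of shape (n_samples, 10)."""
    auc_ovr = roc_auc_score(y_true, y_prob, multi_class="ovr")
    precision, recall, _, _ = precision_recall_fscore_support(
        y_true, y_prob.argmax(axis=1), average="macro")
    return auc_ovr, precision, recall
```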


Fig. 4. Loss vs Epochs

3.2 State-of-the-Art-Based Comparison

As mentioned, we can evaluate the performance of the model against the error rates reported in the MNIST documentation for other models.

Reported Test Error Rate (%) considering all the features

Replicated Test Error Rate (%) considering all the features

Replicated Test Error Rate (%) considering selected features only

Linear classifier (1-layer NN) [45]

12.00%

12.70%

14.29%

K-nearest-neighbors, Euclidean (L2) [45]

5.00%

3.35%

4.35%

40 PCA + quadratic classifier [45]

3.30%

3.74%

5.36%

SVM, Gaussian Kernel [36]

1.40%

3.34%

3.06%

2-layer NN, 800 HU, Cross-Entropy 1.60% Loss [46]

1.86%

3.73%

3-layer NN, 500+300 HU, softmax, cross entropy, weight decay [36]

1.53%

1.79%

2.59%

This proposal



2.22%

2.55%

From Table 1 we can see that all the models that used the selected attributes preserved their error rates within a small range, no bigger than 2%. As the classifier models differ significantly in their approaches to training and prediction, and the error rates are preserved, we can conclude that the selected features are certainly the most relevant for discrimination purposes over the entire data set.

Fig. 5. Confusion matrix for the model trained with the resultant selected attributes

In terms of computational and time complexity, every classification model becomes less complex and also faster, since fewer operations are performed and less data is used. Even in the case of using a reduction method it performs well: PCA reduces the set of features even further, and the quadratic classifier still keeps its error rate within range. The same applies to SVM, with the remark that it is the model for which the solution subset of features performs even better than using the entire set of attributes. For models that follow the same approach of keeping the most important features, such as backward elimination, using recursive feature elimination makes it computationally impossible to test over the whole data set used here. Furthermore, as it is a regression-based model, it does not ensure that a minimal subset is achieved if some of the parameters are changed to obtain a result. In the case of using Random Forest to obtain feature importances, we cannot be certain where to cut or which features to pick from the result; we tested it, and the results for this model returned a subset of 226 features, which does not compare favourably with our subset of 152 features. This will result in future time efficiency, independent of the model we choose to build from the selected features. Our subset of attributes reported an error rate of only 2.55% for neural network systems while using 19.39% of the total number of attributes. We consider that we have enough evidence to regard this subset as a close-to-minimal preserver of the discernibility capacity of the whole set of attributes.


Fig. 6. ROC Curve One-vs-All

Fig. 7. Precision-vs-Recall One-vs-All

Classification Based on Rough Sets

4

223

Conclusions and Future Work

This paper proposes a new strategy to find a subset of attributes able to preserve the discernibility capacity of the whole set of attributes in a group of classes. Using the theoretical background of Rough Sets, we were able to build an initial population of possible solutions from a sample of the entire data set, which settles the starting point for an intelligent search. The evolutionary strategy made it possible to extend this subset of attributes so that it is useful over the entire data set. Mutation also played the role of a controller, ensuring that we obtain a set of attributes close enough to the minimum. The fitness function was entirely in the field of neural networks, which made it possible to ensure good accuracy levels. Putting all this together, we found a subset of attributes reduced by more than 80% of the total number of attributes. Recalling the assessment metrics and their interpretation, with the subset of features found we can build a model that predicts with an accuracy of over 97%. Moreover, it can discern between all the classes and their objects. Hence, we conclude that the subset of attributes found is able to preserve the discernibility capacity of the whole set of attributes in a group of classes with a close-to-minimal length. Furthermore, experimentation with other models proves that the computational cost of calculating the reducts is worth it, as the subset of features can be extended to other techniques. Additionally, the huge attribute reduction shows that not all the information is required for classifying and predicting. Therefore, as future work, we propose: (1) to use reducts averaged per group of classes in order to use more information from the data set; (2) to increase the number of classes, such as in handwritten alphabet characters, and follow a divide-and-conquer approach to build a set of discriminators that classify between character groups prior to the final classification; (3) to compare with other reduction techniques and with other data sets; and (4) to establish standard parameters for the correct use of the algorithm, such as the density of 1's required in a fundamental matrix for good performance.

References

1. Wang, M., Wu, C., Wang, L., Xiang, D., Huang, X.: A feature selection approach for hyperspectral image based on modified ant lion optimizer. Knowl.-Based Syst. 168, 39–48 (2019)
2. Zhou, H.F., Zhang, Y., Zhang, Y.J., Liu, H.J.: Feature selection based on conditional mutual information: minimum conditional relevance and minimum conditional redundancy. Appl. Intell. 49(3), 883–896 (2018). https://doi.org/10.1007/s10489-018-1305-0
3. Ayesha, S., Hanif, M.K., Talib, R.: Overview and comparative study of dimensionality reduction techniques for high dimensional data. Inf. Fusion 59, 45–58 (2020)
4. Gao, L., Song, J., Liu, X., Shao, J., Liu, J., Shao, J.: Learning in high-dimensional multimedia data: the state of the art. Multimed. Syst. 23(3), 303–313 (2015). https://doi.org/10.1007/s00530-015-0494-1


5. Zebari, R., Abdulazeez, A., Zeebaree, D., Zebari, D., Saeed, J.: A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. J. Appl. Sci. Technol. Trends 1(2), 56–70 (2020) 6. Reddy, G.T., et al.: Analysis of dimensionality reduction techniques on big data. In: IEEE Access, pp. 54776–54788. IEEE (2020) 7. Mafarja, M.M., Mirjalili, S.: Hybrid binary ant lion optimizer with rough set and approximate entropy reducts for feature selection. Soft. Comput. 23(15), 6249– 6265 (2018). https://doi.org/10.1007/s00500-018-3282-y 8. Saxena, A., Saxena, K., Goyal, J.: Hybrid technique based on DBSCAN for selection of improved features for intrusion detection system. In: Rathore, V.S., Worring, M., Mishra, D.K., Joshi, A., Maheshwari, S. (eds.) Emerging Trends in Expert Applications and Security. AISC, vol. 841, pp. 365–377. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-2285-3 43 9. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982) 10. Pawlak, Z.: Classification of objects by means of attributes. In: Polish Academy of Sciences [PAS], Institute of Computer Science (1981) 11. Pawlak, Z. Rough sets: Theoretical aspects of reasoning about data In Springer Science & Business Media., vol. 9 (1991) 12. Jiang, Yu., Yu, Y.: Minimal attribute reduction with rough set based on compactness discernibility information tree. Soft. Comput. 20(6), 2233–2243 (2015). https://doi.org/10.1007/s00500-015-1638-0 13. Jensen, R., Tuson, A., Shen, Q.: Finding rough and fuzzy-rough set reducts with SAT. Inf. Sci. 255, 100–120 (2014) 14. Prasad, P.S., Rao, C.R.: IQuickReduct: an improvement to Quick Reduct algorithm. In: International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing, pp. 152–159, December 2009 15. Chen, Y., Zhu, Q., Xu, H.: Finding rough set reducts with fish swarm algorithm. Knowl.-Based Syst. 81, 22–29 (2015) 16. Chen, Y., Miao, D., Wang, R.: A rough set approach to feature selection based on ant colony optimization. Pattern Recogn. Lett. 31(3), 152–159 (2010) 17. Rodr´ıguez-Diez, V., Mart´ınez-Trinidad, J.F., Carrasco-Ochoa, J.A., Lazo-Cort´es, M.S., Olvera-L´ opez, J.A.: MinReduct: a new algorithm for computing the shortest reducts. Pattern Recogn. Lett. 138, 177–184 (2020) 18. Roweis, S.: EM algorithms for PCA and SPCA In EM algorithms for PCA and SPCA, pp. 626–632 (1998) 19. Park, C.H., Park, H.: A comparison of generalized linear discriminant analysis algorithms. Pattern Recogn. 41(3), 1983–1997 (2008) 20. Baudat, G., Anouar, F.: Generalized discriminant analysis using a kernel approach. Neural Comput. 12(10), 2385–2404 (2000) 21. Maulidina, F., Rustam, Z., Hartini, S., Wibowo, V, Wirasati, I., Sadewo, W.: Feature optimization using Backward Elimination and Support Vector Machines (SVM) algorithm for diabetes classification In Journal of Physics: Conference Series (2021) 22. Zhang, H., Zhou, J., Jahed, D., Tahir, M., and Pham, B., Huynh, V.: A combination of feature selection and random forest techniques to solve a problem related to blast-induced ground vibration. Applied Sciences (2020) 23. Lazo-Cortes, M., Ruiz-Shulcloper, J., Alba-Cabrera, E.: An overview of the concept of testor. Pattern Recogn. J. 34(4), 753–762 (2000) 24. Yao, Y., Zhao, Y.: Discernibility matrix simplification for constructing attribute reducts. Pattern Recogn. J. 179(7), 867–882 (2009)


25. Haykin, S.: Neural Networks - A Comprehensive Foundation (2008) 26. Dosovitskiy, A., et al.: An image is worth 16 × 16 words: Transformers for image recognition at scale. In arXiv preprint arXiv:2010.11929 (2020) 27. Weytjens, H., Lohmann, E., Kleinsteuber, M.: Cash flow prediction: Mlp and lstm compared to arima and prophet. Electron. Commerce Res. 21(2), 371–391 (2021) 28. Kumar, S.A., Kumar, A., Dutt, V., Agrawal, R.: Multi model implementation on general medicine prediction with quantum neural networks. In: 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks, pp. 1391–1395 (2021) 29. Abdi-Khanghah, M., Bemani, A., Naserzadeh, Z., Zhang, Z.: Prediction of solubility of n-alkanes in supercritical co2 using rbf-ann and mlp-ann. J. CO2 Utilization, 25, 108–119 (2018) 30. Orru, P.F., Zoccheddu, A., Sassu, L., Mattia, C., Cozza, R., Arena, S.: Machine learning approach using mlp and svm algorithms for the fault prediction of a centrifugal pump in the oil and gas industry. Sustainability 12(11), 4776 (2020) 31. Ghorbani, M.A., Deo, R.C., Karimi, V., Yaseen, Z.M., Terzi, O.: Implementation of a hybrid mlp-ffa model for water level prediction of lake egirdir, turkey. Stochastic Environ. Res. Risk Assessment 32(6), 1683–1697 (2018) 32. Luo, X.J., et al.: Genetic algorithm-determined deep feedforward neural network architecture for predicting electricity consumption in real buildings. In: Energy and AI (2020) 33. Svozil, D., Kvasnicka, V., Pospichal, J.: Introduction to multi-layer feed-forward neural networks. In Chemometrics and Intelligent Laboratory Systems 39(1), 43– 62 (1997) 34. Eiben, A.E., Smith, J.E.: What is an evolutionary algorithm? In: Introduction to Evolutionary Computing, pp. 25–48 (2015) 35. Oliver, K.: Genetic algorithms In: Genetic algorithm essentials, pp. 11–19 (2017) 36. LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010) 37. Alba-Cabrera, E., Santana, R., Ochoa-Rodriguez, A., Lazo-Cortes, M.: Finding typical testors by using an evolutionary strategy. In: Proceedings of the 5th Ibeory American Symposium on Pattern Recognition, p. 267 (2000) 38. Liu, W., Wen, Y., Yu, Z., Yang, M.: Large-margin softmax loss for convolutional neural networks. 2(3), 7 (2016) 39. Ramezan, C., Warner, T., Maxwell, E.: Evaluation of sampling and cross-validation tuning strategies for regional-scale machine learning classification. Remote Sens. 11(2), 185 (2019) 40. Wandishin, M.S., Mullen, S.J.: Multiclass roc analysis. Weather Forecast. 24(2), 530–547 (2009) 41. Powers, D.M.: Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061 (2020) 42. Van, R.G., Drake, F.: Python 3 Reference Manual. CreateSpace, Scotts Valley, CA (2009) 43. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011) 44. Chollet, F., et al.: Keras (2015). https://github.com/fchollet/keras 45. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998) 46. Simard, P.Y., Steinkraus, D., Platt, J.C.: Best practices for convolutional neural networks applied to visual document analysis. In: Icdar, volume 3. Citeseer (2003)

An Unexpected Connection from Our Personalized Medicine Approach to Bipolar Depression Forecasting

Milena B. Čukić1,2(B), Pavel Llamocca3, and Victoria Lopez4

1 Empa Swiss Federal Institute for Material Science and Textiles, Dübendorf, Switzerland

[email protected]

2 General Physiology With Biophysics, University of Belgrade, Belgrade, Republic of Serbia
3 Complutense University of Madrid, Madrid, Spain
4 Quantitative Methods, Cunef University, Madrid, Spain

Abstract. As one of the most complicated and recurrent depressive disorders, bipolar depression holds the highest morbidity and a high mortality risk, but effective early detection and appropriately targeted treatments are still missing. This requires a new, innovative approach, one capable of forecasting mood states, in particular the manic one. In our recent work, we combined several data sources to extract the most relevant variables, describe their intrinsic dynamics by network-flow analysis, and apply several supervised machine learning models to predict mania in BDD. By applying several methods of extracting and selecting features from those aggregated data, and subsequently performing supervised machine learning, we arrived at a real personalized medicine approach to BDD forecasting. Here we interpret previously unpublished data on sleep-related variables and their possible relation with irritability, which was the most promising variable from the daily self-report data. By putting this connection into the perspective of other recent neuroimaging and biochemical findings, we elucidate another important factor, namely the reason why some antidepressants shown to disrupt sleep dynamics can exacerbate the tipping point to mania, via the already mentioned link between sleep-related variables and irritability, which our research demonstrated to have the most valuable predictive power. Keywords: Bipolar depression · Mood disorders · Sleep-related variables · Forecasting · Personalized medicine

1 Introduction

Bipolar depressive disorder (BDD) is a recurrent and complex mental disease which is considered to be one of the major contributors to worldwide work disability [1]. This disorder can produce everything from depression to mania (depression in major, dysthymic or mixed forms), holds the most prominent suicide risk among mental disorders, usually starts in young adulthood, and is characterized by a high mortality risk [2–4]. The most important aspect of the treatment is medication, but BDD is usually misdiagnosed and treated as unipolar depression, on average for 8 years [5, 6].


This is the reason why this mental disorder, with its very complex intrinsic dynamics, needs new approaches to early and accurate detection, and this much-sought solution is probably coming from computational psychiatry. In our recent work we aimed to mathematically describe the dynamics of, and relationships between, the different phases, or states, of the disease by utilizing network-flow analysis [7]. In this study, the variable "diagnosis" takes one of these 5 states (Depressed, Hypomanic, Manic, Mixed or Euthymic). By combining different sources we gathered data from conventional clinical interviews, daily self-report via a mobile application, and recordings of smartwatches and actigraphs, and after careful selection of many variables (several methods of feature extraction and feature selection were used) we applied four supervised machine learning models: Random Forest, Decision Trees, Logistic Regression and Support Vector Machines. The final goal of our work was to discern which variables are most promising in predicting mania, which has the most important practical implications [7, 8]. That work was a natural extension of our previous research on several manners of data collection and various possibilities of portable monitoring and measurement, as well as various mobile applications used for self-report [9, 10]. So, while we strived for reliable and relatable variables, we applied a combination of the previously successful pipeline that we used in another project on depression detection. Our aim to isolate the most relevant variables for mania prediction in BDD resulted in pinpointing irritability from the daily self-reports and duration of sleep as the most promising [7]. Due to the nature of our previous publication and limitations of space, we did not fully discuss one group of variables that turned out to be in line with research results coming from different disciplines dealing with the mechanisms and dynamics of bipolar depression: sleep-related variables. Here we offer previously unpublished data, belonging to our recently performed research on BDD, with additional interpretations and comparisons with other findings that connect those two variables that showed to be the most prominent in our work on BDD so far. In this manuscript we aim to elucidate the connection between the above-mentioned variables and to put them into a framework of the overall effectiveness of BDD treatment. This manuscript proceeds with Sect. 2 describing our Method, Sect. 3 reporting our Results, and finally Sect. 4 containing the Discussion and the last section, the Conclusions.
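Purely as an illustration of the supervised stage just described, a comparison of the four model families could be set up with scikit-learn along the following lines; the feature layout, hyperparameters and binary mania label are our assumptions, not the Bip4Cast implementation.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# X: one row per patient-day of aggregated features (self-reported irritability,
# sleep-related variables from actigraphy/smartwatch, clinical scores, ...);
# y: 1 when the labelled state is Manic, 0 otherwise.
MODELS = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="rbf"),
}

def compare_models(X, y, cv=5):
    """Cross-validated accuracy for each of the four supervised models."""
    return {name: cross_val_score(clf, X, y, cv=cv).mean()
            for name, clf in MODELS.items()}
```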

2 Methods In our project Bip4Cast we collected the data from different sources and combined them to discern previously unknown connections and relations between them. Our sample was comprised of 17 BDD patients, who were in the program of treatment in a hospital (Nuestra Senora de la Paz, Madrid) with whom our University has a collaboration. After acquiring approval from the Local Ethics Committee, all the patients were interviewed and signed the informed consent to participate in this study. We used the data from the conventional clinical interviews, daily self-report via the mobile application, recordings of smartwatches and actigraphs, and after careful selection of many variables (several methods of feature extraction and feature selection were used) we applied, at the first stage, four supervised machine learning models: Random Forest, Decision Trees, Logistic Regression and Support Vector Machines algorithms. The results of this research are



published in Llamocca et al., 2019 [10]. Further on a personalized model was developed by the same group of researchers in Llamoca et al., 2021 [7] and Portela et al. 2021 [27]. In the last mentioned paper, the authors show how the emotional state of an individual p can be modelled as a function mp (t) on-time t for a specific patient p. The behaviour of this function fits well with alterations in the behaviour of the patient, while a regular behaviour of the patient is associated with a bounded behaviour of the function. Thus, mp (t) >> mp (t + ε) on a specific time t for a small increase of time t + ε, indicates a rapid decline to a depressed state, maybe due to a sudden adverse event happening. On the contrary, mp (t) T achiz_prel = 39731.2 µs. It is presented, in real-time, the parallelism of the acquisition and processing by the fact that the shock is acquired simultaneously on the three channels. Case 2. Processing at the Microprocessor Level RT In the scenario of case 2, the vibration signals were acquired simultaneously on the three reconfigurable channels in the FPGA circuit. They were consequently transmitted by communication (DMA channel) to the RT processor in the Compact RIO system. At this processor level, the power spectrum calculation was performed. The graphical display of the acquired vibration signals and the realized power spectra are displayed in the Debugger mode of the application at the RT processor level. The frames of 4096 samples taken from the vibration signals acquired on the three channels (Ox, Oy, and Oz) are displayed. In Fig. 20, an experiment is presented, in which the three vibration transducers are hit by a flexible rod connected to the shaft on which the bearings are mounted. At the acquisition speed of 102.4kS/sec, 16000 samples were taken simultaneously for each of the three vibration signals, which also contain these shocks. It can be observed that the



distance in time between 2 shock-type pulses is the same on each of the three acquisition channels and the number of samples between 2 shocks is 6400.

Fig. 20. Example of engine revolution analysis.

A period of shock occurrence was obtained, T2 = 6400 samples × Te = 6400 × 9.7 µs = 0.06208 s, which corresponds to the motor speed. Thus we obtain a frequency of occurrence of shocks f = 1/T2 = 1/0.06208 s = 16.11 rot/s, which corresponds to an engine speed nmotor = 966 rot/min. According to the ratio between the diameters (1/2) of the axle belts, the rotational speed generated by the motor, in this case, is about 1932 rpm. In Fig. 21, the FPGA circuit-level application is presented for the acquisition of a vibration signal produced by a metal rail. In this situation, it is possible to observe very well the synchronization of the “shock” signal on the three acquired channels.

Fig. 21. The interface of real-time parallel acquisition and processing software at the FPGA circuit for real-time analysis of “shock” type vibration signals.
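To make the speed estimate above easy to reproduce, the following is a minimal sketch of the same computation. The nominal 102.4 kS/s rate and the 1/2 belt-diameter ratio are taken from the worked example; the function name motor_rpm and everything else are illustrative assumptions rather than part of the published application.

```python
# Estimate motor speed from the sample distance between successive shocks,
# following the worked example above (102.4 kS/s acquisition, 1/2 belt ratio).
SAMPLE_RATE = 102.4e3          # samples per second
BELT_DIAMETER_RATIO = 2        # motor turns twice as fast as the monitored shaft

def motor_rpm(samples_between_shocks: int) -> float:
    period_s = samples_between_shocks / SAMPLE_RATE       # time between shocks
    shaft_rev_per_s = 1.0 / period_s                       # shock frequency = shaft speed
    return shaft_rev_per_s * 60.0 * BELT_DIAMETER_RATIO    # motor speed in rpm

print(motor_rpm(6400))   # 1920 rpm with the nominal rate; Te = 9.7 us gives ~1932 rpm
```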



Case 3. FFT Processing at the PC Level The PC-level application was also developed in the LabVIEW graphical programming environment. Any development environment with TCP/IP communication capabilities can be used at this level. It is possible to observe the simultaneous acquisition of the three vibration signals by the fact that a “shock” pulse is synchronously displayed on each vibration signal. The total processing time increases to 86301 µs and the notion of off-line processing with sample loss is confirmed.

5 Future Work The ongoing development of software, with FPGA applications, for image processing [18] and signal processing leads to a series of further development proposals. The directions that can be deepened in the future are the following: • Development of the system proposed in this research for the determination of vibration errors in motors or other mechanical means and with predictive maintenance for motors based on vibration analysis such in [19]. • Scaling the example of signal processing on a new system with higher computing power Compact Rio 9039, without problems generated by memory resources; this system can acquire and process signals from 24 distinct points.

6 Conclusions In this paper, a vibration acquisition and analysis system was proposed. The idea started from the problem of acquiring signals with high-speed dynamics, such as the vibration signal at the bearings of an engine. Another area of applicability can be considered the turbine bearings field, presented in [20]. The acquisition system requires a very close placement to the mechanical system because an analog signal line can be disturbed and thus information is lost. The system developed by the National Instruments company has been used, which offers the possibility of processing at the source with the help of the FPGA circuit incorporated in the Compact Rio 9024 model. This system was also used in [21] for the process of monitoring a baking oven. One of the main problems is that the number of digital data (samples) to be transmitted is very large, and the communication to a computer system is made much slower. The main objective of the proposed system is to process the Fourier transform for the acquired signal as quickly as possible and to transmit the information in a high-speed regime. It was proposed a system of the parallel acquisition on three channels of vibration signals and their processing in real-time. Three distinct cases of processing at the level of the Compact Rio 9024 were presented: • Processing at the FPGA level. • Processing at RT level. • Processing at the PC level.



For these three cases, software solutions have been implemented in the LabVIEW development environment, with a focus on parallelism and processing speed. The obtained results confirm that the processing of the fast Fourier transform at the FPGA level is the most efficient in terms of working time. This solution offers real-time, nearsource processing benefits and can be considered a local system capable of transmitting processed information to a higher level. Another innovative factor is the acquisition of signals on three independent channels, and the parallel processing of these signals. Acknowledgment. This work was supported by the grant POCU/380/6/13/123990, co-financed by the European Social Fund within the Sectorial Operational Program Human Capital 2014–2020.

References 1. Verma, A., et al.: Edge-cloud computing performance benchmarking for IoT based machinery vibration monitoring. Manufacturing Letters 27, 39–41 (2021) 2. Yang, H., Kumara, S., Bukkapatnam, S.T.S., Tsung, F.: The internet of things for smart manufacturing: a review. IISE Trans 51(11), 1190–1216 (2019) 3. Swartzlander, E.E., Hani, H.M.: Saleh, “FFT implementation with fused floating- point operations.” IEEE Trans. Comput. 61(2), 284–288 (2012) 4. Popescu, I.M., Popa, B., Prejbeanu. R.: Technology based on FPGA circuits and simultaneous processing of signals with great dynamic over time. Annals of the University of Craiova, Series: Automation, Computers, Electronics and Mechatronic 14(1), 25–30 (2017) 5. FPGA Architecture Presentation: https://www.slideshare.net/omutukuda/presentation-199 3175. Omesh Mutukuda (Masc. Candidate) 6. Wen, Z., Zhongpan, Q., Zhijun, S.: FPGA Implementation of efficient FFT algorithm based on complex sequence. In: 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, pp. 614–617. https://doi.org/10.1109/ICICISYS.2010.5658418 7. Yu-Heng, G.L., Chien-In, H.C.: Dynamic kernel function FFT with variable truncation scheme for wideband coarse frequency detection. In: IEEE Transactions on Instrumentation and Measurement 58(5), pp. 1555–1562 (May 2009) 8. http://www.ni.com/compactrio/. National Instruments Web Page 9. Bre´nkacz, Ł., et al.: Research and applications of active bearings: a state-of-the-art review. Mechanical Systems and Signal Processing 151, 107423 (2021) 10. Vanmathi, K., Sekar, K., Ramachandran, R.: FPGA implementation of fast fourier transform. International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) 2014, 1–5 (2014). https://doi.org/10.1109/ICGCCEE.2014.6922467 11. Wang, P., McAllister, J., Wu, Y.: Soft-core stream processing on FPGA: An FFT case study. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2756– 2760 (2013). https://doi.org/10.1109/ICASSP.2013.6638158 12. Wang, B., Zhang, Q., Ao, T., Huang, M.: Design of pipelined FFT processor based on FPGA. Second International Conference on Computer Modeling and Simulation 2010, 432–435 (2010). https://doi.org/10.1109/ICCMS.2010.112 13. Hassan, S.L.M., Sulaiman, N., Halim, I.S.A.: Low power pipelined FFT processor architecture on FPGA. In: 2018 9th IEEE Control and System Graduate Research Colloquium (ICSGRC), pp. 31–34 (2018). https://doi.org/10.1109/ICSGRC.2018.8657583 14. Lau, D., Schneider, A., Ercegovac, M.D., Villasenor, J.: FPGA-based structures for online FFT and DCT. In: Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 310–311 (1999). https://doi.org/10.1109/FPGA.1999.803710



15. Popa, B., Popescu, I.M., Popescu, D., Bobasu, E.: Real-time monitoring system of a closed oven. In: 2018 19th International Carpathian Control Conference (ICCC), pp. 27–32 (2018). https://doi.org/10.1109/CarpathianCC.2018.8399597 16. Information technology for the acquisition: parallel, synchronized and real-time processing of vibration signals, using FPGA (TIAVIB) technology, Innovation checks PN-III-P2-2.1-CI2017-0167, Nr. financing contract: 116Cl/2017, Partnership University of Craiova, SC Vonrep SRL 17. Popescu, I.M., Popa, B., Prejbeanu, R., Cosmin, I.: Evaluation of parallel and real-time processing performance for some vibration signals using FPGA technology. In: 2018 19th International Carpathian Control Conference (ICCC), pp. 365–370 (2018). https://doi.org/10.1109/ CarpathianCC.2018.8399657 18. Zhilu, W., Guanghui, R., Yaqin, Z.: A study on implementing wavelet transform and FFT with FPGA. In: ASICON 2001. 2001 4th International Conference on ASIC Proceedings (Cat. No.01TH8549), pp. 486–489 (2001). https://doi.org/10.1109/ICASIC.2001.982606 19. Novoa, C.G., Berríos, G.A.G., Söderberg, R.A.: Predictive maintenance for motors based on vibration analysis with compact rio. IEEE Central America and Panama Student Conference (CONESCAPAN) 2017, 1–6 (2017). https://doi.org/10.1109/CONESCAPAN.2017.8277603 20. Natili, F., et al.: Multi-scale wind turbine bearings supervision techniques using industrial SCADA and vibration data. Applied Sciences 11(15), 6785 (2021) 21. University of Craiova: Automation department. ADCOSBIO project, Advanced Control Systems for Bioprocesses in the food industry, Project no.: 211/ 2014, Project code:: PN-II-PT-PCCA-2013-4-0544 (2017) http://ace.ucv.ro/adcosbio/index_en.php

Robust Control Design Solution for a Permanent Magnet Synchronous Generator of a Wind Turbine Model

Silvio Simani and Edy Ayala

Department of Engineering, University of Ferrara, Ferrara, Italy
Universidad Politecnica Salesiana, Calle Vieja 12-30 y Elia Liut, Cuenca, Ecuador
http://www.silviosimani.it

Abstract. The paper addresses the development of a perturb and observe algorithm implemented for maximum power point tracking control of a permanent magnet synchronous generator. It is shown that this algorithm tracks the optimum operation point and provides fast response even in the presence of faults. The strategy implements the tracking algorithm by using real-time measurements, while providing maximum power to the grid without using online data training. The solution is simulated in Matlab and Simulink to verify the effectiveness of the proposed approach when fault-free and faulty conditions are considered. The simulation results highlight the efficient, intrinsic and passive fault tolerant performance of the algorithm for electric generators and converters with low inertia.
Keywords: Robust control design · Passive fault tolerant control · Maximum power point tracking · Perturb and observe algorithm · Permanent magnet synchronous generator · Wind turbine model

1 Introduction

In the era of energy shortage, the development of non-fossil energy has become increasingly close. Among them, renewable energy represented by solar energy and wind energy has eased the trend of global energy tension [6,14]. On the earth, wind energy is everywhere. It is clean, green, pollution-free, and sustainable energy. At present, the conversion of wind power into electric energy is mainly achieved through wind turbines. A wind farm is mainly composed of the site, wind, and wind turbines [4]. The so-called wake is a common wind flow phenomenon in nature, and is characterized by the decrease of the wind speed. After the upstream turbine consumes some wind energy, the downstream turbine absorbs less wind energy [3,9]. The purpose of wind farm layout optimization is to determine the geographical location of the turbine to be installed in the c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023  K. Arai (Ed.): IntelliSys 2022, LNNS 542, pp. 258–270, 2023. https://doi.org/10.1007/978-3-031-16072-1_19



wind farm, and meet the condition that the wind farm can make maximum use of wind energy for power generation. To facilitate the modeling and implementation of WFLO problems, decision variables are often discretized to meet the needs of actual production and life. The classical optimization methods based on the gradient are not easy to be adopted for the discrete WFLO problem because it needs gradient and other information [2]. In recent years, the metaheuristic algorithm has had natural advantages regarding solving the WFLO problem because of its gradient free characteristics [2]. Therefore, many researchers have applied the metaheuristic algorithm to this problem and achieved many research results. In recent years, an increasing number of metaheuristic algorithms have been employed to the WFLO problem. The metaheuristic algorithms possess the characteristics of no gradient information and have natural advantages regarding solving discrete WFLO problems. In recent years, metaheuristic algorithms to solve the WFLO problem have included the biogeographical-based optimization algorithm, adaptive differential evolution algorithm, genetic algorithm, particle swarm optimization, yin-yang pair optimization, evolutive algorithm [5], dynastic optimization algorithm, and other single- objective optimization methods, which have demonstrated strong optimization performance and provided many high- quality solutions for solving WFLO problems. 1.1

Related Literature Review

To handle the multi-objective WFLO problem, [5] presented a multi-objective WFLO model based on the wake effect. Using this model, power generation was improved and the load caused by partial wake overlap was reduced. In [5] designed a multi-objective WFLO problem to maximize the energy output and minimize distance constraint conflicts between wind turbines. Moreover, [5] used a multi-objective algorithm to solve the WFLO problem with two objectives: minimizing land resource use and maximizing energy output. The work made the WFLO problem closer to the actual scenario by modifying the wind constraint conditions, and solved the problem using a multi-objective optimizer. Moreover, it used a multi-objective approach to address the WFLO problem, by minimizing the unit power generation cost and maximizing power generation. The multi-objective lightning search algorithm was used to optimize three objectives, as described. On the other hand, it considered the impact of land occupation, energy output, power infrastructure, and environment, and presented a multi-objective method based on the gradient to solve the WFLO problem, as shown in Fig. 1. The author in [7] regarded energy output maximization and noise impact minimization as optimization objectives, and adopted a multi-objective stochastic optimization method to solve the WFLO problem. It considered solving the WFLO problem under two objectives: maximizing the total cable length and the power generation. To solve the WFLO problem, the above research involved various optimization objectives and multi-objective optimizers, and the performance of various multi-objective optimizers could not be analyzed. To fill this research



Fig. 1. Diagram of the maximum power control scheme.

gap, while under the same objectives, we use nine multi-objective metaheuristic algorithms to compare and analyze the performance of WFLO in various scenarios, to provide the best optimizer to solve this problem, as shown in Fig. 2.

Fig. 2. Diagram of the optimum generated control scheme.

The contributions of our work can be summarized as the following points: First, the basic WFLO problem is divided into three objective optimization problems, so that the benchmark wind farm testing problem is more in line with the complex environmental requirements of wind farms. Secondly, the optimization performance of nine multi-objective optimizers in the WFLO problem is compared and analyzed, and a multi-objective optimizer with superior performance and strong robustness is provided for solving the WFLO problem. Third, an in-depth analysis of the optimization results, success rates, and CPU time of different optimizers are considered in two wind farm scenarios with different wind velocities and wind directions [7]. The considered scheme is sketched in Fig. 3.



Fig. 3. PID regulation scheme for Cp control.

When wind flows through the turbine, the wind velocity decreases and the degree of turbulence increases, which results in the formation of a wake behind the turbine [8]. Because of the wake effect, the wind energy in the downstream area is reduced, and the electricity generated is also reduced. Nowadays the most popular and commonly adopted wake model is the Jensen wake model [8], as shown in Fig. 1. For the calculation of the generating power of each turbine, in this paper, the generating power model of various wind speeds in various directions is considered. The wind direction is divided into several segments, with zero north wind as the starting direction, and increases at a certain angle from east to west. According to the law of energy conservation [13], the power generated by the j-th turbine in unit time when the incident wind speed [13]. For building a wind farm project, the cost problem cannot be ignored. The modeling of the cost problem is as in [12]. When calculating the total annual cost of building the wind farm, only the quantity of wind turbines in the wind farm is considered as a variable. In this paper, the author supposed that the dimensionless cost/year of a generator was 1. If sufficient turbines are installed into the wind farm, the cost of each additional turbine can be reduced by up to 1/3. The mathematical description of the cost model is described in [12].
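Since the wake effect drives the whole layout problem, a minimal sketch of the single-wake Jensen deficit may help to fix ideas. The decay constant k and axial induction factor a below are common illustrative defaults, not values taken from this paper, and jensen_wake_speed is a hypothetical helper name.

```python
# Minimal sketch of the Jensen single-wake model: the wind speed seen by a
# downstream turbine directly behind an upstream one. k and a are illustrative.
def jensen_wake_speed(u0: float, x: float, rotor_radius: float,
                      k: float = 0.075, a: float = 1.0 / 3.0) -> float:
    """Wind speed (m/s) at distance x (m) behind the rotor, free stream u0 (m/s)."""
    if x <= 0:
        return u0                                   # no wake upstream of the rotor
    deficit = 2.0 * a / (1.0 + k * x / rotor_radius) ** 2
    return u0 * (1.0 - deficit)

# Example: 10.8 m/s free stream, 20 m rotor radius, turbine 500 m downstream.
print(jensen_wake_speed(10.8, 500.0, 20.0))         # ~9.9 m/s, a noticeable deficit
```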

2 Wind Turbine Model

There is a great number of literature on the WFLO problem with constant wind speed and constant wind direction. This scenario does not exist in the real world; hence, it is an ideal test environment. In this paper, two wind probability distribution datasets are adopted. The wind farms in Data 1 are from [15]. Their wind probability distributions are shown in Fig. 3. Here the wind direction is divided clockwise into 16 directions, with the north wind of zero and the increasing angle of 22.5. The wind speed is divided into four speeds: 14.4 m/s, 10.8 m/s, 8.2 m/s, and ”. The partial cedents on the left of the symbol represent what the patient currently has, and the partial cedent on the right of the symbol represents the proposed change. • The difference in confidence (DConf ) - represents the difference in confidence between the original treatment and the proposed treatment. It is interpreted as a predicted improvement in patient expressed in percentage points. eTRT KB Treatment Rule Syntax. The treatment rules are then used by eTRT’s rule engine to help clinicians modify a patient’s treatment. Figure 6 shows a treatment rule that was encoded using the extracted action rule seen in Fig. 5.

Fig. 6. Sample treatment rule encoded into eTRT using jess syntax.

Syntactically, they are similar to the diagnosis rules, with the only difference being the treatment. Instead of creating a Diagnosis object, the treatment rule creates a Treatment object, which is comprised of three parts: the proposed change in treatment, the difference in confidence, and an explanation for why the treatment was recommended.
2.3 Knowledge Translating Procedure
The knowledge translating procedure begins with reading the output files from the data mining software. The second step is parsing and interpreting them, adding the explanation component, and encoding them into the eTRT knowledge base using JESS syntax. The pseudocode for knowledge translating is depicted in Fig. 7.



Knowledge Translator Procedure
1. Read a rule.
2. Extract confidence, category, and other components from the rule.
3. Split the rule's hypothesis into partial cedents.
4. Parse each partial cedent and create an object representing the cedent.
5. Develop an explanation for each partial cedent.
6. Create a rule object containing the cedent objects and explanations.
7. Encode that rule object to a file in the eTRT KB.

Fig. 7. Steps in the knowledge translator procedure.

Reading a Rule. Each rule is read one by one from the data mining software output file. Each rule is identified by an ordering number followed by the hypothesis (see Figs. 3 and 5). A hypothesis starts with a partial cedent, which starts with an attribute, which starts with a letter. Parse Rule's Components. The next step parses a read rule and extracts the confidence, support, or DConf numbers, the hypothesis, and the conclusion of the rule. Parse Hypothesis into Partial Cedents. In the next step the hypothesis is split into partial cedents, which are separated by an “&” symbol. In action rules, the flexible and stable parts additionally have to be extracted - the hypothesis uses a “:” symbol to separate those. The flexible part is further split into a before section (representing the current treatment) and an after section (representing the proposed changes), as delimited by the “->” symbol. Parse Each Partial Cedent. In this step, each partial cedent is parsed. For diagnosis rules, a list of objects representing partial cedents is created. For treatment rules, two lists are created: one list contains the cedents from the stable part of the hypothesis and the other one contains the cedents parsed from the proposed treatment section. Develop Explanations for Each Partial Cedent. After parsing the partial cedents, the explainability feature of the eTRT system is developed. The natural language-encoded explanation is based on the attribute name and the partial cedent's condition. Create a Rule Object. A rule object represents a rule and stores the partial cedent objects, the relevant numbers (category, confidence, and support for diagnosis rules, DConf for treatment), and it also stores the explanations generated from the cedents. Encode a Rule. The rule encoding first checks if the rule's confidence and support meet the threshold specified by the user. Then the encoder checks if the rule already exists in the knowledge base to avoid repetitive rules. A separate knowledge base is maintained for the diagnosis rules and a separate one for the treatment rules.
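The cedent-splitting steps above can be illustrated with a short sketch. The rule string, attribute names and the split_hypothesis helper are illustrative assumptions; real LISp-Miner output carries additional fields (confidence, support, DConf) handled by the full translator.

```python
# Minimal sketch of splitting an action-rule hypothesis into stable cedents and
# the flexible (current -> proposed) part, using the "&", ":" and "->" delimiters
# described above. The example rule and attribute names are made up.
def split_hypothesis(hypothesis: str):
    stable_part, _, flexible_part = hypothesis.partition(":")
    stable = [c.strip() for c in stable_part.split("&") if c.strip()]
    before, _, after = flexible_part.partition("->")
    current = [c.strip() for c in before.split("&") if c.strip()]
    proposed = [c.strip() for c in after.split("&") if c.strip()]
    return stable, current, proposed

rule = "Sex(male) & Age(50-60) : Sound_generator(no) -> Sound_generator(yes)"
print(split_hypothesis(rule))
# (['Sex(male)', 'Age(50-60)'], ['Sound_generator(no)'], ['Sound_generator(yes)'])
```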



3 Experiments and Results
Experiments were conducted to test the simplicity, efficiency, and scalability of rule encoding by the algorithm described in Sect. 2. Within this section, the testing procedure is described, and test results are presented and discussed.
3.1 Graphical User Interface
The Graphical User Interface (GUI) was developed to access various functions of eTRT, such as Patient data, Visit data, and Analytics [4]. The Knowledge Updater is available through the user interface, and the corresponding frame is depicted in Fig. 8.

Fig. 8. The main frame of eTRT’s graphical user interface and a frame for updating the eTRT’s knowledge base.

A user can update separately the diagnosis rules, the treatment rules, or both. Users have the option to change the minimum confidence, support, and change in confidence values, which by default accept any values. Users can also change the path to the folder where the association and action rules produced by LISp-Miner are stored.
3.2 Results
The Knowledge Translator was tested within eTRT for efficiency and scalability. The runtimes of various steps of the Knowledge Translator are depicted in Table 1.



The encoding step encompasses all operations from reading the files with rules to writing the rules into the KB. Parsing, in this case, only refers to parsing the rule and creating an object in memory, but it does not include any I/O operations. The tests were run using 2,192 diagnosis rules and 1,348 treatment rules. An average from running one test 5 times is presented in Table 1.

Table 1. Runtimes for encoding and parsing diagnosis rules (total of 2,192) and action rules (total of 1,348).

Rule type       | Total encoding time | Total parsing time | Time to parse a rule
Diagnosis rule  | 0.29 s              | 0.22 s             | 0.098 ms
Treatment rule  | 0.24 s              | 0.13 s             | 0.094 ms

3.3 Discussion
As one can see from the results in Table 1, the developed Knowledge Translator encodes and parses a massive amount of extracted rules in a very short time. This provides an important step towards the future scalability of the eTRT system. When comparing the time to parse a single rule by the Knowledge Translator (less than 0.1 ms) versus the same task performed manually (manual encoding by a human takes at least approx. 2 min to read, interpret and encode a rule in a correct syntax), the time gain is enormous. Additionally, the developed Knowledge Translator encodes the human-understandable explanations, which are critical for clinical use and support the accurate diagnosis of the category of a hearing problem and treatment action recommendations (see Fig. 9).

Fig. 9. The explainability of diagnosis inference and treatment recommendation in eTRT visualized in a graphical user interface.



4 Conclusions Within this work, we have developed a complete and accurate knowledge base (KB) of the eTRT system which is critical for the other metrics eTRT is being evaluated for, such as accuracy and coverage of the predictions for new patient cases. This knowledge base of TRT method and eTRT system support in clinical decision-making are expected to improve healthcare outcomes for tinnitus patients and wider adoption of this niche but effective therapy. The encoded knowledge can be easily interpretable and explained which is especially critical for a potential clinical use. eTRT offers the realization of precision and personalized medicine for tinnitus healthcare, which is currently missing. The system proposes a novel framework for an expert system that utilizes the concept of actionable knowledge mining, and seems promising in a variety of clinical settings, as medical practitioners need actionable knowledge. With the developed Knowledge Translator and Encoder component, the knowledge base of the eTRT system can be updated at any time efficiently. We are not aware of any other software existing that translates knowledge from LISp-Miner to the Java Expert System Shell, and our algorithm can be easily adapted to other applications when knowledge translation and updates are needed. The limitations of this study include the challenges with the medical validation of the extracted and encoded rules, which is not scalable. More extensive testing needs to be conducted with more patient cases to determine the completeness and accuracy of the knowledge base of eTRT. The future work in this research includes expansion of the available tinnitus datasets and more extensive and complex data mining experiments, and consequently, improvement and continuous testing of the knowledge translator on the new data mining results. The graphical user interface needs to be revised and tested by real users to determine the feasibility of using the system in clinical practice.

References 1. Understanding the Facts: American Tinnitus Association. www.ata.org/understanding-facts. Last Accessed 31 Jan 2022 2. Jastreboff, P.J., Jastreboff, M.M.: Tinnitus retraining therapy (TRT) as a method for treatment of tinnitus and hyperacusis patients. J. Am. Acad. Audiol. 11(3), 156–161 (2000) 3. Tarnowska, K.A., Ras, Z.W., Jastreboff, P.J.: Decision Support System for Diagnosis and Treatment of Hearing Disorders The Case For Tinnitus, 2nd. Springer (2017) 4. Tarnowska, K.A., Dispoto, B.C., Conragan, J.: Explainable AI-based clinical decision support system for hearing disorders. In: AMIA Joint Summits on Translational Science proceedings, pp. 595–604. AMIA Joint Summits on Translational Science (2021) 5. Friedman-Hill, E.: Jess The Rule Engine for the Java Platform, pp. 159–161. Sandia National Laboratories (2008) 6. Simunek, M.: LISp-miner control language description of scripting langauge implementation. J. Syst. Integr. 24–44 (2014) 7. Ras, Z.W., Wieczorkowska, A.: Action-rules: how to increase profit of a company. In: Zighed, D.A., Komorowski, J., Zytkow, J. (eds.) Principles of Data Mining and Knowledge Discovery, Proceedings of PKDD’00, Lyon, France, LNAI, No. 1910, pp. 587–592. Springer (2000)

Framework for Modelling of Learning, Acting and Interacting in Systems

Artur Poplawski

NOKIA Kraków Technology Center, Kraków, Poland
AGH University of Science and Technology, Kraków, Poland

Abstract. This work outlines a unified formal approach to modelling and analysis of different ML problems in single-user and multi-user settings. This is done by extending the formal framework of game theory and enriching the standard definition of the game by introducing state (or memory) and an algorithm for the players. Together this creates a formal tool allowing expression of subtle properties of the models and algorithms. Additionally, the proposed definitions try to capture stochastic aspects of both the algorithms and the environment the algorithms operate in, hence making stochasticity internal to the model.
Keywords: Learning · Formalism · Game theory

1 Introduction

Motivations for these considerations come from attempts to understand, model and develop algorithms for self-adaptive and learning wireless networks. Large class of models used there comes from the game theory (see e.g. [8] for gentle introduction to both: game theory and its applications for modelling wireless networks, however in general, literature on the subject is vast). Game definition is well suited to represent the problem in its static aspect - how single interaction looks like. Game theory goes beyond this static view by considering iterated games. From engineering perspective (but definitely also from the perspective of applications of game theory in scientific disciplines as research tool not as an engineering tool) one must however emphasize algorithmic aspect related to learning and utilizing information gathered during playing the game to “solve” the game. This way of thinking resonates well with modern revolution around application of machine learning (ML) and, more generally, artificial intelligence (AI). Indeed this learning aspect is also considered in game theory where it is known as “learning in games” (see e.g. monograph [4] for general overview and [6] or [16] for special context of games in wireless communication). There is a need to better understand class of problems appearing in learning in games and their relation to other domains under umbrella of machine learning (ML). This would c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023  K. Arai (Ed.): IntelliSys 2022, LNNS 542, pp. 281–294, 2023. https://doi.org/10.1007/978-3-031-16072-1_21



allow to accurately express conditions for application of algorithms, comparing models, manipulate them etc. To achieve this it is required to develop unifying formal tools in the toolbox of researchers and engineers. This work tries to address this need by slight extension of the classical game theoretical model to naturally express algorithmic and learning aspect, but also formally deal with issues like uncertainty. Outline of the paper is following. In next chapter we give some standard definitions and notations. Then, after some discussion on motivating heuristics, we will introduce the notion of the learning system which combines game-theoretic model (i.e. interactive aspect of the system) with machine-learning aspect. We will give some examples. We will express it in the stochastic setting based on the so called Giry monad (see [5]) generalizing the Markov chain framework. Next we show how some classical problems of machine learning fit into the framework. We will also argue that game-theoretic perspective brings new insight into the properties of the ML algorithms. In the last section we give brief overview of future perspectives of this line of research.

2 Definitions and Notations

For a function f : S → R and A ⊂ S we define arg max_{s∈A} f = {x ∈ A | ∀s ∈ S : f(s) ≤ f(x)}. When A = S we will write just arg max f. For X, Y being sets, X^Y = {f | f : Y → X}. For a set X and A ⊂ X we denote by χA the characteristic function χA : X → {0, 1} s.t. χA(x) = 1 ⇐⇒ x ∈ A. We do not label this function additionally by X to reduce the number of indices: every time the symbol is used it should be clear from context what the set X is. For Cartesian products of sets indexed by a set P, i.e. ∏_{p∈P} Xp, we denote by πp : ∏_{p∈P} Xp → Xp the projection onto the set Xp.

Assuming that (X, σ) is a set with a σ-algebra, we will denote by P(X) the set of all probability measures on (X, σ). To simplify notation, we will not mention σ when considering P(X). The classical definition of the game given in [17] is the following:

Definition 1. A game Γ is a triple (P, {Sp}p∈P, {up}p∈P) where P is a set of players and each of the Sp is a set of strategies for the given player. For each p ∈ P, up : S = ∏_{p∈P} Sp → R is a payoff function.

3 3.1

Acting and Learning Motivation

In [10], motivated by studies of game-theoretic model of wireless networks we have introduced simple representation of learning agent (or rather player in the specific context of the game modelling). Because the representation was used not



only as a tool for formal analysis but also directly as a basis for the simulation software this representation tries to capture “interface” learning and acting agent needs to expose and provide template of the internal structure. Thus, abstractly, we model internal state of the agent p as some object Mp . One can think about this object - instantiation of which depends on the algorithm - as representing agent’s memory. Gathering experience or “learning” is modelled by the function 

learnMp : Mp × S → Mp, where S = ∏_{p∈P} Sp.

Although the definition of S above comes from the framework of game theory we are staying within, it is clear that in this context S represents the state of the “external world” or environment, as this is constituted by actions of all agents/players. We emphasize the word “external” here as the state of the whole world includes also internal states of agents. The decision of a player is a function of the memory and is modelled by selectp : Mp → Sp. This clearly leads to three related types of dynamical system, understood here as simply a discrete map. First is just an evolution of the whole world, including the internal states of agents:

ev : ∏_{p∈P} Mp × ∏_{p∈P} Sp → ∏_{p∈P} Mp × ∏_{p∈P} Sp

provided as

ev(m, s) = ((learnMp)_{p∈P}(m, s), (selectp)_{p∈P}(m))

One can also restrict the view to evolution disregarding the outcomes of the games, provided as

ev′(m) = (learnMp)_{p∈P}(m, (selectp)_{p∈P}(m))

This is justifiable as long as we consider the trajectory, i.e.:

traj_{m,s} : N → ∏_{p∈P} Mp × ∏_{p∈P} Sp

defined as

traj_{m,s}(0) = (m, s),   traj_{m,s}(n + 1) = ev(traj_{m,s}(n))

and analogously:

traj′_m : N → ∏_{p∈P} Mp



as

traj′_m(0) = m,   traj′_m(n + 1) = ev′(traj′_m(n))

One can easily observe that

πS(traj_{m,s}(n + 1)) = (selectp)_{p∈P}(traj′_m(n + 1))

so essentially both contain the same information.
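A minimal, purely illustrative sketch of the deterministic evolution map ev and a trajectory is given below. The two players, their memory update rules and strategy choices are toy assumptions, not a model from the paper; the point is only the shape of the iteration traj(n + 1) = ev(traj(n)).

```python
# Toy instance of the evolution map ev and a trajectory, following the
# definitions above. Memories are floats and strategies are 0/1 choices.
from typing import Callable, Dict, Tuple

Memory, Strategy = float, int
Profile = Dict[str, Strategy]

players: Dict[str, Tuple[Callable, Callable]] = {
    # each player remembers a running average of the opponent's last actions
    "a": (lambda m, s: 0.9 * m + 0.1 * s["b"], lambda m: int(m > 0.5)),
    "b": (lambda m, s: 0.9 * m + 0.1 * s["a"], lambda m: int(m <= 0.5)),
}

def ev(mem: Dict[str, Memory], prof: Profile):
    new_mem = {p: learn(mem[p], prof) for p, (learn, _) in players.items()}
    new_prof = {p: select(mem[p]) for p, (_, select) in players.items()}
    return new_mem, new_prof

state = ({"a": 0.0, "b": 1.0}, {"a": 0, "b": 1})    # traj(0) = (m, s)
for _ in range(5):
    state = ev(*state)                               # traj(n + 1) = ev(traj(n))
    print(state)
```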

3.2 Stochastic Approach

Some problem with the little framework presented so far comes from the fact that it models game as deterministic entity. However, nature of game modelled often involves randomness that cannot be eliminated without significant violation of accuracy of the model. Also algorithms used to optimize and control behaviour of players are often probabilistic i.e. incorporate some form of randomness as a resource necessary to solve the problem. This is a serious formal issue that shows up on the level of analysis, but also on the level of implementation in strictly functional programming languages (like Haskell). One possible approach, sufficient for programming and sometimes for mathematical analysis is to introduce randomness directly via introduction of random variables into the model (e.g. like in the classical paper [14]). Schematically, in this approach instead of function from some state one considers function from the product of state and random variable. So instead of the function: f :D→C one considers function f :D×V →C Where V is some random variable (possibly multidimensional). One can easily see that f can be now described by deterministic recipe - whole randomness is encapsulated in the random variable. This approach is - often without deeper reflection - applied in programming languages, where random number generator (RNG) serves as V . From more detailed analysis point of view however this is not satisfactory as systems are still modelled by the not deterministic function of D. Very thorough analysis reaching the foundations of the notion of probability as conventionally used today after Kolmogorov, definition of random variable and precise notion of independence given on the ground of this theory allows to convert this “random variable trick” into the sound function language. This often must be done when one comes from description or implementation of the procedure to proving its correctness in some sense (e.g. proving that some results are achievable with high probability etc.). However, following the work [5], there is an elegant way of incorporating randomness into deterministic framework in a way that eliminates this passage

Modelling of Learning and Acting

285

through definitions of the random variables. As an additional benefit such an approach may be a starting point to a more abstract approach toward probability theory, resulting in new theoretical frameworks. To avoid technical difficulties we will not discuss assumptions about the existence of certain measures and σ-algebras. The essence of Giry's approach is to consider f : D → P(C), so randomness is introduced by assigning to each point of the domain D a probability measure on C. Such functions can be composed, and the result of the composition of the function f : D → P(C) and the function g : C → P(B) is the function h = f ⋄ g : D → P(B) defined by

(f ⋄ g)(x)(A) = ∫_C g(y)(A) df(x)(y)

where A is a measurable subset of B. We used “⋄” as the symbol of this composition, and the order of arguments for the operator “⋄” is different than for the usual composition operator “◦”. We extend the definition of the game here to the following:

Definition 2. A game is a triple Γ = (P, {Sp}p, {up}p) where P is a set of players, Sp is a set of strategies for each player p and up : ∏_{p∈P} Sp → P(R).
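In the finite, discrete case the integral above becomes a sum, which makes the composition easy to illustrate. The dictionary representation of measures and the compose helper below are assumptions of this sketch, not notation from the paper.

```python
# Minimal discrete sketch of the kernel composition defined above:
# (f ⋄ g)(x)({z}) = sum over y of g(y)({z}) * f(x)({y}).
def compose(f, g):
    """Compose kernels f : D -> P(C) and g : C -> P(B) into D -> P(B)."""
    def h(x):
        out = {}
        for y, p_y in f(x).items():            # "integrate" over C
            for z, p_z in g(y).items():
                out[z] = out.get(z, 0.0) + p_y * p_z
        return out
    return h

coin = lambda _x: {"H": 0.5, "T": 0.5}                        # f : D -> P({H, T})
reward = lambda y: {1: 0.9, 0: 0.1} if y == "H" else {1: 0.2, 0: 0.8}
print(compose(coin, reward)("start"))                         # {1: 0.55, 0: 0.45}
```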

Please note that the classical definition of von Neumann and Morgenstern amounts to the additional assumption about the range of up: up(∏_{p∈P} Sp) ⊂ {δr | r ∈ R}, where δr is a Dirac measure. Following the same path, we extend the definition of the learning system from the previous subsection to the following:

Definition 3. A learning system is the triple (P, {(p, Mp, learnMp, selectp)}p∈P, {Sp}p∈P, {up}p∈P) where (P, {Sp}p∈P, {up}p∈P) is a game,

learnMp : Mp × ∏_{p∈P} Sp → P(Mp)

is a stochastic learning function and selectp : Mp → P(Sp) is a stochastic choice of the strategy.

3.3 Reductions to Games

We are searching for possibly universal language that allows us to easily construct, express, manipulate and compare models of system where entities interact between themselves and interact with environment. Interaction and learning was



covered by the extension to the model offered by game theory. We argue, that this extended model is also sufficient for modelling interaction with environment. Whole idea is about modelling the environment as a player in game. This approach has been used before. In context of pure game theory it was known as game against nature (see e.g. [9]), where Nature was treated as a special type of player, which has a trivial constant payoff function but complicated strategy. In context of extended games framework presented in this work we also model Environment (which seems to be a better name) as a player in the games with memory. This player also does not have any intention expressed as non-trivial payoff and its memory models objective reality. We present a couple of examples how to express models used in the ML community as an extended model, for now abstracting from exact description of players other than Environment to show that one can easily use the formalism to unify approach. Multi-armed Bandits. Well known multi-armed bandit (MAB) model is one of the simplest and perhaps early considered (long before named was coined) problem in the automated learning. It is extensively discussed in the [13]. In plain language situation modelled by MAB is equivalent to situation of the person playing the gambling game with the machine where machine has multiple arms to be pulled. With each arm there is associated distribution of reward. Pulling the specific arm effects in sampling from the distribution associated to such an arm and this sample is the instantaneous reward obtained by the player. The distribution associated with each arm remains unknown to the player, and ultimate goal is to obtain highest possible reward on some finite number of the plays. Lack of knowledge about distribution forces the player to experiment with different arms in order to get some knowledge about distribution (like its first moment) - activity called “exploration” in jargon - and to utilize knowledge gathered so far to maximize total reward. What is important here, that machine multi-armed bandit - represents environment players deals with, the environment is stochastic but static. It does not evolve during the sequence of plays. From our point of view this situation is modelled by the extended game of two players. ΓM AB = ({(p, Mp , learnMp , selectp ), (b, {∗}, learnMb , selectb )}, {Ap , {∗}b }, {up , ub }) where p represents a player and b represents bandit machine1 . Ap is the finite set of arms, ub ≡ 0, and up : A × {∗} → P(R). Note that learnMp , selectp are irrelevant from the point of view of definition of the MAB problem (but relevant of course when one starts to analyze particular algorithm of solving MAB problem), and learnMb , selectb are uniquely determined (an also irrelevant). 1

We introduce this player to show general scheme and make the translation of the problems into the language of extended game close to the original. One can model both MAB and MMAB as single-player game and multi-player game without introducing representation of bandit-machine just by properly defining payoff



Multi-player, Multi-armed Bandits. A widely discussed extension to the MAB problem is the so called multi-player multi-armed bandits problem (MMAB). This model extends MAB by introducing many players but staying with the assumption about a single multi-armed bandit machine, static reward etc. In this game, all “human” players choose and pull some arm simultaneously, and, if an arm was pulled individually by a single player, the reward for this player is the same as in the case of the MAB. However if it happens that two or more players choose the same arm the reward is altered. Typically, when considering this model it is assumed that the reward in this case is 0. Our representation of the model is:

ΓMMAB = ({(p, Mp, learnMp, selectp)}p∈P ∪ {(b, {∗}, learnMb, selectb)}, {Ap}p∈P ∪ {{∗}b}, {up}p∈P ∪ {ub})

where P is a set of players and b is a bandit machine. Ap = A for each p ∈ P is the finite set of arms, ub ≡ 0, and

up((ai)i∈P) = dst(ap) ∈ P(R) if ap ≠ aq for every q ≠ p, and δ0 otherwise,

where dst : A → P(R) is a function assigning a distribution to each arm in the bandit machine. Similarly to the previous case learnMb, selectb are uniquely determined (and also irrelevant). The model, as presented so far, captures well the interaction of the players with the environment. However the description of the MMAB model typically involves certain assumptions about the knowledge of the players. This can be elegantly expressed as a restriction on the learning function of players. One of the variants of the MMAB is the one where players know only about their own strategies and their own rewards. This can be expressed as a requirement about factorization of the learnMp function as follows. We additionally assume that each learnMp for p ∈ P factorizes through Mp × A × R as

learnMp = φp ∘ (πMp, πp, up ∘ π_{A^P}) : Mp × A^P × {∗} → Mp

for some φp : Mp × A × R → Mp. Another variant, especially interesting because it allows one to realize scenarios where certain communication or collaboration between players occurs, is the one where the player, in addition to its own choice of arm and reward, knows whether receiving 0 is the result of the reward from playing the arm or the result of choosing the same action together with another player. This can also be expressed via a requirement on the factorization:

learnMp = φp ∘ (πMp, πp, up ∘ π_{A^P}, indp ∘ π_{A^P})

for some φp : Mp × A × R × {0, 1} → Mp,



where indp : A^P → {0, 1} is an indicator defined as indp((ai)i∈P) = 1 if ap = aq for some q ≠ p, and 0 otherwise.

Markov Decision Processes. The Markov Decision Process (MDP, see e.g. [15] for a thorough overview of the model and related algorithms) is a model much more general than the two previously discussed. It is fundamentally explored in the theory and practice of reinforcement learning. Contrary to the previous two, where the environment representing the machine could be eliminated, in this model the environment plays a crucial role. One possible approach to modelling the environment here is, similarly to the way we did it in the case of MMAB, to introduce a player representing the environment. This leads to the model:

ΓMDP1 = ({(p, Mp, learnMp, selectp)}p∈P ∪ {(e, E, learnMe, selecte)}, {Ap}p∈P ∪ {{O}e}, {up}p∈P ∪ {ue})

E here describes the state of the environment. learnMe represents the change in the state of the environment - so its mnemonic name is a little bit misleading in this context. Op is the observable corresponding to the state of the environment as observed by the agents in P. As is typically the case in MDP theory, the environment itself may be assumed to be stochastic; to reflect this one needs to move from a deterministic, functional dependency of the change in memory to the monadic one (in the sense of the Giry monad).
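To keep the MMAB payoff and the incidence indicator concrete before moving on, here is a minimal sketch; the two arms, their reward distributions and the helper names are illustrative assumptions only.

```python
# Minimal sketch of the MMAB reward with collisions and the indicator ind_p:
# a player samples dst(a_p) when alone on an arm and receives 0 on a collision.
import random

dst = {"arm1": lambda: random.gauss(1.0, 0.1),
       "arm2": lambda: random.gauss(0.5, 0.1)}

def ind(p, profile):
    """1 iff some other player chose the same arm as player p."""
    return int(any(a == profile[p] for q, a in profile.items() if q != p))

def reward(p, profile):
    """Sample player p's reward for a joint arm profile (player -> arm)."""
    return 0.0 if ind(p, profile) else dst[profile[p]]()

print(reward("p1", {"p1": "arm1", "p2": "arm1"}))   # 0.0, collision
print(reward("p1", {"p1": "arm1", "p2": "arm2"}))   # ~1.0, drawn from dst("arm1")
```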

4 Algorithms and Architectures

To further illustrate and show the expressiveness of the approach we will outline a couple of algorithms fitted into this framework.

4.1 ε-greedy Algorithm

The well known ε-greedy algorithm can be easily represented within the framework. Let S be a finite set of states. We define M = R^S, functions from the states to numbers. We define the function update : M × S × R → M as

update(f, s, r) = (αf + (1 − α)r)χ{s} + f χ_{S\{s}}

We also define select : M → P(S) by the following formula, specified for the singletons as the measure is on a discrete space:

select(m)({s}) = (1 − ε) · (1 / |arg max_{x∈S} m|) · χ_{arg max_{x∈S} m}(s) + ε / |S|
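The update/select pair above translates directly into code. The following sketch samples from the selection measure instead of returning it explicitly; the α and ε values and the toy environment are illustrative assumptions.

```python
# Minimal sketch of the epsilon-greedy pair above for a finite state set S.
import random

def make_eps_greedy(states, alpha=0.1, epsilon=0.1):
    m = {s: 0.0 for s in states}                     # the memory M = R^S

    def update(s, r):                                # update(f, s, r) at state s
        m[s] = alpha * m[s] + (1 - alpha) * r

    def select():                                    # sample from select(m)
        if random.random() < epsilon:
            return random.choice(states)             # explore uniformly on S
        best = max(m.values())
        return random.choice([s for s, v in m.items() if v == best])

    return update, select

update, select = make_eps_greedy(["arm1", "arm2", "arm3"])
for _ in range(200):
    s = select()
    update(s, 1.0 if s == "arm2" else 0.0)           # toy bandit: arm2 is best
print(select())                                      # usually "arm2"
```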



For the classical case of the ε-greedy applied in MAB we simply consider the system described in Sect. 3.3 with one player being nature and the other being the greedy player described by Mp = Ap, learnMp = update and selectp = select. In a completely analogous way one can fit into the framework some version of Thompson Sampling.

4.2 UCB Algorithm

For the well known UCB (see e.g. [13]) the situation is more complicated. One can fit UCB into the scheme as follows. We take M = R^S × R^S where the pair (pl, m) ∈ M describes, respectively, the set of counters associated with each arm and the current “experimental” mean value associated with the arm. We will immediately present UCB in the context of the framework of MAB. Learning is defined by learnMp : R^S × R^S × S × R → R^S × R^S with

learnMp(pl, m, s, v) = (pl + χ{s}, m + ((v − m(s)) / (pl(s) + 1)) · χ{s})

and with the selection function choosing the highest current confidence level,

selectp(pl, m) = (1 / |A|) · χA

where

A = arg max_{s∈S} ( m(s) + c · √( log(∑_{x∈S} pl(x)) / pl(s) ) )
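For comparison with the formulas above, a minimal sketch of the same counters/means pair follows. Giving unplayed arms priority is a common convention assumed here, not something stated in the text, and the constant c is illustrative.

```python
# Minimal sketch of the UCB pair above: pl counts plays, m keeps running means,
# selection maximizes m(s) + c * sqrt(log(total plays) / pl(s)).
import math, random

def make_ucb(states, c=1.0):
    pl = {s: 0 for s in states}                      # play counters
    m = {s: 0.0 for s in states}                     # empirical means

    def learn(s, v):                                 # learnMp((pl, m), s, v)
        m[s] += (v - m[s]) / (pl[s] + 1)
        pl[s] += 1

    def select():
        unseen = [s for s in states if pl[s] == 0]
        if unseen:
            return random.choice(unseen)             # assumed: play each arm once first
        total = sum(pl.values())
        bound = {s: m[s] + c * math.sqrt(math.log(total) / pl[s]) for s in states}
        best = max(bound.values())
        return random.choice([s for s, b in bound.items() if b == best])

    return learn, select
```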

4.3 Fictitious Play

Fictitious play is one of the most studied heuristics in game theory. Historically it is probably the oldest approach to learning in the game in extensive forms (for a thorough discussion and historical comments see e.g. [4] Chap. 2). In this case - as the problem originated from game theory - one does not even have to introduce Nature or Environment to the game. Initially the problem was posed for the two person game and the first famous result was achieved for the two-person zero sum game, but we will pose two versions that may be applied to the many person game. We start from the game Γ = (P, {Sp}p∈P, {up}p∈P). We will assume that each Sp is finite. We will say that p ∈ P applies fictitious play if the following Mp is associated with the player:

Mp = R^(∏_{i∈P\{p}} Si)

and the following function is applied as learning:

learnMp(f, s) = f + χ{s}

and selection:

selectp(f) = arg max_{x∈Sp} ∑_{s∈∏_{i∈P\{p}} Si} f(s) up(x, s)
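A minimal sketch of the joint-profile version just defined is given below; the payoff function and opponent moves are illustrative assumptions, and fictitious_play_player is a hypothetical helper name.

```python
# Minimal sketch of (joint) fictitious play for a single player p: count the
# opponents' observed profiles and best-respond to the empirical distribution.
from collections import Counter

def fictitious_play_player(my_strategies, payoff):
    counts = Counter()                               # Mp: counts of opponents' profiles

    def learn(opp_profile):                          # learnMp(f, s): add indicator of s
        counts[opp_profile] += 1

    def select():                                    # maximize empirical expected payoff
        return max(my_strategies,
                   key=lambda x: sum(n * payoff(x, opp) for opp, n in counts.items()))

    return learn, select

# Matching-pennies-like example against one opponent who mostly plays "H".
payoff = lambda x, opp: 1.0 if x == opp[0] else -1.0
learn, select = fictitious_play_player(["H", "T"], payoff)
for opp_move in ["H", "H", "T", "H"]:
    learn((opp_move,))
print(select())                                      # "H": best reply to the empirical mix
```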



Here the algorithm learns the joint distribution of actions of the other players and selects its own action by maximizing the empirical expectation of the payoff. The other option would be fictitious play with an assumption about independence, where one can significantly reduce the state space Mp compared to the aforementioned version. In this case we have

Mp = ∏_{i∈P\{p}} R^Si

and the following functions:

learnMp((fi)_{i∈P\{p}}, s) = (fi + χ{si})_{i∈P\{p}}

and

selectp((fi)_{i∈P\{p}}) = arg max_{x∈Sp} ∑_{s∈∏_{i∈P\{p}} Si} ( ∏_{j∈P\{p}} fj(sj) ) up(x, s)

This algorithm learns marginal distribution of strategies played by each of the other players and maximizes expectation over the distribution being the product of these empirical marginal distributions for particular players. It is immediately visible that both version of fictitious play coincide in the case of two person games. Significant difference of these algorithms comparing to the considered in the context of MAB lies in the amount of knowledge the algorithm needs to have. In MAB and similar cases algorithm really needed to know its own strategies and payoff (and perhaps the fact of incidence in the case of MMAB). Here algorithm needs to know full strategy profile applied. It may be difficult in practice. Also - if we switch from theoretical considerations to realization - the physical memory space required to store representation of Mp in both cases grows heavily with the number of players and strategies. For the same reason it is expected that reaching sufficiently good approximation of the distribution may take long time. This in many situation makes these algorithms impractical (although sometimes, when game has particularly good structure some of these obstacles may disappear - this is indeed the case in some problems of joined allocation of spectrum in in OFDM based network - for modified versions of the fictitious play working in this context see e.g. [2]) Presented version was applicable to the deterministic games. There are serious issues related to application of deterministic FP in general games however. In [11] (then re-published in [3]) Shapley proves that for 3 by 3 bi-matrix games it is possible that fictitious play does not reach any convergence (not to mention the equilibrium). This behaviour for 3 by 3 is stable, in the sense that set of games for which it can be observed contains set which is open in the topology induced by topology in matrix space (or in more intuitive terms for certain games small perturbation of the payments for these games leads to the games where this behaviour can still be observed). This observation illustrates interesting twist which game theory brings to the theory of learning system comparing to pure ML: environment here is not only active but also reactive. Interesting paper that exploits this point and difficulties it brings in slightly different context is [7].




4.4 From Classifier to Player

First, by classifier of space S to classes C we will understand the following function: classif y : M × S → C and supervised learning algorithm is function of the form: learn : M × S × C → M We assume about S and C that they are finite. Note, that we do not assume anything about quality of learning, just the formal shape. Simply, we may have completely dumb learning algorithms, or trivial or one with good quality, where notion of quality requires some precise definition. Having the classifier one can turn it into the playing algorithm. One of the possible method is following. Let’s take some game Γ = (P, {Sp }p∈P , {up }p∈P ), Let’s choose the classifier on thespace S T × Sq into the space {ru , rs } for some chosen q ∈ P . Here S = p∈P Sp and T is a natural number representing length of the history sample (time window) and ru and rs represents two classes - one meaning that application of the sq ∈ Sq after seeing the (st )Tt=1 output gives satisfactory result, other when not. We leave the exact meaning of the satisfactory result undefined, but it may be for example, reward bigger than some predefined value (satisf action). We can thus define Mq as M × S T where M comes from the definition of the classifier and define: learnMq : M × S T × S → M × S T by learnMq (m, (st )Tt=1 , s) = (learn(m, πq (s), asses(s)), (st )Tt=1 ) where asess(s) is some function deciding if s is satisfactory, e.g. function returning  rs , for uq (s) > satisf action assesq (s) = rs , for uq (s) ≤ satisf action 4.5

Neural Network

It is almost natural to think about classification algorithm (and any learning algorithm) the way as it was presented in previous sections. One can instead of using “memory” metaphor think about M or Mp in the context of the learning system, as about set of variables, data structures or state of the algorithm. Let’s illustrate this on some special example. Neural networks (NN) of various architectures are recently mostly studied classifiers. On the example of the simple NN consisting of many layers of perceptrons as the one that can easily fit into our framework. ex Let σ : R → R be defined as σ(x) = 1+e x (in fact any similar sigma-shaped function will do).


Let f_n : R^{n+1} × R^n → R be defined as: f_n(w_1, ..., w_{n+1}, x_1, ..., x_n) = σ(w_{n+1} + Σ_{i=1}^{n} w_i x_i).

Now, having a list of natural numbers l_0, l_1, ..., l_m, we can define the m-layered NN function g_{l_0,l_1,...,l_m} : R^{L_m} → R^{l_m} as g_{l_0,l_1,...,l_m} = F_m, where the F_j : R^{L_j} → R^{l_j} for 0 ≤ j ≤ m are defined inductively by

F_1(x) = (f_{l_0}(π_{(l_0+1, 2l_0+2)}(x), π_{(1, l_0)}(x)), ..., f_{l_0}(π_{(l_0+(l_1−1)l_0+(l_1−1), l_0+l_1 l_0+l_1)}(x), π_{(1, l_0)}(x)))

and

F_j(x) = (f_{l_{j−1}}(π_{(L_{j−1}+1, L_{j−1}+l_{j−1}+2)}(x), F_{j−1}(π_{(1, L_{j−1})}(x))), ..., f_{l_{j−1}}(π_{(L_{j−1}+(l_j−1)l_{j−1}+(l_j−1), L_{j−1}+l_j l_{j−1}+l_j)}(x), F_{j−1}(π_{(1, L_{j−1})}(x)))),

where for a, b ∈ N with a ≤ b and N ≥ b, the map π_{(a,b)} : R^N → R^{b−a+1} is the projection on arguments a, ..., b, and L_j is the dimension of the arguments of F_j, i.e. L_1 = l_0 + l_1(l_0 + 1) and L_j = L_{j−1} + l_j(l_{j−1} + 1). Effectively we receive a differentiable function. Now, let us assume that one has some finite sets I and C, an NN g = g_{l_0,l_1,...,l_m}, some representation function rep : I → R^{l_0} and a mapping function map : R^{l_m} → C. Let {x_c}_{c∈C} be such that x_c ∈ R^{l_m} for each c ∈ C and map(x_c) = c. One obtains a classifier classify : R^{L_m−l_0} × I → C given by classify(w, i) = map(g(rep(i), w)). The function learn : R^{L_m−l_0} × I × C → R^{L_m−l_0} is given by learn(w, i, c) = w + dw, where dw is given by the gradient dw = ∇_w (g(rep(i), w) − x_c)^2. It is worth noting that in practice it is often the case that we have the {x_c}_{c∈C} given (e.g. the map function corresponds to the strongest output of g), which makes this procedure indeed a function (except for the null-probability case when some outputs of the NN are equal). At the heart of NN applications lie effective procedures for the computation of gradients for multi-layered perceptrons (backpropagation).
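For readability, here is a small executable Python sketch of the forward pass just defined; the nested-list layout of the parameters and the toy example at the end are my own conventions, not part of the formal definition.

```python
import math

def sigma(x):
    """Logistic activation: sigma(x) = e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))

def f(weights, inputs):
    """Single perceptron sigma(w_{n+1} + sum_i w_i x_i); len(weights) == len(inputs) + 1."""
    return sigma(weights[-1] + sum(w * x for w, x in zip(weights, inputs)))

def g(layer_sizes, params, x):
    """Forward pass of g_{l0,...,lm}: layer_sizes = [l0, ..., lm]; params[j][k] holds the
    (l_j + 1) weights of neuron k in layer j+1; x is the l0-dimensional input."""
    activ = list(x)
    for j in range(1, len(layer_sizes)):
        activ = [f(params[j - 1][k], activ) for k in range(layer_sizes[j])]
    return activ

# toy example: a 2-3-1 network with every weight set to 0.1
sizes = [2, 3, 1]
params = [[[0.1] * (sizes[j] + 1) for _ in range(sizes[j + 1])]
          for j in range(len(sizes) - 1)]
print(g(sizes, params, [1.0, -1.0]))
```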


However, one can proceed in a different manner, turning the learn function into a monadic operator. One example of such a procedure would be, instead of computing the gradient, to take some random vector v by sampling the uniform distribution on the ball B(w, ε), to calculate (g(rep(i), w + v) − x_c)^2 and (g(rep(i), w − v) − x_c)^2, and to take

dw = v if (g(rep(i), w + v) − x_c)^2 < (g(rep(i), w − v) − x_c)^2, and dw = 0 otherwise.

This procedure will of course provide smaller progress in learning (although with fewer computations at each step). For an NN it will not bring any benefits from the point of view of efficiency and is mentioned here to illustrate the fact that learning may be stochastic in its very nature. The good properties of an NN algorithm are, however, an effect of the special structure of the function realizing the NN. In other kinds of learning, where learning is based on a merely continuous or even discontinuous function, stochastic mechanisms similar to the described procedure may be the only choice. This method somehow combines a stochastic optimization problem with learning.
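A tiny, self-contained Python sketch of such a perturbation-based update, using a quadratic toy loss rather than a neural network (my own illustration of the idea, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbation_step(w, loss, eps=0.05):
    """One gradient-free update: draw v roughly uniformly from the ball B(0, eps)
    and keep the +v step only when it beats the -v step, mirroring the rule above."""
    v = rng.normal(size=w.shape)
    v *= eps * rng.uniform() ** (1.0 / w.size) / np.linalg.norm(v)
    return w + v if loss(w + v) < loss(w - v) else w

# toy usage: drive w towards the target (1, 2) under the loss ||w - target||^2
target = np.array([1.0, 2.0])
w = np.zeros(2)
for _ in range(3000):
    w = perturbation_step(w, lambda u: float(np.sum((u - target) ** 2)))
print(w)  # ends up close to [1, 2], up to roughly eps
```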

5 Further Work

The presented framework was successfully used to create computational models and simulations of phenomena in wireless networks (some results are in [10]). One can expect that it may also simplify theoretical analysis in this area, i.e. facilitate the derivation of proofs of convergence or guarantees of effectiveness of algorithms. Especially interesting are the questions of to what extent simple algorithms, like those designed for multi-armed bandit problems, work in interactive environments. In general, bringing together the field of ML and the problem of learning in games opens a new perspective on ML algorithms. Questions posed in game theory are often different from those posed in ML - they introduce a global view, placing a particular algorithm as merely an element of a larger system. The central problems of game theory are related to equilibria, and reaching an equilibrium is a global phenomenon. In the context of the learning system these concepts need to be redefined. For example, one can ask not only whether some type of equilibrium (like the Nash equilibrium) is reachable in a system where learning algorithms govern the behaviour of players, but also turn the question in the opposite direction: what are the stable states of such a system and what properties do these algorithm-driven equilibria have? Addressing these problems is interesting, ongoing work of practical importance. Another direction of research is related to composability and decomposability of learning systems, i.e. to the synthesis of complex systems from simple ones and the analysis of complex systems by decomposing them into simpler ones. Here it seems valuable to fit the learning system framework into the language of category theory, which offers rich albeit abstract tools for dealing with similar matters.


While giving some definition of a morphism that turns the formal system into a category is almost trivial, finding really meaningful definitions that give insight into the deep structural properties of the system is a challenge. In this line of thinking one can easily notice, at least on the formal level, the similarity of some ideas presented here to recent efforts to express machine learning and artificial intelligence in category theory (see e.g. [12]).² All of this makes this field interesting and potentially fruitful for both engineering and scientific applications.

References

1. Proceedings of the First International Symposium on Category Theory Applied to Computation and Control. Springer-Verlag, Berlin, Heidelberg (1974). https://doi.org/10.1007/3-540-07142-3
2. Bistritz, I., Leshem, A.: Game theoretic dynamic channel allocation for frequency-selective interference channels. IEEE Trans. Inf. Theory (5) (2017)
3. Dresher, M., Shapley, L.S., Tucker, A.W.: Advances in Game Theory (AM-52), vol. 52. Princeton University Press, Princeton (2016)
4. Fudenberg, D.: The Theory of Learning in Games. MIT Press, Cambridge, MA (1998)
5. Giry, M.: A categorical approach to probability theory. In: Categorical Aspects of Topology and Analysis (Ottawa, Ont., 1980), Lecture Notes in Mathematics, vol. 915, pp. 68–85. Springer, Berlin (1982). https://doi.org/10.1007/BFb0092872
6. Lasaulce, S., Tembine, H.: Game Theory and Learning for Wireless Networks: Fundamentals and Applications. Elsevier Science (2011)
7. Laurent, G.J., Matignon, L., Le Fort-Piat, N.: The world of independent learners is not Markovian. Int. J. Know.-Based Intell. Eng. Syst. 15(1), 55–64 (2011)
8. MacKenzie, A.B., DaSilva, L.A.: Game Theory for Wireless Engineers (Synthesis Lectures on Communications). Morgan & Claypool Publishers (2006)
9. Milnor, J.W.: Games Against Nature. RAND Corporation, Santa Monica, CA (1951)
10. Poplawski, A., Szott, S.: Optimizing spectrum use in wireless networks by learning agents. In: 2020 IFIP Networking Conference (Networking), pp. 569–573 (2020)
11. Shapley, L.S.: Some topics in two-person games. Memorandum RM-3672-PR, The RAND Corporation, Santa Monica, California (1963)
12. Shiebler, D., Gavranovic, B., Wilson, P.W.: Category theory in machine learning. CoRR, abs/2106.07032 (2021)
13. Slivkins, A.: Introduction to multi-armed bandits. Found. Trends Mach. Learn. 12(1–2), 1–286 (2019)
14. Solis, F.J., Wets, R.J.B.: Minimization by random search techniques. Math. Oper. Res. 6(1), 19–30 (1981)
15. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. The MIT Press, Cambridge (2018)
16. Tembine, H.: Distributed Strategic Learning for Wireless Engineers. CRC Press Inc., USA (2012)
17. von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton University Press, Princeton (1947)

² Although the line of research on the application of category theory in ML is relatively recent, growing in popularity in the last decade, it is not entirely new. A similar approach was applied in the 1970s in the context of control theory (see e.g. [1]), although the interest was short-lived and the direction seemed to be abandoned for a long time.

How to Reduce Emissions in Maritime Ports? An Overview of Cargo Handling Innovations and Port Services

Sergey Tsiulin(B) and Kristian Hegner Reinau

Department of the Built Environment, Aalborg University, Aalborg, Denmark
[email protected]

Abstract. Maritime ports, being major transportation hubs in worldwide trade, are going through significant challenges brought by globalization and the increase of turnover. This forces port authorities and terminal operators to work on area expansion and equipment modernization and to comply with new standards regarding climate change and energy transition. The purpose of the study is to introduce various technologies and innovations within the terminal and administrative sides of a maritime port that contribute to the reduction of emissions. For that, a literature overview is carried out, considering port operations across six areas: berth, quay, transport, yard, gate and port administrative. The focus is given to academic studies and reviews as well as commercial projects. The goal is to provide an overview of cargo handling equipment and related innovations that contribute to the reduction of harmful environmental emissions in the port area. Generally, ports are undergoing significant changes: gradual shifts from vessel engines to landside power, use of electric engines, better planning of the yard space, remote coordination of transport equipment, transition of paper document flow to online formats, etc. Consequently, these elements of port strategic development contribute to a positive environmental effect which in perspective could transform ports into urban-friendly locations. The research could be of interest to port authorities, local municipalities and a broader audience who want to become more closely acquainted with port facilities.

Keywords: Maritime ports · Cargo handling equipment · Energy transition

1 Introduction

The global economy is growing, and so is the level of pollution from industrial production and transportation [1]. Carbon dioxide emissions have shown a continuous rise since the 1960s and by 2019 were estimated at roughly 34.81 billion metric tons overall. The maritime industry accounts for 2.6% of the global emission level. Most of it comes from vessels in offshore areas less than 400 km from land [2]. Maritime ports, although polluting considerably less than vessels, are becoming a challenge in times of globalization and urban expansion. Besides contributing to the global economy as major supply chain hubs, ports bring negative environmental impact mainly due to cargo-related activities and services generated at the port site. Operations with


cargo impact air, water and soil quality, noise levels and urban life in the local surroundings [3]. The situation is slowly changing with the trend of energy efficiency in ports. Initially started with awareness of climate change, the transition to environmentally friendly port equipment is driven by demand from governments and local municipalities [4]. Energy efficiency implies the same approach to doing operations, yet with a focus on lower energy and fuel consumption, enhancing the use of alternative fuels or electrification to save local budgets for other improvements. Implementation of such ideas typically requires development on the technological side. Technologies can redefine the approach to how ports generate, store, consume and transmit energy [5, 6]. Certain elements of maritime terminals are capable of being modified via fleet and engine replacements, optimization of work with data, establishment of device connectivity, automation, etc. Energy transition is also achievable through the enhancement of port operations not directly related to cargo handling: correspondence management, communication, security control, etc. For example, improvements to port gate systems can free up the surrounding area from truck congestion. Also, upgrading document correspondence can simplify communication with follow-up parties across the delivery chain. The purpose of the study is to introduce various technologies and innovations that contribute to the reduction of emissions within the terminal and administrative sides of a maritime port. For that, a literature review is carried out. The range of sources includes solutions provided by industry companies and found across academic literature. The focus is given to research studies and reviews as well as to commercial products and press releases. The research question is: What is the state of port handling equipment in maritime ports with respect to emission reduction and energy transition? The study splits the port and considers technological innovations separately across six areas: berth, quay, transport, yard, gate and port administrative. The emphasis is on operations with cargo at a port and the means directly or indirectly affecting this process. Another focus is to what extent the selected innovations help to reduce the level of harmful environmental emissions in the port area. The article is organized as follows. Section 2 reflects on the retrospective of international climate agreements and the gradual shift towards energy efficiency within various domains, including the maritime industry. Section 3 presents the methodology of the overview and describes the built framework. Section 4 shows the results for each area of cargo operations at the terminal. Section 5 concludes the study.

2 Background

The first noticeable actions towards regulating emission reduction started in 1992, with the United Nations (UN) emphasizing the importance of climate change. A framework was developed in order to stabilize air quality and the overall GHG level. It was then followed by an intergovernmental panel discussion in 1995 and later, in 1997, by the first international agreement to reduce GHG emissions by 5%, known as the Kyoto Protocol [7].


From this point, the development of joint frameworks and methodologies for monitoring progress in reaching climate objectives became an essential part of international panel discussions and agendas. The repercussions of global warming and the increase of CO2 emissions became evident to the global scientific community. Similarly to academia, commercial industry became aware of green growth, incorporating a "green strategy" into daily business and marketing campaigns. In 2015, the successor of the Kyoto Protocol, the Paris Agreement, emphasized the importance of immediate actions to stop global warming and to limit the global temperature increase to 1.5–2 °C [4, 7]. For the maritime industry, comparable actions began in 2005 with discussions on vessel-source pollution led by the International Maritime Organization [2]. Vessels, even though considered the least polluting type of transport, especially compared to road, air and rail transport, still emit approximately 2–3% of the global level of CO2. Such a small share can nevertheless be comparable with that of certain countries, for instance Germany [1]. From 2005, initiatives have emerged to raise awareness and set up medium- and long-term energy targets for international shipping companies and related asset owners. The focus is given to zero-carbon fuels, alternative sources of energy, usage of waste and energy efficiency. On the scale of maritime ports, the main changes are expected from terminals moving towards energy transition, i.e. the transformation of energy-related CO2 emissions to limit climate change (International Renewable Energy Agency, 2021). Besides that, goals are usually set to partially redesign port site land to orient it towards an urban-like environment. Since only 26% of port site emissions come from the terminal, while the rest comes from vessels and follow-up freight forwarding, the focus has been on the optimization of cargo handling equipment and related facilities. Thus, several studies [6, 8] reveal that the most fuel-consuming equipment is quay cranes (70% of overall consumption) and transportation between port areas (±30%). In terms of electricity, reefer containers and quay cranes (43% and 37%) consume most of the power, while the rest (20%) goes to administrative buildings [9]. When talking about energy transition and the optimization of port equipment, the main expectations in the literature regard electrification and automation of equipment, reduction of time spent per operation and reaching higher flexibility of the yard space [6].

3 Methodology

The current study introduces various technologies and innovations that contribute to the reduction of emissions within the terminal and administrative sides of a maritime port. The methodology of this overview is based on three main criteria: scope, focus and sources.

3.1 Scope

Maritime ports are mostly associated with cargo-related operations, while ports nowadays have rather transformed into systems of multifunctional clusters that can include warehousing, production lines, offices, co-working spaces and many more [10]. Within such growth of different facilities, conventional cargo operations remain the largest part, though they are expected to shrink compared to the others in the long term.


The scope of the research is narrowed down to the port area that relates to cargo operations, i.e. the port terminal, including the port authority building, i.e. the port administrative. The current study splits the port terminal into five areas: berth, quay, transport, yard and gate, shown in Fig. 1 [6, 11]. These areas are considered the operational parts of the terminal. Typically, cargo operations with vessels happen in the following way. Upon arrival, a vessel is assigned to a berth and then moored for unloading. One or several quay cranes unload cargo following an unloading plan and time schedule. After that, containers are picked up by port trucks for relocation to the storage yard. At the yard, containers are then stacked by yard cranes depending on the waiting time and destination. If a container must be transshipped to another seaport, loading continues with the next vessel. Otherwise, the container is to be picked up for landside delivery by the freight forwarder. Vessel loading, accordingly, is a similar procedure but in reverse sequence [11].

Fig. 1. General view of operational areas in a container terminal

Port administrative is the port area not directly involved in cargo operations but important in terms of terminal area development and decision-making. As most European ports are owned by local municipalities and managed as public self-governing ports, decisions made by the port authority (not the same as the terminal operator) affect the whole port's infrastructure [12]. If the port administrative sets goals of improving the environmental state of the port site, they are likely to be followed by its tenants, i.e. the terminal operators.

3.2 Focus

The focus is given to operations with cargo at a port and the means directly or indirectly affecting this process. Another important focus is to what extent the selected innovations help to reduce the level of harmful environmental emissions in the port area. Generally, cargo handling equipment varies by type and includes cranes, rubber-tyred and rail-mounted gantry cranes, straddle and shuttle carriers and various container handlers. At the same time, there are other existing solutions that indirectly affect a port's environmental indicators through software and automation of vessel mooring, cargo loading/unloading, optimization of paper workflow, and usage of digital technologies such as predictive analytics, blockchain, etc. When searching the literature for current means of cargo handling equipment within maritime ports, the focus is kept on emission reduction, energy efficiency and


information management. That is, how the available technological products result in achieving more environmentally friendly operations and greater real-time synchronization of port logistics. The goal is to understand the range of eco-friendly innovations scaled down to the port site and to systematize them into a framework.

3.3 Sources

The range of sources includes a review of cargo handling solutions provided either by industry companies or found across academic literature. The focus is given to research studies and reviews as well as to commercial products and press releases. The review was carried out during December 2021. Grey literature included sources from public/private institutions, namely commercial projects, products by cargo handling equipment producers and publicly available reports. For review and summary, the study uses the critical review method [13] to synthesize the selected material and to critically evaluate and summarize it with a focus on the subject's conceptual innovation: that is, how a found object introduces a new perspective to the market, manipulates existing inventions and turns them into practical use, while contributing to energy efficiency. Thus, the paper provides an overview of cargo handling equipment as well as its technological updates from the emission reduction and energy efficiency viewpoints.

4 Overview of Means to Facilitate Energy Transition in Ports

4.1 Berth

For the berth area, the two most common innovations existing as of today are automated mooring systems and cold ironing. Both approaches contribute to energy transition at ports. Mooring is the operation of fixing and securing a vessel to the berth. Traditionally, mooring ropes, lines, windlasses and anchors are used for that [14]. The traditional method has been known for centuries and is still used worldwide; it requires manpower on land to fix and release the ropes [6, 15]. The method, however, implies risks in difficult weather conditions and with respect to human safety procedures, especially for untrained or inexperienced personnel. Considering globalization and the tendency to produce vessels of larger size, human factor risks increase exponentially. Also, constantly having personnel available for mooring is becoming costly [16]. Alternative methods aim to automate the mooring operation without active use of manpower, minimizing time and excluding the factor of weather conditions. Automated mooring uses either vacuum or magnetic systems. Such methods rely on several pads along the vessel, moving in three dimensions (Fig. 2). When the vessel approaches the berth, the pads attach to the hull at low speed and then suck it to the quay. The ship is then fixed to the quay, with the vacuum/magnet efficiency constantly monitored, notifying the crew in case of breakdowns or errors. Automated mooring is considered costly to install yet promises high returns in terms of manpower usage and environmental effect. Thus, it showed a 76.7% pollution reduction


compared to the conventional method. Typically, for traditional mooring, vessels use power from their engines, which entails fuel usage, while automated mooring saves on this consumption [14, 15]. As for cold ironing, it addresses the situation when a vessel is docked at the berth and still uses one of its internal (auxiliary) engines for maintenance, lighting, refrigerating the goods, etc. The main engine is not used; however, the power from auxiliary engines contributes significantly to the emissions at the port site [17, 18]. Cold ironing, in this case, is when power is supplied to the vessel from the port site for any hoteling activities. It is an external energy source the vessel can be connected to, hence minimizing the use of its own engines during docking. This way carbon emissions are reduced by 57%, as measured in certain ports [6]. As a drawback, the method involves high installation costs for the port along with energy requirements, voltage, connectivity, etc., which could be the reason for cold ironing's low adoption.

Fig. 2. Example of automated mooring system in Tallinn [19]

4.2 Transport for Cargo Operations and Allocation

Maritime terminals in general have a wide spectrum of transportation means used for cargo loading/unloading and allocation within the site, i.e. transport that can lift and move cargo items across port site areas. This could be ship-to-shore cranes, yard cranes, straddle and shuttle carriers, mobile equipment for transportation along the yard, e.g. terminal tractors and reach stackers, as well as regular vehicles (Fig. 3). Most of this cargo handling equipment operates on diesel fuel, being the highest source of pollution in maritime ports. The most promising method to change the state of transportation in ports is the use of either hybrid or electric power. The innovation is likely to show the following effects: noise reduction, less power spent in idle mode, greater control and monitoring over breakdowns, internal traffic and movements, lower costs over the full lifecycle of the facility and a high environmental effect [20]. For example, an electrified rubber-tyred gantry crane delivers 86.6% lower energy costs while reducing GHG emissions by 67% in comparison with its diesel analogs [21].


Besides the above, electrification of transport could be extended into partial automation of the equipment, i.e. self-driven vehicles moving systematically based on predefined patterns. This method, as well as hybrid and electric mode vehicles, implies high investment costs, while the possible outcome varies significantly depending on the number of handling units and the selected setup [11].

Fig. 3. Common means of port cargo handling equipment [34]

Quay Crane (Ship-to-Shore Crane, STS) is used to load/unload containers from vessels based on the ship class it can accommodate. Cranes are equipped with spreaders that secure container lifting using locks. Among recent developments, spreaders have been upgraded to multi-lifting (dual or triple-trolley) and unloading/loading cycles have been split into multiple segments, intended for higher productivity [31, 32]. Most of such cranes, however, operate on diesel power, which affects the amount of emissions into the atmosphere. Nevertheless, various electrification technologies exist. Quay cranes that operate on electric power use an alternating current (AC) drive, which, by converting the AC supply to direct current (DC) and inverting it back, reduces peak demand and spreads the energy load, making it possible to reduce it by a factor of ten [6].

Transport. Mobile transport usually comprises yard tractors, vehicles, reach stackers, empty container handlers or forklift trucks. To increase the productivity of the vehicles, multi-loading capabilities are added, allowing vehicles to lift containers easily using steel platforms and yard racks [32]. The biggest contribution to mobile transport becoming environmentally friendly is the usage of lithium-ion batteries [20, 31]. In addition to the advantage of zero GHG emissions, battery-powered vehicles showed a 64% average


energy efficiency improvement in comparison to traditional vehicles [6]. Electric vehicles, however, require interval charging, during which they must remain idle. Issues arise in choosing the optimal schedule for charging and hence routing [6].

Straddle Carriers. A straddle carrier (SC) is a non-automated vehicle able to self-lift and carry containers underneath, without assistance from cranes or forklifts. Such carriers are usually used for transporting cargo between the quay and the yard. According to a survey among 114 container terminals [33], the SC is the second most frequently used equipment at terminals, with a 20% share. Approximately 80% of all straddle carriers are diesel-electric powered [20]. For SCs, electric generators and hybridization between fuel and electric power are also possible. Some research reports fuel savings of 27.1% [6]. The latest hybrid developments allow converting braking energy into electric power, which is saved in batteries for later use. This accounts for 10–15% higher energy savings per year for one vehicle [20].

Yard Cranes. There are several types of gantry cranes seen in a port. The best known are the rubber-tyred gantry crane (RTG) and the rail-mounted gantry crane (RMG). RTGs are mostly operated with diesel-electric engines, showing a decent shift towards operating on grid power [11, 20, 33]. Modern types of RTG cranes are upgraded with monitoring and positioning systems as well as cameras to facilitate steering [21]. The fully electric version of an RTG is the automatic stacking crane (ASC), which is supplied by a high-voltage cable on a reel. Since it operates on grid power, the ASC is zero-emission at the point of use by default. ASCs can work autonomously, yet in conjunction with automated transfer vehicles, unless operated manually [21]. Based on the survey by [33], only 6% of the 114 terminals that responded consider the shift to ASCs. Electrified RTGs deliver an 86.6% energy cost reduction while reducing GHG emissions by 67% in comparison with their diesel analogs. Commonly, the diesel engines are also fully removed [21]. The biggest challenge for automated or semi-automated cranes is synchronization between the yard equipment, that is, preserving the general layout of movements and keeping straddle carriers operating within their designated areas across different software. This includes the scheme of container stacking at the yard and fleet management. Issues arise when equipment is produced by different companies [11].

4.3 Storage Yard

The storage yard is the port terminal space for cargo allocation, mostly in stacks on top of each other. For this area, optimization is considered mostly through reorganizing the storage yard layout and through better planning of container stacking and reshuffling. The optimization of the yard layout is the planning process of reorganizing the scheme of how stacks are oriented within the yard. Typically, the yard area is split into blocks, having rows and bays of container stacks. While trailers are stacked alongside, containers can be stacked on top of each other.


For the yard layout, the comparison and discussion revolve around two schemes: the Asian and the European layout. The difference lies in the input and output points, the space where transportation units operate to store and retrieve containers. In the Asian scheme (Fig. 4), container rows are placed in parallel, leaving significant space for operating cranes to move in between. Gantry cranes have flexible access along the whole storage yard. The European layout is compact and is dedicated to gantry cranes only, limiting the space for input and output points. The operation of gantry cranes and transport vehicles is delimited, and movements do not intersect [11]. The efficiency of land use increases for the second layout. It requires fewer movements by transport vehicles, as they are strictly attached to several input/output areas, while less room for maneuver is available to the gantry crane. Promising less operating time per container and better space utilization, such a method requires a higher level of communication between terminal equipment handlers. Both types of equipment are supposed to work in conjunction and possibly be automated. In return, it promises a lower pollution effect, albeit with high implementation costs [22].

Fig. 4. Asian (a) and European (b) container layouts [11]

As for container reshuffling, it is the process of unproductive moves made to reach a particular container in the stack while moving aside the containers on top of it [11]. The longer a container waits for its next transportation, the more it is delayed, and hence the higher the probability of it being at the bottom of the stack. Finally, when the moment comes to reach it, storage yard vehicles must relocate the whole stack on top of it. The issue results in extra work by personnel, fuel consumption and costs per operation. Moreover, the chance of time delays increases accordingly, affecting the whole terminal's planning and monitoring. The problem happens due to a paper-based tracking system: the terminal simply stacks containers in the order in which they arrive at the port, which is why the probability that other cargo units end up stacked above a given one increases [30]. The issue significantly contributes to the "dwell time", an indicator of the total time a container spends in the port.


Fig. 5. Stacking containers with conventional method and using predictive analytics (modified from [24])

Optimization of reshuffling is achieved through better organization of the container stack, in a way that allows less reshuffling by placing containers in accordance with their retrieval date. Some studies [23, 24] propose a method of predicting the time a container spends in a port with the use of machine learning. The system is meant to integrate into the port's Terminal Operating System (TOS), monitor the container flow balance, and, based on both historical and real-time data, predict the approximate time a unit spends in storage. This way it can minimize the number of shuffle moves, save operating costs for the port and improve the environmental state at the site (Fig. 5).

4.4 Port Gates

The port gate, besides the port administrative, represents the last stage of the port area, where cargo is either brought to the port or already picked up and awaiting exit. If traffic at the gates is poorly optimized, it creates queuing, increases the average time spent at the port, and results in time delays and an extra financial burden for the port and freight carriers. Moreover, a high density of traffic contributes to the level of CO2 emissions, as truck drivers are left uncertain about when they can access the facility. Truck appointment and port gate monitoring systems make such a facility more efficient.

Truck Appointment is a software solution that allocates timeslots to drivers so that queuing can be avoided. The peak hours are smoothed by assigning or giving drivers a choice of a particular timeslot. Such systems typically vary by developer and provider; they, however, often receive complaints from freight companies about the inaccessibility of convenient schedules, thus also resulting in delays and sudden traffic congestion [25].

Gate Monitoring Systems, on the other hand, cover not only the traffic at the gates, but also the movement of trucks and the time they spend at the site, as well as checking which goods they are taking. Additionally, such monitoring helps to keep track of taken and occupied space at the yard. Cargo security is a problem that occurs mostly with trailer theft or when a trailer is taken by mistake [12]. Some European ports experience a common situation when a driver comes to the port and claims a certain cargo unit at the gates, for example a trailer, but takes another one instead. These ports occasionally fail to identify taken trailers and to recognize whether the cross-check 'driver/cargo' is correct. Mostly it happens because gate cameras fail to scan the trailer's license plate due to dirt. As an outcome, the situation creates confusion for the terminal, as they lose track of free and occupied space at the storage yard.


Gate monitoring systems, accordingly, solve the issue by combining software and hardware solutions, for example RFID chips and a distributed database to track drivers' activity and the goods taken [12]. This helps to minimize truck movements within the yard and contributes to energy efficiency.

4.5 Port Administrative

Port administrative is the port area not directly involved in cargo operations but important in terms of terminal area development and decision-making. As ports are typically owned by local municipalities and managed as public self-governing ports, decisions made by the port authority affect the whole port's infrastructure and are likely to be followed by its tenants, i.e. the terminal operator companies. Solutions for the port administrative area are represented by reimagining the concept of document workflow through, for instance, establishing a port community system or blockchain. Such solutions do not directly intersect with renewable energy but promise energy and cost savings. The first attempts to reorganize the workflow of port management by interconnecting port parties and their paper workflow are known as the Port Community System (PCS), an information hub that enables the exchange of information and correspondence in real time. Due to the unwillingness of parties to share confidential information, many attempts at PCS have failed, even though some are successful [25]. From 2018, blockchain has been promoted in the media as a technology able to establish a digitalized workflow without the need to reveal confidential data [26, 27]. Blockchain is a distributed and decentralized technology with origins in computer science, with the key aspects of being transparent and auditable to network members. This leaves room for the elimination of third-party mediators, e.g. shipping brokers and agents, by directly connecting parties of origin with final customers and all the necessary parties in between. Based on the literature, one main scenario of blockchain usage has been found regarding the maritime industry and ports in particular: "document workflow management". It changes the approach to port collaboration by combining multiple parties with equality of data ownership. The technology is considered a tool to shift out-of-date document management and decision-making processes to fully electronic formats and thus create trackability [26, 28]. Consequently, having better control of upcoming cargo as well as knowing an item's destination, time spent in port, certificates, etc., the port can more efficiently relocate cargo and optimize internal handling equipment. It also provides more flexibility in work with partners, e.g. freight forwarders, when managing traffic at the gates.


4.6 Summary

Fig. 6. Summary of port innovations

The summary of the literature findings is presented in Fig. 6, separated by port area. Certain findings, such as cargo handling equipment, are typically represented by different kinds of equipment: cranes, rubber-tyred and rail-mounted gantry cranes, straddle and shuttle carriers, various container handlers, etc. They are used in different parts of the port, so in the figure they are distributed over several areas as well.

5 Discussion

5.1 The Concept of a New Port

Over the last decade, maritime ports, usually considered facilities lagging behind technological advances, have been actively catching up on new technologies. Nowadays, ports worldwide are starting to embed artificial intelligence; predictive and advanced analytics; usage of accumulated data; digitalized communication; and real-time monitoring, diagnostics and scheduling. These innovations are scattered across ports, yet the trend is positive for the industry. Another perspective on port development is emphasized by merging and optimizing the most crucial port functions such as loading/unloading, transporting and stacking. The better the control and coordination achieved across those functions, the higher the level of productivity that can be achieved. All the aspects of port development above, when combined, portray the scenario of Port 4.0: the concept of a port with partially or fully automated cargo operations, with real-time communication across port actors: terminal operator, port authority, customs, etc. Port 4.0 is part of the larger notion named Industry 4.0, i.e. the transition to fully integrated manufacturing [26]. Within such upgrades, certain innovations, e.g. electric engines and battery usage, match the two most relevant long-term port goals of being cost efficient and transitioning to greener sources of energy. One of the success criteria for ports within 2010–2015 was becoming a "green port", which became not only a discussion trend, but also a business model and a marketing tool. Thus, environmental management became a critical issue, always being included in ports' development strategies.


5.2 Extension of Port Orientation

Technologies such as electrification, digitalization, connectivity and better monitoring are actively changing the port from a 'basic cargo operations facility' into a more complex business environment with a different set of industries and processes. This strongly relates to the port cluster theory [10], i.e. the creation of co-working industry clusters with similar, yet not identical, types of business: cargo operations, industrial production, warehousing, cruise tourism and co-working centers (Fig. 7). The key transition towards greater port optimization as an ecosystem is an improvement of entrepreneurial port development capabilities: leasing port site land to commercial enterprises, as it brings higher efficiency and utilization rates [12].

Fig. 7. Future evolvement of the port ecosystem according to Peter de Langen (2020)

As forecasted, different clusters will coexist within the port site as expanded infrastructural assets, which is also likely to increase the role of port authorities in fostering cooperation and partnership. Technological innovations are seen as a means of synchronization, bringing a synergy effect to the coordination of clusters. In other words, having a similar degree of digitalization, data collection and process automation will, in the long run, allow greater benefits for businesses located on port land and improve overall efficiency.

5.3 Energy Transition

Emissions, along with cyber security, are becoming the new major challenges for the industry. On the scale of maritime ports, the transition to renewable and more efficient sources of energy is, however, never the first priority; instead, it is cost efficiency. Investments are primarily considered for greater future profits, and if such investments overlap with the green agenda, the solutions are likely to become popular. For example, electric vehicles (EVs) have been publicly promoted as the best solution for last-mile deliveries to cities, as well as for transport operations in ports. Nevertheless, few businesses consider the alternative, largely due to higher purchase costs: it is simply too costly to acquire such a vehicle outside of pilot projects. Green solutions, largely innovative in nature, like other mass technologies, decrease in price over time compared to when they first appeared on the market, and therefore become financially affordable. Consumer demand and proportionally decreasing prime costs motivate companies to spread them further, providing better product availability [26].


On the other hand, energy transition, slowly turning into a trend, offers a variety of business opportunities and a possible move away from the traditional, volume-based business model, such as the development of the clusters discussed above. According to [29], the complexity of energy transition for maritime ports is often overlooked because it implies a fully integrated management strategy. Implementation is abstracted into three dimensions: 1) a shift towards alternative energy sources and hence the construction of specific infrastructure and involvement in different markets; 2) the establishment of a new network, accounting for investment plans and a payback strategy; 3) support from regional authorities with policy regulation. Any transition towards alternative sources of energy is complex and highly uncertain. Moreover, port authorities cannot directly influence terminals when it comes to infrastructural upgrades [12, 29].

5.4 Limitations

Defining common patterns across individual assets such as maritime ports can be crucial when the literature on the topic is limited. Nevertheless, the academic literature has a few comprehensive reviews [4, 6, 11] focused on understanding the existing yard operations and technologies of energy management systems in ports. These studies review a range of analytical models, numerical experiments, modelling and benchmarking of specific cargo handling working modes. Though showing a high level of expertise in the domain, this can be complex for readers who rather want to get an overall understanding of the port site and the equipment within it. The current research, being rather descriptive in nature, aims to introduce the port area to a broader audience without covering technical specifics and the variety of research on simulation methodologies. Since the study covers innovative aspects of maritime transportation in the port, the novelty of certain cargo handling equipment, with its lack of implementation and long-term use experience, should also be taken into account. Hence, the goal is to formulate an understanding of the new means available at a port towards emission reduction and the improvement of operational efficiency.

6 Conclusion

In recent years, maritime ports, usually considered facilities lagging behind technological advances, have been actively catching up on new technologies. All examined areas of a port tend towards greater utilization of space and human resources and lower operational costs. Generally, ports are undergoing significant changes: gradual shifts from vessel engines to landside power, use of electric engines, better planning of yard space, remote coordination of transport equipment, transition of paper document flow to online formats, etc. Consequently, these elements of port strategic development contribute to a positive environmental effect which in perspective could transform ports into urban-friendly locations. In this review, it is shown that there are several technologies and management approaches for each of the terminal areas, including the port administrative. This should catch the attention of next-generation ports and those looking for optimization improvements.


Regarding limitations, the literature on the topic is limited, mainly represented by studies focusing on analytical models, numerical experiments, modelling and benchmarking of specific cargo handling working modes. Though showing a high level of expertise in the domain, this can be complex for readers who rather want to get an overall understanding of the port site and the equipment within it. The current research, being descriptive in nature, aims to introduce the port area to a broader audience without covering technical specifics and the variety of research on simulation methodologies. Future research directions reveal great potential for the validation, simulation and workability of each of the innovations described above, as well as their likelihood of working in conjunction with enterprises based in the port area.

Acknowledgments. This research was supported as part of BLING – Blockchain In Government, an Interreg project supported by the North Sea Program of the European Regional Development Fund of the European Union.

References

1. Global Carbon Project: Historical carbon dioxide emissions from global fossil fuel combustion and industrial processes from 1758 to 2020 (in billion metric tons). Statista, Statista Inc. https://www-statista-com.zorac.aub.aau.dk/statistics/264699/worldwide-co2-emissions/ (2020)
2. IMO (International Maritime Organization): Air Pollution, Energy Efficiency and Greenhouse Gas Emissions. Available online: http://www.imo.org/en/OurWork/Environment/PollutionPrevention/AirPollution/Pages/Default.aspx (2019)
3. Merk, O.: Shipping Emissions in Ports. Discussion Paper No. 2014-20, p. 10. OECD International Transport Forum (2014)
4. Azarkamand, S., Wooldridge, C., Darbra, R.M.: Review of initiatives and methodologies to reduce CO2 emissions and climate change effects in ports. Int. J. Environ. Res. Public Health 17(11), 3858 (2020). https://doi.org/10.3390/ijerph17113858
5. Parise, G., Parise, L., Malerba, A., Pepe, F.M., Honorati, A., Chavdarian, P.B.: Comprehensive peak-shaving solutions for port cranes. IEEE Trans. Ind. Appl. 53(3), 1799–1806 (2017). https://doi.org/10.1109/TIA.2016.2645514
6. Iris, Ç., Lam, J.: A review of energy efficiency in ports: operational strategies, technologies and energy management systems. Renew. Sustain. Energy Rev. 112, 170–182 (2019)
7. United Nations: Kyoto Protocol. Review of European Community and International Environmental Law, vol. 7, pp. 214–217. Available online: https://unfccc.int/resource/docs/convkp/kpeng.pdf (1998)
8. Michele, A., Gordon, W.: Energy efficiency in maritime logistics chains. Res. Transp. Bus. Manag. 17, 1–7 (2015)
9. GREENCRANES: Green Technologies and Eco-Efficient Alternatives for Cranes and Operations at Port Container Terminals. GREENCRANES Project, Technical report, October (2012)
10. de Langen, P.W.: Towards a Better Port Industry: Port Development, Management and Policy. Routledge (2020)
11. Carlo, H., Vis, I., Roodbergen, K.: Storage yard operations in container terminals: literature overview, trends, and research directions. Eur. J. Oper. Res. 235(2), 412–430 (2014)
12. Tsiulin, S., Reinau, K.H.: The role of port authority in new blockchain scenarios for maritime port management: the case of Denmark. In: Transportation Research Procedia, Proceedings of the 23rd EURO Working Group on Transportation Meeting, EWGT 2020, Paphos, Cyprus (2020)
13. Grant, M., Booth, A.: A typology of reviews: an analysis of 14 review types and associated methodologies. Health Info. Libr. J. 26, 91–108 (2009)
14. Cullinane, K., Cullinane, S.: Atmospheric emissions from shipping: the need for regulation and approaches to compliance. Transp. Rev. 33(4), 377–401 (2013). https://doi.org/10.1080/01441647.2013.806604
15. Piris, A.O., Navamuel, E.D.-R., Pérez-Labajos, C., Chaveli, J.-O.: Reduction of CO2 emissions with automatic mooring systems. The case of the port of Santander. Atmos. Pollut. Res. 9(1), 76–83 (2018)
16. Kuzu, A.C., Arslan, Ö.: Analytic comparison of different mooring systems. In: Global Perspectives in MET: Towards Sustainable, Green and Integrated Maritime Transport, pp. 265–274 (2017)
17. Arduino, G., Carrillo, D., Ferrari, C.: Key factors and barriers to the adoption of cold ironing in Europe. Società Italiana di Economia dei Trasporti e della Logistica - XIII Riunione Scientifica, Messina, 16–17 (2011)
18. Innes, A., Monios, J.: Identifying the unique challenges of installing cold ironing at small and medium ports - the case of Aberdeen. Transp. Res. Part D: Transp. Environ. 62, 298–313 (2018)
19. Baltic Transport Journal (BTJ): Automated mooring in Tallinn. Baltictransportjournal.com. Available at: https://baltictransportjournal.com/index.php?id=776. Accessed 1 Feb 2022
20. Hirvonen, A., Salonen, H., Söderberg, P.: Reducing Air Emissions in a Container Terminal. Overview of Means related to Cargo Handling Equipment. Kalmar, Helsinki (2017)
21. Cederqvist, H., Holmgren, C.: Investment vs. operating costs: a comparison of automatic stacking cranes and RTGs. Available at: https://www.porttechnology.org/technical-papers/investment_vs-_operating_costs_a_comparison_of_automatic_stacking_cranes_an/ (2011)
22. Taner, M.E., Kulak, O., Koyuncuoğlu, M.U.: Layout analysis affecting strategic decisions in artificial container terminals. Comput. Ind. Eng. 75(1), 1–12 (2014)
23. Gaete, M., González-Araya, M., González, R., César, R.: A dwell time-based container positioning decision support system at a port terminal, pp. 128–139 (2017)
24. Blockshipping: Blockshipping - Transforming the Global Container Shipping Industry. Available at: https://www.blockshipping.io/. Accessed 13 Dec 2021
25. Azab, A., Karam, A., Eltawil, A.: A simulation-based optimization approach for external trucks appointment scheduling in container terminals. Int. J. Model. Simul. 40(5), 321–338 (2020)
26. Tsiulin, S., Reinau, K.H., Hilmola, O.P., Goryaev, N.K., Mostafa, A.K.A.: Blockchain in maritime port management: defining key conceptual framework. In: Special Issue of "Blockchain and the Multinational Enterprise", Review of International Business and Strategy 30(2), 201–224 (2020). Emerald
27. Tsiulin, S., Reinau, K.H., Goryaev, N.: Conceptual comparison of Port Community System and blockchain scenario for maritime document handling. In: 2020 Global Smart Industry Conference (GloSIC), pp. 66–71. IEEE (2020)
28. Francisconi, M.: An explorative study on blockchain technology in application to port logistics. Master Thesis, Delft University of Technology (2017)
29. Fattouh, B., Poudineh, R., West, R.: The rise of renewables and energy transition: what adaptation strategy exists for oil companies and oil-exporting countries? Energy Transitions 3(1–2), 45–58 (2019). https://doi.org/10.1007/s41825-019-00013-x
30. Button, S.: No time to dwell - MarineTraffic Blog. Available at: https://www.marinetraffic.com/blog/no-time-to-dwell/ (2019)
31. Rodrigue, J.-P., Notteboom, T.: Port economics, management and policy. Available at: https://porteconomicsmanagement.org/pemp/contents/part3/container-terminal-design-equipment/ (2020)
32. Kim, K.H., Phan, M.-H.T., Woo, Y.J.: New conceptual handling systems in container terminals. Industrial Engineering and Management Systems. Korean Institute of Industrial Engineers (2012). https://doi.org/10.7232/iems.2012.11.4.299
33. Wiese, J., Kliewer, N., Suhl, L.: A survey of container terminal characteristics and equipment types. Working Paper, University of Paderborn (2009)
34. Liebherr: Maritime Cranes, 11 April 2022. Available at: https://www.liebherr.com/en/int/products/maritime-cranes/maritime-cranes.html (2022)

Bridging the Domain Gap for Stance Detection for the Zulu Language

Gcinizwe Dlamini1(B), Imad Eddine Ibrahim Bekkouch2, Adil Khan1, and Leon Derczynski3

1 Innopolis University, Tatarstan, Russian Federation
[email protected]
2 Sorbonne Center for Artificial Intelligence, Sorbonne University, Paris, France
3 IT University of Copenhagen, Copenhagen, Denmark

Abstract. Misinformation has become a major concern in recent years given its spread across our information sources. Many NLP tasks have been introduced in this area, with some systems reaching good results on English-language datasets. Existing AI-based approaches for fighting misinformation in the literature suggest automatic stance detection as an integral first step to success. Our paper aims at utilizing this progress made for English to transfer that knowledge into other languages, which is a non-trivial task due to the domain gap between English and the target languages. We propose a black-box, non-intrusive method that utilizes techniques from Domain Adaptation to reduce the domain gap, without requiring any human expertise in the target language, by leveraging low-quality data in both a supervised and unsupervised manner. This allows us to rapidly achieve results for stance detection for the Zulu language, the target language in this work, similar to those found for English. We also provide a stance detection dataset in the Zulu language. Our experimental results show that by leveraging English datasets and machine translation we can increase performance on both English data and other languages.

Keywords: Stance detection · Domain adaptation · Less resourced languages · Misinformation · Disinformation

1 Introduction

Social media platforms have become a major source of information and news. At the same time, the amount of misinformation on them has also become a concern. Automatic misinformation detection is a challenging task. One way of categorising rumours is into those that can be grounded and verified against a knowledge base, and those that cannot (e.g. due to a lack of knowledge base coverage). This task has traditionally required the work of professional fact-checkers. Their work has recently been complemented by fact-checking systems. However, fact checking only works when there is evidence to refute or confirm a claim. Automated


fact checking relies on databases for this evidence. These databases often lack the information needed, due to e.g. lag or a lack of notability. In this case, another information source is needed. Prior work has hypothesised [30] and shown that the stance, that is the attitude, that people take towards claims can act as an effective proxy for veracity prediction [10,22]. A primitive solution to analyzing texts written in uncommon languages, i.e. those which are not supported by state-of-the-art NLP models, is to translate the text into a language an appropriate NLP model has been trained on. However, this solution is not always effective and does not give good performance. Some of the main difficulties faced by state-of-the-art NLP models (especially in the task of stance detection using translated text from less resourced and endangered languages, LREL) come from: discrepancies induced by the LREL translation process; LREL data scarcity [1]; and the noise that exists in the social media data itself [8]. An alternative approach to solving this problem can be ensemble models for stance detection [19] trained on different languages, combining their respective results. However, we still face the same problem: each model is built from a well-resourced, well-represented language, so if we introduce another language to it, even by translation, its individual performance is not likely to be good, and hence the overall ensemble performance will probably be the same as for the previously suggested solution [27]. Thus, in light of the problem of the wide spread of false rumours and misinformation, and the challenges faced by state-of-the-art NLP models when it comes to LREL, we propose an approach to these challenges in this paper. We propose using Domain Adaptation (DA) [17] for the task of stance detection on LREL Twitter text data. In our approach we use machine translation to translate the model's training data, taking into account the amount of noise this introduces. Our intended stance detection model will be able to learn from a variety of low-quality data, where the low-quality aspect comes from the automated translation. This comes up as a strength of our approach for two reasons: 1) for most social media text classification tasks, precise and canonical forms of expression are not the most important feature for making decisions; 2) the target data (the LREL data) has an element of noise, since it comes from social media and is a direct output of a translator. Our method uses data from multiple languages to build a stance detection model for a less-privileged language. In addition to the proposed Domain Adaptation technique for stance detection, we contribute a new dataset for stance detection in a language previously without resources. The structure of this paper is as follows: Sect. 2 outlines the related works. Sect. 3 describes the model architecture, methodology and datasets used in detail, while Sect. 4 presents the obtained results followed by a discussion. In Sect. 5 we conclude and discuss possible future directions and insights from the paper's results.
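As a purely illustrative sketch of this translate-then-train idea: the `translate` helper below is a placeholder for any machine-translation backend, and the multilingual checkpoint, toy examples and label set are assumptions made for the example, not the exact setup used later in this paper.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["IN-FAVOR", "AGAINST", "NONE"]

def translate(text, target_lang="zu"):
    """Placeholder for an MT system producing (noisy) target-language text;
    returning the input unchanged just keeps the sketch runnable."""
    return text

# Toy labelled English stance pairs (claim, reply, label).
english_data = [
    ("The new policy starts Monday.", "Official sources confirm this.", "IN-FAVOR"),
    ("Vaccines cause illness.", "This claim has been debunked repeatedly.", "AGAINST"),
]

# Build a noisy target-language training set; the MT noise is tolerated by design.
zulu_data = [(translate(c), translate(r), y) for c, r, y in english_data]

# A multilingual encoder can then be fine-tuned on the mixed original + translated data.
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LABELS))
enc = tok([c for c, _, _ in english_data + zulu_data],
          [r for _, r, _ in english_data + zulu_data],
          padding=True, truncation=True, return_tensors="pt")
logits = model(**enc).logits  # shape: (batch, 3); the training loop is omitted here
```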

2 Related Work

Researchers have approached the problems of fake news detection, misinformation and stance detection from many points of view. Our approach combines concepts from different areas of research, mainly Domain Adaptation [3], Domain Generalization [21] and Domain Randomization [32], together with older and newer techniques in which data from one language was used to improve performance on another, or data from one task to improve performance on another [23].

2.1 Domain Generalization, Adaptation, Randomization

These three fields of research are tightly related; the basic idea behind them is to use multiple easy-to-collect data sources to improve performance on a dataset that is harder to collect (or label). Domain Adaptation (DA) [3] is the most widely researched topic, and specifically Unsupervised Domain Adaptation (UDA), where we use both the data and the labels from a source domain, use only the data of a target domain without its labels, and try to build a model that performs well on both domains. The most common method for performing UDA is to utilize Generative Adversarial Networks [12] in multiple ways [16], but there are also simpler and faster techniques based only on adversarial losses. Domain Generalization, on the other hand, uses multiple source domains and aims at generalizing to unseen target domains. The current state of the art in DG leans towards improving the ability to learn and merge knowledge from both supervised and unsupervised learning: the supervised part classifies samples into their corresponding labels, whereas the unsupervised part leverages only the data in many ways, one of which is reorganising the images into a jigsaw puzzle and training a classifier to solve it [6]. Domain Randomization [31] is the extreme case where we only have access to one domain during training and we want to improve results on unseen domains. Most of these techniques agree on the fact that having a lot of messy and imperfect data that comes from multiple sources improves accuracy and allows the model to perform better on real data [32].

2.2 Stance Detection

One of the core approaches to automatic fake news assessment is stance detection, which is the extraction of a subject's reaction to a claim made by a primary actor [2]. In recent years, the main approach taken by NLP researchers has shifted away from hand-engineered techniques and towards Deep Learning: taking two text sequences, encoding them in some form (mainly by adding a mask that determines which word belongs to which sentence), and then estimating the type of relationship that joins them. One way of formulating stance detection was to divide it into two subtasks [25]. Task A is fully supervised: classify labeled texts into "IN-FAVOR",


"AGAINST", "NONE". The texts belong to five topics: "Atheism", "Climate Change is a Real Concern", "Feminist Movement", "Hillary Clinton", and "Legalization of Abortion". Task B, on the other hand, aims at solving weakly supervised stance detection by training on the data from Task A plus unlabeled data from another topic and measuring the performance on that topic. The goal of this paper is a next step on top of these two tasks: sharing the knowledge from labeled data with another dataset that comes from a totally different language, not just a different topic. Zhou et al. [33] introduced an approach based on convolutional neural networks (CNN) and attention to detect stances in Twitter. Their proposed approach addresses a challenge faced by CNNs, namely generating and capturing high-quality word embeddings having global textual information. Less Resourced and Endangered Languages are not addressed in their proposed approach for stance detection. Other major efforts in stance detection include the RumourEval challenge series [13]. Work on LRELs includes datasets in Czech [14] and Turkish [20].

2.3 Explicit and Implicit Transfer Learning for NLP

Computational Language Processing (CLP) [11] techniques were adapted in the early 2000s as a first approach to transferring knowledge from one language to another, as seen in the 'MAJO system', which uses the similarities between Japanese and Uighur to substantially improve performance [23]. Although non-generic and completely human-centric, this idea is one of the first approaches to improving results on an LREL by using a more popular and widely used language. A more recent approach is Bidirectional Encoder Representations from Transformers (BERT) [9]. BERT is built on the idea of a general-purpose model for NLP that is trained on large amounts of text data and can easily be fine-tuned to downstream NLP tasks such as stance detection. BERT also has a multilingual model trained on data from 104 languages. Another very popular approach to inductive transfer learning in NLP is Universal Language Model Fine-tuning (ULMFiT) [15], which aims at reducing the amount of labeled data needed for building any NLP task, and especially for classification. It leverages the existence of large amounts of unsupervised text that can be used to train a general-purpose model, which can later be fine-tuned in two steps: the first is unsupervised, where the model learns from raw text related to the problem; the second is supervised, where we train the final layers of the model using a small amount of labeled data. For example, we train a general model on Wikipedia texts and then fine-tune it for emails by using a large amount of unlabeled emails in the first step and just a handful of labeled emails in the second. From all the existing state-of-the-art approaches in stance detection, and the breakthroughs of transfer learning and domain generalization in many different domains, we have found that there still exists a need to cover LRELs in NLP, specifically in stance detection. With our approach we hope to contribute to the existing methods and help the research community


develop methods for fighting misinformation, detecting fake news, and understanding social media content better than existing state-of-the-art methods.

3 Architecture, Methodology and Dataset

Our method consists of a two-step process. First, we create the training dataset, which is retrieved and joined from multiple languages in order to train a more resilient stance detection model. Second, we build the training pipeline, in which the chronology for implementing stance detection for less-privileged languages is outlined. We use Zulu as a case study for a less-privileged language. The Zulu language [7] is a good example of a Less Resourced and Endangered Language (LREL) [4] and an uncommon language found on social media. Zulu is spoken by an estimated 10 million speakers, mainly located in the province of KwaZulu-Natal of South Africa and in Swaziland. Zulu is recognised as one of the 11 official South African languages and is understood by over 50% of the South African population. Interestingly, there are not enough Zulu language resources on the internet to help build major NLP models; and even though Google has done an amazing job providing models that can be fine-tuned for specific tasks in more than 104 languages using BERT-multilingual [29], Zulu and many other languages were not part of their research.

3.1 Step 1: Build the Training Dataset

Our aim is to have a dataset large enough to expose an underlying structure of a given topic in the LREL for any NLP model to capture and generalize. For this reason we gather our dataset from multiple sources, chosen randomly from the languages supported by Google Translate. The total number of sources ranges from 2 to 106. More formally, we will have a training dataset coming from $N$ languages, where $N_i$ is the number of labeled samples in the $i$-th source dataset, such that

$$X_i^s = \{(x_{i,j}^s, y_{i,j}^s)\}_{j=1}^{N_i}$$

where $x_{i,j}^s$ denotes the $j$-th text sample from the $i$-th source dataset and $y_{i,j}^s$ is its label. In the case where there is just one source dataset $X_i^s$ that gets translated into other languages, we denote the $k$-th translated version as $X_{ik}^s$. We also denote by $M$ the number of labeled samples in the target dataset and

$$X^t = \{(x_j^t, y_j^t)\}_{j=1}^{M}$$

where $x_j^t$ denotes the $j$-th text sample from the target dataset and $y_j^t$ is its label. In adoption of the ensemble learning principle of using different training data sets [18], we select data sources with different marginal distributions (i.e.


they have some dissimilarities between them in terms of syntax rules, language family tree, etc.) drawn from stance detection datasets. These differences in marginal distributions are at the core of the strength of our stance detection model: since the input of the model will be a translator's output, we force the model to actually learn from translator results. This idea can also be used when the input data is imperfect; for example, if a model is deployed on a task where the target audience are not language experts, they will most likely make many mistakes in their writing, and the model should be familiar with these mistakes and be resilient to them. Such mistakes can easily be generated with translators, where we translate from language A to language B and then back again, giving us more variety in the data; this can be considered a data augmentation technique. Modeling this domain gap in the training data can be done in one of two ways, which are shown in Fig. 1. The domain gap in text datasets can be anything from the length of sentences to sentiment changes and even differences in the grammatical correctness of the sentences. In our case we are mostly focused on modeling the errors that a translator might make, and putting them in our training data so that we can correctly classify them at inference time. Domain Generalization: For DG we have available multiple datasets that are similar in structure and purpose but come from different languages; we translate them all to English and build our classifier. We denote by nG the number of datasets used to build the model. Although there are no constraints on which family a language comes from, empirical results show that using datasets from different families helps generalize better to unseen datasets, while using datasets from the same family as the target language helps generalize better to it. Domain Randomization: In this case we only have one dataset from one language, so we apply randomization to increase the size of the dataset and make it more inclusive of mistakes. We do that by translating the dataset into multiple intermediate languages and then translating those into English; see the sketch below. The results empirically appear to contain many mistakes, but the huge increase in size makes up for it. We denote by nR the number of intermediate languages used to build the model. The same remarks as for DG apply: using intermediate languages that are quite different allows the model to learn from an even richer dataset and generalize better to unseen domains overall, whereas for a specific target domain, using languages from the same family allows the model to perform better on that target dataset.
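To make the construction concrete, the following minimal sketch builds such a randomized training set by round-tripping every sample through a set of intermediate languages. The `translate` helper, the language codes and the variable names are illustrative assumptions; any machine-translation backend (such as the Google Translate API used in this work) could stand behind it.

```python
# Sketch of the domain-randomization dataset construction (Step 1).
# `translate(text, src, dest)` is a placeholder for any MT backend.

def translate(text: str, src: str, dest: str) -> str:
    """Placeholder: return `text` translated from `src` into `dest`."""
    raise NotImplementedError("plug in a machine-translation API here")

def randomize_dataset(samples, labels, intermediate_langs, source_lang="en"):
    """Return the original samples plus one round-tripped copy per
    intermediate language (source -> intermediate -> English)."""
    aug_samples, aug_labels = list(samples), list(labels)
    for lang in intermediate_langs:              # degree of randomization n_R
        for text, label in zip(samples, labels):
            pivot = translate(text, src=source_lang, dest=lang)
            back = translate(pivot, src=lang, dest="en")
            aug_samples.append(back)             # noisy but label-preserving copy
            aug_labels.append(label)
    return aug_samples, aug_labels

# Hypothetical usage with n_R = 3 intermediate languages:
# X_aug, y_aug = randomize_dataset(tweets, stances, ["zu", "xh", "af"])
```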

3.2 Step 2: Build the Training Pipeline

Now that the dataset is ready, the next step is to build a pipeline that inputs the dataset, cleans it, tokenizes it, converts it into word embeddings and trains the model on it. This step is the reason why we convert all of our datasets into English rather than the target language itself or some other language, given

Fig. 1. Dataset construction process for domain generalization and domain randomization. (a) Domain Generalization: multiple source languages, all converted into English to use pre-trained models. (b) Domain Randomization: one source language to which we add multiple translated versions as data augmentation.

the fact that the tokenization process requires hand-crafted, man-made rules and knowledge about the constructs of the language, which is something that cannot currently be done automatically for all languages, and especially not for LRELs. Regarding model choice, our method is model-generic, meaning it can be used with any model given its non-intrusive property. For the purpose of this research we use the same architecture as ULMFiT: a three-layer LSTM model pre-trained on the wikitext-103 dataset [24] (a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia). This model is then fine-tuned on the training dataset in an unsupervised fashion, and finally we add a 2-layer fully connected neural


network as the classifier, which is carefully fine-tuned on the training dataset using gradual unfreezing of layers [28]. The final part is to introduce a new loss which operates on the final layers of the model. It acts as an enforcement for the model to dismiss the noise in the data and only extract features useful for classification. The new loss is inspired by Linear Discriminant Analysis (LDA) to capture separability (based on the assumption that samples which are closer to each other in the latent space are classified similarly); Fisher defined an optimization function to maximize the between-class variability and minimize the within-class variability regardless of the source of the sample. The separability loss is defined as follows:

$$L_{sep}(W) = \mathbb{E}\left[\frac{\sum_{i \in Y} \sum_{z_{ij} \in Z_i} d(z_{ij}, \mu_i)}{\sum_{i \in Y} d(\mu_i, \mu)}\right] \times \lambda_{BF}, \qquad \lambda_{BF} = \frac{\min_i |Y_i^t|}{\max_i |Y_i^t|}$$

where $Z_i$ is the set of latent variables that belong to class $i$, which can be expressed as

$$Z_i = \bigcup_{s=1}^{N} Z_i^s \;\cup\; \bigcup_{t=1}^{M} Z_i^t$$

i.e. the union of the sets of latent representations from all language domains that have the same label $i$. $\mu_i$ is the mean of the latent representations that have label $i$, so it can be expressed as $\mu_i = \mathrm{mean}(Z_i)$, while $\mu$ is the mean of all the latent representations, $\mu = \mathrm{mean}(Z)$. The exact calculation of such a loss can be hard, so for simplicity we compute this loss per batch, and in order to mitigate the drain of information we use a balancing factor $\lambda_{BF}$, which proved to be useful in other applications of the separability loss [3]. $d(\cdot,\cdot)$ is the cosine dissimilarity, which is used as a measure of distance between the samples. This loss allows us to increase the separability of the latent space according to the labels, regardless of domain (language), which in turn helps the classifier to generalize and provide more resilient and higher performance.
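A minimal batch-wise sketch of this separability loss is given below, assuming PyTorch tensors: `features` are the latent representations of the batch, `labels` their class indices, and `lambda_bf` the precomputed balancing factor. The names and structure are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def separability_loss(features, labels, lambda_bf):
    """Batch-wise separability loss: within-class cosine dissimilarity to the
    class mean, divided by the dissimilarity of class means to the global
    mean, scaled by lambda_BF = min_i |Y_i| / max_i |Y_i|."""
    mu_global = features.mean(dim=0, keepdim=True)
    within, between = 0.0, 0.0
    for c in labels.unique():
        z_c = features[labels == c]
        mu_c = z_c.mean(dim=0, keepdim=True)
        # cosine dissimilarity d(a, b) = 1 - cos(a, b)
        within = within + (1 - F.cosine_similarity(z_c, mu_c)).sum()
        between = between + (1 - F.cosine_similarity(mu_c, mu_global)).sum()
    return lambda_bf * within / between.clamp_min(1e-8)
```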

3.3 Dataset

Source Domain Dataset. The source domain dataset, which is the training set of our model used to predict tweet stances on the unseen domain, is a dataset of English tweets. We use the Semantic Evaluation competition training dataset. The data has 4163 samples with three features, ID, Target and Tweet, and Stance as the response variable. Each tweet in the dataset has a target and can be classified into a stance class. There are five targets and three stance classes. A summary of the SemEval-2016 dataset is presented in Fig. 2. Target Domain Dataset. The target domain dataset language is Zulu. The Zulu language is a tonal language which also features click consonants. In English-language literature the noises of these click consonants are conveyed


with the word 'tut!'. The Standard Zulu that is taught in schools is also referred to as 'deep Zulu', as it uses many older Zulu words and phrases and is altogether a purer form of the language than many of the dialects used in common speech. The Zulu language relies heavily on tone to convey meaning, but when the language is written down, often no tones are conveyed in the writing. This means that the speaker must have a good understanding of spoken Zulu before being able to read the Zulu language fluently, which is unusual among languages. All these interesting aspects of our chosen target language make it a perfect choice for Domain Generalization tasks, since there exists a wide domain gap between it and the English language (our source domain). For our approach, we randomly sampled 1343 tweets from the Semantic Evaluation competition test dataset (SemEval-2016) [25]. We translated the tweets into Zulu with the help of the Google Translate API, together with a native Zulu speaker, to try to minimize the grammatical errors of the Google translator. Examples of tweets with Zulu translation:

Atheism : AGAINST
English : The humble trust in God: 'Whoever leans on, trusts in, and is confident in the Lord happy, blessed, and fortunate is he'
Zulu : Abathobekile bathembela kuNkulunkulu: 'Noma ngubani oncika, athembela kuye, futhi othembela eNkosini uyajabula, ubusisiwe, futhi unenhlanhla'

Legalization of Abortion : FAVOR
English : Would you rather have women taking dangerous concoctions to induce abortions or know they are getting a safe & legal one?
Zulu : Ngabe ungathanda ukuthi abesifazane bathathe imiqondo eyingozi ukukhipha izisu noma wazi ukuthi bathola ephephile futhi esemthethweni?

Feminist Movement : NONE
English : Some men do not deserve to be called gentlemen
Zulu : Amanye amadoda awakufanele ukubizwa ngokuthi ngamanenekazi

The distributions of Stances and Targets in our final target domain data are shown in Fig. 2.

3.4 Baselines

For the purpose of evaluating our method, we compare its results against multiple baselines that represent estimates of a lower bound and an upper bound on the performance. Lower Bound Baselines. Our method should perform better than a model directly trained on the English dataset and tested on the Zulu dataset (translated to English), and it should outperform a model trained and tested only on the Zulu dataset, given that we do not have enough data to fully train such a model. That is why we use these models as our lower bound. These models are denoted as DLB (Direct Lower Bound).

Fig. 2. Dataset comparison between the source and the target dataset (portion of each stance per tweet target). (a) Source domain: overall balanced, with less data in favor of legal abortion. (b) Target domain: not as well balanced as the source dataset, with a big lack of data for 'against' in the 'Climate Change' topic and for both 'not-related' and 'in-favor' of the 'Atheism' topic.

Upper Bound Baselines. At the same time, our model should not be able to exceed the performance of a model trained and tested on the same distribution, i.e., trained and tested on English without any translation from other languages. This model is denoted as DUB (Direct Upper Bound).


Table 1. Evaluation of domain randomization. The index i of a randomized dataset denotes the degree of randomization. The symbol / denotes a result that is not reported because it is not accessible via the evaluation script of the original dataset; the symbol – denotes an experiment that is not feasible. DUB and DLB denote the baselines described in Sect. 3.4.

Tested on English:

Trained on               | F1-score | Accuracy | FAVOR-F1-score | AGAINST-F1-score
Zulu-Only (DLB)          | –        | –        | –              | –
English-Only (DUB/DLB)   | 0.5792   | /        | 0.4476         | 0.6908
Randomized-English-1     | 0.5686   | /        | 0.4591         | 0.6951
Randomized-English-2     | 0.5993   | /        | 0.4626         | 0.6999
Randomized-English-3     | 0.6070   | /        | 0.4658         | 0.7083
Randomized-English-4     | 0.6125   | /        | 0.4656         | 0.7243
Randomized-English-5     | 0.6296   | /        | 0.4710         | 0.7201
Randomized-English-Zulu  | 0.5690   | /        | 0.4586         | 0.7083

Tested on Zulu:

Trained on               | F1-score | Accuracy | FAVOR-F1-score | AGAINST-F1-score
Zulu-Only (DLB)          | 0.3942   | 0.4861   | 0.1929         | 0.4259
English-Only (DUB/DLB)   | 0.4906   | 0.5258   | 0.3749         | 0.5742
Randomized-English-1     | 0.5061   | 0.5386   | 0.3932         | 0.5987
Randomized-English-2     | 0.5112   | 0.5443   | 0.3923         | 0.6147
Randomized-English-3     | 0.5087   | 0.5512   | 0.3852         | 0.6090
Randomized-English-4     | 0.5186   | 0.5635   | 0.4038         | 0.6204
Randomized-English-5     | 0.5293   | 0.548    | 0.4014         | 0.6286
Randomized-English-Zulu  | 0.5493   | 0.5596   | 0.4228         | 0.6423

4 Evaluation and Results

We compare our model with several baselines for stance detection and transfer learning, using the four metrics used for the stance detection challenge.

4.1 Domain Randomization

We evaluate our stance detection model using Domain Randomization, where we use the English dataset for Tweet Stance Detection and test on both its testing data and our Zulu dataset for Tweet Stance Detection. We notice that the more randomization we add, the better the model generalizes both to itself and to the Zulu dataset, as the results in Table 1 show. Results are reported with macro-F1-score and accuracy along with Favor-F1-score and Against-F1-score, because the main challenge reported these metrics. The different experiments were all conducted on the same dataset by adding different translated versions on top of it. The results reported are the mean of five random runs of the models; we noticed that in some experiments the results on the English dataset drop because the quality of the data changes drastically when the target-language translation quality is too poor. We also noticed that by removing @, # and some other symbols, the accuracy increases and becomes more stable over several re-runs. It is also worth mentioning that the best results on the Zulu dataset (F1-score = 0.5634) were achieved with Domain Randomization towards English (the original dataset), Zulu, Xhosa, Shona and Afrikaans [26]. Although we have no way of pinning down the reason for this increase, it was most likely achieved due to the similarities of the languages, since they are all African languages (even though Afrikaans is very different from Zulu), and due to the homogeneous performance of Google Translate on these languages [5]. Some of the drawbacks of the Domain Randomization technique are that it takes up to 100 times more time per epoch to train the model, especially in the first unsupervised part, and requires 3 to 5 times more epochs to converge (so that the accuracy and loss are no longer changing a lot), plus the translation overhead, which


can take hours and can run into IP-address blocking by the Google API. Another potential issue is that after a certain degree of randomization (nR = 16 in our case) the model's performance drops drastically, even on the training data.

4.2 Domain Adaptation

We implemented a supervised domain adaptation scenario where we train on both the English and Zulu datasets, with access to the labels of both. We used 70% of the Zulu dataset for training and 30% for testing. This is the only experiment where we used the Zulu dataset as part of the training set, unlike the Randomized-English-Zulu experiment, where we only used Zulu as an intermediate language for the randomization process.

Table 2. Domain adaptation results on the Zulu-30 test dataset.

          | F1-score | Accuracy | FA-F1-score | AG-F1-score
Eng-Only  | 0.4906   | 0.5258   | 0.3749      | 0.5742
Eng-Zulu  | 0.5673   | 0.5448   | 0.4434      | 0.7469

The results in Table 2 are the output of k-fold cross-validation with k set to 10. The Domain Adaptation results show a clear increase in the performance on the Zulu test dataset (referred to as Zulu-30); it even outperforms the best domain randomization models evaluated in Table 1.

5 Conclusions

We have proposed a non-intrusive method for improving results on Less-Resourced/Endangered Languages by leveraging low-quality data derived from English datasets, using Zulu as a demonstration case. We also provide a new dataset for Stance Detection in Zulu. Our method was able to effectively transfer knowledge between different languages. In future work, we aim to improve the technique by adding more intrusive techniques, such as distribution matching between the source and target domains in the latent space by reducing the KL divergence. Acknowledgments. This research was supported by the Independent Danish Research Fund through the Verif-AI project grant.

References

1. Allah, F.A., Boulaknadel, S.: Toward computational processing of less resourced languages: primarily experiments for Moroccan Amazigh language. Text Mining. Rijeka: InTech, pp. 197–218 (2012)


2. Augenstein, I., Rockt¨ aschel, T., Vlachos, A., Bontcheva, K.: Stance detection with bidirectional conditional encoding. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, volume abs/1606.05464 (2016) 3. Bekkouch, I.E.I., Youssry, Y., Gafarov, R., Khan, A., Khattak, A.M.: Triplet loss network for unsupervised domain adaptation. Algorithms 12(5), 96 (2019) 4. Besacier, L., Barnard, E., Karpov, A., Schultz, T.: Automatic speech recognition for under-resourced languages: a survey. Speech Commun. 56, 85–100 (2014) 5. Bourquin, W.: Click-words which xhosa, zulu and sotho have in common. Afr. Stud. 10(2), 59–81 (1951) 6. Carlucci, F.M., D’Innocente, A., Bucci, S., Caputo, B., Tommasi, T.: Domain generalization by solving jigsaw puzzles. CoRR, abs/1903.06864 (2019) 7. Cope, A.T.: Zulu phonology, tonology and tonal grammar. Ph.D. thesis, University of Durban (1966) 8. Derczynski, L., Maynard, D., Aswani, N., Bontcheva, K.: Microblog-genre noise and impact on semantic annotation accuracy. In: Proceedings of the 24th ACM Conference on Hypertext and Social Media, pp. 21–30. ACM (2013) 9. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding (2018) 10. Dungs, S., Aker, A., Fuhr, N., Bontcheva, K.: Can rumour stance alone predict veracity? In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 3360–3370 (2018) 11. Ferraro, J.P., Daum´e, H., III., DuVall, S.L., Chapman, W.W., Harkema, H., Haug, P.J.: Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation. J. Am. Med. Inform. Assoc. 20(5), 931–939 (2013) 12. Goodfellow, I.: Nips 2016 tutorial: generative adversarial networks. arXiv preprint arXiv:1701.00160 (2016) 13. Gorrell, G., et al.: Semeval-2019 task 7: rumoureval, determining rumour veracity and support for rumours. In: Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 845–854 (2019) 14. Hercig, T., Krejzl, P., Kr´ al, P.: Stance and sentiment in czech. Computaci´ on y Sistemas 22(3) (2018) 15. Howard, J., Ruder, S.: Fine-tuned language models for text classification. CoRR abs/1801.06146 (2018) 16. Hu, L., Kan, M., Shan, S., Chen, X.: Duplex generative adversarial network for unsupervised domain adaptation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1498–1507, June 2018 17. Kochkina, E., Liakata, M., Augenstein, I.: Proceedings of the 11th International Workshop on Semantic Evaluation (semeval-2017). In CoRR volume abs/1704.07221 (2017) 18. Kotu, V., Deshpande, B.: Chapter 2 - data mining process. In: Kotu, V., Deshpande, B. (eds.) Predictive Analytics and Data Mining, pp. 17–36. Morgan Kaufmann, Boston (2015) 19. Kotu, V., Deshpande, B.: Chapter 2 - data science process. In: Kotu, V., Deshpande, B. (ed.) Data Science, 2nd edn., pp. 19 – 37. Morgan Kaufmann (2019) 20. K¨ u¸cu ¨k, D.: Stance detection in Turkish tweets. arXiv preprint arXiv:1706.06894 (2017) 21. Li, D., Yang, Y., Song, Y.-Z., Hospedales, T.M.: Deeper, broader and artier domain generalization. CoRR, abs/1710.03077 (2017)


22. Lillie, A.E., Middelboe, E.R., Derczynski, L.: Joint rumour stance and veracity prediction. In: Proceedings of the 22nd Nordic Conference on Computional Linguistics (NoDaLiDa), pp. 208–221 (2019) 23. Mahsut, M., Ogawa, Y., Sugino, K., Inagaki, Y.: Utilizing agglutinative features in Japanese-Uighur machine translation. Proc. MT Summit 8, 217–222 (2001) 24. Merity, S., Xiong, C., Bradbury, J., Socher, R.: Pointer sentinel mixture models. CoRR, abs/1609.07843 (2016) 25. Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., Cherry, C.: Semeval-2016 task 6: detecting stance in tweets. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 31–41 (2016) 26. Niesler, T., Louw, P., Roux, J.: Phonetic analysis of Afrikaans, English, Xhosa and Zulu using south African speech databases. South. Afr. Linguistics Appl. Language Studi. 23(4), 459–474 (2005) 27. Nisbet, R., Elder, J., Miner, G.: Chapter 13 - model evaluation and enhancement. In: Nisbet, R., Elder, J., Miner, G. (eds.) Handbook of Statistical Analysis and Data Mining Applications, pp. 285–312. Academic Press, Boston (2009) 28. Peters, M.E., Ruder, S., Smith, N.A.: To tune or not to tune? adapting pretrained representations to diverse tasks. CoRR, abs/1903.05987 (2019) 29. Pires, T., Schlinger, E., Garrette, D.: How multilingual is multilingual bert? CoRR, abs/1906.01502 (2019) 30. Qazvinian, V., Rosengren, E., Radev, D.R., Mei, Q.: Rumor has it: identifying misinformation in microblogs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1589–1599. Association for Computational Linguistics (2011) 31. Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. CoRR, abs/1703.06907 (2017) 32. Weng, L.: Domain randomization for sim2real transfer. lilianweng.github.io/lil-log (2019) 33. Zhou, S., Lin, J., Tan, L., Liu, X.: Condensed convolution neural network by attention over self-attention for stance detection in twitter. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, July 2019

Arcface Based Open Set Recognition for Industrial Fault

Jeongseop Yoon1(B), Donghwan Kim1, and Daeyoung Kim2

1 Research Team, BISTelligence, 128, Baumoe-ro, Seocho-gu, Seoul, Republic of Korea
[email protected]
2 Aidentyx, San Jose, USA
[email protected]

Abstract. In industry, fault classification is important to avoid economic losses. Fault type classification matters because detecting and classifying faults before equipment shutdown allows accurate maintenance. However, it is difficult to define all fault types in advance; it is impossible to know in advance every kind of fault that will occur. Therefore, we propose an Arcface-based open set recognition method: an algorithm that can classify a known fault type or an unknown fault type by fusing a deep learning-based classification model with a distribution model that can estimate whether an input belongs to a known type. We apply the proposed method to the AHU dataset. The proposed model shows better performance compared to existing methods.

Keywords: Fault classification · Open set recognition · Arcface

1 Introduction

Fault Classification. Fault classification is an important problem in the industrial field. When industrial equipment continues to operate in a faulty state or breaks down due to a fault, it causes huge economic losses. For example, in the case of the air-conditioning equipment found in almost all buildings, even a slight breakdown results in a very inefficient situation in terms of energy efficiency. In addition, a fault of equipment in the manufacturing field causes not only equipment repair, but also economic damage such as production delay and labor cost. Therefore, many studies have been conducted to prevent such cases. In [1–4], machine learning and deep learning algorithms were proposed to classify faults occurring in the industrial field; in addition, [4] showed how to optimize an algorithm for fault classification of air-conditioning equipment. As such, many previous studies focused on finding the optimal classification algorithm. Studies have classified fault types using existing machine learning algorithms such as one-class Support Vector Machine (SVM), Decision Tree (DT), and Bayesian Networks. Relatively recently, an algorithm showing excellent performance through a network that selects and classifies features with


a 1D-Convolutional Neural Network (CNN) has been proposed, and a Generative Adversarial Network (GAN) has been proposed to counter the degradation of classification performance caused by a lack of fault data. However, it is difficult for these approaches to solve the problems that occur in the real world, because in the real world unknown faults occur. Since it is difficult to define all faults in advance, existing methods struggle to classify faults that are not known in advance. Therefore, we define this problem as open set recognition and deal with it in detail in the section below. Open Set Recognition. In the real world, we never know what fault will happen. It is practically impossible to define every fault class of industrial equipment before the fault event, because no one knows what fault classes will occur. If a trained classification model does not correctly classify or detect an unknown fault, a fault in the equipment can result in a breakdown. Therefore, when data of an unknown fault class is input, it is necessary to classify it as an unknown fault. As illustrated in Fig. 1, a problem like this can be defined as Open Set Recognition (OSR). OSR is a field that researches algorithms that can classify an input as Unknown, rather than as one of the known classes, when a class that did not exist when the model was trained appears in the test data. Until now, many OSR studies have been conducted in actively researched fields such as image data [8,10], but studies related to time-series fault data have not been conducted sufficiently. Therefore, in this paper, existing OSR algorithms were applied to time-series fault data, and the performance of the algorithm proposed in this paper was demonstrated and its results analyzed. That is, the purpose of this paper is to solve, in practice, the classification problem for unknown fault classes as well as known classes. The contributions of the proposed method are as follows:
1. We propose a multi-class classification deep learning model for an industrial fault dataset.
2. We propose an algorithm that can reliably classify an unknown fault class in the industrial domain.
3. Our proposed method improves classification performance by applying Arcface to the classification model for a manufacturing-domain dataset.

2 Related Work

Until now, many fault classification studies related to chiller systems have been conducted. In the early days, there were many studies on classification models for normal and abnormal states; more recently, studies that classify which fault class has occurred are being actively conducted. Fault Classification with Machine Learning. Yan et al. [19] selected features suitable for classification with a Kalman Filter and performed fault detection with a Recursive One-Class SVM. This is an approach that learns only normal data and detects faults from the loss value, which increases when fault data is input. However, this approach is not suitable in the real world, because it is necessary to know the fault type.


Fig. 1. A conceptual diagram of open set recognition. When unknown data is input, the left model (without open set recognition) classifies it as an already known class, whereas the right model (with open set recognition) can classify it as unknown data.

Li et al. [9] proposed an algorithm for classifying types based on clusters after projecting high-dimensional data to a low-dimensional space using Linear Discriminant Analysis (LDA). Fault Classification with Deep Learning. Yan et al. [17] performed data augmentation with CWGAN and showed that performance improves when the existing classification model (SVM) additionally learns from data generated through CWGAN. However, in a situation where a clear criterion for the fault data is not defined, it is difficult to determine whether the generated dataset is meaningful. Wang et al. [16] performed fault classification using a 1D CNN-GRU model. They studied it from the perspective of hyper-parameter optimization so that the model can show the best performance. They performed performance comparisons with many recently released algorithms and showed excellent performance; even for data that is difficult to classify (low fault severity), their model outperformed other algorithms. Traditional OSR. Early Open Set Recognition (OSR) algorithms attempted to create an independent binary classification model for each class. In this one-vs-all methodology, if there are k classes, k models are created, and if all models reject an input, it is classified as an unknown class. As such, many studies attempted OSR problems with traditional machine learning algorithms. The method in [10] establishes a one-class SVM and defines an unlearned class as an unknown class; it is similar to the traditional OSR method described above and is characterized by using the SVM algorithm. Fei et al. [7] constructed a binary classifier for each class through a Center-Based-Similarity (CBS)


spatial learning method, obtaining a similarity vector to the center of each class of data; data is classified as unknown when all models reject it. This method is difficult to use when spatial characteristics are hard to define as the data dimension increases. In Scheirer et al. [12], a binary SVM classifier is trained for each class. After that, an Extreme Value Theory (EVT) model for each class is obtained, and data rejected by all models are classified as unknown. OSR with Deep Learning. As described above, many previous studies attempted to solve the OSR problem with binary classification models. Recently, as the CNN structure of deep learning has shown excellent performance in multi-class classification problems, OSR problems are solved using a single model rather than a multi-model (one-vs-all) approach. Bendale et al. [1] applied the EVT algorithm to the softmax values of a deep learning model and proposed a scoring method for unknown and known classes. They proposed an approach that can achieve good enough performance without the need to manage many models or run an optimization process. It shows decent performance for image data, but there is no research history for fault data, so in this paper the Bendale et al. [1] algorithm was implemented and compared. In addition, Shu et al. [13] proposed a method of classifying data below a threshold into an unknown class by using a threshold on the sigmoid values for multi-classification. Since this requires finding the optimal threshold for each dataset, it is difficult to apply in the real world. Dhamija et al. [4] proposed a method of learning the space where the unknown class will be embedded using background data (data of a class that is not included in the training data among the available datasets). This is an approach that uses uninteresting data to determine whether newly input data belongs to a known class or an unknown class, and it effectively solves the OSR problem. However, it is difficult to apply in a field with little fault data in the first place, such as the fault classification problem: since the number of fault samples is small, excluding some of them from training causes difficulties in subsequent classification tasks. OSR with Fault Classification. The authors of [5] proposed a fault classification system using an Autoencoder and an ANN. When the reconstruction error exceeds a threshold, the fault classifier is activated to classify known/unknown faults. Since it tries to solve the fault detection, classification and OSR problems together, it did not show high performance. Yan et al. [20] proposed a method of retraining the model so that the decision boundaries of the model become more clearly separated by adding a noisy dataset to the trained model.

3 Method

The proposed method operates differently in the training process and the test process; an explanation is given in Fig. 2. In the training process, the classification model is trained and the Cumulative Distribution Function (CDF) model is constructed using the features of each class's data. Then, in the test process, open set recognition is performed using the trained classification model and the constructed CDF model.


Fig. 2. The framework of the proposed method

3.1 Proposed Fault Classification Approach

The classification model we propose is shown in Fig. 3. The classification model of this study uses a structure similar to the algorithm proposed by Wang et al. [16]. Since that structure shows high performance on the same dataset (AHU), it was adopted for the efficiency of this study. In addition, since the novelty of this paper is not in the classification model itself, but in the classification process for an unknown class, better or worse results may be obtained when other classification models are used instead of the adopted one.

Fig. 3. The classification model structure of the proposed method (1D CNN-GRU)
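The exact layer sizes of the model in Fig. 3 are not reproduced here; the sketch below only illustrates the general 1D CNN-GRU structure (convolutional feature extraction over the sensor window, a GRU, a dense penultimate layer producing the activation vector, and a classification head), with all dimensions chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn

class CNNGRUClassifier(nn.Module):
    """Illustrative 1D CNN-GRU classifier: a window of multivariate sensor
    data -> conv features -> GRU -> dense penultimate layer -> class logits."""
    def __init__(self, n_features=10, n_classes=5, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.gru = nn.GRU(input_size=32, hidden_size=hidden, batch_first=True)
        self.penultimate = nn.Linear(hidden, hidden)   # activation vector (AV)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, window=10, n_features)
        h = self.conv(x.transpose(1, 2))   # -> (batch, 32, window)
        _, h_n = self.gru(h.transpose(1, 2))
        av = torch.relu(self.penultimate(h_n[-1]))
        return self.head(av), av           # logits and activation vector
```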

3.2 Proposed Open Set Recognition Method

The EVT algorithm is applied by extracting the Activation Vector (AV, from the Dense layer, called the Penultimate Layer) of each class from the trained classification model. First, for the classification of known/unknown classes, the center of each class is obtained by applying the Nearest Class Mean [11,14] algorithm, and then the distance from the center is computed for the data of the corresponding class. A Weibull distribution is then fitted and the CDF values of the class data are calculated.


Algorithm 1. Fit the Weibull CDF from the center of the activation vectors of each class. Returns the Weibull CDF model Pi together with the scale parameter li and the shape parameter ki used to build Pi.

Require: Activation vectors from the penultimate layer V(xj) = v1, ..., vN
Require: Each class center vector Si calculated from the activation vectors
Ensure: Weibull CDF models Pi
1: for i = 1, ..., N do
2:   for j = 1, ..., M do
3:     Euclidean distance di,j = d(V(xj), Si)
4:   end for
5:   Fit Weibull CDF Pi = (li, ki, Di)
6: end for
7: Return: Weibull CDF models P1,...,N

Through Algorithm 1, we obtain a CDF model for each class, created by estimating the shape and scale parameters from the data. Then, known and unknown classes are classified using the CDFs and the activation vector with Algorithm 2.

Algorithm 2. Known and unknown class probability estimation with the activation vector and the Weibull CDFs.

Require: Activation vector from the penultimate layer V(xj) = v1, ..., vN
Require: Weibull CDF models Pi
1: for j = 1, ..., M do
2:   for i = 1, ..., N do
3:     Knowns vector Ki(xj) = vi(xj) * (1 - Pi(di,j))
4:   end for
5:   Unknown vector U(xj) = (1/N) Σ_{i=1}^{N} vi(xj) * (1/N) Σ_{i=1}^{N} Pi(di,j)
6: end for
7: Update activation vector V̂(x) = [K1, ..., KN, U(x)]
8: Define:
   p̂(y = n | x) = e^{V̂n(x)} / Σ_{i=1}^{N+1} e^{V̂i(x)}

The total number of data samples is M, and the number of classes is N. For each input sample, the probability of being classified into each known class and into the unknown class is calculated: after all class values are computed, a softmax is applied to convert them into probabilities.
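A rough NumPy/SciPy sketch of Algorithms 1 and 2 is shown below. It assumes the activation vectors (one value per known class) of the training data have already been extracted; the use of scipy.stats.weibull_min and the exact handling of distances are illustrative choices rather than the authors' exact implementation.

```python
import numpy as np
from scipy.stats import weibull_min

def fit_class_cdfs(train_avs, train_labels):
    """Algorithm 1 (sketch): per class, fit a Weibull CDF to the Euclidean
    distances between the class's activation vectors and its class centre."""
    models = {}
    for c in np.unique(train_labels):
        av_c = train_avs[train_labels == c]
        centre = av_c.mean(axis=0)                       # Nearest Class Mean
        dists = np.linalg.norm(av_c - centre, axis=1)
        shape, loc, scale = weibull_min.fit(dists, floc=0)
        models[c] = (centre, shape, loc, scale)
    return models

def known_unknown_probs(av, models):
    """Algorithm 2 (sketch): recalibrate the activation vector with the
    Weibull CDFs, append an unknown-class score and apply a softmax."""
    classes = sorted(models)
    cdfs = np.array([
        weibull_min.cdf(np.linalg.norm(av - models[c][0]),
                        models[c][1], loc=models[c][2], scale=models[c][3])
        for c in classes
    ])
    known = av * (1 - cdfs)            # K_i(x) = v_i(x) * (1 - P_i(d_i))
    unknown = av.mean() * cdfs.mean()  # U(x) = mean(v) * mean(P)
    scores = np.append(known, unknown)
    e = np.exp(scores - scores.max())
    return e / e.sum()                 # p(y = n | x), n = 1, ..., N+1
```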

3.3 Arcface Loss Function

When applying the OSR method, the Arcface loss function [3] is applied so that the features of the classification model are more clearly separated by class.


$$L_{ArcFace} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s(\cos(\theta_{y_i}+m))}}{e^{s(\cos(\theta_{y_i}+m))}+\sum_{j=1,\, j\neq y_i}^{n} e^{s\cos\theta_j}} \qquad (1)$$

The Arcface loss function learns to embed the decision boundary between the classes at a certain angular margin in the feature space of the classification model. The overall structure of the function is the same as that of softmax. s is a feature re-scaling factor that scales the distance between the features of each class, θ_{y_i} denotes the angle to the center of the class to which the input data belongs, and m is the margin added to the angle between the classes. Therefore, through Arcface, the features of each class are learned with a certain angular separation, and thus the decision boundary is learned. This paper utilizes this algorithm because the effect of the OSR method is maximized when the decision boundary is clear.
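A compact PyTorch sketch of an ArcFace margin head corresponding to Eq. (1) is shown below; the scale s and margin m values are placeholders, not the values used in the experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """ArcFace logits: cos(theta + m) for the true class, cos(theta) otherwise,
    both re-scaled by s; feed the output to cross-entropy (Eq. 1)."""
    def __init__(self, in_features, n_classes, s=30.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, in_features))
        self.s, self.m = s, m

    def forward(self, features, labels):
        cos = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return self.s * logits

# usage sketch: loss = F.cross_entropy(arcface_head(av, labels), labels)
```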

4 Experiments

4.1 Data Description

In this paper, we used the ASHRAE RP-1043 [2] dataset (called AHU). It has been used in many studies on chiller faults and contains detailed information about the air-conditioning system. Various types of chiller faults are included, with 4 levels of severity for each fault, and data was collected every 10 s. The AHU data has 7 fault classes. In this paper, 5 known fault classes (EO, NC, FWC, FWE, RO) and 2 unknown fault classes (CF, RL) were set. The fault classes are as follows: Condenser Fouling (CF), Excess Oil (EO), Non-condensable in Refrigerant (NC), Reduced Evaporator Water Flow (FWE), Reduced Condenser Water Flow (FWC), Refrigerant Leak (RL), Refrigerant Overcharge (RO). Among the 64 parameters, we used the 10 parameters effectively employed for classification [6,15,18]; the selected parameters are: TCI, TEO, kW/Ton, TCO, PO feed, TEI, PRC, Evap Tons, TCA, TRC sub. Data preprocessing was performed for training: z-normalization was applied to each parameter and the window size is 10. The training data is 80% of the known fault class data, and the test data is the remaining 20% together with the entire unknown fault class data.
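The preprocessing can be sketched as follows; the use of non-overlapping windows and the exact handling of the normalization statistics are assumptions made for illustration.

```python
import numpy as np

def preprocess(data, window_size=10):
    """Z-normalize each parameter and cut the series into windows of 10
    consecutive samples (one sample every 10 s in the AHU data)."""
    data = np.asarray(data, dtype=float)                  # (time, n_params=10)
    data = (data - data.mean(axis=0)) / (data.std(axis=0) + 1e-8)
    n_windows = len(data) // window_size
    return data[: n_windows * window_size].reshape(n_windows, window_size, -1)
```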

4.2 Experimental Design

In this paper, we used several evaluation metrics for the classification performance of the algorithms: Accuracy, Precision and Recall. Accuracy (ACC). The metric most used to evaluate algorithm performance in classification problems; it can be defined as the ratio of correctly classified data items to all observations. Precision. Out of the observations that an algorithm has predicted to be positive, how many of them are actually positive. Recall. Out of the observations that are actually positive, how many of them have been correctly predicted by the algorithm.
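For reference, these metrics can be computed as in the sketch below; macro-averaging over the classes is an assumption here, since the paper does not state the averaging scheme.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(y_true, y_pred):
    """Accuracy, precision and recall over the fault classes (including the
    unknown class); macro averaging is assumed."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
    }
```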

4.3 Comparison Method

In this paper, we compared the Hendrycks et al. [8] (called DMOD) and Bendale et al. [1] (called OpenMax) algorithms with the proposed method. DMOD classifies an input into the unknown class when it is rejected by all classes, using an optimal threshold found for each class; this method was used as a way to determine which classification model is suitable. OpenMax is a classification method that uses features extracted from the trained model and a Weibull CDF model created for each class; it is widely used and popular among OSR methods.

4.4 Experimental Result

In this paper, we performed a performance comparison analysis of OSR algorithms. OpenMax and the proposed algorithm are applied to the 1D CNN-GRU model; the results are shown in Table 1. We analyzed how performance changes as the proposed OSR method and Arcface are applied. The proposed method is about 25% better than OpenMax.

Table 1. Classification result with unknown classes

Algorithm                 | Accuracy | Precision | Recall
GRU & DMOD                | 0.576    | 0.603     | 0.573
1D CNN & DMOD             | 0.614    | 0.630     | 0.611
1D CNN-GRU & DMOD         | 0.623    | 0.639     | 0.619
OpenMax (with Softmax)    | 0.665    | 0.677     | 0.658
OpenMax (with Arcface)    | 0.721    | 0.757     | 0.731
Proposed (with Softmax)   | 0.822    | 0.868     | 0.821
Proposed (with Arcface)   | 0.917    | 0.922     | 0.904

To compare the performance more precisely, we conducted an additional experiment broken down by fault class. The result is shown in Table 2. The final proposed model shows an accuracy of 99.5% for the unknown class and a stably high accuracy (about 91.4%) for the known classes. Most of the comparison models show low accuracy (below 50%) for unknown classes, and their accuracy for specific known classes (NC, FWE) is low. The reason seems to be that when the OpenMax approach is used on latent-space features, the distribution estimation for each class is not performed accurately. In the case of DMOD, it is difficult to classify the unknown using only the softmax threshold, but its classification performance for the known classes was stable and not low for any specific class. The proposed model, however, showed stable classification performance for both the unknown class and the known classes.


Table 2. Classification result for each class (unknown-class accuracy and known-class accuracy per fault class)

Algorithm                 | Unknown ACC | EO    | NC    | FWC   | FWE   | RO
GRU & DMOD                | 0.307       | 0.664 | 0.685 | 0.639 | 0.709 | 0.731
1D CNN & DMOD             | 0.279       | 0.623 | 0.701 | 0.711 | 0.615 | 0.757
1D CNN-GRU & DMOD         | 0.251       | 0.751 | 0.621 | 0.674 | 0.547 | 0.612
OpenMax (with Softmax)    | 0.443       | 0.996 | 0.530 | 0.991 | 0.528 | 0.959
OpenMax (with Arcface)    | 0.564       | 0.992 | 0.521 | 0.995 | 0.533 | 0.956
Proposed (with Softmax)   | 0.957       | 0.590 | 0.849 | 0.761 | 0.757 | 0.793
Proposed (with Arcface)   | 0.995       | 0.901 | 0.903 | 0.974 | 0.919 | 0.875

5 Conclusion

We propose an Arcface-based Open Set Recognition method for industrial data for the case where not all class information is available. We use Arcface to obtain a more precise decision boundary. Then, Weibull CDFs are generated by extracting features from the trained model, and the Open Set Recognition methodology is used to classify fault types. A comparison analysis with existing methods was conducted to assess the performance of the proposed method. The experimental results show the highest performance when Arcface and the proposed OSR method are applied together. Acknowledgment. This work was supported by the World Class 300 Project (R&D) (S2641209, "Improvement of manufacturing yield and productivity through the development of next generation intelligent Smart manufacturing solution based on AI & Big data") of the MOTIE, MSS (Korea).

References

1. Bendale, A., Boult, T.E.: Towards open set deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1563–1572 (2016) 2. Comstock, M.C., Braun, J.E.: Development of analysis tools for the evaluation of fault detection and diagnostics in chillers, ashrae research project rp-1043. American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc., Atlanta. Also, Report HL, pp. 99–20 (1999) 3. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2019) 4. Dhamija, A.R., Günther, M., Boult, T.: Reducing network agnostophobia. Adv. Neural Inf. Process. Syst. 31 (2018) 5. Dix, M., Borrison, R.: Open set anomaly classification. In: Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 361–364 (2021)


6. Fan, Y., Cui, X., Han, H., Hailong, L.: Feasibility and improvement of fault detection and diagnosis based on factory-installed sensors for chillers. Appl. Thermal Eng. 164, 114506 (2020) 7. Fei, G., Wang, S., Liu, B: Learning cumulatively to become more knowledgeable. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1565–1574 (2016) 8. Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-ofdistribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016) 9. Li, D., Hu, G., Spanos, C.J.: A data-driven strategy for detection and diagnosis of building chiller faults using linear discriminant analysis. Energy Build. 128, 519–529 (2016) 10. Manevitz, L.M., Yousef, M.: One-class SVMS for document classification. J. Mach. Learn. Res. 2(Dec), 139–154 (2001) 11. Mensink, T., Verbeek, J., Perronnin, F., Csurka, G.: Metric learning for large scale image classification: generalizing to new classes at near-zero cost. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7573, pp. 488–501. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3642-33709-3 35 12. Scheirer, W.J., Rocha, A., Micheals, R.J., Boult, T.E.: Meta-recognition: the theory and practice of recognition score analysis. IEEE Trans. Patt. Anal. Mach. Intell. 33(8), 1689–1695 (2011) 13. Shu, L., Xu, H., Liu, B.: DOC: deep open classification of text documents. arXiv preprint arXiv:1709.08716 (2017) 14. Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Nat. Acad. Sci. 99(10), 6567–6572 (2002) 15. Tran, D.A.T., Chen, Y., Jiang, C.: Comparative investigations on reference models for fault detection and diagnosis in centrifugal chiller systems. Energy Build. 133, 246–256 (2016) 16. Wang, Z., Dong, Y., Liu, W., Ma, Z.: A novel fault diagnosis approach for chillers based on 1-d convolutional neural network and gated recurrent unit. Sensors 20(9), 2458 (2020) 17. Yan, K., Chong, A., Mo, Y.: Generative adversarial network for fault detection diagnosis of chillers. Build. Environ. 172, 106698 (2020) 18. Yan, K, Hua, J.: Deep learning technology for chiller faults diagnosis. In: 2019 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), pp. 72–79. IEEE (2019) 19. Yan, K., Ji, Z., Shen, W.: Online fault detection methods for chillers combining extended Kalman filter and recursive one-class SVM. Neurocomputing 228, 205– 212 (2017) 20. Yan, Z., et al.: A new universal domain adaptive method for diagnosing unknown bearing faults. Entropy 23(8), 1052 (2021)

Sensitivity of Electrophysiological Patterns in Level-K States as Function of Individual Coordination Ability

Dor Mizrahi(B), Inon Zuckerman, and Ilan Laufer

Department of Industrial Engineering and Management, Ariel University, Ariel, Israel
[email protected], {inonzu,ilanl}@ariel.ac.il

Abstract. Tacit coordination games are games in which players get rewarded by choosing the same alternatives as an unknown player when communication between the players is not allowed or not possible. Classical game theory fails to correctly predict the probability of successful coordination due to its inability to prioritize salient Nash equilibrium points (in this setting also known as "focal points"). To bridge this gap a variety of theories have been proposed. A prominent one is level-k theory, which assumes that players' reasoning depth relies on their subjective level of reasoning. Previous studies have shown that there is an inherent difference in the coordination ability of individual players, and that there is a correlation between individual coordination ability and electrophysiological measurements. The goal of this study is to measure and quantify the electrophysiological sensitivity patterns in level-k states in relation to the individual coordination abilities of players. We show that the combined model capabilities (precision and recall) improve linearly with the player's individual coordination ability; that is, players with higher coordination abilities exhibit more pronounced and prominent electrophysiological patterns. This result enables the detection of capable coordinators based solely on their brain patterns. These results were obtained by constructing a machine-learning classification model which predicts one of two cognitive states, picking (level-k = 0) or coordination (level-k > 0), based on electrophysiological recordings. The results of the classification process were analyzed for each participant individually in order to assess the sensitivity of the electrophysiological patterns according to their individual coordination ability.

Keywords: EEG · Tacit coordination · Level-k · Transfer learning · Classification

1 Introduction

Tacit coordination games are games in which two players will receive a reward for choosing the same option out of a closed set when communication between the players is not allowed or not possible [1]. Analysis of such games according to classical game theory yields that in these games there are a number of Nash equilibrium points, i.e. each one of the players has no incentive to deviate from their initial strategy [2]. The

Sensitivity of Electrophysiological Patterns in Level-K States

337

The solution offered by classical game theory in these cases is a random choice with equal probability among all of the Nash equilibrium points in the game [3]. Studies have also shown that human coordination is better than predicted by game theory (e.g. [4–6]). The gap between the assessment given by game theory and the actual results arises because in these games the more salient solutions, denoted as focal points, are often perceived as more prominent (e.g. [4, 7]). While human players are often very good at detecting these salient solutions, classical game theory does not detect or prioritize focal point solutions over other "standard" Nash equilibrium points.
In order to bridge this gap, in which classical game theory fails to explain human behavior in tacit coordination games, a number of theories have been developed which aim to explain how the coordination process is carried out. One of the most significant models is the level-k theory [8–12], which is based on cognitive hierarchy theory [9, 13–15]. The theory assumes that players' reasoning depth relies on their subjective level of reasoning k. For example, in tacit coordination games, players of the lowest level-k, i.e., k = 0 (sometimes referred to as L0 players), will not make preliminary assumptions and will choose at random, as recommended by classical game theory. Players with level-k = 1 (L1 players) will assume that all other players are level-k = 0 players and will act accordingly. Generally, L0 players might utilize rules but will apply them randomly (picking), whereas Lk>0 players will apply their strategy based on their beliefs regarding the actions of the other players (coordination).
Other studies have examined and shown that there is an inherent difference in the individual coordination ability of human players and that a deterministic strategy or rule will not be available to or applied by all players (e.g. [16–20]). Based on these studies, it was possible to develop and optimize an autonomous agent which used an individual analysis for each player [17, 21, 22]. Following the individual analyses, further studies have shown that there is a correlation between individual coordination processes and electrophysiological patterns [11, 23–25]. For example, the authors of [11] showed that the level-k can be predicted on the basis of the electrophysiological measurements of the subject while performing the coordination task.
With that in mind, the goal of the current study was to measure and quantify the electrophysiological sensitivity patterns in level-k states in relation to the individual coordination abilities of players. Using our proposed model, we showed that the combined model capabilities (precision and recall) improve linearly with the player's individual coordination ability. That is, players with higher coordination abilities exhibit more pronounced and prominent electrophysiological patterns. This in turn enables the detection of capable coordinators based solely on their electrophysiological patterns. In contrast to previous works (e.g. [11, 26, 27]), which treated all subjects as a single homogeneous group based on demographic characteristics, this study analyzes the sensitivity of the electrophysiological patterns of different players individually based on behavioral measures.
This study will make it possible to assess the significance level of such models, which are based on electrophysiological recordings, and whether their reported performance can be generalized to the whole population or whether differentiation should be performed based on different behavioral measures.
The manuscript is structured as follows: Sect. 2 defines the iCA (individual coordination ability) index; in addition, the experimental design, the profile of the participants,
and the EEG recording specifications are presented in this section. Section 3 presents the study results and is divided into three sub-sections: Subsection 3.1 presents the preprocessing pipeline of the EEG segments; Subsection 3.2 presents the architecture and results of the level-k state classifier, which is based on the transfer learning technique; and Subsection 3.3 analyzes the sensitivity of the electrophysiological patterns in relation to the individual coordination ability.

2 Materials and Methods

2.1 Individual Coordination Ability (iCA)

In order to assess the coordination ability of the various players, i.e. their average individual probability of coordinating successfully with a random co-player, based only on the behavioral results, we utilized the individual coordination ability (iCA) index (e.g. [16–18, 23]). This index compares the choices of each player with those of the other players in the team and normalizes the number of successes (i.e. the number of coordination cases) by the total number of attempts. iCA is formally defined as follows:

$$\mathrm{iCA}(i) = \frac{\sum_{j=1,\, j \neq i}^{N} \sum_{k=1}^{t} CF(i,j,k)}{(N-1)\cdot t} \qquad (1)$$

where $i$ denotes the $i$th participant, $N$ denotes the total number of participants, and $t$ denotes the number of games in the experiment. The CF (Coordination Function) is defined as follows:

$$CF(i,j,k) = \begin{cases} 1, & \text{if players } i \text{ and } j \text{ chose the same label in game } k \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
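As an illustration, the iCA index of Eq. (1) can be computed directly from a matrix of recorded choices. The following is a minimal sketch, not the authors' code; the variable names are ours.

```python
# Illustrative computation of the iCA index of Eq. (1).
# choices[i][k] holds the label player i chose in game k (encoded as integers).
import numpy as np

def ica(choices: np.ndarray, player: int) -> float:
    """Individual coordination ability of `player` against all other players."""
    n_players, n_games = choices.shape
    # CF(i, j, k) = 1 when players i and j chose the same label in game k, else 0
    matches = (choices == choices[player]).sum(axis=1)  # per-opponent match counts
    matches[player] = 0                                  # exclude the self-comparison
    return matches.sum() / ((n_players - 1) * n_games)

# Toy example with 3 players and 4 games
choices = np.array([[0, 1, 2, 0],
                    [0, 1, 3, 0],
                    [1, 1, 2, 0]])
print(round(ica(choices, 0), 3))  # 0.75 for player 0 in this toy example
```

With this definition, a player who matches every other player in every game obtains iCA = 1, while a player who never matches anyone obtains iCA = 0.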

2.2 Experimental Design

Procedure. The study comprised the following stages. First, participants received an explanation regarding the overarching aim of the study and were given instructions regarding the experimental procedure and the interface of the application. Participants were offered a reward based on the total number of points they earned in both tasks (picking and coordination). The experiment had two stages which were based on the same set of stimuli and presentation schemes; each stage consisted of 12 different trials, each with a different set of words. For example, game board #1 displays a trial containing the set {"Water", "Beer", "Wine", "Whisky"} (a prominent focal point in this game is "Water", as it is the only non-alcoholic beverage). Each set of words was displayed between two short vertical lines, following a slide containing only the lines without the word set, so that participants would focus their gaze at the center of the screen (Fig. 1, A and B). In the first experimental condition, the task presented to the players was a picking task, i.e., participants were only required to freely pick a word out of each set of four words presented to them in each of the 12 trials. Subsequently, participants were presented with the coordination task, comprising the same set of 12 different trials. In the coordination condition participants were instructed to coordinate
their choice of a word with an unknown partner so that they would end up choosing the same word from the set. Each participant sat alone in front of the computer screen during the entire experimental session. It is important to note that no feedback was given between the games; that is, the participants were not informed whether or not they had coordinated successfully with their unknown co-player. Finally, the output file with the experiment logs and EEG signals was also uploaded to a shared location for offline analysis.

Fig. 1. (A) Stand by screen (B) Game #1 {“Water”, “Beer”, “Wine”, “Whisky”}

Figure 2 portrays the outline of the experiment. Each slide containing the set of words (task trial) was preceded by a slide containing only the vertical lines without the word set (stand-by slide) to keep the gaze of participants in the middle of the screen throughout the experiment. Each of the stand-by slides was presented for a duration drawn from U(2, 2.5) seconds, while each slide containing the set of words was presented for a maximal duration of 8 s. Following a task trial, participants could move to the next slide with a button press. The sequence of the task trials was randomized in each session.

Fig. 2. Experimental paradigm with timeline

Participants. The experiment included ten participants (eight men and two women). All participants were undergraduate students in the Faculty of Engineering at Ariel University. All participants had a dominant right hand. The average age of the participants was 26 years, with a standard deviation of 4 years. All participants signed a form giving their consent to participate in the study. The study was approved by the institutional review board of Ariel University.


EEG Recordings. EEG acquisition during the different tasks in the experiment was done using an active EEG amplifier with 16 channels (g.USBAMP, manufactured by g.tec, Austria). The EEG was acquired with a sampling interval of 1/512 s (i.e., a 512 Hz sampling rate), and the electrode layout followed the 10–20 international system. The maximal impedance of each electrode during the recordings did not exceed 5000 ohm, as monitored by the OpenVibe [28] processing and recording software. In order to avoid biasing the results, before conducting the study itself a training session was performed in which the subjects wore the EEG cap while reviewing the experimental applications.

3 Results and Discussion

3.1 EEG Preprocessing Scheme

EEG preprocessing is done in order to remove noise from the data and get closer to the true neural signals. Thus, we used the preprocessing pipeline described in Fig. 3. The first preprocessing block is a band-pass filter ([1, 32] Hz), followed by a 50 Hz notch filter combined with independent component analysis for artifact removal. We then re-referenced the signal to an average reference and downsampled it by a factor of N = 8 (from 512 Hz to 64 Hz). Finally, we performed baseline correction and epoched the signal into a 1-s window from the onset of each game, which resulted in a total of 12 decision points (i.e., EEG epochs) per participant in each task. It is important to note that, based on the literature (e.g. [24, 29–32]), we focused on the following cluster of frontal and prefrontal electrodes: Fp1, F7, Fp2, F8, F3, and F4.

Fig. 3. EEG preprocess scheme
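The pipeline of Fig. 3 can be sketched with standard signal-processing primitives. The fragment below is a minimal illustration under stated assumptions: ICA-based artifact removal is omitted, and the function and variable names are ours rather than from the paper.

```python
# Minimal sketch of the Fig. 3 pipeline: band-pass 1-32 Hz, 50 Hz notch,
# average reference, downsampling 512 Hz -> 64 Hz, baseline correction, 1-s epochs.
import numpy as np
from scipy import signal

FS = 512      # original sampling rate [Hz]
DOWN = 8      # downsampling factor (512 Hz -> 64 Hz)

def preprocess(eeg: np.ndarray, game_onsets_s: list) -> np.ndarray:
    """eeg: (n_channels, n_samples) raw recording; returns (n_epochs, n_channels, 64)."""
    b, a = signal.butter(4, [1, 32], btype="bandpass", fs=FS)
    x = signal.filtfilt(b, a, eeg, axis=1)                  # band-pass filter
    bn, an = signal.iirnotch(50, Q=30, fs=FS)
    x = signal.filtfilt(bn, an, x, axis=1)                  # 50 Hz notch filter
    x = x - x.mean(axis=0, keepdims=True)                   # re-reference to channel average
    x = signal.decimate(x, DOWN, axis=1, zero_phase=True)   # 512 Hz -> 64 Hz
    fs_new = FS // DOWN
    epochs = []
    for onset in game_onsets_s:                             # 1-s window from each game onset
        start = int(onset * fs_new)
        ep = x[:, start:start + fs_new]
        epochs.append(ep - ep.mean(axis=1, keepdims=True))  # simple mean-baseline correction
    return np.stack(epochs)
```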

3.2 Classifying EEG Segments into their Corresponding Level-K

To assess the sensitivity of the electrophysiological patterns as a function of the individual coordination abilities, we first create a model which predicts the level-k value (i.e., level-k = 0 or level-k > 0) based on the electrophysiological signals. To that end, we build on [11], which showed that a reliable electrophysiological classification model can be produced from the data of a single electrode using the continuous wavelet transform
(CWT) [33–35], which is fed into a VGG16 [36] network trained on the ImageNet dataset [37] with an additional trainable output neuron (i.e. transfer learning [38]). In order to improve the classification results, the researchers reported that the mutual information between the electrodes could be exploited by performing a smart weighting of all six models built for the frontal and pre-frontal electrodes, based on a genetic algorithm [39, 40]. The architecture of the multi-electrode model is presented in Fig. 4.

Fig. 4. Classification model scheme
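A minimal sketch of the single-electrode branch of this architecture is shown below, assuming PyWavelets and TensorFlow/Keras; the scale range, resizing step and training hyper-parameters are illustrative choices of ours, not those of [11].

```python
# Hedged sketch: CWT scalogram of a 1-s single-electrode epoch rendered as an image
# and fed to an ImageNet-pretrained VGG16 with one trainable output neuron.
import numpy as np
import pywt
import tensorflow as tf

def scalogram(epoch_1ch: np.ndarray) -> np.ndarray:
    """Morlet CWT of one single-electrode epoch, resized to the VGG16 input shape."""
    coeffs, _ = pywt.cwt(epoch_1ch, np.arange(1, 65), "morl")     # 64 scales
    img = tf.image.resize(np.abs(coeffs)[..., None], [224, 224]).numpy()
    return np.repeat(img, 3, axis=-1)                              # grey -> 3 channels

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                                             # keep ImageNet features frozen
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),                # picking vs. coordination
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(np.stack([scalogram(e) for e in epochs]), labels, epochs=20)
```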

We used the weights reported in the literature by the researchers in [11], as presented in Table 1.

Table 1. Channels optimal weights

Electrode name   (Fp1)    (F7)     (Fp2)    (F8)     (F3)     (F4)
Weighted score   0.1216   0.0013   0.1553   0.0108   0.4153   0.2957

To avoid over-fitting, each single-electrode model was trained with a four-fold cross-validation method, so that the training set included 180 samples at a time (three folds) and the test set included 60 samples (one fold), where each fold contains an equal number of observations from each class. We repeated this process three times to obtain a reliable prediction of all the samples in the test group. After training all six single-electrode models we performed the assembly according to the scheme presented in Fig. 4, by multiplying each model's output by its corresponding weight and summing all the results together. The classification results of the final multi-electrode model are presented in Table 2.
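Before turning to those results, the assembly step can be illustrated as a weighted sum of the six per-electrode outputs using the Table 1 weights; the array layout below is our own assumption for illustration.

```python
# Sketch of the multi-electrode assembly of Fig. 4: each single-electrode model's
# predicted probability is scaled by its Table 1 weight and the results are summed.
# `probs` is assumed to be (n_epochs, 6) with columns ordered Fp1, F7, Fp2, F8, F3, F4.
import numpy as np

WEIGHTS = np.array([0.1216, 0.0013, 0.1553, 0.0108, 0.4153, 0.2957])  # sums to 1.0

def ensemble_predict(probs: np.ndarray) -> np.ndarray:
    """Weighted sum of per-electrode probabilities -> class 1 ('coordination') if > 0.5."""
    score = probs @ WEIGHTS        # weights sum to 1, so the combined score stays in [0, 1]
    return (score > 0.5).astype(int)
```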


Table 2. Model evaluation

                                  Predicted class
True class                        "0" picking      "1" coordination   True positive   False negative
                                  (level-k = 0)    (level-k > 0)      rate            rate
"0" picking (level-k = 0)         90               30                 75%             25%
"1" coordination (level-k > 0)    28               92                 76.66%          23.34%
Positive predicted value          76.27%           75.41%
False discovery rate              23.73%           24.59%

Prediction accuracy: 75.83% (182/240)

It can be seen that the described classification method manages to produce reliable results with an accuracy of over 75% (while random selection provides an accuracy of only about 50%), using a complex model based on a relatively small sample set of only 240 samples from a population of 10 different subjects. On the other hand, this model functions as a black box, meaning it does not provide insights into brain topographies for the epochs that have been incorrectly labeled. That is, investigating the errors of the model is cumbersome, and it will be difficult to decipher and identify the source of the error in cases of insufficient classification results.

3.3 EEG Patterns Sensitivity Analysis

Previous studies (e.g. [16–18, 41]) have shown that in tacit coordination games there is a difference in the coordination abilities of the players on the individual level. Also, [23] showed that in tacit coordination games players with different coordination abilities have different electrophysiological patterns. These patterns are expressed by different cognitive-load levels: the cognitive load the player invests in the coordination game correlates positively with the player's individual coordination ability. To evaluate the effect of the iCA index on the identification of coordination epochs based on electrophysiological data, we calculate the F1 score for each given iCA value, i.e., for each of the experiment participants. The F1 score is a measure of a model's accuracy on a dataset, calculated as the harmonic mean of the precision and recall indices. The recall index (also known as sensitivity) is the ratio between the number of true positives and the number of true positives plus the number of false negatives; this index describes the ability of the model to find all the relevant cases within a dataset [42, 43]. The precision index (also known as positive predictive value) is the ratio between the number of true positives and the number of true positives plus the number of false positives; this index describes the correctness of the model's predictions within the given class [43, 44]. Table 3 shows the recall, precision and F1 score for each player individually, along with his iCA value, for the coordination-epochs class, which is the class relevant to the iCA index.
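For reference, the per-player metrics of Table 3 follow directly from the epoch counts; the short sketch below reproduces player 1's entries.

```python
# Recall = TP / (TP + FN), precision = TP / (TP + FP), F1 = harmonic mean of the two.
def f1_score(tp: int, fp: int, fn: int) -> float:
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return 2 * precision * recall / (precision + recall)

# Player 1: 12 of 12 coordination epochs recovered (recall 100%), 12 of 14 positive
# predictions correct (precision 85.7%) -> F1 = 0.9231
print(round(f1_score(tp=12, fp=2, fn=0), 4))
```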


Table 3. Classification model performance in relation to iCA

Player number   iCA      Recall           Precision        F1-score
1               0.3796   100% (12/12)     85.7% (12/14)    0.9231
2               0.2685   66.7% (8/12)     88.9% (8/9)      0.7619
3               0.3056   100% (12/12)     75% (12/16)      0.8571
4               0.3056   83.3% (10/12)    71.4% (10/14)    0.7692
5               0.2500   58.3% (7/12)     77.8% (7/9)      0.6667
6               0.2315   58.3% (7/12)     77.8% (7/9)      0.6667
7               0.2870   83.3% (10/12)    76.9% (10/13)    0.8
8               0.2963   83.3% (10/12)    66.7% (10/15)    0.7407
9               0.2963   91.7% (11/12)    73.3% (11/15)    0.8148
10              0.1389   41.7% (5/12)     83.3% (5/6)      0.5556

Overall results: Recall 76.66% (92/120); Precision 75.41% (92/122)

To ascertain if there is a significant statistical relationship between the player's individual coordination ability (iCA) and the model's ability to predict the cognitive level (F1 score), we calculated a linear regression model between these two variables, which is described by the following equation:

$$F1\,score(iCA) = 0.3149 + 1.5970 \cdot iCA \qquad (3)$$

$$R^2 = 0.89;\; F = 64.728;\; p < 0.001;\; VAR_{error} = 0.0014 \qquad (4)$$
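Equations (3) and (4) can be reproduced from the per-player values listed in Table 3, for example with scipy.stats.linregress; running the sketch below yields a slope of about 1.597, an intercept of about 0.315 and R² ≈ 0.89, in line with the reported values.

```python
# Reproducing the Eq. (3) fit from the Table 3 data.
from scipy import stats

ica = [0.3796, 0.2685, 0.3056, 0.3056, 0.2500, 0.2315, 0.2870, 0.2963, 0.2963, 0.1389]
f1  = [0.9231, 0.7619, 0.8571, 0.7692, 0.6667, 0.6667, 0.8000, 0.7407, 0.8148, 0.5556]

fit = stats.linregress(ica, f1)
print(fit.slope, fit.intercept, fit.rvalue ** 2, fit.pvalue)
```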

The results of the regression model show that there is a significant positive statistical relationship: the greater the player's iCA value, i.e., the better their coordination capabilities, the more pronounced the electrophysiological changes between different level-k states, which consequently allows a greater predictive quality by the model. Visualization of the iCA values coupled with the F1 score data and the corresponding regression model can be seen in Fig. 5.

Fig. 5. Individual F1 score in relation to iCA value


4 Conclusions and Future Work

The purpose of this study was to measure and quantify the electrophysiological sensitivity patterns in level-k states in relation to the individual coordination abilities of players. For this aim we designed and implemented a predictive model which distinguishes the level-k value (level-k = 0 or level-k > 0) using EEG classification based on a frequency-domain transformation, the continuous wavelet transform, combined with transfer learning based on pre-existing state-of-the-art models for object recognition (e.g. VGG16), as suggested by [11]. Based on the proposed model we performed an assessment of the F1 score individually for each player according to his coordination ability. Analysis of the results showed, with statistical significance, that the combined model capabilities (precision and recall) improve linearly with the player's individual coordination ability. That is, the higher the coordination ability, the more pronounced and prominent the player's electrophysiological patterns. Thus, the study achieved its goal, which was to test and confirm that behavioral measures such as the individual coordination ability affect the significance of players' electrophysiological patterns when performing cognitive processes of tacit coordination.

This study has some limitations which have to be mentioned. First, the group of participants selected for this study is homogeneous: the participants have the same educational, demographic, and ethnic characteristics. Previous studies (e.g. [16, 41]) demonstrated that players' performance in tacit coordination games may be affected by these parameters. Therefore, it is worthwhile to extend the study to include and analyze the effect of each of these parameters. Second, this study uses a model which weighs all the electrodes, and no sensitivity analysis is performed at the individual electrode level. Third, this study used an EEG system with 16 electrodes. As such, this experimental setup did not allow us to isolate specific relevant brain regions using advanced imaging techniques (such as LORETA) to solve the inverse problem, since these algorithms require a higher number of electrodes [45, 46].

Based on the results obtained from this study there are many avenues for future research. For example, as mentioned in the limitations above, previous studies have shown that various features such as culture [16, 47], social value orientation [41, 48] and loss aversion [49, 50] affect human behavior in tacit coordination games. It will be interesting to investigate whether and how the behavioral impact of these parameters is reflected in the sensitivity of the electrophysiological patterns. Also, various studies (e.g. [7, 21, 51–53]) have shown how, in tacit coordination games, a player's results can be optimized by replacing him with an autonomous agent based on mathematical models. Following the results of this study, it will be interesting to produce an agent that combines the behavioral and electrophysiological data together under a unified model for optimal results. Finally, in addition to cognitive hierarchy and level-k theory there are many other behavioral economic models, such as team reasoning (e.g. [5, 10, 54, 55]); it will be interesting to examine the compatibility between the various electrophysiological changes and the predictions and hypotheses of these theories.


References 1. Schelling, T.C.: The Strategy of Conflict. Cambridge (1960) 2. Nash, J.: Equilibrium points in n-person games. Proc. Natl. Acad. Sci. USA 36, 48–49 (1950) 3. Mailath, G.J.: Do people play Nash equilibrium? Lessons from evolutionary game theory. J. Econ. Lit. 36, 1347–1374 (1998) 4. Mehta, J., Starmer, C., Sugden, R.: Focal points in pure coordination games: an experimental investigation. Theory Decis. 36, 163–185 (1994) 5. Bardsley, N., Mehta, J., Starmer, C., Sugden, R.: Explaining focal points : cognitive hierarchy theory versus team reasoning. Econ. J. 120, 40–79 (2009) 6. Sitzia, S., Zheng, J.: Group behaviour in tacit coordination games with focal points – an experimental investigation. Games Econ. Behav. 117, 461–478 (2019) 7. Zuckerman, I., Kraus, S., Rosenschein, J.S.: Using focal point learning to improve humanmachine tacit coordination. Auton. Agent. Multi. Agent. Syst. 22, 289–316 (2011) 8. Strzalecki, T.: Depth of reasoning and higher order beliefs. J. Econ. Behav. Organ. 108, 108–122 (2014) 9. Costa-Gomes, M.A., Crawford, V.P., Iriberri, N.: Comparing models of strategic thinking in Van Huyck, Battalio, and Beil’s coordination games. J. Eur. Econ. Assoc. 7, 365–376 (2009) 10. Faillo, M., Smerilli, A., Sugden, R.: The Roles of Level-k and Team Reasoning in Solving Coordination Games (2013) 11. Mizrahi, D., Laufer, I., Zuckerman, I.: Level-K classification from eeg signals using transfer learning. Sensors 21, 7908 (2021) 12. Zuckerman, I., Mizrahi, D., Laufer, I.: EEG pattern classification of picking and coordination using anonymous random walks. Algorithms 15, 114 (2022) 13. Kneeland, T.: Coordination under limited depth of reasoning. Games Econ. Behav. 96, 49–64 (2016) 14. Georganas, S., Healy, P.J., Weber, R.A.: On the persistence of strategic sophistication. J. Econ. Theory 159, 369–400 (2015) 15. Colman, A.M., Pulford, B.D., Lawrence, C.L.: Explaining strategic coordination: cognitive hierarchy theory, strong Stackelberg reasoning, and team reasoning. Decision 1, 35–58 (2014) 16. Mizrahi, D., Laufer, I., Zuckerman, I.: Collectivism-individualism: strategic behavior in tacit coordination games. PLoS One 15(2), e0226929 (2020) 17. Mizrahi, D., Laufer, I., Zuckerman, I.: Modeling individual tacit coordination abilities. In: International Conference on Brain Informatics, pp. 29–38. Springer, Cham, Haikou, China (2019) 18. Mizrahi, D., Laufer, I., Zuckerman, I.: Individual strategic profiles in tacit coordination games. J. Exp. Theor. Artif. Intell. 33, 1–16 (2020) 19. Mizrahi, D., Laufer, I., Zuckerman, I.: Modeling and predicting individual tacit coordination ability. Brain Inf. 9, 4 (2022) 20. Mizrahi, D., Laufer, I., Zuckerman, I.: Predicting focal point solution in divergent interest tacit coordination games. J. Exp. Theor. Artif. Intell. 1–21 (2021) 21. Mizrahi, D., Zuckerman, I., Laufer, I.: Using a stochastic agent model to optimize performance in divergent interest tacit coordination games. Sensors 20, 7026 (2020) 22. Rosenfeld, A., Zuckerman, I., Azaria, A., Kraus, S.: Combining psychological models with machine learning to better predict people’s decisions. Synthese 189, 81–93 (2012) 23. Mizrahi, D., Laufer, I., Zuckerman, I.: The effect of individual coordination ability on cognitive-load in tacit coordination games. In: Davis, F.D., Riedl, R., vom Brocke, J., Léger, P.M., Randolph, A.B., Fischer, T. (eds.) NeuroIS 2020. LNISO, vol. 43, pp. 244–252. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60073-0_28


24. Mizrahi, D., Laufer, I., Zuckerman, I.: Topographic analysis of cognitive load in tacit coordination games based on electrophysiological measurements. In: Davis, F.D., Riedl, R., vom Brocke, J., Léger, P.-M., Randolph, A.B., Müller-Putz, G. (eds.) NeuroIS 2021. LNISO, vol. 52, pp. 162–171. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88900-5_18 25. Laufer, I., Mizrahi, D., Zuckerman, I.: An electrophysiological model for assessing cognitive load in tacit coordination games. Sensors 22, 477 (2022) 26. Lin, Y.-P., Jung, T.-P.: Improving EEG-based emotion classification using conditional transfer learning. Front. Hum. Neurosci. 11, 334 (2017) 27. Zarjam, P., Epps, J., Chen, F.: Spectral EEG features for evaluating cognitive load. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3841–3844. EMBS (2011) 28. Renard, Y., et al.: Openvibe: an open-source software platform to design, test, and use brain– computer interfaces in real and virtual environments. Presence Teleoperators Virtual Environ. 19, 35–53 (2010) 29. Gartner, M., Grimm, S., Bajbouj, M.: Frontal midline theta oscillations during mental arithmetic: effects of stress. Front. Behav. Neurosci. 9, 1–8 (2015) 30. De Vico Fallani, F., et al.: Defecting or not defecting: how to “read” human behavior during cooperative games by EEG measurements. PLoS One 5(12), e14187 (2010) 31. Boudewyn, M., Roberts, B.M., Mizrak, E., Ranganath, C., Carter, C.S.: Prefrontal transcranial direct current stimulation (tDCS) enhances behavioral and EEG markers of proactive control. Cogn. Neurosci. 10, 57–65 (2019) 32. Moliadze, V., et al.: After-effects of 10 Hz tACS over the prefrontal cortex on phonological word decisions. Brain Stimul. 12, 1464–1474 (2019) 33. Mallat, S.: Wavelet zoom. In: A Wavelet Tour of Signal Processing, pp. 163–219. Elsevier (1999). https://doi.org/10.1016/B978-012466606-1/50008-8 34. Rioul, O., Duhamel, P.: Fast algorithms for discrete and continuous wavelet transforms. IEEE Trans. Inf. theory. 38, 569–586 (1992) 35. Hazarika, N., Chen, J.Z., Tsoi, A.C., Sergejew, A.: Classification of EEG signals using the wavelet transform. Signal Process. 59, 61–72 (1997) 36. Simonyan, K., Andrew, Z.: Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv Prepr. 1409:1556 (2014) 37. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255 (2009) 38. Tsung, F., Zhang, K., Cheng, L., Song, Z.: Statistical transfer learning: a review and some extensions to statistical process control. Qual. Eng. 30, 115–128 (2018) 39. Mitchell, M.: An Introduction to Genetic Algorithms. MIT press (1998) 40. Harik, G.R., Lobo, F.G., Goldberg, D.E.: The compact genetic algorithm. IEEE Trans. Evol. Comput. 3, 287–297 (1999) 41. Mizrahi, D., Laufer, I., Zuckerman, I., Zhang, T.: The effect of culture and social orientation on Player’s performances in tacit coordination games. In: Wang, S., Yamamoto, V., Jianzhong, S., Yang, Y., Jones, E., Iasemidis, L., Mitchell, T. (eds.) BI 2018. LNCS (LNAI), vol. 11309, pp. 437–447. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05587-5_41 42. Raghavan, V., Bollmann, P., Jung, G.S.: A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Inf. Syst. 7, 205–229 (1989) 43. Buckland, M., Gey, F.: The relationship between recall and precision. J. 
Am. Soc. Inf. Sci. 45, 12–19 (1994) 44. Cleverdon, C.W.: On the inverse relationship of recall and precision. J. Doc. (1972) 45. Michel, C.M., Murray, M.M., Lantz, G., Gonzalez, S., Spinelli, L., de Peralta, R.G.: EEG source imaging. Neurophysiology 115, 2195–2222 (2004)


46. Pascual-Marqui, R.D., Christoph, M.M., Lehmann, D.: Low resolution electromagnetic tomography: a new method for localizing electrical activity in the brain. Int. J. Psychophysiol. 18, 49–65 (1994) 47. Cox, T.H., Lobel, S.A., Mcleod, P.L.: Effects of ethnic group cultural differences on cooperative and competitive behavior on a group task. Acad. Manage. J. 34, 827–847 (1991) 48. Mizrahi, D., Laufer, I., Zuckerman, I.: The effect of expected revenue proportion and social value orientation index on players’ behavior in divergent interest tacit coordination games. In: Mahmud, M., Kaiser, M.S., Vassanelli, S., Dai, Q., Zhong, N. (eds.) BI 2021. LNCS (LNAI), vol. 12960, pp. 25–34. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86993-9_3 49. Mizrahi, D., Laufer, I., Zuckerman, I.: The effect of loss-aversion on strategic behaviour of players in divergent interest tacit coordination games. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) BI 2020. LNCS (LNAI), vol. 12241, pp. 41–49. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6_4 50. Liu, W., Song, S., Wu, C.: Impact of loss aversion on the newsvendor game with product substitution. Int. J. Prod. Econ. 141, 352–359 (2013) 51. Kraus, S.: Predicting human decision-making: from prediction to action. In: Proceedings of the 6th International Conference on Human-Agent Interaction, p. 1 (2018) 52. Fenster, M., Kraus, S., Rosenschein, J.S.: Coordination without communication: experimental validation of focal point techniques. In: Proceedings of the First International Conference on Multiagent Systems, pp. 102–108. San Francisco, California, USA (1995) 53. Zuckerman, I., Kraus, S., Rosenschein, J.S.: The adversarial activity model for bounded rational agents. Auton. Agent. Multi. Agent. Syst. 24, 374–409 (2012). https://doi.org/10. 1007/s10458-010-9153-2 54. Bacharach, M.: Interactive team reasoning: a contribution to the theory of cooperation. Res. Econ. 53, 117–147 (1999) 55. Colman, A.M., Gold, N.: Team reasoning: Solving the puzzle of coordination. Psychon. Bull. Rev. 25(5), 1770–1783 (2017). https://doi.org/10.3758/s13423-017-1399-0

A Machine Learning Approach for Discovery of Counter-Air Defense Tactics for a Cruise Missile Swarm Joshua Robbins(B) and Laurie Joiner Department of Electrical and Computer Engineering, University of Alabama in Huntsville, Huntsville, AL 35899, USA [email protected]

Abstract. This paper presents a systematic framework for the discovery of counter-air defense tactics for tactical weapons such as cruise missiles (CM) under the control of a cognitive agent, using an unsupervised machine-learning approach. Traditionally, counter-air-defense mission effectiveness (ME) is achieved through a combination of high quantities, low radar cross section (RCS), high speed, low altitude, and/or electronic attack. In the absence of any of these force multipliers, cooperative swarming tactics can be leveraged to achieve ME. This domain presents a highly complex state-action space compared to other, more constrained rule-based games where ML agents have been successful in learning gameplay strategies. The approach taken in this research is to develop highly semantic observation and action functions, interfacing the cognitive agent behavior function to the gameplay environment, which is trained through repeated gameplay. The observation and action function designs of a cognitive CM agent are proposed, and the framework described is used to train the agent as well as to evaluate ME. Numerical simulations demonstrate that the proposed agent is capable of learning highly effective, swarm-enabled tactical behaviors that maximize mission effectiveness and leverage traditional optimizations such as RCS reduction, where a non-cognitive agent is unable to do so.

Keywords: Swarming autonomy · Destruction of enemy air defenses · Machine learning

1 Introduction

The network-centric integrated air defense system (IADS), composed of radar sensors and guided surface-to-air missile (SAM) weapons, presents a difficult challenge for offensive operations. The distributed, layered, and interconnected nature of air defense (AD) elements provides for a defense-in-depth strategy, allowing for multiple engagements to increase the probability of successfully countering an air attack. A conceptual diagram of a typical IADS is shown in Fig. 1. At its most basic, the IADS consists of multiple AD radar sites, provided with air picture information and interconnected via early warning radar (EWR) and command
and control (C2) nodes. Typically, the IADS is deployed to defend other high-value protected assets (PA) such as airfields, operating bases, weapon storage facilities, or other support infrastructure key to operations of the IADS.

Fig. 1. Conceptual diagram of an IADS

The AD systems are typically deployed forward of the PAs in the direction of the expected attack and arranged such that their weapon engagement zones (WEZ) overlap, providing layered and redundant coverage. As the engagement capacity of an AD system is limited by its number of fire channels and the number of ready rounds available, successful air attacks typically involve launching volleys of dozens of weapons, seeking to overwhelm the defenses. The mission effectiveness (ME) of the attack is traditionally enhanced through use of additional factors which reduce the performance of the AD radars and/or the SAMs. These factors include low radar cross section (RCS), high speed, low altitude, and electronic attack. Destruction of enemy air defense (DEAD) ME can be predicted analytically for simple cases, for example the cruise missile (CM), tactical missile (TM), or ballistic missile (BM) attack, where the attack weapons fly a pre-determined route to their targets. In this case, the various solutions to the canonical weapon target allocation (WTA) problem apply [1–3]. Any particular WTA solution seeks to maximize the expected success of interception of all incoming targets while minimizing the total expenditure of SAM resources. However, when the attacking weapons do not follow a pre-determined trajectory to their targets, the allocation of AD resources must be evaluated as the weapon trajectories
change. A swarm of weapons which is able to dynamically alter its trajectories, change its targets, and coordinate its behaviors in response to changing conditions in the battle space forces the IADS to evaluate engagement plans at a much higher tempo, typically faster than state-of-the-art WTA optimization algorithms allow for. The resulting coupled dynamical system presents a high-complexity state-action space over which behaviors can be optimized. Combining high quantities of weapons with ME-enhancing factors yields high mission cost. This research investigates the extent to which cooperative swarming tactics can replace RCS reduction, a costly ME-enhancing feature.
First, the swarm-vs-IADS DEAD simulation framework that was developed (the SwarmCommand real-time strategy game (RTSG)) is described. The simulation framework provides a means to develop swarm behaviors and evaluate ME against different IADS weapon allocation and engagement algorithms. The DEAD simulation serves as the objective function to train the ML agent behavior functions, as well as the game engine enabling human-in-the-loop participation in place of either swarm or ADS units for spot-check evaluation [4]. Next, the agent observation and action function designs are described, along with the details of the implementation of the behavior function artificial neural network (ANN). The ML agents' behavior functions are optimized for DEAD mission tactics in an unsupervised learning methodology: playing the SwarmCommand game against the IADS agents and receiving reinforcement for successful game outcomes, as in [5–8]. Next, to demonstrate the process, a baseline scenario is presented, in which an IADS defends various PA. First, the DEAD mission outcomes are simulated with a baseline CM attack. Next, the mission is simulated with a CM swarm wherein each unit is an ML-trained agent autonomously making tactical decisions. Finally, the results of combining ML behavior optimization with CM RCS reduction are examined.

2 A Framework for Evaluating Mission Effectiveness

The SwarmCommand software implements the physics-based swarm-vs-IADS DEAD environment model, which dictates how the various states and properties of the constituent objects and agents can change and interact. The simulation includes models for SAM flyout kinematics (three-degrees-of-freedom (3-DOF) model), SAM endgame kill assessment, ADS radar physics, swarm unit kinematics (3-DOF model), and swarm unit endgame kill assessment. As shown in Fig. 2, multiple agents, each representing either an individual ADS battery (radar and missile complex) or an individual swarm unit, interact through the environment model and play out the DEAD mission. Each software agent interacts with the environment through its observation function, its behavior function, and its action function. This feedback mechanism implements the classical perception-action cycle. The ML swarm agent behavior function is implemented as an ANN, implementing a cognitive dynamic system [9]. Optionally, any one or multiple agents can be controlled by a human player using a graphical user interface (GUI), which displays the outputs of the particular observation function and provides controls implementing the agent's available actions.


Fig. 2. SwarmCommand RTSG block diagram

With human operators in the loop, the gameplay is executed in real time to allow human-level operator decision making and course-of-action. Running in purely agent-vs-agent mode, the gameplay is executed much faster than real time in a "headless" mode to allow high throughput during ML training and Monte-Carlo data generation. The degree to which the swarm is able to destroy the value-weighted elements of the IADS defines the objective measure of ME.

3 SAM Engagement Geometry

The ADS WEZ is the volume of space where a weapon is effective at destroying its target, and is defined by five parameters: the minimum range Rmin, the maximum range RInt, the field of view (FOV) angle, the minimum altitude hmin, and the maximum altitude hmax. The range and altitude limitations are functions of both the weapon and its associated AD guidance radar. The FOV angle limitation is strictly a function of the ADS radar technology. Most tactical AD radar systems have their antenna mounted on a rotating platform so that the FOV can be moved to ensure the WEZ covers the relevant airspace [4]. For an engagement of a target at a given relative position and velocity with a SAM having an average flyout speed of Vm, the time of flight (TOF) to the predicted intercept point (PIP), TOF_PIP, is given by (1) [4], and the PIP position is given by (2); the geometry is illustrated in Fig. 3. This assumes the target maintains its velocity over the entire flight of the SAM.
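The constant-velocity PIP computation described above can be sketched as follows. This is a generic reconstruction of the standard calculation, not a reproduction of Eqs. (1)–(2) from [4], and the example numbers are illustrative.

```python
# Sketch: smallest positive time t such that |rel_pos + target_vel * t| = vm * t.
import numpy as np

def tof_pip(rel_pos: np.ndarray, target_vel: np.ndarray, vm: float) -> float:
    """Time of flight to the predicted intercept point for a constant-velocity target."""
    a = target_vel @ target_vel - vm ** 2
    b = 2.0 * (rel_pos @ target_vel)
    c = rel_pos @ rel_pos
    roots = np.roots([a, b, c])
    real = roots[np.isreal(roots)].real
    positive = real[real > 0]
    return float(positive.min()) if positive.size else np.inf   # no intercept possible

def pip(rel_pos, target_vel, vm):
    t = tof_pip(np.asarray(rel_pos, float), np.asarray(target_vel, float), vm)
    return np.asarray(rel_pos) + np.asarray(target_vel) * t     # predicted intercept point

# Example: CM 30 km north of the ADS, flying south at 300 m/s; SAM average speed 819.4 m/s
print(tof_pip(np.array([0.0, 30_000.0]), np.array([0.0, -300.0]), 819.4))  # ~26.8 s
```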



Fig. 3. Plan view of PIP and ADS WEZ

4 ML Swarm Agent Design

The methodology used to design a swarm agent which can effectively improve ME is to specify relevant observation and action functions, connected via the behavior function as shown in Fig. 2. Implementing the behavior function as an ANN then allows ME-maximizing behaviors to be discovered through iterative gameplay as in [5–7].

4.1 Observation Function

The observation function for the example ML swarm agent works by first identifying which ADS in the IADS is most threatening to that particular swarm unit. The observation focuses on the perceived engagement states of its most threatening threat (MTT). Additionally, it provides information about other swarm units by clustering swarm units together. The clustering is performed on the swarm group positions and MTT assignment, using the DBSCAN algorithm [10] with the value of the clustDist metric as the clustering distance. The observation function produces the following outputs (perceptions), operating on various states and properties of other agents and objects which are observable in the environment.
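The grouping step can be sketched with scikit-learn's DBSCAN; how the MTT assignment is folded into the distance computation is not specified in the text, so the feature construction below is an assumption of ours.

```python
# Hedged sketch of the clustering step: group swarm-unit plan positions with DBSCAN,
# using clustDist as the neighborhood distance, and keep units with different MTTs apart.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_swarm(positions: np.ndarray, mtt_ids: np.ndarray, clust_dist: float) -> np.ndarray:
    """positions: (n_units, 2) in metres; mtt_ids: (n_units,) index of each unit's MTT."""
    # Push units assigned to different MTTs far apart in an extra feature dimension.
    features = np.hstack([positions, mtt_ids[:, None] * 10 * clust_dist])
    labels = DBSCAN(eps=clust_dist, min_samples=1).fit_predict(features)
    return labels   # cluster id per swarm unit; with min_samples=1 no unit is labelled noise
```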


The MTT for a swarm group is defined by (5), where the subscripts s and t represent the swarm unit and threat indices, respectively; N_T is the number of track channels a particular ADS has and N_TACT is the number of those which are currently active; R_{s,t} is the range of swarm unit s to ADS t, and R_{TRK,t} is the range at which ADS t is capable of tracking the swarm unit [4].

$$MTT = \arg\max_{t \in T}\, threat(s,t) \qquad (5)$$

$$threat(s,t) = \begin{cases} c_T(s,t)\,\dfrac{R_{Int,t} - R_{s,t}}{R_{Int,t}}, & R_{s,t} < R_{Int,t} \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

$$c_T(s,t) = \begin{cases} 2, & trackable(s,t) \\ 1, & \text{otherwise} \end{cases} \qquad (7)$$

$$trackable(s,t) = \big( (N_{TACT,t} < N_{T,t}) \cup tracking(s,t) \big) \cap (R_{s,t} < R_{TRK,t}) \cap aboveHorizon(s,t) \cap inFOV(s,t) \qquad (8)$$
Self Cluster Proportion. This is the proportion of its cluster that the swarm unit constitutes, having a value between 0.0 and 1.0. Cluster Swarm Proportion. This is the proportion of the entire swarm that the units cluster constitutes, having a value between 0.0 and 1.0. Proportional SAMs Fired. This is the running count of SAMs fired by any ADS in the IADS divided by the initial swarm size. Distance to Cluster. This is the distance the swarm group is away from the centroid of its parent cluster, divided by clustDist the clustering distance metric. This value decreases the closer a swarm group gets to its cluster. If the group is the only member of its cluster, this value will be zero. Merge Candidate Available. This is a perception having value of 0.0 or 1.0 and detects when another group has moved into a posture that allows acceptable merging maneuvers, based on its distance and the alignment of its velocity vector with the self-group. The signal value is set to 1.0 when two conditions are met: first, another swarm group is within the mergeMaxDistance metric and second, the dot product of the velocity vector of the two swarm groups is greater than or equal to the value of mergeMinVelAlign metric. These two metrics are extra parameters in the behavior function under optimization. Trackable by MTT. This is a perception having value of 0.0 or 1.0 depending on whether or not the swarm group’s perceived MTT is currently capable of tracking the group, as in (8). Tracked by MTT. This is a perception having value of 0.0 or 1.0 depending on whether or not the swarm group perceives that its MTT is currently tracking the group.

354

J. Robbins and L. Joiner

Engageable by MTT. This is a perception having value of 0.0 or 1.0 depending on whether or not the swarm group’s perceived MTT is currently capable of tracking the group, as in (5).   (9) engageable(s, t) = tracking(s, t) ∩ RPIPs,t < RIntt Engaged by MTT. This is a perception having value of 0.0 or 1.0 depending on whether or not the swarm group perceives that its MTT is currently engaging the group with one or more SAMs. Proportion of Cluster Ingressing to MTT. This is the proportion of the swarm group’s cluster (by group count) that is currently selecting the action to ingress toward this group’s perceived MTT. Proportion of Cluster Threatened by MTT. This is the proportion of the swarm group’s cluster (by group count) that is currently threatened (tracked) by this group’s perceived MTT. Nearest PA Reachable. At the swarm group’s current position and speed, it estimates the number of engagements M its MTT can execute before the group reaches the nearest PA. If the number of swarm units in the group is larger than M, assuming a two-SAM salvo, then this perception is set to a value of 1.0, otherwise it is 0.0. Estimating the maximum number of successive engagements an ADS can attempt is accomplished by evaluating (1) recursively along the target trajectory and accumulating the TOF for each engagement [4]. Nearest EWR Reachable. This is the same perception as above, except computed for the nearest EWR rather than the nearest PA.

4.2 Action Function The ML agent action function can produce actions for each swarm unit relative to its locally clustered group and relative to the overall IADS. It does this mainly by setting the swarm unit navigation waypoints, which are the inputs to the simulation kinematics block. The actions available to be made by the ML agent are listed below. Loiter. This causes the swarm group to cease its translational motion and remain at its current location in a tight orbit. Split. This causes the swarm group to divide itself into two independent swarm groups having an equal number of units. The newly-created swarm group is independent and makes its own decisions having an identical behavior function as its progenitor, but acting upon perceptions unique to its own perspective. Merge with Nearest. This causes the swarm group to begin a merge maneuver with the nearest swarm group. Once completed, the merge maneuver results in both swarm groups aligning their velocity vectors and positions. They then become a single group

A Machine Learning Approach for Discovery of Counter-Air Defense Tactics

355

with the number of units equal to the sum of the merging groups. At this point, they act as a single group with one behavior function. While performing the merge maneuver, all further behavior decisions by both groups are ceased until the merge is completed or aborted. The merge is aborted when one group or both is terminated. Ingress to Cluster Centroid. This causes the swarm group waypoint to update to the swarm centroid position. The centroid position is the weighted (by number of units) average of plan position of all active groups in the swarm. In effect, this causes a particular group to move toward groupings of other swarm constituents. Egress from Cluster Centroid. This causes the swarm group waypoint to update to the opposite direction of the swarm centroid position. In effect, this causes a particular group to break away from groupings of other swarm constituents. Ingress to MTT. This causes the swarm group waypoint to update to the position of its MTT. Egress from MTT. This causes the swarm group waypoint to update to the opposite direction of the position of its MTT. Flank MTT Toward Cluster. This causes the swarm group waypoint to update in such a fashion so that the group moves around its MTT position while maintaining a constant distance. The direction of the flanking, whether clockwise (CW) or counter-clockwise (CCW) is determined by the relative location of the swarm group’s cluster centroid. In this case, the group orbits the MTT in a direction that moves it nearer to its cluster centroid. If the swarm group is the only member of its cluster, then it flanks the MTT in the direction of the nearest swarm group, regardless of its cluster membership. If the swarm group is the only group in the swarm, then the direction of rotation is CW. Flank MTT Away from Cluster. This is a similar action as Flank MTT Toward Cluster, however this causes the group to flank the MTT in a direction that moves it further away from its cluster centroid. If the swarm group is the only member of its cluster, then it flanks the MTT in the opposite direction of the nearest swarm group, regardless of its cluster membership. If the swarm group is the only group in the swarm, then the direction of rotation is CCW. Ingress to Nearest Asset. This causes the swarm group waypoint to update to the position of the PA nearest it. Ingress to Nearest EWR. This causes the swarm group waypoint to update to the position of the EWR nearest it.

4.3 Behavior Function The ML agent behavior function is implemented as an ANN. The behavior function operates on the observation function outputs (perceptions). The outputs of the behavior function are tied one-to-one to the action function entries. Each evaluation, the behavior

356

J. Robbins and L. Joiner

function outputs activation levels are masked and sorted, with the highest activated neuron selecting the action to be taken. For this article, the ANN is implemented in a fully-connected feed-forward architecture. The ANN has 13 inputs (behavior function outputs) and 11 outputs (action function inputs) connected through three hidden layers of widths 33, 20, and 13 neurons each. Configured in this way, the ANN has 77 total neurons, 77 bias inputs, and 1,491 total weights. Taken together, the behavior function has 1,572 parameters. The hyperbolic tangent function was selected as the activation function for all neurons.

5 Training Methodology The ML agent is trained through iterative gameplay of the swarmCommand RTSG against a software controlled and heuristic IADS agent [4]. The values for the parameters of the ANN (weights and biases) which constitute the behavior function, as well as extra parameters defining the observation and action functions (clustDist, and mergeMaxDistance and mergeMinVelAlign, respectively) are optimized using a genetic algorithm (GA) [4]. The basic flowchart for the GA is shown in Fig. 4. SelecƟon

Crossover

MutaƟon no

IniƟal

Evaluate Fitness

Ending Criteria yes

End

Fig. 4. Genetic algorithm flowchart

Initially, a large population of candidate behavior functions (characterized by the collection of ANN weights and biases as well as the extra parameters) is randomly generated. Each generation, the fitness of each candidate in the population is evaluated as the output of the objective function under examination. This is accomplished through a RTSG game session of the particular agent, with the fitness values taken as the achieved ME. The iterative selection, crossover, and mutation process produce highly-fit members of the population, capable of maximizing the objective fitness criteria, ME. Each generation, a random, procedurally-generated IADS laydown scenario is generated to evaluate the ME against for each candidate in the population [4].

A Machine Learning Approach for Discovery of Counter-Air Defense Tactics

357

6 Baseline Simulation Scenario The baseline scenario for all the following analysis and simulation results is depicted in Fig. 5. This scenario was hand-made, and was not in the training set for the ML agent. The three PA (green squares) are defended by two ADS batteries (red diamonds). The ADS batteries are supported by one EWR (red triangle). The objective value assigned to each PA is 50,000 and the 15,000 for the EWR. The simulation parameters for the AD batteries are summarized in Table 1. Table 1. Baseline scenario ADS parameters Parameter

ADS 1

ADS 2

Position (km)



SAM effective range (km)

40

20

Baseline tracking range (km)

60

30

Number track channels

2

2

Number missile channels

2

2

Instantaneous FOV (degrees)

60

60

FOV turn rate (degrees per second)

25

25

Salvo delay (s)

10

10

Number ready rounds

16

8

SAM average flyout speed (m/s)

819.4

726.6

Maximum turn acceleration (g)

30.0

30.0

SAM in-range single-shot kill probability

0.95

0.95

AD battery objective value

11,840

7,090

SAM round objective value (ea.)

265

181

With the swarm attack originating to the north, the ADS were situated so as to provide overlapping coverage of the two eastern-most PA, with the longer-range ADS forward deployed.



Fig. 5. Plan view of simulated IADS laydown

The total objective value of all IADS elements in the baseline scenario is 189,618. The reported ME in the following sections is the fraction of total objective value achieved by the swarm by destroying the various IADS elements. The swarm speed setpoints were held constant and were not controlled by the MLtrained agent behavior function. The simulation parameters for the swarm units are summarized in Table 2. The update rates for the various agent modules are summarized in Table 3. Table 2. Baseline scenario swarm parameters Parameter

Value

Initial position (km)

Speed setpoint (m/s)

300

Single-unit kill probability

0.85

Maximum turn acceleration (g)

3

Objective value (ea.)

200

Maximum time of flight (s)

1650



Table 3. Simulation module update rates Module

Update period (s)

Physics engine

0.1

ADS target assignment

10

ADS track control

10

ADS FOV control

1

ADS fire control

1

ML swarm observation function

5

ML swarm behavior function

5

ML swarm action function

5

7 Analytic Prediction of Mission Effectiveness Given a predicted target trajectory, the maximum number of successive engagements an ADS can attempt can be computed, bounded by the target endpoint, by evaluating (1) recursively along the target trajectory and accumulating the TOF for each engagement [4]. For the baseline scenario, the maximum number of single-SAM engagements per track channel available to each ADS for a CM target attacking each PA on a straight-line trajectory is summarized in Table 4. The predicted engagements available are tabulated for the baseline case as well as for the case when the unit CM RCS is reduced by 3 dB. Table 4. Predicted maximum single-channel engagement capacity for scenario IADS Baseline RCS

Reduced RCS

ADS 1

ADS 2

ADS 1

ADS 2

PA 1

11

2

10

2

PA 2

11

7

11

7

PA 3

11

9

10

8

Since both AD radars have two track channels, the total engagement capacity for each is double that in the table entries. The 3 dB RCS reduction only reduces the ADS radar tracking range for the CM targets from 60 and 30 km, to 50.5 and 25.2 km, respectively. The reduced tracking range for the ADS results in slightly diminished engagement capacity, as seen in Table 4. At 300 m/s, a CM can be engaged at a maximum range of 37 and 18 km, a reduction of 7.5 and 10%, respectively, of the maximum effective ranges two ADS batteries. The total timeline capacity between 26 and 40 CM targets is tempered by the limited FOV for each ADS. As depicted in Fig. 3, the FOV bounds the WEZ angularly, and simultaneous engagement in multiple track channels is only possible for targets within the ADS FOV.



Situating the ADS relative to the PA such that multiple ADS can provide overlapping protection mitigates the degree to which an attack can be optimized a priori.

8 Simulation Results In each of the following sections, the CM swarm size was varied from 11 units to 22 units, much lower than the total engagement capacity of the IADS, and fewer than the ready rounds available, totaling 24. Each configuration was run as an ensemble (N = 100) to collect ME statistics. 8.1 Non-reactive CM Attack In the non-reactive CM attack, each CM flies a straight-line, constant-speed trajectory toward one of the PA in the IADS. The allocation of CM targets is done at initialization and the swarm does not alter its course afterward. The allocation attempts to maximize the probability that at least one unit leaks, taking into account the overlapping coverage provided by the IADS [4]. The resulting target allocation over the different swarm sizes is summarized in Table 5. As the west-most PA is least defended, it has the fewest CM allocated for targeting. Table 5. Non-reactive CM swarm unit targeting allocation Swarm size (units)

# Units targeting PA 1

PA 2

PA 3

11

3

4

4

12

3

5

4

13

3

5

5

14

3

6

5

15

3

6

6

16

4

6

6

17

4

7

6

18

4

7

7

19

5

7

7

20

5

8

7

21

5

8

8

22

6

8

8

The simulation results plotted in Fig. 6 and Fig. 7 show that the non-reactive CM swarm requires a size of 21 units before it can achieve a mean or median ME of 50% or greater. Additionally, a swarm size of at least 14 units is required to achieve a reliable



destruction of at least 1 PA. Varying the number of units between 15 and 20 does not result in much appreciable increase in mean or median ME achieved. Figure 7 shows the outcomes of all ready rounds available to the IADS. For the low swarm sizes, many rounds were left unused as they were in excess of what the IADS required to repel the CM attack. A large number of drops exist as a result of the high probability that the first SAM destroys a CM target before the second SAM in the salvo can attempt to do so, resulting in a “drop”. Figure 7 shows the ratio of the achieved time-of-flight (TOF) to the TOF of the predicted intercept point (PIP) calculated by the ADS fire control logic [4]. Because the CM trajectories are straight and level, the realized TOF data are very tightly clustered around the predicted TOF data for all shots (average value 0.96), allowing the ADS to be very efficient in allocating engagements throughout the CM attack.

Fig. 6. Non-reactive CM attack ME statistics vs swarm size (Box and whisker chart details: black bars are minimum and maximum values, blue box edges are at 1st and 3rd quartiles (25% and 75%), blue plus is mean value, red bar is median value.)

Fig. 7. Non-reactive CM attack: total IADS ready round outcomes vs swarm size (left), engagement TOF extension distribution (right)


8.2 Non-reactive Reduced RCS CM Attack

For this set of simulation runs, the scenario was repeated identically, with the exception that the CM unit RCS was reduced by half (−3 dB). As the plot in Fig. 8 shows, the overall effect on ME achieved is insignificant compared to what is achieved by adding additional swarm units.

Fig. 8. Non-reactive reduced RCS CM attack ME statistics vs swarm size

8.3 Autonomous ML Agent-Controlled CM Attack

This set of simulation runs was conducted with the autonomous ML agent described in Sect. 4 making tactical decisions for each CM unit. Figure 9 shows the achieved ME vs swarm size. The average TOF_INT/TOF_PIP value is 1.21. Compared to the non-reactive CM swarm, the ML-controlled swarm has demonstrated the learned ability to extend engagements sufficiently to reduce the ADS engagement capacity by more than 25%. Additionally, by comparing Fig. 10 and Fig. 7, it is evident that the ML agent swarm also learned to cause more SAM misses. With a single-shot Pk of 0.95 in the WEZ, the increased miss probability is achieved by maneuvering outside the WEZ during max-range engagements. On average, the ML agent was able to more than double the number of missed engagements compared to the non-reactive CM swarm.


Fig. 9. Autonomous ML controlled CM attack ME Statistics vs swarm size

Fig. 10. Autonomous ML controlled CM attack: total IADS ready round outcomes vs swarm size (left), engagement TOF extension distribution (right)

8.4 Autonomous ML Agent-Controlled Reduced RCS CM Attack

For this set of simulation runs, the ML agent-controlled swarm scenario was repeated identically, with the exception that the CM unit RCS was reduced by half (−3 dB). Figure 11 shows the achieved ME vs swarm size.


The TOF_INT/TOF_PIP distribution is shown in Fig. 12, with an average value of 1.21.

Fig. 11. Autonomous ML controlled reduced RCS CM attack ME statistics vs swarm size

Fig. 12. Autonomous ML controlled reduced RCS CM attack: total IADS ready round outcomes vs swarm size (left), engagement TOF extension distribution (right)

9 Results Summary

The median and mean ME values achieved for the four swarm configurations examined are plotted as a function of swarm size in Fig. 13. As configured, the 3 dB reduction in CM unit RCS was insufficient to allow the non-reactive CM swarm to improve its ME at any swarm size. The mean ME was improved by adding additional CM units to the swarm. The ML-controlled agent demonstrated the ability to achieve very high ME at smaller swarm sizes compared to the non-reactive CM swarm allocation technique. Additionally, the RCS reduction allowed the ML-controlled swarm to achieve higher ME at the lower end of the swarm size distribution.


Fig. 13. Comparison of simulated ME statistics

10 Conclusions

A framework has been developed to discover, test, and evaluate counter-air defense tactics for aerial weapons such as cruise missiles under control of a cognitive agent. The framework allows the mission effectiveness to be evaluated against a simulated integrated air defense system with varying levels of capacity and decision-making. A cognitive CM swarm agent has been demonstrated, which developed highly effective cooperative swarming tactics through an unsupervised learning process. The cognitive swarm agent behaviors were shown to enhance the weapon capabilities when paired with non-cognitive improvements, in this case radar cross section reduction, whereas that improvement alone was insufficient to improve mission effectiveness. This work can be used to discover and evaluate swarm-enabled cognitive behaviors for unmanned or manned aerial platforms and weapons, enhancing survivability and reducing the weapon quantities required to achieve mission effectiveness in highly dynamic and defended airspaces.

References

1. denBroeder Jr., G.G., Ellison, R.E., Emerling, L.: On optimum target assignments. Oper. Res. 7(3), 322–326 (1959)
2. Hosein, P.A., Athans, M.: Some analytical results for the dynamic weapon-target allocation problem. Office of Naval Research, Arlington, VA (1990)
3. Hyun, S.U., Young, H.L.: A heuristic algorithm for weapon target assignment and scheduling. Mil. Oper. Res. 24(4), 53–62 (2019)
4. Robbins, J.: Discovery of counter IADS swarming autonomy behaviors with machine learning. Ph.D. dissertation, Department of Electrical and Computer Engineering, University of Alabama in Huntsville, Huntsville, AL, 35899 (2022)
5. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)
6. Weber, B.G., Mateas, M., Jhala, A.: Building human-level AI for real-time strategy games. In: AAAI Fall Symposium, vol. FS-11, no. 01, pp. 329–336 (2011)
7. The AlphaStar Team: AlphaStar: Mastering the Real-Time Strategy Game StarCraft II (2019). https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/. Accessed 08 Dec 2021
8. İlhan, E., Gow, J., Perez-Liebana, D.: Teaching on a budget in multi-agent deep reinforcement learning. In: 2019 IEEE Conference on Games (CoG), London, UK (2019)
9. Haykin, S.: Cognitive Dynamic Systems. Cambridge University Press, New York (2012)
10. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: International Conference on Knowledge Discovery and Data Mining, Portland (1996)

Tackling Marine Plastic Littering by Utilizing Internet of Things and Gamifying Citizen Engagement

Stavros Ponis, George Plakas, Eleni Aretoulaki(B), and Dimitra Tzanetou

School of Mechanical Engineering, National Technical University Athens, Athens, Greece
[email protected], [email protected]

Abstract. Marine littering is a severe global issue, which is getting worse as millions of tonnes of waste end up in the oceans every year, leading to a series of environmental, economic, health and aesthetic problems with serious implications. The research presented in this paper focuses its efforts on tackling this problem through the determination of streams of plastic waste from land sources until they reach the shore and before they enter the sea, as well as through the facilitation of a pro-active role for citizens. To that end, this paper introduces an integrated methodology for plastic marine littering management through the use of an innovative technology solution integrating drone, sensor and station pad technologies in one cloud-based platform providing services to both public authorities and citizens inhabiting coastal areas. The developed Beach Quality Monitoring system will be supported by an advanced suite of software tools, which will be responsible for data collection, processing, visualization and reporting to all project stakeholders. The platform will integrate data into a gamified environment to enable citizens' behavioral and mindset change with regard to marine littering prevention.

Keywords: Marine littering · Plastic beach littering · Internet of Things · Wireless sensor networks · Unmanned vehicles · Cloud based platforms · Gamification · Serious Games

1 Introduction

Plastics are cheap, recalcitrant, lightweight, impervious to water, multipurpose materials, widely employed in a plethora of applications, including packaging, building and construction, transportation and electronic equipment [1]. These characteristics have skyrocketed plastics' market potential, leading to rapid growth in their production, which started in the 1950s and has significantly increased henceforward, amounting to 367 million tonnes in 2020 on a worldwide scale [2]. Yet, the same properties which have determined plastic omnipresence in the market are linked to detrimental environmental consequences, including marine pollution. Marine pollution is a major global issue, which is getting worse as millions of tonnes of waste accumulate in the oceans every year, causing multiple environmental and socioeconomic implications [3]. Based on research conducted by the European Union [4], it was estimated that 84% of the waste
accumulated in European coastal areas is made of plastic, 50% of which consists mostly of single-use plastics. To add insult to injury, the pandemic-related relaxation of plastic bans might significantly worsen the situation in the future [5]. Therefore, transitioning from the current resource-inefficient linear economy model to a sustainable circular one, preventing litter from entering the marine environment and maximizing resource efficiency, is an urgent step towards alleviating the aforementioned impacts. Land-based sources, including waste from landfills near coastal zones or on the banks of rivers and ports, as well as tourist and recreational activities taking place at the coasts, are considered the primary cause of marine plastic pollution, accounting for up to 80% of total marine debris [6]. There is currently very limited information pertinent to the inland flow of plastic debris [7]. However, according to the work presented in [8], the majority of debris on beaches is brought and left by beachgoers, a behavior most likely associated with poor education and lack of environmental awareness. Therefore, even though sea clean-ups are urgent to mitigate the situation, marine pollution prevention remains the prime concern, which is intrinsically linked to public awareness and responsible recycling behavior. This is why environmental education activities are deemed by policy makers as the most appropriate approach to guiding citizens towards a more sustainable and environmentally friendly behavior [9, 10]. It should not be overlooked, however, that such efforts made thus far are considered ineffective and inadequate given their high expenses. Hence, it is imperative that a cost-effective awareness program be designed and implemented with a view to educating citizens and inspiring them to actively participate in the long-term combat against marine pollution.

2 Proposal Objectives and Challenges

That is exactly where the research presented in this paper sets its visions, ambitions and aspirations, by presenting an ideal turnkey solution for beaches that are willing to promote the preservation of marine and coastal ecosystems and the tackling of their littering. Our project aspires to accomplish that by demonstrating a set of innovative solutions and applications able to detect and monitor debris in real time and act as an indicator for evaluating marine littering performance, with the aim of ultimately preventing waste from entering the marine environment. The main innovation of what is proposed lies in the conception, design and development of an innovative virtual competition, based on the daily results, metrics and benchmarking of a disruptive Beach Quality Monitoring System. The proposed system is supported by an integrated solution, which incorporates a customized technology of drones, sensors and autonomous station pads for the launching and landing of drones. Citizen awareness and participation play a crucial role in the context of the proposed research and, more specifically, in increasing marine littering performance. This is why the proposed solution heavily relies on behavioral change science by incorporating an effective "social pressure" element. In fact, it focuses on finding new, innovative and more efficient solutions to prevent marine littering, while harnessing the power of local authorities and communities by designing and promoting enhanced citizen engagement. To that end, the project introduces a disruptive, environmentally conscious, first-of-its-kind product and service that analyses behavioral patterns to improve citizens' environmental performance, minimize plastic marine littering and create a strong sense of
community contribution at the coastlines around the world. It also introduces a Serious Game element, so as to motivate as many citizens as possible, on the one hand, to participate in educational programs pertinent to circular economy issues and recycling behavior and, on the other, to give and receive information about the situation of the beach in real time (i.e. uploading photographs demonstrating accumulated plastics that should be collected before entering the marine environment). Based on this information, the participating beaches will compete with each other daily on the basis of cleanliness. The winning beach, i.e. the one with the lowest carbon footprint among those participating in the environmental Serious Game, will be presented with an "Eco-Label". The proposed research is fully in line with the actions of the European Union, under the research framework of the EU Framework Programme in the areas of the Blue and Circular Economy as well as the Digital Agenda, for the design and successful implementation of innovative technological solutions that support the recording of waste quantities and the identification of sources of marine pollution from plastic waste. The implementation of the proposed solution will also actively contribute to reaching the United Nations 2030 Sustainable Development Goals and the Paris Agreement on Climate Change. The ultimate objective of the proposed research is the alleviation of marine pollution through the prevention or/and diversion of the waste entering the marine environment, as a result of the active role of citizens, the reinforcement of their awareness and the empowerment of local authorities. To achieve this, the following sub-objectives are set:

1. Design and development of innovative and more efficient systems for the effective recording of waste quantities and the identification of various sources of marine pollution.
2. Raising community awareness on the importance of pollution-free seas through an innovative gamification-based Serious Game, aiming at protecting valuable natural resources and ultimately promoting the sustainable development of coastal cities.
3. Development of a world-renowned "Eco-Label" trusted by hundreds of beaches around the globe.
4. Supporting public authorities as part of the Integrated Maritime Surveillance component of the Blue Growth strategy.
5. Development of a detailed business plan for the sustainable and commercially successful introduction and scale-up of an integrated solution that can be exploited in both domestic and international markets.

3 The Proposed Solution

The research presented in this paper demonstrates a complete management methodology for tackling marine littering with the support of ground and underwater sensors, a set of Unmanned Vehicles (aerial and underwater) with imaging and environmental sensors backed by autonomous station pads and a specialized web platform, enabling the reporting of results to system users. In doing so, the project produces an integrated infrastructure of products and services. It consists of six (6) distinct modules which are categorized as follows:


Module 1 - Monitoring: This module is concerned with the monitoring carried out by all field sensors (ground and underwater) which are used in this project to detect marine debris and extract environmental measurements in real time. Ground sensors are employed for the identification of land sources and the quantification of marine debris, while underwater sensors are used to monitor the underwater pollution levels.

Module 2 - Detection and Data Collection: This module refers to the autonomous Unmanned Vehicles which are launched in the air and sea and equipped with high-resolution cameras and image recognition analysis software. The use of such vehicles has a twofold purpose. First, through the imaging sensors, marine debris is identified and quantified along a designated part of the coast several times per day. Second, the ground and underwater sensors transmit the data collected to the passing UAVs and UUVs, respectively, during their scheduled routes.

Module 3 - Maintenance and Data Upload: The project provides the autonomous Unmanned Vehicles with land-based station pads for the UAVs and floating ones for the UUVs. Apart from battery recharging, the station pads are used to protect and automatically check and maintain the vehicles to the furthest extent possible. They are also used for uploading the data collected by both types of vehicles and the ground and underwater sensors, which are transmitted wirelessly to a central cloud server.

Module 4 - Data Analysis: In this module, the plethora of data collected by all the project's sensors (e.g. thermal and optical cameras, environmental sensors etc.) and transmitted via the station pads is aggregated, filtered and analyzed. The project provides its stakeholders with an advanced data analysis platform and a complete monitoring system for the inspected coastal area.

Module 5 - Reporting: A fully customizable content management system is developed, through which the platform provides the project's results to all stakeholders and other interested entities with authorized access.

Module 6 - Raising Awareness and Game Planning: This final module of the project's system is built around a citizen awareness program, augmented with gamification elements. The proposed Serious Game consists of a user-friendly interface which can be interacted with via the solar-powered screens installed on the beaches and the project's mobile application. The purpose of this module is, on the one hand, to inform residents and tourists visiting a selected area about its marine littering status, through a real-time debris value calculator algorithm, and on the other, to promote circular economy principles and encourage them to properly manage their waste and participate in cleaning activities.

A schematic of the aforementioned modules is presented in Fig. 1:


Fig. 1. The six (6) project modules

The platform will integrate data into a gamification-enhanced environment to enable the accomplishment of the project's goals with regard to citizens' behavioral change. Moreover, as far as the gamification design and mechanisms are concerned, the thorough description of competition scenarios among the municipalities and beach administrators participating in the project is of paramount importance. Leaderboards, ranking and performance details will be displayed through self-sufficient solar-powered screens. These screens will be installed on all beaches participating in the project's pilot test. Nevertheless, merely educating citizens about marine pollution challenges, while important, is not a panacea. Therefore, a gamification-based, engaging, citizen-oriented approach enhanced with an element of social pressure emerges as the appropriate way to promote sustainable citizen behavior and bolster sincere and continuing commitment to protecting the marine environment. Citizen engagement is considered by the authors as the most critical component of the success of any marine littering related solution, as greater engagement means decreased rates of littering. To boost social behavior and promote a sense of community among the citizens in an impactful fashion, the project will launch an innovative virtual competition. In fact, fun is an ingredient currently missing in the behavioral change equation; therefore, the project's researchers have planned a well-designed citizen-centric solution, which encourages participation and informs citizens on the advantages of keeping their beaches clean. The proposed virtual competition is not an official accreditation but rather a 'game' played by participating beaches. Nonetheless, this program will result in the birth of a new "Eco-Label" for beaches, which will be based on dynamic real-time monitoring and citizen engagement. The proposed "Eco-Label" program will act as a trademark and declare winners on a daily basis among the best performing beaches. It is expected that beach administrators (e.g. local municipalities, private hotels, national parks, or private businesses) will seek the new "Eco-Label" as an indication of their high environmental standards. In fact, this label is both a symbol that the beach participates in a community of beaches and an indication of excellence in terms of marine littering prevention. To win the accreditation,
a beach has to exceed the performance of other competing beaches, meaning recreational users and residents have to contribute the most to marine littering prevention. The project will demonstrate its solution by simultaneously deploying multiple pilots in test beds of beaches in coastal areas. Overall, the project proposes an integrated technology solution which combines a series of State-of-the-Art technologies, each of which takes part in the configuration of an innovative technological ecosystem, dedicated to tackling plastic litter on recreational beaches. The outcome of the proposed solution will be a fully customized and pilot-tested Proof of Concept [POC] of a First of a Kind [FOAK] integrated infrastructure. More specifically, the project's technological outcome includes:

• A network of State-of-the-Art ground and underwater sensors, based on the basic principles of Internet of Things technology.
• A pair of Unmanned Vehicles (UAV & UUV) with mounted sensors, audio-visual equipment and the capability to communicate wirelessly, in order to receive the field sensors' data.
• A set of station pads (ground & surface) able to recharge the Unmanned Vehicles and transmit their collected data to the cloud server.
• A suite of software tools and algorithms of Artificial Intelligence for the processing and analysis of data received from the Unmanned Vehicles and the land and undersea sensors.
• A cloud-based platform which will be in charge of tracking the data received and able to analyze imaging data and produce meaningful and customized results.
• A novel debris value calculator software able to determine, in real time, key metrics corresponding to the environmental impact of the most common marine debris (a minimal sketch of such a calculator is given after this list).
• A fully integrated web tool reporting the processed data results to the beaches' management authorities and project's stakeholders.
• An eco-system of stakeholders, i.e. a virtual community and a Serious Game that informs and motivates all sides to embark on the project's radically disruptive approach.
• A solar screen installed at the beach for real-time updates about the littering and current beach quality.
• A mobile application enabling two-way communication between the project stakeholders and citizens in a practical fashion.
• An innovative "Eco-Label" virtual competition to rate the beaches by taking into account the actions of the citizens.
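
To make the notion of a debris value calculator concrete, the following is a minimal, hypothetical Python sketch; the debris categories, per-item impact weights, and the function and variable names are illustrative assumptions and not part of the project's actual implementation:

# Hypothetical sketch of a real-time debris value calculator.
# Impact weights per debris category are placeholder values, not project data.
IMPACT_WEIGHTS = {
    "plastic_bottle": 3.0,
    "plastic_bag": 2.5,
    "cigarette_butt": 1.0,
    "food_wrapper": 1.5,
}

def debris_impact_score(detected_counts, beach_length_m):
    """Aggregate detected debris counts into a per-100-m impact score."""
    total = sum(IMPACT_WEIGHTS.get(item, 1.0) * count
                for item, count in detected_counts.items())
    return 100.0 * total / max(beach_length_m, 1.0)

# Example: counts produced by the UAV image-recognition pipeline for one pass.
counts = {"plastic_bottle": 12, "plastic_bag": 7, "cigarette_butt": 40}
print(debris_impact_score(counts, beach_length_m=500))

A score of this kind, recomputed with every UAV pass, could feed both the solar-screen leaderboards and the daily "Eco-Label" ranking described above.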

4 Research Methodology

The outcome of the project is dependent on the successful implementation of the methodological approach. In order to achieve the expected results, the research methodology focuses on four (4) basic steps. The critical points of each methodological step are the following:

Step 1 - Current State-of-the-Art of Various Fields of Research: This step focuses on the investigation of technologies that could contribute to the implementation of the
proposed system. The assessment of their compatibility, effectiveness and sustainability will lead to a combination of technologies that will eventually be applied. Great attention will also be paid to the new possibilities that the proposed gamification-based approach can offer with regard to citizen awareness of circular economy principles and recycling behavior, as well as to the opportunities of utilizing Serious Games in order to influence traditional models of behavioral change.

Step 2 - Development of Technological Systems: In this step, system specifications, users, stakeholders and application scenarios are determined. The development of game elements is also a crucial part of this step, given that user acceptance will play a key role in the success of the project as a whole. Furthermore, on the technology side, the installation of ground and underwater sensors as well as the integration of imaging and environmental sensors into the aerial and underwater Unmanned Vehicles are vital parts of this step. Last but not least, effective data processing and reporting mechanisms will be developed, as well as middleware systems that will ensure the interconnection among the subsystems of the project.

Step 3 - Control and Testing of Systems and Their Interconnections: In this step, all necessary functional controls of the subsystems (i.e. UAVs, UUVs, Sensors, Station Pads, Platform, Solar Screens, Mobile Application) are conducted and the feedback obtained is used for modifications. Additionally, the project's pilot test is planned in detail, with care taken to ensure participants' safety and the extraction of useful results for further utilization.

Step 4 - Pilot Test and Evaluation of Results: In this step, after the necessary permits are obtained, the project's pilot test will take place. The pilot's success relies on conformity to the scenario developed in the previous step. Performance indicators will also be created, in order to evaluate the pilot results and define future actions.

5 Added Value and Impact

The project will bring together key technologies (i.e. Unmanned Vehicles, Sensors, Station pads) and skillsets (i.e. Advanced Data Algorithms, Serious Game Engagement), with a view to providing an end-to-end solution. Furthermore, a new brand creation, the proposed "Eco-Label" mentioned above, will upgrade the brand image of the participating beaches and open a niche market. The project's turnkey solution challenges local authorities, beach administrators and citizens to achieve high standards in the prevention of marine littering based on a specific performance against a daily dynamic benchmark. The business model on which the project is based is a First of a Kind [FOAK] solution, delivering a unique value proposition and aiming to become the first mover in a totally new and uncontested market. It is repeatable and scalable and will therefore raise awareness, as ever more citizens comprehend the adverse, anthropogenic impact on marine pollution and are encouraged to participate in the proposed marine littering competition and its virtual community. Additionally, the project contributes to wider sustainability targets. The innovative "Eco-Label" program, as well as the customized solution of Unmanned Vehicles, Sensors, and Autonomous Station Pads, is
bound to determine the project's successful transferability and its further commercial exploitation. The impact provided by its adaptability and scalability can significantly support global marine littering related policies, thus contributing to alignment with the goals of the European Union and bringing benefits to all stakeholders. Overall, the successful implementation of this project will lead to an increase in tourism, the adoption of Circular Economy principles, as well as the reintroduction of secondary plastic material, thus decreasing plastic production on a large scale.

6 Conclusion

The research project presented in this paper introduces an innovative system designed to tackle marine littering with the support of State-of-the-Art technologies. The proposed type of integrated solution seems to be the way towards connecting Unmanned Vehicles to a broader Internet of Things formation, while the ability to recharge and be maintained on the station pad could further decrease human involvement, thus rendering several processes fully autonomous. The plethora of data collected and analyzed is reported to all involved stakeholders in the form of visualizations of the current situation of the beach. A series of gamification elements forming multiple Serious Games scenarios will be utilized, in order to enable citizens to play their key role in addressing the issue, along with local authorities and beach administrators. This project, due to its generic solution, can be applied at any coastline or beach around the world and lead to the gradual mitigation of marine littering.

Acknowledgments. The present work is co-funded by the European Union and Greek national funds through the Operational Program "Competitiveness, Entrepreneurship and Innovation" (EPAnEK), under the call "RESEARCH-CREATE-INNOVATE" (project code: T2EDK-03478 & acronym: GOLDEN SEAL).

References

1. Ryberg, M.W., Laurent, A., Hauschild, M.: Mapping of global plastics value chain and plastics losses to the environment: with a particular focus on marine environment (2018)
2. Plastics Europe: Accelerating sustainable solutions valued by society. https://www.plasticseurope.org/en/newsroom/news/eu-plastics-production-and-demand-first-estimates-2020. Accessed 31 Jan 2022
3. Aretoulaki, E., Ponis, S., Plakas, G., Agalianos, K.: A systematic meta-review analysis of review papers in the marine plastic pollution literature. Mar. Pollut. Bull. 161, 111690 (2020)
4. Addamo, A.M., Laroche, P., Hanke, G.: Top marine beach litter items in Europe. A review and synthesis based on beach litter data. MSFD Technical group on marine litter. Report No. EUR29249, 148335 (2017)
5. Vanapalli, K.R., et al.: Challenges and strategies for effective plastic waste management during and post COVID-19 pandemic. Sci. Total Environ. 750, 141514 (2021)
6. Andrady, A.L.: Microplastics in the marine environment. Mar. Pollut. Bull. 62(8), 1596–1605 (2011)
7. Borrelle, S.B., et al.: Opinion: why we need an international agreement on marine plastic pollution. Proc. Natl. Acad. Sci. 114, 9994–9997 (2017)
8. De-la-Torre, G.E., et al.: Abundance and distribution of microplastics on sandy beaches of Lima, Peru. Mar. Pollut. Bull. 151, 110877 (2020)
9. Kimaryo, L.: Integrating environmental education in primary school education in Tanzania: teachers' perceptions and teaching practices (2011)
10. Marpa, E.P.: Navigating environmental education practices to promote environmental awareness and education. Online Submission 2(1), 45–57 (2020)

HiSAT: Hierarchical Framework for Sentiment Analysis on Twitter Data

Amrutha Kommu, Snehal Patel, Sebastian Derosa, Jiayin Wang, and Aparna S. Varde(B)

Montclair State University, Montclair, NJ 07043, USA
{kommua1,patels41,derosas1,wangji,vardea}@montclair.edu

Abstract. Social media websites such as Twitter have become so indispensable today that people use them almost on a daily basis for sharing their emotions, opinions, suggestions and thoughts. Motivated by such behavioral tendencies, the purpose of this study is to define an approach to automatically classify the tweets in Twitter data into two main classes, namely, hate speech and non-hate speech. This provides a valuable source of information in analyzing and understanding target audiences and spotting marketing trends. We thus propose HiSAT, a Hierarchical framework for Sentiment Analysis on Twitter data. Sentiments/opinions in tweets are highly unstructured and do not have a properly defined sequence. They constitute heterogeneous data from many sources having different formats, and express either positive, negative or neutral sentiment. Hence, in HiSAT we conduct Natural Language Processing encompassing tokenization, stemming and lemmatization techniques that convert text to tokens, as well as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) techniques that convert text sentences into numeric vectors. These are then fed as inputs to Machine Learning algorithms within the HiSAT framework; more specifically, Random Forest, Logistic Regression and Naïve Bayes are used as binary text classifiers to detect hate speech and non-hate speech from the tweets. Results of experiments performed with the HiSAT framework show that Random Forest outperforms the others with a better prediction in estimating the correct labels (with accuracy above the 95% range). We present the HiSAT approach, its implementation and experiments, along with related work and ongoing research.

Keywords: Bayesian models · Knowledge discovery · Logistic Regression · NLP · Opinion mining · Random Forest · Social media · Text mining

1 Introduction

It is a well-known fact that the Internet plays a major role in modern life spanning the gamut of communication and information sharing. People express their
views and opinions through various channels such as blogs, online forums, product/service websites, social media etc. Various social media networks, e.g. Facebook, Twitter, Orkut and Instagram, have acquired a reputation for connecting people, finding relevant communities and growing businesses. Social media users enter real-time posts on a variety of topics, ranging from current issues, brands and celebrities to products and services, due to which social media networks generate data of the order of millions of characters per second. Adequate analysis of such data benefits organizations and consumers in numerous ways. While on the one end social media provides tremendous benefits of exchanging news and views, on the other end it can lead to a number of disadvantages, adversely affecting people's lives. In extreme situations, there is the propensity for major problems including mental health issues, personal insecurities, and addiction. Hence, it is necessary to explore various users' perception of aspects/opinions, analyze them as hate speech versus non-hate speech, and ascertain whether they express positive, negative or neutral sentiments, e.g. like/dislike, yes/no, love/hate, interested/not interested etc. In order to conduct research on these lines, exploring behavioral and emotional content, we broach the paradigm of Sentiment Analysis.

Fig. 1. Protest against hate speech: importance of addressing the issue

As the very name implies, Sentiment Analysis [1] comprises methods to classify outcomes from text based on the sentiment or opinion expressed therein. Since most languages are highly complex in nature (objectivity, subjectivity, negation, vocabulary, grammar, and others), performing sentiment analysis is highly interesting and yet quite challenging. Sentiment analysis is known by various names, such as opinion mining, appraisal extraction, subjectivity analysis,
and others. It focuses on polarity classification (positive, negative, neutral), and also captures feelings and emotions (angry, happy, sad, etc.), urgency (urgent, not urgent) and intentions (interested versus not interested). Types of Sentiment Analysis typically include the following.

– Subjectivity: It expresses an opinion that describes people's feelings towards a specific topic.
– Objectivity: It expresses facts.
– Feature/aspect-based: Its main goal is to recognize the aspects of a given target and the sentiment shown towards each aspect.

Subjectivity detection ensures that factual information is filtered out and only opinionated information is passed on to the polarity classifier. A subjective sentence expresses personal feelings, views, or beliefs. Since the evolution of social media, tons of structured and unstructured data are being generated which, if mined carefully, can help various organizations and businesses to find hidden patterns of high value. Sentiment analysis determines the outlook/viewpoint of the given speaker or writer with respect to the concerned topic or sentence. Analyzing such comments and opinions can potentially help businesses improve their customer experience, understand the current trends, obtain information on unfamiliar situations, update their present resources, and in some cases potentially forestall future risks, e.g. bankruptcies and man-made disasters, because some of these could be strongly related to sentiments.

The aforementioned background and motivation lead us to the problem definition in this paper. The problem addressed herewith is to classify the tweets into two major categories, namely, hate speech versus non-hate speech. This is a significant issue today, concerning the masses worldwide, as exemplified in the adjoining Fig. 1 from a recent periodical based in London, UK [2]. Our primary source of data in this work is the Kaggle dataset [3] comprising a file of 31,962 tweets in the training set with labeled attributes such that each line contains the label, tweet identifier and tweet content. We aim to conduct subjective/objective identification of the tweets. In this context, label 1 indicates that the tweet is offensive (due to being racist or sexist), while label 0 indicates otherwise, connoting the tweet as non-offensive or acceptable.

One of the significant challenges in analyzing the tweets is the complexity of the contents within the large data sets on Twitter. First, there is the issue of noisy data including grammatical errors and special symbols, which can adversely impact the performance of detecting negative versus positive speech. Second, there is an abundance of features created from the large data set, thereby requiring clear distinction and extraction. Moreover, the feature extraction from tweets (via NLP techniques) relies significantly on cleaned data sets. Hence, this analysis poses non-trivial problems. We therefore propose the following solution as our contribution.

The main contribution of this paper is to propose and develop "HiSAT", a Hierarchical framework for Sentiment Analysis of Twitter data sets. This hierarchical framework operates in three phases. In the first phase, we carefully apply the techniques of data cleaning to the raw data set to filter out the noise, correct
the errors, and so on. In the next phase, given the cleaned data, multiple NLP techniques are applied to extract the features from the entire data set. Balancing the samples of non-hate and hate speech also occurs in this phase. In the last phase, multiple supervised learning classifiers are deployed in order to assess their performance in categorizing the tweets. The organization of this paper is as follows. We present the related work in Sect. 2. Section 3 describes the details of our proposed framework HiSAT. The performance with multiple machine learning classifiers is evaluated in Sect. 4. We finally give our conclusion and ongoing work in Sect. 5.

2 Related Work

There are works focusing on supervised learning for the classification of Twitter text or speech. For example, Anjaria et al. [4] propose a supervised learning approach to classify Twitter messages using Principal Component Analysis (PCA) for feature reduction and Support Vector Machines (SVM [5]) for classification. Their experimental results provide an accuracy of 88% in the case of analyzing the US Presidential Elections in 2012. Cao et al. [6] apply SVM to recognize the emotion via ranking on a given piece of speech. In another interesting piece of research, an ensemble of feature sets is created in [7,8] for sentiment classification purposes. Naïve Bayes, maximum entropy and SVM are employed for classification. In yet another study [9], structured data from relational databases and social media data from Twitter are analyzed for fine particle pollutants and their impact on air quality, incorporating health standards. Both types of data are mined using approaches based on supervised learning and accordingly, a pilot tool is built for air quality estimation. A survey of research work in speech emotion recognition is offered in another interesting article [10].

Some papers address various facets of sentiment analysis. Hybrid learning over partially labeled training data is proposed in [11] to conduct sentiment analysis for recommender applications, e.g. product reviews, political campaigns, and stock predictions. Semantic extraction in NLP and Naïve Bayes are applied in [12] to identify both negative and positive sentiment over social media posts, and the results show an average increase in F-score (harmonic mean) accuracy. Techniques in the realm of Deep Learning are utilized in [13] to detect hate speech, where evaluation results show precision in the range of about 72% to 93%. In recent work [14], the subtle issue of commonsense knowledge is harnessed in an interesting manner to capture human judgment for mining opinions about ordinances (local laws) to gauge public reactions on urban policy as per the satisfaction of the masses. SVM and logistic regression are applied in [15] to detect offensive language on Twitter, achieving 91% accuracy. This work harnesses Bayesian probabilistic models over sentence level data (as opposed to document level) to address the issue of limited labels, and yields results with high accuracy.

There is much research on other techniques in social media data analysis. For example, [16] discusses the challenges of streaming analysis of Twitter data sets. A recent survey article [17] addresses several works ranging from topic modeling
with LDA (Latent Dirichlet Allocation) to polarity classification, mainly for environmental issues such as climate change and disaster management, where public opinion is crucial and is often incorporated in decision-making, e.g. by legislative bodies. Sentiment analysis considering discourse relation and negation is addressed in [18]. The prolific spread of social media often mandates fake news detection [19] and verification of authenticity in postings. Some researchers focus on these issues, either in a generic context with respect to data veracity [20], or in a specific context, e.g. COVID-related themes [21]. The latter is particularly relevant today, especially because many people rely excessively on the opinions posted via social media albeit the domain being as crucial as healthcare. Considering a plethora of related work in this area, our article makes a modest contribution in terms of addressing the significant issue of detecting hate speech that has several repercussions today, and makes a broad impact on the general area of AI and Data Science for Social Good. Our work in this paper is orthogonal to the existing literature. It presents a good method of harvesting techniques in Natural Language Processing as well as Machine Learning to address a good cause with social implications. As stated earlier, our study in this paper helps businesses detect important information pertaining to hate speech; this is helpful for making arrangements to potentially prevent the reuse of such offensive speech, forestall problems related to the business personnel as well as the general public, and possibly adjust future marketing trends based on public sentiment. We now present our solution approach.

3 Proposed Solution: HiSAT

Considering the multiple issues of filtering out the noise in Twitter data, precisely detecting significant terms in the tweets, and aiming to optimize the performance of classification, we hereby propose an approach HiSAT, a Hierarchical framework to achieve Sentiment Analysis on Twitter data. As illustrated in its architecture in Fig. 2, HiSAT has a three-step hierarchy. The first step applies a series of techniques to clean and preprocess the data set, excluding errors and non-important information. Techniques in NLP such as tokenization and stemming are also implemented for preprocessing. The second step entails identification of important items and extraction of features over the cleaned data, which is achieved by NLP techniques, namely, Bag-of-Words (BoW) and Term Frequency - Inverse Document Frequency (TF-IDF). In the third step, the bulk of the Machine Learning occurs where multiple supervised classifiers are trained on the extracted data to distinguish the hate speeches from the non-hate ones. The details of the design and implementation of HiSAT are explained in the following section.

3.1 Data Preprocessing

The data preprocessing step involves cleaning the data; this entails removal of grammatical errors, special characters, punctuation marks, unknown symbols,


Fig. 2. Architecture of HiSAT

hashtags and Greek characters. Tweets consist of the following: (1) feelings, including facial expressions conveyed via emoticons, showing mood or other emotions indicated by using punctuation marks and letters; (2) targets, because social media users use the '@' symbol to refer to other users, and using the '@' symbol automatically alerts them; (3) hashtags, since this helps to increase the visibility of a post by tagging relevant terms. Therefore, we employ stemming and lemmatization, stop-word removal, and other such techniques to clean the data. Figure 3 portrays the flow of the data preprocessing.

Fig. 3. Data preprocessing flow in HiSAT
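
As a concrete illustration of this cleaning step, the following is a minimal Python sketch using regular expressions; the exact patterns and the function name clean_tweet are illustrative assumptions rather than the project's actual code:

import re

def clean_tweet(text):
    """Remove user mentions, URLs, hashtag symbols, punctuation and stray characters."""
    text = re.sub(r"@\w+", " ", text)              # drop @user mentions
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"#", " ", text)                 # keep the hashtag word, drop the symbol
    text = re.sub(r"[^A-Za-z\s]", " ", text)       # drop digits, punctuation, special/Greek chars
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("@user Loving this!!! #sunshine http://t.co/xyz"))  # -> "loving this sunshine"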

Tokenization: Tokenization is a process of splitting text strings into tokens that are represented by words in sentences. Paragraphs, sentences or words could be used as tokens. Here, tweets are translated into individual words. As a result, they are easier to process later. A sample tokenization output from the analysis is shown in Fig. 4.
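
For illustration, a minimal tokenization sketch using NLTK is shown below; the example sentence is hypothetical, and NLTK's word_tokenize is assumed as the tokenizer (the paper does not state which tokenizer was used):

import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models, downloaded once

tweet = "the sun is shining and the weather is sweet"
tokens = word_tokenize(tweet)
print(tokens)  # ['the', 'sun', 'is', 'shining', 'and', 'the', 'weather', 'is', 'sweet']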


Fig. 4. Example of tokenization in sentences

Stemming and Lemmatization: Stemming is a technique used to extract the base form of words by removing affixes from them. For example, "lovely", "lovable" and "loving" are all treated as "love". There are various stemming algorithms such as PorterStemmer, LancasterStemmer, RegexpStemmer and SnowballStemmer. We have used the SnowballStemmer algorithm from the NLTK (Natural Language Toolkit) library for stemming. This algorithm also has the capability to work with 15 non-English languages. An example of the results after stemming is shown in Fig. 5.

Fig. 5. Example of stemming words in tweets

Lemmatization is a more calculated process than stemming. It involves resolving words to their common dictionary base form, e.g. the words "am", "are", and "is" correspond to the common base "be", hence the word "be" is the lemma of the word "are". This is more complicated than converting words such as "discovery", "discovering", "discovered" to the stem "discover". Since the lemma of a word pertains to its original dictionary form, if lemmatization is used, it needs to know the respective part of speech. In order to get the best results, we have to actually feed the Part-of-Speech (PoS) tags to the lemmatizer, else it might not reduce all the words to their lemmas. This feeding of PoS tags is designed and implemented in our approach.

Removal of Stop Words, Special/Greek Characters, Punctuation: Stop words are usually articles or prepositions that do not help us find the context or the true meaning of a sentence. Examples of stop words include "is", "am", "and", "I", "are" etc. An example of stop word removal is illustrated in Fig. 6. Analogous to usernames, special characters and symbols, Greek characters, punctuation marks, hashtags, and meaningless alphabets occurring due to typing errors are removed from the data.
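
A minimal sketch of these three steps with NLTK is given below; mapping Penn Treebank PoS tags to WordNet tags is one common way to feed PoS information to the lemmatizer, and the helper function penn_to_wordnet is an illustrative assumption, not the paper's exact implementation:

import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import SnowballStemmer, WordNetLemmatizer

for pkg in ("stopwords", "wordnet", "omw-1.4", "averaged_perceptron_tagger"):
    nltk.download(pkg, quiet=True)

def penn_to_wordnet(tag):
    """Map a Penn Treebank PoS tag to a WordNet PoS constant (noun by default)."""
    return {"J": wordnet.ADJ, "V": wordnet.VERB, "R": wordnet.ADV}.get(tag[0], wordnet.NOUN)

tokens = ["she", "is", "loving", "the", "discoveries"]
stop_set = set(stopwords.words("english"))
filtered = [t for t in tokens if t not in stop_set]   # stop word removal

stemmer = SnowballStemmer("english")
print([stemmer.stem(t) for t in filtered])            # e.g. ['love', 'discoveri']

lemmatizer = WordNetLemmatizer()
tagged = nltk.pos_tag(filtered)                       # PoS tags fed to the lemmatizer
print([lemmatizer.lemmatize(t, penn_to_wordnet(tag)) for t, tag in tagged])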


Fig. 6. Example of results after removing stop words

3.2 Word Cloud Visualization

A word cloud is a visual representation of words wherein the most frequent words appear in large size and the less frequent words appear in smaller sizes. We therefore harness this in order to help us observe how well the given sentiments are distributed across the dataset. We carry out an analysis on a positive word cloud and a negative word cloud. The negative word cloud depicts the racist/sexist tweets, while the positive word cloud depicts the others, i.e. those that are acceptable and do not connote hate speech. Figure 7, Fig. 8 and Fig. 9 respectively illustrate the word clouds containing the overall tweets analyzed, the positive tweets and the negative tweets connoting hate speech.

Fig. 7. Word cloud of all tweets
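
A word cloud of this kind can be produced, for instance, with the wordcloud Python package; the following minimal sketch assumes the cleaned tweets are available as a list of strings named tweets (an illustrative variable, not from the paper):

from wordcloud import WordCloud
import matplotlib.pyplot as plt

tweets = ["love this sunny day", "happy friday friends", "love the weekend"]  # placeholder data
cloud = WordCloud(width=800, height=400, background_color="white").generate(" ".join(tweets))

plt.imshow(cloud, interpolation="bilinear")  # larger words correspond to more frequent words
plt.axis("off")
plt.show()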

3.3 Feature Extraction

Once the data is cleaned, the next step is feature extraction from the cleaned tweets. We consider two main approaches for feature generation, as follows.

– Bag-of-Words (BoW) [22]
– Term Frequency-Inverse Document Frequency (TF-IDF) [23]

Bag-of-Words is a method used to represent text as numeric columns. We use Bag-of-Words to convert text to numerical data as fixed-length vectors. This process is called vectorization. It is a feature extraction technique in which counts of the most repeated words are collected. It does not address the order of words


Fig. 8. Word cloud of positive tweets

Fig. 9. Word cloud of negative tweets

occurring in a document but considers the frequency/count of words occurring in the document. Consider that we have a corpus (collection of texts) named C with different documents {D1, D2, ..., DM}. From this, we extract N unique words called tokens and arrange them into a matrix R with dimension M × N. Suppose we have two documents as follows.

D1: She is an internet influencer. He is an internet sensation.
D2: The Internet is all fake news.

The list created consists of all the unique tokens in the corpus C: C = ['She', 'He', 'internet', 'influencer', 'sensation', 'fake', 'news'], where M = 2 and N = 7. The resulting matrix R of size 2 × 7 can be represented as in Table 1.


Table 1. Example of bag of words with 2 documents.

     She  He  Internet  Influencer  Sensation  Fake  News
D1   1    1   2         1           1          0     0
D2   0    0   1         0           0          1     1

Thereafter the columns in this matrix are used as features to build a classification model. We use sklearn's CountVectorizer function to create Bag-of-Words features. The other method, i.e. Term Frequency-Inverse Document Frequency, indicates how relevant a word is within a document. The term frequency is calculated by taking the number of times a word appears in a document and dividing it by the total number of terms in the document. TF-IDF builds on the CountVectorizer representation, which converts text documents into a matrix of token counts, and penalizes the common words by assigning them lower weights. The important terms related to TF-IDF are as follows.

TF = (Number of times that term t appears in a document) / (Number of terms in the document)   (1)

IDF = log(N / n)   (2)

TF-IDF = TF × IDF   (3)

where N is the number of documents and n is the number of documents in which term t appears. Based on this discussion, we model the data using both BoW as well as TF-IDF.
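
As a minimal illustrative sketch (assuming scikit-learn, with the two example documents D1 and D2 from above), the two feature representations can be produced as follows:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["She is an internet influencer. He is an internet sensation",
        "The Internet is all fake news"]

bow = CountVectorizer(stop_words="english")   # Bag-of-Words counts
X_bow = bow.fit_transform(docs)
# note: sklearn lowercases and applies its own stop-word list, so the vocabulary
# may differ slightly from Table 1
print(bow.get_feature_names_out(), X_bow.toarray())

tfidf = TfidfVectorizer(stop_words="english")  # TF-IDF weighted features
X_tfidf = tfidf.fit_transform(docs)
print(X_tfidf.toarray())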

3.4 Model Training with Batch of Classifiers

A batch of supervised machine learning classifiers, including Logistic Regression, Naïve Bayes, and Random Forest, is employed in order to further analyze the well-processed data set as described herewith. The details of these classifiers, along with the reasons for selecting them within the HiSAT framework, their hyper-parameter tuning and the corresponding evaluation results, are all synopsized together in Sect. 4.

4 Implementation and Evaluation of Classifiers

Various classification methods have been developed in Machine Learning, which use different strategies to categorize unlabeled data. The machine learning classifiers selected for our analysis are Logistic Regression, Naïve Bayes and an ensemble method, Random Forest, for reasons explained in their respective subsections next. These are all supervised machine learning techniques because they require pre-labeled training data. In line with this, it is important to mention that effectively training a classifier with adequate data obviously makes future predictions
easier. In order to assess the effectiveness of the classifiers, evaluation metrics are needed.

4.1 Evaluation Metrics

While choosing an evaluation metric, it is very important to consider the available metrics [24]. The following metrics are often applied to evaluate a model in supervised learning.

– Precision: This metric is applied to measure the ratio of true positive cases over all the predicted positive cases, hence it helps to assess "how precise is the predicted answer".
– Recall: This can be simply described as the ratio of what the given model predicts correctly over what the actual labels are; in other words, "how much of the actual answer does the model recall".
– Accuracy: This metric is a ratio to assess how often a model classifies a data point correctly. In this ratio, the numerator is the total number of predictions that were correct while the denominator is the total number of predictions.
– F1 score: The F1 score metric is a combination of the precision and recall of a classifier. It calculates their harmonic mean.
– ROC curve: A Receiver Operating Characteristic (ROC) curve illustrates the performance of a classification model with two parameters, namely, True Positive Rate (TPR) and False Positive Rate (FPR). ROC is thus a probability curve, and the Area Under the Curve (AUC) represents the degree or measure of separability. It indicates how much the model is capable of distinguishing between the classes. The higher the AUC, the better the model is at predicting the 0-labeled classes as 0 and the 1-labeled classes as 1.

In the evaluation of all the classifiers in HiSAT, we focus on both accuracy and F1 score. This is done in order to evaluate the performance of distinguishing both hate and non-hate speech in the Twitter data set. The details of the classifiers are elaborated below.

4.2 Logistic Regression Classifier

Logistic Regression predicts the occurrence of an event by fitting the given data to a "logit" function [24]. In other words, it forecasts the likelihood of the data fitting the function. This classifier is selected in HiSAT because it is known to be a highly efficient method for binary and linear classification problems, and we deal with two main classes in our work, namely, hate speech and non-hate speech. Figure 10 portrays the ROC curve for training the Logistic Regression classifier. In our experiments, we achieve an overall accuracy of 94.97% with an F1 score of 0.4780 with this classifier in HiSAT.
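
The training and evaluation pattern used for each classifier can be sketched as follows with scikit-learn; the tiny placeholder tweets, labels, split ratio and solver settings are illustrative assumptions only (the actual study uses the Kaggle tweet data set and its own settings):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Placeholder tweets and labels (0 = non-hate, 1 = hate); not real study data.
texts = ["love this sunny day", "great time with friends", "so happy and blessed",
         "what a wonderful weekend", "hateful slur against group", "racist insult example",
         "sexist abusive comment", "another hateful post"]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42, stratify=y)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1 score:", f1_score(y_test, pred))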


Fig. 10. ROC curve for Logistic Regression

4.3 Random Forest Classifier

A Random Forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples [24]. Hence, it is an ensemble learning method, i.e. it works based on a mixture-of-experts paradigm. This is our very reason for choosing such a classifier, because we intend to observe results with multiple decision trees (rather than a single tree), and the concept of decision trees per se is fundamental in some classification problems due to path-wise reasoning. The hyper-parameters used in Random Forest within HiSAT are as follows.

– n_estimators: This hyper-parameter is adjusted by setting how many decision trees should be built.
– n_jobs: This specifies the number of cores to use when training the decision trees in parallel.

Based on tuning these hyper-parameters, we implement this classifier in the HiSAT framework. Its experimental results reveal that its F1 score is 0.67 and its accuracy is 96.12% on the whole. The ROC of Random Forest is illustrated in Fig. 11.
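
Following the same pattern as the Logistic Regression sketch above, and assuming the same placeholder train/test split (X_train, X_test, y_train, y_test), a minimal sketch of this step with the two named hyper-parameters might look like this; the particular values are illustrative, not the tuned values used in the paper:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

rf = RandomForestClassifier(n_estimators=200,  # number of decision trees to build
                            n_jobs=-1,         # use all available cores for parallel training
                            random_state=42)
rf.fit(X_train, y_train)

rf_pred = rf.predict(X_test)
print("accuracy:", accuracy_score(y_test, rf_pred))
print("F1 score:", f1_score(y_test, rf_pred))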

4.4 Naïve Bayes Classifier

The Naïve Bayes classifier conducts reasoning based on probability using the fundamental concept of Bayes' Theorem [24]. This is selected in HiSAT to distinguish hate speech, and is deployed mainly due to its probabilistic reasoning, which can be useful in estimating what constitutes hate-related terms. Since all the features are considered independent, we prefer Naïve Bayes rather than Bayesian belief networks (which use conditional probability and are more useful for dependent variables). We achieve an accuracy of 94.979% and an F1 score of 0.4772 in our experiments with this classifier in HiSAT. The ROC curve of the Naïve Bayes classifier is shown in Fig. 12.
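
A corresponding sketch, again assuming the placeholder split from the Logistic Regression example, is shown below; the multinomial variant (MultinomialNB) is an illustrative choice commonly paired with count or TF-IDF features, since the paper does not state which Naïve Bayes variant was used:

from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

nb = MultinomialNB()  # multinomial variant suits non-negative word-count / TF-IDF features
nb.fit(X_train, y_train)

nb_pred = nb.predict(X_test)
print("accuracy:", accuracy_score(y_test, nb_pred))
print("F1 score:", f1_score(y_test, nb_pred))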


Fig. 11. ROC curve for Random Forest

Fig. 12. ROC curve for Naïve Bayes

4.5

Comparison and Discussion

We compare the performance of all the classifiers deployed within the HiSAT framework, considering the selected metrics of accuracy and F1 score. In order to get a clear at-a-glance comparison, we provide an illustration using one common plot as shown in the next figure. Figure 13 portrays the accuracy and F1 score for all the classifiers as implemented within HiSAT. Overall, it is observed that the Random Forest classifier typically achieves the best performance across all the experiments in our study. In general, all the supervised classifiers harnessed within the HiSAT framework perform well, and lead to promising results. While we experiment with a limited collection of tweets in this study, the framework per se is usable for experimenting with more data. It is extendable to other problems, beyond categorizing hate speech versus non-hate speech in the overall realm of Sentiment


Analysis. There is scope for further enhancement of this work in terms of robustness and intricacy. We can consider a vast range of data sets for robustness and delve into more parameters as well as hyper-parameters for further intricate details, e.g. user demographics of the posts, geo-tagging based on location, time sensitivity and evolution of concerned terms within the posts, severity of the expressed sentiments etc. Other paradigms can be explored in this context such as Deep Learning [25], with convolutional neural networks [26] that are popular with many tasks, and with recurrent neural networks [27] that are particularly well-suited for language-related tasks. These are likely to yield enhanced results in more advanced tasks.

Fig. 13. Accuracy and F1 score for all classifiers in HiSAT

5

Conclusions and Future Work

In this paper, we propose a framework called HiSAT to automatically classify tweets on Twitter data into two different classes, namely hate speech and non-hate speech. We implement the HiSAT framework to discover the sentiments behind the opinions in texts/tweets using Python. We conduct data preprocessing via NLP-based techniques. More specifically, we use Stemming, Tokenization, and Lemmatization to convert text to tokens; as well as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) to convert text sentences into numeric vectors. The latter are fed as inputs to machine learning algorithms. The Random Forest Classifier, Logistic Regression and Naïve Bayes Classifier are used as supervised classifiers to detect hate speech and non-hate speech in Twitter tweets. These models are pre-trained and validated. Experimental results indicate that Random Forest overshadows the others with a better prediction performance in assigning correct labels, with an accuracy of 96% overall. An obvious conclusion further corroborated by our study is that the cleaner the data, the more accurate are the results obtained.


Our roadmap entails comparing our work with other existing approaches in related areas. We also aim to improve our framework in order to increase its F1 score in future work. Furthermore, we would aim to experiment with a wider range of data sets in order to augment the robustness of our approach. The possible inclusion of commonsense knowledge [28], recurrent neural networks [29] and transformers [30] can be explored based on their success stories in several tasks, as evident from numerous recent studies [28–30] that span the landscape of natural language processing as well as social media mining. On the whole, our study, which constitutes work in progress, contributes its two cents to the overall realm of Sentiment Analysis of social media data, a much explored avenue today with implications ranging from areas such as business management to healthcare. Our work makes broader impacts within the area of AI and Data Science for Social Good. Acknowledgments and Disclaimer. Dr. Jiayin Wang and Dr. Aparna Varde acknowledge a grant from the US National Science Foundation NSF MRI: Acquisition of a High-Performance GPU Cluster for Research and Education. Award Number 2018575. Dr. Aparna Varde is a visiting researcher at Max Planck Institute for Informatics, Saarbrücken, Germany, in the research group of Dr. Gerhard Weikum, during the academic year 2021–2022, including a sabbatical visit. The authors acknowledge the CSAM Dean's Office Travel Grant from Montclair State University to support attending this conference. The authors would like to make the disclaimer that the opinions expressed, analyzed and presented in this work are obtained from knowledge discovery by mining the concerned data only. These do not reflect the personal or professional views of the authors.

References 1. Liu, B.: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, 2nd edn. Cambridge University Press (2020) 2. Simon Perfect (Theos, London, UK), What are the hate crime laws and should they be reformed? November 2020. https://www.theosthinktank.co.uk/comment/ 2020/10/29/what-are-the-hate-crime-laws-and-should-they-be-reformed 3. Twitter Sentiment Analysis. https://www.kaggle.com/arkhoshghalb/twittersentiment-analysis-hatred-speech 4. Anjaria, M., Guddeti, R.M.R.: Influence factor based opinion mining of Twitter data using supervised learning, In: 2014 Sixth International Conference on Communication Systems and Networks (COMSNETS), pp. 1–8 (2014) 5. Cristianini, N., Ricci, E.: Support Vector Machines. In: Kao, M.Y. (eds.) Encyclopedia of Algorithms. Springer, Boston (2008) 6. Cao, H., Verma, R., Nenkova, A.: Speaker-sensitive emotion recognition via ranking: studies on acted and spontaneous speech. In: Comput. Speech Lang. 28(1), 186–202 (2015) 7. Xia, R., Zong, C., Li, S.: Ensemble of feature sets and classification algorithms for sentiment classification. Inf. Sci. 181(6), 1138–1152 (2011) 8. Chen, L., Mao, X., Xue, Y., Cheng, L.L.: Speech emotion recognition: features and classification models. Digit. Signal Process. 22(6), 1154–1160 (2012)


9. Du, X., Emebo, O., Varde, A.S., Tandon, N., Chowdhury, S.N., Weikum, G.: Air quality assessment from social media and structured data: pollutants and health impacts in urban planning. In: IEEE International Conference on Data Engineering (ICDE) Workshops, pp. 54–59 (2016) 10. El Ayadi, M., Kamel, M.S., Karray, F.: Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recogn. 44, 572–587 (2011) 11. Gandhe, K., Varde, A.S., Du, X.: Sentiment analysis of Twitter data with hybrid learning for recommender applications. In: IEEE Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), pp. 57–63 (2018) 12. Saif, H., He, Y., Alani, H.: Semantic sentiment analysis of Twitter. In: Cudr´eMauroux, P., Heflin, J., Sirin, E., Tudorache, T., Euzenat, J., Hauswirth, M., Parreira, J.X., Hendler, J., Schreiber, G., Bernstein, A., Blomqvist, E. (eds.) ISWC 2012. LNCS, vol. 7649, pp. 508–524. Springer, Heidelberg (2012). https://doi.org/ 10.1007/978-3-642-35176-1 32 13. Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in tweets. In: The 26th International Conference on World Wide Web Companion (WWW), pp. 759–760. ACM (2017) 14. Puri, M., Varde, A.S., de Melo, G.: Commonsense based text mining on urban policy. In: Language Resources and Evaluation (LREV) Journal, Springer (2022) 15. Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: 11th International AAAI Conference on Web and Social Media, pp. 512–515 (2017) 16. Bifet, A., Frank, E.: Sentiment knowledge discovery in Twitter streaming data. In: Discovery Science - 13th International Conference (2010) 17. Du, X., Kowalski, M., Varde, A.S., de Melo, G., Taylor, R.W.: Public opinion matters: mining social media text for environmental management. In: ACM SIGWEB vol. 2019, issue Autumn, pp. 5:1–5:15 (2019) 18. Namita, M., Basant, A., Garvit, C., Prateek, P.; Sentiment analysis of Hindi review based on negation and discourse relation. In: International Joint Conference on Natural Language Processing (2013) 19. Wang, L., Wang, Y., de Melo, G., Weikum, G.: Understanding archetypes of fake news via fine-grained classification. Soc. Network Anal. Mining 9(1), 37:1–37:17 (2019) 20. Popat, K., Mukherjee, S., Str¨ otgen, J., Weikum, G.: CredEye: a credibility lens for analyzing and explaining misinformation. In: International Conference on World Wide Web Companion (WWW), pp. 155–158 (2016) 21. Torres, J., Anu, V., Varde, A.S.: Understanding the information disseminated using Twitter during the COVID-19 pandemic. In: IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), pp. 1–6 (2021) 22. Yin, Z., Rong, J., Zhi-Hua, Z.: Understanding bag-of-words model: a statistical framework. Int. J. Mach. Learn. Cybern. (2010) 23. Stemler, S.E., Tsai, J.: Best practices in interrater reliability three common approaches. In: Osborne, J. (ed.) Best Practices in Quantitative Methods, pp. 29–49. SAGE Publications Inc., Thousand Oaks (2011) 24. Mitchell, T.: Machine Learning. McGraw Hill (1997) 25. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016) 26. LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. In: The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10 (1995) 27. Mikolov, T., Karafiat, M., Burget, K., Cernock´ y, J., Khudanpur, S.: Recurrent neural network based language model. Interspeech J. 2(3), 1045–1048 (2010)


28. Razniewski, S., Tandon, N., Varde, A.S.: Information to wisdom: commonsense knowledge extraction and compilation. In: ACM Conference on Web Search and Data Mining (WSDM), pp. 1143–1146 (2021) 29. Zaramba, W., Sutskever, I., Vinyals, O.: Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014) 30. Bra¸soveanu, A.M.P., Andonie, R.: Visualizing transformers for NLP: a brief survey. In: IEEE 24th International Conference Information Visualisation (IV) (2020)

Characterization of Postoperative Pain Through Electrocardiogram: A First Approach

Raquel Sebastião(B)

Institute of Electronics and Informatics Engineering of Aveiro (IEETA), Department of Electronics, Telecommunications and Informatics (DETI), University of Aveiro, 3810-193 Aveiro, Portugal
[email protected]

Abstract. Current standard practices to evaluate pain are mainly based on self-reporting instruments. However, pain perception is subjective and influenced by several factors, making objective evaluation difficult. In turn, the pain may not be correctly managed, and over or under dosage of analgesics are reported as leading to undesirable side-effects, which can be potentially harmful. Considering the relevance of a quantitative assessment of pain for patients in postoperative scenarios, recent studies stress alterations of physiological signals when in the experience of pain. As the Autonomic Nervous System (ANS) functions without conscious control and it is difficult to deceive its reactions, this is a feasible way to assess pain. The goal of the proposed work is to characterize pain in postoperative scenarios through physiological features extracted from the electrocardiogram (ECG) signal, finding features with the potential to discriminate the experience of pain. Using ECG from ‘pain’ and ‘no-pain’ intervals reported from 19 patients during the postoperative period of neck and thorax surgeries, several features were computed and scaled regarding the baseline of each participant to remove inter-participant variability. Then, features selected through pairwise correlation were analyzed using pairwise statistical tests to infer differences between ‘pain’ and ‘no-pain’ intervals. Results showed that 6 features extracted from ECG are able to discriminate the experience of postoperative pain. These initial results open the possibility for researching physiological features for a more accurate assessment of pain, which is critical for better pain management and for providing personalized healthcare.

Keywords: ECG monitoring · Pain · Postoperative · Feature correlation · Feature extraction · Statistical tests

1

Introduction, Motivation and Goals

Pain involves dysregulations in the Autonomic Nervous System (ANS), a primary behavioral system [23], which regulates fundamental physiological states that


are typically involuntary, upregulating, and downregulating various functions within our body. The experience of pain induces reactions in the ANS, and, as it functions without conscious control [5], is a feasible way to assess pain. In the current standard practices to evaluate pain, due to the lack of quantified measures of pain, there are several barriers that can lead to a misleading pain assessment, resulting, therefore, in over or under dosage of analgesics. Either way, the pain may not be correctly managed and the undesirable side-effects of wrong doses can be potentially harmful. An incorrect assessment of pain may lead to undertreatment or overtreatment of pain [6,19], difficult the overall recovery [11,20], and lead to adverse psychological and cognitive effects [9,18]. Indeed, in postoperative scenarios, pain assessment is considered the most important task to ensure patient comfort. Considering that it is of utmost importance to properly measure and assess pain in such cases [8], there are several studies on the alterations of physiological signals when in the experience of pain. Recent studies have shown that common symptoms associated with pain are increased respiratory activities, cardiac acceleration, a burst of sweat, increased skin conductance and heightened muscle contraction [4,7,10,13–16,21,22]. Thus, it is paramount to process and extract information from these physiological signals in order to present an understandable pain assessment. In the works [13,14], the authors agree that the severity of postoperative pain significantly influences the skin conductance (SC), demonstrating a correlation between the number of fluctuations in SC per second (NFSC) and self-assessed pain measured using a numeric rating scale. Within the same scope, the authors of [7] proposed using changes in the NFSC as a biomarker to assess postoperative pain in children, being able to predict moderate to severe postoperative pain from NFSC. Also concerning physiological signals, the authors of [4], based on four pain induction tasks, proposed heart rate variability (HRV) as a biomarker for chronic pain in children. Embracing this concern, the goal of this work is to characterize postoperative pain through ECG, finding features with the potential to discriminate the experience of pain. For that, it uses pain assessments from 19 patients, who underwent neck and thorax surgeries, collected during the postoperative at the recovery room. The remainder of this work is organized as follows: Sect. 2 describes the setup for data monitoring and collection, as well as the methods used for data analysis. In Sect. 3 the results are presented and discussed. Final remarks and future research lines are presented in Sect. 4.

2

Materials and Methods

This section describes the setup for data monitoring and collection, as well as the methods proposed to identify ECG-based features which are able to provide a feasible characterization of pain in postoperative scenarios. Data processing was


performed in MATLAB [17] and Python, using NeuroKit2¹, which provides biosignal processing routines.

2.1

Setup and Data Collection

Twenty participants undergoing elective neck and thorax surgeries at Centro Hospitalar Tondela-Viseu (CHTV) took part in this study. The recruitment was performed on a volunteer basis and after a written informed consent form. ECG signals were monitored, using minimally invasive equipment, in the recovery room and during the standard clinical practices of analgesia, fulfilling all the clinical aspects and without compromising the patient's well-being. The ECG data was recorded through the Vital Jacket® t-shirt [2], with a sampling rate of 500 Hz, and using two electrodes placed on the right and left side of the participant's ribcage and a reference electrode placed above the pelvic bone. Besides the ECG signals, this dataset contains information on patient's age, gender, type of surgical intervention, and type of anesthesia protocol. The procedures performed during the postoperative recovery of patients were also registered, including self-reports of pain, pain relief therapeutics, and other medical interventions (such as patient repositioning). These procedures are associated with time triggers that mark the event occurrence in the ECG signal. The evaluation of pain was based on self-report instruments (Numerical Rating Scale NRS [25]), and several assessments, as necessary according to the clinical team, were obtained until discharge. From the twenty patients in the dataset, one patient was withdrawn from the study because of the lack of pain assessment annotations during the ECG recording, resulting in a total of nineteen patients (60 ± 21 years old), ten females.

2.2

ECG Processing

The ECG signals are affected by noise, such as skin-electrode interference (lowfrequency noise, which is amplified by motion, movements, and respiratory variation), powerline (with a frequency 50 Hz), and electronic devices (high-frequency noise) interference, namely from the clinical apparatus that concern this specific clinical scenario [3,12]. To attenuate the effects of noise and improve the quality of the signal, the raw ECG was low-pass filtered at a cut-off frequency 40 Hz, as the useful band of frequencies for these research purposes, without clinical relevance, varies between 0.5 Hz 40 Hz. The fundamental frequencies for the QRS complex, which is composed of Q, R, and S waves, are 30 Hz, and for the Pwave and T-wave components are 20 Hz 10 Hz, respectively [24]. Afterward, the baseline wander was removed with a moving average filter. To achieve the proposed goal of characterizing postoperative pain through ECG-based features, we rely on information from the self-reported pain and from pain analgesia to define intervals related to pain experience. Further, we investigate if features expose differences between ‘pain’ and ‘no pain’ intervals, which 1

https://neurokit2.readthedocs.io/en/latest/


corresponds, respectively, to the 15-minute intervals of data before and after these reported instances. Also, the baseline for each participant, which serves as a comparison regarding the pain state, was selected. In this work, it was considered that the last 10 min of useful ECG provide information on the state of the patient without the influence of pain or analgesia for pain management. Therefore, to reduce inter-patient dependency, each feature in both 15-minute intervals was divided by the respective feature computed in the baseline. Table 1 presents a description of the 34 features computed, using the NeuroKit2 package, for the two 15-minute intervals and for the baseline.

Table 1. Different types of features extracted from monitored ECG signals.

ECG-based features:
– Heart Rate (HR);
– Amplitude of peaks: P, Q, R, S, and T;
– Intervals: PP, QQ, RR, SS, TT, PR, QT, ST, and QRS.

HRV time-domain features:
– RMSSD: the square root of the mean of the sum of successive differences between adjacent RR intervals;
– MeanNN: the mean of the RR intervals;
– SDNN: the standard deviation of the RR intervals;
– SDSD: the standard deviation of the successive differences between RR intervals;
– CVNN: the standard deviation of the RR intervals (SDNN) divided by the mean of the RR intervals (MeanNN);
– CVSD: the root mean square of the sum of successive differences (RMSSD) divided by the mean of the RR intervals (MeanNN);
– MedianNN: the median of the absolute values of the successive differences between RR intervals;
– MadNN: the median absolute deviation of the RR intervals;
– MCVNN: the median absolute deviation of the RR intervals (MadNN) divided by the median of the absolute differences of their successive differences (MedianNN);
– IQRNN: the interquartile range (IQR) of the RR intervals;
– pNN50: the proportion of successive RR interval differences greater than 50 ms, out of the total number of RR intervals;
– pNN20: the proportion of successive RR interval differences greater than 20 ms, out of the total number of RR intervals.

HRV frequency-domain features:
– LF: the spectral power density pertaining to the low frequency band;
– HF: the spectral power density pertaining to the high frequency band.

HRV non-linear features:
– SD1: index of short-term RR interval fluctuations;
– SD2: index of long-term RR interval fluctuations;
– SD1/SD2: ratio between short- and long-term fluctuations of the RR intervals.

HRV complexity features:
– ApEn: the approximate entropy measure of HRV;
– SampEn: the sample entropy measure of HRV.
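A hedged sketch of the preprocessing and feature extraction described above, using SciPy for the filtering and NeuroKit2 for the HRV measures of Table 1; the moving-average window and the exact NeuroKit2 calls/column names (which vary by version) are assumptions, and the simulated signal stands in for one 15-minute interval:

import numpy as np
import neurokit2 as nk
from scipy.signal import butter, filtfilt

FS = 500                                                      # ECG sampling rate (Hz)
ecg_raw = nk.ecg_simulate(duration=300, sampling_rate=FS)     # stand-in for a recorded interval

# Low-pass filter at 40 Hz to attenuate high-frequency noise
b, a = butter(4, 40, btype="low", fs=FS)
ecg_lp = filtfilt(b, a, ecg_raw)

# Remove baseline wander with a moving-average filter (window of ~0.6 s assumed)
win = int(0.6 * FS)
baseline = np.convolve(ecg_lp, np.ones(win) / win, mode="same")
ecg_clean = ecg_lp - baseline

# R-peak detection and HRV features (time, frequency, non-linear/complexity domains)
_, rpeaks = nk.ecg_peaks(ecg_clean, sampling_rate=FS)
hrv = nk.hrv(rpeaks, sampling_rate=FS)        # DataFrame with HRV_RMSSD, HRV_SDNN, HRV_LF, HRV_HF, ...
print(hrv.iloc[0])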


Thereafter, for each feature, the average in the 15-minute intervals and in the baseline was calculated and the ratio between each ‘pain’ and ‘no pain’ 15-minute interval and the baseline was computed. Moreover, to characterize the postoperative pain through the ratio of averaged features, feature selection was performed to reduce the total number of features, relying upon a filter-based method. At first, the Lilliefors test was applied to all features to decide if data comes from a normally distributed family (with a significance level of 5%). Thus, the pairwise Spearman correlation (as not all of the ratios of averaged features were normally distributed) between the 34 features was computed, and for each pair with an absolute correlation value above 0.9, the feature with lower variance was discarded. To explore if the selected features differ according to the ‘pain’ or ‘no-pain’ experience, an analysis using boxplots, with notched boxes, was performed to visualize the distribution of the features and assess differences between the medians. Afterward, to test which features expose differences between ‘pain’ and ‘no-pain’ groups, the pairwise Student’s t-test or the Wilcoxon Signed Rank test were applied, depending on the normality of the distribution of the features, to decide if the samples from the 2 groups (‘pain’ vs. ‘no pain’) originated from the same distribution, by comparing the mean or the mean ranks of both groups.
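A minimal sketch of these selection and testing steps, assuming one hypothetical pandas DataFrame per condition ('pain', 'no pain', 'baseline') with one row per patient and one column per feature; function and variable names are illustrative, and thresholds follow the text:

import pandas as pd
from scipy.stats import ttest_rel, wilcoxon

def scale_to_baseline(interval: pd.DataFrame, baseline: pd.DataFrame) -> pd.DataFrame:
    """Divide each feature by the same patient's baseline value."""
    return interval / baseline

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Filter-based selection: when |Spearman rho| > threshold, keep the higher-variance feature."""
    corr = df.corr(method="spearman").abs()
    keep = list(df.columns)
    for i, a in enumerate(df.columns):
        for b in df.columns[i + 1:]:
            if a in keep and b in keep and corr.loc[a, b] > threshold:
                keep.remove(a if df[a].var() < df[b].var() else b)
    return df[keep]

def compare_groups(pain: pd.Series, no_pain: pd.Series, normal: bool) -> float:
    """Paired Student's t-test when the feature is normal, Wilcoxon signed-rank otherwise."""
    test = ttest_rel if normal else wilcoxon
    return test(pain, no_pain).pvalue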

3

Results and Discussion

This section presents the results from the approach previously described and summarizes with a discussion with respect to related works.

Fig. 1. Data analysis workflow.

3.1

Results

As detailed above, the workflow process for data analysis consisted of several steps, from the collection of data to the evaluation of the results, including the preprocessing of ECG data, the selection of ‘pain’ and ‘no pain’ 15-minutes intervals according to the reports, the extraction of features and the computation of ratios with the respective feature in the baseline (from now on referred only


as features), the analysis of the boxplots of the features, the selection of the most relevant features, and, finally, the application of pairwise tests (Student’s t-test or the Wilcoxon Signed Rank test) to infer which features expose statistically significant differences between pain groups. This workflow is illustrated in Fig. 1. From the 34 features extracted, feature selection was performed based on the pairwise correlation of the features and variance analysis, resulting in a total of 19 relevant features. Moreover, and in accordance with the literature, some of the features that report similar measures, and thus are highly correlated, were discarded. For example, the authors of [1] indicate that, although on a different scale, the RMSSD is equivalent to SD1, both representing short-term HRV, and the obtained correlation shows a perfect relation between these features. Thus, the selected features were:
– ECG-based features: amplitude of the P, Q, R, S, and T peaks; PR, QT, ST, and QRS intervals;
– HRV time-domain features: RMSSD, SDNN, MedianNN, IQRNN, pNN50, pNN20;
– HRV frequency-domain features: LF, HF;
– HRV non-linear features: SD1/SD2;
– HRV complexity features: SampEn.
Figure 2 displays a heatmap of the correlation matrix of the selected features, showing that, most of the time, features of the same type present a larger correlation with each other and a lower correlation with other types of features.

Fig. 2. Heatmap of the correlation matrix of the selected features.


Thereafter, the boxplots of the features were computed to analyze the distribution in both ‘pain’ and ‘no pain’ groups. Figure 3 shows the boxplots for 8 of the selected features, namely ECG-based (QT and ST intervals), HRV time-domain (IQRNN and pNN20), HRV frequency-domain (LF and HF), HRV non-linear (SD1/SD2), and HRV complexity (SampEn) features.

Fig. 3. Boxplots for 8 of the selected features: QT interval, IQRNN, LF and SD1/SD2 (left) and ST interval, pNN20, HF and SampEn (right).

From the boxplots, it can be observed that, with the exception of IQRNN, the median values for the group ‘no pain’ are lower than for the group ‘pain’. Regarding dispersion, the ST interval, pNN20, LF, and SD1/SD2 presented higher values for the ‘pain’ group. With respect to the ECG-based features, there was a statistical difference between both pain groups for the QT and ST intervals (p = 0.0176 and p = 0.0242, respectively, with a 95% confidence interval (CI)). Concerning HRV time-domain features, only MedianNN (p = 0.0176, 95% CI) shows a significant difference between the groups of ‘pain’ and ‘no-pain’.


For the HRV frequency-domain features, both low and high frequencies were significantly different (p = 0.0003 and p = 0.0112, respectively, 95% CI) in pain groups. Whereas for the HRV non-linear features, only the SD1/SD2, associated with the randomness of the HRV signal, shows a statistical difference between ‘pain’ and ‘no-pain’ (p = 0.0367, 95% CI). For the remaining selected features, there was no significant difference between ‘pain’ and ‘no-pain’ groups. 3.2

Discussion

With a similar protocol to the one presented, the work [13] monitored 25 patients, after minor elective surgeries, in the recovery room. At different time-points, the patients reported the pain on an NRS scale, and SC, HR, and blood pressure were recorded. From the analysis performed, the authors found that while NFSC is significantly different for different groups of pain levels, HR showed no correlation with pain. The same authors have other works regarding HR and HRV responses to postoperative pain [15,16], which report no statistically significant differences in HR, LF, HF, and LF/HF between pain groups (defined according to NRS values). Also assessing effects of postoperative pain in HRV measures, the authors of [21] found that LF and LF/HF were significantly different in moderate/severe pain and that HF did not present statistical differences in pain groups. In the present study, the finding that the response of HR did not show significant differences among pain groups is common to these related works. With respect to HRV measures, the present work showed that several HRV measures presented differences between ‘pain’/‘no pain’ groups. However, these findings can not be directly compared with the results mentioned above, as differences in the HRV responses among different levels of pain were not assessed in this work.

4

Conclusions and Future Work

Pain perception is subjective and influenced by several factors, making its objective evaluation an added difficulty. Embracing this concern, this work proposes the study of ECG signals of patients in postoperative pain, in order to extract the necessary information for its characterization. Thus, relying on ANS reactions, which are difficult to deceive, this work aims to describe postoperative pain through physiological features extracted from the ECG signal. The ECG-based features QT and ST intervals, the MedianNN (an HRV time-domain feature), the low and high frequencies of HRV, and the HRV non-linear SD1/SD2 show the potential to discriminate the experience of postoperative pain. Despite the limitations of the dataset used, the encouraging results obtained sustain the feasibility of these physiological features to serve as pain indicators, enabling a more accurate assessment. Thus, future research should focus on enlarging the number of patients, exploring the responses from other


physiological signals, such as electrodermal activity and electromyogram, and discerning the physiological responses to different levels of pain. Collecting different physiological signals, and considering more patients under study, would also allow learning a classification model for pain recognition. These first results can advise on the most relevant ECG-based features to be included in this pain recognition task. An assessment of pain based on physiological signals, besides improving the comprehension of pain mechanisms, may provide objective and quantified inputs that could help enhance self-care and promote the health and well-being of patients. Moreover, it can contribute to opening the path of adaptive and personalized therapies, such as adaptive drug adjustment according to the level of pain. Acknowledgments. This work was funded by national funds through FCT - Funda¸ca ˜o para a Ciˆencia e a Tecnologia, I.P., under the Scientific Employment Stimulus - Individual Call - CEECIND/03986/2018, and is also supported by the FCT through national funds, within IEETA/UA R&D unit (UIDB/00127/2020). Particular thanks are due to the clinical team for allowing and supporting the researchers of this work during the procedure of data monitoring and collection. The author also acknowledges all volunteers that participated in this study.

References 1. Anthony, B., et al.: Reminder: RMSSD and SD1 are identical heart rate variability metrics. Muscle Nerve 56(4), 674–678 (2017) 2. Cunha, J.P.S., Cunha, B., Pereira, A.S., Xavier, W., Ferreira, N., Meireles, L.: R a wearable wireless vital signs monitor for patients’ mobility in Vital-Jacket: cardiology and sports. In: 4th International ICST Conference on Pervasive Computing Technologies for Healthcare (Pervasive-Health), vol.6, pp. 1–2 (2010) 3. do Vale Madeiro, J.P., Cortez, P.C., da Silva Monteiro Filho, J.M., Rodrigues, P.R.F.: Chapter 3 - techniques for noise suppression for ECG signal processing. In: Developments and Applications for ECG Signal Processing, pp. 53–87. Academic Press (2019) 4. Evans, S., Seidman, L.C., Tsao, J.C., Lung, K.C., Zeltzer, L.K., Naliboff, B.D.: Heart rate variability as a biomarker for autonomic nervous system response differences between children with chronic pain and healthy control children. J. Pain Res. 6, 449–457 (2013) 5. Gabella, G.: Autonomic Nervous System. John Wiley & Sons Ltd. (2012) 6. Gan, T.J.: Poorly controlled postoperative pain: prevalence, consequences, and prevention. J. Pain Res. 10, 2287–2298 (2017) 7. Hullett, B., et al.: Monitoring electrical skin conductance: a tool for the assessment of postoperative pain in children? Anesthesiology 111(3), 513–517 (2009) 8. Jang, J.H., Park, W.H., Kim, H.-I., Chang, S.O.: Ways of reasoning used by nurses in postoperative pain assessment. Pain Manage. Nurs. 21(4), 379–385 (2020) 9. Joshi, G.P., Ogunnaike, B.O.: Consequences of inadequate postoperative pain relief and chronic persistent postoperative paint. Anesthesiol. Clin. North Am. 23(1), 21–36 (2005) 10. Joshi, M.: Evaluation of pain. Indian J. Anaesth. 50(5), 335–339 (2006)


11. Kang, S., Brennan, T.J.: Mechanisms of postoperative pain. Anesth. Pain Med. 11, 236–248 (2016) 12. Berkaya, S.K., Uysal, A.K., Gunal, E.S., Ergin, S., Gunal, S., Gulmezoglu, M.B.: A survey on ECG analysis. Biomed. Signal Process. Control 43, 216–235 (2018) 13. Ledowski, T., Bromilow, J., Paech, M.J., Storm, H., Hacking, R., Schug, S.A.: Monitoring of skin conductance to assess postoperative pain intensity. Br. J. Anaesth. 97, 862–865 (2006) 14. Ledowski, T., Preuss, J., Schug, S.A.: The effects of neostigmine and glycopyrrolate on skin conductance as a measure of pain. Eur. Soc. Anaesthesiol. 26, 777–781 (2009) 15. Ledowski, T., Reimer, M., Chavez, V., Kapoor, V., Wenk, M.: Effects of acute postoperative pain on catecholamine plasma levels, hemodynamic parameters, and cardiac autonomic control. Pain 153(4), 759–764 (2012) 16. Ledowski, T., Stein, J., Albus, S., MacDonald, B.: The influence of age and sex on the relationship between heart rate variability, haemodynamic variables and subjective measures of acute post-operative pain. Eur. J. Anaesthesiol. 28(6), 433– 437 (2011) 17. MATLAB version 9.10.0.1684407 (R2021a). The Mathworks, Inc. Natick, Massachusetts (2021) 18. Middleton, C.: Understanding the physiological effects of unrelieved pain. Nurs. Times 99(37), 28 (2003) 19. Pogatzki-Zahn, E., Segelcke, D., Schug, S.: Postoperative pain-from mechanisms to treatment. Pain Rep. 2(1), 03 (2017) 20. Segelcke, D., Pradier, B., Pogatzki-Zahn, E.: Advances in assessment of pain behaviors and mechanisms of post-operative pain models. Curr. Opin. Physio. 11, 07 (2019) 21. Sesay, M., et al.: Responses of heart rate variability to acute pain after minor spinal surgery: optimal thresholds and correlation with the numeric rating scale. J. Neurosurg. Anesthesiol. 27(2), 148–154 (2015) 22. Storm, H.: Changes in skin conductance as a tool to monitor nociceptive stimulation and pain. Curr. Opin. Anesthesiol. 21, 796–804 (2008) 23. Storm, H.: The capability of skin conductance to monitor pain compared to other physiological pain assessment tools in children and neonates. Pediatr. Ther. 3, 168 (2013) 24. S¨ ornmo, L., Laguna, P.: Electrocardiogram (ECG) Signal Processing. John Wiley & Sons Ltd. (2006) 25. Williamson, A., Hoggart, B.: Pain: a review of three commonly used pain rating scales. J. Clin. Nurs. 14(7), 798–804 (2005)

Deep Learning Applied to Automatic Modulation Classification at 28 GHz

Yilin Sun(B) and Edward A. Ball

The University of Sheffield, Sheffield S1 4ET, UK
[email protected]

Abstract. Automatic Modulation Classification (AMC) is a fast-expanding technology, which is used in software defined radio platforms, particularly relevant to fifth generation and sixth generation technology. Modulation classification as a specific topic in AMC applies Deep Learning (DL) in this work, which contributes a creative way to analyse the signal transmitted in low Signal to Noise Ratio (SNR). We describe a dynamic system for the Millimeter wave (mmW) bands in our work. The signals collected from the receiving system is without phase lock or frequency lock. DL is applied to our system to classify the modulation types within a wide range of SNR. In this system, we provided a method named Graphic Representation of Features (GRF) in order to present the statistical features in a spider graph for DL. The RF modulation is generated by a lab signal generator, sent through antennas and then captured by an RF signal analyser. In the results from the system with the GRF techniques we find an overall classification accuracy of 56% for 0 dB SNR and 67% at 10 dB SNR. Meanwhile the accuracy of a random guess with no classifiers applied is only 25%. The results of the system at 28 GHz are also compared to our previous work at 2 GHz. Keywords: Automatic modulation classification · Deep learning · Millimeter wave

1 Introduction

Taking advantage of the radio spectrum in an efficient way is becoming increasingly significant with the evolution of digital communication systems. Dynamic Spectrum Access (DSA) is a critical technique for meeting this need, demanding spectrum sensing and signal classification. In this scenario, the application of modulation classification is critical and can be frequently used in a range of applications, including software defined radio systems, radar and military communications [1, 2]. Radio frequency (RF) bands are in great demand. In situations when multiple complex/unknown signals are to be handled, or for cognitive radio primary-user detection, Automatic Modulation Classification (AMC) [3] technology is proposed to adapt to this need and optimise identification and demodulation of the received signals. In modern society, the radio spectrum has become a precious resource in a crowded environment. Based on these circumstances, AMC is the approach of identifying the users of the spectrum as well as unused spectrum [4].


This work indicates creative methods of modulation classification associated with deep learning technology. The system also has good performance at very low Signal to Noise Ratio (SNR). In our research, the main motivation is to improve the performance of detecting signals in the low SNR area using more efficient approaches. As the development and the advancement of Artificial Intelligence (AI) technology, collaboration in numerous areas has made great strides [5]. Classification of modulation types based on analytical properties of signals is common in classic statistical machine learning approaches [6]. In this way, Deep Learning (DL) emerged as a subproject of machine learning, advanced to a higher level by incorporating the principle of biological information processing systems [7]. In image classification, DL is employed as an effective approach [8]. In our work, we develop a model extracting the essence of image classification with DL technique, utilising the Graphic Representation of Features (GRF) to represent the statistical features of the signals. The GRF method benefits from both statistical features and image classification. After presenting the modulated signals in GRF, they are tested by the pre-trained network with the existing advanced image classifiers. Transfer Learning (TL) [9] is introduced as an effective technique in our system. Several TL networks are compared in this work. For the GRF, spider graphs of the statistical features present a novel and efficient method to identify the signals. The model is used to classify the modulated signals with SNR ranges from −10 dB to 20 dB in the 28 GHz mmW band. In the previous work, Maximum-likelihood decision theory is a demanding technique [10]. When the SNR goes above 9 dB, the accuracy of identifying of PSK and QAM is around 90% [11]. Signal characteristics are used in their models for pattern recognition algorithms, and high order cumulants performs an important part in this approach. With the efficient features, a Support Vector Machines (SVM) [12] is involved. The accuracy achieves an outstanding result of 96% at 10 dB SNR [13]. However, at circa 0 dB SNR, the chance of proper identification ranges from 50% to 70%, indicating that further work is needed. In addition to the preceding classifiers, binary hierarchical polynomial classifiers are presented, coming with the detection accuracy of 56% at 0 dB SNR [14]. In this way, optimising the AMC system, particularly at low SNR, still needs effort and investigation. This work will introduce the new systems with DL technology, which is proposed by utilising the GRF, benefiting from the statistical features and image classification with DL. Convolutional Neural Network, GoogleNet [15], SqueezeNet [16] and Inception-v3 [17] are tested and compared in the system. The identification formats are created by extracting statistical characteristics from the GRF, while the constellation graphs are acquired directly from the received signals. We utilise this existing technology while simultaneously assessing its applicability by repurposing pre-existing neural networks designed for general image identification, Five key contributions of our work are proposed. First of all, evaluating the TL technology and applying the pretrained networks for the classification system. The effective use of the TL minimizes training time and computation complexity while increasing the accuracy of modulation classification. 
In the second contribution, the creative method named Graphic Representation of Features (GRF) is provided and evaluated, taking advantages of the statistical characters from different modulation types, representing


graphically, using as image classification data for DL. This technique combines the benefits of both statistical characterisation with image classification. Thirdly, the results of testing conducted and radiated data at 28 GHz in our system are compared to find the possibility to improve the robustness for wireless communications. Our fourth contribution is an overall evaluation of generic DL classifiers as applied to the usage of communications AMC. Our final contribution is to compare the results for our technique at 28 GHz with the detection accuracy for lower frequency carriers, in our previous work [18]. In the following sections, the system model, classification method, and discussion results and performance will be proposed in the following sections.

2 System Model and Problem Description

2.1 RF Signal Description

The received signal in baseband is defined by r(t), which is shown in (1).

r(t) = s(t) + n(t)

(1)

In this case, s(t) is the original transmitted signal, in the channel with the additive white Gaussian noise (AWGN), and n(t) is the noise associated with the message signals. In order to calculate and analyse the features extracted from the received signals, the raw data should be represented by convention, applying in-phase and quadrature components (I/Q), (2). a[i] = aI [i] + j ∗ aQ [i]

(2)

In-phase and quadrature components build up to the signal, which can be used to indicate the characteristics of their constellation diagrams. The constellations are captured and observed before the subsequent experiments. The statistical features are calculated from this data. Additionally, the data captured and provided by signal analyser is collected as I/Q format as well [19]. To implement and assess this classification system, we only consider four kinds of modulated signals: BPSK, QPSK, 8PSK and QAM16. For each type, 5000 symbols are captured and sampled with the rate of 4 samples per symbol. In this work, signals are considered at the SNR level ranges from −10 dB to 20 dB. It should be noted that prior efforts concentrated on classification with over 10 dB SNR [20]. The SNR value less than 0 dB also should be tested to enhance the performance in discriminating between the four modulation types. The system is proposed from − 10 dB to 20 dB SNR to develop an exhaustive dataset. 2.2 Modulation Constellations We now show the constellations graphs in this section to help display the characteristics. Although in this work, we are utilising the statistical features, some of the features also describe the characteristics of shape of the modulated signals. Figure 1 displays the constellations of conducted data. And the four modulation types (BPSK, QPSK, 8PSK, QAM16) at 10 dB SNR at 28 GHz are using as examples.
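As an illustrative sketch (not the authors' capture chain), the I/Q dataset of this section can be emulated in simulation by generating symbols and adding complex AWGN at a chosen SNR:

import numpy as np

def awgn(iq: np.ndarray, snr_db: float) -> np.ndarray:
    """Add complex white Gaussian noise so that signal power / noise power equals the target SNR."""
    sig_power = np.mean(np.abs(iq) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (np.random.randn(iq.size) + 1j * np.random.randn(iq.size))
    return iq + noise

bits = np.random.randint(0, 4, 5000)                   # 5000 QPSK symbols
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * bits))     # unit-power QPSK constellation points
received = awgn(np.repeat(qpsk, 4), snr_db=0)          # 4 samples per symbol (no pulse shaping), 0 dB SNR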


Fig. 1. 28 GHz constellations of captured conducted data.

Figure 1 shows the constellations diagrams of conducted data. At 10 dB SNR, the underlying constellation types are still identifiable and clear. However, at lower SNR levels, it becomes hard to distinguish the characteristics. The statistical features are thus introduced to the AMC systems. 2.3 Statistical Features for the GRF In the previous study, typical statistical approaches of machine learning were presented to identify the digital modulations [21]. We should also consider the features provided in the previous work which also applied artificial intelligence technologies. The statistical models can provide the results by collecting the statistical features from the four kinds of modulated signals from −10dB to 20 dB SNR. The suitable features in our earlier paper [18] include the standard deviation of the signal instantaneous normalised amplitude, σaa ; the maximum value of power spectral density (PSD), γmax ; the cumulants of signals, C20 , C40 , C41 , C42 , C63 , C80 ; Kurtosis, K; Skewness, S; the ratio of peak-to-rms, PR; the ratio of peak-to-average of the signal, PA. Some of them also describe the characteristic


of the signals.

K = \left| \frac{E[(a - E[a])^{4}]}{E[(a - E[a])^{2}]^{2}} \right|

(3)

For example, Kurtosis is extracted by (3), which describes the steepness or flatness of the distribution of signals [22].

S = \left| \frac{E[(a - E[a])^{3}]}{E[(a - E[a])^{2}]^{3/2}} \right|

(4)

Skewness is extracted by (4), which describes the position of the tapering side of the distribution of signals. PA and PR indicate sufficient details of the shape of the different signals. In the following part, we evaluate these statistical characteristics against the acquired data to determine the relevant and essential features for building the GRF system.
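A minimal numerical sketch of these feature computations applied to the instantaneous amplitude of a baseband capture; the placeholder array below would in practice be replaced by the received I/Q samples (e.g. the simulated QPSK above, or a signal-analyser export):

import numpy as np

received = np.random.randn(20000) + 1j * np.random.randn(20000)   # placeholder I/Q capture

a = np.abs(received)                     # instantaneous amplitude
an = a / a.mean()                        # normalised amplitude
c = a - a.mean()                         # centred amplitude

K = np.abs(np.mean(c ** 4) / np.mean(c ** 2) ** 2)      # Kurtosis, Eq. (3)
S = np.abs(np.mean(c ** 3) / np.mean(c ** 2) ** 1.5)    # Skewness, Eq. (4)
PR = a.max() / np.sqrt(np.mean(a ** 2))                 # peak-to-rms ratio
PA = a.max() / a.mean()                                 # peak-to-average ratio
sigma_aa = np.std(an)                                   # std of normalised instantaneous amplitude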

3 Classification Method

3.1 System Structure

In this work, the system employs DL with GRF. The critical step in building the system is analysing the statistical features and selecting the proper ones to work with after collecting the raw data. GRF is our novel method to represent the statistical character of the signal on a graph. In order to collect the data and express them into characteristics reliably, the receiving system is dynamic but also without phase and frequency lock. After generating the graphs with GRF, the graphs are used as the dataset for the pretrained

Fig. 2. DL system with GRF


CNNs which include advanced image classifiers. Figure 2 indicates the classification system. After calculation, the statistical features are quoted from the modulated signals. The statistical features which can help distinguish the modulation type are indicated in the spider graphs to show the graphical features. After input to the spider graphs, we can find the prediction of the modulation types. In this work, we use the conducted data as the training dataset. Conducted data and radiated data are then used for testing datasets. 3.2 Graphical Representation of Features (GRF) Based on the features from our earlier work [18], all the features are calculated for the four modulation types. Figure 3 shows all statistical features of the four modulated signals at 10 dB SNR at 28 GHz. All the statistical features are displayed in the graph, some of the features are shown in logarithmic form, such as σaa , X2 , γmax , which is a convenient form to be compared with different magnitudes. In Fig. 3, it is obvious that β, σdp , σv , v20 , X , X2 , and C21 do not show any advantage in classifying the four modulations (BPSK, QPSK, 8PSK and QAM16) – the difference of the values of these statistical features are insignificant.

Fig. 3. Features of the 28 GHz conducted signals at 10 dB SNR.


Fig. 4. Spider graphs of the 28 GHz conducted signals at 10 dB SNR.

Figure 4 shows spider graphs with statistical features involved. According to the analysis of Fig. 3, the twelve selected features are displayed in the spider graphs and used as the dataset. The features are used as the labels in the structure of the spider graph; the value characteristics of the different features are plotted in one spider graph. There are different shapes of the spider graphs for different modulation types. The spider graphs turn the statistical features into one representation and represent the collected statistical features in a graphical way. Labelling the values of the features on the same common graph required use of the log function for some of the features, where appropriate. As seen in Fig. 4, there are four modulation types with four different shapes. All the data are plotted in this way and these graphical representations are used for the Neural Network with image classification.

3.3 Deep Learning Networks

Deep Learning, as a popular topic, provides an effective technique to enhance the performance of AMC. For this part, four different Convolutional Neural Network (CNN) models are provided: a simple CNN developed from the Iris case [23], the SqueezeNet model [16], the GoogleNet model and the Inception-v3 model [24]. The last three of these models are already pretrained with thousands of images, known as TL. To create the training and testing dataset for the networks, the signals need to be pre-processed to create a set of GRF images.
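Before moving to the networks, a minimal sketch of how one such GRF spider-graph image can be produced (matplotlib polar axes; the twelve feature values below are placeholders, and the output filename is hypothetical):

import numpy as np
import matplotlib.pyplot as plt

features = {"sigma_aa": 0.4, "gamma_max": 2.1, "C20": 1.0, "C40": 1.3, "C41": 0.9,
            "C42": 1.1, "C63": 0.7, "C80": 0.5, "K": 2.4, "S": 0.2,
            "PR": 1.9, "PA": 1.6}                      # placeholder values for the 12 selected features

labels = list(features)
values = list(features.values()) + [list(features.values())[0]]   # close the polygon
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
fig.savefig("grf_qpsk_10dB.png", dpi=100)              # one training image for the image classifier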


As shown in Fig. 5, a basic example of CNN structure is built with convolutional layers and dense fully connected layers. The convolutional layers can extract the characteristics of the signals and generate the feature maps. The features can then be learned by the sequential layers. At the last step, the categorization results are provided.

Fig. 5. Example structure of a neural network.

CNN. This model is designed as an extension of the Iris recognition case, and we build this model with fifteen layers. There are the batch normalisation layers and ReLu layers after three convolutional layers. Between the other three blocks, two max-pooling layers are added to down-sample the input data and limit the possibility of overfitting. For the convolutional layers, the filters can extract the physical characteristics of the images, such as profile and grayscale. In this case, the convolutional layer is crucial in its ability to classify the images. After that, the input channel can be normalized by the batch normalisation layers and the threshold can be calculated to the elements by the ReLu layers [25]. Transfer Learning Models. The SqueezeNet contains 68 layers, at the same time, the size of input image of the network is 227-by-227 pixels [16]. The GoogleNet is pretrained to classify 1000 different categories of photos with 144 layers and the input image size is 224-by-224 pixels [15]. The Inception-v3 is comprised of 315 layers and has been trained to distinguish 1000 categories among millions of images [17]. The size of input image is 299-by-299 pixels. Before feeding to the models, images are required to be resized to satisfy the input criteria. To best suit our work, the last learning layers are set to output 4 kinds of recognition for the modulation types. Here, 70% of the GRF images in our experiments are used as training, 20% of the characteristic graphs are used for the validation process and the last 10% are used for results test [26]. We set the system to rotate the images from −90 to 90° steps and rescale them randomly from 1 to 1.5, which can assist to increase the quantity of training data and avoid overfitting [27].
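The paper does not restate its implementation toolchain here; as a hedged sketch, assuming a Python/torchvision setup, the transfer-learning step (re-heading a pretrained GoogLeNet for the four modulation classes, with the rotation/rescaling augmentation ranges mentioned above) could look like this; the image folder path is hypothetical:

import torch.nn as nn
from torchvision import datasets, models, transforms

train_tf = transforms.Compose([
    transforms.RandomRotation(90),                  # rotate between -90 and +90 degrees
    transforms.RandomAffine(0, scale=(1.0, 1.5)),   # random rescaling from 1 to 1.5
    transforms.Resize((224, 224)),                  # GoogLeNet input size
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("grf_images/train", transform=train_tf)   # GRF spider-graph images

model = models.googlenet(weights="DEFAULT")         # pretrained on ImageNet (transfer learning)
model.fc = nn.Linear(model.fc.in_features, 4)       # new last layer: BPSK / QPSK / 8PSK / QAM16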

4 Evaluated Performance The findings of the various classification networks are presented in this section. This system employs high order cumulants. The Kurtosis, Skewness, PR, and PA can indicate


the shape characteristics of the modulated signals. The system uses conducted data and radiated data collected by horn antennas. Signals are generated by a Rohde & Schwarz SMM 110A mmWave signal generator and are captured by a Keysight PXA N9030B signal analyser. In this work, different DL structures are compared. We only use conducted data as the training dataset, but both conducted data and radiated data are tested in classification. In Fig. 6, the results from the four CNN models are displayed for different network structures. The CNN developed from the Iris case gives worse accuracy than the other DL models. This is most likely owing to the fact that the structures and coefficients of this CNN variant are potentially highly sensitive, so it cannot show good performance in the classification system.

Fig. 6. General accuracy of classification over different SNR levels at 28 GHz.

Figure 7 shows two example confusion matrices of the results at 10 dB SNR by using Inception-v3 network. The random guess of the system to detect the four modulated signals without any specific classifiers would be 25% in general (the probability of choosing the correct one from the four modulations). This model also provides accuracy slightly higher than random guess at −10 dB SNR, an SNR level well below what most communication system would use.


Fig. 7. Accuracy of classification at 10 dB SNR.

Fig. 8. General accuracy of classification at 2 GHz [18].

We also compare the detection accuracy for our technique as applied in [18] at 2 GHz. In that scenario we obtained detection accuracy as in Fig. 8 for various SNR levels. From this, we can see that the detection accuracy using our system at 28 GHz is slightly worse (circa 10% worse at −10 dB SNR, though this improves as SNR increases). We are investigating possible causes for this difference, which could be due to propagation effects in the lab and the different RF equipment used.

5 Conclusion In this work, the AMC models associated with DL are involved into modulation recognition. The dynamic receiving system in our research is without phase or frequency lock in a mmW band at 28 GHz. Firstly, we collect conducted and radiated data at 28 GHz and


analyse the statistical features. After that, we provide an overview of the GRF method for feature representation. The system utilizes Inception-v3 to obtain the highest accuracy. We then provide a brief comparison between the results at 28 GHz and our earlier results [18] at a lower frequency of 2 GHz and discuss possible causes for differences in classification accuracy. Though the 28 GHz modulation classification performance is circa 10% lower than with our 2 GHz system, it still is capable of good classification and is significantly better than a random guess probability. The results also give us stimulus to explore our classifier in higher mmW bands. Therefore, for our future work, we will continue to analyze mmW RF signals and improve applicability of the classifier system in the mmW area. Acknowledgment. This work was in part supported by a UKRI Future Leaders Fellowship [grant number MR/T043164/1].

References 1. Kulin, M., Kazaz, T., Moerman, I., De Poorter, E.: End-to-end learning from spectrum data: a deep learning approach for wireless signal identification in spectrum monitoring applications. IEEE Access 6, 18484–18501 (2018) 2. Hamid, M., Ben Slimane, S., Van Moer, W., Björsell, N.: Spectrum sensing challenges: blind sensing and sensing optimization. IEEE Instrum. Meas. Mag. 19(2), 44–52 (2016) 3. Zhechen, Z., Asoke, K.N.: Automatic Modulation Classification: Principles, Algorithms and Applications. Wiley, New York (2015) 4. Hindia, M.H.D.N., Qamar, F., Ojukwu, H., Dimyati, K., Al-Samman, A.M., Amiri, I.S.: On platform to enable the cognitive radio over 5G networks. Wirel. Pers. Commun. 113(2), 1241–1262 (2020). https://doi.org/10.1007/s11277-020-07277-3 5. West, N.E., O’Shea, T.: Deep architectures for modulation recognition. In: Proceedings of the 2017 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), Baltimore, USA, pp. 1–6 (2017) 6. Kim, J., Lee, B., Lee, H., Kim, Y., Lee, J.: Deep learning-assisted multi-dimensional modulation and resource mapping for advanced OFDM systems. In: Proceedings of the 2018 IEEE Globecom Workshops (GC Wkshps), Abu Dhabi, United Arab Emirates, pp. 1–6 (2019) 7. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015) 8. Chen, H., Wang, Z., Zhang, L.: Collaborative spectrum sensing for illegal drone detection: a deep learning-based image classification perspective. China Commun. 17(2), 81–92 (2020) 9. Gao, Y., Mosalam, K.M.: Deep transfer learning for image-based structural damage recognition. Comput. Civ. Infrastruct. Eng. 33(9), 748–768 (2018) 10. Sills, J.A.: Maximum-likelihood modulation classification. In: Proceedings of the MILCOM 1999. IEEE Military Communications, Atlantic City, USA, pp. 217–220 (1999) 11. Whelchel, J.E., McNeill, D.L., Hughes, R.D., Loos, M.M.: Signal understanding: an artificial intelligence approach to modulation classification. In: Proceedings of the IEEE International Workshop on Tools for Artificial Intelligence, Fairfax, USA, pp. 231–236 (1989) 12. Corinna, C., Vladimir, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995) 13. Gang, H., Jiandong, L., Donghua, L.: Study of modulation recognition based on HOCs and SVM. In: Proceedings of the IEEE 59th Vehicular Technology Conference, Milan, Italy, pp. 898–902 (2004)


14. Abdelmutalab, A., Assaleh, K., El-Tarhuni, M.: Automatic modulation classification based on high order cumulants and hierarchical polynomial classifiers. Phys. Commun. 21, 10–18 (2016) 15. Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015) 16. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

COMM = {1, if MTS > 0.80; 0, otherwise}

(1)

where MTS is Manual Textual Similarity [11] of commands. MTS is calculated based on the commands’ textual description as provided by the Microsoft Official Documentation2 . Each command’s textual description is converted into a Term Frequency-Inverse Document Frequency (TFIDF) matrix. These matrices are compared to calculate how similar the textual description of two commands is. For example, del and erase have high similarity, so their MTS score is greater than the given threshold. Similarly, for command-line commands which have just one SUBCOMM token: 

SUBCOMM = {1, if subcommands are the same; 0, otherwise}

(2)

For command-line commands with more than one SUBCOMM, the position of each SUBCOMM is significant:

SUBCOMM = {1, if subcommands match at the same indices; 0, otherwise}

(3)

If neither command-line command contains a SUBCOMM token, the value is set to −1. For FLAG, we set a threshold of 0.9. If both command-line commands have FLAG tokens and their similarity is more than 0.9, we set the value of FLAG to 1, otherwise 0. For example, if the flags /p, /s, /f are in the first command-line and /s, /f, /r are in the second, the value will be 0, as the similarity score of these flags is only 0.66. This very high threshold puts a strict check on assigning a score of 1 to FLAG. If neither command-line command contains FLAG tokens, the value will be −1. For the PARAM token, we set a more lenient similarity threshold. If both command-line commands have PARAM tokens and their similarity is more than 2/3, we set the value to 1, otherwise 0. If PARAM is not present in either command-line command, the value will be −1. Upon receiving two new commands, we compare their tokens based on the set of rules described above. The token values will be 1, 0, or −1, indicating whether the COMM, SUBCOMM, FLAG, and PARAM tokens match. Once we determine the token values, we map their combination to the reference table and classify the pair as Similar or Not-Similar based on the value of the OUTPUT column.

2 https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/windows-commands.
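To make the comparison rules above concrete, the sketch below implements them in Python. The helper names (mts, overlap, compare_tokens, classify), the pre-tokenized dictionary format, and the use of scikit-learn's TfidfVectorizer for the MTS step are illustrative assumptions rather than the authors' actual implementation; only the thresholds (0.80, 0.9, 2/3) and the −1/0/1 coding follow the rules stated above.

```python
# Illustrative sketch of the token-comparison rules described above; helper
# names and the scikit-learn MTS step are assumptions, not the paper's code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def mts(desc_a: str, desc_b: str) -> float:
    """Manual Textual Similarity: cosine similarity of TF-IDF vectors built
    from the two commands' official textual descriptions."""
    tfidf = TfidfVectorizer().fit_transform([desc_a, desc_b])
    return cosine_similarity(tfidf[0], tfidf[1])[0, 0]


def overlap(a: list, b: list) -> float:
    """Fraction of tokens shared by two lists (order-insensitive)."""
    if not a and not b:
        return 1.0
    return len(set(a) & set(b)) / max(len(set(a)), len(set(b)))


def compare_tokens(cmd_a: dict, cmd_b: dict, descriptions: dict) -> tuple:
    """Return the (COMM, SUBCOMM, FLAG, PARAM) values in {-1, 0, 1}.

    Each command is assumed to be pre-tokenized into a dict such as
    {'COMM': 'del', 'SUBCOMM': [...], 'FLAG': ['/p', '/s'], 'PARAM': [...]}.
    """
    # COMM: 1 if the official textual descriptions are similar enough (Eq. 1).
    comm = 1 if mts(descriptions[cmd_a['COMM']],
                    descriptions[cmd_b['COMM']]) > 0.80 else 0

    # SUBCOMM: -1 if absent in both, positional match otherwise (Eqs. 2-3).
    if not cmd_a['SUBCOMM'] and not cmd_b['SUBCOMM']:
        subcomm = -1
    else:
        subcomm = 1 if cmd_a['SUBCOMM'] == cmd_b['SUBCOMM'] else 0

    # FLAG: -1 if absent in both, otherwise 1 only above the strict 0.9 threshold.
    if not cmd_a['FLAG'] and not cmd_b['FLAG']:
        flag = -1
    else:
        flag = 1 if overlap(cmd_a['FLAG'], cmd_b['FLAG']) > 0.9 else 0

    # PARAM: -1 if absent in both, otherwise the lenient 2/3 threshold.
    if not cmd_a['PARAM'] and not cmd_b['PARAM']:
        param = -1
    else:
        param = 1 if overlap(cmd_a['PARAM'], cmd_b['PARAM']) > 2 / 3 else 0

    return comm, subcomm, flag, param


def classify(token_values: tuple, reference_table: dict) -> str:
    """Look the token-value combination up in the reference table and return
    the OUTPUT column value ('Similar' or 'Not-Similar')."""
    return reference_table[token_values]
```

Note that the overlap helper reproduces the flag example above: /p, /s, /f against /s, /f, /r shares two of three flags, giving 0.66 and therefore a FLAG value of 0.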


Algorithm 1. Commands' Comparison using Set of Rules
Initialization
for ind in indices do
  j = 0
  while j < …

As an example, consider the following command, in which the path is wrapped in nested double-quotes:

cmd.exe /x/d/c echo 1591825808 > ""D:\Program Files(x86)\Microsoft Azure Site Recovery \home\svsystems\\var\\services\\esx""

When this command is compared with an edited version of itself, shown below, where the tokenizer detects only the outer double-quotes and produces a single token with the label PARAM, the rule-based system classifies the pair wrongly as not-similar because of the differing number of PARAM tokens, whereas the ML system (the DL sentence-pair classifier) classifies it correctly as similar.

cmd.exe /x/d/c echo 1591825808 > "D:\Program Files(x86)\Microsoft Azure Site Recovery \home\svsystems\\var\\services\\esx"

The reason for the correct classification by the ML system is that it is trained on enough data to have learned the structure of nested double-quotes, whereas for the rule-based system there are six PARAM tokens in the original command and only one PARAM token in the edited version.

9.2 Testing on Unseen Data

The data used for training and evaluation comes from one machine, shares the same structure, and contains hundreds of commands with only minor changes in the values; therefore, the models perform well on it. To verify the generalization of the models, we tested the three models against unseen data. We selected 75 commands randomly from Stack Overflow (https://stackoverflow.com/) and made combinations of each command with the other 74 commands.


Table 7. Comparison of the three models' performance on unseen data

Models                          Accuracy
Baseline logistic regression    0.576
DL document classifier          0.943
DL sentence-pair classifier     0.983

This gave us a total of 2640 pairs of unseen commands for testing. It is worth mentioning that all these commands are formed in a way that follows the same structure as our training data. All the pairs of commands were classified using the rule-based system: out of the 2640 pairs, 1087 were classified as Similar and 1553 as Not-Similar. We passed these pairs to the three models and compared their outputs with the output of the rule-based system. The results in Table 7 show that our logistic regression model did not perform well on the unseen data, with an accuracy of a mere 0.576, whereas the DL document classifier has an accuracy of 0.943 in classifying unseen commands. Once again, the DL sentence-pair classifier outperformed the other two models by classifying unseen commands with an accuracy of 0.983. These results indicate that when a rule-based approach is combined with machine learning, adequate results can be achieved for semi-natural complex data.
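A minimal sketch of how such an evaluation could be set up is shown below; the command list, the rule-based labeller, and the model objects are hypothetical stand-ins for the components described above, not the authors' evaluation code.

```python
# Hypothetical evaluation harness: pair up the unseen commands, label the
# pairs with the rule-based system, and score each model against those labels.
from itertools import combinations
from sklearn.metrics import accuracy_score


def evaluate_models(commands, rule_based_label, models):
    """commands: list of raw command-line strings.
    rule_based_label: callable returning 'Similar' / 'Not-Similar' for a pair.
    models: dict mapping a model name to a callable with the same signature.
    """
    pairs = list(combinations(commands, 2))              # every unordered pair
    reference = [rule_based_label(a, b) for a, b in pairs]
    scores = {}
    for name, predict in models.items():
        predictions = [predict(a, b) for a, b in pairs]
        scores[name] = accuracy_score(reference, predictions)
    return scores
```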

10 Discussion

Semi-natural languages, such as markup languages, algorithms, command-line commands, and processes, are hard to analyze in the absence of a ground truth and without any syntactic and semantic knowledge. To add this extra layer of knowledge, expert opinions can be useful in creating a set of rules. These rules act as a grammar, add syntactic and semantic meaning, and help in understanding the sentence (command) structures, just as in natural languages. In general, all the possible analyses can be performed using a set of rules, but a rule-based system is not efficient when the data size increases exponentially and insights from the data are needed continuously. Maintaining a rule-based database is also an exhaustive and expensive task. To solve these problems, we studied a hybrid approach combining a rule-based system with a machine learning system. This approach is useful in finding similar commands, understanding the hierarchy and dependency of the prevalent flags and parameters, and learning the structure of the commands. The approach of combining a rule-based system with machine learning is not common because machine learning alone is often capable of solving problems related to quantitative data, image data, and natural languages. Since our data is semi-natural and unlabeled, applying machine learning exclusively is not a feasible solution. With this hybrid approach, we managed to solve this problem, and the approach can also be useful for other use cases where data is semi-natural and expert opinions are needed. For example, a lot of research has already been done to detect code clones, but this hybrid approach has not been explored before. To detect code clones or to analyze code structures, chunks of code can be clustered. Following the proposed approach, experts can help in labeling the pieces of code as function, declarative statement, conditional statement, etc., and then, using a carefully created set of rules, code similarities can be detected. Another possible use for this approach can be in the conversion of algorithms


to code, where parts of an algorithm can be labeled, such as instruction, variable, condition, loop, etc. Then, by applying a set of rules created by domain experts, it is possible to detect similar algorithms. This can be helpful in evaluating how structurally similar code can be generated from the similar algorithms of a cluster. Though this approach has proven beneficial in certain cases, as discussed earlier, it needs a lot of human effort at the beginning. The customized labeling of tokens also requires extensive domain knowledge and adequate time. Experts need to be careful while creating a set of rules, as the rules can keep increasing with the complexity of the data. Despite these limitations, this hybrid approach is a promising solution to problems that involve complex and semi-natural data, and where expert opinions are required to understand the sensitivity of the domain.

11 Conclusion

The objective of this study was to first create a rule-based system for the commands data and then build ML models on top of it. Without the rule-based system, applying ML models does not yield satisfactory results, as they will always be missing the context of the commands. Since the rule-based system is exhaustive and expensive to maintain, we need a more robust ML system for the classification of commands. As discussed above, there is always a risk that the tokenizer tool labels a token wrongly. An ML system trained on enough data can easily ignore these mistakes and classify the commands correctly. The experts take a random sample of the ML system's output and identify the misclassifications. They manually correct the class of the commands and complement the training data to create a more diverse dataset. This inclusion of experts in the ML loop can improve its performance. The studied approach and its results indicate that, with the help of expert opinions and a carefully created set of rules, machine learning models can solve a complex problem such as the classification of unsupervised semi-natural language data.

12 Future Work

For future work, we plan to cluster the commands and, upon receiving a new command, map it to an existing cluster. This will help us in analyzing the structure of the commands by calculating inter-class and intra-class similarities. If a new command does not map to any of the existing clusters, a new cluster will be started if the command is safe; otherwise, the new command will become part of a risky cluster of commands.

References 1. Villena-Román, J., Collada-Pérez, S., Lana-Serrano, S., González, J.: Hybrid approach combining machine learning and a rule-based expert system for text categorization. In: FLAIRS Conference (2011) 2. Melero, M., Aikawa, T., Schwartz, L.: Combining machine learning and rule-based approaches in Spanish and Japanese sentence realization. In: INLG 2002 (2002) 3. Pihlqvist, F., Mulongo, B.: Using rule-based methods and machine learning for short answer scoring (2018)


4. Ng, A.Y.: Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Proceedings of the Twenty-First International Conference on Machine Learning. ICML 2004, Banff, Alberta, Canada, vol. 78. Association for Computing Machinery, New York (2004). 1581138385 5. Mladeni, D., Brank, J., Grobelnik, M.: Document classification. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning, pp. 289–293. Springer, Boston (2010). 978-0-387-30164-8. https://doi.org/10.1007/978-0-387-30164-8_230 6. Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 21 (2020). https://doi.org/10.1186/s12864-019-6413-7 7. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2019) 8. Qing, L., Jing, W., Dehai, Z., Yun, Y., Wang, N.: Text features extraction based on TF-IDF associating semantic. 12, 2338–2343 (2018). https://doi.org/10.1109/CompComm.2018.8780663 9. Zhang, Y., Zhou, Y., Yao, J.T.: Feature extraction with TF-IDF and game-theoretic shadowed sets. In: Lesot, M.-J., et al. (eds.) IPMU 2020. CCIS, vol. 1237, pp. 722–733. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50146-4_53 10. Vaswani, A., et al.: Attention is all you need. arXiv:1706.03762 (2017) 11. Hussain, Z., Nurminen, J.K., Mikkonen, T., Kowiel, M.: Command similarity measurement using NLP. In: 10th Symposium on Languages, Applications and Technologies (SLATE 2021), pp. 13:1–13:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl, August 2021. (Open Access Series in Informatics; vol. 94) 12. Bedziechowska, J.: NLP for cyber security - language model for command lines @ F-Secure. https://www.youtube.com/watch?v=yORkNjBzuN0&ab_channel=GHOSTDay%3AAMLC 13. Waltl, B., Bonczek, G., Matthes, F.: Rule-based information extraction: advantages, limitations, and perspectives. In: Proceedings of IRIS 2018 (2018) 14. Yoon, Y., Guimaraes, T., Swales, G.: Integrating artificial neural networks with rule-based expert systems. Decis. Support Syst. 11(5), 497–507 (1994). ISSN 0167-9236 15. https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/defrag 16. Tresp, V., Hollatz, J., Ahmad, S.: Network structuring and training using rule-based knowledge (2002). https://www.researchgate.net/profile/Volker-Tresp/publication/2400373_Network_Structuring_And_Training_Using_Rule-based_Knowledge/links/0deec515be8bfa3b7b000000/Network-Structuring-And-Training-Using-Rule-based-Knowledge.pdf 17. Gallant, S.I.: Connectionist expert systems. Commun. ACM (Association for Computing Machinery, New York, NY, USA) 31(2), 152–169 (1988). ISSN 0001-0782. https://doi.org/10.1145/42372.42377 18. Pomerleau, D.A., Gowdy, J., Thorpe, C.E.: Combining artificial neural networks and symbolic processing for autonomous robot guidance. Eng. Appl. Artif. Intell. 4(4), 279–285 (1991). ISSN 0952-1976. https://www.sciencedirect.com/science/article/pii/0952197691900425

ASDLAF: A Novel Autism Spectrum Disorder Learning Application Framework for Saudi Adults Yahya Almazni1,2(B) , Natalia Beloff2 , and Martin White2 1 Department of Information Systems, Najran University, Najran, Saudi Arabia

[email protected]

2 Department of Informatics, University of Sussex, Brighton, UK

{n.beloff,m.white}@sussex.ac.uk

Abstract. Autism spectrum disorder (ASD) is a lifelong developmental disorder that causes difficulties in social interaction and communication for individuals with ASD. Although many studies about autism have been conducted across the world especially in Western countries, the topic is still in its infancy stage in developing countries. In particular, little is known about ASD in Saudi Arabia. With the increase of autism prevalence, it is noticeable that there is a lack of services for ASD people - especially adults. In this context, this study reviews and outlines the current situation with ASD services for adults in Saudi Arabia and how technology has supported the autism society in the region. This position paper advances the literature by developing a proposed novel Autism spectrum disorder learning application framework (ASDLAF) to investigate how ASD learning applications can help adults in Saudi Arabia by discovering and assessing factors that most influence the adoption of ‘ASD learning applications’. The paper also outlines the methodology to be followed for ASDLAF evaluation. The result provides a comprehensive blueprint for applications developers and offers salient insights for the Saudi government for special education improvements. Keywords: Autism spectrum disorder (ASD) · Learning applications (LA) · M-learning · Assistive technology (AT) · Technology-enhanced learning

1 Introduction

Autism spectrum disorder (ASD) is a lifelong developmental disorder that causes different challenges for children and adults with their social skills, repetitive behaviours, and speech and nonverbal communication. There is no one cause of autism; consequently, there is no cure for autism at the present time. However, reducing the severity of a person's case is possible by applying some behavioural training sessions; early interventions are preferred. The prevalence of autism is increasing all over the world [1]. According to the Centers for Disease Control and Prevention [2], the current estimate is 1 in 68 individuals with autism. However, the number is increasing rapidly over the years compared to previous estimates. Over the next two decades, many ASD children


and teens will enter the adulthood stage and leave school life, while some of them have not received any health care yet, creating a challenging, maybe impossible, environment for them. Another study reported that 2.21% of adults in the USA have been diagnosed with autism [3]. In England, a study reported that ASD affects at least 1.1% of adult individuals [4]. The estimated number of autism cases in Saudi Arabia is 42,000, while other cases remain undiagnosed [5]. Adults with autism may have a high rate of unemployment, and there is a high possibility that they will grow up with the same difficulties as they experienced as children, or even worse, since people in this particular age group are looking for jobs, building friendship networks, and engaging in new and various environments; the workplace, for instance. As recognized by specialists, people with autism tend to feel safer and more comfortable when using technologies to communicate with others instead of using a direct contact approach; hence, this paper will discuss intervention services and behavioural programs as well as how technology can play a significant role in improving the quality of life of autistic people. Interventions to improve autistic individuals' skills are not an easy task, as the severity of ASD differs from one individual to another; thus, a precise and deep diagnosis is considerably important, as some ASD people have certain sensitivities [6]. Interventions have started as therapeutic training, especially in the childhood stage. It is important to keep in mind that autism is a neurodevelopmental disorder, meaning the cause is incurable [7]. However, there are some training programs that aim to improve the way ASD people adapt to their environment instead of curing the cause of autism. As recommended by experts, the earlier an intervention is applied, the more positive results can be achieved. Any delay in autism diagnosis may complicate the situation into a really serious issue. In Arab countries, late ASD diagnosis has led to burdens and complex impairment for ASD individuals and their families [8]. The reasons for late diagnosis could be that some people hesitate to be officially assessed for personal reasons, while other people know that they have some autism symptoms but think that they can manage their lives independently. Another reason can be the limited awareness about autism in most Arab countries. In this paper, we aim to understand the factors that most influence the adoption of ASD learning applications (LA) for adults in Saudi Arabia. The paper is organized as follows. The next section discusses relevant studies. Section three presents the framework proposed for this research in detail. Then, in section four, the methodology and future work are discussed to evaluate the proposed framework and conclude the paper.

2 Related Work Assistive technology (AT) is software or hardware that aims to assist individuals in specific activities by providing solutions for a problem. It has become an important concept in various disability domains requiring assistance such as mobility devices, alternative input devices, or peripherals that can assist disabled people at any age. Assistive technology (AT) has proven its positive benefits for children with disabilities [9]. Enabling users not only to accomplish tasks but also to control their environments can increase their quality of life. Furthermore, AT can also preserve the skills that a person has such


as math skills, decision making, and reading, and explore more skills. Computer and Internet technologies have offered spectacular opportunities and great effort to help people with ASD overcome many obstacles, such as verbal problems, interaction problems, and communication difficulties. A study on facial recognition training for ASD individuals reported that benefits in facial recognition are associated with supporting brain activation [10]. The more technological advances occur, the more effective treatment ASD people will receive. At the International Meeting for Autism Research (IMFAR), where the idea of autism and technology started, there were eight accepted technology presentations in 2004, which increased to 36 in 2008 [11]. Assistive technologies include the Internet, online communities, robotics, assistive and prompting devices, computer-aided instruction, video modelling, video instruction, virtual reality, voice communication devices, telecommunication, and computer training. Learning applications (LA) can combine most of these services in one device as a solution and helpful resource for an individual with special needs. However, many assistive technologies that were developed to help people with ASD may not be scientifically proven yet. Augmentative and Alternative Communication (AAC) refers to systems and devices that are developed to assist people with speech impairments. Despite the few studies that have been conducted in Saudi Arabia on LA used by ASD adults, this section considers how LA can have an impact on ASD people as well as relevant studies. In Saudi Arabia, there has been an indication of implementing technologies in its schools as a part of Saudi programs to improve human resources. A promising Saudi project was publicized in 2000 by Crown Prince Abdullah to integrate technology in classrooms [5]. This was great news for the autism community in Saudi Arabia since this kind of tool is the approach preferred by the autistic community to communicate and learn. In their study, Al-Wabil et al. [12] outlined the analysis and design of an interactive program in development for children. The goal of the study was to evaluate how these kinds of programs can help ASD children with their Arabic speaking for communication, and they found that the main requirements were sound sensitivity and the idiosyncratic preferences of children, and a sense of engagement and progress tracking for rehabilitation specialists. When it comes to adults, progress tracking and a sense of engagement are also important when developing ASD adults' LA; in addition, it is important to deeply understand the current cultural factors in the region. Some requirements can be added to best suit adults' needs, such as negotiation scenarios and understanding voice expressions. 'Tap-to-Talk' and 'Touch-to-Speak' are both applications that aim to help children improve their communication skills. These applications have been evaluated by Al-Wakeel et al. [13] in terms of usability, outlining their advantages and disadvantages; they demonstrated that both have several advantages, such as letting users deal with a variety of pictures with the ability to customize them as preferred, and children find them easy to use. Moreover, a few disadvantages have been stated, e.g. 'Touch-to-Speak' displays too many pages, which can distract the user's attention, while in 'Tap-to-Talk' centralizing pictures inside categories can be a distraction as well [14].
Both 'Touch-to-Speak', which translates a series of pictures into well-structured Arabic sentences, and 'Tap-to-Talk' are applications that have been developed for children's training purposes.


Yet, children's LA may not be a suitable choice for adults because of the age difference, design features, and expected outcomes. There are few AAC solutions with Arabic language support in the marketplace, and most of them are aimed at children and are Picture Exchange Communication System (PECS) based [15]. 'Talk the Talk' is the first Arabic application being developed to help ASD children improve their conversation skills by teaching them basic daily life words [16]. However, this application has not been implemented yet. In 2020, Alnaghaimshi et al. [17] proposed an application called 'Autismworld' aiming to provide assistance for children and parents by enhancing care delivery, together with an ASD sign-detection tool which can enable parents to intervene early. The application aims to create a communication space with specialists to share experiences [17]. Growth in public awareness about ASD is needed because autism has different levels of severity and certain services may not be the best choice. Thus, developing applications with features like sharing experiences with ASD specialists can help limit the prevalence of ASD, especially in Saudi Arabia, as studies have demonstrated the lack of ASD health services in many regions. This research will identify factors that most influence ASD LA used by adults in Saudi Arabia. One of the most widely used LA in Western countries is Proloquo2Go. Proloquo2Go is an AAC LA that helps ASD people improve their skills through a basic dynamic interface depending on visual support, using both symbols and photographs with motor planning support (see Figs. 1 and 2). Unfortunately, it is only available in English, Dutch, French, and Spanish.

Fig. 1. Proloquo2Go school schedule interface

The application has a lot of core words that can be used to add meaning to a certain sentence or story, with several customization features. This application is not widely used by autistic individuals in Saudi Arabia as it does not offer an Arabic version for Arabic ASD users. As a result, the demand for AAC applications that support local dialects with customization


Fig. 2. Proloquo2Go typing training interface

options has emerged [15]. The demand for ASD adults' LA has still not been completely fulfilled. Although these studies provide valuable insights into the importance of technology in supporting ASD individuals, they suffer from one main limitation: most of the addressed studies, including the most recent research, focus mainly on children without paying attention to other groups such as adults; unfortunately, there is a high possibility that children grow up with the same difficulties unless early interventions are given. Some individuals are not diagnosed with ASD until adulthood due to the lack of awareness in developing countries. Thus, more studies are urgently needed to fill this important gap. In this paper, we aim to propose a new acceptance model or framework to evaluate the factors that affect adults' intention to use ASD LA in Saudi Arabia.

3 ASDLAF Framework

Saudi Arabia, like any other country, has unique traditional, cultural, and societal factors. Generally, Arab culture is a significant barrier to the diffusion of technology [18], as Arab individuals may face some social, political, and religious rules and restrictions [19]. These rules and restrictions make Arab society more complex. Different cultural aspects have different impacts on any technology adoption, diffusion, and acceptance [20]; moreover, individuals may be influenced when using technologies, and some beliefs may change. It is important to mention that Arab countries may have slight differences in some religious and cultural aspects [21]. Hence, not all the factors have the same impact on technology adoption and diffusion. Our model (or framework) is proposed based on studies found in the literature, on the researchers' backgrounds in the field, and on the first author's personal knowledge and experience as a Saudi citizen. The scope of this research was taken into consideration when the factors were formed. Thus, the framework is summarized in


Fig. 3 and named “Autism Spectrum Disorder Learning Application Framework for Saudi Adults (ASDLAF)”. The aim of this study is to explore and examine the major factors of LA that help adults with ASD in Saudi Arabia.

Fig. 3. Autism spectrum disorder learning application framework for saudi adults (ASDLAF)

Our framework has been developed to have a dependent variable that evaluates the adoption of ASD LA; intermediate variables, which offer a basic approach to analyse the framework in terms of usability, trust and acceptance, and effectiveness of the technologies; and independent variables (or factors), focused on technological, cultural, and pedagogical aspects, which influence the adoption of ASD LA. There are many models (or frameworks) developed by researchers to evaluate certain factors and how they play their roles towards new technology adoption, including, but not limited to, the Technology Acceptance Model (TAM), the Unified Theory of Acceptance and Use of Technology (UTAUT), and the Theory of Reasoned Action (TRA). These models have different factors serving and justifying different choices.

3.1 Intermediate Factors

As mentioned, these intermediate variables evaluate the users' behaviours and attitudes towards technology. Our proposed framework consists of three factors: Usability, Trust and Acceptance, and Effectiveness. Usability. Evaluations for any type of application consider usability as one of the major factors that affect the users' experience and impression. Usability is the degree to which an application is used easily and accessibly to perform required tasks [22]. According to the Technology Acceptance Model (TAM), ease of use is how far the user believes using technology is free of effort [23]. This affects how the users perform tasks when using the technology in order to meet their satisfaction. Taking into consideration that our framework is designed for ASD users to learn using LA, it is important to keep these applications simple in terms of design and content according to ASD users' disabilities.


Usability can determine “the success of any system” by making users adapt to the technology. Based on that, hypothesis H1 is defined: H1. If ASD LAs’ usability is increased, then this will lead to ASD users’ increased intention to the adoption of ASD LA. Trust and Acceptance. When using technology, there is always a big concern for users, especially when using mobile devices. Users start asking if they are safe and secure behind the screen. They always ask if their data is stored somewhere else but on their phones. These concerns heavily affect users’ trust and acceptance of any mobile application. When it comes to autism aspects, users have more sensitive information that, from their perspective, cannot be shared with others. Trust has been defined by many researchers and has become a vague term [24]. That is because many definitions have been acquired to define trust. Rousseau et al. [25] have defined trust as “an individual’s willingness to depend on another party because of the characteristics of the other party”. Based on that, hypothesis H2 is stated: H2. If ASD LAs’ trust and acceptance is increased, then this will lead to ASD users’ increased intention to the adoption of ASD LA. Effectiveness. The effectiveness of a particular system attracts users. In terms of learning, students or learners usually use mobile technology as a tool for various purposes including accessing learning resources such as articles [26]. Due to the fact that ASD users prefer to use technology for socializing [7] because of its efficiency, having a mobile device is a promising factor to accept the adoption of ASD LA. Enhancing ASD users’ skills and performance using LA to achieve certain objectives affects the attitude of ASD individuals and their thoughts about LA. Based on that, hypothesis H3 is formed: H3. If ASD LAs’ effectiveness is increased, then this will lead to ASD users’ increased intention to the adoption of ASD LA. To identify the main independent variables, the framework was designed based on three main dimensions as follows: • Technological Factors: this explains related technical issues that may impact ASD LA in Saudi Arabia in terms of adoption, diffusion, and usage. • Cultural Factors: this explains related cultural concepts in Saudi society that can affect the use of ASD LA. These factors can be related to traditions, rules, education, and religion. • Pedagogical Factors: this explains the approach that ASD individuals may use to learn from ASD LA. Also, it explains the learning behaviour skills. 3.2 Technological Factors Technological factors tend to discuss technical aspects and issues that may affect the adoption of new technology, characterizing how this new technology will be operated.


Many aiding technologies and applications for disabled people have been developed with language requirements that differ from those of native Arabic-speaking users [12]. This section discusses the technological factors that most influence any ASD LA implementation. Availability. Availability describes how applications are operational and functional to deliver services in an appropriate way, meeting users' requirements. This factor measures the performance of certain applications and the ability to perform tasks as required. If an individual with ASD has become accustomed to using an application, it becomes a part of his/her daily life activities; consequently, changing, modifying, or banning certain services causes serious psychological problems, as studies have demonstrated that changing autistic routines may lead to frustration issues [27]. However, if an application is running, that does not always mean the application is performing well, so measurement procedures should be followed to make sure the application is functioning to meet users' requirements. Based on that, the below hypothesis H4 is formed: H4. If ASD LA are available to function, then this will lead to ASD users' increased intention to the adoption of ASD LA because ASD users will trust that this service will always be available to use. Accessibility. Accessibility is a vital process as it allows different kinds of users to access an application, including disabled users, by considering the range of difficulties that might be faced. In terms of ASD technologies, accessibility is not only about how simple and clear an application design is, but also about how ASD users can interact and perform certain tasks. Language is a vital factor that can impact the usability of new technologies [28]. Individuals with autism usually prefer to interact with their parents and close friends, so it has been recommended that applications consider allowing ASD users to customize which pictures and voices are used in these LA, since this will increase the level of usability and acceptance among ASD users. Based on that, hypothesis H5 is formulated: H5. If ASD LA are accessible properly for autistic users, then this will lead to ASD users' increased intention to the adoption of ASD LA because they will use these applications more easily. Privacy. When it comes to privacy, mobile applications have become increasingly widespread, generating huge amounts of personal data that can contribute to privacy threats. ASD users have more sensitive data than other users, since most ASD individuals are not willing to be publicly known or to have their information shared with others. As some ASD people depend on their families or close friends because of their age or level of literacy, they may get help to create an account, as an example, or to set up the application on their mobile devices. Hence, this may prevent them from having complete privacy, and this privacy violation is likely to happen. Moreover, there are situations where ASD individuals feel worried about the amount of information that should be shared with others [6], as the learners may need frequent monitoring by specialists to better help and improve the users' capabilities. Based on that, hypothesis H6 is addressed:


H6. If ASD LAs' privacy is increased, then this will lead to ASD users' increased intention to the adoption of ASD LA because they will trust that their personal data will be secured. Cost. Despite the high cost of special education in private schools and the large sums spent on autism centres, there are extra expenses that parents and ASD individuals themselves have to pay, including consultations, medical expenses, and other services [29]. The cost of mobile devices has been increasing considerably, let alone smartphones that can be used by ASD learners [30]. Internet service is mandatory to download and use applications; hence, the cost of running smartphones is definitely an important factor [31]. This includes the high cost of purchasing a smartphone and apps and paying Internet bills. Based on the above, hypothesis H7 is stated: H7. If ASD LA are available free or at a reasonable cost, then this will lead to ASD users' increased intention to the adoption of ASD LA because they will accept these applications when available at low cost.

3.3 Cultural Factors

Cultural factors are a set of values and beliefs with which a group of people lives. This includes, but is not limited to, education, social rules, awareness, and religion. These factors shape the lifestyle of a society in a certain region. In terms of the acceptance and growth of new technology adoption, it has been stated that culture is a vital factor [32–34]. Bagozzi [35] claimed that the Technology Acceptance Model (TAM) has failed to consider the importance of social and cultural perspectives in accepting new technologies. Determining which cultural factors impact the acceptance of ASD LA adoption is mandatory; thus, the following is a set of cultural factors explaining how these factors can play a significant role in new technology adoption. Social Rules. Social rules can be defined as attitudes, norms, and behaviours that are expected to be followed by anyone in a particular society. Social rules have been considered as hindrances to successful mobile learning adoption in education due to smartphone distractions [30]. On the other hand, individuals with ASD may intensively focus on activities that interest them without any distractions. Another aspect of social rules is independence. In Saudi Arabia and most of the Arab region, people with a disability depend highly on their parents, and most of them do not leave their families' houses unless there are reasons to move out [36], which increases the level of dependence. Women usually live with their parents as long as they are unmarried [37], unless they have to live somewhere else for work reasons, for example. Some families avoid sharing or discussing their ASD diagnosis with others for many conservative reasons, including feeling shame, perceived unnecessariness, the capability of managing their lives without the need for assistance, and protecting themselves from harsh societal judgments. Alqahtani [38] conducted a study to evaluate parents' beliefs towards their children's autism diagnosis and found that parents feel guilty, believing that they caused their children's autism. More surprisingly, some parents believe that the cause of autism can be medical, nonmedical, or cultural, such as an evil eye, childhood vaccines (a hypothesis that studies do not support [39]), and black magic. The


different beliefs and levels of parental knowledge about autism indicate how Saudis have various cultural backgrounds. Most assistive technologies and LA have been developed according to the language and culture of Western countries, preventing those who speak a different language from benefiting from these technological solutions. Reinecke and Bernstein [28] reported that certain factors such as language can influence the satisfaction of users' experiences. People who do not understand English find it difficult to deal with the technology [40]. In Saudi Arabia, owing to its vast area and people's various backgrounds, there are many accents that people use when speaking or writing. However, software developers should be aware of social rules such as gender roles, religious symbols, and humour [41], since users will apply their cultural beliefs when using technologies. All of these social rules can influence the adoption of ASD LA. Based on the above, hypothesis H8 is formed: H8. If social rules prevent ASD users from using learning applications, then this will lead to ASD users' decreased intention to the adoption of ASD LA because they will not socially accept these applications. Awareness. Awareness about autism is limited in Arab countries [42] as well as in Saudi Arabia, due to the uncertainty about autism causes [43] and the lack of services. However, the majority of Saudis have heard about autism [44], but with confusion surrounding its organic causes [45]. As mentioned in previous sections, there are different culturally based beliefs about the cause of autism, such as an evil eye, in addition to limited knowledge about treatments. However, Hussein et al. [46] reported that the level of awareness in Saudi Arabia is good thanks to many nongovernmental organizations' efforts; this can increase the acceptance of technological solutions for autistic individuals, especially LA adoption. 'Autismworld' is an Arabic-language application that aims to help parents with a handy tool to detect signs of autism [17], which can also help in raising awareness of autism in society. Recently, the Saudi government approved a national policy for a national survey of Autism Spectrum Disorder (ASD) [47]. This will definitely raise the level of awareness among people. Based on that, hypothesis H9 is formulated: H9. If awareness of autism is increased, then this will lead to ASD users' increased intention to the adoption of ASD LA because they will have a better understanding to accept these LA. Education. Among many factors, education may be one of the most important variables. The level of individuals' education affects the acceptance of new technology [48, 49]. Research demonstrates that education is positively correlated with the adoption of technology [50]. However, according to studies [51, 52], ASD individuals in Arab countries do not receive proper support, with limited services. This limits the users' ability to use LA, as they are poorly educated. Hence, the Saudi government has improved special education services by ensuring that students with disabilities receive proper education services. It has also passed laws to ensure any disabled student receives free and proper education [53]. In 2004, the Saudi government began investing in special education research and encouraging and financially supporting universities to establish special


education and autism programs [45]. Providing more educational services helps ASD individuals have better outcomes [6], independence for instance. Based on the above, hypothesis H10 is addressed: H10. If ASD users are educated, then this will lead to ASD users' increased intention to the adoption of ASD LA because they will have proper skills and accept these LA. Religion. Religion has played a major role in shaping social and cultural beliefs and governmental policies. The acceptance of new technologies may rely on some religious beliefs that prevent users from accepting and dealing with the technology. Technologies that have been developed according to Western countries' culture and religion may not be accepted in Saudi Arabia due to differences in religious rules. In Saudi Arabia, music has been regarded as forbidden by many people for religious reasons. However, listening to and performing music has been a debatable topic in Muslim countries [54]. Mashat [14] found that music was not a big concern for ASD users when using social media. Music is used in the UK to help adults with high functioning autism (HFA) in managing mood and social integration. Based on that, hypothesis H11 is addressed: H11. If religious rules limit the use of ASD LA, then this will lead to ASD users' decreased intention to the adoption of ASD LA because they will not accept these LA.

3.4 Pedagogical Factors

Generally, there are several learning approaches that have been used in education. However, some individuals with ASD have learning difficulties even if the level of intelligence exists. Autism and learning disability are co-associated, since a large number of autistic individuals struggle with functional speech, where speech and language are highly correlated with the Intelligence Quotient (IQ) test [55]. The impact of learning difficulties can create huge barriers for ASD people to accept and use LA. Responsivity. It is important to keep in mind that some ASD individuals are hypersensitive to environmental stimuli such as bright lights, loud sounds, or strong smells [6]. These factors may influence the ASD users' acceptance of LA. Additionally, ASD individuals tend to face difficulties with facial expressions [56], while Kana [57] found that ASD individuals are more reliant on visualization to help understand a sentence. The Picture Exchange Communication System (PECS) has shown its effectiveness in children with ASD, especially for prompting speech in children [58], and modest usefulness has been indicated in children's learning [59]. Using graphic symbols and visual support has gained acceptance as a set of practices for ASD individuals [60]. It is known that ASD individuals are visual thinkers, meaning they think in pictures. They may need a longer time to think, as they find it difficult to follow a sequence. These factors imply how LA developers can create responsive interfaces for ASD users to better meet their requirements. Based on that, hypothesis H12 is formed: H12. If ASD LA provide proper responsive interfaces, then this will lead to ASD users' increased intention to the adoption of ASD LA because ASD LA will be effective and easy to use.


Motivation. The main goal of any assistive technology is to help disabled users apply what they acquire from these solutions to real life. Many caregivers confirmed that individuals with autism have no motivation to perform everyday life skills such as shaving, dressing, or household chores [61], besides their desire to avoid socializing with others [62, 63]; instead, ASD individuals find it easy and safe to deal with technology [7], through which they can avoid eye contact, since eye contact is considered one of the autistic difficulties. Thus, one of the technology objectives is to keep ASD users motivated by taking their limitations into consideration in order to gain their trust. Based on the above, hypothesis H13 is formed: H13. If ASD users have motivation to use learning applications, then this will lead to ASD users' increased intention to the adoption of ASD LA because they will use these LA effectively. Age. There are many examples of ASD LA that are designed for children, yet these technology-based products might not be suitable for adults. Adults interact with technology differently [64]. The reason might be that younger people have grown up with the technology, so they are more familiar with using it. When ASD adults engage in a new environment, they perhaps need some time to become accustomed to it, settle in, and feel comfortable. Structured learning strategies are important because not knowing what to do next makes them feel anxious. Also, when using sentences, it takes them a long time to process, so pictures and symbols work better. Encouragement and praise also make them feel good about themselves, which results in increased confidence and independence. Age plays an important role in accepting new technology and its usability as well [65]. Based on the above, hypothesis H14 is addressed: H14. If ASD LA are suitable for adult users, then this will lead to ASD users' increased intention to the adoption of ASD LA because they will find these LA effective and easy to use.


Readiness. Individuals with ASD have different impairments of autism where some of them are not ready to learn. Yet, those with high functioning autism (HFA) exhibiting no intellectual disability may seem ready to learn. However, as mentioned in previous sections, there are many factors that can limit ASD individuals’ readiness to learn or use technology; these factors can be, but are not limited to, independence, responsibility, desire, and skills. Based on that, hypothesis H16 is formulated: H16. If ASD users are ready to use LA independently, then this will lead to ASD users’ increased intention to the adoption of ASD LA because they will use these LA effectively and easily. This section addressed important factors that affect the intention of ASD LA adoption for adults in Saudi Arabia along with hypotheses. These hypotheses aim to verify the presented factors to examine the relationship between these factors and the intention of ASD LA adoption.

4 Conclusion and Future Work This study aims to identify factors that most influence ASD LA used by adults in Saudi Arabia in order to understand the current situation of autism and effectiveness of associated LA in Saudi Arabia. A new acceptance model (or framework) for Saudi autistic adults in LA has been developed to explore and examine the effectiveness and usability of the LA that help adults with ASD. This position paper is a part of ongoing research which will be followed by a data collection for validation and evaluation of the proposed framework. The study will aim to understand in detail and explain the current difficulties that prevent ASD adult users from using LA sufficiently by using a mixed research method approach, i.e. quantitative surveys and qualitative interviews. The quantitative approach will investigate the ASD users’ experience to accept the LA adoption by distributing an online survey among Saudi Arabian ASD individuals and their parents as well as caregivers, if needed. The target sample size is 381 participants according to the population of ASD people in Saudi Arabia. The results will help make significant contributions to each of our framework factors. The survey questions will follow Likert scale questions and will be analysed using descriptive and inferential statistics. In the follow-up stage, the qualitative method will be applied by conducting interviews with ASD individuals or their parents and caregivers to better investigate to what extent ASD LA can help ASD affected adults in Saudi Arabia. Interviews will be conducted face-to-face by the researcher using semi-structured questions with a total of five to ten interviewees. The study provides insights for ASD individuals, their families, and caregivers. The study also provides insights for the Saudi government to help enhance the special education policies. Moreover, the results of this study will be helpful for the developers of learning systems for ASD people, as well as for teaching and learning technologists working with ASD individuals.
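As a side note, the stated target of 381 participants is consistent with the standard finite-population sample-size formula at a 95% confidence level and a 5% margin of error. The quick check below assumes the roughly 42,000 prevalence estimate cited earlier and is only an illustration of how such a target can be derived, not the authors' own calculation.

```python
# Cochran's sample-size formula with finite-population correction; the
# population figure of 42,000 is the prevalence estimate cited earlier, and the
# 95% confidence / 5% margin-of-error settings are assumptions for this sketch.
import math


def sample_size(population: int, z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)   # infinite-population estimate (~384)
    n = n0 / (1 + (n0 - 1) / population)     # finite-population correction
    return math.ceil(n)


print(sample_size(42_000))  # -> 381
```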


References 1. Ouhtit, A., et al.: underlying factors behind the low prevalence of autism spectrum disorders in Oman, p. 5 (2015) 2. Centers for Disease Control and Prevention (2014). http://www.cdc.gov/media/releases/2014/ p0327-autism-spectrum-disorder.html. Accessed 30 Nov 2021 3. Centers for Disease Control and Prevention (2017). https://www.cdc.gov/ncbddd/autism/fea tures/adults-living-with-autism-spectrum-disorder.html. Accessed 30 Nov 2021 4. Brugha, T., et al.: Estimating the prevalence of autism spectrum conditions in adults, p. 31 (2012) 5. Alotaibi, F., Almalki, N.: Saudi teachers’ perceptions of ICT implementation for student with autism spectrum disorder at mainstream schools. J. Educ. Pract. 7, 116–124 (2016) 6. Burke, M., Kraut, R., Williams, D.: Social use of computer-mediated communication by adults on the autism spectrum. In: Proceedings of the 2010 ACM conference on Computer supported cooperative work - CSCW 2010, Savannah, Georgia, USA, p. 425 (2010). https:// doi.org/10.1145/1718918.1718991 7. Benford, P., Med, B.: The use of Internet-based communication by people with autism, p. 391 (2008) 8. Seif Eldin, A., et al.: Use of M-CHAT for a multinational screening of young children with autism in the Arab countries. Int. Rev. Psychiatry 20(3), 281–289 (2008). https://doi.org/10. 1080/09540260801990324 9. Hutinger, P., Johanson, J., Stoneburner, R.: Assistive technology applications in educational programs of children with multiple disabilities: a case study report on the state of the practice. J. Spec. Educ. Technol. 13(1), 16–35 (1996). https://doi.org/10.1177/016264349601300103 10. Bölte, S., Hubl, D., Feineis-Matthews, S., Prvulovic, D., Dierks, T., Poustka, F.: Facial affect recognition training in autism: can we animate the fusiform gyrus? Behav. Neurosci. 120(1), 211–216 (2006). https://doi.org/10.1037/0735-7044.120.1.211 11. Bölte, S., Golan, O., Goodwin, M.S., Zwaigenbaum, L.: What can innovative technologies do for autism spectrum disorders? Autism 14(3), 155–159 (2010). https://doi.org/10.1177/ 1362361310365028 12. Al-Wabil, A., Al-Shabanat, H., Al-Sarrani, R., Al-Khonin, M.: Developing a multimedia environment to aid in vocalization for people on the autism spectrum: a user-centered design approach. In: Miesenberger, K., Klaus, J., Zagler, W., Karshmer, A. (eds.) ICCHP 2010. LNCS, vol. 6180, pp. 33–36. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3642-14100-3_6 13. Al-Wakeel, L., Al-Ghanim, A., Al-Zeer, S., Al-Nafjan, K.: A usability evaluation of arabic mobile applications designed for children with special needs — autism. LNSE 3(3), 203–209 (2015). https://doi.org/10.7763/LNSE.2015.V3.191 14. Mashat, A.A.: Cultural factors and usability of online social networks by adults with autism spectrum disorder (ASD) in Saudi Arabia, p. 329 (2016) 15. Al-Arifi, B., Al-Rubaian, A., Al-Ofisan, G., Al-Romi, N., Al-Wabil, A.: Towards an Arabic language augmentative and alternative communication application for autism. In: Marcus, A. (ed.) DUXU 2013. LNCS, vol. 8013, pp. 333–341. Springer, Heidelberg (2013). https://doi. org/10.1007/978-3-642-39241-2_37 16. Al-Ghamdi, F.: Two students design the first rehabilitation application for autistic children (2014). https://www.alwatan.com.sa/article/212382. Accessed 30 Dec 2021 17. Alnaghaimshi, N.I., Alhazmi, A., Alqanwah, S.A., Aldablan, M.S., Almossa, M.A.: Autismworld: an Arabic application for autism spectrum disorder. 
In: 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia, March 2020, pp. 1–6 (2020). https://doi.org/10.1109/ICCAIS48893.2020.9096811


18. Loch, K.D., Straub, D.W., Kamel, S.: Diffusing the Internet in the Arab world: the role of social norms and technological culturation. IEEE Trans. Eng. Manag. 50(1), 45–63 (2003). https://doi.org/10.1109/TEM.2002.808257 19. Al Omoush, K.S., Yaseen, S.G., Atwah Alma’aitah, M.: The impact of Arab cultural values on online social networking: the case of Facebook. Comput. Hum. Behav. 28(6), 2387–2399 (2012). https://doi.org/10.1016/j.chb.2012.07.010 20. Srite, M., Karahanna, E.: The role of espoused national cultural values in technology acceptance. MIS Q. 30(3), 679 (2006). https://doi.org/10.2307/25148745 21. Amr, M., et al.: Sociodemographic factors in Arab children with autism spectrum disorders, p. 11 (2012) 22. Holzinger, A.: Usability engineering methods for software developers. Commun. ACM 48(1), 71–74 (2005). https://doi.org/10.1145/1039539.1039541 23. Davis, F.D.: A technology acceptance model for empirically testing new end-user information systems: theory and results, Ph.D. dissertation. Massachusetts Institute of Technology (1986) 24. McKnight, D.H., Chervany, N.L.: What trust means in E-commerce customer relationships: an interdisciplinary conceptual typology. Int. J. Electron. Commer. 6(2), 35–59 (2001). https:// doi.org/10.1080/10864415.2001.11044235 25. Rousseau, D.M., Sitkin, S.B., Burt, R.S., Camerer, C.: Introduction to special topic forum: not so different after all: a cross-discipline view of trust. Acad. Manag. Rev. 23(3), 393–404 (1998). http://www.jstor.org/stable/259285 26. Sarrab, M.: M-learning in education: Omani undergraduate students perspective. Procedia Soc. Behav. Sci. 176, 834–839 (2015). https://doi.org/10.1016/j.sbspro.2015.01.547 27. Gillott, A., Furniss, F., Walter, A.: Anxiety in high-functioning children with autism. Autism 5(3), 277–286 (2001). https://doi.org/10.1177/1362361301005003005 28. Reinecke, K., Bernstein, A.: Improving performance, perceived usability, and aesthetics with culturally adaptive user interfaces. ACM Trans. 18(2), 1–29 (2011). https://doi.org/10.1145/ 1970378.1970382 29. Gazette, S.: Parents complain from short support for autistic children in Saudi Arabia (2014). https://english.alarabiya.net/News/middle-east/2014/03/03/Parents-upset-withthe-way-institutions-handling-autistic-children. Accessed 30 Dec 2021 30. Aljaber, A.A.M.: ‘The reality of using smartphone applications for learning in higher education of Saudi Arabia, p. 280 (2021) 31. Woodcock, B., Middleton, A., Nortcliffe, A.: Considering the smartphone learner: developing innovation to investigate the opportunities for students and their interest. SEEJ 1(1) (2012). https://doi.org/10.7190/seej.v1i1.38 32. Ali, M., Weerakkody, V., El-Haddadeh, R.: The impact of national culture on e-government implementation: a comparison case study, p. 13 (2009) 33. Sunny, S., Patrick, L., Rob, L.: Impact of cultural values on technology acceptance and technology readiness. Int. J. Hosp. Manag. 77, 89–96 (2019). https://doi.org/10.1016/j.ijhm. 2018.06.017 34. Curry, G.N., et al.: Disruptive innovation in agriculture: socio-cultural factors in technology adoption in the developing world. J. Rural Stud. 88, 422–431 (2021). https://doi.org/10.1016/ j.jrurstud.2021.07.022 35. Bagozzi, R.: The legacy of the technology acceptance model and a proposal for a paradigm shift. JAIS 8(4), 244–254 (2007). University of Michigan. https://doi.org/10.17705/1jais. 00122 36. 
Aboul-Enein, B., Aboul-Enein, F.: The cultural gap delivering health care services to Arab American populations in the United States.pdf (2010) 37. Haboush, K.L.: Working with Arab American families: culturally competent practice for school psychologists. Psychol. Schs. 44(2), 183–198 (2007). https://doi.org/10.1002/pits. 20215

ASDLAF: A Novel Autism Spectrum Disorder Learning Application Framework

457

38. Alqahtani, M.M.J.: Understanding autism in Saudi Arabia: a qualitative analysis of the community and cultural context. J. Pediatr. Neurol. 15, 1 (2012) 39. Farrington, C.P., Miller, E., Taylor, B.: MMR and autism: further evidence against a causal association. Vaccine 19(27), 3632–3635 (2001). https://doi.org/10.1016/S0264-410X(01)000 97-4 40. Al-Jarf, R.S.: Connecting students across universities in Saudi Arabia, p. 12 (2005) 41. Evers, V.: Human - computer interfaces: designing for culture, p. 73 (1997) 42. Essa, M.M., et al.: Increased markers of oxidative stress in autistic children of the sultanate of Oman. Biol. Trace Elem. Res. 147(1–3), 25–27 (2012). https://doi.org/10.1007/s12011-0119280-x 43. Alarfaj, M.: Autism in Saudi Arabia Perspectives of parents and educational professionals, p. 356 (2014) 44. Almana, Y., Alghamdi, A., AL-Ayadhi, L.: Autism Knowledge among the public in Saudis Arabia. Int. J. Acad. Scient. Res. 5(1), 9 (2017) 45. Sulaimani, M., Gut, D.M.: Research article autism in Saudi Arabia: present realities and future challenges (2019) 46. Hussein, H., Taha, G.R., Almanasef, A.: Characteristics of autism spectrum disorders in a sample of egyptian and saudi patients: transcultural cross sectional study. Child Adolesc. Psychiatry Ment. Health 5(1), 34 (2011). https://doi.org/10.1186/1753-2000-5-34 47. Saudi Press Agency: Custodian of the two holy mosques approves national policy for national survey of autism spectrum disorder (2021) 48. Skoumpopoulou, D., Wong, A., Ng, P., Lo, M.F.: Factors that affect the acceptance of new technologies in the workplace: a cross case analysis between two universities, p. 14 (2018) 49. Mishra, A.K., Williams, R.P., Detre, J.D.: Internet access and internet purchasing patterns of farm households. Agric. Resour. Econ. Rev. 38(2), 240–257 (2009). https://doi.org/10.1017/ S1068280500003233 50. Uematsu, H.: Can education be a barrier to technology adoption?, p. 38 (2010) 51. Al-Salehi, S.M., Al-Hifthy, E.H., Ghaziuddin, M.: Autism in Saudi Arabia: presentation, clinical correlates and comorbidity. Transcult. Psychiatry 46(2), 340–347 (2009). https://doi. org/10.1177/1363461509105823 52. Amr, M., Raddad, D., El-Mehesh, F., Bakr, A., Sallam, K., Amin, T.: Comorbid psychiatric disorders in Arab children with Autism spectrum disorders. Res. Autism Spectr. Disord. 6(1), 240–248 (2012). https://doi.org/10.1016/j.rasd.2011.05.005 53. Aldabas, R.A.: Special education in Saudi Arabia: history and areas for reform. CE 06(11), 1158–1167 (2015). https://doi.org/10.4236/ce.2015.611114 54. Alamer, S.M.: Cultural Perspectives of Associating Music With the Giftedness in Saudi Arabia, p. 8 (2015) 55. O’Brien, G., Pearson, J.: Autism and learning disability. Autism 8(2), 125–140 (2004). https:// doi.org/10.1177/1362361304042718 56. Habash, M.A.: Assistive technology utilization for autism an outline of technology awareness in special needs therapy, p. 7 (2005) 57. Kana, R.K.: Sentence comprehension in autism: thinking in pictures with decreased functional connectivity. Brain 129(9), 2484–2493 (2006). https://doi.org/10.1093/brain/awl164 58. Charlop-Christy, M.H., Carpenter, M., Le, L., LeBlanc, L.A., Kellet, K.: Using the picture exchange communication system (PECS) with children with autism: assessment of PECS acquisition, speech, social-communicative behavior, and problem behavior. J. Appl. Behav. Anal. 35(3), 213–231 (2002). https://doi.org/10.1901/jaba.2002.35-213 59. 
Howlin, P., Gordon, R.K., Pasco, G., Wade, A., Charman, T.: The effectiveness of picture exchange communication system (PECS) training for teachers of children with autism: a pragmatic, group randomised controlled trial. J. Child Psychol. Psychiatry 48(5), 473–481 (2007). https://doi.org/10.1111/j.1469-7610.2006.01707.x

458

Y. Almazni et al.

60. Sennott, S., Bowker, A.: Autism, AAC, and Proloquo2Go. Perspect. Augment. Altern. Commun. 18(4), 137–145 (2009). https://doi.org/10.1044/aac18.4.137 61. Hong, H., Kim, J.G., Abowd, G.D., Arriaga, R.I.: Designing a social network to support the independence of young adults with autism. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work - CSCW 2012, Seattle, Washington, USA, p. 627 (2012). https://doi.org/10.1145/2145204.2145300 62. Bishop, J.: The Internet for educating individuals with social impairments: Educating individuals with social impairments. J. Comput. Assist. Learn. 19(4), 546–556 (2003). https:// doi.org/10.1046/j.0266-4909.2003.00057.x 63. Chevallier, C., Grèzes, J., Molesworth, C., Berthoz, S., Happé, F.: Brief report: selective social anhedonia in high functioning autism. J. Autism Dev. Disord. 42(7), 1504–1509 (2012). https://doi.org/10.1007/s10803-011-1364-0 64. Prensky, M.: Digital natives, digital immigrants, p. 6 (2001) 65. Brauner, P., van Heek, J., Ziefle, M.: Age, gender, and technology attitude as factors for acceptance of smart interactive textiles in home environments - towards a smart textile technology acceptance model. In: Proceedings of the 3rd International Conference on Information and Communication Technologies for Ageing Well and e-Health, Porto, Portugal, pp. 13–24 (2017). https://doi.org/10.5220/0006255600130024

A Comprehensive eVTOL Performance Evaluation Framework in Urban Air Mobility

Mrinmoy Sarkar(B), Xuyang Yan, Abenezer Girma, and Abdollah Homaifar

North Carolina A&T State University, 1601 East Market Street, Greensboro, NC 27401, USA
{msarkar,xyan,aggirma}@aggies.ncat.edu, [email protected]

Abstract. In this paper, we developed an open-source simulation framework for the evaluation of electric vertical takeoff and landing vehicles (eVTOLs) in the context of Unmanned Traffic Management (UTM) and under the concept of Urban Air Mobility (UAM). Unlike most existing studies, the proposed framework combines the utilization of UTM and eVTOLs to develop a realistic UAM testing platform. For this purpose, we first develop a UTM simulator to simulate the real-world UAM environment. Then, instead of using a simplified eVTOL model, a high-fidelity eVTOL design tool, namely SUAVE, is employed, and a dilation sub-module is introduced to bridge the gap between the UTM simulator and the SUAVE eVTOL performance evaluation tool and to elaborate the complete mission profile. Based on the developed simulation framework, experiments are conducted and the results are presented to analyze the performance of eVTOLs in the UAM environment.

Keywords: UAM · UTM · eVTOL · Simulation framework

Terminology
UAM = Urban Air Mobility
eVTOL = Electric Vertical Takeoff and Landing Aircraft
UTM = Unmanned Traffic Management
SUAVE = Stanford University Aerospace Vehicle Environment
AGL = Above Ground Level

1 Introduction

In 2019, NASA initiated the "Urban Air Mobility" concept to utilize the three-dimensional airspace to accommodate the heavy demand for cargo deliveries as well as passenger transportation in urban areas.1

1 https://www.nasa.gov/uam-overview/.



The electric Vertical Takeoff and Landing (eVTOL)2 aircraft is proposed as a promising solution to implement UAM in the future. Under this direction, different types of eVTOL have been extensively investigated by researchers from industry, such as Uber, Joby Aviation and Boeing. It is envisioned that eVTOLs will significantly reduce the heavy traffic congestion during peak times and improve the efficiency of urban traffic networks.

The safety and reliability of UAM have motivated extensive research on both the design and evaluation of UAM and eVTOLs. In [1,8,9,12,15,16,18,21], the authors primarily focused on developing different algorithms or strategies for the management of UAM with respect to different objectives. However, no systematic approach is available to generate all possible strategies for UAM purposes. Also, most of these UTM algorithms are evaluated in different simulation interfaces, and no common platform has yet been developed for their evaluation. Considering these challenges, a preliminary framework with metrics and a simulation interface was developed to compare and evaluate different alternatives for UAM in [13]. However, it simplified eVTOL operations from 3-D to 2-D space and ignored the physical configuration, or properties, of the eVTOL's geometry, weight, propulsion system, and battery configuration. From this aspect, most of the existing UTM simulation interfaces are not realistic, since most physical or aerodynamic factors are ignored. Moreover, the preliminary framework did not consider the real-world mission profiles of eVTOLs, and thus cannot effectively model their behaviors in real-world scenarios. Therefore, this type of simulator is efficient for high-scale (large number of vehicles) eVTOL simulation but not suitable for high-fidelity (complex eVTOL model) simulation.

In contrast, the open-source conceptual design tool SUAVE [10] is capable of modeling the physical properties of eVTOLs (high-fidelity model) and incorporating the practical mission profile of eVTOLs. In [4], SUAVE is used to optimize the design of eVTOLs by identifying the relationship between vehicle configurations and performance, without considering the interactions with other eVTOLs in UAM. However, in real-world scenarios, the operational environment of eVTOLs is dense, and interactions with other eVTOLs must be considered. Therefore, SUAVE has the limitation of high-scale collaborative simulation. In summary, several limitations of the existing studies of UAM and eVTOLs are as follows:

– To the authors' knowledge, no prior work has developed a UTM simulation framework for the evaluation of eVTOLs in a fully-realized UAM environment. More specifically, current UTM simulators either have limited access or ignore the physical properties of the eVTOLs.
– The trade-off between fidelity and scale is not considered in the existing UTM simulation frameworks; high fidelity takes a longer time with poor scalability, while low fidelity may ignore infeasible missions with better scalability.

2 "AHS International Leads Transformative Vertical Flight Initiative". evtol.news. Retrieved 2020-09-23.


– Existing testing and design tools of eVTOLs rarely consider the complexity of the operational environment caused by the interactions among different eVTOLs in UAM and have limited scalability.

In light of these limitations, we develop an effective framework to evaluate the performance of eVTOLs in UAM. The proposed framework can both simulate the interactions of eVTOLs in a highly congested UAM network and evaluate the performance of a high-fidelity eVTOL model. With a low-fidelity UTM simulator, the proposed framework can provide detailed analysis to identify and explain infeasible mission profiles. The primary contributions of this work are two-fold: First, we developed an open-source simulation framework by combining the open-source UTM simulator from [13] and the SUAVE tool into a more comprehensive evaluation framework for eVTOLs. Second, we conduct simulations using the developed framework and present the simulation results to analyze the performance of eVTOLs in a UAM network.

The remainder of this paper is organized as follows: Sect. 2 provides a review of different research studies in UTM and eVTOL testing. Section 3 formulates the problem statement. The details of the proposed framework are described in Sect. 4. Section 5 presents experimental studies using the proposed framework. The advantages of the proposed framework are summarized in Sect. 6. Finally, concluding remarks and future work are outlined in Sect. 7.

2 Literature Review

The demand for a new air traffic management system to coordinate increasingly dense low-altitude flights has recently attracted substantial research [1,3,8,12,18,21]. Most of these research studies propose different air-traffic management methods and develop testing tools to evaluate the effectiveness of aerial vehicle (such as eVTOL) designs under different flight profiles and objectives. In this section, we review those studies and identify gaps to motivate a more realistic aerial vehicle testing and evaluation framework.

UTM Simulation Framework for UAM: A systematic approach to generate all viable architectures, as required by a thorough decision-making process, is still under development. Some architectures are still at the Concept of Operations (ConOps) stage and have not been thoroughly evaluated. Although some other architectures have been tested in simulation environments, each approach is tested with a distinct simulator and some simulators are closed-source. This makes the evaluation and comparison of the architectures difficult. For instance, architectures such as Full Mix, Layers, Zones, and Tubes are developed using the TMX simulator [2], while the Iowa UTM [21] and Altiscope [14] architectures utilize a custom 2D simulator. Moreover, the DLR Delivery Network [12], Linköping distributed [17] and Linköping centralized [17] architectures are built on a custom 3D simulator.

To address these gaps and design a better decision-making process for UTM, recent work in [13] has proposed an open-source simulation framework that can generate various alternatives for different UTM subsystems.


In addition, the framework can compare alternatives based on safety, efficiency, and capacity. The framework consists of four elements: system decomposition, alternative generation, comparison metric establishment, and alternative evaluation. To create general alternatives for the conceptual design, the author decomposed the UTM system into four subsystems: airspace structure, access control, preflight planning, and collision avoidance. The author used a custom-built open-source 2D agent-based simulation framework to generate and evaluate alternatives. Each agent is a representation of an eVTOL.

However, the 2D simulation framework fails to capture important factors of a real-world environment, including the agent's altitude and wind conditions. Additionally, the agents are modeled as 2D points in the simulator, and the vehicle's geometry and dynamics are ignored, which hinders the practicality of the simulator for modeling real-world scenarios. In a real-world environment, the physical design of the vehicles plays an important role in studying the effects of aerodynamics, airspace capacity, weather, and other factors. Moreover, it is assumed that the vehicles stay at the same altitude throughout the flight, and a simple 2D approach is used for the modeling. Accordingly, the take-off and landing procedures are completely ignored by the study. This simplification reduces the framework's complexity by omitting one of the essential components of the overall flight process, where there are numerous uncertainties and challenges.

Once the agents are granted access to the airspace, each agent optimizes its own trajectory using either the decoupled method, safe interval path planning, or local path planning. The collision avoidance system is implemented using a reactive, decentralized strategy called Modified Voltage Potential (MVP) [5,6]. Finally, the UTM's performance is evaluated based on the established metrics such as efficiency, safety and capacity [13].

[Fig. 1 plots altitude (ft) against mission range, with altitude levels at 50, 300, and 1500 ft and the mission segments Hover Climb, Transition, Climb, Departure Procedure, Accel Climb, Cruise, Decel Descend, Arrival Procedure, Transition Descend, and Hover Descend.]

Fig. 1. A generalized eVTOL mission profile for UAM application [19].


eVTOL Design and Performance Analysis Tool: In [4], a method is proposed for testing different types of eVTOL designs in accomplishing a given mission profile. It uses SUAVE [10] to achieve a realistic design of various types of aircraft configurations with varying aerodynamic fidelity analyses. After designing and optimizing the eVTOLs in SUAVE, each vehicle is tested with a static mission profile. As shown in Fig. 1, the mission profile starts with an initial takeoff and is followed by the ascent, cruise, descent, and reserve hover/loiter. Based on this fixed mission profile, the performance of different eVTOL designs is evaluated.

Although the study in [4] considers the testing of different high-fidelity eVTOL designs, it fails to investigate the effect of different dynamic mission profiles on the performance of the eVTOL. In a real-world scenario, the eVTOL should be robust enough to follow different mission profiles generated based on various environmental conditions and circumstances. However, [4] considered a fixed mission profile where the eVTOL follows a pre-defined mission with a fixed amount of flight time and a constant altitude. Under congested low-altitude UAM conditions, the speed of the eVTOL can be affected by different internal and environmental factors, such as eVTOL payloads, weather conditions, and the collision avoidance system. The author also assumes a collision-free path, which is an impractical assumption.

Table 1. Comparison of studies in literature.

Features                                          UTM simulator [13]   SUAVE [4]   Proposed
Comprehensive UAM simulation environment          ✓                    ×           ✓
High fidelity eVTOL dynamics and configuration    ×                    ✓           ✓
Dynamic mission profile                           ✓                    ×           ✓
Evaluate each segment of the mission profile      ×                    ✓           ✓
Can test new novel eVTOL aircraft                 ×                    ✓           ✓

In this paper, we aim to address these gaps. We propose an eVTOL performance evaluation framework that leverages the advantages of the comprehensive UTM simulator from [5,10,13] and a high-fidelity eVTOL design and flight profiling tool from [4,10]. In Table 1, we summarize the features of the existing UTM simulation framework and the eVTOL performance analysis tool alongside the proposed simulation framework.

3 Problem Statement

Given a UAM environment with all the infrastructure, such as an Unmanned Traffic Management (UTM) system, vertiport terminal procedures, flight planning algorithms, obstacle avoidance algorithms, or other autonomous capabilities for eVTOL operation, our goal is to quantitatively measure the performance of a high-fidelity eVTOL model, in terms of profiles such as the throttle profile, battery energy profile, battery voltage profile, and C-rating profile, in the different segments of the mission profile.

4 Proposed Framework

As described in Sects. 1 and 2, the existing studies provide UTM and eVTOL performance evaluation independently. Nevertheless, the concept of UAM can only be fully realized when UTM and eVTOL performance evaluations are integrated. In this paper, we combine these two performance evaluation schemes and develop a new simulation framework. With this new framework, we can analyze the limitations and strengths of any UTM algorithm or realistic eVTOL model within a UAM environment. Besides, the proposed framework is an open-source simulation platform that is available for public research purposes. The system architecture of the proposed simulation framework is shown in Fig. 2. It consists of three sub-modules: (1) UTM Simulator, (2) Dilation, and (3) eVTOL Performance Evaluator. Details of each sub-module are described in the following sub-sections.

[Fig. 2 is a block diagram of the three sub-modules. The UTM Simulator performs a system decomposition into airspace structure, access control, preflight planning, and collision avoidance; runs a 2D low-fidelity eVTOL air traffic simulation; evaluates different UTM algorithms; and outputs the 2D trajectories of all eVTOLs during the cruise segment of the mission profile. The Dilation sub-module generates the full mission profile by adding terminal area procedures at the front and end of the 2D cruise segments and adding altitudes to the 2D cruise trajectories according to the mission requirements, outputting the 3D full mission profile of each eVTOL. The eVTOL Performance Evaluator uses the SUAVE high-fidelity eVTOL model to execute a performance evaluation of the given full mission profiles and outputs throttle, battery voltage, battery energy, and C-rating profiles.]

Fig. 2. The architecture of the proposed simulation framework.

4.1 UTM Simulator

The UTM Simulator is developed by adapting the open-source implementation of the UTM simulation framework from [13]. It uses off-the-shelf implementations of the airspace structure, access control, preflight planning, and collision avoidance algorithms. We modify the visualization tool of the original implementation to simulate real-world UTM systems and extract the entire trajectory of each agent/eVTOL. This new feature provides more details about the behaviors of eVTOLs at different time intervals during the overall mission.


Since the original implementation of the UTM simulator in [13] ignores vertiport3 terminal area procedures (Standard Instrument Departures (SIDs), Standard Instrument Arrivals (STARs), Approaches, etc.) by focusing only on the cruise segment of the entire flight profile, we extract only the 2D trajectory of the cruise segment in our UTM simulator sub-module. Thus, the output of the UTM simulator in our framework is a set of 2D trajectories of all the eVTOLs in the simulation, which enter the airspace at a specific time and finish an entire flight. Each entry of a 2D trajectory is composed of five elements: x-coordinate, y-coordinate, vx-velocity, vy-velocity, and time-stamp.

4.2 Dilation

We elaborate the cruise segment with the other required segments, such as takeoff, transition, climb, descent and landing, in the dilation sub-module. Accordingly, the output of the dilation algorithm is a full mission profile for each eVTOL. Moreover, we convert the trajectories into 3D trajectories by incorporating altitudes for all the entries in the 2D trajectories. The algorithmic description of the dilation sub-module is shown in Algorithm 1. Though a real-world implementation of vertiport terminal area procedures requires further investigation, the Hover Climb, Transition Climb, Departure Terminal Area Procedures, Arrival Terminal Area Procedures, Transition Descend and Hover Descend segments in the dilation algorithm can be interpreted as vertiport terminal area procedures.

Algorithm 1 Creation of Full Mission Profiles for each eVTOL
Input: 2D Trajectories generated from UTM Simulator
Output: 3D Trajectories with initial and ending segments of the mission profile
  ζ ⇐ ∅
  N ⇐ Number of 2D Trajectories
  for i = 1 to N do
    τ ⇐ ∅
    Append Hover Climb Segment to τ
    Append Transition Climb Segment to τ
    Append Departure Terminal area Procedures Segment to τ
    Append Accelerated Climb Segment to τ
    Insert a constant altitude to each waypoint of the ith 2D Trajectory
    Append the ith 3D Trajectory as the Cruise Segment to τ
    Append Decelerated Descend Segment to τ
    Append Arrival Terminal area Procedures Segment to τ
    Append Transition Descend Segment to τ
    Append Hover Descend Segment to τ
    Append τ to ζ
  end for
  return ζ

3 A type of airport for aircraft which take off and land vertically.
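To make Algorithm 1 concrete, the Python sketch below builds full mission profiles from 2D cruise trajectories; the data layout (lists of (x, y, vx, vy, t) entries), the function name and the constant cruise altitude are our own illustrative assumptions and not the released implementation.

import copy

CRUISE_ALT_FT = 1500  # assumed constant cruise altitude (cf. Table 4)

PRE_SEGMENTS = ["Hover Climb", "Transition Climb",
                "Departure Terminal Area Procedures", "Accelerated Climb"]
POST_SEGMENTS = ["Decelerated Descend", "Arrival Terminal Area Procedures",
                 "Transition Descend", "Hover Descend"]

def dilate(trajectories_2d):
    """Convert 2D cruise trajectories into full 3D mission profiles (zeta in Algorithm 1)."""
    missions = []                                          # zeta
    for traj in trajectories_2d:                           # one eVTOL per trajectory
        profile = [(name, None) for name in PRE_SEGMENTS]  # tau, terminal segments first
        # Lift every 2D cruise waypoint to a constant altitude to obtain the 3D cruise segment.
        cruise_3d = [(x, y, CRUISE_ALT_FT, vx, vy, t) for (x, y, vx, vy, t) in traj]
        profile.append(("Cruise", copy.deepcopy(cruise_3d)))
        profile += [(name, None) for name in POST_SEGMENTS]
        missions.append(profile)
    return missions

Each element of the returned list is one eVTOL's mission profile as an ordered list of (segment name, segment data) pairs, mirroring the segment order of Fig. 1.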

4.3 Performance Evaluation

To evaluate the performance of each eVTOL in the UAM environment, we use the open-source software SUAVE [11,20]. SUAVE is a set of tools to design and optimize conceptual novel aircraft. As an example, we used an eVTOL model developed in SUAVE. The elaborated mission profiles generated by the dilation sub-module are used in SUAVE to evaluate the performance of the eVTOL model. There are several possible evaluation criteria; however, four built-in metrics, namely the throttle profile, battery energy profile, battery voltage profile, and C-rating profile, are employed to evaluate the performance of the eVTOL for all the mission profiles. The metrics are described as follows:

Throttle Profile: In general, the throttle controls the vertical motion of an eVTOL, and this measurement is directly proportional to the thrust generated by its motors. From the throttle profile, the amount of thrust contributed by each motor during the different segments of the mission profile is obtained.

Battery Voltage Profile: Using the battery voltage profile, we can observe the decreasing trend of the battery voltage of the eVTOL as the mission progresses.

Battery Energy Profile: This profile shows the battery energy consumed by the eVTOL along the mission profile. It is an important metric for eVTOL performance measurement because battery energy consumption is directly related to the mission range that can be achieved by the eVTOL.

C-Rating Profile: The C-rating profile shows the battery discharge rate of the eVTOL for the given mission profile.
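As a rough illustration of how the energy and C-rating metrics relate to a simulated power-draw history, the following sketch integrates power over time and converts current to a C-rate; it is back-of-the-envelope arithmetic under our own simplifying assumptions (constant pack voltage), not SUAVE's built-in computation.

import numpy as np

def battery_metrics(time_s, power_w, capacity_ah, voltage_v):
    """Consumed energy [Wh] and peak C-rate from a battery power-draw history.

    time_s, power_w : arrays of time stamps [s] and battery power draw [W]
    capacity_ah     : nominal pack capacity [Ah]
    voltage_v       : nominal pack voltage [V] (constant-voltage simplification)
    """
    time_s = np.asarray(time_s, dtype=float)
    power_w = np.asarray(power_w, dtype=float)
    # Trapezoidal integration of power over time gives energy in joules; divide by 3600 for Wh.
    energy_wh = float(np.sum(0.5 * (power_w[1:] + power_w[:-1]) * np.diff(time_s))) / 3600.0
    current_a = power_w / voltage_v          # instantaneous discharge current [A]
    c_rate = current_a / capacity_ah         # discharge rate in C (1/h)
    return energy_wh, float(c_rate.max())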

5 Results

We present our results from the proposed eVTOL performance measurement framework, and the details are discussed below.4

Table 2. List of parameters used for the UTM simulator

Parameter name            Value
Operational area          50 × 50 km2
Minimum separation        500 m
Sensing radius            5000 m
Max speed of the eVTOL    100.662 mph

4 The data for these experimental results can be found at the following URL: https://github.com/mrinmoysarkar/A-small-dataset-for-eVTOL-performanceevaluation.git.


For the UTM simulator, a set of algorithms is selected [13], and Table 2 summarizes all the parameters. We use a "free airspace structure," meaning all eVTOLs can fly their preferred path to their destination, and "free access control," which allows an eVTOL to take off if there is no immediate conflict. In this simulation study, no preflight planning is used, but a reactive decentralized strategy known as the Modified Voltage Potential (MVP) algorithm is used for collision avoidance. The eVTOL model used in our experiment is developed based on the geometry of the Kitty Hawk Cora eVTOL prototype. This eVTOL configuration is also known as a lift+cruise configuration. The original Kitty Hawk Cora eVTOL and the eVTOL generated by the SUAVE tool are shown in Fig. 3. Some of the high-level parameters of the eVTOL are listed in Table 3. The complete list of parameters of the eVTOL model can be found in the SUAVE GitHub repository.5
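For intuition only, the small check below applies the minimum-separation and sensing-radius values of Table 2 to a snapshot of simulated eVTOL positions; it is a toy illustration written by us and not the MVP collision avoidance logic of [5,6,13], which additionally computes avoidance maneuvers.

import numpy as np

MIN_SEPARATION_M = 500.0    # from Table 2
SENSING_RADIUS_M = 5000.0   # from Table 2

def separation_violations(positions):
    """Return (i, j, distance) for eVTOL pairs closer than the minimum separation.

    positions: (N, 2) array of x/y positions in metres at a single time step.
    """
    positions = np.asarray(positions, dtype=float)
    violations = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = float(np.linalg.norm(positions[i] - positions[j]))
            # Only pairs inside the sensing radius are visible to each other.
            if d <= SENSING_RADIUS_M and d < MIN_SEPARATION_M:
                violations.append((i, j, d))
    return violations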

Fig. 3. The Kitty Hawk Cora eVTOL (left) and SUAVE-OpenVSP generated eVTOL model (right).

Table 3. A set of high-level parameters of the considered eVTOL model

Parameter name                     Value
Max Takeoff mass                   2450 lbs
Max Payload mass                   200 lbs
Total reference area               10.76 m2
Number of lift motors              12
Number of cruise motors            1
Battery type                       Lithium Ion
Battery max voltage                500 V
Battery energy specific density    300 Wh/kg
vstall                             84.28 mph
Speed                              111.847 mph

5 https://github.com/suavecode/SUAVE/blob/develop/regression/scripts/Vehicles/Stopped Rotor.py.


From the UTM simulator module, we generated 262 2D trajectories using the simulation parameters from Table 2. We then used the dilation Algorithm 1 to convert the 2D trajectories into full mission profiles, as shown in Fig. 1. During the dilation procedure, we referred to the required specification of a UAM mission for an eVTOL from Uber Elevate [7,19], which is described in Table 4. However, we kept the altitudes as in Table 4 but chose the other parameters, such as vertical speed and horizontal speed, randomly from a bound of [μ − Δ, μ + Δ], where μ denotes the values shown in Table 4. This implementation is inspired by the different environmental conditions (both geographical and weather) at different vertiport locations.

Table 4. Baseline mission specification used in the integrated simulation framework.

Mission segment                        Vertical speed (ft/min)   Horizontal speed (mph)   AGL ending altitude (ft)
Hover Climb                            0 to 500                  0                        50
Transition + Climb                     500                       0 to 1.2 × vstall        300
Departure Terminal area Procedures     0                         1.2 × vstall             300
Accel + Climb                          500                       1.2 × vstall to 110      1500
Cruise                                 0                         110                      1500
Decel + Descend                        500                       110 to 1.2 × vstall      300
Arrival Terminal area Procedures       0 to 500                  1.2 × vstall             300
Transition + Descend                   500 to 300                1.2 × vstall to 0        50
Hover + Descend                        300 to 0                  0                        0
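A minimal sketch of the speed perturbation described above is given below; the half-width Δ is not reported in the paper, so the value used here is a placeholder for illustration only.

import random

DELTA = 50.0  # placeholder half-width; the paper does not report the value of Δ

def perturb(mu, delta=DELTA):
    """Sample a segment parameter uniformly from [mu - delta, mu + delta]."""
    return random.uniform(mu - delta, mu + delta)

# Example: randomise the Accel + Climb vertical speed around its Table 4 baseline of 500 ft/min.
vertical_speed_ft_min = perturb(500.0)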

From the UTM simulator with the dilation algorithm, we generated 262 full mission profiles. It took 2 h 28.21 min to execute the performance analysis of the 262 mission profiles on a workstation with an Intel Xeon(R) CPU at 2.2 GHz with 88 cores, 128 GB RAM, an Nvidia GeForce RTX 2080Ti GPU, and Ubuntu OS. In the first step, we conducted a feasibility analysis of these 262 mission profiles with the SUAVE tool using the eVTOL shown in Fig. 3. We found that only 55 mission profiles could be executed by the eVTOL. For the remaining 207 mission profiles, the SUAVE eVTOL model failed to execute at least one segment of the mission profile. Table 5 shows the performance evaluation comparison between the existing UTM simulator and the proposed simulator. From this comparison study, we can infer that the physical constraints of eVTOL performance can directly impact the applicability of various UTM algorithms.
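The bookkeeping behind Table 5 can be organised as in the sketch below, where evaluate_segment is a hypothetical stand-in for the high-fidelity mission solve (the actual SUAVE call is not reproduced here) and a mission profile is assumed to be an ordered list of (segment name, segment data) pairs.

from collections import Counter

def feasibility_report(mission_profiles, evaluate_segment):
    """Count feasible profiles and tally the first segment at which the others fail.

    evaluate_segment(profile, segment_name) -> bool is a hypothetical wrapper
    around the high-fidelity solver; it returns False when the eVTOL model
    cannot execute that segment.
    """
    feasible = 0
    failures = Counter()
    for profile in mission_profiles:
        failed_at = next((name for name, _ in profile
                          if not evaluate_segment(profile, name)), None)
        if failed_at is None:
            feasible += 1
        else:
            failures[failed_at] += 1
    return feasible, failures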


Table 5. Performance evaluation comparison of the proposed simulation framework with the baseline UTM simulator

Simulation framework      Number of feasible mission profiles   Number of infeasible mission profiles
Baseline UTM simulator    262                                    0
Proposed simulator        55                                     207 (176 in departure terminal area procedure segment & 31 in cruise segment)

Fig. 4. A sample feasible and infeasible mission profile. The red vertical bars highlight the segments of the infeasible mission profile that the high-fidelity eVTOL model was unable to execute.

For further analysis, we show two representative mission profiles, one from the feasible set and another from the infeasible set, in Fig. 4. The corresponding airspeed profiles are shown in Fig. 5. As indicated by the red vertical bars in Fig. 4, the eVTOL was unable to execute the departure terminal area procedure and cruise segments. Since the departure terminal area procedures are generated randomly between a given upper and lower bound, this infeasibility indicates that the considered eVTOL cannot follow arbitrarily abrupt terminal area procedures. After analyzing the cruise segments of the mission profile using the UTM simulator, we found that the UTM simulator required a certain speed profile to avoid collisions, i.e., to maintain minimum separation in the airspace. However, the SUAVE eVTOL cannot achieve these speed profiles. These two cases illustrate sample contingencies that can occur in a UAM environment. The proposed simulation framework can also be extended to capture other types of contingencies in UAM, such as vertiport terminal congestion, adverse weather, or emergency landing scenarios.

We continued our analysis using the feasible set of mission profiles to measure the other performance metrics. Figure 6 shows the consumed battery energy for mission profiles with different ranges.


Figure 7 shows the voltage reading of the battery at the end of different mission range values. From Figs. 6 and 7, it is clear that the energy consumption of the eVTOL is proportional to the mission range, while the change in the voltage reading is negligible. In addition, we conducted a C-rating analysis of the eVTOL's battery and a throttle analysis of the lift and forward motors of the eVTOL for each segment of the mission profile. The C-rating analysis and throttle analysis are displayed in Figs. 8 and 9, respectively. From the C-rating analysis, we observe that the eVTOL draws maximum current during the transition climb and arrival procedure. From the throttle analysis, we discover that the lift motors primarily contribute during vertical motion, such as the hover climb and hover descend, while the forward motor contributes in all other segments of the mission profile.

Fig. 5. A sample feasible and infeasible airspeed profile. The red vertical bars highlight the segments of the infeasible mission profile that the high-fidelity eVTOL model was unable to execute.

Fig. 6. Battery energy consumption of eVTOLs along the mission range.

6 Discussion

From the simulation results, the advantages of the proposed simulation framework are summarized below.

– The framework provides a comprehensive and realistic platform for the evaluation of eVTOL performance in the UAM realm.
– It mitigates the limitation of the low-fidelity UTM simulator by capturing and analyzing the infeasible mission profiles using the dilation sub-module and a high-fidelity eVTOL model in SUAVE.
– The framework can be extended to perform efficient analysis on selected representative mission profiles for a large number of eVTOLs in the UAM network.

Fig. 7. Battery voltage reading of eVTOLs along the mission range.

Fig. 8. C-rating of the eVTOL's battery pack for the different mission segments (averaged over the 55 feasible mission profiles).


Fig. 9. Throttle from the lift & forward motors of the eVTOL for the different mission segments (averaged over the 55 feasible mission profiles).

7 Conclusion

In this paper, we presented a new simulation framework to evaluate the performance of eVTOLs utilizing state-of-the-art UTM architectures in the UAM realm. The proposed framework integrates a UTM simulator with an open-source eVTOL design tool to achieve a real-world representation of the UAM environment for evaluation purposes. The developed dilation sub-module provides detailed analysis of the infeasible missions for the low-fidelity UTM simulator, which addresses the trade-off between fidelity and scale. Simulations are performed on the developed framework to demonstrate and analyze the performance of eVTOLs in UAM facilities in terms of different evaluation metrics. From the simulations, the proposed framework not only reflects the details of an eVTOL's performance during different segments of the mission profile, but also captures the occurrence of infeasible mission profiles. Additionally, the proposed framework provides further insights into the specific regions of the infeasible mission profiles to indicate different contingency situations that may occur in future UAM environments. In the future, the following research directions will be investigated:

– The proposed framework employs a model-based performance evaluation procedure for eVTOLs. We will develop a data-driven performance evaluation tool for the proposed framework to reduce the time complexity.
– The current implementation of the proposed framework omits the design of the vertiport and terminal area procedures. We will work to add the vertiport to the proposed framework and build a more feature-rich simulation software for eVTOL performance evaluation.


Acknowledgment. The authors would like to thank the Office of the Secretary of Defense (OSD) for the financial support under agreement number FA8750-15-2-0116. This work is also partially funded through the National Institute of Aerospace's Langley Distinguished Professor Program under grant number C16-2B00-NCAT, and the NASA University Leadership Initiative (ULI) under grant number 80NSSC20M0161.

References

1. Bulusu, V., Sengupta, R., Mueller, E.R., Xue, M.: A throughput based capacity metric for low-altitude airspace. In: 2018 Aviation Technology, Integration, and Operations Conference, p. 3032 (2018)
2. Chambers, C.: The reforms: a political safe haven or political suicide–is the labour bubble bursting? J. Financ. Regul. Compliance (2011)
3. Chowdhury, D., Sarkar, M., Haider, M.Z., Fattah, S.A., Shahnaz, C.: Design and implementation of a cyber-vigilance system for anti-terrorist drives based on an unmanned aerial vehicular networking signal jammer for specific territorial security. In: 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), pp. 444–448. IEEE (2017)
4. Clarke, M., Smart, J., Botero, E.M., Maier, W., Alonso, J.J.: Strategies for posing a well-defined problem for urban air mobility vehicles. In: AIAA Scitech 2019 Forum, p. 0818 (2019)
5. Eby, M.S.: A self-organizational approach for resolving air traffic conflicts. Lincoln Lab. J. (1994)
6. Hoekstra, J.M., Ellerbroek, J.: BlueSky ATC simulator project: an open data and open source approach. In: Proceedings of the 7th International Conference on Research in Air Transportation, vol. 131, p. 132. FAA/Eurocontrol USA/Europe (2016)
7. Holden, J., Goel, N.: Fast-forwarding to a future of on-demand urban air transportation, San Francisco, CA (2016)
8. Jang, D.-S., Ippolito, C.A., Sankararaman, S., Stepanyan, V.: Concepts of airspace structures and system analysis for UAS traffic flows for urban areas. In: AIAA Information Systems-AIAA Infotech@ Aerospace, p. 0449 (2017)
9. Joulia, A., Dubot, T., Bedouet, J.: Towards a 4D traffic management of small UAS operating at very low level. In: ICAS, 30th Congress of the International Council of the Aeronautical Sciences (2016)
10. Lukaczyk, T.W., et al.: SUAVE: an open-source environment for multi-fidelity conceptual vehicle design. In: 16th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, p. 3087 (2015)
11. MacDonald, T., Clarke, M., Botero, E.M., Vegh, J.M., Alonso, J.J.: SUAVE: an open-source environment enabling multi-fidelity vehicle optimization (2017)
12. Peinecke, N., Kuenz, A.: Deconflicting the urban drone airspace. In: 2017 IEEE/AIAA 36th Digital Avionics Systems Conference (DASC), pp. 1–6. IEEE (2017)
13. Ramee, C., Mavris, D.N.: Development of a framework to compare low-altitude unmanned air traffic management systems. In: AIAA Scitech 2021 Forum, p. 0812 (2021)
14. Sachs, P., Dienes, C., Dienes, E., Egorov, M.: Effectiveness of preflight deconfliction in high-density UAS operations. Technical report, Altiscope (2018)


15. Sarkar, M., Homaifar, A., Erol, B.A., Behniapoor, M., Tunstel, E.: PIE: a tool for data-driven autonomous UAV flight testing. J. Intell. Robot. Syst. 98(2), 421–438 (2020)
16. Sarkar, M., Yan, X., Nateghi, S., Holmes, B.J., Vamvoudakis, K.G., Homaifar, A.: A framework for testing and evaluation of operational performance of multi-UAV systems. In: Arai, K. (ed.) IntelliSys 2021. LNNS, vol. 294, pp. 355–374. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-82193-7_24
17. Sedov, L., Polishchuk, V.: Centralized and distributed UTM in layered airspace. In: 8th ICRAT (2018)
18. Sunil, E., et al.: The influence of traffic structure on airspace capacity. In: ICRAT 2016, 7th International Conference on Research in Air Transportation (2016)
19. Uber: Uber air vehicle requirements and missions (2019)
20. Wendorff, A., et al.: SUAVE: an aerospace vehicle environment for designing future aircraft (2020)
21. Zhu, G., Wei, P.: Low-altitude UAS traffic coordination with dynamic geofencing. In: 6th AIAA Aviation Technology, Integration, and Operations Conference, p. 3453 (2016)

A New Arabic Online Consumer Reviews Model to Aid Purchasing Intention (AOCR-PI)

Ahmad Alghamdi1,2(B), Natalia Beloff2, and Martin White2

1 Taif University, Taif, Saudi Arabia
2 University of Sussex, Brighton, Falmer, Brighton BN1 9RH, UK

{aa2585,n.beloff,m.white}@sussex.ac.uk

Abstract. Currently, customers are inclined to use online reviews to make good purchasing decisions. Such reviews are believed to have a significant impact on customer buying intentions, and, thereby, sales. Almost no previous studies have been conducted to build a comprehensive model of online consumer review (OCR) factors that influence consumer purchase intention and product sales within an Arabic context. Drawing on the elaboration likelihood model (ELM) and by considering Hall's Cultural Model (HCM) and Hofstede's Cultural Dimensions Framework (HCDF), this study proposes a conceptual model to assess the relationship between Arabic Online Consumer Review and Purchase Intention (AOCR-PI), using book reviews as a case study. This position paper is focused on understanding how online review- and reviewer-related factors can influence Arab readers' book selection and, thus, book sales. The paper also outlines the proposed methodology to be followed to evaluate the proposed framework. The findings of this study will help both consumers in choosing the best product quality and sellers in improving future sales by identifying the most important OCR factors that have a significant impact on consumer buying decisions.

Keywords: e-commerce · Online communities · Arabic online consumer reviews · Social networks · Word-of-mouth recommendations

1 Introduction

Currently, customers are inclined to use online reviews to make good purchasing decisions. Such reviews are believed to have a significant impact on consumers' buying intentions [1, 2], purchase decision-making [3–5] and product sales [6–8]. Online consumer reviews (OCRs) have become a reliable source of information as they provide consumers with more unbiased, detailed product information from other customers, unlike promotions from sellers, which traditionally focus only on positive characteristics of products. Recent studies revealed that almost 80% of consumers trust OCRs as much as personal recommendations [9], and 93% of them believe that OCRs influence their purchase decisions [10]. These studies indicate that OCRs have significant impacts on consumers' purchase intention. An OCR is defined as "peer-generated product evaluations posted on company or third party websites" [11]. More accurately, and in the context of this study, an OCR is


"a type of product information created by users based on personal usage experience… helping consumers identify the products that best match their idiosyncratic usage conditions" [12]. People on the Internet "have strong views that they express openly" so that they "are not bound by standards of objectivity" [13]. The importance of reading OCRs results from the difficulty of assessing product quality prior to purchase, especially for experience products (e.g. books) whose quality is difficult to evaluate in advance. For instance, OCRs on Goodreads.com can affect the nature of book discovery and, thus, shape reader choices by allowing readers to find lesser-known books that match their tastes, making it easier to identify high-quality books [14]. Furthermore, the significant effects of OCRs lead many businesses to adopt them as a new marketing strategy [15].

To date, no study has specifically investigated the influence of the various factors of OCRs on purchase intention within the Arabic context, language and consumer. Therefore, the main objective of this study is to investigate the impact of OCRs on Arab consumers' purchasing intention. In particular, we propose a new model that assesses the relationship between Arabic OCRs and purchase intention (AOCR-PI) based on three theoretical foundations, discussed below.

The remainder of this position paper is structured as follows: We first define the different types of online review platforms (ORPs). We then describe the Elaboration Likelihood Model (ELM), Hall's Cultural Model (HCM) and Hofstede's Cultural Dimensions Framework (HCDF) as the theoretical foundations and deduce the constructs of our proposed research framework. Thereafter, we present the literature review and hypotheses. Finally, we draw our conclusion and illustrate future work, outlining the methodology to be adopted to evaluate the proposed framework.

2 Online Review Platforms

Before delving into OCR factors and their impact on purchase intention, we should distinguish between the various types of online review platforms (ORPs). Prior studies have classified ORPs, where users can post their opinions about products, into two typologies: 1) e-merchants' platforms (e.g. Amazon) hosted by retailers (internal OCRs) and 2) independent platforms (e.g. Goodreads) hosted by a third party (external OCRs) [16–18]. The majority of current studies have only focused on internal OCRs, although external reviews from third-party platforms have more influence on consumer decision-making [18]. Usually, independent review platforms do not sell products; therefore, the providers of these platforms are less likely to manipulate online reviews than retailers on internal platforms. That is why these reviews from third-party sources are considered by customers to be more trustworthy, thus having more impact on their buying decisions [19, 20]. However, Gu et al. [18] only focused on high-involvement products (digital cameras) and two review factors, namely, volume and valence. Therefore, generalising their findings to low-involvement goods (e.g. books) may lead to biased results.

This study further extends the existing literature on the impact of OCRs on consumers' purchase intentions by, first, investigating more review and reviewer factors that may affect receivers' purchasing intentions (discussed in Sect. 5); second, focusing on reviews of low-involvement products (books) from an independent review platform, namely, Goodreads; and third, collecting data from Arabic reviews and reviewers, as there is limited research that explores this language and culture.


3 Theoretical Frameworks of OCR Influence

Current information systems (IS) researchers have adopted various theories as a base on which to establish their hypotheses and build their research models. This section provides the theoretical foundations of this study to guide the literature review, construct the conceptual framework and formulate the research hypotheses. These theories are the Elaboration Likelihood Model (ELM), Hall's Cultural Model (HCM) and Hofstede's Cultural Dimensions Framework (HCDF).

3.1 Elaboration Likelihood Model

Previous scholars who have studied the relationship between online review platforms (ORPs) and consumer behaviour have adopted various theories and models as bases on which to establish their hypotheses and build their research models. The Elaboration Likelihood Model (ELM) is a major theoretical model in the OCR literature [21–25]. According to the ELM, people follow either a central or a peripheral route to support their decisions [26]. The central route involves more cognitive effort to evaluate received information (i.e. a high level of elaboration), whereas the peripheral route relies on simple cues in a message and requires less attention from a recipient (i.e. a low level of elaboration) [26, 27]. Regarding ELM in OCRs, the central route is used when consumers are motivated to think about the online message, while the peripheral route is adopted when they do not have the motivation and resources to process the information [23]; thus, in the second case, readers of OCRs tend to consider non-review factors, such as reviewer characteristics [28]. On the Internet, both processes act simultaneously and significantly impact consumers' intentions [29]. The majority of prior studies have considered reviewer-related factors as peripheral cues, while central cues are review-related [30–33]. This study uses ELM to investigate the most significant central and peripheral cues of OCRs to predict the purchase intentions of Arab users of ORPs. Hence, in this research, reviewer identity, reputation and experience, which define the believability of a message (i.e. reviewer credibility), and review volume represent peripheral cues, whereas review content factors, namely, review valence, depth, readability and images, define central cues, as they need more cognitive effort from consumers.

3.2 Hall's Cultural Model

Culture is defined as "the collective programming of the mind that distinguishes the members of one group or category of people from others" [34]. According to Hall's theory [35, 36], cultural background affects how well a person can comprehend and appreciate complex messages. Consequently, researchers admit the importance of considering the diverse cultural backgrounds of online consumers in marketing, e-commerce and recommendation systems [37–39]. Nevertheless, studies investigating how OCR factors affect the buying intentions of consumers by considering their cultural characteristics are few and far between.

Hall, an American anthropologist, divided cultures into two categories according to their communication patterns: high- and low-context cultures.


A "high context (HC) communication is one in which most of the information is already in the person, while very little is in the coded, explicit, transmitted part of the message", whereas, in the low context (LC), "the mass of the information is vested in the explicit code" [35]. Unlike the majority of Western societies, Arabs are an HC culture, where communications are often indirect and implicit [36]. In other words, HC communicators convey messages more through context than direct words [39]. This may also be relevant with regard to online product reviewing and social media communication. For example, OCRs posted by HC customers are likely to be short and contain photos and emojis. As most of the current research has been done within Western countries, i.e. LC cultures, empirical studies are needed to explain online consumer behaviour and how OCRs influence the purchase intentions of HC consumers. To fill this gap, this work-in-progress study takes into consideration the communication style of Arabs according to HCM to explain the results.

3.3 Hofstede's Cultural Dimensions Framework

Information systems (IS), information technology (IT) and marketing studies have frequently referenced the cultural dimensions proposed by Geert Hofstede [34, 40, 41], the most influential researcher on cultural values, and his framework is very popular for examining the effects of culture in these research fields [42, 43]. That is likely because the framework was developed based on a very large survey (more than 100,000 IBM employees from 53 countries) and the author revised and expanded his model regularly. Initially, based on the cultural values of nations, Hofstede classified each country along four cultural dimensions: Power Distance (PD), Uncertainty Avoidance (UA), Individualism versus Collectivism (IDV) and Masculinity versus Femininity (MAS). Later, a fifth dimension of time orientation, Long Term versus Short Term Orientation (LTO), was added, and in 2010, a sixth dimension, Indulgence versus Restraint (IVR), was integrated into the complete six-dimension model [44]. IDV, UA and PD are the most frequently used dimensions in the IS and IT literature [45]. However, IDV and PD are highly correlated with each other, so using both dimensions may lead to multicollinearity [46]. Therefore, this study adopted IDV and UA as the main cultural dimensions that differentiate Arabs from the majority of other cultures and that are believed to have a significant impact on consumers' decision-making processes, as discussed below.

Individualism Versus Collectivism (IDV). This dimension refers to the integration of individuals into groups [34]. The main difference between the two poles of this dimension is that, within individualistic societies, individuals only take care of themselves and their immediate families (independence from in-groups), while those from collectivist cultures depend on their extended families or other in-groups to support them in exchange for loyalty to these groups (dependence on in-groups) [44]. As shown in Fig. 1, this dimension is expressed on a scale that runs approximately from 0 to 100, where the higher the number, the more individualistic the culture. According to Hofstede et al. [34], Arab countries scored an average of 38; hence they are generally considered collectivist societies (low IDV). Therefore, this research focuses on the characteristics of collectivistic cultures to explore Arab consumers' purchase intentions based on OCRs.


Uncertainty Avoidance (UA). This cultural dimension is related to the degree of nervousness in a society when facing an unpredictable future [44]. It is defined as "the extent to which the members of a culture feel threatened by ambiguous or unknown situations" [34]. High UA countries are intolerant of unconventional behaviour and ideas ("what is different is dangerous"), whereas low UA societies are more relaxed in unfamiliar situations and curious about new things [34]. On average, most collectivist societies have high UA scores, and Arab countries are among them, while the opposite is true of individualistic countries. For example, as Fig. 1 illustrates, Saudi Arabia has IDV = 25 and UA = 80, whereas the United Kingdom has IDV = 89 and UA = 35. This can logically explain the result of Cheong and Mohammed-Baksh, who found that Koreans, as a collectivistic society, tend to seek others' opinions and product recommendations more than US consumers, an individualistic society [47]. That is because consumers from high UA cultures are inclined to obtain as much information as possible to reduce the uncertainty and risk related to their future online purchases [48], since they have little tolerance for uncertainty [44].

Despite the considerable amount of literature that has used HCDF to study the impact of cultural background on consumers' purchase intentions, to the best of our knowledge, no single study exists which addresses the impact of the cultural values of OCR receivers on their purchase intentions within the Arabic context. Most of these studies have been done within Western and/or Chinese cultures; hence the generalisability of their results to other countries may be problematic. Generally speaking, that is because Arab and East Asian countries differ in the UA index, and Arab and Western countries differ in both dimensions (UA and IDV), as shown in Fig. 1. Furthermore, Arabs differ from other cultures in consumption behaviour. For example, Jahandideh et al. [49] declare a significant difference in Arab and Chinese consumer complaint behaviour. Thus, in order to fill this gap, following Fischer et al. [46], this study adopts the IDV and UA dimensions as a foundation to propose the research hypotheses and to test and validate whether HCDF is applicable in analysing the purchase intention of Arab consumers based on OCRs and, consequently, product sales. Figure 1 shows the IDV and UA scores of Arab and non-Arab countries (Saudi Arabia, Iraq, China and the United Kingdom as examples).

[Fig. 1 is a bar chart comparing the two dimensions across the four countries; the values shown are: Individualism: Saudi Arabia 25, Iraq 30, China 20, United Kingdom 89; Uncertainty Avoidance: Saudi Arabia 80, Iraq 85, China 30, United Kingdom 35.]

Fig. 1. Comparison of individualism and uncertainty avoidance dimensions between two Arab and two non-Arab countries [50]

4 Conceptual Framework

As mentioned above, no previous study has investigated the impact of OCRs, in particular book reviews on independent review platforms, on Arab consumers. To fill this research gap, this study seeks to build a new framework that assesses the influence of online review- and reviewer-related factors on Arab readers' book selection and, thus, sales (our AOCR-PI model). To understand the impact of OCRs on book sales, it is necessary to identify the factors of the central and peripheral routes from reviews that affect consumers' purchase intentions. Drawing on ELM [26] and the broad range of relevant literature, review depth, valence, readability and images are considered the central route, while review volume and reviewer credibility cues, namely identity disclosure, reputation and experience, form the peripheral route of an OCR. Figure 2 shows the research conceptual framework.

Fig. 2. Proposed AOCR-PI conceptual framework


5 Hypotheses Development

A research hypothesis presumes that two variables are related. It enables researchers "to identify and test the relationship between the independent variable as a predictor… and the dependent variable as the outcome" [51]. Due to the wide variety of the contexts of the investigations, including product categories, sampling approaches and different cultural factors, previous studies into OCR impacts on consumer intentions are not comprehensive and are even inconsistent in most cases [52]. Hence, applying current studies to Arab countries (consumers and sellers) may lead to biased results and poor accuracy of sales prediction models, as the majority of prior research in this field has been done in Western and/or East Asian countries. Therefore, based on the existing literature that has studied the impact of an OCR on purchase intention and sales, and by considering Hall's cultural model (HCM) [35] and Hofstede's cultural dimensions framework (HCDF) [20], the following hypotheses have been formulated.

5.1 Review Depth and Purchase Intention

A large number of studies have investigated the impact of review depth on review helpfulness [16, 53–56]. The majority of them have indicated a positive correlation between the length (depth) and the helpfulness of OCRs, with long reviews receiving more helpful votes than short ones. However, helpfulness is not necessarily related to persuading OCR receivers to buy a product, especially for book reviews, where readers may only want to read others' summaries of the key points that a book has discussed or their evaluation of the contents, as reviewers on ORPs are "peer recognition grantors" [57]. The depth of an OCR is its comprehensiveness [58], which is the main indicator of its quality [31], and argument quality (an ELM central cue) is "the persuasive strength of arguments embedded in an informational message" [59]. Thus, the depth of an OCR is associated with its persuasiveness.

The current literature has used the terms review depth, length or elaborateness to refer to the total number of words (in most studies) [22, 60–62], characters [63, 64] or sentences [54] in a given review. Longer reviews (length) provide more information (elaborateness), which can indicate the depth of content [32], and this is the possible reason for these three different terms being used in prior studies. This factor was found to be positively/negatively related to purchase intention and sales. For example, Zhang et al. [65] find that the number of characters in software program OCRs affects their persuasiveness positively (i.e. the more characters a review contains, the greater the review's persuasiveness). Hence, the depth of reviews represents their persuasiveness, which affects consumers' purchase intentions.

In the light of Hofstede's cultural framework, Fang et al. [66] reported that long reviews do not promote more sales on Dangdang.com, a Chinese e-commerce website, as they do on Amazon.com (i.e. OCR length hurts product sales); that is because Chinese consumers prefer shorter reviews than American consumers. This result can be explained by the Uncertainty Avoidance (UA) dimension of Hofstede's framework, which refers to the extent to which a certain culture is tolerant of an unpredictable future [34]; thus, Chinese customers may require less information (short reviews) to reach a purchase decision.
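To make the operationalisations above concrete, the short sketch below counts words, characters and sentences in a review text; the whitespace tokenisation and sentence delimiters (including the Arabic question mark) are naive simplifications chosen by us for illustration only, not the measurement procedure of the cited studies.

import re

def review_depth(text):
    """Operationalise review depth as word, character and sentence counts."""
    words = text.split()  # naive whitespace tokenisation
    # Split on ., !, ? and the Arabic question mark (U+061F) as sentence delimiters.
    sentences = [s for s in re.split(r"[.!?\u061F]+", text) if s.strip()]
    return {"words": len(words), "characters": len(text), "sentences": len(sentences)}

# Example with a short Arabic review ("a very useful and enjoyable book"):
print(review_depth("كتاب مفيد وممتع جدا."))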


However, the length of words and sentences is not the same across languages. For example, “Arabic words tend to be shorter than English words” [67], and the Arabic language has a wide range of attached pronouns, connected to words, and latent pronouns, understood from context; this may make Arabic sentences shorter than English ones. Furthermore, the product type (the topic of OCRs) may have an impact on the length of reviews. For instance, in their analysis of Arabic reviews on Yahoo! Maktoob, an Arabic social network, [68] stated that science and technology reviews tend to be shorter than politics reviews; “this is due to the various conflicts and controversial issues addressed by political articles and the lengthy arguments by the readers about them”. An ORP may also influence how deeply a user evaluates a product. For instance, Zhang et al. [69] find that book reviews on Dangdang and Amazon, both e-commerce websites, contain general information about non-content aspects of a book, such as its price and packaging; thus they are typically shorter than those on Douban.com, a Chinese ORP, where OCRs focus on content-related aspects. Similar to political discussions [68] and content-focused ORPs [69], we argue that long reviews are needed to evaluate the quality of all aspects of a book deeply and comprehensively. In addition, as discussed earlier, Arab societies are collectivist and high in uncertainty avoidance [34]; people from such societies desire more information when making decisions because they are intolerant of uncertain situations, and OCRs “may reduce their feelings of uncertainty and increase their purchase intentions” [70]. In other words, the more information about a product, the less ambiguity and the greater the persuasive impact, thereby improving purchase intention and, consequently, sales. Based on the above discussion, the following hypothesis is proposed:

H1: A lengthy review (review depth) of a book increases Arab consumers’ purchase intention, leading to greater sales.

5.2 Review Valence and Purchase Intention

Valence is the nature of OCR information, that is, whether it is positive, negative [71] or neutral. Since OCR sentiments and ratings correlate positively with each other [72], review valence is operationalised by the star score (rating) given in the reviews [73]. On most product review platforms, OCR valence ranges from one star (extremely negative) to five stars (extremely positive), with a three-star rating representing a moderate (neutral) opinion [11]. Users of ORPs can also rate a product without writing a textual review. The majority of IS and e-commerce researchers have adopted review valence as an important factor when investigating impacts on consumers’ decision-making. For example, in the hospitality industry, review valence significantly affects the intention to book and recommend hotels [74, 75]. In the publishing industry, Chevalier and Mayzlin [76] find that consumers read textual reviews in addition to the reviews’ summary statistics (ratings), and that this significantly influences book sales. Changes in review valence may even correlate with changes in product popularity over time [77]. In the light of Hofstede’s cultural framework, Barbro et al. [78] report that consumers from high uncertainty avoidance countries find reviews with extreme valence more helpful, perhaps because “Customers in higher-UAI cultures tended to be hesitant toward new products and information” [34].


Therefore, review valence may significantly affect the purchase intentions of Arab consumers, as they are less tolerant of uncertain situations (e.g. future purchases). Moreover, in the light of Hall’s cultural model, review valence “has the ability to alter consumers’ perceptions… and this effect takes place in a collective way” [79]. Hence, as Arab countries have collectivist cultures and as review valence represents others’ evaluations of products, Arabs may consider OCR valence an essential factor (central cue) in their purchasing decisions, which subsequently influences sales. Consequently, the following hypothesis is proposed:

H2: A high valence (star rating) for a book increases Arab consumers’ purchase intention, leading to greater sales.

5.3 Review Readability and Purchase Intention

Readability represents how easily a text can be understood. It is a more important factor than other linguistic characteristics: it significantly affects the perceived value of an OCR [80] and provides an influential qualitative cue of OCRs [81]. Furthermore, in the light of HCM, readability may be a more important textual factor for people from HC cultures than for those from LC cultures, as HC societies, such as the Arab ones, tend to use indirect nonverbal language and rely on the “reader’s ability to grasp the meaning from the context” [82]. Consequently, a more readable OCR could be a crucial factor in HC societies, helping the receiver to fully understand and grasp the meaning of a message. However, until now, very few studies have attempted to explore the impact of review readability on purchase intention and product sales. For instance, Ghose and Ipeirotis [77] find that OCRs of digital cameras with higher readability scores (i.e. easy to read) are associated with higher sales when such reviews are written in a more sophisticated language, which enhances their informativeness. In the hospitality industry, Sharma and Aggarwal [83] report a significant impact of review readability on hotel sales in India (p < 0.05). Generally speaking, people dislike grammar and spelling errors, which can stop consumers from reading misprinted reviews in full. Hence, readability is one of the most important indicators of review quality and persuasiveness and is considered a central route affecting purchase intention. Since India is a collectivist and HC society (i.e. one that prefers indirect communication), just like Arab countries, review readability may arguably have the same influence on Arab customers. Accordingly, the following hypothesis is proposed:

H3: A more readable review (review readability) of a book increases Arab consumers’ purchase intention, leading to greater sales.

5.4 Review Images and Purchase Intention

Prior studies suggest that product images in an OCR increase its helpfulness [84]; image-bearing reviews are thus more influential than text-only reviews [85–87], as pictures are considered authentic [88], decreasing uncertainty and increasing consumer confidence. Product images can therefore persuade a consumer to purchase. Images “act as additional cues and information for decision-making” [31], thereby attracting the attention of consumers more efficiently than text.


Cheng and Ho [89] find that the number of images is the main factor of argument quality (i.e. an ELM central cue) and has a great effect on consumers’ perception of the usefulness of reviews. The authors mention that, unlike Americans, Taiwanese users of ORPs “attach much importance to both images and text”. Similarly, Zhao et al. [88] report that the number of images posted by reviewers improves the pre-purchase experience of consumers and has a significant impact on sales. The studies by Zhao et al. [88] and Cheng and Ho [89] were both done within a Chinese context, an HC collectivist culture like the Arab one; thus, visual reviews may also affect the purchase intention of Arab consumers and, consequently, sales. On this basis, the following hypothesis is proposed:

H4: A review with book images increases Arab consumers’ purchase intention, leading to greater sales, more than text alone.

5.5 Review Volume and Purchase Intention

The relationships between OCR volume and helpfulness, persuasiveness, purchase intention and product sales have been extensively investigated in the current literature. The volume of OCRs is found to increase consumer awareness of products and lead to greater sales [71, 90–92]. For example, [90] finds that a higher review volume promotes higher sales because consumers use it as an important clue when making purchase decisions. In a more detailed study of the impact of review volume and average ratings on purchase decisions, [93] elucidate that both factors affect customers’ perceived diagnosticity; however, consumer “preference shifts from the higher-rated option with fewer reviews toward the lower-rated option with more reviews”. This result highlights the importance of review volume for purchase intention, especially for Arab consumers. That is because a large number of OCRs indicates that many people have bought and recommended a product [94] and thus provides social proof of a product’s popularity [95], which may reduce the quality uncertainty associated with a purchase and improve book sales [96]. Therefore, the quantity of OCRs may have a significant impact on the purchase intentions of Arab customers as theirs are, in general, collectivist and high uncertainty avoidance cultures [34]. Accordingly, growth in the number of OCRs of a book boosts purchase intention, thereby improving sales. Regarding ELM, review volume is considered a peripheral cue as it is not related to the quality of information and does not require high cognitive effort from consumers. Hence, the following hypothesis is proposed:

H5: A high review volume for a book increases Arab consumers’ purchase intention, leading to greater sales.

5.6 Reviewer Identity Disclosure and Purchase Intention

Reviewer identity disclosure refers to whether an OCR source (reviewer) exposes his/her personal information, such as real name and location. The current literature has noted the importance of reviewer identity for consumers when making purchase decisions and its effect on product sales growth. A visible identity improves perceived message credibility and consumer trust [97, 98], thereby encouraging sales [57].


Zhao et al. [99] find that OCR source identity is significantly correlated with “consumers’ perceptions of online review credibility and subsequent purchase behaviours” and ultimately influences film sales in China. This result may also be valid for Arab consumers, as both cultures are HC, collectivist and highly UA. Furthermore, based on a dataset of book reviews from Amazon, Forman et al. [57] state that disclosure of reviewers’ identities, in particular demographic information, is associated with growth in subsequent online product sales, as reviewers’ geographical location affects consumers’ online behaviour. Thus, because of the absence of face-to-face communication on product review websites, revealing the identity of a reviewer may be the most important cue for Arabic receivers of OCRs to trust reviews and reduce the uncertainty related to a future purchase, as Arab societies are collectivist and high in uncertainty avoidance according to HCDF [34]. Hence, the following hypothesis is proposed:

H6: A reviewer who discloses his/her personal information (identity disclosure) increases Arab consumers’ purchase intention, leading to greater sales.

5.7 Reviewer Reputation and Purchase Intention

In ORPs, a reviewer is the main information source about products [100, 101] and can thereby affect consumers’ purchase intentions [74, 102] and sales [103]. The majority of current OCR studies have measured reputation by the number of friends and/or followers that a reviewer has (a reviewer’s social network). For example, Lee [104] finds that a reviewer with a large number of friends “tends to be more influential in generating a perceived value of the review” and, thus, has a significant impact on consumers’ purchasing decisions. Hu et al. [101] report that reviews by reputable reviewers reduce consumers’ perceived uncertainty about product quality and, consequently, help them make purchase decisions. In the light of HCDF, Hu et al.’s [101] findings may apply to Arabs as theirs are high uncertainty avoidance societies. This is also supported by Alsaleh [105], who reports that product recommendations posted by reputable bloggers significantly and directly affect a consumer’s purchasing attitude and intention in Kuwait, an HC, high-UA and collectivist society. Baek et al. [106] find that top-ranked reviewers are more credible to readers. More precisely, the authors reveal that for low-priced and experience goods (e.g. books), peripheral cues of reviews, such as reviewer reputation (ranking), are more important for consumers. On the other hand, Fang et al. [66] argued that reviewer reputation, as operationalised by reviewer rank, is negatively correlated with book sales on Dangdang.com. They explained this by criticising the website’s ranking mechanism, which ranks reviewers based on their online time and the number of reviews posted and does not reflect reviewers’ expertise; this ranking approach may lead to uncertain and biased results. Nowadays, most ORPs rank reviewers based on their social networks, such as the “most followed” ranking on Goodreads. Thus, based on a reviewer’s number of followers, the next hypothesis is proposed:

H7: A reputable reviewer (reviewer reputation) increases Arab consumers’ purchase intention, leading to greater sales.


5.8 Reviewer Experience and Purchase Intention

The experience of the review source (reviewer) is one of the most important factors that the current OCR literature has studied to assess its impact on consumer behaviour. Consumers refer to reviewer experience to evaluate the credibility of an OCR, which significantly affects their purchase intentions [107]. Most previous studies have used the number of reviews contributed by a reviewer as a cue of reviewer experience [74, 108, 109]. In contrast, based on data from Yelp.com, Racherla and Friske [110] found that reviewer expertise (the number of reviews written by a reviewer) is negatively correlated with review usefulness, and that the effect size is larger for experience goods than for search goods. However, the authors analysed only reviews of local businesses from a single website and, more importantly, did not survey customers to capture their behaviour and to provide more detailed explanations for their results. Arguably, cultural differences between societies may play a crucial role in consumers’ evaluation of reviewer experience. For example, advertisements in uncertainty-avoiding countries frequently show experts recommending a product because these societies are inclined to believe them [34]. A recent study shows that Saudi Arabian online shoppers consider the frequency of reviews contributed by a reviewer a significant factor when making purchasing decisions [111]. Drawing on HCDF, therefore, reviewer experience, as measured by the number of contributed reviews, affects the purchase intention of Arabic consumers, as they have a high UA score (see Fig. 1). Considering ELM, this factor does not require intense cognitive activity and is not related to the quality of review content; it is therefore a peripheral cue for customers. Hence, the following hypothesis is proposed:

H8: A top-ranking reviewer (reviewer experience) increases Arab consumers’ purchase intention, leading to greater sales.

In order to formulate the research hypotheses, this section has reviewed the extant literature on OCRs with reference to the theoretical foundations and conceptual framework provided earlier and elaborated further in this section. The studies presented revealed the significant impacts of various OCR factors on purchase intention and sales. Drawing on ELM, these factors are categorised into either central or peripheral routes, and the research hypotheses are proposed by considering HCM and HCDF. The major gap in the current literature, however, is that little research has considered consumers’ cultural values when studying the effects of OCRs on purchase intention. Moreover, the different types of products and ORPs that have been analysed have led to inconsistent and contradictory findings about the effects of different OCR factors on consumer behaviour and sales. Thus, adopting the results of much of the existing literature in the Arabic region may lead to biased results. Hence, there is an urgent need for an empirical study of the extent to which review- and reviewer-related factors affect consumers’ purchase intentions and of how OCRs can be used to predict future sales within the Arabic context. The next section, therefore, moves on to discuss the future work that will be conducted to test and evaluate the proposed framework and hypotheses.
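For illustration only, the short Python sketch below shows one way the eight factors in the AOCR-PI model could be operationalised from a single scraped review record. The record structure, the field names, and the crude readability proxy (average words per sentence) are assumptions made for this example, not measures prescribed by this study.

```python
import re

# Hypothetical review record as it might be scraped from an ORP such as Goodreads.
review = {
    "text": "رواية رائعة. الأحداث مشوقة والأسلوب سلس.",   # review body (Arabic)
    "rating": 5,                                            # star score, 1-5
    "images": 2,                                            # number of attached images
    "book_review_count": 1289,                              # total reviews for the book
    "reviewer_name": "Ahmed",                               # None if anonymous
    "reviewer_followers": 350,
    "reviewer_review_count": 87,
}

sentences = [s for s in re.split(r"[.!؟?\n]+", review["text"]) if s.strip()]
words = review["text"].split()

features = {
    # Central-route cues
    "depth": len(words),                                        # review length in words
    "valence": review["rating"],                                # operationalised by the star rating
    "readability_proxy": len(words) / max(len(sentences), 1),   # crude proxy: avg words per sentence
    "images": review["images"],
    # Peripheral-route cues
    "volume": review["book_review_count"],
    "identity_disclosed": review["reviewer_name"] is not None,
    "reputation": review["reviewer_followers"],
    "experience": review["reviewer_review_count"],
}
print(features)
```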


6 Conclusion and Future Work

In this position paper, we gave an overview of ORPs, OCRs, and review- and reviewer-related factors and their impact on purchase intention and sales, and we identified the gaps in the current literature. This study has developed a new model of Arabic online consumer reviews to investigate the factors that affect the purchase intentions of Arab consumers, with book reviews as a case study. The model takes into consideration the cultural characteristics of Arabs by adopting Hall’s and Hofstede’s cultural frameworks (HCM and HCDF) to evaluate these factors from the perspectives of Arab consumers who use OCRs in their purchases. Our next step is to collect data and evaluate the model. A mixed-methods approach offers an effective way to validate the proposed framework. Accordingly, a quantitative survey will first be conducted, with data gathered via an online questionnaire to investigate customers’ perspectives on the impact of OCR factors on their purchase intentions. The questionnaire will be randomly distributed among Arab book readers on the independent online review platform Goodreads.com. We aim for a sample size of at least 400 participants. All constructs will be measured using a five-point Likert scale anchored at 1 = strongly disagree and 5 = strongly agree. Cronbach’s alpha will be used to measure the reliability of the measurement scale (i.e. the internal consistency of the pilot survey). Next, using structural equation modelling (SEM), confirmatory factor analysis (CFA) will be conducted to test the convergent validity of the constructs. SPSS Amos will be adopted to analyse the data (structural model) and test the hypotheses. After that, semi-structured interviews will be employed to gain deeper insights into the cultural values of Arab readers of OCRs according to the HCM and HCDF models. We will target at least ten individuals who completed the questionnaire. The interviews will look at the benefits of using OCRs for book selection and the reasons behind the respondents’ answers. The framework can be adopted by Arabic publishers to improve their sales. For consumers, this work will make it possible to understand review content more deeply through the analysis of less obvious review factors and reviewer factors that may have a significant impact on their buying decisions.
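As a rough illustration of the planned reliability check, the minimal Python sketch below computes Cronbach’s alpha over a set of Likert items. The DataFrame, the item names, and the sample responses are hypothetical placeholders, not part of the planned instrument.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a DataFrame whose columns are Likert items (rows = respondents)."""
    k = items.shape[1]                               # number of items in the scale
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical pilot responses for a three-item "review depth" construct (1-5 Likert scale).
pilot = pd.DataFrame({
    "depth_1": [5, 4, 4, 3, 5],
    "depth_2": [4, 4, 5, 3, 4],
    "depth_3": [5, 3, 4, 2, 5],
})
print(f"Cronbach's alpha: {cronbach_alpha(pilot):.2f}")
```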

References 1. Thomas, M.J., Wirtz, B.W., Weyerer, J.C.: Determinants of online review credibility and its impact on consumers’ purchase intention. J. Electron. Commer. Res. 20(1), 1–20 (2019) 2. Zhu, F., Zhang, X.: Impact of online consumer reviews on sales: the moderating role of product and consumer characteristics. J. Mark. 74(2), 133–148 (2010) 3. Le, L.T., Ly, P.T.M., Nguyen, N.T., Tran, L.T.T.: Online reviews as a pacifying decisionmaking assistant. J. Retail. Consum. Serv. 64, 102805 (2022) 4. Bueno, I., Carrasco, R.A., Porcel, C., Kou, G., Herrera-Viedma, E.: A linguistic multi-criteria decision making methodology for the evaluation of tourist services considering customer opinion value. Appl. Soft Comput. 101, 107045 (2020) 5. Filieri, R., Raguseo, E., Vitari, C.: When are extreme ratings more helpful? Empirical evidence on the moderating effects of review characteristics and product type. Comput. Hum. Behav. 88, 134–142 (2018) 6. Alzate, M., Arce-Urriza, M., Cebollada, J.: Online reviews and product sales: the role of review visibility. J. Theor. Appl. Electron. Commer. Res. 16(1), 638–669 (2021)


7. Aakash, A., Aggarwal, A.G.: Measuring the effect of EWOM readability and sentiment on sales: online cellphone reviews. Int. J. Bus. Anal. 7(4), 24–42 (2020) 8. Li, X., Wu, C., Mai, F.: The effect of online reviews on product sales: a joint sentiment-topic analysis. Inf. Manag. 56(2), 172–184 (2019) 9. Murphy, R.: 2020 local consumer review survey. BrightLocal, pp. 1–46 (2020) 10. Podium: Podium 2017 state of online reviews (2017) 11. Mudambi, S.M., Schuff, D.: What makes a helpful online review? A study of customer reviews on Amazon.Com. Inorganica Chim. Acta. 378(1), 323–325 (2010) 12. Chen, Y., Xie, J.: Online consumer review: word-of-mouth as a new element of marketing communication mix. Manag. Sci. 54(3), 477–491 (2008) 13. Johnson, T.J., Kaye, B.K.: Wag the blog: how reliance on traditional media and the internet influence credibility perceptions of weblogs among blog users. J. Mass Commun. Q. 81(3), 622–642 (2004) 14. Bondi, T.: Alone, together : product discovery through consumer ratings, September 2019 15. Um, N.H.: Antecedents and consequences of consumers’ attitude toward social commerce sites. J. Promot. Manag. 25(4), 500–519 (2019) 16. Hong, H., Xu, D., Wang, G.A., Fan, W.: Understanding the determinants of online review helpfulness: a meta-analytic investigation. Decis. Support Syst. 102, 1–11 (2017) 17. Filieri, R., McLeay, F.: E-WOM and accommodation: an analysis of the factors that influence travelers’ adoption of information from online reviews. J. Travel Res. 53(1), 44–57 (2014) 18. Gu, B., Park, J., Konana, P.: The impact of external word-of-mouth sources on retailer sales of high-involvement products. Inf. Syst. Res. 23(1), 182–196 (2012) 19. Floyd, K., Freling, R., Alhoqail, S., Cho, H.Y., Freling, T.: How online product reviews affect retail sales: a meta-analysis. J. Retail. 90(2), 217–232 (2014) 20. Senecal, S., Nantel, J.: The influence of online product recommendations on consumers’ online choices. J. Retail. 80(2), 159–169 (2004) 21. Yang, S., Zhou, C., Chen, Y.: Do topic consistency and linguistic style similarity affect online review helpfulness? An elaboration likelihood model perspective. Inf. Process. Manag. 58(3), 102521 (2021) 22. Mousavizadeh, M., Koohikamali, M., Salehan, M., Kim, D.J.: An investigation of peripheral and central cues of online customer review voting and helpfulness through the lens of elaboration likelihood model. Inf. Syst. Front. (1) (2020) 23. Roy, G., Datta, B., Basu, R.: Effect of eWOM valence on online retail sales. Glob. Bus. Rev. 18(1), 198–209 (2017) 24. Tsao, W.-C.: Which type of online review is more persuasive? The influence of consumer reviews and critic ratings on moviegoers. Electron. Commer. Res. 14(4), 559–583 (2014). https://doi.org/10.1007/s10660-014-9160-5 25. Park, D.H., Kim, S.: The effects of consumer knowledge on message processing of electronic word-of-mouth via online consumer reviews. Electron. Commer. Res. Appl. 7(4), 399–410 (2008) 26. Petty, R.E., Cacioppo, J.T.: The elaboration likelihood model of persuasion. Adv. Exp. Soc. Psychol. 19(C), 123–205 (1986) 27. Petty, R.E., Cacioppo, J.T., Strathman, A.J., Priester, J.R.: To think or not to think. Exploring two routes to persuasion. Persuas. Psychol. Insights Perspect., 81–116 (2005) 28. Park, S., Nicolau, J.L.: Asymmetric effects of online consumer reviews. Ann. Tour. Res. 50, 67–83 (2015) 29. SanJosé-Cabezudo, R., Gutiérrez-Arranz, A.M., Gutiérrez-Cillán, J.: The combined influence of central and peripheral routes in the online persuasion process. 
Cyberpsychol. Behav. 12(3), 299–308 (2009)


30. Ismagilova, E., Slade, E.L., Rana, N.P., Dwivedi, Y.K.: The effect of electronic word of mouth communications on intention to buy: a meta-analysis. Inf. Syst. Front. 22(5), 1203–1226 (2019). https://doi.org/10.1007/s10796-019-09924-y 31. Srivastava, V., Kalro, A.D.: Enhancing the helpfulness of online consumer reviews: the role of latent (content) factors. J. Interact. Mark. 48, 33–50 (2019) 32. Zhu, L., Yin, G., He, W.: Is this opinion leader’s review useful? Peripheral cues for online review helpfulness. J. Electron. Commer. Res. 15(4), 267–280 (2014) 33. Pan, Y., Zhang, J.Q.: Born unequal: a study of the helpfulness of user-generated product reviews. J. Retail. 87(4), 598–612 (2011) 34. Hofstede, G., Hofstede, G.J., Minkov, M.: Cultures and organizations: software of the mind: intercultural cooperation and its importance for survival (2010) 35. Hall, E.T.: Beyond Culture. Anchor Press, Garden City, N.Y. (1976) 36. Hall, E.T., Hall, M.: Understanding Cultural Differences. Intercultural Press, Yarmouth (1990) 37. Lee, K.Y., Choi, H.: Predictors of electronic word-of-mouth behavior on social networking sites in the United States and Korea: cultural and social relationship variables. Comput. Human Behav. 94, 9–18 (2019) 38. Rosillo-Díaz, E., Blanco-Encomienda, F.J., Crespo-Almendros, E.: A cross-cultural analysis of perceived product quality, perceived risk and purchase intention in e-commerce platforms. J. Enterp. Inf. Manag. 33(1), 139–160 (2020) 39. Usunier, J.C., Roulin, N.: The influence of high-and low-context communication styles on the design, content, and language of business-to-business web sites. J. Bus. Commun. 47(2), 189–227 (2010) 40. Hofstede, G.: Culture’s Consequences: Comparing Values, Behaviors, Institutions and organizations across nations. Sage, Thousand Oaks (2001) 41. Hofstede, G.: Culture’s Consequences: International Differences in Work-Related Values. SAGE Publications, Beverly Hills (1984) 42. ur Rehman, A.: Consumers’ perceived value of luxury goods through the lens of Hofstede cultural dimensions: a cross-cultural study. J. Public Aff. e2660 (2021) 43. Alsaleh, D.A., Elliott, M.T., Fu, F.Q., Thakur, R.: Cross-cultural differences in the adoption of social media. J. Res. Interact. Mark. 13(1), 119–140 (2019) 44. Hofstede, G.: Dimensionalizing cultures: the Hofstede model in context. Online Read. Psychol. Cult. 2(1), 2307–2919 (2011) 45. Leidner, D.E., Kayworth, T.: Review: a review of culture in information systems research: toward a theory of information technology culture conflict. MIS Q. 30(2), 357–399 (2006) 46. Fischer, R.A.L., Walczuch, R., Guzman, E.: Does culture matter? Impact of individualism and uncertainty avoidance on app reviews. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Society, pp. 67–76 (2021) 47. Cheong, H.J., Mohammed-Baksh, S.: U.S. and Korean consumers: a cross-cultural examination of product information-seeking and-giving. J. Promot. Manag. 26(6), 893–910 (2020) 48. Mazaheri, E., Richard, M.O., Laroche, M.: Online consumer behavior: comparing Canadian and Chinese website visitors. J. Bus. Res. 64(9), 958–965 (2011) 49. Jahandideh, B., Golmohammadi, A., Meng, F., O’Gorman, K.D., Taheri, B.: Cross-cultural comparison of Chinese and Arab consumer complaint behavior in the hotel context. Int. J. Hosp. Manag. 41, 67–76 (2014) 50. Compare countries - Hofstede Insights. https://www.hofstede-insights.com/product/com pare-countries/. Accessed 06 Jan 2022 51. 
Allen, M.: The SAGE Encyclopedia of Communication Research Methods. SAGE Publications, Inc., Thousand Oaks (2017)


52. Pentina, I., Basmanova, O., Zhang, L., Ukis, Y.: Exploring the role of culture in eWOM adoption. MIS Rev. 20(2), 1–26 (2015) 53. Jia, H., Shin, S., Jiao, J.: Does the length of a review matter in perceived helpfulness? The moderating role of product experience. J. Res. Interact. Mark. 16(2), 221–236 (2021) 54. Wu, C., Mai, F., Li, X.: The effect of content depth and deviation on online review helpfulness: evidence from double-hurdle model. Inf. Manag. 58(2), 103408 (2021) 55. Wang, Y., Wang, J., Yao, T.: What makes a helpful online review? A meta-analysis of review characteristics. Electron. Commer. Res. 19(2), 257–284 (2018). https://doi.org/10.1007/s10 660-018-9310-2 56. Chua, A.Y.K., Banerjee, S.: Understanding review helpfulness as a function of reviewer reputation, review rating, and review depth. J. Am. Soc. Inf. Sci. Technol. 64, 1852–1863 (2015) 57. Forman, C., Ghose, A., Wiesenfeld, B.: Examining the relationship between reviews and sales: the role of reviewer identity disclosure in electronic markets. Inf. Syst. Res. 19(3), 291–313 (2008) 58. Wu, J.: Review popularity and review helpfulness: a model for user review effectiveness. Decis. Support Syst. 97, 92–103 (2017) 59. Bhattacherjee, A., Sanford, C.: Influence processes for information technology acceptance: an elaboration likelihood model. MIS Q. 30(4), 805–825 (2006) 60. McCloskey, D.W.: An examination of the data quality of online reviews: who do consumers trust? J. Electron. Commer. Organ. 19(1), 24–42 (2021) 61. Fresneda, J.E., Gefen, D.: A semantic measure of online review helpfulness and the importance of message entropy. Decis. Support Syst. 125, 113117 (2019) 62. Guo, B., Zhou, S.: What makes population perception of review helpfulness: an information processing perspective. Electron. Commer. Res. 17(4), 585–608 (2017). https://doi.org/ 10.1007/s10660-016-9234-7 63. Kaur, W., Balakrishnan, V., Rana, O., Sinniah, A.: Liking, sharing, commenting and reacting on Facebook: user behaviors’ impact on sentiment intensity. Telemat. Inform. 39, 25–36 (2019) 64. Fink, L., Rosenfeld, L., Ravid, G.: Longer online reviews are not necessarily better. Int. J. Inf. Manag. 39, 30–37 (2018) 65. Zhang, J.Q., Craciun, G., Shin, D.: When does electronic word-of-mouth matter? A study of consumer product reviews. J. Bus. Res. 63(12), 1336–1341 (2010) 66. Fang, H., Zhang, J., Bao, Y., Zhu, Q.: Towards effective online review systems in the Chinese context: a cross-cultural empirical study. Electron. Commer. Res. Appl. 12(3), 208–220 (2013) 67. Abbasi, A., Chen, H.: Applying authorship analysis to extremist-group Web forum messages. In: International Conference on Intelligence and Security Informatics, pp. 183–197 (2005) 68. Al-Kabi, M.N., Abdulla, N.A., Al-Ayyoub, M.: An analytical study of Arabic sentiments: Maktoob case study. In: 2013 8th International Conference for Internet Technology and Secured Transactions (ICITST 2013), pp. 89–94 (2013) 69. Zhang, C., Tong, T., Bu, Y.: Examining differences among book reviews from various online platforms. Online Inf. Rev. ahead-of-p (ahead-of-print), 1169–1187 (2019) 70. Wang, C.-C., Wang, Y.-T.: Persuasion effect of e-WOM: the impact of involvement and ambiguity tolerance. J. Glob. Acad. Mark. Sci. 20(4), 281–293 (2010) 71. Liu, Y.: Word of mouth for movies: its dynamics and impact on box office revenue. J. Mark. 70(3), 74–89 (2006) 72. Ghasemaghaei, M., Eslami, S.P., Deal, K., Hassanein, K.: Reviews’ length and sentiment as correlates of online reviews’ ratings. Internet Res. 
28(3), 544–563 (2018) 73. Quaschning, S., Pandelaere, M., Vermeir, I.: When consistency matters: the effect of valence consistency on review helpfulness. J. Comput. Commun. 20(2), 136–152 (2015)


74. Syafganti, I., Walrave, M.: Assessing the effects of valence and reviewers’ expertise on consumers’ intention to book and recommend a hotel. Int. J. Hosp. Tour. Adm. 1–21 (2021) 75. East, R., Uncles, M.D., Romaniuk, J., Lomax, W.: Measuring the impact of positive and negative word of mouth: a reappraisal. Australas. Mark. J. 24(1), 54–58 (2016) 76. Chevalier, J.A., Mayzlin, D.: The effect of word of mouth on sales: online book reviews. J. Mark. Res. 43(3), 345–354 (2006) 77. Ghose, A., Ipeirotis, P.G.: Estimating the helpfulness and economic impact of product reviews: mining text and reviewer characteristics. IEEE Trans. Knowl. Data Eng. 23(10), 1498–1512 (2010) 78. Barbro, P.A., Mudambi, S.M., Schuff, D.: Do country and culture influence online reviews? An analysis of a multinational retailer’s country-specific sites. J. Int. Consum. Mark. 32(1), 1–14 (2020) 79. Yang, J., Sarathy, R., Walsh, S.M.: Do review valence and review volume impact consumers’ purchase decisions as assumed? Nankai Bus. Rev. Int. 7(2), 231–257 (2016) 80. Fang, B., Ye, Q., Kucukusta, D., Law, R.: Analysis of the perceived value of online tourism reviews: influence of readability and reviewer characteristics. Tour. Manag. 52, 498–506 (2016) 81. Agnihotri, A., Bhattacharya, S.: Understanding the acceptance of mobile SMS advertising among young Chinese consumers. Psychol. Mark. 33(11), 1006–1017 (2016) 82. Wurtz, E.: Intercultural communication on web sites: a cross-cultural analysis of web sites from high-context cultures and low-context cultures. J. Comput. Commun. 11(1), 274–299 (2005) 83. Sharma, H., Aggarwal, A.G.: The influence of user generated content on hotel sales: an Indian perspective. J. Model. Manag. 16(4), 1358–1375 (2021) 84. Chen, M.Y., Teng, C.I., Chiou, K.W.: The helpfulness of online reviews: images in review content and the facial expressions of reviewers’ avatars. Online Inf. Rev. 44(1), 90–113 (2020) 85. Zinko, R., Burgh-Woodman, H.D., Furner, Z.Z., Kim, S.J.: Seeing is believing: the effects of images on trust and purchase intent in eWoM for hedonic and utilitarian products. J. Organ. End User Comput. 33(2), 85–104 (2021) 86. Zinko, R., Stolk, P., Furner, Z., Almond, B.: A picture is worth a thousand words: how images influence information quality and information load in online reviews. Electron. Mark. 30(4), 775–789 (2019). https://doi.org/10.1007/s12525-019-00345-y 87. Casaló, L.V., Flavián, C., Guinalíu, M., Ekinci, Y.: Avoiding the dark side of positive online consumer reviews: enhancing reviews’ usefulness for high risk-averse travelers. J. Bus. Res. 68(9), 1829–1835 (2015) 88. Zhao, Z., Wang, J., Sun, H., Liu, Y., Fan, Z., Xuan, F.: What factors influence online product sales? Online reviews, review system curation, online promotional marketing and seller guarantees analysis. IEEE Access. 8, 3920–3931 (2020) 89. Cheng, Y.H., Ho, H.Y.: Social influence’s impact on reader perceptions of online reviews. J. Bus. Res. 68(4), 883–887 (2015) 90. Jabr, W.: Review credibility as a safeguard against fakery: the case of Amazon. Eur. J. Inf. Syst. 31(4), 1–21 (2021) 91. Ren, J., Nickerson, J.V.: Arousal, valence, and volume: how the influence of online review characteristics differs with respect to utilitarian and hedonic products. Eur. J. Inf. Syst. 28(3), 272–290 (2019) 92. Duan, W., Gu, B., Whinston, A.B.: Do online reviews matter? - An empirical investigation of panel data. Decis. Support Syst. 45(4), 1007–1016 (2008) 93. 
Watson, J., Ghosh, A.P., Trusov, M.: Swayed by the numbers: the consequences of displaying product review attributes. J. Mark. 82(6), 109–131 (2018)


94. Park, D.H., Lee, J., Han, I.: The effect of on-line consumer reviews on consumer purchasing intention: the moderating role of involvement. Int. J. Electron. Commer. 11(4), 125–148 (2007) 95. Gavilan, D., Avello, M., Martinez-Navarro, G.: The influence of online ratings and reviews on hotel booking consideration. Tour. Manag. 66, 53–61 (2018) 96. Chen, P.-Y., Wu, S., Yoon, J.: The impact of online consumer reviews on sales. In: International Conference on Information Systems, pp. 711–724 (2004) 97. Dou, X., Walden, J.A., Lee, S., Lee, J.Y.: Does source matter? Examining source effects in online product reviews. Comput. Human Behav. 28(5), 1555–1563 (2012) 98. Kusumasondjaja, S., Shanka, T., Marchegiani, C.: Credibility of online reviews and initial trust: the roles of reviewer’s identity and review valence. J. Vacat. Mark. 18(3), 185–195 (2012) 99. Zhao, K., Yang, X., Tao, X., Xu, X., Zhao, J.: Exploring the differential effects of online reviews on film’s box-office success: source identity and brand equity from an integrated perspective. Front. Psychol. 11, 1–19 (2020) 100. Wu, X., Jin, L., Xu, Q.: Expertise makes perfect: how the variance of a reviewer’s historical ratings influences the persuasiveness of online reviews. J. Retail. 97(2), 238–250 (2021) 101. Hu, N., Liu, L., Zhang, J.J.: Do online reviews affect product sales? The role of reviewer characteristics and temporal effects. Inf. Technol. Manag. 9(3), 201–214 (2008) 102. Li, S.T., Pham, T.T., Chuang, H.C.: Do reviewers’ words affect predicting their helpfulness ratings? Locating helpful reviewers by linguistics styles. Inf. Manag. 56(1), 28–38 (2019) 103. Lee, S., Choeh, J.Y.: Using the social influence of electronic word-of-mouth for predicting product sales: the moderating effect of review or reviewer helpfulness and product type. Sustainability 12(19), 7952 (2020) 104. Lee, I.: Usefulness, funniness, and coolness votes of viewers: an analysis of social shoppers’ online reviews. Ind. Manag. Data Syst. 118(4), 700–713 (2018) 105. Alsaleh, D.: Understanding the role of blogger recommendations on consumer purchasing behavior. J. Bus. Inq. 17(1), 23–40 (2017) 106. Baek, H., Ahn, J., Choi, Y.: Helpfulness of online consumer reviews: readers’ objectives and review cues. Int. J. Electron. Commer. 17(2), 99–126 (2012) 107. Koo, D.M.: Impact of tie strength and experience on the effectiveness of online service recommendations. Electron. Commer. Res. Appl. 15, 38–51 (2016) 108. Choi, H.S., Leon, S.: An empirical investigation of online review helpfulness: a big data perspective. Decis. Support Syst. 139, 113403 (2020) 109. Liu, Z., Park, S.: What makes a useful online review? Implication for travel product websites. Tour. Manag. 47, 140–151 (2015) 110. Racherla, P., Friske, W.: Perceived “usefulness” of online consumer reviews: an exploratory investigation across three services categories. Electron. Commer. Res. Appl. 11(6), 548–559 (2012) 111. Almana, A.M., Mirza, A.A.: The impact of electronic word of mouth on consumers’ purchasing decisions. Int. J. Comput. Appl. 82(9), 183–193 (2013)

Trustworthy Artificial Intelligence for Cyber Threat Analysis

Shuangbao Paul Wang1 and Paul A. Mullin2

1 Department of Computer Science, Morgan State University, Baltimore, MD, USA
[email protected]
2 Softrams, LLC, Leesburg, VA, USA

Abstract. Artificial Intelligence brings innovations into society. However, bias and unethical behavior exist in many algorithms, making the applications less trustworthy. Threat-hunting algorithms based on machine learning have shown a great advantage over classical methods. Reinforcement learning models are getting more accurate at identifying not only signature-based but also behavior-based threats. Quantum mechanics brings a new dimension to improving classification speed, with an exponential advantage. In this research, we developed a machine learning-based cyber threat detection and assessment tool. It uses a two-stage (unsupervised and supervised learning) analysis method on 822,226 log records from a web server on the AWS cloud. The results show the algorithm has the ability to identify threats with high confidence.

Keywords: Trustworthy AI · Log analysis · User behavior prediction

1 Overview

The advancement of Artificial Intelligence (AI) has accelerated the adoption and integration of innovations into many frontiers, including automobiles, banking, and insurance. Machine learning can reveal many things that human beings can hardly find out on their own. On the theoretical side, Machine Learning (ML) and deep learning algorithms are able not only to analyze data efficiently but also to accumulate the knowledge gained from previous learning. ML models improve over time as new data are fed into the neural networks under reinforcement learning. On the other hand, the accuracy, or even the correctness, of AI/ML algorithms can be affected by many factors, ranging from the algorithm and the data to prejudicial or even intentional influences. As a result, AI/ML applications need to be not only accurate, robust, reliable, transparent, non-biased, and accountable but also deemed trustworthy and able to adapt, recover, and reconfigure in response to challenges. To achieve this goal, foundational study of trust and trustworthiness is vital. Use-inspired research can bring new discoveries into commercialization to benefit society.


Cyber threat analysis involves billions of data records, either live or recorded, from servers, firewalls, intrusion detection and prevention systems, and network devices [25,27]. Signature-based analysis methods are effective only if the attack vectors are pre-defined and already stored in the knowledge database. Hence, if threat actors change their behavior, it would be hard to capture. To identify abnormal behaviors, threat analytics systems need not only to examine irregular patterns at a certain time but also to observe behavior changes compared with the normal states. ML algorithms can be very helpful in identifying cyber threats with reinforcement learning models. They can be classified as:

– Unsupervised Learning. A clustering model attempts to find groups, similarities, and relationships within unlabelled data. Figure 1 illustrates an unsupervised learning model.

Fig. 1. Unsupervised Learning

– Supervised Learning. A classification model learns how input variables contribute to the categorization of data points. Figure 2 depicts a supervised learning model.

Fig. 2. Supervised Learning

– Semi-Supervised Learning. A classification model that falls between supervised learning and unsupervised learning, combining a small amount of labeled data with a large amount of unlabeled data during training.


– Reinforcement Learning. Reinforcement learning is characterized by a continuous loop in which an agent interacts with an environment and measures the consequences of its actions. Figure 3 shows a reinforcement learning model.

Fig. 3. Reinforcement Learning (image source for Figs. 1, 2 and 3: [3])

– Transfer Learning. Transfer learning stores knowledge gained while solving one problem and applies it to a different but related problem.

Public sector organizations operate external-facing applications with broad multi-task user interfaces. Significant investments have been made in human-centered design, but new opportunities exist to enhance personalization of the interfaces through AI/ML to reduce the burden on public users in navigation and task performance. The problem involves leveraging information from user roles, prior behavior, schedules of required activities, and other characteristics to predict the intended reasons why users are entering a system at any given time, and where they plan to navigate and work. This information would be used to facilitate the users in those navigations. Tracking and log data from servers on AWS or from Google Analytics can be used for exploration in the development of a methodology. More complex machine learning navigation analytics can be explored using behavior analysis methods and visualized through business intelligence dashboards.

1.1 Literature Review

Artificial intelligence revolutionizes industry and everyday life in many respects. The Deep Blue computer defeated the greatest human chess player in the world, and autonomous vehicles such as Tesla can drive on the road without human interaction. Machine learning can reveal many things that human beings can hardly find out. By analyzing music using IBM Watson AI, people can learn the mode of songs; it was thus discovered that the majority of songs from the 1960s to now are in the mode of “sadness”.

In cybersecurity, AI/ML is used to deeply inspect packets, analyze network activities, and discover abnormal behaviors. Sagar et al. conducted a survey of cybersecurity using artificial intelligence [7]. It discusses the need for applying neural networks and machine learning algorithms in cybersecurity. Mittu et al. proposed a way to use machine learning to detect advanced persistent threats (APT) [17]. The approach can address APTs that can cause damage to information systems and cloud computing platforms. Mohana et al. proposed a methodology that uses genetic algorithms and neural networks to better safeguard data [4]. A key produced by a neural network is said to be stronger for encryption and decryption. With a grant from the National Science Foundation (NSF), Wang and Kelly developed a video data analytics tool that can penetrate videos to “understand” the context of the video and the language spoken [26]. Kumbar proposed a fuzzy system for pattern recognition and data mining [14]. It is effective in fighting phishing attacks by identifying malware. Using Natural Language Processing (NLP), Wang developed an approach that can identify issues with cybersecurity policies in financial transaction processing so that banking companies can comply with PCI/DSS industry standards. Harini used intelligent agents to reduce or prevent distributed denial of service (DDoS) attacks [21]. An expert system is used to identify malicious code and prevent it from being installed on target systems. With a grant from the National Security Agency (NSA), Wang and his team developed an intelligent system for cybersecurity curriculum development. The system is able to develop training and curricula following the National Initiative for Cybersecurity Education (NICE) framework. Dilmaghani et al. provide an overview of the existing threats that violate security and privacy within AI/ML algorithms [8].

Gupta et al. studied quantum machine learning, which uses quantum computation in artificial intelligence and deep neural networks. They proposed a quantum neuron layer aiming to speed up the classification process [13]. Mohanty et al. surveyed quantum machine learning algorithms and quantum AI applications [18]. Edwards and Rawat conducted a survey on quantum adversarial machine learning, in which adding a small amount of noise leads a classifier to predict a different result [10]. Through depolarization, noise reduction, and adversarial training, a system can reduce the risk of adversarial attacks. Ding et al. proposed a quantum support vector machine (SVM) algorithm that can achieve exponential speedup for least squares SVM (LS-SVM) [9]. The experiments show it has advantages in solving problems such as face recognition and signal processing. Ablayev et al. provided a review of quantum methods for machine learning problems [2]. On quantum tools, it lists the fundamentals of qubits, quantum registers, and quantum states, together with tools for quantum search algorithms. On quantum classification algorithms, it introduces several classification problems that can be accelerated using quantum algorithms. Wang and Sakk conducted a survey of overviews, foundations, and speedups in quantum algorithms [28]. It provides a detailed discussion of period finding, the key to the quantum advantage in factoring large numbers.

1.2 Bias in Existing AI/ML Algorithms

People hope for neutrality in AI/ML algorithms. Unfortunately, bias does exist. The Washington Post published an article stating that “credit scores are supposed to be race-neutral. That’s impossible”. Forbes published a report that “self-driving cars are more likely to recognize white pedestrians than black pedestrians, resulting in decreased safety for darker-skinned individuals”. A Nature paper reports that “a major healthcare company used an algorithm that deemed black patients less worthy of critical healthcare than others with similar medical conditions”. The Associated Press published an article stating that “financial technology companies have been shown to discriminate against black and latinx households via higher mortgage interest rates”. In general, bias can be classified into the following categories:

– Algorithm Bias. Bias as a result of inaccurate algorithms being used.
– Data Bias. Bias due to incorrectly sampled training data that do not reflect the whole data set.
– Prejudicial Bias. Feeding the model with prejudicial knowledge, for example, “nurses are female”.
– Measurement Bias. Bias as a result of incorrect measurement.
– Intentional Bias. People embed unjust or discriminatory rules in the AI/ML models.

Trustworthy AI/ML aims to discover such bias and build robust AI/ML algorithms that are trustworthy. For example, a Tesla on autopilot could crash into a fire truck, which is hardly possible even for the worst human drivers. This shows that the nine “eyes” (radar system) under artificial intelligence are still inferior to the two eyes of human intelligence. To be trustworthy, Tesla cars need to be trained with reliable data that can “filter” the “noise” caused by flashing emergency lights, which change the images of a fire truck.

Effort has been put into countering AI bias. Obaidat et al. use random sampling on images with a convolutional neural network (CNN) [20]. They tested on the Fashion-MNIST dataset, which consists of 70,000 images, 60,000 for training and 10,000 for testing, with an accuracy of 87.8%.


Fig. 4. Two-tier analyzing method for rating bias in AI/ML algorithms

Bernagozzi et al. at IBM conducted a survey that reveals the presence of bias in chatbots and online language translators, using a two-tier method to rate bias [5]. The two-tier method is illustrated in Fig. 4. Lohia et al. at IBM proposed a framework that can detect bias to improve fairness [15]. The algorithm detects individual bias and then applies post-processing for bias mitigation.

2 Using Adversarial ML Model to Discover Bias

Adversarial ML models are commonly used to attack ML algorithms in a way that makes the models function abnormally. McAfee once attacked Tesla’s system by adding a strip to a speed limit sign, fooling the car into driving 50 mph over the speed limit. “Stealth wear” technology with adversarial patterns on glasses and clothes can fool facial recognition systems. Adversarial attacks can be classified into three categories:

– Evasion. An attack that uses steganography and other technologies to obfuscate the textual content.
– Poisoning. An attack that contaminates the training data.
– Model stealing. An attack that treats the target as a black box and tries to extract data from the model.

By introducing quantum neuron layers, there is a potential to speed up the classification process with an acceptable error rate.

2.1 Notation of Bias and Mitigation

Consider a supervised classification problem with features $X \in \mathcal{X}$, categorical protected attributes $D \in \mathcal{D}$, and categorical labels $Y \in \mathcal{Y}$. We are given a set of training samples $\{(x_1, d_1, y_1), \ldots, (x_n, d_n, y_n)\}$ and would like to learn a classifier $\hat{y} : \mathcal{X} \times \mathcal{D} \rightarrow \mathcal{Y}$. For ease of exposition, we will only consider a scalar binary protected attribute, i.e. $\mathcal{D} = \{0, 1\}$. The value $d = 1$ is set to correspond to the privileged group (e.g. whites in the United States in a criminal justice application) and $d = 0$ to the unprivileged group (e.g. blacks). The value $y = 1$ is set to correspond to a favorable outcome. Depending on the context, we may also deal with probabilistic binary classifiers with continuous output scores $\hat{y}_S \in [0, 1]$ that are thresholded to $\{0, 1\}$.

One definition of individual bias is as follows. Sample $i$ has individual bias if $\hat{y}(x_i, d = 0) \neq \hat{y}(x_i, d = 1)$. Let $b_i = I[\hat{y}(x_i, d = 0) \neq \hat{y}(x_i, d = 1)]$, where $I[\cdot]$ is an indicator function. The individual bias score, $b_{S,i} = \hat{y}_S(x_i, d = 1) - \hat{y}_S(x_i, d = 0)$, is a soft version of $b_i$. To compute an individual bias summary statistic, we take the average of $b_i$ across test samples.

One notion of group fairness known as disparate impact is defined as follows [15]. There is disparate impact if the ratio of expected values

$$\frac{E[\hat{y}(X, D) \mid D = 0]}{E[\hat{y}(X, D) \mid D = 1]} \tag{1}$$

is less than $1 - \epsilon$ or greater than $(1 - \epsilon)^{-1}$, where a common value of $\epsilon$ is 0.2.

The test procedure is usually divided into two steps: 1) determine whether there are any instances of individual bias, and 2) evaluate the detected bias against all samples. Mitigation can be performed by changing the label outputs of the classifier $\hat{y}_i$ to other labels $\breve{y} \in \mathcal{Y}$. Zhang et al. use federated learning (FL) in a privacy-aware distributed machine learning application [29]. Experiments show it can reduce the discrimination index of all demographic groups by 13.2% to 69.4% on the COMPAS dataset.
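To make the notation concrete, the following minimal Python sketch computes the individual bias indicators and the disparate impact ratio for a toy classifier. The scoring rule, the data, and the function names are hypothetical placeholders for illustration, not the model or data used in this paper.

```python
import numpy as np

def individual_bias(y_hat, X):
    """b_i = I[y_hat(x_i, d=0) != y_hat(x_i, d=1)] for each sample."""
    return (y_hat(X, np.zeros(len(X))) != y_hat(X, np.ones(len(X)))).astype(int)

def disparate_impact(y_pred, d, eps=0.2):
    """Ratio E[y_hat | D=0] / E[y_hat | D=1]; flags disparate impact per Eq. (1)."""
    ratio = y_pred[d == 0].mean() / y_pred[d == 1].mean()
    return ratio, (ratio < 1 - eps) or (ratio > 1 / (1 - eps))

# Toy score-based classifier: threshold a score that (unfairly) depends on d.
def y_hat(X, d):
    scores = X[:, 0] + 0.3 * d                 # hypothetical scoring rule
    return (scores > 0.5).astype(int)

rng = np.random.default_rng(0)
X = rng.random((1000, 1))
d = rng.integers(0, 2, size=1000)              # protected attribute (0 = unprivileged, 1 = privileged)

b = individual_bias(y_hat, X)
ratio, flagged = disparate_impact(y_hat(X, d), d)
print(f"mean individual bias: {b.mean():.3f}")
print(f"disparate impact ratio: {ratio:.3f}, flagged: {flagged}")
```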

3 AI Bias and ML-Based Threat Analytics

Artificial intelligence uses data to train models and uses an inference engine to draw a conclusion or predict the outcome. The overall architecture is shown in Fig. 5. Su et al. utilize the open-source testing and analysis tools Hping3 and Scapy to simulate DDoS floods and import the data collected from those simulated attacks into Splunk to identify possible attacks [22]. The Splunk machine learning toolkit extends the capabilities of Splunk. Ngoc et al. proposed an early-warning approach to counter APTs; it analyzes the APT target using log analysis techniques [19]. Counterfit is an AI security risk assessment package developed at Microsoft. The open-source tool helps companies conduct AI risk assessments to ensure that AI/ML algorithms are non-biased, reliable, and trustworthy [16].

4 Programming and Experiments

Detecting web attacks using machine learning is an area that has drawn attention and requires continuous research and development. This project analyzes 822,226 log records from a healthcare IT company’s web login page over a 5-hour time span. After cleaning and pre-processing the data, the algorithm detected records that could potentially be attacks and then calculated the likelihood of an attack based on their abnormal behaviors.

Fig. 5. Machine learning architecture.

The main strategy is to use unsupervised learning to better understand the distribution of the input data. Supervised learning is then applied for further classification and for generating predictions; as a result, the model learns how to predict/classify outputs from new inputs. Reinforcement learning learns from experience over time, so the algorithm can be improved as more data are fed into the system.

The application first loads the input data into a Pandas dataframe and removes features that are not of interest for detecting attacks. Next, the data are “compressed” from more than 800,000 records to around 40,000 by combining records that have the same source and destination IP addresses within the same unit time period. The higher the compression rate, the more duplication there was in the dataset; this step improved the efficiency of the machine learning process.

Unsupervised machine learning is applied to the dataset using K-means clustering, and the three output clusters are labeled as not-suspicious, suspicious, and transitional, as shown in Fig. 6. The pre-processed data are then split 0.66/0.33 for training/testing to further analyze the likelihood of each response’s abnormal behavior. Using the results (three clusters) from unsupervised learning as a supervisor, the algorithm then applies supervised machine learning to discover the threats.
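The preprocessing and unsupervised stage described above could be sketched roughly as follows. The file name, the column names (src_ip, dst_ip, timestamp), and the choice of numeric features are hypothetical placeholders, since the paper does not list the actual log schema.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load raw web-server logs (hypothetical file name and schema).
logs = pd.read_csv("web_logs.csv", parse_dates=["timestamp"])
logs = logs.drop(columns=["user_agent", "referrer"], errors="ignore")  # drop features not of interest

# "Compress" the data: combine records with the same source/destination IP in the same second,
# keeping a count of how many raw responses each combined row represents.
logs["second"] = logs["timestamp"].dt.floor("s")
compressed = (
    logs.groupby(["src_ip", "dst_ip", "second"])
        .size()
        .reset_index(name="responses_per_second")
)

# Unsupervised stage: K-means with three clusters over scaled numeric features.
features = StandardScaler().fit_transform(compressed[["responses_per_second"]])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(features)
compressed["cluster"] = kmeans.labels_

# Name clusters by their mean request rate: lowest = not-suspicious, highest = suspicious,
# and the one in between = transitional.
order = compressed.groupby("cluster")["responses_per_second"].mean().sort_values().index
names = dict(zip(order, ["not-suspicious", "transitional", "suspicious"]))
compressed["label"] = compressed["cluster"].map(names)
print(compressed["label"].value_counts())
```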


Fig. 6. Classification using unsupervised learning

In addition to the areas that are considered “confident” or “not confident”, the transitional (yellow) area is further analyzed using K-means clustering to separate it into two clusters, labeled “more suspicious” and “less suspicious”, as shown in Fig. 7. The “more suspicious” records are then added to the suspicious-activity dataset. Doing so ensures that the machine does not miss any responses that were filtered out during the analysis but are still suspected of abnormal behavior. The likelihood of suspicion is calculated based on the percentage of “attacks” over the maximum number of responses per second. An attack on a general log-in page is defined as a considerable number of visits, responses, or callbacks in a short period of time; thus, pre-processing the data by combining duplicate responses per second helps determine which numbers of responses or visits stand out. The application can be improved by feeding in more and richer data so that risks associated with human behaviors can be identified. The 3–2 two-tier classification technique helps narrow down the suspicious activities. If K-means clustering were applied only once with two clusters, the uncertain groups in the dataset could well be misclassified; therefore, creating a transition area between the two certainties helps detect potential attacks that could otherwise be missed. The results are saved into result.csv and all detected attacks are saved in the suspicious activity.csv file. A list of references for this Python application can be found in [1,6,11,12,23,24]. The team is conducting quantum-inspired machine learning research to identify cyber threats in data and in AI/ML tools. It is expected that the new findings will further improve the speed and accuracy of identifying cyber threats.
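As a rough illustration of the supervised stage and the 3–2 two-tier refinement, the sketch below continues from the compressed DataFrame in the previous sketch. The paper does not name the supervised classifier, so a random forest is assumed here purely for illustration, and the likelihood score and output file names are simplified approximations of the procedure described above.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Supervised stage: use the unsupervised cluster labels as the supervisor (0.66/0.33 split).
X = compressed[["responses_per_second"]]
y = compressed["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)  # assumed classifier
print("held-out accuracy:", clf.score(X_test, y_test))

# Two-tier refinement: re-cluster only the transitional area into two sub-clusters.
transition = compressed[compressed["label"] == "transitional"].copy()
sub = KMeans(n_clusters=2, n_init=10, random_state=42).fit(transition[["responses_per_second"]])
transition["sub_label"] = sub.labels_

# Treat the sub-cluster with the higher mean request rate as "more suspicious".
more = transition.groupby("sub_label")["responses_per_second"].mean().idxmax()
suspicious = pd.concat([
    compressed[compressed["label"] == "suspicious"],
    transition[transition["sub_label"] == more],
])

# A simple likelihood score: each row's response rate relative to the observed maximum.
suspicious["likelihood"] = (
    suspicious["responses_per_second"] / compressed["responses_per_second"].max()
)
compressed.to_csv("result.csv", index=False)              # file name per the paper
suspicious.to_csv("suspicious_activity.csv", index=False)  # suspicious-activity file (name approximated)
```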


Fig. 7. Classification using supervised learning

References

1. Abdulraheem, M., Ibraheem, N.: A detailed analysis of new intrusion detection dataset. In: Semantic Scholar (2019)
2. Ablayev, F., Ablayev, M., Huang, J.Z., Khadiev, K., Salikhova, N., Dingming, W.: On quantum methods for machine learning problems part I: quantum tools. Big Data Mining and Analytics 3(1), 41–55 (2020)
3. Amazon: AWS, 3 (2022)
4. Mohana, K., Venugopal, V.K., Sathwik, H.N.: Data security using genetic algorithm and artificial neural network. Int. J. Sci. Eng. Res. 5, 543–548 (2014)
5. Bernagozzi, M., Srivastava, B., Rossi, F., Usmani, S.: Gender bias in online language translators: visualization, human perception, and bias/accuracy trade-offs. IEEE Internet Comput. 25, 53–63 (2021)
6. Betarte, G., Pardo, A., Martínez, R.: Web application attacks detection using machine learning techniques. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1065–1072 (2018)
7. Sagar, B.S., Niranjan, S., Kashyap, N., Sachin, D.N.: Providing cyber security using artificial intelligence - a survey. In: 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), pp. 717–720 (2019)
8. Dilmaghani, S., Brust, M.R., Danoy, G., Cassagnes, N., Pecero, J., Bouvry, P.: Privacy and security of big data in AI systems: a research and standards perspective. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 5737–5743 (2019)
9. Ding, C., Bao, T.-Y., Huang, H.-L.: Quantum-inspired support vector machine. IEEE Trans. Neural Networks Learn. Syst. 1–13 (2021)
10. Edwards, D., Rawat, D.B.: Quantum adversarial machine learning: status, challenges and perspectives. In: 2020 Second IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), pp. 128–133 (2020)
11. Geron, A.: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd edn. O’Reilly (2019)


12. Gong, X., et al.: Estimating web attack detection via model uncertainty from inaccurate annotation. In: 2019 6th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2019 5th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), pp. 53–58 (2019)
13. Gupta, S., Mohanta, S., Chakraborty, M., Ghosh, S.: Quantum machine learning - using quantum computation in artificial intelligence and deep neural networks: quantum computation and machine learning in artificial intelligence. In: 2017 8th Annual Industrial Automation and Electromechanical Engineering Conference (IEMECON), pp. 268–274 (2017)
14. Kumbar, S.R.: An overview on use of artificial intelligence techniques in effective security management. Int. J. Innov. Res. Comput. Commun. Eng. 2, 5893–5898 (2014)
15. Lohia, P.K., Natesan Ramamurthy, K., Bhide, M., Saha, D., Varshney, K.R., Puri, R.: Bias mitigation post-processing for individual and group fairness. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2847–2851 (2019)
16. Microsoft: Azure counterfeit, 3 (2022)
17. Mittu, R., Lawless, W.F.: Human factors in cybersecurity and the role for AI. In: 2015 AAAI Spring Symposium, pp. 39–43 (2015)
18. Mohanty, J.P., Swain, A., Mahapatra, K.: Headway in quantum domain for machine learning towards improved artificial intelligence. In: 2019 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS), pp. 145–149 (2019)
19. Ngoc, H.L., Cong Hung, T., Huy, N.D., Thanh Hang, N.T.: Early phase warning solution about system security based on log analysis. In: 2019 6th NAFOSTED Conference on Information and Computer Science (NICS), pp. 398–403 (2019)
20. Obaidat, M., Singh, N., Vergara, G.: Artificial intelligence bias minimization via random sampling technique of adversary data. In: 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), pp. 1226–1230 (2021)
21. Rajan, H.M.: Artificial intelligence in cyber security - an investigation. Int. Res. J. Comput. Sci. 09, 28–30 (2017)
22. Su, T.-J., Wang, S.-M., Chen, Y.-F., Liu, C.-L.: Attack detection of distributed denial of service based on Splunk. In: 2016 International Conference on Advanced Materials for Science and Engineering (ICAMSE), pp. 397–400 (2016)
23. Tian, Z., Luo, C., Qiu, J., Xiaojiang, D., Guizani, M.: A distributed deep learning system for web attack detection on edge devices. IEEE Trans. Industr. Inf. 16(3), 1963–1971 (2020)
24. Torrano-Gimenez, C., Nguyen, H.T., Alvarez, G., Franke, K.: Combining expert knowledge with automatic feature extraction for reliable web attack detection. Secur. Commun. Networks 8(16), 2750–2767 (2015)
25. Wang, P., Kelly, W.: A novel threat analysis and risk mitigation approach to prevent cyber intrusions. Colloquium Inf. Syst. Secur. Educ. (CISSE) 3, 157–174 (2015)
26. Wang, S., Kelly, W.: inVideo - a novel big data analytics tool for video data analytics. In: IEEE/NIST IT Pro Conference, pp. 1–19 (2014)
27. Wang, S.P.: Computer Architecture and Organization - Fundamentals and Architecture Security, 1st edn. Springer, Singapore (2021)


28. Wang, S.P., Sakk, E.: Quantum algorithms: overviews, foundations, and speedups. In: 2021 IEEE 5th International Conference on Cryptography, Security and Privacy (CSP), pp. 17–21 (2021)
29. Zhang, D.Y., Kou, Z., Wang, D.: FairFL: a fair federated learning approach to reducing demographic bias in privacy-sensitive classification models. In: 2020 IEEE International Conference on Big Data (Big Data), pp. 1051–1060 (2020)

Entropy of Shannon from Geometrical Modeling of Covid-19 Infections Data: The Cases of USA and India

Huber Nieto-Chaupis
Universidad Autónoma del Perú, Panamericana Sur Km 16.3, Villa el Salvador, Lima, Peru
[email protected]

Abstract. From the end of 2020 through the first half of 2021, Covid-19 had a strong impact on the United States and India, as seen in the official statistics, which exhibit large numbers of new infections as well as fatalities; India in particular had sharp peaks in March 2021. The present paper addresses the question of whether there is an entropic nature in these cases, starting from an intuitive model based on simple geometries that fit the histograms of new infections versus time well. Although geometry-based models might not be fully satisfactory, they provide a view that can help answer intrinsic questions about the highest peaks of the pandemic, namely whether they have a natural cause or are strongly related to disorder as dictated, for instance, by Shannon’s entropy.

Keywords: Covid-19 · Shannon’s entropy · Geometry modeling

1 Introduction

The unexpected arrival of the Corona Virus Disease (Covid-19 for short) [1–3] in various countries overwhelmed public health services within a few weeks, as seen in the first cases in Spain and Italy [4]. Because of this, most countries had to take firm decisions in order to minimize the peaks of infections [5]. Human-to-human interaction was therefore restricted so that the infection chain would be broken, with the expectation of drastically reducing the rate of infection, as seen in the countries where curfews and quarantines were imposed [6,7]. This is because it was confirmed that human-to-human contact without a mask is the primary cause of infection, owing to the exchange of aerosols [8]. Despite the experience gained from the first wave in 2020, most countries have exhibited high peaks of new infections due to the mutation of the coronavirus into subsequent variants, which generated several successive waves of the pandemic across the globe. In fact, the so-called Delta variant severely strained health services at the beginning of 2021, as seen in the USA and India.


The official statistics display well-defined peaked distributions, which to some extent can be seen as advantageous, because the exceptional morphologies of the curves of new infections can be understood as an exact geometry on which to carry out an analysis, both to find predictive elements and to compare the experienced waves. As done in [9], trapezoid and rectangle shapes have been identified in the first wave of the pandemic. These mathematical methodologies are improved here and applied to the data of the USA and India, the countries that have shown the sharpest peaks of the pandemic as well as an unprecedented velocity of infection. The second section of the paper describes the trapezoid model, the third section presents a comparison between the proposed theory and data from the USA and India, and the fourth section presents the discussion and the conclusions of the paper.

Fig. 1. Motivation of the paper: a trapezoid shape can be associated with the data of the USA and India by joining the peaks of their most representative waves. This creates a well-defined trapezoid whose geometry is exploited in this paper in conjunction with the entropy of Shannon.

2 Motivation of Geometry-Based Models

In Fig. 1, the official data for the USA (top) and India (bottom) are displayed. The reader can see an exceptional coincidence with a trapezoid shape once the tops of the peaks of two successive waves are joined. Of course, this comes from a simple inspection of the data, and any other geometry could also be associated with it. From this, one can formulate the following questions:

– Is it possible to anticipate waves in a global pandemic through a geometric model?


– Is there a concrete geometry that characterizes a virus?
– What are the predictive capabilities of a geometry-based model for estimating future waves?

Motivated by Fig. 1, the construction of a trapezoid starts from the tops of the waves, which can be related to a general time-dependent Gaussian function G(t − t_{P_q}). With this, the number of new infections N for Q waves can be written as:

N(t) = \sum_{q}^{Q} N_q \, G(t - t_{P_q}),    (1)

with t_{P_q} the time at which the peak was registered and N_q the number of infections corresponding to wave q. This is based on experience with the ongoing pandemic, which clearly exhibits peaks [10]. However, these peaks or maximum values of new infections might not be easily observed and may also contain noisy data. The net number of new infections n(u) can then be written as:

n(u) = \int r(t, u) \, N(t) \, dt    (2)

with r(t, u) the noise attached to the signal. This integration can be seen as a product of the noise and the signal. In the most general view, the noise can also be perceived as a factor that decreases or increases the signal. Thus, in its simplest meaning, it is a product given by:

n = \eta \, r \times N,    (3)

with \eta a free parameter that in practice is tied to the properties of convex geometries. In this manner, Eq. 3 can be composed of the relevant characteristics of the pandemic embedded into a concrete geometry. From Eq. 1 one then gets:

N(t) = N_q \, G(t - t_{P_Q}) = \eta \, r \times N.    (4)

For example, to give the reader an idea, the derivative with respect to \eta suggests the trivial area of a rectangle:

\frac{dN(t)}{d\eta} = r \times N.    (5)

Thus one can establish that the general area A follows the structure of Eq. 5:

A = r \times N.    (6)
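As a concrete illustration of Eqs. 1–2, the short sketch below evaluates a two-wave Gaussian superposition and the noise-weighted net count numerically. The peak days, peak heights, Gaussian width, and noise model are illustrative assumptions only; they are not values taken from the USA or India data.

```python
# Numerical sketch of Eqs. (1)-(2): new infections as a sum of Gaussian
# waves, then a noise-weighted net count. All numbers are illustrative.
import numpy as np

t = np.linspace(0.0, 400.0, 4001)              # days
peaks   = np.array([100.0, 250.0])             # t_{P_q}: assumed peak days
heights = np.array([2.0e5, 3.0e5])             # N_q: assumed peak heights
width   = 25.0                                 # assumed Gaussian width

def G(dt, sigma=width):
    """Unit-height Gaussian kernel G(t - t_Pq)."""
    return np.exp(-0.5 * (dt / sigma) ** 2)

# Eq. (1): N(t) = sum_q N_q G(t - t_Pq)
N = sum(Nq * G(t - tp) for Nq, tp in zip(heights, peaks))

# Eq. (2): n(u) = integral of r(t, u) N(t) dt, with a toy noise factor r.
rng = np.random.default_rng(0)
r = 1.0 + 0.05 * rng.standard_normal(t.size)   # r(t, u): noise around 1
n = float(np.sum(r * N) * (t[1] - t[0]))       # simple Riemann-sum integral
print("net number of infections n(u) ~", round(n))
```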


Fig. 2. The trapezoid model of the pandemic, with the tops of the peaks of two successive waves given by H and x, respectively. The time between these two waves is u. The variable h denotes the error in the measurement of the peak of wave (A).

3 Trapezoid Model of Pandemic

From Fig. 1, one can associate a trapezoid-like geometry in a straightforward manner, as shown in Fig. 2. In fact, one can estimate the area from the top of wave (A), namely H + h, joined to the top of wave (B), given by x. The area is trivial and reads:

A = \frac{1}{2} u (H + h + x).    (7)

Factorizing the quantity x, one gets:

A = \frac{1}{2} u x \left(1 + \frac{h + H}{x}\right).    (8)
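A quick numeric check confirms that Eq. 7 and its factorized form Eq. 8 coincide; the values of H, h, x, and u below are illustrative only, not data from the paper.

```python
# Numeric check that Eq. (7) and Eq. (8) give the same area.
H, h = 2.5e5, 1.0e4      # peak of wave (A) and its measurement error (illustrative)
x    = 4.0e5             # peak of wave (B) (illustrative)
u    = 150.0             # days between the two peaks (illustrative)

A_eq7 = 0.5 * u * (H + h + x)             # Eq. (7)
A_eq8 = 0.5 * u * x * (1 + (h + H) / x)   # Eq. (8), factorized form
assert abs(A_eq7 - A_eq8) < 1e-6 * A_eq7  # both forms agree
print(A_eq7, A_eq8)
```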

Commonly one might expect the peak of wave (B) to be greater than that of wave (A), as seen in the Covid-19 pandemic, where the second and third waves have exhibited a larger number of new infections than the previous ones; this is rather different from other statistics, as seen in Europe [11,12]. In this manner one can assume that:

k  k K K   1 h+H 1 h+H

All data points and the clusters are contained in V as vertices. The bipartite graph has n + c vertices, where n is the number of data points and c is the number of clusters. Equations 2 and 3 show the weights assigned to the edges of the graph: if i and j are both clusters or both data points, the assigned weight is 0, as shown in Eq. 2; if i is a data point and j is a cluster, the assigned weight is S, where S is the probability of i belonging to cluster j, as shown in Eq. 3 [36].

w(a, b) = 0    (2)

w(a, b) = S_{i,j}    (3)
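A minimal sketch of this weight assignment is given below: the vertex set holds the n data points followed by the c clusters, and only point-to-cluster edges carry a non-zero weight. The soft membership matrix S here is randomly generated purely for illustration and is not part of the original method.

```python
# Sketch of the bipartite edge weights in Eqs. (2)-(3); S is synthetic.
import numpy as np

n, c = 6, 3                                   # data points, clusters
rng = np.random.default_rng(0)
S = rng.dirichlet(np.ones(c), size=n)         # S[i, j] = P(point i in cluster j)

V = n + c                                     # total vertices in the bipartite graph
W = np.zeros((V, V))                          # w(a, b) = 0 by default (Eq. 2)
for i in range(n):                            # points occupy indices 0..n-1
    for j in range(c):                        # clusters occupy indices n..n+c-1
        W[i, n + j] = S[i, j]                 # Eq. (3): point-cluster edge weight
        W[n + j, i] = S[i, j]                 # keep the graph undirected

print(W.round(2))
```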

The obtained partitions are then used for further cluster analysis to determine the vehicle usage style in each season and to determine, from the patterns observed in the clusters, the relationship between vehicle usage style and vehicle performance. Figure 3 shows the steps followed in the ensemble clustering to obtain the different vehicle usage styles.

5.4 Pattern Extraction

For each fragment of data, three separate clusters are generated using the ensemble clustering technique. In general, we applied the clustering approaches to all collected data, and four important, strongly correlated variables were considered to design our model. According to the literature, a vehicle’s fuel consumption and failures play a crucial role in defining its usage style [14]. Additionally, speed and acceleration influence a vehicle’s performance as well [39], since speed is a major determinant of engine performance: when observing how efficiently the engine is being used, higher and lower speed values are considered, and speed in general can be used to predict a vehicle’s failure [21].


Fig. 3. Ensemble clustering

Fig. 4. Pictorial representation of utility function.

Acceleration, on the other hand, is an essential factor in this work because speed is influenced by acceleration values. Furthermore, increasing acceleration in the transportation sector is also a major cause of increased emissions, since increased emissions reflect abrupt fuel use [22]. Hence, speed and acceleration are positively correlated with fuel consumption and failures [23].


After the clustering step, these four performance parameters are selected to evaluate and analyze the clusters.

Utility Function. The utility function is designed and employed to categorize clusters of data into specific usage styles over different seasons (the conceptual view of the utility function is shown in Fig. 4). Because the usage style is dynamic and can vary over time, the utility function helps explain this dynamic behavior of vehicle usage together with a confidence value. As different features may have a different impact on characterizing a vehicle’s usage style, the utility function can determine this impact and, based on it, classify the usage style of a vehicle into different categories [24]. Equation 4 shows the proposed utility function, where s_i represents the extracted usage and c_j is the number of instances in that cluster. w_{f_k} denotes the parametric weight assigned to the different features, specifically the performance parameters fuel consumption, failure, speed, and acceleration. To obtain the final style/label S, we follow five steps: 1) assignment of parametric weights, 2) allotment of threshold ranges for every parameter, 3) counting the instances in each range, 4) obtaining s1, s2, and s3, and 5) applying Eq. 4 and labeling them as Good (s1), Moderate (s2), and Bad (s3) styles.

s_i = \sum_{j=1}^{n} \sum_{k=1}^{m} c_j \times w_{f_k}    (4)

S_c = \begin{cases} \text{'Good'}, & \text{if } s1 > s2, s3 \\ \text{'Moderate'}, & \text{if } s2 > s1, s3 \\ \text{'Bad'}, & \text{if } s3 > s1, s2 \end{cases}    (5)

Finally, the style S_c of the cluster is assigned by Eq. 5, taking the s_i with the highest value.
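The sketch below walks through the five steps and Eqs. 4–5 on toy numbers. The parametric weights, threshold ranges, and instance counts are invented for illustration; the paper does not list its actual values.

```python
# Hedged sketch of the utility function in Eqs. (4)-(5); all numbers are toy values.
import numpy as np

# Step 1: parametric weights w_fk for fuel consumption, failure, speed, acceleration.
w = np.array([0.4, 0.3, 0.2, 0.1])

# Steps 2-3: instance counts c_j falling into the "good" / "moderate" / "bad"
# threshold range of each parameter (rows: ranges, columns: parameters).
counts = np.array([
    [120,  90, 140, 100],   # instances in the "good" range per parameter
    [ 60,  70,  40,  80],   # instances in the "moderate" range
    [ 20,  40,  20,  20],   # instances in the "bad" range
])

# Step 4: Eq. (4) -> s1, s2, s3 as weight-scaled instance counts.
s = counts @ w

# Step 5: Eq. (5) -> the style whose s(.) is largest wins.
styles = ["Good", "Moderate", "Bad"]
S_c = styles[int(np.argmax(s))]
print(dict(zip(["s1", "s2", "s3"], s.round(1))), "->", S_c)
```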

6 Results

This section presents the evaluation and results of the proposed ensemble approach designed to address usage-model extraction. In this practice, we concentrate on finding similarity patterns in vehicle usage over time by exploiting ensemble clustering and the utility function.

6.1 Ensemble Clustering Evaluation

To evaluate the output of the implemented clustering algorithms on each segment, the Davies-Bouldin Index (DBI) [25] is used. The DBI values shown in Table 1 for both algorithms are low, suggesting that the clusters are relatively well partitioned (a lower DBI value indicates a better clustering result [29]). These partitions are then taken as input for the consensus function to obtain the best number of clusters.
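The evaluation step can be sketched with scikit-learn’s davies_bouldin_score, as below. The feature matrix and the choice of three clusters are synthetic stand-ins for the per-season sensor segments; they are not the paper’s data.

```python
# DBI evaluation of K-Means vs. agglomerative clustering on synthetic data.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
X_season = rng.normal(size=(500, 4))          # stand-in for one season's feature matrix

for name, algo in [("K-Means", KMeans(n_clusters=3, random_state=0)),
                   ("Agglomerative", AgglomerativeClustering(n_clusters=3))]:
    labels = algo.fit_predict(X_season)       # cluster assignments
    print(name, round(davies_bouldin_score(X_season, labels), 6))
```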


Table 1. The obtained DBI values from the K-Means and agglomerative clustering algorithms.

Time period   K-Means    Agglomerative
Season 1      0.503129   0.492822
Season 2      0.503618   0.492450
Season 3      0.503778   0.513051
Season 4      0.505192   0.49590

6.2 Cluster Analysis

Once the clusters are obtained (note: we use all the parameters, i.e. the sensor data, to extract the patterns, but in our analysis we focus only on the four parameters of interest for characterizing the behavior of vehicles over time), and before performing the cluster analysis, a non-parametric Wilcoxon signed-rank test [30] was conducted to check the distribution of the population in each cluster and how significantly it differs from the other clusters. To do so, we selected the parameters of interest: Fuel Consumption (FC) while moving, Speed (S), Acceleration (AC), Failure (F), Fuel over mileage (FO), and Emission (E). Tables 2, 3, 4, and 5 indicate how the vehicle usage in the different clusters differs in each season. In most cases the test rejects the null hypothesis with a p-value below the critical value alpha = 0.05, indicating a significant difference between the usages over time.

Table 2. The p-values obtained for different clusters in season 1. Here the test was taken between the same features (only two at a time) in two clusters.

Clusters  FC  S  AC  F  FO  E
0&1