Innovations in Smart Cities Applications Volume 4: The Proceedings of the 5th International Conference on Smart City Applications 9783030668402, 3030668401



English · Pages [1529] · Year 2021

Table of contents:
Preface
Organization
Committee
Conference Chair
Conference General Chairs
Conference Technical Program Committee Chair
Workshops Chair
Publications Chair
Tutorials Chair
Publicity and Social Media Chair
Local Organizing Committee
Technical Program Committee
Keynote Speakers
Smart Cities and Geo-Spatial Technologies
Smart Cities and Energy
Deep Learning Applications for Shoreline Extraction from Landsat and Sentinel Satellite Imagery
Review: Use of Modern Technologies in Creating Smart Cities
Contents
Smart Citizenship and Sentiment Analysis
Temporal Sentiment Analysis of Socially Important Locations of Social Media Users
1 Introduction
2 Literature Review
3 Method
3.1 Discovery of Socially Important Locations
3.2 Polarity Detection and Training Dataset Generation
3.3 Temporal Sentiment Analysis Method
4 Experimental Evaluation
4.1 Dataset
4.2 Data Pre-processing
4.3 Results of SS-ILM Algorithm
4.4 Performance of Machine Learning Algorithms on Training Dataset
4.5 Temporal Sentiment Analysis of Socially Important Locations
5 Conclusion
References
A New Sentiment Analysis System of Climate Change for Smart City Governance Based on Deep Learning
1 Introduction
2 Related Work
3 Data and Methods
4 Result and Discussion
5 Conclusion
References
A Novel Approach of Community Detection Using Association Rules Learning: Application to User's Friendships of Online Social Networks
1 Introduction
2 Related Work
3 Preliminary Definitions
3.1 Online Social Networks
3.2 Association Rules
3.3 Evaluation Metrics
4 Methodology
4.1 General Scheme
4.2 Comparison with Synthetic Real-World Datasets
5 Discussion and Conclusion
References
Arabic Sentiment Analysis Based on 1-D Convolutional Neural Network
1 Introduction
2 The Proposed Model
2.1 Word Embedding
2.2 Convolutional Neural Networks
2.3 The Output Layer
3 Experimental Setup and Results
3.1 Arabic Preprocessing
3.2 Building the Word Embedding Models
3.3 Arabic Word Embeddings Evaluation
3.4 Hyperparameters
3.5 Arabic SA Evaluation
4 Conclusion
References
CMA-EV: A Context Management Architecture Extended by Event and Variability Management
1 Introduction
2 State of the Art
3 CMA-EV Architecture Overview
3.1 Context Metamodel
3.2 Event Metamodel
3.3 Feature Modeling
3.4 Scenario-Based Analysis
4 Validation: Smart Tourism Recommender System (STRS)
5 Conclusion
References
Electronic Public Services in the AI Era
1 Introduction
2 Theoretical Background
2.1 E-Government
2.2 Electronic Government Development Index
2.3 Types of E-Government
3 Related Work
4 International E-Government Initiatives
4.1 E-Government Initiatives Across Singapore
4.2 E-Government Initiatives Across EU-15, Norway and Iceland
5 Digital Revolution
5.1 The Evolution of the Web
5.2 The Evolution of the Industry
6 Benchmarking of the Countries Using AI on Their Electronic Public Services
7 New Model of Maturity of Moroccan Electronic Public Service in Artificial Intelligence Era
7.1 Moroccan Electronic Services
7.2 New Model of Maturity of Moroccan Electronic Public Service
8 Conclusion and Future Research
References
Planning and Designing Smart Sustainable Cities: The Centrality of Citizens Engagement
1 Introduction
2 Characteristics of a Smart Sustainable City
2.1 Relationship Between a Smart Sustainable City Dimensions, Initiatives and Projects
2.2 Pathways of Influence for a Smart Sustainable City
3 Stakeholders Engagement in Smart Sustainable Cities: The Focus on Citizens
3.1 Types of Stakeholders of a Smart Sustainable City
3.2 Citizen Engagement in a Smart Sustainable City: Literature Review
4 Frameworks for Citizen Engagement
4.1 The Spectrum of Public Participation (The Spectrum)
4.2 Information and Communication Technologies and Citizen Engagement
5 Conclusion
References
Review of Learning-Based Techniques of Sentiment Analysis for Security Purposes
1 Introduction
2 Background
2.1 Security Intelligence
2.2 Machine Learning
2.3 Deep Learning
2.4 Natural Language Processing (NLP)
2.5 Sentiment Analysis (SA)
2.6 Multimodal Sentiment Analysis
2.7 Rule Based
3 Research Method
3.1 Search and Selection
3.2 Synthesizing the Literature
4 Findings
5 Conclusion
References
Sentiment Analysis: A Comparison of Machine Learning and Fuzzy Logic in the Case Study of Brexit Sentiment on Social Media
1 Introduction
2 Related Work
3 Main Concepts
3.1 Sentiment Analysis
3.2 Ontology
3.3 Fuzzy Logic
3.4 Support Vector Machine
3.5 Naïve Bayes
3.6 Decision Tree
3.7 Long Short-Term Memory Network
4 Our Proposed Work
4.1 Ontology
4.2 Text Preprocessing
4.3 Machine Learning Approaches Comparison
5 Conclusion
References
Sentiment Analysis and Opinion Mining Using Deep Learning for the Reviews on Google Play
1 Introduction
2 Related Works and Background
2.1 Related Works
2.2 Feature Engineering for NLP
2.3 Machine Learning Classifiers
2.4 Sequential Deep Learning Model
3 Corpus Collection
4 Corpus Analysis
5 Model Building and Results
5.1 Results for Machine Learning Classifiers
5.2 Results for Deep Learning Model
6 Conclusions
References
Smart Guest Virtual Assistant with Automatic Guest Registration
1 Introduction
2 Virtual Assistants and Deep Learning
2.1 IBM Watson Assistant
3 Smart Virtual Guest Assistant
3.1 Entities and Keywords Extraction
3.2 Automatic Guest Registration and Face Recognition
3.3 Guest Recognition and Assistance
3.4 Use Case: Invited Speakers
4 Related Work
5 Conclusion
References
Topic Modeling and Sentiment Analysis with LDA and NMF on Moroccan Tweets
1 Introduction
2 Literature Survey
2.1 Sentiment Analysis
2.2 Topic Modeling
3 Tools and Methods
3.1 Proposed Architecture
3.2 Sentiment Analysis and Statistics of Twitter Data
3.3 Topic Modeling
4 Experiments and Empirical Results
4.1 Data Collection and Storage
4.2 Preprocessing Data
4.3 Experimental Results
5 Conclusion and Future Work
References
Smart Education and Intelligent Learning Systems
A Deep Learning Model for an Intelligent Chat Bot System: An Application to E-Learning Domain
1 Introduction
2 Related Works
3 AI Algorithms Background
3.1 Convolutional Neural Network (CNN)
3.2 Long Short Term Memory (LSTM)
3.3 Transformers
4 System and Proposed Architecture
4.1 System Architecture
4.2 Proposed Architecture
5 Implementation
5.1 Dataset
5.2 Hardware Component
5.3 Software Component
5.4 Pre-processing
5.5 Evaluation Metrics
6 Results and Comparative Study
7 Conclusion
References
E-learning at the Service of Professionalization in Higher Education in Morocco: The Case of MOOCs of the Maroc Université Numérique Platform
1 Introduction
2 Theoretical Anchoring
2.1 The Moroccan University and Digital Technology
2.2 E-learning
2.3 MOOCS
3 The Methodological Approach of Our Research
4 The Results of Our Survey
4.1 The Testimonies of Some Teacher Designers of MOOCs
4.2 Analysis of the MOOC: “Methodological Support for University Work” (Fig. 3) Designed by Doctor Khalid JAAFAR
5 Conclusion
References
Methodology to Develop Serious Games for Primary Schools
1 Introduction
2 Serious Game Development Processes
3 GLUPS
3.1 GLUPS Engineering Disciplines
3.2 GLUPS Artifacts
3.3 GLUPS Templates
4 Serious Games Built
4.1 Examples with GLUPS Templates
4.2 Game Screenshots
5 Results and Lessons Learned
5.1 Competencies and Learning Tasks
5.2 Storyline and Levels
6 Conclusion and Future Work
References
Methods and Software Tools for Automated Synthesis of Adaptive Learning Trajectory in Intelligent Online Learning Management Systems
1 Introduction
2 Background Analysis
3 Method of ALT Synthesis
4 Results
5 Discussion
6 Future Research
7 Conclusion
References
National University of Uzbekistan on the Way to the Smart University
1 Introduction
2 Review of ICT Infrastructure Creation at NUUz
3 Results and Outcomes: E-Learning Environment
4 Why LMS MOODLE as VLE
5 Conclusion
References
Smart Pedagogical Knowledge Management Model for Higher Education
1 Introduction
2 Smart Education and Knowledge Management Basics
2.1 Smart Learning Systems
2.2 Pedagogical Knowledge Management
3 How to Manage Learning Knowledge?
4 Proposed Solution
5 Conclusion
References
The Recommendation of a Practical Guide for Doctoral Students Using Recommendation System Algorithms in the Education Field
1 Introduction
2 State of the Art
2.1 Recommender System Category
2.2 Machine Learning Category
2.3 Our Approach Goal
3 Methodology and Overall Approach
3.1 Proposed Work
3.2 Methodology
4 Experiments and Results
4.1 Experimental Results
4.2 Tools Used
5 Problem and Challenges
6 Conclusion and Future Work
References
Virtual Reality–Enhanced Soft and Hard Skills Development Environment for Higher Education
1 Introduction
2 Extended Reality Technology
3 Industrial Training
3.1 General Industrial Training Objectives
3.2 Learning Outcomes
3.3 General Roles and Responsibilities
3.4 Insights Regarding Industrial Training and Short Industrial Placements
4 Skills Categories
4.1 Human (Soft Skills)
4.2 Technical (Hard Skills)
4.3 Conceptual Skills
5 Skills Development Environment
6 Conclusion
References
A Comparative Study Between K-Nearest Neighbors and K-Means Clustering Techniques of Collaborative Filtering in e-Learning Environment
1 Introduction
2 Literature Survey
2.1 Recommendation System
2.2 Data Mining
2.3 Classification Algorithm
2.4 Clustering Algorithm
3 Proposed Approach
4 Comparison of K-NN and K-Means Algorithms
5 Conclusion
References
Competence and Lean Management, a Literature Review
1 Introduction
2 Lean Management
2.1 Definition
2.2 Lean Organizational Structure
2.3 Lean Organisational Culture
3 Competence Management
3.1 Defining Competence
3.2 Competence Category
3.3 Competence Levels
3.4 General Process of Competence Management
3.5 Competence Management Related to Knowledge
4 Synthesis
References
Smart Mobility and Intelligent Infrastructures
A New Distributed Strategy to Find a Bi-objective Optimal Path Toward a Parking in the City
1 Introduction
2 Previous Works
3 Overview of the System Architecture
3.1 Parking Infrastructure Description
3.2 Overview of the Parking Routing System
4 Ant Colony Optimisation Algorithm
5 Bi-objective Parking Problem Using ACO Algorithm
6 Description of Architecture Agents
7 Description of Implementation
8 Conclusion
References
A Novel Mobile CrowdSensing Architecture for Road Safety
1 Introduction
2 Background and Related Work
3 Design Goals
3.1 Participant Incentivization
3.2 Data Quality Optimization
3.3 Privacy and Security Protection
4 Our SI-CAR's Proposed Architecture
5 Conclusion
References
An Agent-Based Architecture for Multi-modal Transportation Using Prometheus Methodology Design
1 Introduction
2 Background
3 Multi-agent Information System: Architecture Detailed Design
3.1 Organization of the Multi-agent Information System
3.2 Architecture Detailed Design
4 Semantic Layer: Methodology for Ontology Development
4.1 Approach
4.2 The Ontology Design Methodology
5 Design Based on the Prometheus Methodology
5.1 Prometheus Methodology and Tools
5.2 System Design and Development
6 Conclusion
References
An Overview of Real-Time Traffic Sign Detection and Classification
1 Introduction
2 Traffic Sign Databases
2.1 The German Traffic Signs Benchmark
2.2 The Belgium Traffic Sign
2.3 The Swedish Traffic Signs Dataset (STSD)
3 Traffic Sign Detection
3.1 Color-Based Methods
3.2 Shape-Based Methods
3.3 Learning-Based Methods
3.4 Hybrid Methods
4 Traffic Sign Classification
4.1 Methods Based on Hand-Crafted Features
4.2 Artificial Neural Networks Based Methods
4.3 Deep Learning Based Methods
5 Traffic Sign Recognition Based on Deep Neural Networks
5.1 Faster Region-Based Convolutional Neural Networks (Faster R-CNN)
5.2 Region-Based Fully Convolutional Networks (R-FCN)
5.3 Single Shot MultiBox Detector (SSD)
5.4 You Only Look Once (YOLO)
6 Conclusion
References
Classification of the Driver’s Emotions Using a Convolutional Neural Network
1 Introduction
2 Facial Expression Recognition
3 Related Work
4 CNN Background
5 CNN Based System for Emotion Detection
5.1 Dataset
5.2 CNN Hyperparameters
6 Results and Discussion
6.1 Performance Metrics
6.2 Discussion
7 Real Time Detection
8 Conclusion
References
Deep Learning Based Driver’s Fatigue Detection Framework
1 Introduction
2 Related Works
3 Overview of the Proposed Fatigue Detection Approach
3.1 Framework Architecture and Fatigue Detection Process
3.2 Input Image
3.3 Face Detection and Landmark Localization
3.4 Eye Region Extraction
3.5 Eye Status Recognition
3.6 Fatigue Judgment
4 Experimentation and Results
5 Discussion
6 Conclusion
References
DSRC vs LTE V2X for Autonomous Vehicle Connectivity
1 Introduction
2 Related Work
3 Comparison Between LTE V2X and DSRC
3.1 DSRC (Dedicated Short-Range Communications)
3.2 LTE V2X (Long Term Evolution Vehicle to Everything)
4 DSRC vs LTE V2X Comparative Study
5 Simulation and Performance Evaluation of DSRC and LTE V2X Protocol
6 Discussion
7 Conclusion and Future Works
References
Dynamic on Demand Responsive Transport with Time-Dependent Customer Load
1 Introduction
2 Related Work
3 The Problem Description and Formulation
3.1 Problem Description
3.2 Problem Formulation
4 The Dynamic Tabu Search Method
4.1 The Insertion Heuristic for the Dynamic Requests
4.2 The Neighbourhood Strategy
5 The Experimental Study
6 Discussion and Conclusion
References
Encryption Issues in Traffic Control Systems in Smart Cities and Traffic Signal Control Optimization
1 Introduction
2 Background and Related Work
3 Comparison Between DES, AES and RSA
4 Smart Cities
4.1 Smart Traffic Control Systems
5 Encryption Issues in Traffic Control Systems
5.1 Traffic Signal Control Optimization
6 Conclusion
References
Evolutionary Heuristic for Avoiding Traffic Jams in Road Network Using A* Search Algorithm
1 Introduction
2 Background
2.1 A* Search Algorithm
2.2 Parallel Programming
3 Related Work
4 Parallel Implementation of A* Algorithm
5 Results
6 Conclusion
References
Geometric Feature Extraction of Road from UAV Based Point Cloud Data
1 Introduction
2 Related Works
3 Material and Method
3.1 Data Acquisition and Processing
3.2 Ground and Non-ground Classification
3.3 Road Surface Extraction
3.4 Road Centerline Extraction
3.5 Profiling from Point Clouds
3.6 Cross-Sectioning from Point Clouds
4 Results and Discussion
5 Conclusion
References
Parking Availability Prediction in Smart City
1 Introduction
1.1 Main Contributions
2 Smart Cities and Parking Prediction Challenges
2.1 Intelligent Transport System (ITS)
2.2 Internet of Things (IoT)
3 System Model of Smart Parking
4 Methodology
4.1 About Ensemble-Based Models
4.2 Ensemble-Based Models for Regression
4.3 Random Forest Regressor
5 Experimentation Results and Discussion
5.1 Dataset
5.2 Performance Measures
5.3 Results Analysis and Discussion
6 Conclusion
References
Smart Infrastructure and Integrated Platform of Smart and Sustainable Cities
1 Introduction
2 Literature Review
3 Proposed Smart Infrastructure and Integrated Platform
3.1 Development of the Smart Infrastructure: Proposed Solution
3.2 Development of the Integrated Platform: Proposed Solution
4 Recommendations
5 Conclusion
References
Study to Reduce the Costs of International Trade Operations Through Container Traffic in a Smart Port
1 Introduction
2 Container Port Performance
2.1 Seaside Operations
2.2 Terminal Operations
2.3 Landside Operations
3 Research Methodology
3.1 Sampling Plan
3.2 Data Analysis Procedure
4 Data Analysis and Interpretations: Primary Data
4.1 Normality Test
4.2 Reliability
References
The Global Performance of a Service Supply Chain: A Simulation-Optimization Under Arena
1 Introduction
2 Conceptual Framework
2.1 From Supply Chain to Supply Chain Management
2.2 From Supply Chain Management to Service Supply Chain Management
3 SSCM Cost Estimation Model
3.1 General Principles of Supply Chain Management Service Modelling
3.2 Proposal for Modelling the Cost of Supply Chain Management Services
4 Service Supply Chain Modeling and Optimization with Arena
4.1 Simulation and Optimization of the Drug Supply Chain with Arena
4.2 Results and Discussions
5 Conclusion
References
Traffic Signs Detection and Recognition System in Snowy Environment Using Deep Learning
1 Introduction
2 Related Works
3 Machine Learning-Based Methods
3.1 R-CNN Algorithm
3.2 Fast R-CNN
4 Sampling and Image Preprocessing
4.1 Experimental Environment
4.2 Data Collection
4.3 Sampling
4.4 Labeling
4.5 Generate Training Data
4.6 Create Label Map and Configure Training
4.7 Run and Time the Data Training
5 Conclusion
References
Smart Healthcare
A Machine Learning Approach for Initial Screening of Polycystic Ovarian Syndrome (PCOS)
1 Introduction
1.1 Background
1.2 Machine Learning (ML) Approach
2 Methodology
2.1 Data Collection and Preparation
2.2 Feature Selection
2.3 Training and Testing
2.4 Oversampling
2.5 Performance Evaluation
3 Results and Discussion
3.1 Cross-Validation Results
3.2 Model Performance
3.3 Effect of Data Imbalance
3.4 Comparison to Other Studies
4 Conclusion
References
Patient Transport and Mobile Health Workforce: Framework and Research Perspectives
1 Introduction
2 Hospital Logistics
2.1 Definition
2.2 Problem Description
3 Relevant Literature on Operational Research on Patient Transport and Mobile Healthcare Workforce
3.1 Extra-Hospital Services
3.2 Intra-Hospital Service
3.3 Relevant Models on DARPs for Patient Flows Modeling
4 Conclusion
References
Semantic Web and Healthcare System in IoT Enabled Smart Cities
1 Introduction
2 Semantic Web and Healthcare
2.1 Ontology in Healthcare
2.2 Semantically Enhanced Patient and Clinical Information
2.3 Using Semantic Web in Healthcare Systems
2.4 Interoperability in Healthcare
3 Intelligence and IoT in SW and Healthcare
4 Security in Semantically Enhanced Healthcare Systems
4.1 A Taxonomy of Privacy
4.2 Availability, Authentication, Authorization, and Trustworthiness
5 Conclusions and Future Works
References
Skin Cancer Prediction and Diagnosis Using Convolutional Neural Network (CNN) Deep Learning Algorithm
1 Introduction
2 Related Work
3 Experiment
3.1 Dataset
3.2 Proposed Method: Data Preparation
3.3 Experiment Results
4 Model Deployment
4.1 Mobile Application
4.2 Web Application
5 Future Work
6 Conclusion
References
Smart Earth Environment and Agriculture
Climate-Smart Landscapes for Sustainable Cities
1 Climate-Smart Landscapes and Climate-Smart Landscape Approach
2 Climate-Smart Landscapes with Regard to Sustainable and Smart Cities
3 Climate-Smart Landscape Design Approaches for Sustainable Cities
3.1 Coastal Resilience
3.2 Rain Garden
3.3 Biophilic Design
3.4 Xeriscape
3.5 Climate-Smart Agriculture
4 Discussion
References
Diversity and Seasonal Occurrence of Sand Flies and the Impact of Climatic Change in Aichoune Locality, Central Morocco
1 Introduction
2 Materials and Methods
2.1 The Study Zone
2.2 Period of Entomological Surveys
2.3 Criterion for Choosing the Biotope
2.4 Sandfly Capture Techniques
2.5 Identifications of Sand Flies
2.6 Entomological Data Analyses
2.7 Study of Meteorological Parameters
2.8 Statistical Analysis
3 Results and Discussion
3.1 Characteristics of the Study Population
3.2 Characteristics of Climatic Factors
3.3 Statistical Analysis of the Impact of Abiotic Factors
4 Conclusion
References
Environmental Challenges of Solid Waste Management in Moroccan Cities
1 Introduction
2 Study Area
2.1 Rainfall
2.2 Temperature
2.3 Wind
2.4 Landfill of Oum Azza
3 Material and Method
4 Results and Discussion
5 Conclusion
References
Evaluation of the Purification Performance of the WWTP by Aerated Lagooning of the City of Oujda (Morocco)
1 Introduction
2 Materials and Methods
2.1 General on the Study Site
2.2 Description of Oujda WWTP
2.3 Sampling Method
2.4 Parameters Analysed
3 Results and Discussion
3.1 Operating Parameters
3.2 Purification Performance Evaluation Parameters
4 Discussion and Interpretation
4.1 Comparison of Results to Design Goals
4.2 Comparison of Results to Domestic Discharge Standards
4.3 The COD/BOD5 Ratio
4.4 Raw and Purified Water Loads and Purification Yields
5 Conclusion
References
IoTree: A Way Towards Smart Plantation
1 Introduction
2 Literature Survey
2.1 Climate Change via Plantation
2.2 Varied Nature of Plants
2.3 Prediction/Suggestion Systems
2.4 Monitoring Systems
3 IoTree–The Proposed System
4 Implementation
4.1 Use Case: IoTree for Indoor Plantation
4.2 Use Case: IoTree for Outdoor Plantation
5 Benefits of the System
5.1 Elimination of Risks Associated with Plant Choice
5.2 Preventing the Consequences of Human Error in Maintenance
5.3 Green Computing
5.4 Air Quality Treatment and Betterment
5.5 Filling up the Communication Gap Between User and Gardener
5.6 Plant Donation Campaigns
6 Conclusion
Appendix
References
Physico-Chemical and Mineralogical Characterization of Urban Sludge from the Tamuda Bay Tetouan Treatment Plant
1 Introduction
2 Presentation of the Study Area
3 Materials and Methods
3.1 Sludge Sampling
3.2 Analysis Protocols
3.3 Determination of Organic Matter (OM)
3.4 Determination of the C/N Ratio
4 Results and Discussion
4.1 Humidity
4.2 Residual Humidity
4.3 Determination of Dry Matter
4.4 Determination of pH
4.5 Electrical Conductivity
4.6 Organic Matter
4.7 Total Nitrogen
4.8 Determination of C/N Ratio
4.9 Evaluation of Heavy Metals Using the ICP Technique
4.10 Characterization by X-Ray Diffraction
4.11 X-ray Fluorescence
5 Conclusion
References
Study of Climate Change During the Period (2009–2018) in the Region of Sidi Slimane, Morocco
1 Introduction
2 Materials and Methods
2.1 Methodology
2.2 Materials
3 Rainfall Evolution
3.1 Annual Rainfall
3.2 Monthly Rainfall
4 Temperature Evolution
5 Annual Temperature
5.1 Monthly Temperatures
6 Gaussen’s Ombrothermal Diagram
7 Conclusion
References
Typological Study of the Water of the Boufekrane River (Meknes, Morocco): Principal Component Analysis and Discriminant Analysis
1 Introduction
2 Study Area
3 Materials and Methods
4 Results and Discussion
5 Conclusion
References
Smart Economy and Smart Factory
A Framework for Integrating Condition-Based Maintenance Programs and Wireless Sensor Networks in the Context of Industry 4.0
1 Introduction
2 Condition-Based Maintenance Within an Intelligent Context
2.1 Condition-Based Maintenance and Its Applications
2.2 Condition Maintenance Management: Key References
2.3 Platforms Supporting Condition-Based Maintenance and Remote Monitoring Systems
3 Conceptual Phases of Implementation of CBM with WSN and Design of a Remote Monitoring Platform: Organizational and Technical Steps
3.1 Condition Maintenance Management: Key References
3.2 Structured Technical Steps for Collecting and Analyzing Data Using WSN for Condition-Based Maintenance Application
4 Design of a Remote Monitoring Platform for CBM Using WSN in the Context of Maintenance 4.0
4.1 Graphic Models of Operation of the Integration Platform
4.2 The Functions of the Remote Monitoring Platform
5 Conclusion
References
A New Artificial Intelligence-Based Strategy for Digital Marketing Reinforcement
1 Introduction
2 Related Work
3 Artificial Intelligence Overview
3.1 History of AI
3.2 Application of AI
4 Artificial Intelligence in Digital Marketing
4.1 Digital Marketing
4.2 Artificial Intelligence in Digital Marketing
5 Proposed Work
6 Open Challenges
7 Conclusion
References
An Efficient Collaborative Filtering and Graph Approach for Business-Matching Systems
1 Introduction
2 Related Work and Background
3 Business Matching
4 Collaborative Filtering Based
5 Graph Embedding Based Approach
6 Conclusion
References
Assessment of Blockchain Technology Adoption Factors and Scenarios Within the Economy of Latvia
1 Introduction
2 Methodology for Blockchain Technology Adoption Assessment
3 Discussion of Research Results
4 Conclusions
References
Data Mining and Machine Learning Techniques Applied to Digital Marketing Domain Needs
1 Introduction
2 Digital Marketing
2.1 Traditional Marketing
2.2 Digital Marketing
2.3 Research Framework in Digital Marketing
2.4 Online Marketing Trends/Domains
2.5 Consumer Decision Journey
3 Data Mining and Machine Learning Techniques
3.1 Artificial Intelligence and Machine Learning
3.2 Data Mining
3.3 Most Used Techniques
4 Discussion
5 Conclusion
References
Leveraging Dynamicity and Process Mining in Ad-Hoc Business Process
1 Introduction
2 Related Works
3 Towards Ad-Hoc Business Process Dynamicity
3.1 Off-Line
3.2 On-Line
3.3 Synthesis
4 Case Study: Business Process of an Insurance Claims’ Handler System
4.1 Off-Line
4.2 On-Line
4.3 Synthesis
5 Conclusion
References
Modeling the Use of RFID Technology in Smart Processes
1 Introduction
2 Related Work
3 Process “as-is” and “to-be” Phases
3.1 ‘As-is’ Phase
3.2 ‘To be’ Phase
4 Modelling RFID Technology in Smart Processes
4.1 uBPMN
4.2 Business Process Model – Phase “as is”
4.3 Business Process Model – Phase “to be”
5 Conclusion
References
Real Time Release Approach: At-Line Prediction of Ascorbic Acid Concentration in Nutraceutical Syrup via Artificial Neural Network
1 Introduction
1.1 Background
1.2 Ascorbic Acid (AA)
1.3 Power of Hydrogen (pH)
1.4 Specific Gravity (SG)
1.5 Viscosity
2 Artificial Neural Network (ANN)
2.1 Multi-layer Perceptron
3 Materials and Methods
3.1 Data Acquisition and Preprocessing
3.2 Neural Network Model
4 Results and Discussion
5 Conclusion
References
Smart Recognition Systems and Multimedia Processing
Convolutional Neural Network for Identifying Human Emotions with Different Head Poses
1 Introduction
2 Related Work
3 Experiment
3.1 Data Preprocessing
3.2 The Proposed CNN Architecture
4 Results and Discussions
4.1 Emotion Recognition with Frontal Pose
4.2 Emotion Classification with Three Head Poses
5 Conclusion and Future Works
References
Deep Learning-Based 3D Face Recognition Using Derived Features from Point Cloud
1 Introduction
2 Related Works
3 Material and Method
3.1 Dataset
3.2 ResNet
4 Experiment
5 Results and Discussion
6 Conclusion
References
Face Sketch Recognition: Gender Classification Using Eyebrow Features and Bayes Classifier
1 Introduction
2 Related Work
2.1 Contribution of This Work
3 Background Information
3.1 Naive Bayesian Classification Method
3.2 Eyebrow Golden Ratio and Two Other Features
4 Approach
5 Experiment and Results
6 Conclusion
References
Fall Detection for Pedestrians in Video-Surveillance
1 Introduction
2 Background
2.1 Principle of the Active Contour Technique
2.2 Object Tracking
2.3 Related Works for Fall Detection Methods
3 Detection of Pedestrian Fall
4 Experimental Results
4.1 Comparison with Previous Work
4.2 Detecting True Fall for Pedestrian by the Proposed Approach
5 Conclusion
References
Hand Pose Estimation Based on Deep Learning
1 Introduction
2 Related Work
3 Formulation of the Proposed Approach
3.1 Dataset
3.2 Experimentation
4 Results Analysis and Comparison
5 Conclusion
References
Static and Dynamic Hand Gesture Recognition System Using Contourlet Transform
1 Introduction
2 Features Extraction
2.1 DWT Transform
2.2 Contourlet Transform
3 Simulation Results
3.1 Dataset Description
3.2 Experimental Design
4 Conclusion
References
Video Activity Recognition Based on Objects Detection Using Recurrent Neural Networks
1 Introduction
2 Related Work
3 Details of the Approach
4 Experimental Evaluation
4.1 The Dataset
4.2 Experimental Method
4.3 Results
5 Conclusion and Future Work
References
A Comparative Study Between the Most Usable Object Detection Methods Based on Deep Convolutional Neural Networks
1 Introduction
2 Object Detection: Algorithms and Datasets
2.1 Neural Networks Algorithms
2.2 Datasets
3 Experiences and Results
4 Conclusion
References
Performance Analyses of AES and 3DES Algorithms for Encryption of Satellite Images
1 Introduction
2 Related Work
3 AES (Advanced Encryption Standard) Algorithm
4 3DES (Triple Data Encryption Standard) Algorithm
5 Performance Measurement Metrics
5.1 Histogram Analysis
5.2 Correlation Analysis
5.3 NPCR (Number of Pixel Change Rate) and UACI (Unified Average Change Intensity)
5.4 PSNR Analysis
5.5 Computational Time Analysis
6 Experimental Results
6.1 Histogram Analysis
6.2 Correlation Analysis
6.3 NPCR and UACI Analysis
6.4 PSNR Analysis
6.5 Computational Time Analysis
7 Discussion
8 Conclusion
References
A Survey on Deep Learning-Based Approaches to Estimation of 3D Human Pose and Shape from Images for the Smart Environments
1 Introduction
2 Optimization-Based Approaches
3 Deep Learning-Based Approaches
3.1 Parametric Approaches
3.2 Non-parametric Approach
4 Discussion and Conclusion
References
Smart Devices and Softwares
A 3-DOF Cable-Driven Robotic Ankle Rehabilitation Device
1 Introduction
1.1 Ankle Joint Pathophysiology and Rehabilitation
1.2 Robotic Ankle Rehabilitation
2 Methodology
2.1 Prototype Development
2.2 Testing and Experimentation
3 Results and Conclusions
3.1 Data Analysis
3.2 Conclusions and Recommendations
References
A Cognitive Radio Spectrum Sensing Implementation Based on Deep Learning and Real Signals
1 Introduction
2 System Model
2.1 Database Generation
2.2 Database Acquisition
3 Implementation and Results
3.1 Description of the Used Data
3.2 Performance Evaluation
3.3 Results
3.4 Discussion
4 Conclusions
References
Appliance-Level Monitoring with Micro-Moment Smart Plugs
1 Introduction
2 Related Work
3 The (EM)3 Energy Efficiency Framework
4 Proposed System Design
4.1 Power Consumption Unit
4.2 Environmental Monitoring Unit
5 Consumer Data Visualization
6 Results
6.1 Power Consumption Unit
6.2 Environmental Monitoring Unit
7 Conclusions
References
Backhaul Networks and TV White Spaces (TVWS) with Implementation Challenges in 5G: A Review
1 Introduction
2 5G Backhaul Requirements and Challenges
3 Mobile Backhaul Types and Key Challenges
3.1 Wired Backhaul Solution
3.2 Wireless Backhaul Solutions
4 Conclusion
References
Comparative Study via Three MPPT Techniques Methods for PV Systems
1 Introduction
2 Modeling of the Photovoltaic System
2.1 Modeling of the PV Module
2.2 DC-DC Modified CUK Converter
3 Maximum Power Point Tracking (MPPT)
3.1 Perturb and Observe
3.2 Incremental Conductance (IC)
4 Fuzzy Logic MPPT Controller
5 Simulation Results
6 Conclusion
References
Design and Realization of an IoT Prototype for Location Remote Monitoring via a Web Application
1 Introduction
2 IoT System Description
2.1 Communicating Nodes
3 Web Application
3.1 Authentication Interface
3.2 Control Interface
4 Conclusion
References
Design of a New CP Microstrip Patch Antennas for WPT to a UAV at 5.8 GHz
1 Introduction
2 Related Work
2.1 Subsystem Transmitter
2.2 Subsystem Receiver
2.3 Synthesis
3 Design, Simulation and Results
3.1 Presented Antenna
4 Conclusion
References
Design of Folded Dipole with Double U Shaped Slot UHF RFID Tag Using Genetic Algorithm Optimization for Healthcare Sensing Applications
1 Introduction
2 Energetic Constraint for UHF RFID System
3 RFID Antenna Configuration and Application of GA Technique
3.1 Antenna Design
3.2 Genetic Algorithm
3.3 Simulated Results of Optimized Antenna
4 Conclusion
References
Flexible Query Systems for Relational Databases
1 Introduction
2 A Brief Introduction to the Relational Data Model and Fuzzy Sets Theory
2.1 Relational Data Model
2.2 Fuzzy Sets Theory
3 Flexible Query Models for Crisp Relational Databases
4 Flexible Query Models for Fuzzy Relational Databases
5 Conclusion
References
Introduction to Integrate the Cellular Automata Concept Within the Internet of Things: The Use of the Dynamic Management of Bridge Approach
1 Introduction
2 Energy Management
3 Communication Protocols
3.1 Constrained Application Protocol CoAP
3.2 Message Queue Telemetry Transport MQTT
3.3 Advanced Message Queuing Protocol AMQP
4 Cellular Automata Concept
5 Contribution
6 Conclusion
References
Investigation of Ultrasonic Opacity Based on Quarter-Wave Mode Resonance Using a Two-Dimensional Silicon Phononic Crystal
1 Introduction
2 Setup and Results
3 Conclusion
References
Numerical Simulation of HDPE Behavior Under V-Notch
1 Introduction
2 The Studied Model
2.1 Characterization Specimen
2.2 V-Notch at an Opening Length of 5 mm
2.3 V-Notch Opening 15 mm
2.4 V-Notch with 20 mm Opening Length
2.5 Comparison
3 Conclusion
References
Operating Models of Network Protocols IoT: Long-Range Protocols
1 Introduction
2 Network Protocols
2.1 LoRa
2.2 Cellular
2.3 Sigfox
2.4 Narrowband-IoT (NB-IoT)
3 Comparative Study
4 Operating Models
4.1 LoRa
4.2 Cellular
4.3 Sigfox
5 Conclusion and Discussion
References
Recent Trends in Green Computing
1 Introduction
2 Need for Green Computing
2.1 Increasing Electronic Waste
2.2 Increasing Energy Consumption
2.3 Lowering Infrastructure Cost
2.4 Increasing Carbon Footprints
2.5 Strengthening Organization Sustainability Image
3 Policies Devised and Implemented by Developed Countries
4 Impact of Implementing Green Computing Trends
5 Challenges
5.1 Privacy Maintenance
5.2 Decrement in Efficiency
5.3 Green Design
5.4 Green Practices and Green Management
5.5 Reduction in Carbon Footprints
5.6 Focused Green Computing
5.7 Cost
6 Initiatives for Green Computing by Tech Giants
6.1 Energy Conservation Program
6.2 Facebook’s Initiative
6.3 Google’s Initiative
6.4 Amazon’s Initiative
6.5 Microsoft’s Initiative
6.6 Apple’s Initiative
6.7 Samsung’s Initiative
7 Conclusion
References
Smart Security
Design Challenges and Assessment of Modern Web Applications Intrusion Detection and Prevention Systems (IDPS)
1 Introduction
2 An Overview of Intrusion Detection and Prevention
2.1 Intrusion Detection and Prevention
2.2 Detection Methods in IDPS
2.3 Basic Architecture of an IDPS
3 Web IDPS Design Challenges
3.1 Web-Related Security Issues
3.2 Placement of the IDPS
3.3 Communication Protocol (HTTP/HTTPS)
3.4 Users and Session Management
3.5 Continuous Changes and Performance
3.6 Bots Requests
4 Assessment of Open-Source Web IDPS
5 Conclusion
References
Hardware Trojan Detection in Heterogeneous Systems on Chip
1 Introduction
2 Backgrounds
2.1 The Threat Levels of the Hardware Trojan
2.2 Some Critical Fields Affected by Trojans
2.3 Hardware Trojan Taxonomy
3 Related Works
4 Methodology
4.1 Main Circuit
4.2 Trojan Implementation
4.3 Detection
5 Results
6 Conclusion and Future Work
References
How Secure Is Your Cloud Management Platform? OpenStack Use Case
1 Introduction
2 OpenStack Architecture
2.1 Horizon (Dashboard)
2.2 Nova (Compute)
2.3 Neutron (Networking)
2.4 Keystone (Identity)
2.5 Glance (Image Service)
2.6 Cinder (Block Storage)
2.7 Swift (Object Storage)
3 Related Work
4 Openstack Security: Analysis of Vulnerabilities
5 Conclusion
References
SVM: An Approach to Detect Illicit Transaction in the Bitcoin Network
1 Introduction
2 Methodology
2.1 Dataset Elliptic Overview
2.2 Support Vector Machine Algorithm (SVM)
2.3 Logistic Regression
2.4 Random Forest
2.5 Confusion Matrix
3 Results
3.1 Confusion Matrix of Each Algorithm
3.2 Machine Learning Metrics
4 Conclusion
References
The Security of MQTT Against the Applications Protocols for IoT
1 Introduction
2 An Overview
2.1 Definition of IoT
2.2 IoT Model Communication
3 Applications Layer Protocols
4 Comparative Study of MQTT vs the Other Protocols
5 Discussion of the Security Status of the Protocol MQTT and the Proposition of a New Approach
5.1 The Weakness of MQTT Related to Security
5.2 The New Approach to Secure the MQTT
6 Conclusion
References
A Review of Anomalies Detection Based on Association Rules Techniques
1 Introduction
2 Background
3 Review by Context
3.1 Credit Card Fraud
3.2 Network Intrusion
3.3 Other Context
4 Review by Technique
4.1 Apriori Algorithm
4.2 Fuzzy Association Rules
4.3 Pruning Technique
4.4 Other Techniques
5 Review Summary
6 Conclusion and Future Work
References
International Security Standards for Critical Oil, Gas, and Electricity Infrastructures in Smart Cities: A Survey Study
1 Introduction
2 Smart Grids
3 Industrial Control Systems
4 Critical Infrastructures
5 Investigation of International Security Standards
6 Conclusions
References
COVID-19 Pandemic Researches
A Framework for Concurrent Contact-Tracing and Digital Evidence Analysis in Heterogeneous Environments
1 Introduction
2 Background and Related Literature
2.1 Digital Forensics
2.2 Influence of Mobile Devices
2.3 Integrity of Digital "Evidence" Data
2.4 Related Work
3 Concurrent Contact-Tracing Framework for Heterogeneous Environments
3.1 CCT Framework Overview
3.2 CCT Framework Steps
3.3 Comparing the CCT Framework with Other Existing Frameworks
4 Discussions
5 Conclusion and Future Work
References
A Smart Surveillance Prototype to Ensure Respect of Social Distance During COVID-19
1 Introduction
2 Related Works
3 Methodology
3.1 Raspberry Pi
3.2 Raspberry Pi Camera
3.3 Object Detection Pretrained Models
3.4 Social Distance
4 Proposed Prototype
5 Results and Discussion
6 Conclusion and Perspectives
References
COVID-19 Patient Classification Strategy Using a Hybrid BWM-SVM Model
1 Introduction
2 Proposed Work
2.1 System Model
2.2 The First Stage: BWM
2.3 The Second Stage: Patient Classification Using SVM
3 Conclusion
References
Development of a Simulator to Model the Spread of Coronavirus Infection in a Closed Space
1 Introduction
2 Related Works
3 Methodology
3.1 Angular Framework
3.2 Perlin Noise Algorithm
3.3 Quad-Tree Algorithm
3.4 Agent-Based Model
4 Proposed Solution
4.1 Simulator Parameters
4.2 Simulator Process
5 Experimentations and Results
5.1 Preliminaries
5.2 Sample
5.3 Results and Analyses: The Impact of Contamination Ratio
6 Conclusion
References
Patient Classification Using the Hybrid AHP-CNN Approach
1 Introduction
2 Related Works
3 Proposed Work
3.1 Deep Learning
3.2 Artificial Neural Network
3.3 The Multi-criteria AHP Method
4 Our Illustrative Example and Discussion
5 Conclusion
References
Towards Automatic Diagnosis of the COVID-19 Based on Machine Learning
1 Introduction
2 Methodology
2.1 Random Forest Classifier
2.2 ExtraTrees Classifier
2.3 AdaBoost Classifier
2.4 XGBoost Classifier
3 Experiments and Results
3.1 Data Set
3.2 Data Processing
3.3 Experimental Protocol
3.4 Results Analysis
4 Conclusion
References
Internet of Things for Smart Healthcare: A Review on a Potential IOT Based System and Technologies to Control COVID-19 Pandemic
1 Introduction
2 The Three-Layer Architecture of IoT
3 Internet of Things for Digital Healthcare
4 Covid-19 Pandemic
5 Discussion
5.1 System Design
5.2 Recommended Component for Implementation
5.3 Methodology
5.4 Discussion and Results
6 Conclusion and Perspectives
References
Covid-19: Performance of E-commerce in Morocco
1 Introduction
2 Impact of Social Anxiety on Consumer Attitude
3 Effect of Economic Change on Consumer Behavior
4 Materials
5 Results
5.1 Impact of Anxiety and the Economic Change on Moroccan Consumer Behavior
5.2 Model Specification
5.3 Statistical Analysis
6 Discussion
References
Survey of Global Efforts to Fight Covid-19: Standardization, Territorial Intelligence, AI and Countries’ Experiences
1 Introduction
2 Standards, Policies and Referential
2.1 Preparedness and Quick Response
2.2 Quality Assurance and Infrastructure
2.3 Transparent Communication and Global Consensus
2.4 Experience Feedback and Knowledge Management
2.5 Continuity and Resilience
2.6 3R Resistance-Relaunch-Recovery Management
3 Territorial Intelligence
3.1 Pillars and Concepts
3.2 Opportunities for Growth
4 Applications that Integrate AI to Fight Covid-19
5 Coronavirus Fighting Using AI – Architecture
6 Countries’ Experience with Covid-19
6.1 New Zealand
6.2 South Korea
6.3 Morocco
6.4 United States
6.5 Discussion
7 Conclusion and Future Works
References
3D City Modelling and Augmented Reality
3D City Modelling Toward Conservation and Management. The Digital Documentation of Museu do Ipiranga – USP, São Paulo, Brazil
1 Introduction
1.1 Museu do Ipiranga - USP
2 Related Works
3 Methodology
3.1 Digital Documentation
3.2 Data Processing and Analysis
3.3 Medium and Long-Term Fallout
4 Conclusion
5 Credits
References
3D Documentation of Göreme Saklı Church
1 Cappadocia
1.1 Monasteries in Cappadocia
2 Göreme (Lower) Valley Monasteries
2.1 Location and History of Saklı Church
2.2 Plan Features of Saklı Church
3 Conclusions
References
Appropriateness of Using CityGML Standard Version 2.0 for Developing 3D City Model in Oman
1 Introduction
2 Related Works
3 The Need for 3D City Model in Oman
4 CityGML Standard
5 Methodology
6 Discussion and Results
7 Conclusion
References
Investigating the Effects of Population Growth and Urban Fabric on the Simulation of a 3D City Model
1 Introduction
1.1 Urbanization and Urban Growth Modelling
1.2 3D City Modelling
2 Study Area
3 Model Procedure and Methodology
3.1 SLEUTH Simulation Urban Growth
3.2 3D Representation of Prospective Urban Growth Simulation
4 3D Urban Growth Simulation
4.1 Attraction-Based Environmental Protection Scenario
4.2 Classify the Type of Building
4.3 Estimate the Population Growth
4.4 Urban Fabric Scenario
4.5 Modifying the SLEUTH Output to Create the Building Footprints
4.6 Positioning and Division of the Building Footprints
4.7 Configuration of the Building Footprints
4.8 Assembles the Polygonal Components
4.9 Building Footprints Generation and Positioning the Building Representations
4.10 The 3D Visualization of the City
5 Conclusion
References
Segmentation-Based 3D Point Cloud Classification on a Large-Scale and Indoor Semantic Segmentation Dataset
1 Introduction
2 Methodology
2.1 Segmentation of Raw Data
2.2 Feature Extraction from Segments
2.3 Classification
3 Experimental Results
3.1 Dataset
3.2 Parameter Settings
3.3 Testing the Performances of the Classifiers and the Features on the Semantic Ground-Truth Segments
3.4 Segmentation-Based Classification Results of the Raw Test Data
4 Conclusion
References
Big Data and Parallel Computing
Feature Learning of Patent Networks Using Tensor Decomposition
1 Introduction
2 Related Works
3 Methodology
3.1 Search Patent and Find Related Terms in Similar Patent
3.2 Plot Patent Network or Graphs Information
3.3 Clustering Graphs Data
3.4 Visualization and Evaluation
4 Conclusion
References
Lambda-IVR: An Indexing Framework for Video Big Data Retrieval Using Distributed In-memory Computation in the Cloud
1 Introduction
2 Related Work
3 Background
4 System Architecture
5 Example Scenario
6 Evaluation and Discussion
6.1 Experimental Setup
6.2 Evaluation
7 Conclusion
References
Parallel Computing for Multi-core Systems: Current Issues, Challenges and Perspectives
1 Introduction
2 Related Works
2.1 Graph Partitioning Models
2.2 Graph Partitioning Based Algorithms for Parallel Computing
3 Main Issues and Challenges of Parallelism
3.1 Data-Task Decomposition
3.2 Communication Cost
3.3 Load Balancing
3.4 Resource-Aware Load Balancing
4 Perspectives and Future Works
4.1 Computation-Data Decomposition
4.2 Communication Cost
4.3 Resource Allocation and Load Balancing
5 Conclusion
References
Semantic Web and Business Intelligence in Big-Data and Cloud Computing Era
1 Introduction
1.1 Related Works
2 Semantic Web and Business Intelligence
3 SW and Online Analytical Processing (OLAP)
3.1 Multidimensional Model Oriented
3.2 Analysis Oriented Approach of OLAP
4 Contextualizing Business Intelligence Analysis
5 Big Data
5.1 Big Data Integration in SW
6 Incorporating the Cloud with SW and BI
6.1 Benefits of Cloud Incorporation
7 Conclusions and Future Works
References
Video Big Data Analytics in the Cloud: Research Issues and Challenges
1 Introduction
2 Proposed L-CVAS Architecture
2.1 Video Big Data Curation Layer
2.2 Video Big Data Processing Layer
2.3 Video Big Data Mining Layer
2.4 Knowledge Curation Layer
3 Research Issues, Opportunities, and Future Directions
4 Conclusion
References
Smart Modeling Systems and Natural Language Processing
A Multimodal Memes Classification: A Survey and Open Research Issues
1 Introduction
2 Memes Classification: A Generic Architecture
2.1 Linguistic Processing Flow
2.2 Visual Processing Flow
2.3 Fusion and Pre-training: Towards Multimodality
3 State-of-the-Art on Memes Classification
3.1 Hateful Speech Classification
3.2 Multimodal Visual-Linguistic Classification
4 Research Issues, Opportunities, and Future Directions
5 Conclusion
References
Fireworks Algorithm for Solving the Fixed-Spectrum Frequency Assignment
1 Introduction
2 Frequency Assignment Problem
3 Fireworks Algorithm
3.1 Design of Fireworks Explosion
3.2 Selection of Locations
4 Solution Representation
4.1 Neighborhood Operators
5 Simulation Results and Discussion
5.1 Simulation Results
6 Conclusion
References
Intersection Modeling Using Generalized Fuzzy Graph Coloring
1 Introduction
2 Related Work
3 Preliminaries
4 The Proposed Modeling
5 Conclusion
References
Recognition of Arabic Handwritten Text by Integrating N-gram Model
1 Introduction
2 Related Work
3 Architecture of the Proposed Method
4 Words Extraction
4.1 Preprocessing
4.2 Segmentation
5 Recognition of Arabic Handwritten Words
5.1 Preprocessing
5.2 Feature Extraction
5.3 Classification of Words
6 N-gram Model
7 Experiments and Results
8 Conclusion
References
Transformation of Smart Text Processing: Emoji Classification for Arabic and Turkish Languages
1 Introduction
2 Related Works
3 Methodology
3.1 Corpus
3.2 Classification
3.3 Classification Procedure
4 Evaluation of Classification
5 Results
6 Discussion and Conclusion
References
Author Index

Lecture Notes in Networks and Systems 183

Mohamed Ben Ahmed · İsmail Rakıp Karaș · Domingos Santos · Olga Sergeyeva · Anouar Abdelhakim Boudhir
Editors

Innovations in Smart Cities Applications Volume 4 The Proceedings of the 5th International Conference on Smart City Applications

Lecture Notes in Networks and Systems Volume 183

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation (DCA), School of Electrical and Computer Engineering (FEEC), University of Campinas (UNICAMP), São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/15179

Mohamed Ben Ahmed · İsmail Rakıp Karaș · Domingos Santos · Olga Sergeyeva · Anouar Abdelhakim Boudhir

Editors

Innovations in Smart Cities Applications Volume 4: The Proceedings of the 5th International Conference on Smart City Applications

Editors
Mohamed Ben Ahmed, Computer Engineering Department, Faculty of Sciences and Techniques, Tangier, Morocco
İsmail Rakıp Karaș, Computer Engineering Department, Faculty of Engineering, Karabük University, Karabük, Turkey
Domingos Santos, CICS.NOVA - Interdisciplinary Centre of Social Sciences, Polytechnic Institute of Castelo Branco, Castelo Branco, Portugal
Olga Sergeyeva, Department of Sociology of Culture and Communication, Saint Petersburg State University, St. Petersburg, Russia
Anouar Abdelhakim Boudhir, Faculty of Sciences and Techniques, Association Méditerranéenne des Sciences et Technologies, Tangier, Morocco

ISSN 2367-3370 ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-3-030-66839-6 ISBN 978-3-030-66840-2 (eBook)
https://doi.org/10.1007/978-3-030-66840-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Nowadays, there is a tremendous trend toward the design and implementation of smart cities around the world. This trend has opened an important and significant field of research activity for academics and their partners (industry, governments, civil society, …), aiming to establish the essential and intelligent foundations for developing the active fields of the smart city, such as economy, industry, healthcare, government, society, water and energy, and more.

This edited volume, titled “Innovations in Smart Cities Applications,” aims to present scientific research and engineering applications for building the future smart cities and their different innovative applications and services. The book also has the objective of providing researchers, engineers and practitioners with an integrated vision of the problems and of outlining the contours of new subjects in smart city applications.

This edition is the fruit of the works accepted and presented at the fifth International virtual Conference on Smart City Applications (SCA2020), held on October 7–9, 2020, in Safranbolu, Turkey. It gathers original research, completed works and proposed architectures on the main topics of the conference. The scope of SCA2020 covers a variety of topics that intersect with smart cities, including geo-smart information systems, education, healthcare, economy and digital business, building and home automation, environment and agriculture, and information technologies and computer science.

The past, present and certainly future editions of this book share the same goal: constructing and building the basic and essential research, innovations and applications that can help the growth of the next generation of cities and human well-being. It is thanks to the participants and researchers who trusted this series of conferences, and to the high quality of the papers they presented, that this research area continues to grow. Our thanks are addressed to all authors for choosing SCA2020 and submitting their manuscripts. We cannot forget, on this occasion, all keynote speakers for their appreciated and rich scientific talks at the conference sessions.


We are deeply grateful to all organizing committee members, program committee members and reviewers, and to all session chairs, for their efforts and the time they spent evaluating the contributions and making this event a success. We would also like to acknowledge and thank the Springer Nature Switzerland AG staff for their support and guidance and for the edition of this book. Finally, we wish to express our sincere thanks to Dr. Thomas Ditzinger, Prof. Janusz Kacprzyk and Ms. Varsha Prabakaran for their kind support and help in promoting and developing research.

Mohamed Ben Ahmed
İsmail Rakıp Karaș
Domingos Santos
Anouar Abdelhakim Boudhir
Olga Sergeyeva

Organization

Committee

Conference Chair

İsmail Rakıp Karaș, Karabuk University, Turkey

Conference General Chairs

Mohamed Ben Ahmed, FST, Tangier UAE University, Morocco
Anouar Boudhir Abdelhakim, FST, Tangier UAE University, Morocco
Bernadetta Kwintiana Ane, University of Stuttgart, Germany

Conference Technical Program Committee Chair

Wassila Mtalaa, Luxembourg Institute of Science and Technology, Luxembourg

Workshops Chair

Domingos Santos, Polytechnic Institute Castelo Branco, Portugal

Publications Chair

Olga Sergeyeva, Saint-Petersburg University, Russia

Tutorials Chair

Senthil Kumar, Hindustan College of Arts and Science, India


Publicity and Social Media Chair

Abdellaoui Alaoui El Arbi, EIGSI, Casablanca, Morocco

Local Organizing Committee

Idris Kahraman, Karabuk University, Turkey
Emrullah Demiral, Karabuk University, Turkey
Mustafa Aksin, Karabuk University, Turkey
Kadriye Oz, Karabuk University, Turkey
Hacer Kübra Köse, Karabuk University, Turkey
Berna Gunes, Karabuk University, Turkey
Umit Atila, Karabuk University, Turkey
Kasim Ozacar, Karabuk University, Turkey
Yasin Ortakci, Karabuk University, Turkey
Muhammed Kamil Turan, Karabuk University, Turkey
Sohaib Abujayyab, Karabuk University, Turkey
Emre Yücer, Karabuk University, Turkey

Technical Program Committee

İsmail Rakıp Karaș, Karabuk University, Türkiye
Abdel-Badeeh M. Salem, Ain Shams University, Egypt
Abderrahim Ghadi, FSTT UAE, Morocco
Abdullah Elen, Bandirma Onyedi Eylül University, Türkiye
Abdullah Emin Akay, Bursa Technical University, Türkiye
Abdurrahman Geymen, Erciyes University, Türkiye
Accorsi, Riccardo, Bologna University, Italy
Adib Habbal, Karabuk University, Türkiye
Adnan Alajeeli, Karabuk University, Türkiye
Aftab Ahmed Khan, Karakoram International University, Pakistan
Ahmad S. Almogren, King Saud University, Saudi Arabia
Ahmed Kadhim Hussein, Babylon University, Iraq
Alabdulkarim Lamya, King Saud University, Saudi Arabia
Alghamdi Jarallah, Prince Sultan University, Saudi Arabia
Ali Jamali, Universiti Teknologi Malaysia
Alias Abdul Rahman, Universiti Teknologi Malaysia
Anabtawi Mahasen, Al-Quds University, Palestine
Anton Yudhana, Universitas Ahmad Dahlan, Indonesia
Arif Çağdaş Aydinoglu, Gebze Technical University, Türkiye
Arioua Mounir, UAE, Morocco
Assaghir Zainab, Lebanese University, Lebanon
Astitou Abdelali, UAE, Morocco
Aydın Üstün, Kocaeli University, Türkiye
Aziz Mahboub, FSTT UAE, Morocco

Bahadır Ergun, Gebze Technical University, Türkiye
Barış Kazar, Oracle, USA
Bataev Vladimir, Zaz Ventures, Switzerland
Behnam Alizadehashrafi, Tabriz Islamic Art University, Iran
Behnam Atazadeh, University of Melbourne, Australia
Ben Yahya Sadok, Faculty of Sciences of Tunis, Tunisia
Bessai-Mechmach Fatma Zohra, CERIST, Algeria
Biswajeet Pradhan, University of Technology Sydney, Australia
Berk Anbaroğlu, Hacettepe University, Türkiye
Bolulmalf Mohammed, UIR, Morocco
Boutejdar Ahmed, German Research Foundation, Bonn, Germany
Burhan Selcuk, Karabuk University, Türkiye
Bulent Bayram, Yildiz Technical University, Türkiye
Caner Ozcan, Karabuk University, Türkiye
Caner Güney, Istanbul Technical University, Türkiye
Chadli Lala Saadia, University Sultan Moulay Slimane, Morocco
Cumhur Şahin, Gebze Technical University, Türkiye
Damir Žarko, Zagreb University, Croatia
Dominique Groux, UPJV, France
Dousset Bernard, UPS Toulouse, France
Edward Duncan, The University of Mines and Technology, Ghana
Eehab Hamzi Hijazi, An-Najah University, Palestine
El Kafhali Said, Hassan 1st University, Settat, Morocco
Eftal Şehirli, Karabuk University, Türkiye
El Malahi Mostafa, USMBA University, Fez, Morocco
El Mhouti Abderrahim, FST, Al-Hoceima, Morocco
El Haddadi Anass, UAE University, Morocco
El Hebeary Mohamed Rashad, Cairo University, Egypt
El Ouarghi Hossain, ENSAH UAE University, Morocco
Elif Sertel, Istanbul Technical University, Türkiye
Emre Yücer, Karabuk University, Türkiye
Emrullah Sonuç, Karabuk University, Türkiye
En-Naimi El Mokhtar, UAE, Morocco
Enrique Arias, Castilla-La Mancha University, Spain
Tolga Ensari, Istanbul University, Türkiye
Ferhat Atasoy, Karabuk University, Türkiye
Filip Biljecki, National University of Singapore
Francesc Anton Castro, Technical University of Denmark
Ghulam Ali Mallah, Shah Abdullatif University, Pakistan
Habibullah Abbasi, University of Sindh, Pakistan
Haddadi Kamel, IEMN, Lille University, France
Hakan Kutucu, Karabuk University, Türkiye
Hanane Reddad, USMS University, Morocco
Hazim Tawfik, Cairo University, Egypt

x

Huseyin Bayraktar Hüseyin Pehlivan Huseyin Topan Huseyin Zahit Selvi İlhami Muharrem Orak Ilker Türker Iman Elawady Indubhushan Patnaikuni Ismail Büyüksalih Ivin Amri Musliman J. Amudhavel Jaime Lioret Mauri Jus Kocijan Kadir Ulutaş Kasım Ozacar Khoudeir Majdi Labib Arafeh Laila Moussaid Lalam Mustapha Loncaric Sven Lotfi Elaachak Mademlis Christos Mehmet Akbaba Mete Celik Miranda Serge Mohamed El Ghami Mohammad Sharifikia Mousannif Hajar Mufit Cetin Muhamad Uznir Ujang Muhammad Imzan Hassan Muhammed Kamil Turan Murat Yakar Murat Lüy Mustafa Akgul My Lahcen Hasnaoui Mykola Kozlenko Nafil Khalid Nesrin Aydin Atasoy Nusret Demir Oğuz Fındık Oğuzhan Menemencioğlu

Organization

General Directorate of GIS, Türkiye Gebze Technical University, Türkiye Bulent Ecevit University, Türkiye Konya Necmettin Erbakan University Karabuk University, Türkiye Karabuk University, Türkiye Ecole Nationale Polytechnique d’Oran, Algeria RMIT—Royal Melbourne Institute of Technology, Australia Bimtaş A.Ş., Türkiye Universiti Teknologi Malaysia VIT Bhopal University, Madhya Pradesh, India Polytechnic University of Valencia, Spain Nova Gorica University, Slovenia Karabuk University, Türkiye Karabuk University, Türkiye IUT Poitiers University, France Al-Quds University, Palestine ENSEM, Casablanca, Morocco Mouloud Mammeri University of Tizi Ouzou, Algeria Zagreb University, Croatia FSTT, UAE, Morocco Aristotle University of Thessaloniki, Greece Karabuk University, Türkiye Erciyes University, Türkiye Nice University, France University of Bergen, Norway Tarbiat Modares University, Iran Cadi Ayyad University, Morocco Yalova University, Türkiye Universiti Teknologi Malaysia Universiti Teknologi Malaysia Karabuk University, Türkiye Mersin University, Türkiye Kırıkkale University, Türkiye Istanbul University, Türkiye Moulay Ismail University, Morocco Vasyl Stefanyk Precarpathian National University, Ukraine UM5, Morocco Karabuk University, Türkiye Akdeniz University, Türkiye Karabuk University, Türkiye Karabuk University, Türkiye

Organization

Omar Dakkak Omer Muhammet Soysal Ouederni Meriem R. S. Ajin Rani El Meouche Raif Bayır Rafet Durgut Saffet Erdogan Sagahyroon Assim Saied Pirasteh Savas Durduran Sedat Bakici Sibel Senan Senthil Kumar Serdar Bayburt Seyit Ali Kayış Siddique Ullah Baig Slimani Yahya Sohaib Abujayyab Sonja Grgić Sri Winiarti Suhaibah Azri Sunardi Sule Erten Ela Tebibel Bouabana Thouraya Umit Atila Umit Isikdag Umran Koylu Xiaoguang Yue Yasin Ortakcı Yasyn Elyusufi Yüksel Çelik Youness Dehbi Yusuf Arayıcı Yusuf Yargı Baydilli Zafer Albayrak Zennure Uçar Zigh Ehlem Slimane Zouhri Amal

xi

Karabuk University, Türkiye Southeastern Louisiana University, USA INP - ENSEEIHT Toulouse, France DEOC DDMA, Kerala, India Ecole Spéciale des Travaux Publics, France Karabuk University, Türkiye Karabuk University, Türkiye Harran University, Türkiye American University of Sharjah, United Arab Emirates University of Waterloo, Canada Konya Necmettin Erbakan University, Türkiye Turkish Cadastre Office, Türkiye Istanbul University, Türkiye Hindustan College of Arts and Science, India Bimtaş A.Ş., Türkiye Karabuk University, Türkiye COMSATS Institute of Information Technology, Pakistan Manouba University, Tunisia Karabuk University, Türkiye Zagreb University, Croatia Universitas Ahmad Dahlan, Indonesia Universiti Teknologi Malaysia Universitas Ahmad Dahlan, Indonesia Ege University, Türkiye ESI, Alger, Algeria Karabuk University, Türkiye Mimar Sinan Fine Arts University, Türkiye Erciyes University, Türkiye International Engineering and Technology Institute, Hong Kong Karabuk University, Türkiye FSTT, UAE, Morocco Karabuk University, Türkiye University of Bonn, Germany Northumbria University, UK Karabuk University, Türkiye Karabuk University, Türkiye Düzce University, Türkiye INTTIC, Oran, Algeria USMBA University, Fez, Morocco

Keynote Speakers

Smart Cities and Geo-Spatial Technologies

Professor Alias Abdul-Rahman is an Associate Professor at the Department of Geoinformatics, Faculty of Geoinformation Science and Engineering, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia. He holds a PhD from the University of Glasgow, UK; he has headed up the 3D GIS Research Laboratory since 2000. His research interests include 3D GIS, 3D city modeling and 3D spatial databases.


Smart Cities and Energy

Professor Şule Erten Ela has been Head of the Energy Department since 2016 and, since 2019, Director of Solar Energy at Ege University. She is currently a Professor at the Ege University Institute of Solar Energy, Department of Energy, where she was appointed Assistant Professor in 2005, Associate Professor in 2009, and Full Professor in 2015. She received the Turkish Academy of Sciences Outstanding Young Scientist Award (TUBA-GEBIP) in 2013, the UNESCO-L'Oréal Young Woman Scientist Award in 2014, and the Young Scientist Award of The Science Academy (BAGEP) in 2015. She upholds principles such as academic integrity, academic honesty, academic freedom, and academic merit.


Deep Learning Applications for Shoreline Extraction from Landsat and Sentinel Satellite Imagery

Professor Bulent Bayram has been a Professor at the Department of Geomatics Engineering, Yildiz Technical University, since 2014, where he is Head of the Photogrammetry Subdivision. His research focuses on image processing, medical image processing, deep learning, UAVs, and LiDAR. His areas of expertise include spatial analysis, remote sensing, deep learning, photogrammetry, programming languages, 3D modelling, image processing, natural disasters, laser scanning, mammography, and cardiac CT.


Review Use of Modern Technologies in creating Smart Cities

Prof. Attaullah Shah has 31 years of senior/executive-level experience in academic leadership, strategic planning, resource mobilization, teaching, and research in universities and public-sector organizations. He has authored five books in the fields of civil engineering, construction project management, engineering economics, disaster management, and high-strength concrete, as well as more than 100 research papers in renowned peer-reviewed international journals and the proceedings of international conferences worldwide.


Contents

Smart Citizenship and Sentiment Analysis

Temporal Sentiment Analysis of Socially Important Locations of Social Media Users . . . 3
Alper Ecemiş, Ahmet Şakir Dokuz, and Mete Celik

A New Sentiment Analysis System of Climate Change for Smart City Governance Based on Deep Learning . . . 17
Mustapha Lydiri, Yousef El Mourabit, and Youssef El Habouz

A Novel Approach of Community Detection Using Association Rules Learning: Application to User's Friendships of Online Social Networks . . . 29
Mohamed El-Moussaoui, Mohamed Hanine, Ali Kartit, and Tarik Agouti

Arabic Sentiment Analysis Based on 1-D Convolutional Neural Network . . . 44
Bensalah Nouhaila, Ayad Habib, Adib Abdellah, and Ibn El Farouk Abdelhamid

CMA-EV: A Context Management Architecture Extended by Event and Variability Management . . . 56
Zineb Aarab, Asmae El ghazi, Rajaa Saidi, and Moulay Driss Rahmani

Electronic Public Services in the AI Era . . . 70
Hajar Hadi, Ibtissam Elhassani, and Souhayl Sekkat

Planning and Designing Smart Sustainable Cities: The Centrality of Citizens Engagement . . . 83
Sukaina Al-Nasrawi

Review of Learning-Based Techniques of Sentiment Analysis for Security Purposes . . . 96
Mohammed Boukabous and Mostafa Azizi


Sentiment Analysis. A Comparative of Machine Learning and Fuzzy Logic in the Study Case of Brexit Sentiment on Social Media . . . . . . . . 110 Ihab Moudhich, Soumaya Loukili, and Abdelhadi Fennan Sentiment Analysis and Opinion Mining Using Deep Learning for the Reviews on Google Play . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Sercan Sari and Murat Kalender Smart Guest Virtual Assistant with Automatic Guest Registration . . . . 138 Mohammed Hussain, Abdullah Hussein, and Mohamed Basel AlMourad Topic Modeling and Sentiment Analysis with LDA and NMF on Moroccan Tweets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Nassera Habbat, Houda Anoun, and Larbi Hassouni Smart Education and Intelligent Learning Systems A Deep Learning Model for an Intelligent Chat Bot System: An Application to E-Learning Domain . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Ben Ahmed Mohamed, Boudhir Anouar Abdelhakim, and Saadna Youness E-learning at the Service of Professionalization in Higher Education in Morocco: The Case of MOOCs of the Maroc Université Numérique Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Nadia Elouesdadi and Sara Rochdi Methodology to Develop Serious Games for Primary Schools . . . . . . . . 195 Younes Alaoui, Lotfi El Achaak, and Mohammed Bouhorma Methods and Software Tools for Automated Synthesis of Adaptive Learning Trajectory in Intelligent Online Learning Management Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Mariia Dutchak, Mykola Kozlenko, Ihor Lazarovych, Nadiia Lazarovych, Mykola Pikuliak, and Ivan Savka National University of Uzbekistan on the Way to the Smart University . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 A. Karimkhodjaev and M. Nishonov Smart Pedagogical Knowledge Management Model for Higher Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 Meriyem Chergui, Aziza Chakir, Hajar Mansouri, and Adil Sayouti The Recommendation of a Practical Guide for Doctoral Students Using Recommendation System Algorithms in the Education Field . . . . 240 Oumaima Stitini, Soulaimane Kaloun, and Omar Bencharef Virtual Reality–Enhanced Soft and Hard Skills Development Environment for Higher Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 Abid Abdelouahab


A Comparative Study Between K-Nearest Neighbors and K-Means Clustering Techniques of Collaborative Filtering in e-Learning Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268 Rajae Zriaa and Said Amali Competence and Lean Management, a Literature Review . . . . . . . . . . . 283 Wafae Qjane and Abderazzak Boumane Smart Mobility and Intelligent Infrastructures A New Distributed Strategy to Find a Bi-objective Optimal Path Toward a Parking in the City . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301 Khaoula Hassoune and Mehdi Hassoune A Novel Mobile CrowdSensing Architecture for Road Safety . . . . . . . . 311 Wahiba Abou-zbiba, Hajar El Gadi, Hanan El Bakkali, Houda Benbrahim, and Driss Benhaddou An Agent-Based Architecture for Multi-modal Transportation Using Prometheus Methodology Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 Jihane Larioui and Abdeltif El Byed An Overview of Real-Time Traffic Sign Detection and Classification . . . 344 Youssef Taki and Elmoukhtar Zemmouri Classification of the Driver’s Emotions Using a Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357 Abdelfettah Soultana, Faouzia Benabbou, and Nawal Sael Deep Learning Based Driver’s Fatigue Detection Framework . . . . . . . . 370 Zakaria Boucetta, Abdelaziz El Fazziki, and Mohamed El Adnani DSRC vs LTE V2X for Autonomous Vehicle Connectivity . . . . . . . . . . 381 Kawtar Jellid and Tomader Mazri Dynamic on Demand Responsive Transport with Time-Dependent Customer Load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 Sonia Nasri, Hend Bouziri, and Wassila Aggoune-Mtalaa Encryption Issues in Traffic Control Systems in Smart Cities and Traffic Signal Control Optimization . . . . . . . . . . . . . . . . . . . . . . . . 410 Diedon Bujari and Erke Aribas Evolutionary Heuristic for Avoiding Traffic Jams in Road Network Using A* Search Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423 Safa Belhaous, Soumia Chokri, Sohaib Baroud, Khalid Bentaleb, and Mohammed Mestari Geometric Feature Extraction of Road from UAV Based Point Cloud Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435 Mustafa Zeybek and Serkan Biçici


Parking Availability Prediction in Smart City . . . . . . . . . . . . . . . . . . . . 450 El Arbi Abdellaoui Alaoui and Stephane Cedric Koumetio Tekouabou Smart Infrastructure and Integrated Platform of Smart and Sustainable Cities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463 Maysoun Ibrahim Study to Reduce the Costs of International Trade Operations Through Container Traffic in a Smart Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477 Ouail El Imrani The Global Performance of a Service Supply Chain: A Simulation-Optimization Under Arena . . . . . . . . . . . . . . . . . . . . . . . . 489 Badr Bentalha, Aziz Hmioui, and Lhoussaine Alla Traffic Signs Detection and Recognition System in Snowy Environment Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 Hamou Chehri, Abdellah Chehri, and Rachid Saadane Smart Healthcare A Machine Learning Approach for Initial Screening of Polycystic Ovarian Syndrome (PCOS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517 Joshua Rei Jaralba, Renann Baldovino, and Homer Co Patient Transport and Mobile Health Workforce: Framework and Research Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . 530 Yosra Lahmer, Hend Bouziri, and Wassila Aggoune-Mtalaa Semantic Web and Healthcare System in IoT Enabled Smart Cities . . . 546 Barakat A. Dawood and Melike Sah Skin Cancer Prediction and Diagnosis Using Convolutional Neural Network (CNN) Deep Learning Algorithm . . . . . . . . . . . . . . . . . . . . . . . 558 Hajar Mousannif, Hiba Asri, Mohamed Mansoura, Anas Mourahhib, Yassine Isaouy, and Mouad Marmouchi Smart Earth Environment and Agriculture Climate-Smart Landscapes for Sustainable Cities . . . . . . . . . . . . . . . . . 571 Canan Cengiz, Bülent Cengiz, and Aybüke Özge Boz Diversity and Seasonal Occurrence of Sand Flies and the Impact of Climatic Change in Aichoune Locality, Central Morocco . . . . . . . . . 583 Fatima Zahra Talbi, Mohamed Najy, Mouhcine Fadil, Nordine Nouayti, and Abdelhakim El Ouali Lalami


Environmental Challenges of Solid Waste Management in Moroccan Cities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594 A. El Atmani, H. Chiguer, I. Belhaili, S. Aitsi, D. Elkhachine, K. Elkharrim, El Borjy Aziz, and Belghyti Driss Evaluation of the Purification Performance of the WWTP by Aered Lagunage of the City of Oujda (Morocco) . . . . . . . . . . . . . . . . . . . . . . . 607 Belhaili Isslam, Alemad Ali, Aissati Touria, Elatmani Ayoub, Elkharrim Khadija, and Belghyti Driss IoTree: A Way Towards Smart Plantation . . . . . . . . . . . . . . . . . . . . . . 620 Surayya Obaid, Hiba Binte Tariq, Tehreem Qamar, Aimun Tahir, and Namrah Komal Physico-Chemical and Mineralogical Characterization of Urban Sludge from the Tamuda Bay Tetouan Treatment Plant . . . . . . . . . . . . 632 Douae El Khachine, Belhaili Isslam, Ait-Si Salah, El Atmani Ayoub, El Kharrim Khadija, and Belghyti Driss Study of Climate Change During the Period (2009–2018) in the Region of Sidi Slimane, - Morocco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646 Salah Aitsi, Donia Bassir, Ayoube Elatmani, Ahmed Chabli, and Driss Belghyti Typological Study of the Water of the Boufekrane River (Meknes, Morocco): Principal Component Analysis and Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659 Imane Taha, Abdelkader Chahlaoui, Mustapha Samih, Driss Bengoumi, Aziz Taouraout, Khadija Ouarrak, and Rachid Sammoudi Smart Economy and Smart Factory A Framework of Integrating Condition Based Maintenance Programs and Wireless Sensor Network in The Context of Industry 4.0 . . . . . . . . 675 Sadiki Soukaina, Driss Amegouz, and Said Boutahari A New Artificial Intelligence-Based Strategy for Digital Marketing Reinforcement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689 Mouna Boujrad and Yasser el Mazoui Nadori lamlili An Efficient Collaborative Filtering and Graph Approach for Business-Matching Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700 Anas Sabbani, Anass El Haddadi, and Hayat Routaib Assessment of Blockchain Technology Adoption Factors and Scenarios Within the Economy of Latvia . . . . . . . . . . . . . . . . . . . . 714 Natalija Kostrikova


Data Mining and Machine Learning Techniques Applied to Digital Marketing Domain Needs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730 Sara Ahsain and M’hamed Ait Kbir Leveraging Dynamicity and Process Mining in Ad-Hoc Business Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 Zineb Lamghari, Rajaa Saidi, Maryam Radgui, and Moulay Driss Rahmani Modeling the Use of RFID Technology in Smart Processes . . . . . . . . . . 758 Ihsane Abouzid and Rajaa Saidi Real Time Release Approach: At-Line Prediction of Ascorbic Acid Concentration in Nutraceutical Syrup via Artificial Neural Network . . . 770 Mikhael Anthony Felipe and Renann Baldovino Smart Recognition Systems and Multimedia Processing Convolutional Neural Network for Identifying Human Emotions with Different Head Poses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785 Wafa Mellouk and Wahida Handouzi Deep Learning-Based 3D Face Recognition Using Derived Features from Point Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797 Muhammed Enes Atik and Zaide Duran Face Sketch Recognition: Gender Classification Using Eyebrow Features and Bayes Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 809 Khalid Ounachad, Mohamed Oualla, and Abdelalim Sadiq Fall Detection for Pedestrians in Video-Surveillance . . . . . . . . . . . . . . . 820 Wassima Aitfares Hand Pose Estimation Based on Deep Learning . . . . . . . . . . . . . . . . . . 835 Marwane Bellahcen, El Arbi Abdellaoui Alaoui, and Stéphane Cédric Koumétio Tékouabou Static and Dynamic Hand Gesture Recognition System Using Contourlet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 844 Roumiassa Ferhat, Fatma Zohra Chelali, and Salah eddine Agab Video Activity Recognition Based on Objects Detection Using Recurrent Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 855 Mounir Boudmagh, Mohammed Redjimi, and Adlen Kerboua A Comparative Study Between the Most Usable Object Detection Methods Based on Deep Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867 Ayyoub Fakhari, Mohamed Lazaar, and Hicham Omara


Performance Analyses of AES and 3DES Algorithms for Encryption of Satellite Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 877 Yasin Ortakci and Mohammed Yaseen Abdullah A Survey on Deep Learning-Based Approaches to Estimation of 3D Human Pose and Shape from Images for the Smart Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 891 Sh. Maleki Arasi and E. Seyedkazemi Ardebili Smart Devices and Softwares A 3-DOF Cable-Driven Robotic Ankle Rehabilitation Device . . . . . . . . . 919 Romel S. Saysay, Nicanor R. Roxas Jr., Nilo T. Bugtai, Homer S. Co, and Renann G. Baldovino A Cognitive Radio Spectrum Sensing Implementation Based on Deep Learning and Real Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 930 Mohamed Saber, Abdellah Chehri, Abdessamad El Rharras, Rachid Saadane, and Mohammed Wahbi Appliance-Level Monitoring with Micro-Moment Smart Plugs . . . . . . . 942 Abdullah Alsalemi, Yassine Himeur, Faycal Bensaali, and Abbes Amira Backhaul Networks and TV White Spaces (TVWS) with Implementation Challenges in 5G: A Review . . . . . . . . . . . . . . . . . . . . . 954 Teena Sharma, Abdellah Chehri, Paul Fortier, and Rachid Saadane Comparative Study via Three MPPT Techniques Methods for PV Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 966 Mohamed Chouiekh, Amine Lilane, Karim Benkirane, Mohamed Abid, and Dennoun Saifaoui Design and Realization of an IoT Prototype for Location Remote Monitoring via a Web Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 977 S. M. H. Irid, M. Hadjila, H. E. Adardour, and I. Y. Nouali Design of a New CP Microstrip Patch Antennas for WPT to a UAV at 5.8 GHz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 992 Salah Ihlou, Hafid Tizyi, Abdelmajid Bakkali, Ahmed El Abbassi, and Jaouad Foshi Design of Folded Dipole with Double U Shaped Slot UHF RFID Tag Using Genetic Algorithm Optimization for Healthcare Sensing Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1003 Ibtissame Bouhassoune, Hasna Chaibi, Abdellah Chehri, Rachid Saadane, and Khalid Minaoui Flexible Query Systems for Relational Databases . . . . . . . . . . . . . . . . . . 1015 Rachid Mama, Mustapha Machkour, Mourad Ennaji, and Karam Ahkouk


Introduction to Integrate the Cellular Automata Concept Within the Internet of Things: The Use of the Dynamic Management of Bridge Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1030 Fatima Zahra Chafi and Youssef Fakhri Investigation of Ultrasonic Opacity Based on Quarter-Wave Mode Resonance Using a Two-Dimensional Silicon Phononic Crystal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1044 A. Elmadani, S. Bensallam, M. Idrissi, M. Addouche, A. Elayouch, A. Khelif, A. Bouaaddi, Y. Achaoui, and H. Jakjoud Numerical Simulation of HDPE Behavior Under V-Notch . . . . . . . . . . . 1051 Rabiaa Elkori, Abdelilah Hachim, Khalid Elhad, and Amal Laamarti Operating Models of Network Protocols IoT: Long-Range Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059 Sakina Elhadi, Abdelaziz Marzak, and Nawal Sael Recent Trends in Green Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1071 Surayya Obaid, Narmeen Bawany, Hiba binte Tariq, Aimun Tahir, and Namrah Komal Smart Security Design Challenges and Assessment of Modern Web Applications Intrusion Detection and Prevention Systems (IDPS) . . . . . . . . . . . . . . . . 1087 Yassine Sadqi and Manal Mekkaoui Hardware Trojan Detection in Heterogeneous Systems on Chip . . . . . . 1105 Billel Guechi and Mohammed Redjimi How Much Your Cloud Management Platform Is Secure? OpenStack Use Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1117 Najat Tissir, Said ElKafhali, and Noureddine Aboutabit SVM: An Approach to Detect Illicit Transaction in the Bitcoin Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1130 Abdelaziz Elbaghdadi, Soufiane Mezroui, and Ahmed El Oualkadi The Security of MQTT Against the Applications Protocols for IoT . . . . 1142 Imane Sahmi, Tomader Mazri, and Nabil Hmina A Review of Anomalies Detection Based on Association Rules Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1155 Imane Sadgali, Nawal Sael, and Faouzia Benabbou International Security Standards for Critical Oil, Gas, and Electricity Infrastructures in Smart Cities: A Survey Study . . . . . . . . . . . . . . . . . . 1167 Cevat Özarpa, Muhammed Ali Aydin, and Isa Avci


COVID-19 Pandemic Researches A Framework for Concurrent Contact-Tracing and Digital Evidence Analysis in Heterogeneous Environments . . . . . . . . . . . . . . . . . . . . . . . . 1183 Stacey O. Baror, H. S. Venter, and Victor R. Kebande A Smart Surveillance Prototype Ensures the Respect of Social Distance During COVID19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1197 Ikram Ben abdel ouahab, Lotfi Elaachak, Fatiha Elouaai, and Mohammed Bouhorma COVID-19 Patient Classification Strategy Using a Hybrid BWM-SVM Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1210 Samira Achki and Layla Aziz Development of a Simulator to Model the Spread of Coronavirus Infection in a Closed Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1220 Mohamed Almechkor, Lotfi El Aachak, Fatiha Elouaai, and Mohammed Bouhorma Patient Classification Using the Hybrid AHP-CNN Approach . . . . . . . . 1231 Layla Aziz and Samira Achki Towards Automatic Diagnosis of the COVID-19 Based on Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1244 El Arbi Abdellaoui Alaoui, Stephane Cedric Koumetio Tekouabou, Ismail Ougamane, and Imane Chabbar Internet of Things for Smart Healthcare: A Review on a Potential IOT Based System and Technologies to Control COVID-19 Pandemic . . . . . 1256 M. Ennafiri and T. Mazri Covid -19: Performance of e-commerce in Morocco . . . . . . . . . . . . . . . . 1270 Asmaa Abyre, Zineb Jibraili, and Hajar Anouar Survey of Global Efforts to Fight Covid-19: Standardization, Territorial Intelligence, AI and Countries’ Experiences . . . . . . . . . . . . . 1282 Boudanga Zineb, Mezzour Ghita, and Benhadou Siham 3D City Modelling and Augmented Reality 3D City Modelling Toward Conservation and Management. The Digital Documentation of Museu do Ipiranga – USP, San Paulo, Brazil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1299 M. Balzani, L. Rossato, F. Raco, and B. Mugayar Kühl 3D Documentation of Göreme Saklı Church . . . . . . . . . . . . . . . . . . . . . 1317 Sümeyye Ertürk and Leyla Kaderli


Appropriateness of Using CityGML Standard Version 2.0 for Developing 3D City Model in Oman . . . . . . . . . . . . . . . . . . . . . . . . . 1332 Khalid Al Kalbani and Alias Bin Abdul Rahman Investigating the Effects of Population Growth and Urban Fabric on the Simulation of a 3D City Model . . . . . . . . . . . . . . . . . . . . . . . . . . 1344 Rani El Meouche, Mojtaba Eslahi, and Anne Ruas Segmentation-Based 3D Point Cloud Classification on a Large-Scale and Indoor Semantic Segmentation Dataset . . . . . . . . . . . . . . . . . . . . . . 1359 Ali Saglam and Nurdan Akhan Baykan Big Data and Parallel Computing Feature Learning of Patent Networks Using Tensor Decomposition . . . . 1375 Mohamed Maskittou, Anass El Haddadi, and Hayat Routaib Lambda-IVR: An Indexing Framework for Video Big Data Retrieval Using Distributed In-memory Computation in the Cloud . . . . . . . . . . . . 1391 Muhammad Numan Khan, Aftab Alam, Tariq Habib Afridi, Shah Khalid, and Young-Koo Lee Parallel Computing for Multi-core Systems: Current Issues, Challenges and Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1405 Soumia Chokri, Sohaib Baroud, Safa Belhaous, and Mohammed Mestari Semantic Web and Business Intelligence in Big-Data and Cloud Computing Era . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1418 Adedoyin A. Hussain, Fadi Al-Turjman, and Melike Sah Video Big Data Analytics in the Cloud: Research Issues and Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1433 Aftab Alam, Shah Khalid, Muhammad Numan Khan, Tariq Habib Afridi, Irfan Ullah, and Young-Koo Lee Smart Modeling Systems and Natural Language Processing A Multimodal Memes Classification: A Survey and Open Research Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1451 Tariq Habib Afridi, Aftab Alam, Muhammad Numan Khan, Jawad Khan, and Young-Koo Lee Fireworks Algorithm for Solving the Fixed-Spectrum Frequency Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1467 Mohamed El Bouti, Raouan El Ghazi, Lamia Benameur, and Alami Chentoufi Jihane Intersection Modeling Using Generalized Fuzzy Graph Coloring . . . . . . 1479 Sidina Boudaakat, Mohamed Amine Basmassi, Ahmed Rebbani, Jihane Alami Chentoufi, Lamia Benameur, and Omar Bouattane


Recognition of Arabic Handwritten Text by Integrating N-gram Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1490 Asmae Lamsaf, Mounir Aitkerroum, Siham Boulaknadel, and Youssef Fakhri Transformation of Smart Text Processing: Emoji Classification for Arabic and Turkish Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1503 Ismail Burak Parlak, Séverine Dubuisson, Çağatay Ünal Yurtöz, Soukaina El Majdoubi, Chaymae Harchli, and Maha Lazrak Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1519

Smart Citizenship and Sentiment Analysis

Temporal Sentiment Analysis of Socially Important Locations of Social Media Users

Alper Ecemiş1(B), Ahmet Şakir Dokuz1, and Mete Celik2

1 Department of Computer Engineering, Nigde Omer Halisdemir University, 51240 Niğde, Turkey
{ecemisalper,adokuz}@ohu.edu.tr
2 Department of Computer Engineering, Erciyes University, 38039 Kayseri, Turkey
[email protected]

Abstract. Socially important locations are the places that are frequently visited by social media users. Temporal sentiment analysis of socially important locations is the process of interpreting and classifying the emotions within users' sharings at their socially important locations over time. Observing the temporal sentiment changes at these locations helps both to examine the change of emotion at the locations and to understand the thoughts of the social media users there. In this paper, Twitter is selected as the social media data source and the socially important locations of social media users are analyzed in different time frames. For this analysis, a method called Temporal Sentiment Analysis of Socially Important Locations (TS-SIL) is proposed. In this method, first, socially important locations are discovered from the collected Twitter dataset. Then, sentiment analysis is performed using a dictionary-based approach and several machine learning algorithms. Finally, the sharings at the locations are listed and the sentiments at these locations are analyzed on a daily, weekly, and monthly basis. As a result, the socially important locations of the city of Istanbul are discovered and the temporal sentiment analysis of these locations is performed. The results show that all of the socially important locations of İstanbul, except Beşiktaş Fish Market, exhibit emotional fluctuations over time. Keywords: Temporal sentiment analysis · Sentiment at socially important locations · Social media mining · Twitter

1 Introduction

Socially important locations are the locations that social media users frequently visit during their social media usage [1, 2]. Discovering socially important locations provides valuable information about the spatial preferences of social media users and the important places of a social media user group [3]. Discovering socially important locations could be beneficial for several application areas, such as city management, user community discovery, and the discovery of the popular places of social media user groups. However, the spatial aspect alone may not provide enough information about the sentiment of a social media user group at socially important locations over time.


The purpose of sentiment analysis is to reveal the attitude of a writer or speaker on a subject in a text [4]. Discovering the sentiment at socially important locations provides information about the opinions and thoughts of social media users at these locations. This information can be used in many different areas, such as spatial recommender systems, user opinions about a product in e-commerce, target audience analysis, and election marketing. Temporal sentiment analysis refers to the analysis of the change of sentiments over time periods [5]. Analyzing the temporal sentiment of socially important locations provides insights into how the emotions of social media users at these locations change.

However, the discovery of temporal sentiments at socially important locations poses several challenges. First, social media text messages are short, which makes it difficult to extract sentiment from them. Second, social media datasets are unstructured and have no specific format, and therefore require preprocessing. Third, the number of messages of users changes as their locations change over time. Fourth, it is difficult to develop algorithms to analyze temporal sentiments of socially important locations, since social media datasets are unstructured and their size grows drastically over time.

In the literature, most studies on semantic text mining and temporal analysis focus on sentiment analysis and polarity detection of texts, and only a few deal with social media text messages, which are unstructured and short. Studies on spatio-temporal and temporal sentiment analysis address the detection of target audience emotion change, the analysis of the temporal change of the locations with the highest polarity in a particular region, and the temporal analysis of the semantic change of brands.

This study focuses on the temporal sentiment analysis of socially important locations of social media users. For this purpose, Twitter is used as the social media data source and tweets shared from the city of Istanbul are collected. A method called Temporal Sentiment Analysis of Socially Important Locations (TS-SIL) is proposed. In this method, first, the Socio-Spatially Important Locations Mining (SS-ILM) algorithm [1] is used to discover the socially important locations of users. Then, sentiment analysis is performed on the socially important locations using machine learning algorithms and a dictionary-based approach. Finally, the messages at these locations are sorted temporally and analyzed according to pre-defined time frames. The results show that all of the socially important locations of İstanbul, except Beşiktaş Fish Market, exhibit emotional fluctuations over time.

The rest of this paper is organized as follows. Section 2 presents the literature review, Sect. 3 presents the proposed temporal sentiment analysis method, Sect. 4 presents the experimental evaluation, and Sect. 5 concludes the paper.

2 Literature Review

In the literature, several studies have been performed on the temporal sentiment analysis of social media datasets. The main applications of temporal sentiment analysis are election analysis, brand analysis, and product feedback analysis.


In the study of Medagoda and Shanmuganathan [6], several events, such as the election held between 19 November and 20 December 2014, were analyzed with sentiment classification and keyword clustering methods. The temporal change of sentiment was also examined, revealing interesting information about the change of people's opinions during the election process. In the study of Das et al. [7], it was assumed that the sub-relationships between events within the scope of the TempEval 2007 challenge could be related to the emotions in two consecutive sentences; the conditional random field (CRF) machine learning method was used and the results were visualized with emotion tracking. In the study of Xia and Song [8], a temporal sentiment analysis of locations at the Curtin University Bentley campus was carried out using Twitter data. As a result, it was stated that the highest positive polarity in the region belongs to the faculty of social sciences, while the highest negative polarity was found in the tweets sent from the engineering faculty and dormitory areas. In the study of Rill et al. [9], a system called PoliTwi was proposed, which performs early detection of political topics arising on Twitter. In that study, 4 million tweets were collected before the German parliamentary elections between April and September 2013 and analyzed with PoliTwi. PoliTwi was compared with Google search trends and it was shown that PoliTwi reveals the topics earlier.

In the study of Cho et al. [10], tweets of the Korean region were examined, tweet contents were investigated by sentiment analysis, and links between brands and words were discovered. A dictionary-based structure was created for sentiment analysis and classification was performed using support vector machine (SVM) and naive Bayes algorithms. The temporal change of sentiment in the tweets about brands was examined and the change of brand awareness over time was tracked [10]. In the study of Park et al. [11], textual content on dark web forums was analyzed with natural language processing and emotion analysis to reveal radical trends; it was also stated that spatio-temporal analysis of the collective temporal change of ideas in the forums could serve as an indicator of terrorism. In the study of Fukuhara et al. [5], the graphical representation of temporal sentiment analysis was discussed and sample results obtained by applying the graphics to news articles were explained. In the study of Paul et al. [12], the temporal change of emotions during the 2016 US elections was revealed through Twitter; it was also stated that the method used has the potential to provide solutions to many social problems, such as neighborhood happiness and health indicators. The study of Ecemis et al. [13] conducted sentiment analysis of Twitter data; however, that study does not take the temporal aspect of the dataset into account.

In this study, the temporal sentiment analysis of the socially important locations of the city of Istanbul is performed and the results are discussed. For this purpose, the TS-SIL algorithm is proposed. In the proposed algorithm, first, the socially important locations of social media users in Istanbul are discovered, and then the temporal sentiment analysis of these locations is performed using machine learning algorithms.

3 Method

In this study, first, residents of the city of Istanbul were selected as the social media user group and their social media data were collected from Twitter. Then, the collected dataset was preprocessed and the SS-ILM algorithm [1] was used to discover the socially important locations of Istanbul social media users. For the discovery of emotions at these locations, a training dataset consisting of a total of 15,000 instances with equal numbers of positive, negative, and neutral labeled tweets was selected. Afterwards, the emotions of the tweets shared from the socially important locations were extracted using machine learning algorithms. For the temporal sentiment analysis of these locations, the tweets shared at these locations were sorted in temporal order. Finally, the emotional changes at these locations were analyzed on a daily, weekly, and monthly basis. In this section, the discovery of socially important locations, polarity detection and training dataset generation, and the proposed temporal sentiment analysis method are discussed.

3.1 Discovery of Socially Important Locations

In this study, the SS-ILM algorithm [1] is used for socially important locations discovery. In the discovery process, user-based and group-based operations are carried out after the preprocessing phase. The basic structure of the socially important locations discovery process is presented in Fig. 1 [14].

Fig. 1. Steps of SS-ILM algorithm [14]

First, the dataset is preprocessed and the locations of the users are extracted (Fig. 1). Then, user-level socially important locations discovery is carried out and, finally, group-level socially important locations are discovered using the user-level socially important locations. The SS-ILM algorithm has the user-given parameters min_density, min_visit, and min_up [14, 15]; in this study, their values were set to 0.001, 0.005, and 0.001, respectively.
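The parameter values above can be summarized as a small configuration, as in the sketch below; the function name is a hypothetical placeholder for the SS-ILM implementation described in [1, 14], not code published by the authors.

```python
# Thresholds used in this study for the SS-ILM step (hypothetical wrapper interface).
SS_ILM_PARAMS = {
    "min_density": 0.001,  # minimum location density
    "min_visit": 0.005,    # minimum visit threshold
    "min_up": 0.001,       # minimum user participation
}

def discover_socially_important_locations(tweets, params=SS_ILM_PARAMS):
    """Placeholder for the SS-ILM algorithm of Dokuz and Celik [1]."""
    raise NotImplementedError("See [1, 14] for the user-level and group-level mining steps.")
```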


3.2 Polarity Detection and Training Dataset Generation

In this study, polarity detection [16] is applied to determine the semantic orientation of the tweet contents for the discovery of emotion at socially important locations. While performing polarity detection, a dictionary-based approach is used and tweet contents are labeled as positive, negative, or neutral. Polarity is determined in the range [−1, +1]: the polarity value of the tweets shared in a specified time frame is 1 if all tweets are positive, −1 if all tweets are negative, and 0 if all tweets are neutral. A dictionary-based approach is also used for generating the training dataset, which is built by selecting a total of 15,000 tweets, with equal numbers of positive, negative, and neutral tweets, from the labeled tweets. The generated training dataset is tested with the machine learning algorithms SVM, k-nearest neighbor (kNN), naive Bayes, and random forest. SVM is a classification algorithm based on structural risk minimization proposed by Boser et al. [17, 18]. kNN is a neighborhood-based classification algorithm introduced by Cover and Hart [19]. Naive Bayes is a probability-based classification algorithm based on Bayes' theorem [20]. Random forest is an ensemble-based classification algorithm proposed by Breiman [21].
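A minimal sketch of this dictionary-based labelling step is shown below. The word lists and the (text, tokens) input format are illustrative assumptions; the authors' lexicon of strong emotion words is not published in the paper.

```python
# Dictionary-based polarity labelling used to build the 15,000-tweet training set (sketch).
POSITIVE_WORDS = {"güzel", "harika", "mutlu"}   # placeholder positive lexicon
NEGATIVE_WORDS = {"kötü", "berbat", "üzgün"}    # placeholder negative lexicon

def label_tweet(tokens):
    """Return +1 (positive), -1 (negative) or 0 (neutral) for one preprocessed tweet."""
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    return 1 if score > 0 else -1 if score < 0 else 0

def build_training_set(tweets, per_class=5000):
    """Select an equal number of positive, negative and neutral tweets (15,000 in total)."""
    buckets = {1: [], -1: [], 0: []}
    for text, tokens in tweets:                     # tweets: iterable of (raw_text, token_list)
        label = label_tweet(tokens)
        if len(buckets[label]) < per_class:
            buckets[label].append((text, label))
    return [pair for bucket in buckets.values() for pair in bucket]
```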

3.3 Temporal Sentiment Analysis Method

In this study, while performing the temporal sentiment analysis of socially important locations, first, the socially important locations are discovered. Then, the sharings at these locations are sorted based on time and analyzed on a daily, weekly, and monthly basis. The flowchart of the temporal sentiment analysis method used in the study is presented in Fig. 2; its stages are Dataset, Preprocess, SS-ILM (with the parameters min_density, min_visit, and min_up), Machine Learning Algorithms (SVM, kNN, naive Bayes, and random forest), Time-Based Sorting, and Analysis of Results.

Fig. 2. The flowchart of the temporal sentiment analysis of socially important locations method

The pseudocode of the proposed Temporal Sentiment Analysis of Socially Important Locations (TS-SIL) algorithm is presented in Algorithm 1. In Algorithm 1, the discovery of socially important locations is performed using the SS-ILM algorithm [1] (Step 1). Then, the tweet contents are cleaned in the preprocessing stage (Step 2). Afterwards, the TF-IDF [22] values of the dataset are calculated (Step 3) and a training dataset consisting of positive, negative, and neutral labeled tweets is created using a dictionary containing strong emotion words (Step 4). The created training dataset is tested with machine learning algorithms and the best model is selected (Step 5) and used for the discovery of sentiments at socially important locations (Step 6). Then, the tweets of these socially important locations are sorted based on their time information (Step 7). Finally, the temporally sorted tweets are analyzed on a daily, weekly, and monthly basis (Step 8).


Algorithm 1. The pseudocode of the TS-SIL algorithm

Inputs:
  D: social media users' tweet dataset
  SentiWords: strong emotion words dataset
  StopWords: conjunctions, prepositions, punctuation, and emotion-free words
  TimeFrame: time interval on which to perform sentiment analysis
Output:
  Temporal polarity at SILs

1. SILs = discover_SILs(D)
2. cleanedTweets = clean_tweets(D, StopWords)
3. tfidfTerms = TFxIDF(cleanedTweets)
4. trainingData = generate_training_data(tfidfTerms, SentiWords)
5. slModel = evaluateTrainingDataset(MachineLearningAlgorithms)
6. locationPolarity = analyze_sentiment(tfidfTerms, trainingData, SILs, slModel)
7. sortedLocationTweets = sort_polarity(locationPolarity)
8. time-basedLocationPolarity = analyze_time_based_polarity(TimeFrame)
9. return time-basedLocationPolarity
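As a concrete illustration of Steps 3 to 6, the sketch below trains a TF-IDF plus linear SVM classifier on the dictionary-labelled training set and applies it to the tweets of the socially important locations. The use of scikit-learn is an assumption; the paper does not state which implementation was used.

```python
# Sketch of Steps 3-6 of Algorithm 1 (assumed scikit-learn implementation).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_sentiment_model(training_texts, training_labels):
    """TF-IDF features + linear SVM, the best-performing model in Table 2."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(training_texts, training_labels)
    return model

def classify_sil_tweets(model, sil_tweets):
    """sil_tweets: list of dicts with 'location', 'time' and 'text' keys for tweets at SILs."""
    predictions = model.predict([t["text"] for t in sil_tweets])
    for tweet, polarity in zip(sil_tweets, predictions):
        tweet["polarity"] = int(polarity)   # +1, -1 or 0
    return sil_tweets
```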

4 Experimental Evaluation

In this section, first, the dataset is presented, then the data preprocessing steps are explained, and, finally, the experimental results are presented. In the experiments, the top 10 socially important locations are presented and the temporal changes of emotions at these locations are examined to answer the following questions:

• Which locations are the socially important locations in the city of Istanbul?
• What is the temporal change of sentiments at these locations?

In this study, a computer with an Intel i7 3.4 GHz CPU and 8 GB RAM is used.

4.1 Dataset

In this study, the tweets of social media users residing in the city of Istanbul, Turkey, are collected. Istanbul is selected because it has a dense population and a high number of socially important places. The number of social media users in the dataset is 5,583 and the total number of their tweets is 14,782,064. The number of users is kept at around 5,000 to limit divergent and contradictory user preferences. To collect the tweets, first, a search process is performed and then the tweets are collected with the Twitter REST API.
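One possible way to perform this collection step is sketched below with the Tweepy client; the paper does not state which client library was used, and the credentials are placeholders.

```python
# Collecting user timelines through the Twitter REST API with Tweepy (illustrative sketch).
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def collect_user_tweets(screen_name, pages=10):
    """Fetch up to pages*200 tweets of one user, keeping only geotagged tweets."""
    tweets = []
    for page in tweepy.Cursor(api.user_timeline, screen_name=screen_name,
                              count=200, tweet_mode="extended").pages(pages):
        tweets.extend(status for status in page if status.coordinates is not None)
    return tweets
```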


4.2 Data Pre-processing

The detection and cleaning of fake accounts in the collected data is of great importance [23, 24]. In order to extract fake users, two filters are defined: the ratio of a user's followers to the accounts the user follows must be greater than 0.1, and the number of the user's tweets must be over 50. Fake users were eliminated using these filters, after which 1,782 users remained as real Istanbul users. Then, retweets, mentions, prepositions, conjunctions, and punctuation marks are removed. Another preprocessing stage is performed for labeling the locations. Twitter data contain precise coordinate information, so, in order to determine the socially important locations of individuals, an approximation step is applied that ensures tweets with slightly different coordinates at the same place are labeled with the same location tag. The details of location labeling can be found in [1] and [25].
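A minimal sketch of these two filters is given below, assuming the per-user statistics have been gathered into a pandas DataFrame with the hypothetical columns 'followers', 'friends', and 'tweet_count'.

```python
# Fake-account filtering: follower/following ratio > 0.1 and more than 50 tweets (sketch).
import pandas as pd

def filter_fake_users(users: pd.DataFrame) -> pd.DataFrame:
    """Keep users that pass both filters described in Sect. 4.2."""
    ratio = users["followers"] / users["friends"].clip(lower=1)  # avoid division by zero
    return users[(ratio > 0.1) & (users["tweet_count"] > 50)]
```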

4.3 Results of SS-ILM Algorithm

The top 10 socially important locations discovered by the SS-ILM algorithm are given in Table 1 and Fig. 3.

Table 1. Top 10 socially important locations of İstanbul

Order  Location
A      Ayasofya Mosque
B      Eyüp Jandarma Memorial Forest
C      Eminönü
D      Beşiktaş Fish Market
E      Zorlu Shopping Center
F      Beşiktaş Cultural Center
G      Cevahir Shopping Center
H      Sabiha Gökçen Airport
I      Şükrü Saraçoğlu Stadium
J      İstiklal Street

In Fig. 3, it can be seen that the socially important locations are spread across different parts of Istanbul. When the locations listed in Table 1 are examined, it can be seen that locations C, F, G, and J coincide with the locations presented in [1] and [25]. When the other locations are examined, Ayasofya Mosque is a tourist attraction that is visited quite frequently by both tourists and people from outside the region. Eyüp Jandarma Memorial Forest is a forested area used by people to socialize and have picnics. Beşiktaş Fish Market is one of the popular and frequently used places in Istanbul, with restaurants where people can buy or eat seafood. In addition, Sabiha Gökçen Airport is an international airport and Şükrü Saraçoğlu Stadium is a football stadium. All these inferences confirm the obtained results.


Fig. 3. Top 10 socially important locations of İstanbul

4.4 Performance of Machine Learning Algorithms on Training Dataset

The results obtained by analyzing the training dataset with the machine learning algorithms are given in Table 2. In this study, accuracy and processing time are considered as performance metrics, where the processing time is the model building time of each algorithm.
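An illustrative way to reproduce such a comparison is sketched below with scikit-learn; the exact implementations, validation protocol, and timing method used by the authors are not stated in the paper, and this snippet measures cross-validated accuracy plus total fitting and evaluation time rather than the pure build time reported in Table 2.

```python
# Comparing SVM, kNN, naive Bayes and random forest on the TF-IDF training matrix (sketch).
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

def compare_classifiers(X, y):
    """X: TF-IDF matrix of the 15,000 training tweets; y: polarity labels (+1, 0, -1)."""
    models = {
        "SVM": LinearSVC(),
        "kNN": KNeighborsClassifier(),
        "Naive Bayes": MultinomialNB(),
        "Random Forest": RandomForestClassifier(),
    }
    for name, model in models.items():
        start = time.time()
        accuracy = cross_val_score(model, X, y, cv=10).mean()
        print(f"{name}: accuracy={accuracy:.4f}, elapsed={time.time() - start:.2f}s")
```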

Accuracy % Build time (sec)

SVM

99.46

66.66

kNN

93.72

0.01

Naive Bayes

87.06

3.93

Random Forest 98.87

358.47

When the results in Table 2 are examined, it can be seen that the most successful classification algorithm is SVM for sentiment analysis of training dataset. Although modeling time is longer compared to kNN and naive Bayes, SVM is used in this study since it has the highest accuracy. 4.5 Temporal Sentiment Analysis of Socially Important Locations In this study, temporal polarity changes of socially important locations are handled on a daily, weekly and monthly basis and the temporal emotion changes are analyzed. The temporal sentiment analysis of first five socially important locations given in Table 1 are presented in Fig. 4, 5, 6, 7 and 8. In the figures, x axis shows the date and y axis shows the polarity value between −1 and +1. For example, if all tweets at a location in time frame t are positive, the result is 1, or contrarily, −1 if all tweets are negative. When the figures are evaluated, it can be seen that there are emotional changes at each location over time. In addition, it can be said that daily and weekly changes contain more intense fluctuations than weekly and monthly changes, respectively.

Temporal Sentiment Analysis of Socially Important Locations

11

Fig. 4. Temporal sentiment analysis results of Ayasofya Mosque

Fig. 5. Temporal sentiment analysis results of Eyüp Jandarma Memorial Forest

When daily emotional analysis results of the Ayasofya Mosque are examined, the intensity of the number of tweets shared between 2017 and 2019 is remarkable. Although negative polarity occur at certain time intervals, when the results of the month-based analysis are examined, it can be said that positive polarity is the most prevalent polarity. When the temporal sentiment analysis results of the Eyüp Jandarma Memorial Forest are examined, in the day-based analysis of shared tweets, it is seen that negative thoughts

12

A. Ecemi¸s et al.

Fig. 6. Temporal sentiment analysis results of Eminönü

Fig. 7. Temporal sentiment analysis results of Be¸sikta¸s Fish Market

are high on 05.02.2018, 06.05.2019, and 09.06.2019. However, this situation seems to decrease in weekly and monthly analysis. When the temporal sentiment analysis of the tweets shared in Eminönü is analyzed, it is observed that the most intense positive polarity occurred between 2016 and 2017. However, it is apparent at times in situations containing negative polarity.

Temporal Sentiment Analysis of Socially Important Locations

13

Fig. 8. Temporal sentiment analysis results of Zorlu Shopping Center

When the temporal sentiment analysis results of Be¸sikta¸s Fish Market are analyzed, it can be said that people have a positive opinion about this location due to the continuous polarity and the polarity is never negative at this location. When temporal sentiment analysis results of Zorlu Shopping Center are analyzed, it is seen that polarity is generally positive. However, on 15.06.2018, negative tweets were shared and tweet density was sparse on a weekly and monthly basis, causing a negative reflection of the emotion in the location daily and weekly and monthly analysis results.

5 Conclusion The discovery of social important locations aims to identify the locations that social media users frequently visit during their social media life. Emotion analysis reveals whether a product, thought, or any textual sharing is positive or negative. The temporal sentiment analysis reveals the emotional changes of social media users in these locations over time. In this study, TS-SIL algorithm is proposed for the temporal analysis of sentiment of socially important locations in the city of Istanbul. For this purpose, first of all, user data is collected in Istanbul city using the Twitter dataset and passed through the preprocess phase. Then, SS-ILM algorithm is applied and socially important locations in Istanbul city are discovered. Afterwards, a dictionary-based training dataset is generated and tested with machine learning algorithms. By using SVM algorithm, which gives the best accuracy performance, the sharings in SILs are classified as positive, negative, and neutral. Then, after the classification process, the sharings in the locations are sorted based on their time information. Finally, the sorted sharing in these locations are analyzed by daily, weekla, and monthly basis. The results show that daily analysis varies more

14

A. Ecemi¸s et al.

frequently than weekly analysis and weekly changes have more fluctuations compared to monthly analysis. In this study, the discovery of social important locations in Istanbul city is performed using Twitter dataset and the temporal changes of emotions in these locations are analyzed. For many locations, it is revealed that the emotions of people have a variable attitude over time and they vary based on the analyzed time frame. One of the research limitations of this article is only 5 of the socially important locations were included in temporal sentiment analysis. In addition, a dictionary-based approach was used for analyzing sentiments of locations and the change of emotion in locations was discussed. However, the change in users’ emotions is another research topic. The minimum user density, number of visits and location density values determined by the parameters of the SS-ILM algorithm are another research limit. In the future, spatial-temporal analysis can be performed by considering spatial features [26, 27], by using temporal analysis of time-frame, anomaly detection [28] can be performed and the causes of instant emotional changes in locations can be investigated, and algorithms based on intelligent optimization techniques [29–31] can be developed. Another study may be to evaluate the impact of specific events for identified socially important locations. For example, using time-frame whether the corona pandemic process has effects on the people in the locations can be examined. Acknowledgements. This research has been supported by the Scientific Research Projects Coordination Unit of Nigde Ömer Halisdemir University, Project Number: MMT 2019/02-BAGEP, 2019.

References 1. Dokuz, A.S., Celik, M.: Discovering socially important locations of social media users. Expert Syst. Appl. 86, 113–124 (2017). https://doi.org/10.1016/j.eswa.2017.05.068 2. Dokuz, A.S., ¸ Celik, M.: Cloud computing-based socially ımportant locations discovery on social media big datasets. Int. J. Inf. Technol. Decis. Mak. 469–497 (2020). https://doi.org/ 10.1142/S0219622020500091 3. Celik, M., Dokuz, A.S.: Daily and hourly mood pattern discovery of Turkish twitter users. Glob. J. Comput. Sci. Theory Res. 5, 90–98 (2015). https://doi.org/10.18844/gjcs.v5i2.183 4. Li, N., Dash Wu, D.: Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decis. Support Syst. 48, 354–368 (2010). https://doi.org/10.1016/j. dss.2009.09.003 5. Fukuhara, T., Hiroshi, N., Toyoaki, N.: Understanding sentiment of people from news articles: temporal sentiment analysis of social events. In: ICWSM (2007) 6. Medagoda, N., Shanmuganathan, S.: Keywords based temporal sentiment analysis. In: 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2015, pp. 1418–1425. Institute of Electrical and Electronics Engineers Inc. (2016). https://doi.org/ 10.1109/FSKD.2015.7382152 7. Das, D., Kolya, A.K., Ekbal, A., Bandyopadhyay, S.: Temporal analysis of sentiment events a visual realization and tracking. In: Lecture Notes in Computer Science (˙Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 417–428 (2011). https://doi.org/10.1007/978-3-642-19400-9_33


8. Jianhong, X., Song, Z.: Spatial and Temporal Sentiment Analysis of Twitter data. https://esp ace.curtin.edu.au/handle/20.500.11937/36187. Accessed 20 June 2020 9. Rill, S., Reinel, D., Scheidt, J., Zicari, R.V.: PoliTwi: early detection of emerging political topics on twitter and the impact on concept-level sentiment analysis. Knowl.-Based Syst. 69, 24–33 (2014). https://doi.org/10.1016/j.knosys.2014.05.008 10. Cho, S.W., Cha, M.S., Kim, S.Y., Song, J.C., Sohn, K.A.: Investigating temporal and spatial trends of brand images using twitter opinion mining. In: 2014 5th International Conference on Information Science and Applications, ICISA 2014. IEEE Computer Society (2014). https:// doi.org/10.1109/ICISA.2014.6847417 11. Park, A.J., Beck, B., Fletche, D., Lam, P., Tsang, H.H.: Temporal analysis of radical dark web forum users. In: Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016, pp. 880–883. Institute of Electrical and Electronics Engineers Inc. (2016). https://doi.org/10.1109/ASONAM.2016.7752341 12. Paul, D., Li, F., Teja, K., Yu, X., Frost, R.: Compass: Spatio Temporal Sentiment Analysis of US Election What Twiier Says! (2017). https://doi.org/10.1145/3097983.3098053 13. Ecemis, A., Dokuz, A.S., Celik, M.: Sentiment analysis of posts of social media users in their socially ımportant locations. In: 2018 International Conference on Artificial Intelligence and Data Processing, IDAP 2018 (2019). https://doi.org/10.1109/IDAP.2018.8620832 14. Dokuz, A.S., Celik, M.: FAST SS-ILM: a computatıonally efficient algorithm to discover socially ımportant locations. In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pp. 197–202. Copernicus GmbH (2017). https://doi.org/10. 5194/isprs-annals-iv-4-w4-197-2017 15. Celik, M., Dokuz, A.S.: Discovering socially similar users in social media datasets based on their socially important locations. Inf. Process. Manag. 54, 1154–1168 (2018). https://doi. org/10.1016/j.ipm.2018.08.004 16. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis. Comput. Linguist. 35, 399–433 (2009). https://doi. org/10.1162/coli.08-012-R1-06-90 17. Boser, E., Vapnik, N., Guyon, I.M., Laboratories, T.B.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152 (1992) 18. Hua, Z., Wang, Y., Xu, X., Zhang, B., Liang, L.: Predicting corporate financial distress based on integration of support vector machine and logistic regression. Expert Syst. Appl. 33, 434–440 (2007). https://doi.org/10.1016/j.eswa.2006.05.006 19. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13, 21–27 (1967) 20. Islam, M.J., Wu, Q.J., Ahmadi, M., Sid-Ahmed, M.A.: Investigating the Performance of Naive-Bayes Classifiers and K-Nearest Neighbor Classifiers (2007) 21. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001) 22. Salton, G., Fox, E.A., Wu, H.: Extended boolean ınformation retrieval. Commun. ACM 26, 1022–1036 (1983). https://doi.org/10.1145/182.358466 23. Stringhini, G., Kruegel, C., Vigna, G.: Detecting spammers on social networks. In: Proceedings of the 26th Annual Computer Security Applications Conference, pp. 1–9 (2010) 24. Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V.: Detecting spammers on twitter. In: Collaboration, Electronic Messaging, Anti-abuse and Spam Conference, vol. 
6, p. 12 (2010) 25. Celik, M., Sakir Dokuz, A.: Discovering socio-spatio-temporal important locations of social media users. J. Comput. Sci. 22, 85–98 (2017). https://doi.org/10.1016/j.jocs.2017.09.005 26. Shekhar, S., Ranga, R.V., Celik, M.: Spatial and spatiotemporal data mining: recent advances, as a chapter of next generation of data mining. In: Kargupta, H., Han, J., Yu, P.S., Motwani, R., Kumar, V. (eds.) As a Chapter of Next Generation of Data Mining (2009)


27. Joshi, D., Samal, A., Soh, L.K.: Spatio-temporal polygonal clustering with space and time as first-class citizens. Geoinformatica 17, 387–412 (2013). https://doi.org/10.1007/s10707-0120157-8 28. Çelik, M., Dada¸ser-Çelik, F., Dokuz, A.S.: ¸ Anomaly detection in temperature data using DBSCAN algorithm. In: 2011 International Symposium on Innovations in Intelligent Systems and Applications, INISTA 2011, pp. 91–95 (2011). https://doi.org/10.1109/INISTA.2011.594 6052 29. Celik, M., Koylu, F., Karaboga, D.: CoABCMiner: an algorithm for cooperative rule classification system based on artificial bee colony. Int. J. Artif. Intell. Tools. 25 (2016). https://doi. org/10.1142/S0218213015500281 30. Koylu, F., Celik, M., Karaboga, D.: Performance analysis of ABCMiner algorithm with different objective functions. In: 21st Signal Processing and Communications Applications Conference (SIU), pp. 1–5. IEEE (2013) 31. Ozcan, ˙I., Celik, M.: Developing recommendation system using genetic algorithm based alternative least squares. In: 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), pp. 1–5. IEEE (2018)

A New Sentiment Analysis System of Climate Change for Smart City Governance Based on Deep Learning

Mustapha Lydiri1(B), Yousef El Mourabit1, and Youssef El Habouz2

1 Faculty of Science and Technology, Sultan Moulay Slimane University, Beni Mellal, Morocco
[email protected], [email protected]
2 IGDR, UMR 6290 - CNRS - Rennes 1 University, Rennes, France
[email protected]

Abstract. With the world's massive population growth, governments around the world are trying to move closer to their citizens in order to improve the management and governance of their cities and make them smarter than ever before. Smart city technologies are used to analyze and evaluate huge volumes of data about city dwellers for better governance. Moreover, social media can be a useful vehicle for governments to better understand their citizens. Twitter sentiment analysis is a powerful approach for gaining deep insight into how citizens react to a phenomenon, and it therefore has clear uses for smart city governance and monitoring. Climate change is such a critical phenomenon: in recent years, the existence of climate change or global warming has become an increasingly public debate. In this paper we develop a deep learning model based on a Convolutional Neural Network (CNN) to identify believers and deniers of the climate change phenomenon, and we examine the temporal patterns of climate change discussions on Twitter and their driving factors. Results demonstrate that the developed CNN model successfully identifies citizens' behavior towards climate change, correctly classifying 97% of denier and 91% of believer tweets. Our model provides an improved understanding of the factors affecting citizen attitudes on climate change, and serves as an efficient tool for smart city monitoring and governance.

Keywords: Smart city · Deep learning · Climate change · Twitter

1 Introduction

The number of urban inhabitants is growing by around 60 million each year; more than 54% of the world's population already lives in cities, and this share is expected to reach 72% by 2050 [1]. With this rapid population growth, governments face many challenges such as human resources management, economic growth, and environmental pollution, so they are forced to better control and manage all parts of their communities and react to


their citizens' concerns. To achieve that, they must make their traditional city infrastructure smarter than before, integrating smart solutions and reinforcing the intelligence of systems in every field, for example transportation, healthcare and the environment. Integrating and developing new communication technologies, such as taking advantage of big data analysis to build smart decision systems that process the huge amount of real-time data available online, could improve the management of cities, raise the quality of services delivered to citizens, and respond to citizens' individual needs and concerns. Social media is a great source of data: the majority of people around the world use social media platforms to express an opinion or voice a complaint about a subject. Twitter alone gathers more than 330 million monthly active users and 145 million daily active users, and 500 million tweets are posted every day, which makes it a valuable data source for studying any subject. To exploit it we use sentiment analysis, a field of natural language processing with which we can examine people's opinions, attitudes, emotions and personal tendencies towards any subject or topic, and classify them in terms of polarity as positive or negative sentiment. With rapid global urbanization, climate change has become a disaster that citizens are increasingly worried about. Due to gas emissions, urban congestion and other causes, this phenomenon has become a serious problem that needs to be studied. In this paper we aim to create an efficient model for detecting believers and deniers of the climate change topic on Twitter. For that purpose we collected related tweets and labeled them as believers, who consider climate change a human-caused problem, or deniers, who think that climate change is not happening and does not exist. Our deep learning classification model is based on Convolutional Neural Networks (CNN). We compared our model to three other models based respectively on Support Vector Machine (SVM), Naïve Bayes (NB) and Logistic Regression. The results show that our model is clearly more efficient on the most relevant metrics (it reached 95% accuracy). The rest of this paper is organized as follows: Sect. 2 presents the related work; in Sect. 3 we explain in detail our classification model based on Convolutional Neural Networks; results are presented and discussed in Sect. 4; conclusion and future work are given in Sect. 5.

2 Related Work

A lot of recent work has been done in the sentiment analysis and smart city fields. [2] evaluated how smart city concepts and technologies are perceived and utilized in cities, employing a systematic geo-Twitter analysis to examine discourse and policy in Australia. The main objective of that research was to understand how smart city concepts and technologies are perceived and employed in Australian cities; unfortunately, the study faced several limitations, such as using only tweets shared in Australia, and


the study does not involve a time-series analysis. [3] also applied sentiment analysis on Twitter users data by using a large dataset of geotagged tweets related to climate change topic, this research combined both topic modeling and sentiment analysis techniques, the first one is used to infer the different topics of discussion, the second is for to detecting the overall feelings and attitudes found in the dataset, this word also has many limitation, especially the dataset used for the study contains many indecipherable tweets making both topic modeling ineffective, also topic modeling could not be applied to other languages rather English. [4] present a new approach to sentiment analysis by identifying the contextual polarity for a subset of sentiment expressions drawn from transcribed interviews conducted with decision-makers based in 11 communities in British Columbia, as a result of this work several drivers and barriers to change were identified at the local government level. All those studies focused on English language, while [5] tried to explore Arabic tweets related to smart city systems such as healthcare and transport by reviewing the relevant tools and techniques of each approach considering their accuracy of the obtained results, in order to create a useful guide for researchers who are interested in this field. In [6] authors showed how social media streaming data analytics could help to detect events taking place in a smart city and identify all concerns of its citizens. They studied a case scenario by analyzing traffic data in three largest cities in the UAE. The work [7] Used Multinomial Na¨ıve Bayes to build a sentiment classifier model, they collected, pre-processed, analyzed sentiment of Twitter data, for the purpose of helping the governments to monitor their citizens, the main weakness in this work is the lack of a big dataset for having a better training of the classifier, also the study was focused on NY tweets, while it would be better to analyze the entire continent data. [8] Presented a framework for real-time analysis of humangenerated textual publications, through this framework they gave a combination of techniques for semantic representation, content classification and sentiment analysis, also they exploited it to monitor the recovering state of the social capital of L’Aquila’s city. Other works evaluated how smart city concepts and technologies are exploited and applied in cities using social media analysis, such as [9] who provided an approach based systematic geo-Twitter analysis, with a case study of Australian country, results of this work provided good information about community perceptions on smart city concepts and technologies. In other hand [10] presented a method for analyzing social media and digital governance in smart cities, by investigating issues related to people about daily life in cities using Twitter as data source and using a statistical process to identify tweets sentiment score range used by the Afinn lexicon as dictionary with sentiment score. There are also studies that used for sentiment prediction of Twitter photos by exploring deep features using convolutional neural network [11], but the majority of social media users uses words to express they feelings or opinions, thus we focused on analyzing citizens publications on Twitter to reveal they behavior on climate change topic.

3 Data and Methods

The overall architecture of our proposed system is shown in Fig. 1; each part is explained in detail below.

Fig. 1. The general architecture CNN model for sentiment analysis.

The first phase of sentiment analysis is capturing data. We gathered messages from Twitter using the Twitter API, which gives access to tweets posted by users; more than 5000 tweets about the climate change topic were collected using keywords related to this topic. Each tweet contains information such as the tweet content, the time when the tweet was created, the name of the user profile, the hashtags used in the text, the retweet status, and follower/following statuses; all these records are stored in a CSV file. The second step is labeling. For this we took advantage of the hashtags [12] used in the tweets, which specify the target subject of the tweet: if the hashtags contain #climatechangehoax, #climatedeniers, #climatechangeisfalse, #globalwarminghoax or #climatechangenotreal, the tweet is classified as a denier tweet, and if they contain #climatechangeisreal, #globalwarmingisreal or #actonclimate, the tweet is classified as a believer tweet. The purpose of this step is to prepare the training and testing datasets, which were used to train and validate the classifier. Table 1 shows sample tweets of climate change deniers and believers, and a small labeling sketch is given after the table. After building the classification model, we compared the deep learning model with supervised machine learning classifiers frequently used for natural language processing tasks, in order to compare performances. For that we chose the following algorithms:


Table 1. Sample tweets of climate change deniers and believers.

Id | Tweet content | Label
1 | "Climate change is a myth" | Denier
2 | "What a joke does borisjohnson believe in man made climate change another lie and fraud by johnson i think" | Denier
3 | "The climate change con" | Denier
4 | "Do you want the next generation to inherit a world better than the one we live in today make the choice for childfirst actonclimate" | Believer
5 | "We can only save the planet ourselves globalwarmingisreal" | Believer
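The paper does not list its labeling script, so the following is only a minimal sketch of the hashtag-based labeling rule described above; the file name and column names (hashtags, label) are assumptions, and the hashtag lists are the ones quoted in the text.

import pandas as pd

# Hashtag lists quoted in the text; the authors' full lists may differ.
DENIER_TAGS = {"#climatechangehoax", "#climatedeniers", "#climatechangeisfalse",
               "#globalwarminghoax", "#climatechangenotreal"}
BELIEVER_TAGS = {"#climatechangeisreal", "#globalwarmingisreal", "#actonclimate"}

def label_tweet(hashtags):
    """Return 'Denier', 'Believer' or None for a space-separated hashtag string."""
    tags = set(hashtags.lower().split())
    if tags & DENIER_TAGS:
        return "Denier"
    if tags & BELIEVER_TAGS:
        return "Believer"
    return None  # tweet carries no climate-stance hashtag and is not used for training

# Assumed CSV layout: one row per collected tweet, hashtags stored as one string.
tweets = pd.read_csv("climate_tweets.csv")
tweets["label"] = tweets["hashtags"].fillna("").apply(label_tweet)
labeled = tweets.dropna(subset=["label"])
labeled.to_csv("climate_tweets_labeled.csv", index=False)

A tweet carrying hashtags from both lists would be labeled Denier here; how the authors resolved such conflicts is not stated.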

Naïve Bayes [13]. The Naive Bayes classifier is a probabilistic model that uses Bayes' theorem, combining conditional and prior probabilities. The general formula is:

$P(\text{label} \mid \text{features}) = \frac{P(\text{features} \mid \text{label})\,P(\text{label})}{P(\text{features})}$   (1)

Support Vector Machine [14]. The SVM is a non-probabilistic binary linear classifier. The algorithm works by constructing a hyperplane between the classes, maximizing the margin that separates them while minimizing classification errors. The hyperplane that separates the input data is calculated by Eq. (2):

$y = f(x) = W^{T}x + b = \sum_{i=1}^{N} W_i x_i + b$   (2)

where N is the number of samples, W is an N-dimensional vector and b is a scalar.

Logistic Regression [15]. Logistic regression is a binary classifier that uses the logistic function, which produces an output value between 0 and 1 representing true or false. The logistic function, also known as the sigmoid function [16], is defined as:

$\sigma(x) = \frac{1}{1 + e^{-x}}$   (3)

where σ is the output between 0 and 1 and e is Euler's number.

The proposed sentiment analysis model is based on Convolutional Neural Networks (CNN), a specialized kind of deep neural network widely used in image classification [16]. Its name indicates that it uses a mathematical operation called convolution, and it employs kernels to extract important regional features of images. In recent years CNN has also proved its efficiency in natural language processing and text classification


tasks, where it gives good results. As when working on images, CNN extracts features of sentences: kernels (windows) applied to the embedding matrix reveal the meaningful terms that express the sentiment. Several embedding methods are used to generate a matrix of word representations, such as Word2vec [17], which maps sentences into a continuous vector space and captures the distributional relationship of a word with the rest of a dictionary; each unique word is assigned a corresponding vector, and if two words have similar meanings their corresponding vectors are close. Word2vec uses two techniques for producing vector representations: Continuous Bag-of-Words (CBOW) [18] or the skip-gram model (SG) [19]. In this article we used the GloVe embedding [20], or Global Vectors, proposed by the NLP group at Stanford University, an unsupervised learning algorithm for generating word representations that considers all the information carried by the corpus and not only the information of a specific word. This algorithm can compute the semantic similarity between tweets by using the co-occurrence probabilities of two words to detect whether they share the same meaning. Basically, the method uses a fixed-length window of lexical elements around a word and aims to represent each word i and each word j matched in the same context by vectors $v_i$ and $v_j$ respectively, of dimension d, such that:

$v_i \cdot v_j + b_i + b_j = \log(X_{ij})$   (4)

where $X_{ij}$ represents the number of times word j occurs in the context of word i, and $b_i$ and $b_j$ are scalar biases associated with words i and j respectively. After constructing the embedding matrix, each word is represented as a word-embedding vector of k dimensions, which is then fed to our sentiment classification model based on Convolutional Neural Networks. The first layer, the convolution layer, works as follows. Let $x_i \in \mathbb{R}^k$ be the k-dimensional word vector corresponding to the i-th word in the sentence, and let n be the number of words in the sentence, represented as:

$x_{1:n} = x_1 + x_2 + \dots + x_n$   (5)

where + represents the concatenation operator. The convolution operator is then applied with a filter $w \in \mathbb{R}^{h \times k}$ to a window of h words, generating the feature $c_j$:

$c_j = f(w \cdot x_{i:i+h-1} + b)$   (6)

where b is a bias and f is a non-linear function. After iterating over all the words in the sentence we obtain the feature map c:

$c = [c_1, c_2, c_3, \dots, c_{n-h+1}]$   (7)

The next layer of the model is the max-pooling layer. This step identifies the feature corresponding to the filter window by choosing the maximum value


$\hat{c} = \max\{c\}$ of the feature map. The role of this layer is to capture only the most important feature, the one with the maximum value, for each feature map. The last part of the model is the fully connected layer, a simple neural network that represents the classifier. The output of the pooling layer is flattened and passed to this layer, where each neuron has an activation function; in our model we chose the sigmoid function (3). Each neuron of this layer applies the function:

$f(w_i \cdot x + b_i)$   (8)

where f is the sigmoid function, x is the input vector, and $w_i$ and $b_i$ are the weight and bias of neuron i, learned through training. The last layer uses the Softmax function instead, to predict a probability distribution over the sentiment classes for the input tweet. Formally, the Softmax function is given by:

$P_i(x) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}, \quad i = 1, 2, \dots, K$   (9)

where K is the number of classes and $P_i(x)$ is the probability that tweet x belongs to sentiment class i. To optimize the loss between the predicted and the real probabilities, the cross-entropy error was used as the loss function. The details of the CNN layers are given in Table 2: for the Conv layer, the output shape reflects the number of convolution filters, and the number of parameters is the number of weights learned during the back-propagation process. The activation function used in the Conv and Pooling layers is ReLU (Rectified Linear Unit [16]), while the Dense1 layer uses the Sigmoid function and the last layer applies the Softmax function.

Table 2. CNN architecture.

Layer | Output shape | Number of parameters
Conv | (96, 128) | 64128
Pooling | 128 | 0
Dense1 | 128 | 16512
Dense2 | 2 | 258
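The text together with Table 2 constrains the architecture quite tightly: a Conv output of (96, 128) with 64128 parameters is consistent with 128 filters of width 5 sliding over 100-dimensional embeddings and an input length of 100 tokens, but those hyperparameters are inferred rather than stated. Purely as an illustration, a Keras sketch of such a model could look like this:

from tensorflow.keras import layers, models

MAX_LEN, EMB_DIM, VOCAB = 100, 100, 20000  # assumed values, inferred from Table 2

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    # In the paper the embedding matrix is initialised with pre-trained GloVe
    # vectors; here the layer is simply trained from scratch for brevity.
    layers.Embedding(input_dim=VOCAB, output_dim=EMB_DIM),
    layers.Conv1D(filters=128, kernel_size=5, activation="relu"),  # output (96, 128)
    layers.GlobalMaxPooling1D(),                                   # output (128,)
    layers.Dense(128, activation="sigmoid"),                       # Dense1
    layers.Dense(2, activation="softmax"),                         # Dense2: believer/denier
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()

With these assumed settings, model.summary() reproduces the Conv and Dense parameter counts of Table 2.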

4 Result and Discussion

The dataset used for this study contains 2838 tweets that deny climate change is happening and 2456 tweets from believers in climate change. To build the model for predicting tweet sentiment, we split the tweets into training and validation sets, with 80% for training and 20% for testing. The results


illustrated in Table 3 are presented in terms of precision, recall and F1-measure. These metrics are calculated as follows:

$\mathrm{Precision} = \frac{TP}{TP + FP}$   (10)

$\mathrm{Recall} = \frac{TP}{TP + FN}$   (11)

$F1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$   (12)

To compare the results we also used the accuracy, one of the most common performance metrics: the ratio of correct predictions to the total number of predictions made. It is calculated as:

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$   (13)

TP is the number of true positives (tweets containing positive sentiment correctly classified as positive). TN is the number of true negatives (tweets containing negative sentiment correctly classified as negative). FP is the number of false positives (tweets containing negative sentiment incorrectly classified as positive). FN is the number of false negatives (tweets containing positive sentiment incorrectly classified as negative).
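As a quick sanity check, Eqs. (10)-(13) can be evaluated directly on the confusion-matrix counts reported below (Fig. 2); the helper functions are ours, not the authors' code, and they assume the believer class is treated as positive.

def precision(tp, fp):            # Eq. (10)
    return tp / (tp + fp)

def recall(tp, fn):               # Eq. (11)
    return tp / (tp + fn)

def f1(tp, fp, fn):               # Eq. (12)
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def accuracy(tp, tn, fp, fn):     # Eq. (13)
    return (tp + tn) / (tp + tn + fp + fn)

# Counts reported in the paper's confusion matrix (Fig. 2).
tp, fp, tn, fn = 465, 19, 540, 44
print(f"precision={precision(tp, fp):.2f}, recall={recall(tp, fn):.2f}, "
      f"F1={f1(tp, fp, fn):.2f}, accuracy={accuracy(tp, tn, fp, fn):.2f}")

The F1 and accuracy both come out at roughly 0.94, consistent with Table 3 and with the 94% overall accuracy quoted in the discussion.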

Table 3. Results of CNN model in terms of Precision, Recall and F1-measure.

 | Precision | Recall | F1-measure
Believer | 0,91 | 0,96 | 0,94
Denier | 0,97 | 0,92 | 0,94
Average | 0,94 | 0,94 | 0,94

We applied the trained CNN model to the collected climate change tweets and analyzed the public discussion on climate change. The results illustrated in Table 3 show that the trained CNN model correctly classifies 91% of believer tweets and 97% of denier ones; the model achieved its highest accuracy of 94% after 100 epochs of training. As the confusion matrix shows (Fig. 2), there are 465 true positives, 19 false positives, 540 true negatives and 44 false negatives; the trained model therefore misclassifies 19 believer tweets and 44 denier tweets. Most likely this is due to spelling errors and sarcasm used by many Twitter users, which are very hard for classification algorithms to detect.


Fig. 2. Confusion matrix.

To prove the efficiency of the CNN algorithm, we compared it with other classification algorithms and obtained the results illustrated in Figs. 3 and 4. In terms of accuracy, the results are: NB (66%), Logistic Regression (68%), SVM (80%) and CNN (95%); in terms of F-score: NB (63%), Logistic Regression (64%), SVM (73%) and CNN (94%). We can deduce that the CNN model reaches the best performance in terms of accuracy and F-score, and is therefore the best choice for building a sentiment classifier.

Fig. 3. Comparison with supervised Machine Learning algorithms in terms of Accuracy.


Fig. 4. Comparison with supervised Machine Learning algorithms in terms of F-Score.
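The paper does not say how the tweets were represented for the classical baselines; a common setup, assumed here purely for illustration, is a TF-IDF bag-of-words pipeline in scikit-learn, reading the labeled file produced earlier (the text column name is also an assumption).

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

data = pd.read_csv("climate_tweets_labeled.csv")
X_train, X_test, y_train, y_test = train_test_split(
    data["text"], data["label"], test_size=0.2, random_state=42)

baselines = {"NB": MultinomialNB(),
             "Logistic Regression": LogisticRegression(max_iter=1000),
             "SVM": LinearSVC()}
for name, clf in baselines.items():
    pipe = make_pipeline(TfidfVectorizer(min_df=2), clf)
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    print(name,
          round(accuracy_score(y_test, pred), 2),
          round(f1_score(y_test, pred, pos_label="Believer"), 2))

Differences from the figures above are to be expected, since the preprocessing, feature extraction and hyperparameters used by the authors are not specified.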

The proposed sentiment analysis model based on convolutional neural networks outperforms the classical machine learning algorithms SVM, Naïve Bayes and Logistic Regression, which are also widely used for natural language processing tasks and especially in the sentiment analysis domain [21-23]. We chose the convolutional neural network classifier because it is proven to perform well on text classification tasks, and the results obtained in this paper affirm that sentiment analysis is a key factor in the development of smart city domains. Improving smart city technologies will require more academic effort: governments should support and highlight this research field to lead urban centers and cities to become smart in the future, and individual citizens should work together to develop and create information systems that can help the development of their cities. We believe that this work can be a powerful citizen-sensing tool that governments can use to better manage their cities and to better understand their citizens' concerns and complaints; the proposed sentiment analysis architecture therefore makes an important contribution to developing smart cities.

5 Conclusion

To create a new efficient model for detecting believers and deniers of the climate change topic in a smart city, we collected a dataset of tweets related to this topic and developed a new CNN model. The efficiency and performance of our model are confirmed by the presented results. The binary sentiment classification (positive and negative sentiments) attained in this work can be extended to perform


multi-class classification by injecting other sentiment classes. Furthermore, sarcasm is extremely challenging for this algorithm to detect, so devising models that can detect it would increase the effectiveness and importance of this work. In summary, this paper is about mining social media, which offers a large amount of precious data to researchers and allows them to analyze different subjects in several domains; besides Twitter, more sites such as Facebook should be explored. In the future, we plan to collect and analyze more labeled data to build a more accurate model, and to explore more classification approaches such as unsupervised or semi-supervised learning.

References 1. Zhang, X.Q.: The trends, promises and challenges of urbanisation in the world. Habitat Int. 54, 241–252 (2016) 2. Yigitcanlar, T., Kankanamge, N., Vella, K.: How are smart city concepts and technologies perceived and utilized? A systematic geo-twitter analysis of smart cities in Australia. J. Urban Technol. 1–20 (2020) 3. Dahal, B., Kumar, S.A.P., Li, Z.: Topic modeling and sentiment analysis of global climate change tweets. Soc. Netw. Anal. Min. 9(1), 24 (2019) 4. Jost, F., Dale, A., Schwebel, S.: How positive is “change” in climate change? A sentiment analysis. Environ. Sci. Policy 96, 27–36 (2019) 5. Alotaibi, S., Mehmood, R., Katib, I.: Sentiment analysis of Arabic tweets in smart cities: a review of Saudi Dialect. In: 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC), pp. 330–335. IEEE (2019) 6. Al Nuaimi, A., Al Shamsi, A., Al Shamsi, A., et al.: Social Media Analytics for Sentiment Analysis and Event Detection in Smart Cities (2018) 7. Li, M., Ch’ng, E., Chong, A., et al.: The new eye of smart city: novel citizen sentiment analysis in twitter. In: 2016 International Conference on Audio, Language and Image Processing (ICALIP), pp. 557–562. IEEE (2016) 8. Musto, C., Semeraro, G., De Gemmis, M., et al.: Developing smart cities services through semantic analysis of social streams. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1401–1406 (2015) 9. Bons´ on, E., Perea, D., Bedn´ arov´ a, M.: Twitter as a tool for citizen engagement: an empirical study of the andalusian municipalities. Gov. Inf. Q. 36(3), 480–489 (2019) 10. Est´evez-Ortiz, F.-J., Garc´ıa-Jim´enez, A., Gl¨ osek¨ otter, P.: An application of people’s sentiment from social media to smart cities. El profesional de la informaci´ on, 25(6), 851–858 (2016) 11. Ahmed, K.B., Bouhorma, M., Ahmed, M.B.: Visual sentiment prediction with transfer learning and big data analytics for smart cities. In: 4th IEEE International Colloquium on Information Science and Technology (CiSt), pp. 800–805. IEEE (2016) 12. Kywe, S.M., Hoang, T.-A., Lim, E.-P., et al.: On recommending hashtags in twitter networks. In: International Conference on Social Informatics, pp. 337–350. Springer, Heidelberg (2012)


13. Singh, R., Goel, V.: Various machine learning algorithms for twitter sentiment analysis. In: Information and Communication Technology for Competitive Strategies, pp. 763–772. Springer, Singapore (2019) 14. Saad, S.E., Yang, J.: Twitter sentiment analysis based on ordinal regression. IEEE Access 7, 163677–163685 (2019) 15. Shah, K., Patel, H., Sanghvi, D., et al.: A comparative analysis of logistic regression, random Forest and KNN models for the text classification. Augmented Hum. Res. 5(1), 1–16 (2020) 16. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016) 17. Goldberg, Y., Levy, O.: word2vec explained: deriving Mikolov et al.’s negativesampling word-embedding method. arXiv preprint arXiv:1402.3722 (2014) 18. Zhao, R., Mao, K.: Fuzzy bag-of-words model for document representation. IEEE Trans. Fuzzy Syst. 26(2), 794–804 (2017) 19. Al-Saqqa, S., Awajan, A.: The use of word2vec model in sentiment analysis: a survey. In : Proceedings of the 2019 International Conference on Artificial Intelligence, Robotics and Control, pp. 39–43 (2019) 20. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014) 21. An, X., Ganguly, A.R., Fang, Y., et al.: Tracking climate change opinions from twitter data. In: Workshop on Data Science for Social Good, pp. 1–6 (2014) 22. Kulcu, S., Dogdu, E.: A scalable approach for sentiment analysis of Turkish tweets and linking tweets to news. In: 2016 IEEE Tenth International Conference on Semantic Computing (ICSC), pp. 471–476. IEEE (2016) 23. Ramadhan, W.P., Novianty, S.A., Setianingsih, S.C.: Sentiment analysis using multinomial logistic regression. In: 2017 International Conference on Control, Electronics, Renewable Energy and Communications (ICCREC), pp. 46–49. IEEE (2017)

A Novel Approach of Community Detection Using Association Rules Learning: Application to User's Friendships of Online Social Networks

Mohamed El-Moussaoui1(B), Mohamed Hanine1, Ali Kartit1, and Tarik Agouti2

1 LTI Laboratory, Chouaib Doukkali University of El Jadida, El Jadida, Morocco
[email protected]
2 ISI Laboratory, Cadi Ayyad University of Marrakesh, Marrakesh, Morocco

Abstract. Both Social Network Analysis (SNA) and Association Rules Learning (ARL) have enriched our daily lives through various applications, playing central roles in several domains. In particular, community detection in online social networks (OSN) has interested researchers for its valuable contribution to understanding the complexity of systems, whether for academic, commercial or other purposes. The aim of this paper is the identification of communities in OSN using knowledge extraction based on association rules methods. We propose a new approach, namely ARL Clustering, that uses association rules learning for SNA. In particular, we base our detection on users' friendships in OSN, processing a four-level technique to extract meaningful rules that are later converted into communities. The conducted experimentation was applied to two synthetic real-world networks and yielded important results in identifying potential communities in comparison with existing approaches.

Keywords: Social network analysis · Association rules learning · Clustering · Community detection

1 Introduction

Discovering communities in online social networks (OSN) is the task of clustering networks into subgroups based on a number of shared similarities (e.g., topological similarities: centrality measurements, modularity, etc.; topical similarities: users' backgrounds, political convergence, emotional reactions, etc.) [1,2]. It is therefore important for the social network analysis (SNA) community to highlight the characteristics of network structures in order to reveal novel structures of complex systems [3]. In fact, community detection constitutes an important topic of SNA, taking advantage of mathematics and social science to understand the dynamics, the structures and the information flows of complex networks such as OSN [4-6].


Although, various approaches were proposed for discovering communities as surveyed in [1]. Particularly, since the data can be modeled as graphs, OSN platforms participate mainly in uniforming human behaviors. Interestingly, Girvan et Newman [7] introduced their graph clustering with iterations of a computation of the betweenness centrality scores, highlighting edges with similar scores, by removing edges with low scores. The process of clustering is repeated through several iterations until an isolation of all clusters. Where Zhu et al. [26] have proposed a new algorithm LPA Label Propagation Algorithm, a semi-supervised learning based approach for community detection. In addition, Blondel et al. [8] have proposed the Louvain algorithm, based on the modularity optimization methods. Where Pons et al. [9] proposed the walktrap algorithm for identifying community structures in graphs, based on hierarchical clustering method. Rosvall et al. [10] have proposed the famous infomap algorithm based on the random walk method. Various comparison studies were proposed for better understanding of differences, performance and specificity of each proposal [2,11,12]. Although, few of these approaches focus on topical measures to deal with the community detection problem. Indeed, this paper focuses on community detection problems based on user’s interactions using association rules techniques. We are interested in identifying potential communities in online social networks, based on user’s interactions. We argue the proposed approach based on association rules Learning where the considered transactions are represented by the relationships in a given dataset of an online social network. We assume that interactions and relationships between users of online social networks, dissimulates common interests between those users. Indeed, those users could belong to one or more communities. The proposed demonstration is based on the extracted rules, obtained by application of apriori algorithm on the preprocessed datasets, in order to demonstrate the potential of forming communities through user’s interactions. We selected two real-world social networks, widely used for the purpose of community detection Zachary Karate Club [13] and Krackhardt Kite social network [14]. The outline of this paper is organized as follows: in Sect. 2, related work is given for related literatures; in Sect. 3, we provide preliminary definitions for understanding of different terms related to the proposed approach; in Sect. 4, we describe the used methodology and the detailed experimentation; And finally, in the last section, we discuss and conclude on results between the proposed approach and different clustering algorithms.

2 Related Work

A considerable number of approaches and contributions have been proposed during the last few years for discovering communities in online social networks [1,3,6,7,9,10,26]. The observed diversity of approaches creates opportunities for application and enriches existing methods and algorithms for practitioners and researchers from various fields. Community detection approaches (also known as clustering) have been behind the exploration of social network properties and network structures. Therefore, understanding individual behaviors is enhanced thanks


to network structures by discovering common properties shared between users of social networks. Initially, the widely used approach in community detection has been proposed by Girvan and Newman [25], where they introduced the famous divisive method Modularity. Additionally, the agglomerative methods as mentioned in Newman et al. 2006 [15] gained attention by introducing a new approach of partitioning, starting from such a number of nodes as initial communities to end by merging similar nodes into communities with common properties. The random walk principle has been demonstrated by George et al. [16], using the notation that shorts walks. The Greedy algorithm, initially introduced in Clauset et al. [17], has implemented the optimization modularity concept, which merges between communities when modularity is increased. Blondel et al. 2008 [8] proposed an agglomerative approach named Louvain method, where the process iterations decide on each node of the graph, to be part of a community once no change is observed on the modularity computation. A new graph is then generated after various iterations. Although, new approaches gained place into the race to novel community detection approaches as parallel and distributed methodologies. Those approaches are dealing especially with large scale networks [18,19]. Other approaches [20,21], mainly accommodated for complex networks, made their focus on identifying overlapping and non-overlapping communities. The application of association rules in the ONS topic is no longer popular than other techniques. In fact, Nancy et al. [22] have used ARL to study the influence of the Facebook user’s gender in studying a course in a university. Where Schmitz et al. [23] described an experimentation of knowledge extraction from association rules, and analyzed the folksonomy structure, through applying association rules for ontology learning purposes. Agrawal et al. [24], introduced the association rules to study regularities of relational datasets, it was performed on points of sales (POS) of the supermarkets’ transactional datasets, for rearranging and promotional products purposes. An interesting work has been performed by Fan et al. [27], where the principle of association rules is implemented with graph patterns named (GPARs) which aims to identify granularities between networks’ nodes. In fact, association rules can be a valuable technique for community detection in OSN, with focus on user’s interactions. We focus on this paper on user’s friendships as a transactional datasets, in order to discover granularities between users based on user’s interactions, more specifically the friendship interaction.

3 Preliminary Definitions

In this section, we provide all the definitions and preliminaries necessary for understanding the proposed approach. We define three important notions: online social networks, association rules and evaluation metrics.

3.1 Online Social Networks

Let us consider an online social network represented by a graph G = (V, E), where V is the set of nodes and E the set of edges. The nodes of G represent users, and the edges represent the set of users' relationships and interactions. The mathematical representation of the graph G is as follows: $G = (V, E)$, with $V = \{u_1, u_2, \dots, u_n\}$, where $u_i$ represents user i and n is the total number of users in G, and $E = \{e_1, e_2, \dots, e_m\}$, where $e_i$ is interaction or relationship i between two users of the online social network and m is the total number of interactions or relationships in G. In this paper, we assume that an edge $e_i$ represents a relationship or interaction identified as i between two users. Users of the graph G can belong to different communities or subgroups. A community $c_i$ is a set of users sharing common interests. Let $C = \{c_1, c_2, \dots, c_p\}$ be the set of communities in the graph G, where $c_i$ is a subset of users and p is the total number of communities in G.

3.2 Association Rules

The association rule [24] is a machine learning method that helps identify how an item can affect other items in a voluminous dataset, through analysis of the frequency with which items appear together in relationships. Four evaluation measures are used to discover the relationships between items of a large dataset, namely SUPPORT, CONFIDENCE, LIFT and CONVICTION. Applying the apriori algorithm provides efficient analysis performance for extracting association rules from transactional datasets. The frequency of an item in the dataset is measured via the Support metric, while the Confidence metric illustrates how often the implication holds in the whole dataset, in other words "the strength of the implication in the rule". The third metric (Lift) is an interestingness measure that illustrates the expected substitution effect between two items. Let I be a set of items $I = \{i_1, i_2, \dots, i_n\}$, and T a set of transactions $T = \{t_1, t_2, \dots, t_m\}$, where each transaction is a subset of items of I.

3.3 Evaluation Metrics

Let us consider a rule {A → B}, expressing the implication of itemset B when itemset A appears. The support formula [24] can be expressed as in Eq. (1):

$\mathrm{SUPPORT}(\{A\}) = \frac{|\{A\}|}{|T|}$   (1)

The confidence formula [24] is expressed as in Eq. (2):

$\mathrm{CONFIDENCE}(\{A \rightarrow B\}) = \frac{\mathrm{SUPPORT}(\{A \rightarrow B\})}{\mathrm{SUPPORT}(\{A\})}$   (2)


The lift measure describes the importance of a rule. The lift formula [24] is expressed as in Eq. (3):

$\mathrm{LIFT}(\{A \rightarrow B\}) = \frac{\mathrm{SUPPORT}(\{A \rightarrow B\})}{\mathrm{SUPPORT}(\{A\}) \times \mathrm{SUPPORT}(\{B\})}$   (3)

In addition to the above measures, there is a fourth formula, which describes the conviction. The conviction formula [24] can be expressed as in Eq. (4):

$\mathrm{CONVICTION}(\{A \rightarrow B\}) = \frac{1 - \mathrm{SUPPORT}(\{A \rightarrow B\})}{1 - \mathrm{CONFIDENCE}(\{A \rightarrow B\})}$   (4)

4 Methodology

In this paper, we address the identification of communities in online social networks (OSN) by combining Social Network Analysis and Association Rules Mining, exploiting users' friendships rather than topological measures.

4.1 General Scheme

In this section, we describe a new approach that aims to detect communities in an experimental online social network, taking into consideration the role of association rules in knowledge discovery. We address the problem of identifying communities by considering users' interactions: both friendships and user actions can be seen as links between the involved users, where users are represented as nodes. As described in Fig. 1, four steps are processed to achieve this goal:

Fig. 1. General scheme of the proposed approach.

Graph Pre-processing. This step is a global preparation step, where we first analyze the aim of the social network (e.g., community blogs network, images or video sharing social network, professional social network, ..., etc.), to understand the type of the social network and the possible interactions between actors.


Then, we evaluate the social network to understand the valuable information hidden in both nodes and edges (e.g., characteristics and properties). Furthermore, we focus on the dataset as a flat file (e.g., csv, xls, json, xml, ..., etc.). At this stage, The need is to understand the meaning of the dataset representation, by identifying attributes of the dataset objects, discovering their limits, categories and types. Finally, we run the cleaning process which aims to maintain cohesive values by eliminating nulls and NA values in the dataset. As an example, the chosen datasets for the proposed approach are described as social networks, where nodes represent users, interconnected via relationships between them. The below figure Fig. 2 visualizes (a) Zachary Karate Club social network and (b) Krackhardt Kite social network, where we notice the undirected and unweighted aspects of the graphs.

(a) Zachary karate Club social network visualization: A Social network of friendships between 34 members of a karate club at a US university in the 1970s.

(b) Krackhardt kite social network visualization: A social network with 10 vertices and 18 edges, introduced by David Krackhardt in 1990 to distinguish different concepts of centrality.

Fig. 2. Synthetic social networks (a) and (b), used for the experimentation purposes.

Dataset Transformation. The aim of this step is to provide a transactional dataset where association rules techniques can be applied. In this step, we start with a cleaned dataset for which a syntactical level of preparation was already processed on the dataset. The dataset transformation is insured through two steps: First, a semantic treatment is processed on the cleaned dataset in order to convert numeric values to categorical values, (e.g., Column age of nodes can be converted to categorical values to simplify perception of the age as Teenager for values from 13 to 17 years old, Adult for values from 18 to 40). Categorizing values of a dataset gives quick understanding of data, in particular when we manipulate numerous data as the


case of social networks. Second, the network interactions are transformed into a transactional dataset, transposing all interactions between nodes into transactions of the different users. Table 1 and Table 2 describe the generated transactional datasets related respectively to the Krackhardt Kite social network and the Zachary Karate Club social network, where each transaction lists all interactions of a given user with others. In this case, the interactions represent friendship between users.

Table 1. Transactional representation (Trx) of the interactions in Krackhardt Kite social network.

Transactions (Trx): T1, T2, ..., T10. Items (Users): for each transaction Ti, the users among U1, ..., U10 linked to the corresponding user by a friendship interaction.

Table 2. Transactional representation (Trx) of interactions in Zachary Karate Club social network.

Transactions (Trx): T1, T2, ..., T26. Items (Users): for each transaction Ti, the users among U1, ..., U34 linked to the corresponding user by a friendship interaction (e.g. T1: U1 U2 U3 U4 U5 U6 ...; T4: U4 U8 U13 U14; T5: U5 U7 U11; T26: U33 U34 ...).
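The two benchmark graphs are also available outside R; as an illustration of the transformation just described (graph edges to one transaction per user), here is a sketch using networkx. Whether the paper's transactions list a user's full neighbourhood or only one direction of each edge is not fully clear from the flattened tables above, so the full neighbourhood is used here.

import networkx as nx

# Both benchmark networks ship with networkx (the paper loads them from R's igraph).
zachary = nx.karate_club_graph()       # 34 members, friendship edges
kite = nx.krackhardt_kite_graph()      # 10 vertices, 18 edges

def to_transactions(graph):
    """One transaction per user: the set of users they share a friendship edge with."""
    return {f"T{u + 1}": {f"U{v + 1}" for v in graph.neighbors(u)}
            for u in sorted(graph.nodes())}

transactions = to_transactions(zachary)
print(transactions["T1"])   # the friends of user U1 in the karate club network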

Association Rules Mining. The aim of this step is to extract association rules from the transactional dataset generated in the previous step. We first specify the thresholds for the minimum SUPPORT and the minimum CONFIDENCE by applying Eqs. (5) and (6) respectively. The goal is to ensure the preliminary setup of the apriori algorithm; otherwise, the rules produced by apriori would not represent potentially interesting rules.

$\mathrm{SUPPORT}(\{A \rightarrow B\}) \geq \sigma$   (5)

$\mathrm{CONFIDENCE}(\{A \rightarrow B\}) \geq \delta$   (6)

The association rules are generated with a minimum support of 0.1% and a minimum confidence of 50%. Additionally, we set minimum and maximum rule lengths so as to extract rules with at least 2 items and at most 5 items. These settings aim to limit the number of rules taken into consideration. The extracted rules have the form {A → B}, where {A} and {B} are sets of items (groups or communities), meaning that if a node in {A} happens (interacts with others or on a topic), then a node in {B} has a higher probability of happening (interacting with others or on the topic). In a second phase, once the rules are identified, we remove redundancy and filter the resulting rules, then generate the dataset of rules to be processed as input of the next step. The statistical summary of the generated rules for both social networks is provided in Table 3:

Table 3. Statistics summary of the generated rules datasets.

Datasets | Rules | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Supp. | Conf.
Zachary Karate Club | 878 | 2.000 | 3.000 | 4.000 | 3.891 | 5.000 | 5.000 | 0.05 | 0.5
Krackhardt Kite | 518 | 2.000 | 3.000 | 4.000 | 3.869 | 5.000 | 5.000 | 0.01 | 0.5

The initial rules generated with the above configuration are then subjected to the following filtration to extract the interesting rules:

– Elimination of redundancy, to remove repeated rules.
– Selection of significant rules, by removing non-significant rules.
– Filtering by lift, which describes the importance of a rule: if the lift value is greater than 1, the rule is more likely to occur, and if it is less than 1, the rule is less likely to occur.

The result of this treatment is summarized in Table 4: the number of rules is reduced from 878 to 107 for the Zachary Karate Club and from 518 to 54 for the Krackhardt Kite network. The statistics summary for both datasets is as follows:

Table 4. Statistics summary of the final rules datasets.

Datasets | Rules | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Supp. | Conf.
Zachary Karate Club | 107 | 2.000 | 2.000 | 2.000 | 2.224 | 2.000 | 2.000 | 0.05 | 0.5
Krackhardt Kite | 54 | 2.000 | 2.000 | 3.000 | 2.519 | 3.000 | 3.000 | 0.01 | 0.5
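The authors generate and filter the rules in R with the arules package; purely as an illustration of the same configuration (minimum support 0.1%, minimum confidence 50%, rules of at most 5 items, lift filter), here is an equivalent sketch in Python with mlxtend, reusing the transactions dictionary built earlier.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

baskets = [sorted(items) for items in transactions.values()]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

# Thresholds taken from the paper: min support 0.1%, min confidence 50%, at most 5 items.
frequent = apriori(onehot, min_support=0.001, use_colnames=True, max_len=5)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
# (newer mlxtend releases may additionally require num_itemsets=len(frequent))

# Keep only rules whose lift indicates they are more likely than chance to occur.
rules = rules[rules["lift"] > 1]
print(len(rules), "rules kept")

mlxtend has no direct equivalent of arules' redundancy pruning, so that step would have to be added separately.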


Community Detection. The aim of this step is to extract the possible communities from the generated rules. The items of the rules represent nodes, i.e., users of the social network, sharing friendships or interactions between them; those users share a maximum of similarities, represented here as the friendship relationship. The proposed ARL Clustering approach detects the possible communities through users' interactions. The main steps of the approach are described as follows:

ARL Clustering method's algorithm

Initialization:
SETS
  R, R'        # sets of rules
  C, C'        # sets of communities, where p is the size of DISTINCT values of R'[RHS]
  List c       # list of items

Functions:
  SORT(R)            # sort the set of rules R by RHS
  UNIQUE(LIST())     # return the input list with distinct items

Main steps:
  # Sort by RHS in list (LHS, RHS)
  R' = SORT(R)
  # Forming clusters
  FOR i IN DISTINCT(R'[RHS]) DO
      ci = i                          # store the first item
      ci = append(ci, R'(i)[LHS])     # append the list with the items R'(i)[LHS]
      UNIQUE(ci)                      # eliminate duplicate items inside ci
      SORT_ITEMS(ci)                  # sort items of ci
      C = append(C, ci)
  END FOR

  C' = SORT(C)
  # Clean duplicated clusters
  UNIQUE(C')
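The pseudocode above groups rules by their right-hand side. As an illustration only (the authors' implementation is in R), the same grouping can be written in a few lines of Python, assuming each rule is given as a pair (set of LHS items, single RHS item):

def arl_clustering(rules):
    """Group association rules by RHS item: each distinct RHS seeds a community
    containing that item plus every LHS item of the rules implying it."""
    communities = {}
    for lhs, rhs in sorted(rules, key=lambda rule: rule[1]):   # sort by RHS
        community = communities.setdefault(rhs, {rhs})         # store the RHS item
        community.update(lhs)                                  # append the LHS items
    # Sort the members of each community and drop duplicated communities.
    unique = {tuple(sorted(c)) for c in communities.values()}
    return [list(c) for c in sorted(unique)]

toy_rules = [({"U2", "U3"}, "U1"), ({"U4"}, "U1"), ({"U1"}, "U2")]
print(arl_clustering(toy_rules))   # [['U1', 'U2'], ['U1', 'U2', 'U3', 'U4']]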

The results obtained with the proposed ARL Clustering approach are presented in Fig. 3.

4.2 Comparison with Synthetic Real-World Datasets

In this section, we describe in detail the chosen datasets used to compare the novel association-rules-based community detection approach with existing methods. We used two synthetic social networks, the Zachary Karate Club social network and the Krackhardt Kite social network, both available in the R igraph library.


(a) Visualization of the 16 communities detected with ARL Clustering Method.

(b) Visualization of the 7 communities detected with ARL Clustering Method.

Fig. 3. Visualization of the detected communities in both synthetic real-world social networks, for the computed results with ARL Clustering method.

Computed Results for Existing Algorithms. In this section, we run seven different algorithms on the two synthetic social networks. For comparison purposes, we selected seven of the most famous clustering methods in community detection, namely Fast Greedy (a), Walktrap (b), SpinGlass (c), Leading Eigenvector (d), Label Propagation (e), Infomap (f) and Optimal (g), applied respectively to the two synthetic social networks, the Zachary Karate Club [13] and the Krackhardt Kite social network [14]. We focus on only two metrics, the number of identified communities in the network and the modularity Q [25]. The quality of the communities is not taken into consideration in this paper; it will be the subject of future work. The visualizations below summarize the results of computing all selected clustering algorithms on both synthetic social networks. The visualizations were produced with the RStudio editor (version 3.3.2, 2016). All selected clustering algorithms are publicly available in the standard R igraph package, and were computed with the functions available in that library.

Comparisons of the Proposed Approach. In this section, we compare the results obtained with the ARL Clustering method to those of the other clustering methods. As described in Fig. 4 and Fig. 5, the number of identified communities differs for each algorithm. The comparison of the obtained results in terms of modularity and number of detected communities is summarized in Table 5. We are interested here in the number of communities detected by each algorithm. For instance, in the Zachary Karate Club, four communities are detected by algorithms (c), (d) and (g), algorithms (a), (e) and (f) identify three communities, and finally the Walktrap algorithm


Fig. 4. Visualization of detected communities in Zachary Karate Club social network, for the computed results with seven selected algorithms (a), (b), (c), (d), (e), (f) and (g).

(b) identifies five communities. The ARL Clustering method, in turn, detects sixteen possible communities according to users' friendships in the same Zachary Karate Club social network. Meanwhile, in the Krackhardt Kite social network, three communities are detected by algorithms (a), (b), (c), (d) and (g).


Fig. 5. Visualization of detected communities in Krackhardt Kite social network, for the computed results with seven selected algorithms (a), (b), (c), (d), (e), (f) and (g).
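For reference, the baselines in Table 5 can be reproduced (up to the randomness of some methods) with igraph's community-detection functions; the sketch below uses python-igraph rather than the R igraph used by the authors, and covers five of the seven algorithms for brevity.

import igraph as ig

g = ig.Graph.Famous("Zachary")   # the karate club network used in the paper

methods = {
    "Fast Greedy": lambda: g.community_fastgreedy().as_clustering(),
    "Walktrap": lambda: g.community_walktrap().as_clustering(),
    "Leading Eigenvector": lambda: g.community_leading_eigenvector(),
    "Label Propagation": lambda: g.community_label_propagation(),
    "Infomap": lambda: g.community_infomap(),
}
for name, run in methods.items():
    clusters = run()
    print(f"{name}: {len(clusters)} communities, Q = {clusters.modularity:.2f}")

Label Propagation and Infomap are stochastic, so the community counts and modularity values can differ slightly from run to run and from the values reported in Table 5.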


Table 5. Comparison of the applied community detection algorithms.

CD Algorithms | Zachary Karate Club | | Krackhardt Kite |
 | Modularity(Q) | CN | Modularity(Q) | CN
Fast Greedy (a) | 0.38 | 3 | 0.22 | 3
Walktrap (b) | 0.35 | 5 | 0.12 | 3
Spinglass (c) | 0.42 | 4 | 0.22 | 3
Leading Eigenvector (d) | 0.40 | 4 | 0.16 | 3
Label Propagation (e) | 0.37 | 3 | 0.10 | 2
Infomap (f) | 0.40 | 3 | 0.10 | 2
Optimal (g) | 0.42 | 4 | 0.22 | 3
ARL Clustering | NA | 16 | NA | 7

Two algorithms, (e) and (f), detect only two communities. As in the first synthetic social network, the ARL Clustering method detects more communities than the other algorithms: it identifies seven possible communities according to users' friendships in the Krackhardt Kite social network. In fact, the proposed approach allows the identification of a maximum number of communities through users' friendships, as reported in Table 5 and visualized in Fig. 3.

5 Discussion and Conclusion

In this section, we discuss and conclude on the obtained results, taking into account the comparison with various clustering algorithms performed on synthetic real-world social networks. Analyzing a huge amount of data from real-world systems, in particular OSNs, is in fact a doable task thanks to advances in data mining and SNA. We based the principle of our approach on the performance of methods and techniques, in particular the apriori algorithm, initially designed for big data processing. Indeed, the choice of apriori is justified by the gain in analysis performance. Therefore, as part of the four-level technique of the proposed approach, the data extracted from user interactions is processed with the apriori algorithm to produce the data source of the ARL clustering approach. We used the R language for the development tasks related to association rule extraction, graph implementation and visualization. The proposed four-step technique begins with a graph pre-processing step, where the needed initialization is performed to create cleaned and filtered transactional datasets; then, a transformation process prepares the next step, in which the apriori algorithm is applied, followed by a filtering of the items and lists of items in the transactional datasets. The final step is community mining, where we extract all possible communities with all possible combinations.
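As a rough Python illustration of the third and fourth steps (the actual implementation described above is in R; the transactions, support and confidence thresholds below are hypothetical), frequent itemsets mined with apriori from interaction transactions can be turned into candidate communities:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical transactional dataset: each transaction lists the users taking
# part in one interaction, as produced by the pre-processing/transformation steps.
transactions = [
    ["u1", "u2", "u3"],
    ["u1", "u3"],
    ["u2", "u3", "u4"],
    ["u1", "u2", "u3", "u4"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions),
                      columns=encoder.columns_)

# Apriori step, then rule extraction and filtering by confidence.
itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)

# Community mining: each rule (antecedent -> consequent) groups users that
# interact together and is therefore a candidate community.
candidate_communities = [set(a) | set(c)
                         for a, c in zip(rules["antecedents"], rules["consequents"])]
print(candidate_communities)
```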


The experimentation has been performed for the ARL clustering approach in parallel with the selected clustering algorithms on the same synthetic real-world social networks, in order to compare the obtained results. In fact, the ARL clustering method finds more communities than the other techniques, because the obtained result is oriented toward user interaction: users involved in social interactions constitute parts of the detected communities, whereas inactive users do not. In comparison with other algorithms, the proposed ARL clustering method discovers all possible communities in a given OSN, whereas the other methods identify effective communities subject to topological network measurements. The computation of the proposed approach takes advantage of the performance of apriori. The two online social networks used for the demonstration in this work are synthetic networks with a small number of nodes and edges, which does not impact the overall computation time. Meanwhile, in the case of dense networks with billions of nodes and edges, the computation time increases significantly, depending on the number of friendships. As part of the perspectives of this work, the proposed approach will be applied to bigger online social networks, taking into consideration a high level of friendship activity, in order to further improve the proposed method. Furthermore, an extension of the proposed method will be considered to solve other problems related to community detection in complex networks, in order to take into account the overlapping and non-overlapping characteristics and the dynamic aspect of complex networks.

References

1. Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010)
2. Ding, Y.: Community detection: topological vs. topical. J. Informetr. 5(4), 498–514 (2011)
3. Newman, M.E.J., Girvan, M.: Mixing patterns and community structure in networks. Lecture Notes in Physics, pp. 66–87 (2003)
4. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4) (2008)
5. Xiang, J.: Comparing local modularity optimization for detecting communities in networks. Int. J. Mod. Phys. C 28(06) (2017)
6. Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009)
7. Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99(12), 7821–7826 (2002). http://www.pnas.org/cgi/doi/10.1073/pnas/122653799
8. Blondel, V.D.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 10 (2008)
9. Pons, P., Latapy, M.: Computing communities in large networks using random walks. Lecture Notes in Computer Science, pp. 284–293 (2005)
10. Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 105(4), 1118–1123 (2008). http://www.pnas.org/cgi/content/full/0706851105/DC1
11. Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative analysis. Phys. Rev. E 80 (2009)
12. El-Moussaoui, M., Agouti, T.: A comprehensive literature review on community detection: approaches and applications. Procedia Comput. Sci. 151, 295–302 (2019)
13. Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)
14. Krackhardt, D.: Assessing the political landscape: structure, cognition, and power in organizations. Adm. Sci. Q. 35, 342–369 (1990)
15. Newman, M.E.J.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74(3) (2006)
16. George, R., Shujaee, K., Kerwat, M., Felfli, Z., Gelenbe, D., Ukuwu, K.: A comparative evaluation of community detection algorithms in social networks. Procedia Comput. Sci. 171, 1157–1165 (2020)
17. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6) (2004)
18. Khomami, M.M.D.: Distributed learning automata-based algorithm for community detection in complex networks. Int. J. Mod. Phys. B 30(08) (2016)
19. Staudt, C.L., Meyerhenke, H.: Engineering parallel algorithms for community detection in massive networks. IEEE Trans. Parallel Distrib. Syst. 27(1), 171–184 (2016)
20. Chakraborty, T., Kumar, S., Ganguly, N., Mukherjee, A., Bhowmick, S.: GenPerm: a unified method for detecting non-overlapping and overlapping communities. IEEE Trans. Knowl. Data Eng. 28(8), 2101–2114 (2016)
21. Hajiabadi, M., Zare, H., Bobarshad, H.: IEDC: an integrated approach for overlapping and non-overlapping community detection. Knowl.-Based Syst. 123, 188–199 (2017)
22. Nancy, P., Geetha Ramani, R., Jacob, S.G.: Mining of association patterns in social network data (Face Book 100 Universities) through data mining techniques and methods. In: Advances in Intelligent Systems and Computing, pp. 107–117 (2013)
23. Schmitz, C., Hotho, A., Jäschke, R., Stumme, G.: Mining association rules in folksonomies. In: Data Science and Classification, pp. 261–270. Springer, Heidelberg (2006)
24. Agrawal, R., Imielienski, T.: Mining association rules between sets of items in large databases. In: Proceedings of the Conference on Management of Data, Washington (1993)
25. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2) (2004)
26. Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. School of Computer Science, CMU-CALD-02, Pittsburgh (2002)
27. Fan, W., Wang, X., Wu, Y., Xu, J.: Association rules with graph patterns. VLDB 1502–1513 (2015)

Arabic Sentiment Analysis Based on 1-D Convolutional Neural Network

Bensalah Nouhaila1(B), Ayad Habib1, Adib Abdellah1, and Ibn El Farouk Abdelhamid2

1 Team Networks, Telecoms and Multimedia, University of Hassan II Casablanca, 20000 Casablanca, Morocco
[email protected], [email protected], [email protected]
2 Teaching, Languages and Cultures Laboratory Mohammedia, Mohammedia, Morocco
[email protected]

Abstract. In recent years, Deep Learning (DL) has garnered tremendous success in a variety of application domains such as sentiment analysis. In this paper, we study the impact of different preprocessing techniques on Sentiment Analysis (SA) in Arabic. Moreover, we describe the details of selecting good Arabic word embeddings using Word2Vec and FastText models. Furthermore, a DL architecture based on Convolutional Neural Networks (CNNs) is proposed. Experiments on a total of 63K book reviews show the suitability of our proposed system for Arabic SA.

Keywords: Sentiment analysis · Arabic deep learning · CNN · LSTM · Word2Vec · FastText · Penn Arabic Treebank · Farasa

1 Introduction

SA is a well-known task in the area of Natural Language Processing (NLP). It aims at analyzing people's sentiments from user-generated text in product reviews, social networks or blogs. The sentiments can consist of different classes: very negative (−−), somewhat negative (−), neutral (o), somewhat positive (+), or very positive (++). In this study, we consider two classes: positive (+) or negative (−). Due to several challenges, including the diverse vocabulary and morphological structure of the Arabic language, many techniques and approaches have been proposed to improve NLP in general and SA in particular, such as Part of Speech (POS) taggers [19] and stemming and lemmatizing the text [2]. However, there is still a need to tackle the complexity of NLP tasks in Arabic.

Over the past decade, numerous Arabic SA techniques have been reported in the literature [4,14]. They are mainly categorized into Machine Learning (ML) and DL approaches. When ML is considered, various approaches have been proposed, including K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Naive Bayes [14]. However, a pre-processing step called feature extraction is required in the majority of these works. In order to deal with the drawbacks of traditional machine learning algorithms, many DL architectures have been introduced, based on different models such as Recurrent Neural Networks (RNN) [13], CNN [21], or Recurrent Convolutional Neural Networks [16]. DL has been of great benefit to a number of tasks such as Machine Translation (MT) [8,10] and Biomedical Engineering [12,23]. Hence, in this work, we propose a novel, accurate Arabic SA system based on the well-known CNN. Moreover, we train and generate Arabic word embeddings using Word2Vec [22] (including Continuous Bag Of Words (CBOW) and Skip Gram (SG)) and FastText [11] models. Furthermore, we study the impact of different preprocessing techniques on Arabic SA, which include the cleaning, the normalization and the tokenization of the Arabic sentences using the Penn Arabic Treebank (ATB) scheme. The latter have been shown to have a significant impact on Statistical Machine Translation in Arabic [17].

The rest of the paper is organized as follows: Sect. 2 describes the proposed approach for implementing a SA system in Arabic. Section 3 details the experimental setup and results. Finally, we conclude our paper and propose a scope for future work in Sect. 4.

2 The Proposed Model

Our model proceeds as follows. The input sentence is decomposed into words; each one is represented as a fixed-dimension vector. To this end, we have built three embedding models: two based on the Word2Vec architectures (CBOW and SG) and one on the FastText model. The embedded sentence is fed to the CNN model. The obtained vectors are used as inputs to a flatten layer that concatenates them into a single long feature vector. Finally, we stack a dense layer with a Sigmoid activation function indicating whether the input sentence (the review) is positive or negative; see Fig. 1. Hereafter, we detail the different layers through which the whole process passes.

Fig. 1. The chart of the proposed method (blocks: word embeddings with n-columns = 200, convolution, max pooling and dropout blocks, a flatten layer, dense layers of 128, 64 and 32 units, and an output layer: positive or negative)

2.1 Word Embedding

Word embedding, i.e. mapping words with similar meanings to nearby vectors, has been of great benefit to a number of NLP tasks such as Question Answering [9] and Text Classification [3]. It is the process of representing a word with a fixed-length vector of real values in such a way that it can be distinguished from all the other words. In this paper, we aim to train and generate Arabic word embeddings using three different architectures. They are among the most commonly used for capturing word semantic distributions: CBOW, SG and FastText. As shown in Fig. 2, the Word2Vec model has two architectures: the CBOW model, which is used to predict a center word from its context words within the window length, and the SG model, which is the opposite of CBOW and is used to predict the context words from a center word. Recently, an improved architecture for word embeddings called FastText has been proposed [11]. Its improvement lies in two aspects: one is the ability to generate vectors for new words; the other is the use of the internal structure of words, which enables the model to take into consideration their morphology and lexical similarity. A comparison of these various word embedding models is conducted in this study for Arabic SA.

Fig. 2. Word2Vec model, CBOW and SG models

2.2 Convolutional Neural Networks

In summary, each source sentence is represented as a matrix by concatenating the embedding vector sequence as columns. Then, four convolutional layers are applied to the resulting matrix. The window sizes of the four convolutional layers are 4, 5, 6 and 7, respectively. Max pooling is added after each convolution with a pooling length of 3. A dropout of 30% is added at the end of each of the convolutional layers; it is used to randomly drop units from the neural network during training and hence avoid overfitting. Finally, a non-linear activation function is applied to introduce non-linearity into the output of the convolutional and max pooling layers. In this work, we have adopted the Rectified Linear Unit (ReLU) as the activation function.

2.3 The Output Layer

The last block is composed of a classification layer using a Sigmoid activation function. It is often used as the neural network's final layer for binary classification to generate the probability distribution over sentiment labels, and hence indicates whether the review is positive or negative.
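A minimal Keras sketch of a model along these lines is shown below. It reads Sects. 2.2–2.3 and Fig. 1 as four parallel convolution branches whose flattened outputs are concatenated before the dense layers; the number of filters and the maximum sequence length are assumptions, since they are not stated in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_LEN, EMB_DIM, N_FILTERS = 100, 200, 100  # MAX_LEN and N_FILTERS are assumed

inputs = layers.Input(shape=(MAX_LEN, EMB_DIM))   # pre-computed word embeddings
branches = []
for k in (4, 5, 6, 7):                            # the four convolution window sizes
    x = layers.Conv1D(N_FILTERS, k, activation="relu")(inputs)
    x = layers.MaxPooling1D(pool_size=3)(x)       # pooling length of 3
    x = layers.Dropout(0.3)(x)                    # 30% dropout
    x = layers.Flatten()(x)
    branches.append(x)

x = layers.concatenate(branches)                  # single long feature vector
x = layers.Dense(128, activation="relu")(x)
x = layers.Dense(64, activation="relu")(x)
x = layers.Dense(32, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # positive vs. negative

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adamax(learning_rate=2e-6),
              loss="binary_crossentropy", metrics=["accuracy"])
```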

3 Experimental Setup and Results

3.1 Arabic Preprocessing

Dataset Cleaning: In this paper, we investigate the performance of the proposed Arabic SA system on the Large Scale Arabic Book Reviews (LABR) dataset constructed by Aly and Atiya [7]. It is the largest sentiment corpus for Arabic text; it consists of over 63,000 book reviews. In order to keep only the relevant words, a very crucial step known as data cleaning needs to be applied. It includes several steps (a minimal cleaning sketch is given after the list):

1. Removing duplicated reviews and the tashkeel symbols.
2. Removing elongation from words.
3. Removing any special characters such as (#$%).
4. Manually correcting words that have missing letters.
5. Removing links, hashtags and emojis.
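A minimal cleaning sketch of steps 1–5 (assumed, not the authors' script; the regular expressions are common choices for Arabic text, and the manual correction of step 4 is of course not automated here):

```python
import re

TASHKEEL = re.compile(r"[\u0610-\u061A\u064B-\u0652\u0670]")  # Arabic diacritics
TATWEEL = "\u0640"                                            # elongation character

def clean_review(text: str) -> str:
    text = re.sub(r"https?://\S+|#\S+", " ", text)      # links and hashtags
    text = TASHKEEL.sub("", text)                        # tashkeel symbols
    text = text.replace(TATWEEL, "")                     # elongation
    text = re.sub(r"[^\w\s\u0621-\u064A]", " ", text)    # special characters, emojis
    return re.sub(r"\s+", " ", text).strip()

# Duplicated reviews (step 1) can then be dropped, e.g. with:
# cleaned = list(dict.fromkeys(clean_review(r) for r in reviews))
```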

Normalization and Tokenization: We normalize some letters, replacing certain variant forms with a single canonical spelling. Finally, Farasa [1] is utilized for morphology-aware tokenization (ATB), which is used to split all clitics. Table 1 shows an example across the cleaning, normalization and ATB scheme.

Table 1. Cleaning, normalization and tokenization scheme applied to an example.

3.2 Building the Word Embedding Models

As introduced earlier, one of the main purposes of this work is to generate three different word embedding models: the first is built using the CBOW technique, the second using the SG technique and the third using FastText. Thereby, the Gensim tool (https://radimrehurek.com/gensim/about.html) and Gensim's native implementation of FastText (https://radimrehurek.com/gensim/models/fasttext.html) were used to build the Word2Vec and FastText models, respectively. Specifically, preprocessing is applied first to the input sequence (the review); then a list of the generated sentences, where each sentence is decomposed into a list of words, is used as input to the model (Word2Vec or FastText). Finally, a sliding window is applied on the text to generate the vector (representation) of each word in the corpus. In order to build the embedding representations, a series of experiments to select the best training hyperparameters was conducted in this study. The selected hyperparameters are illustrated in Table 2.

Table 2. Word vector representations training parameters

Model    | Dimensionality | Window size | Sample    | sg | Negative | Iterations
CBOW/SG  | 200            | 5           | 1 × 10^-2 | 0  | 100      | 20
SG       | 200            | 8           | 1 × 10^-2 | 1  | 500      | 20
FastText | 200            | 8           | 4 × 10^-2 | 1  | 500      | 20
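A hedged sketch of how the three models can be trained with Gensim, plugging in the hyperparameters as recovered in Table 2 (the sentences variable is a placeholder for the list of tokenized, preprocessed reviews):

```python
from gensim.models import Word2Vec, FastText

# sentences: list of token lists produced by the cleaning/normalization/ATB step
sentences = [["..."]]  # placeholder

cbow = Word2Vec(sentences, vector_size=200, window=5, sample=1e-2,
                sg=0, negative=100, epochs=20)
skipgram = Word2Vec(sentences, vector_size=200, window=8, sample=1e-2,
                    sg=1, negative=500, epochs=20)
fasttext = FastText(sentences, vector_size=200, window=8, sample=4e-2,
                    sg=1, negative=500, epochs=20)

# Qualitative check used in Sect. 3.3: nearest neighbours of a query word.
print(skipgram.wv.most_similar("some_word", topn=8))  # "some_word" is a placeholder
```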

3.3 Arabic Word Embeddings Evaluation

As the Word2Vec and FastText models were built to be evaluated as a part of Arabic Sentiment Analysis, a first way to test the models is to check the most similar words to nice and bad, since these words are commonly used to express positive and negative sentiments. Tables 3 and 4 show eight different results using these models with and without the normalization + ATB technique. Notice that the similarity score between two words is based on the cosine of the angle between the two word vectors. Examining the results illustrated in Table 3, it can be seen that the three models can precisely capture the meaning of the word nice. However, the similar words to bad are the word itself with different Arabic spellings. The results in Table 4 are indeed interesting. The three models are suitable for the two words, and the similar words to bad are varied compared to those reported in Table 3, which confirms that the meaning of the query words is precisely captured. This is due to the normalization and the morphology-based tokenization (ATB) applied to the Arabic reviews. It also gives the models the ability to generate the similar words to bad not only in Modern Standard Arabic but also in Egyptian Spoken Arabic dialects.
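For reference, the similarity score mentioned above is the standard cosine similarity between two word vectors u and v:

cos(u, v) = (u · v) / (||u|| ||v||)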

Table 3. The results of the most related words to nice and bad using CBOW, SG and FastText without normalization and ATB

Table 4. The results of the most related words to nice and bad using CBOW, SG and FastText after applying normalization and ATB

3.4 Hyperparameters

The hyperparameters are the cornerstone of the training process. In this study, the proposed model parameters were optimized by Adamax [20]. The learning rate was initialized with a value of 0.000002; when overfitting begins, this value is multiplied by 0.2 every 4 epochs until it reaches 10^-8. In order to avoid overtraining, an early stopping technique based on the validation error was used. The main role of this technique is to stop the training process when the error starts to increase [15]. The details of the proposed model parameters are shown in Table 5.

Table 5. The details of the proposed model parameters

Hyperparameter            | Values
Batch size                | 64 (this value seems to be more efficient based on a manual tuning)
Loss function             | Binary crossentropy (in the case of binary classification, this loss function is the appropriate one)
Epoch                     | 250 (to converge faster to a better local optimum)
Dropout                   | 0.3 (in order to avoid overfitting and speed up the training process at the same time)
Validation size/Test size | 0.2/0.3
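One way to realize this training setup with standard Keras callbacks is sketched below; ReduceLROnPlateau only approximates the described decay (×0.2 with a patience of 4 epochs, down to 10^-8), and the early-stopping patience is an assumption.

```python
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2,
                                         patience=4, min_lr=1e-8),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,  # patience assumed
                                     restore_best_weights=True),
]

# x_train / y_train are the embedded reviews and their labels (not shown here):
# model.fit(x_train, y_train, batch_size=64, epochs=250,
#           validation_split=0.2, callbacks=callbacks)
```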

3.5 Arabic SA Evaluation

Here, we present a series of experiments for Arabic SA in order to evaluate the performance of the proposed approach. First of all, we compare the performance of the Arabic SA over the three word representation techniques used: CBOW, SG and FastText. Then, we analyze the impact of the ATB technique on the Arabic SA quality. The performance was evaluated in terms of accuracy, which is defined as:

Accuracy (%) = Number of reviews that are correctly classified / Total number of reviews    (1)

The results are displayed using the confusion matrix, which gives us the predicted accuracy of each class (positive or negative). Figures 3, 4, 5 and 6 report the obtained results. As shown in Fig. 3a, the reviews were relatively easily classified, reaching an average accuracy of 74% for negative and 89% for positive. The performance gap between these two classes can be explained by the fact that the total number of positive reviews is much larger than that of negative reviews in this dataset. Improvement was first made by using word representation techniques (after cleaning the data); see Fig. 4a, 5a and 6a. The proposed setting gives an average accuracy of 77% for negative and 87% for positive using CBOW, 79% for negative and 86% for positive using SG, and 79% for negative and 87% for positive using FastText. Examining the results, it can be seen that FastText gives the best results, likely due to the fact that FastText takes into consideration the internal structure of words, which makes it suitable for morphologically rich languages such as Arabic. In order to further improve the performance of the proposed approach, the normalization + ATB technique was applied to the cleaned reviews. The results reported in Fig. 3b, 4b, 5b and 6b show that these settings helped the model to achieve better results; when used before applying FastText, the accuracies become much higher (81% for negative and 89% for positive). As a result of this analysis, the most appropriate combination to be used in this study is FastText with 200 dimensions together with the Arabic preprocessing.

Fig. 3. Confusion matrix of the proposed model: (a) without normalization and ATB; (b) with normalization and ATB

Fig. 4. Confusion matrix of the proposed model using CBOW: (a) without normalization and ATB; (b) with normalization and ATB

Fig. 5. Confusion matrix of the proposed model using SG: (a) without normalization and ATB; (b) with normalization and ATB

Fig. 6. Confusion matrix of the proposed model using FastText: (a) without normalization and ATB; (b) with normalization and ATB

To compare the performance of the proposed deep CNN Arabic SA with existing ML and DL methods, we have selected only state-of-the-art works using the LABR dataset. ElSahar et al. [14] developed an ML model for Arabic SA based on Delta-TF-IDF, TF-IDF and count for feature selection and a linear SVM for classification. In [6], Altowayan et al. presented an effective classification tool combining POS tags and word stemming features with Logistic Regression for Arabic SA. Altowayan et al. [5] used FastText with Support Vector Classification (SVC) and Logistic Regression classifiers on the LABR dataset. Recently, Abubakr et al. [24] proposed a DL model based mainly on the 1D-CNN model and the Long Short Term Memory (LSTM) architecture; the feature maps extracted by both CNN and LSTM are used as input to an SVM classifier to generate the final classification. The obtained results are shown in Table 6.

Table 6. Performance comparison with the existing methods using LABR dataset

Classifier                                                                            | Accuracy (%)
ElSahar et al. [14]                                                                   | 80.20
Altowayan et al. [6]                                                                  | 78.60
Altowayan et al. [5]                                                                  | 84.97
Abubakr et al. [24]                                                                   | 90.20
Our deep CNN model + FastText + Arabic preprocessing (cleaning + normalization + ATB) | 87.73

Comparing the obtained accuracy for Arabic SA on the LABR database with that of the literature, our model has an advantage over the majority of works and improves the accuracy by 7.53%, 9.13% and 2.76% compared to the accuracies obtained by ElSahar et al. [14], Altowayan et al. [6] and Altowayan et al. [5], respectively. This is largely due to three main factors. First, the use of the CNN allows the model to capture the most influential information about the reviews. Second, the use of word embeddings helps the model to converge faster. Third, the use of the morphology-based tokenization scheme improves the performance of Arabic SA as it deals with data sparsity [17,18,25].

4 Conclusion

In this work, we analyzed the advantage of combining normalization and ATB after cleaning the Arabic reviews. Furthermore, we compared three different Arabic word embedding models, namely CBOW, SG and FastText. Moreover, a DL architecture based on the well-known 1-D CNN model was proposed. Experimental results showed that the proposed setting improves the performance of the Arabic SA system. As a part of future work, we aim to investigate more robust features in order to capture more of the morphological richness of Arabic words.

References

1. Abdelali, A., Darwish, K., Durrani, N., Mubarak, H.: Farasa: a fast and furious segmenter for Arabic. In: Proceedings of the Demonstrations Session, NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 11–16 (2016)
2. Al Sallab, A., Hajj, H., Badaro, G., Baly, R., El Hajj, W., Bashir Shaban, K.: Deep learning models for sentiment analysis in Arabic. In: Proceedings of the Second Workshop on Arabic Natural Language Processing, pp. 9–17 (2015)
3. Alghamdi, N., Assiri, F.: A comparison of fasttext implementations using Arabic text classification. In: Proceedings of the 2019 Intelligent Systems Conference on Intelligent Systems and Applications, pp. 306–311 (2019)
4. Alomari, K.M., Elsherif, H.M., Shaalan, K.: Arabic tweets sentimental analysis using machine learning. In: Advances in Artificial Intelligence: From Theory to Practice - 30th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, vol. 10350, pp. 602–610 (2017)
5. Altowayan, A.A., Elnagar, A.: Improving Arabic sentiment analysis with sentiment-specific embeddings. In: 2017 IEEE International Conference on Big Data, BigData, pp. 4314–4320 (2017)
6. Altowayan, A.A., Tao, L.: Word embeddings for Arabic sentiment analysis. In: 2016 IEEE International Conference on Big Data, BigData, pp. 3820–3825 (2016)
7. Aly, M., Atiya, A.: LABR: a large scale Arabic book reviews dataset. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 494–498 (2013)
8. Bensalah, N., Ayad, H., Adib, A., Farouk, A.I.E.: LSTM or GRU for Arabic machine translation? Why not both! In: International Conference on Innovation and New Trends in Information Technology, INTIS 2019, Tangier, Morocco, 20–21 December (2019)
9. Bensalah, N., Ayad, H., Adib, A., Farouk, A.I.E.: Combining word and character embeddings for Arabic chatbots. In: Advanced Intelligent Systems for Sustainable Development, AI2SD 2020, Tangier, Morocco (2020)
10. Bensalah, N., Ayad, H., Adib, A., Farouk, A.I.E.: CRAN: an hybrid CNN-RNN attention-based model for Arabic machine translation. In: International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications, CloudTech 20, Marrakesh, Morocco (2020)
11. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
12. Bouny, L.E., Khalil, M., Adib, A.: ECG heartbeat classification based on multiscale wavelet convolutional neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May (2020)
13. Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990)
14. ElSahar, H., El-Beltagy, S.R.: Building large Arabic multi-domain resources for sentiment analysis. In: Gelbukh, A.F. (ed.) 16th International Conference on Computational Linguistics and Intelligent Text Processing, pp. 23–34 (2015)
15. Feurer, M., Hutter, F.: Hyperparameter optimization. In: Automated Machine Learning, pp. 3–33. Springer (2019)
16. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2014)
17. Habash, N., Sadat, F.: Arabic preprocessing schemes for statistical machine translation. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (2006)
18. Kholy, A.E., Habash, N.: Orthographic and morphological processing for English-Arabic statistical machine translation. Mach. Transl. 26, 25–45 (2012)
19. Khong, W., Soon, L., Goh, H., Haw, S.: Leveraging part-of-speech tagging for sentiment analysis in short texts and regular texts. In: 8th Joint International Conference on Semantic Technology, vol. 11341, pp. 182–197 (2018)
20. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR (2015)
21. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE, pp. 2278–2324 (1998)
22. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: 1st International Conference on Learning Representations, ICLR (2013)
23. Mousavi, S., Afghah, F.: Inter- and intra-patient ECG heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, pp. 1308–1312 (2019)
24. Ombabi, A.H., Ouarda, W., Alimi, A.M.: Deep learning CNN-LSTM framework for Arabic sentiment analysis using textual information shared in social networks. Soc. Netw. Anal. Min. 10(1), 53 (2020)
25. Zalmout, N., Habash, N.: Optimizing tokenization choice for machine translation across multiple target languages. Prague Bull. Math. Linguist. 108, 257–270 (2017)

CMA-EV: A Context Management Architecture Extended by Event and Variability Management

Zineb Aarab1(B), Asmae El ghazi1,3, Rajaa Saidi1,2, and Moulay Driss Rahmani1

1 LRIT Associated Unit to CNRST (URAC 29, Faculty of Sciences), Rabat IT Center, Mohammed V University, BP 1014, Rabat, Morocco
[email protected], [email protected], [email protected]
2 SI2M Laboratory, INSEA, BP 6217, Rabat, Morocco
[email protected]
3 GENIUS Laboratory - SUPMTI of Rabat, Rabat, Morocco

Abstract. Nowadays, software systems must adapt themselves to suit the specific context in which they operate. Context-awareness concepts are suitable for systems that must be optimally configured during execution, because they allow systems to interact with the real world by deciding the degree of adaptation the environment requires. In this sense, this paper presents our approach, which consists of the CMA-EV architecture for the development of context-aware event-based systems, from the modeling phase through a context management architecture extended by event processing and variability management. Also, a discussion about the importance of and challenges in designing context-aware and event-based applications is presented. An essential aspect that is often neglected in context modeling and that also triggers adaptation is context variability. This paper puts the light on this notion and proposes a context feature model that would help the designer in programming adaptation for context-aware systems. All these crucial notions (context, event, and variability) are combined in our proposed CMA-EV architecture for the development of context-aware and event-based systems extended by variability management. Moreover, a Smart Tourism Recommender System (STRS), which respects personal preferences and captures usage, personal, social and environmental contextual parameters, is implemented to validate our approach.

Keywords: Context-aware system · Event · Variability · Adaptive systems · Recommender system · Smart tourism

1 Introduction

The new trend of computing raises the demand for the ability to self-adapt dynamically, autonomously, and with a high degree of flexibility, in order to meet new stakeholder requirements, context changes and complexity. Context-aware systems introduce a set of design challenges that must be considered and handled as a first concern. Another challenge is the notion of event and its management, so as to be capable of operating in highly dynamic environments while placing minimal demands on user attention.


Likewise, an essential concept that is usually ignored in context modeling and which also triggers adaptation is context variability. Variability modeling is usually used to analyse and represent the commonalities and the changing parts in software product lines, which is a benefit for context-aware systems concerning context variability and adaptation. The variant configuration differs from one context to another depending on varying data values. Those data are linked to context features and cross-tree constraints in the feature model. Context features need to be clearly identified, modeled, and managed for context-aware systems to be able to reason about potential contextual changes and perform according to the defined strategy or reconfiguration plans. The diversity of context features is needed to satisfy multiple adaptation scenarios, which can be addressed by introducing variability into context properties. The reason why we have decided to work on these three notions, namely Context, Event and Variability, is that all of them trigger system adaptation, which is a main concern for context-aware systems.

Recommender Systems (RSs) are information filtering systems aiming to recommend optimal items to individual users and to better match user preferences. In tourism, existing RSs acquire user wants, either explicitly or implicitly, and suggest services to the tourist (destinations to visit, events, hotels…). However, they generally do not consider contextual information linked to external events that do not affect user preferences directly. The missing context, while providing recommendations, may have crucial consequences on the user's choices. Therefore, a personalized smart recommendation has become mandatory for tourism RSs.

Our objectives in this work are four-fold: 1) to present our CMA-EV architecture for the development of context-aware and event-based systems, 2) to propose our context metamodel based on relational concept analysis and an event metamodel to instantiate productive and comprehensible models, 3) to model context variability at design time, which makes it possible to model several product lines supporting several dimensions of the context, and 4) to validate our approach through a smart tourism recommender system.

The remainder of this paper is organized as follows. Section 2 presents a state of the art of three main concepts, together with their discussion: context awareness, event-based systems, and context variability in context-aware systems. Section 3 introduces an overview of the proposed CMA-EV architecture for the development of context-aware and event-based systems extended by context variability management. The validation of our approach is presented in Sect. 4, while Sect. 5 concludes this paper.

2 State of the Art

The main motivation in the new computing paradigms (such as Ambient Intelligence, Pervasive Computing, Ubiquitous Computing, CASs, and so on) is decoupling users from computing devices. In this regard, several context definitions were proposed in the literature. A discussion of the notion of context is presented in our previous work [1].


A widely and uniquely accepted definition of this concept is not available in the literature. For example, [2] sees the context as any information that can be used to characterize the situation of an entity. An entity could be a person, place, or object that is considered relevant to the interaction between a user and an application, including location, time, activities, and the preferences of each entity. In the last decade, an important evolution of this notion has come out: the context is no longer considered as a state, but as part of a process in which users are involved. Thus, defining the word 'context' is still a big challenge, and many researchers have tried to find their own definition of what context actually includes. As proposed by [3], we can say that this term is usually adopted to indicate a set of attributes that characterize the capabilities of the access mechanism, users' preferences, and the information and services that are delivered; these could include the access device (especially in the presence of strong heterogeneity of devices) [3]. Additionally, the crucial standing of this notion has been the source of inspiration of several papers which gave multiple ramifications of its dimensions: spatial, spatial mobility, spatio-temporal, environment and personal dimensions. These dimensions are explained in detail in our previous work [4]. As we can see, the notion of context is not an easy concept to understand nor to delimit. Thus, context modeling is needed to understand and interpret dynamic context representations at a high level of abstraction in an unobtrusive way. The most important context modeling techniques are compared in [5] and listed as follows: key-value, markup schemes, graphical, object based, logic based, and ontology based modeling. Context metamodeling is also a major challenge. A metamodel defines the language and semantics to specify particular model domains or applications. Metamodeling the context is a major challenge for the identification of the concepts involved in manipulating the context, their relationships, their formalization and the presentation of their semantics [6]. The most popular context metamodeling approaches are: metamodels centered on ubiquitous web applications, UML Profile metamodels, metamodels supporting ontological models, high-level abstraction and framework metamodels. Those approaches were discussed in detail and compared in our previous work [5].

Event Oriented Context Management Methods

The authors of [7] propose a methodology for event-driven business process integration in ubiquitous enterprise environments. They provide e-Services and u-Works; e-Services are considered automated activities which can be provided by well-defined application services, while u-Works are defined as manual activities which can be supported by ubiquitous computing devices such as RFID sensors. The activities are coordinated in a decentralized collaboration network based on event-driven rule processing. [8] have contributed a general reference architecture for Event Processing to make clear the fundamental organization and common operations of an EP-based system, embodied in its components and features, to run concrete EP-based systems. This general reference architecture [8] models the abstract architectural elements of an EP-based system independently of the technologies, protocols, and products that might be used to implement it.


[9] have introduced an EDA-based reference architecture for DSSs (Decision Support Systems) for traffic management, and they have redesigned an ITMS (Intelligent Transportation Management System) prototype for a real-world problem within the framework of that architecture, where event processing agents connect streams of increasingly abstract types of events, making use of a rule-based representation of local traffic expertise. In the E-CARe (Engineering Context-Aware and Reactive systems) method [10], an event-driven approach was proposed to gather context data and trigger system reactions. This approach allows the definition of system architectures based on event flow exchanges and decouples the system from static databases. [11] proposes an event metamodel standing as a pivot between the upstream Natural Language Processing and the temporal information visualization frameworks on the end-user side. Based on MDE (Model-Driven Engineering), [11] exploited the event metamodel both for data production and for the generation of graphical interfaces for the visualization of events stored in the database. Also based on MDE, [12] proposed an approach for the development of a journalistic production environment; [12] gives an overview of their approach and presents the metamodel used, including an event model serving as a pivot between the upstream Natural Language Processing and the temporal information visualization frameworks on the end-user side. The growing evolution of the notion of ubiquity in mobile devices opens new research tracks, such as the need for new interaction and application models to facilitate new forms of communication. In this sense, [13] discussed the challenges in designing event-driven mobile services that will detect conditions of interest to users and notify them accordingly.

Context Variability in Context-Aware Systems

Software systems are becoming more context sensitive and increasingly exploit contextual information to handle the diversity of changes in and conditions of their environment. In specific application domains such as automotive systems, marine and aviation systems, windmill farms, and airport management systems, the timely use of and adaptation to contextual information is critical for the system's normal operation. Consequently, a major concern is acquiring, analyzing, modeling, and managing contextual information for the plethora of systems that need to react and adapt to new contexts. These activities require appropriate software modeling and development techniques. In this section, we discuss context variability and then present the important concepts of variability.

Context Variability: We call context-aware a software system that automatically adapts its behavior to its current context, for example, the adjustment of the speed of a vehicle as a function of the distance to the vehicle ahead. Since the notion of context is a pillar of Context-Aware Systems (CASs), context modeling has taken on great importance in the research field, yet the concept of context variability is often missed. Since context variability is the variability of the environment in which a product resides [14], it is steadily related to the dynamic adaptation of a system. The notion of context variability was introduced in [14, 15] as a relevant technique for modeling the context awareness of systems which model their context properties using a Software Product Line (SPL) approach [16]. Thus, for [14], a context variability model "consists of general classifiers of the context in which the product is used. A general classifier stands for a set of requirements or constraints for that context. The Context Variability model captures the commonality and variability of the context".


We adopt the definition of [17] for context variability: "Context variability is a range or a set of context values of an environment along which that specific environment changes or is influenced by various context entities".

Synthesis

In this section, we present a summary of the notions presented above through comparisons. In Table 1, we have classified different context definitions scanned from the literature into six categories, which are: Context Definition, Context Element, Context Levels, Context Type, Context Nature, and Context Exploitation. To sum up, as can be observed, a complete and global context definition is required, covering all the needed context-aware concepts (i.e. user profile, preferences, user activity, history, interaction, location, environment, devices, services, activities, etc.). That is why the next section is dedicated to introducing a context-aware architecture for the development of context-aware event-based systems.

3 CMA-EV Architecture Overview

The recent perspective in the research area is converging toward a smart world (e.g. smart devices, smart environments, smart applications, smart homes, smart manufacturing, and smart cities). Therefore, many works are trying to integrate context management in the development of applications that serve this purpose. In our research, we deal with the development of context-aware applications in intelligent environments. Thus, our approach aims to provide a solution for the development of context-aware applications involving the combination of context, distributed events and context variability modeling. We demonstrate its applicability and usability via a smart tourism application. Figure 1 presents the fundamental building blocks of our CMA-EV architecture for enabling self-managing context-aware systems. This includes context modeling, context variability modeling, automatic context data acquisition, and the service provisioner component. The context variability selects the applicable alternatives considering data from the context and physical environment changes. During the execution phase, the context service provisioner searches for available services that can accomplish the demanded request in order to make an automated decision on a fragment using context variability. In our approach, we use a formalism based on Event-Condition-Action (ECA) rules for capturing adaptation rules. This adaptation model relies on the variability model of the system and especially on the context variability model. Finally, the context service provisioner chooses the best solution among the acceptable adaptation configurations for a particular context. The context model is an instance of our proposed context metamodel, which is presented in the next section.

Table 1. Context categorization schemes in context classification works. The table compares the surveyed approaches ([2, 6, 10, 18–34]) and the proposed CMA-EV along the six categories (Context Definition, Context Element, Context Levels, Context Type, Context Nature, Context Exploitation) and their sub-criteria (e.g. synonyms, categories, example, location, user, time, activity, objective, task, conditions, system, environment, social, domain, historical interaction, low-level/physical, high-level/logical, derived, profiled, primary/secondary context, static, dynamic/sensed, rules, service).


3.1 Context Metamodel

A context metamodel must address the special requirements of pervasive computing environments. Besides, context-aware applications require high-level context information that is derived from low-level context values. In our vision, the concept of context involves three different elements, which are:

• the User with its properties,
• the System with its properties,
• and the external/physical Environment.

Therefore, we designed the conceptual basics of our CMA-EV architecture first, starting with context metamodeling as our paramount concern, which translates our vision.

Fig. 1. Overview of the proposed CMA-EV architecture

As can be seen in Fig. 2, our context depends on three major elements, which are the system, the user and the environment. We define the context as an aggregation of seven main groups of properties (see Fig. 2): the Channel, Location and Time, User profile, System Activity, History, Rules, and External Environment.

• A Channel defines the medium by which context information can be transmitted; it identifies the physical device and the connection used to access the application. The Channel is composed of: the Device used for the communication, for which we adopt the Device model proposed by [35]; the Network used to transfer information; and the Application protocols used by services.
• A Location and Time description identifies the position of the user, i.e. where the user is located while interacting with the application, and the Time of the interaction. It is composed of two main parts: the Political Location, which describes the location in terms of country, city and street, and the Physical Location, which describes the location in terms of geographic coordinates using GPS.


Fig. 2. Our proposed Context metamodel

• The User profile is, none other than, the information collected, structured, organized, and maintained about user-related data. We classify the data concerning a user, as mentioned in the context metamodel in Fig. 2, into two different groups [29]: Static data, such as first name and last name, age (date of birth), gender, and personal information that the user could express explicitly, like his address; and Dynamic data, which is information that could be changed manually by the user or automatically by the application, such as the user's expertise and knowledge, the Role played by the user in the social context, and the Medical state of the user, which is helpful (e.g. in a health application). The dynamic part of the user also comprises the User activity, which describes whether he/she is Mobile or Fixed.
• System Activity describes relevant information about the ongoing activities in which the user and the system are involved, which could be known from the state of the system (Stand-by, Acquisition, Offering or Searching mode).
• A History of context-action interactions that the user has performed with the implemented application.
• Preferences formulated implicitly by means of association Rules extracted from the above-mentioned history, or set by a domain expert. An example of a rule in a tourism application would be: IF a user is a student, requesting from Venice, asking for a service belonging to Hotel Reservation Services, THEN the user prefers the 3-star category and a room with TV.
• External Environment information, which describes the temperature and UV index, the probability of precipitation, and the times of sunset and sunrise.

As seen in Table 1, where we compared our context metamodel with existing works, our proposed metamodel fulfills most of the important concepts for context-aware systems; a schematic sketch of this aggregation is given below.
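A schematic and purely illustrative rendering of this aggregation as plain data structures, with attribute names taken from the groups above, could look as follows:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Channel:
    device: str                 # device model, as in [35]
    network: str
    application_protocol: str

@dataclass
class LocationTime:
    country: str                # political location
    city: str
    street: str
    gps: Tuple[float, float]    # physical location
    time: str

@dataclass
class UserProfile:
    static: dict                # name, age, gender, address, ...
    dynamic: dict               # expertise, role, medical state, mobile/fixed activity

@dataclass
class Context:
    channel: Channel
    location_time: LocationTime
    user_profile: UserProfile
    system_activity: str        # stand-by, acquisition, offering or searching mode
    history: List[str] = field(default_factory=list)
    rules: List[str] = field(default_factory=list)            # preference rules
    external_environment: dict = field(default_factory=dict)  # temperature, UV, ...
```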

3.2 Event Metamodel

We argue that event processing serves as a key in building a flexible and responsive business. Consequently, event-driven approaches were introduced to service-oriented computing in ubiquitous enterprise environments, to allow service applications to be loosely coupled through event notification. "An event is an occurrence within a particular system or domain; it is something that has happened, or is contemplated as having happened in that domain. The word event is also used to mean a programming entity that represents such an occurrence in a computing system" [13]. An event-based system executes rules (e.g. computational processes) that generate appropriate responses whenever the rule is verified. Event-based systems have to collect and analyze occurring events and then react to them. These steps are the most challenging part of the design of event-based systems. In our vision [36], events are not isolated; in fact, an event-oriented system can be viewed as a set of interconnected constructing units of information systems, such as the context, event operators, rules and event sensors, having different functionalities. In our previous work [36], we proposed an event metamodel which separates two parts: the structural and the behavioral part of an event. The event structural metamodel is composed of primitive and complex events. Primitive events are predefined in the system and are specific to a domain; a primitive event could be a business, user or external event. A complex event is a composition of other, primitive or complex, events. Complex events are combined by using logical operators or other non-logical types of events. The detailed description of the event metamodel can be found in [36].

3.3 Feature Modeling

In this section, we present the feature model of the context. We have split the context feature model (Fig. 3) into several sub-feature models according to the four main groups of properties (see the description of the context metamodel). Different context variability models represent different instances of context that could be considered in the system, so that the expert could have a large vision of the different adaptation actions that should be computed for a specific context instance C1. After that, context variability models are merged by means of dependency relations.

Fig. 3. Context feature model


Context-aware process variability: Analytical processes can have common parts and details that may vary for each use case, influenced by particular types of context, e.g. location and preferences. Therefore, designing ad-hoc processes for each use case consumes more time, resources and cost, and is an error-prone task. On the other hand, considering the variability of the context data makes it possible to configure and execute analytical processes at runtime through automated decision making on suitable fragments for the current context data. This would reduce the time for changing from one analytical process variant to another for each specific context, and also enable managing large sets of process variants, allowing a DSPL to support runtime variability of process models.

3.4 Scenario-Based Analysis

This section provides a scenario-based analysis and evaluation of our context-aware event-based architecture by going through different scenarios and showing how different parts of the system work together. As a sample use case, we consider mobile devices, for the reason that context in mobile devices is very dynamic since those devices are continually with the user. Applications in mobile devices could be linked to several components in our context-aware event-based system, where the functions vary from one component to another; e.g. the Phone component is in charge of making and receiving calls, the Directory component is responsible for the user's contact information, and the GPS component contains the location of the device (user). We should not forget that mobile devices are very limited in terms of memory, power sources and processing. Context awareness could be accomplished by providing an effective and clear manner to share information between different components while they are trying to provide a single functionality. Let us consider a case where Lina's phone is in "not receiving calls" mode, but one of the people who are very close to her calls (her best friend Rim). Lina would like to receive this call, but she forgot to program her phone for this exception. In this case, the context service provisioner (see Fig. 1) launches the context-aware process, which analyses the call history and text messages and calculates the communication frequency between Lina and Rim. When this frequency is found to be high, the context-aware component adds an exception without asking Lina for permission. The event is the "call of Rim" (which has affected the current context). The context-awareness part is the analysis of the current context (call history) and the decision making (the adaptation of the service).
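An illustrative ECA (Event-Condition-Action) rule for this scenario is sketched below; the rule structure, names and threshold are hypothetical and only mirror the event/condition/action split described above.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EcaRule:
    event: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]

def allow_call(ctx: dict) -> None:
    print(f"Exception added: accept the call from {ctx['caller']}")

# Event: an incoming call while the phone is in "not receiving calls" mode.
# Condition: the communication frequency with the caller is high.
# Action: add an exception and let the call through.
rule = EcaRule(
    event="incoming_call",
    condition=lambda ctx: ctx["mode"] == "not_receiving_calls"
                          and ctx["communication_frequency"] > 0.8,
    action=allow_call,
)

context = {"mode": "not_receiving_calls", "caller": "Rim",
           "communication_frequency": 0.92}
if rule.condition(context):
    rule.action(context)
```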

4 Validation: Smart Tourism Recommender System (STRS)

In this section, we present our Smart Tourism RS experiment results, starting from the following premise: the tourist has just arrived in Rabat city and their current location is the city center; Fig. 4 shows the nearest hotels in the area.

Tourist 1: The first scenario is about an ordinary tourist aged between 40 and 65 years old whose profile does not show special requirements, but we have chosen to give him/her the optimal hotel for two days (see Fig. 4) according to the following criteria: the price, air conditioner, TV, swimming pool, number of stars, sports hall, the reviews, other.


Fig. 4. The hotels surrounding the city center of Rabat using the TripAdvisor application

Case 1: Figure 5 shows the results for 12 hotels with our approach. The hotel H9, which is the Belere Hotel, is the best one for Tourist 1, and Fig. 6 shows the resulting hotel for Tourist 1 using the TripAdvisor website. In this case, we have analyzed the user profile (see the context metamodel in Fig. 2), managed the collected context information, and proposed to the tourist an optimal hotel based on his/her preferences.

Case 2: When unpredictable events occur (like weather conditions, traffic congestion…), our RS adapts itself automatically to suggest new solutions (hotels in this case) to the user in real time. We have experimented, for different periods, with the proposition of the STRS for Tourist 1 in the case of a traffic problem when trying to reach the H3 hotel at different periods of time; see (a), (b) and (c). The variation of time and the traffic conditions affect the results of the proposed service (the proposition of the hotel).

Fig. 5. The optimal choice among twelve hotels for the Tourist 1 using STRS

Fig. 6. The recommended hotel for the Tourist 1 using our STRS


We can see in Fig. 7 that the recommendation has changed and the optimal hotel becomes H11.

Fig. 7. The optimal hotel for the Tourist 1 at runtime within changing conditions at different period of time (a), (b) and (c) using STRS
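One simple way to picture how the recommendation in Figs. 5–7 can change at runtime is a weighted-sum ranking over the stated criteria, re-evaluated whenever a context event arrives. The sketch below is illustrative only; the weights, attribute names and travel-time penalty are assumptions, not the STRS implementation.

```python
def score_hotel(hotel, weights, travel_minutes):
    """Weighted-sum utility of one hotel for the current context.

    hotel: dict of normalised attributes in [0, 1] (higher is better),
    e.g. price (already inverted), stars, reviews, amenities.
    travel_minutes: current travel time from the tourist's location,
    which changes when traffic events occur.
    """
    base = sum(weights[k] * hotel.get(k, 0.0) for k in weights)
    # A context event such as traffic congestion penalises hard-to-reach hotels.
    penalty = min(travel_minutes / 60.0, 1.0)
    return base - 0.3 * penalty


def recommend(hotels, weights, travel_times):
    """Return the best hotel id for the current context snapshot."""
    return max(hotels, key=lambda h: score_hotel(hotels[h], weights, travel_times[h]))


weights = {"price": 0.3, "stars": 0.2, "reviews": 0.2, "amenities": 0.3}
hotels = {
    "H9":  {"price": 0.8, "stars": 0.8, "reviews": 0.9, "amenities": 0.7},
    "H11": {"price": 0.7, "stars": 0.9, "reviews": 0.8, "amenities": 0.8},
}

# Normal conditions: H9 wins; under heavy traffic towards H9, the ranking flips.
print(recommend(hotels, weights, {"H9": 10, "H11": 15}))   # -> H9
print(recommend(hotels, weights, {"H9": 55, "H11": 20}))   # -> H11
```

In this reading, an event such as traffic congestion simply updates the travel times and triggers a re-ranking, which is one way the switch from H9 to H11 shown in Fig. 7 can be pictured.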

5 Conclusion
In recent years, context-aware technologies have been used incessantly to enhance user interaction in areas such as search and information retrieval. Even though there have been several significant advances in context-aware systems, there is still a lack of approaches for developing proactive applications that combine context awareness and distributed events. Moreover, contextual information exposes multiple variability factors which induce many possible configurations of the software system. In this paper we address these issues by proposing a development architecture (CMA-EV) for context-aware and event-based systems extended by context variability management. In our approach, we propose our vision of the notion of context by defining a context metamodel. Finally, we have evaluated our proposed approach through a smart tourism recommender system.


Electronic Public Services in the AI Era Hajar Hadi(B) , Ibtissam Elhassani, and Souhayl Sekkat Artificial Intelligence for the Sciences of the Engineer, National School of Arts and Crafts, Moulay Ismail University, 50500, 15290 Meknes, Morocco [email protected], {i.elhassani,s.sekkat}@ensam.umi.ac.ma

Abstract. Effective e-services can provide a wide variety of benefits, including more efficiency and savings for governments and businesses, increased transparency, and greater participation of citizens in political life. Indeed, the digital world has made important strides and we talk today about Artificial Intelligence as the fourth Industrial Revolution. In this context, this paper aims to demonstrate the importance of applying Artificial Intelligence in public services, through a benchmark of successful international experiences applying AI in public services, and to propose a new model to classify the maturity of Moroccan electronic public services. Keywords: E-government · Public service · Benchmark · Artificial intelligence · Industry 4.0

1 Introduction
Information and Communication Technologies (ICT) now play a very important role in society and are used in a multitude of activities [1]. E-government has become a priority for many countries around the world. This paradigm combines the use of ICT with forms of management and planning [2] in order to create a more efficient and effective public sector that meets the needs of citizens, through the development of new electronic services [3]. The development of industry and the web has accelerated exponentially, as we now talk about Industry 4.0 or Web 3.0. In this context, the majority of countries, including Morocco, began to digitize their services and plan their e-government strategically between 2008 and 2013. In order to align these developments with the public sector, this paper aims to demonstrate the importance of AI in public administration. We therefore present a theoretical background on e-government and related work on the subject, then focus on successful international e-government initiatives and the digital revolution; after that, we report a benchmark of international public electronic services applying AI, and finally we propose a new model to classify the maturity of electronic public services in Morocco in the artificial intelligence era.

2 Theoretical Background
Nowadays, E-government is an important topical phenomenon. In this section, we present definitions related to electronic services.


2.1 E-Government
Electronic government (or e-government) is a current issue. The first official government sites appeared in the middle of the 1990s, delivering information and services [4]. E-government can be defined as the use of Information and Communication Technologies (ICTs) combined with organizational change to improve the structures and operations of government [5], or as the process by which the government can deliver services and information to its citizens via the internet to improve the availability of the information and services offered to citizens [6].

2.2 Electronic Government Development Index
In this paragraph, we define an important indicator that enables the measurement of the maturity of electronic services. Indeed, the United Nations (UN) developed a tool to evaluate e-government maturity, called the E-government development index (EGDI), composed of three sub-indices: online services (called the web measure index (WMI) in UN 2008), the telecommunications infrastructure index (TCII) and the human capital index (HCI); the usual formulation of the index is recalled after Sect. 2.3.

2.3 Types of E-Government
E-government activities are usually categorized into three forms of interactions: government-to-citizen (G2C), government-to-business (G2B) and government-to-government (G2G). G2B and G2C include transactions between government and users, whereas G2G performs internal transactions in the back office [7]. According to Brown and Brudney [8], one may include two additional categories in this list: Government-to-Civil Societal Organizations (G2CS) and Citizen-to-Citizen (C2C).
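Relating to the EGDI described in Sect. 2.2: in the UN surveys the index is commonly computed as the equally weighted average of the three normalised sub-indices. The formula below is recalled as a reminder of that standard formulation rather than quoted from this paper, using the sub-index names given above:

$\mathrm{EGDI} = \tfrac{1}{3}\left(\mathrm{WMI} + \mathrm{TCII} + \mathrm{HCI}\right)$

where each sub-index is normalised to the range [0, 1].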

3 Related Work
Various models of e-government implementation have been advanced in the literature. In 2009, Papadomichelaki and Mentzas proposed E-GovQual [9], a model developed to measure the quality of e-government services. In 2010, Alanezi, Kamil and Basri [10] developed a proposal to measure the quality of e-government services; this proposal is based on seven dimensions and builds on the SERVQUAL methodology of Parasuraman. In 2012, Zaidi and Qteishat [11] developed the e-GSQA framework, based on the diagram of Fig. 1. Then, Hien [12] considered that the quality of electronic services should be assessed from two points of view: the quality of the service and the quality of the information.


Fig. 1. E-GSQA framework [11]

4 International E-Government Initiatives
According to the UN E-government Survey 2018, Singapore and some European countries are among the top ten countries regarding the development of e-government. We therefore present the strategies adopted to attain this level, followed by the Moroccan experience in this context.

4.1 E-Government Initiatives Across Singapore
Based on the work of Watson and Mundy [13], we have broken down the various stages of Singapore's e-government evolution, which are initiation, infusion and customization, together with the main actions taken at each stage. First, in the initiation stage, the government began developing official websites at the end of the IT2000 project, which aimed to make Singapore a smart island. Then, in the infusion stage, the focus shifted to:
• Development of an action plan to define the main lines of the deployment of information and communication technologies (ICT) (2000–2002);
• Commitment of $932 million over the period 2000–2003 to implement the action plan;
• Centralization of financing and common infrastructure;
• Bridging the digital backlog.


Finally, in the customization stage, the goal was to maximize the value of e-government for citizens by enabling them to obtain an electronic personal profile of their interactions with the government. To achieve this goal, the government addressed the challenges of integrating its portal with the information systems of various agencies, reengineering the public service delivery process and implementing customer relationship management techniques. The government of Singapore rolled out a succession of master plans [14] from 1980 until 2015, as shown in Fig. 2:

• CSCP (1980–1999): automation of public service; basic IT infrastructure and data hubs.
• e-Government Action Plan I and II (2000–2005): online service delivery; integrated services.
• iGov2010 (2006–2010): focus on collaboration within and outside Government.
• eGov 2015 (2011–2015): integration of data, processes and systems for Government agencies; 300 mobile government services deployed.

Fig. 2. Masterplans of Singapore government

4.2 E-Government Initiatives Across EU-15, Norway and Iceland
According to the first surveys at the European level [15], there are eight major strategies for applying ICT successfully to government, as shown in Table 1.

Table 1. Characteristics of EU-15, Norway and Iceland's E-government strategy.

Strategy 1: Digitisation of Largely Unchanged Back-Offices. The history of back office integration and cooperation is an important consideration when introducing online services and seeking to reap rationalisation and quality benefits.

Strategy 2: Deep Reorganisation of Back-Offices. Requires a complete re-think from scratch of the whole system and philosophy of service design, production and delivery.

Strategy 3: Centralisation of Back-Office and De-centralisation of Front-Office Functions. Can provide strong rationalisation benefits, focusing expertise, reducing errors and time delays, and exploiting economies of scale, which a large number of decentralised units undertaking largely similar functions cannot hope to emulate.

Strategy 4: Back-Office Clearing House. Allows data exchange compatibility where this does not exist, and thus may be a cheaper solution than the wholesale centralisation of data sources where data standards, languages, semantics and syntax are incompatible and need to be able to communicate.

Strategy 5: Generic Types of Interaction Between User and Agency. Provide many qualitative and efficiency benefits by standardising some common features to achieve savings, simplicity, ease of maintenance and upgrade, ease of use (whether by end users or staff) and re-use of successful features.

Strategy 6: Portals. Services in the portal are normally related to each other in some way, so that typically the user will need, or will wish, to use two or more of them to fulfil a particular requirement. An advantage of portals is often that what previously were separate services now appear as one service concept consisting of a number of steps or options.

Strategy 7: Pro-active Services. A service for which the relevant agency takes full responsibility to initiate, deliver and fulfil. This minimises the input and responsibility required from the user, which may even disappear altogether.

Strategy 8: User Self-service. It is also possible to offer the user not less but greater responsibility and control over a given service, maximising transparency for users.

5 Digital Revolution
5.1 The Evolution of the Web
We distinguish between three generations of the web, driven by technological developments [16, 17]. Web 1.0 applications only allow reading, as the web was generally intended to publish content to be "consumed" by users, thus limiting interaction. Web 2.0 is a term that describes a new generation of the web, allowing more interaction with the displayed content. The content is generated by the user, which makes the user more active as both producer and consumer, and further develops content dissemination networks. The most recent generation, Web 3.0, is a semantic data network that provides enhanced search and data linkage capabilities, allowing this data network to better interface with and provide data to other web applications used by individuals on the Internet. Web 3.0 technologies are the answer to the ever-increasing amounts of data generated by users and organizations, which need to be searched and exploited more efficiently. Since it is difficult for a single platform to handle huge amounts of data, the services concerned are moving towards decentralization, which results in the emergence of technologies such as distributed computing or the blockchain.


5.2 The Evolution of the Industry
We can likewise distinguish several generations of industry, driven by technological evolutions [18, 19]. Industry 1.0 entailed mechanical mass production using water- and steam-powered machines, while Industry 2.0 focused on the power of electricity and developed new methods of manufacture through the improved allocation of various manufacturing resources. Industry 3.0 was based on the development of electronic hardware and software to improve the planning of industrial operations, as well as on extending the automation of previously manual production tasks; these also enabled new services and capabilities based on optimizing warehouse management, which go well beyond inventory control and shipping logistics [20]. Recently, Industry 4.0 has emerged with the advanced digitization and use of the 'Internet of Things', big data and analytics technologies within factories, in order to generate new production-related information, which can be used to further increase production efficiency (production process innovations) and also for the development of novel products and services (product and process innovations). Among the core characteristics of Industry 4.0 are quicker decision-making, decentralization, and product/service customization and personalization, with the use of big data as an important factor driving Industry 4.0.

6 Benchmarking of the Countries Using AI on Their Electronic Public Services
From the above, it is clear that great advances have been made in terms of industrial development and technologies, especially the application of AI. In this context, many countries have benefited from these advances to improve the quality of their public services. Table 2 presents an analysis of successful international experiences related to applying AI in public administration.

Table 2. Analysis of public electronic services applying AI

Public service: Patient record analysis [21]. Administration: CHU Bordeaux. Objective: facilitate access to patient information. Starting data: stored digital patient records. Added value of AI: semantic search in the computerized patient folder to find the right information at the right time. Gain: facilitate the work of health care workers and reduce the time lost in retrieving patient information.

Public service: Monitoring the legality of acts [21]. Administration: Directorate-General for Local Authorities. Objective: develop artificial intelligence in the control of dematerialised legality. Starting data: application acts. Added value of AI: automatically sort communicable and non-communicable acts and detect the information to be controlled in priority; the AI will facilitate the work of agents in prefecture. Gain: simplify the work of agents in the prefecture.

Public service: Support and improvement of the online pre-complaint system, proposing an approach based on the referral of the complainant in natural language [21]. Administration: Directorate-General for the National Police. Objective: improve the online pre-complaint system. Starting data: data from online pre-complaints. Added value of AI: automatically detect infringements from online pre-complaints and identify additional questions to ask the user; AI will transform online pre-complaints into qualified complaints. Gain: saving time for complainants and officers.

Public service: Identification of false customs declarations [21]. Administration: Directorate-General for Customs and Indirect Duties. Objective: -. Starting data: single administrative documents. Added value of AI: detect imported products that are incorrectly declared in a tax-advantageous product nomenclature. Gain: -.

Public service: Directing users of public services [22]. Administration: services relating to user guidance. Objective: facilitate the processing of user complaints. Starting data: online complaints. Added value of AI: use of algorithms to classify citizens' claims and route them to the relevant services. Gain: minimize the time to process claims.

Public service: Provision of information [22]. Administration: services relating to the provision of information. Objective: release of the aid centre operator line. Starting data: the requests of users. Added value of AI: adoption of chatbots (auditory or textual computerized conversational systems). Gain: nearly 90% of calls are handled by automated operators, which allows operators to respond to the most complicated and urgent requests from users.

Issues in the application of AI: improvement of case processing processes; mass data analysis to assist in decision making; improvement of the user relationship; improvement of strategic intelligence; optimisation of control targeting; automatic analysis of the rule of justice.

7 New Model of Maturity of Moroccan Electronic Public Service in the Artificial Intelligence Era
E-governance has evolved in three waves [23], beginning with e-government 1.0, which refers to the use of ICTs and web-based technologies for improving or enhancing the efficiency and effectiveness of public service production and delivery to citizens and firms. E-government 2.0 emerged alongside web 2.0 and the opening of public information, enabling more open, accountable and responsive government, promoting government transparency, and enabling citizens' participation and collaboration. E-government 3.0 is based on new, disruptive ICTs (such as big data, IoT, analytics, machine learning and AI), in combination with established ICTs (such as distributed technologies for data storage and service delivery), and takes advantage of the wisdom of the crowd (crowd/citizen-sourcing and value co-creation) to support data-driven and evidence-based decision and policymaking. Certainly, this wave will have a positive impact in terms of revenue, cost reduction and process flexibility, but it will require change and transformation of business. Only actors who can keep up with this transformation and adopt innovative business models can take advantage of this revolution. In this context, public services must keep up with this revolution and consider it in their evaluation models. Indeed, the development of this technology is intensifying in the private sector, while there are some initiatives to apply it in the public sector. For example, in Canada the federal government uses AI to sort the many visa applications received from India and China [24].

7.1 Moroccan Electronic Services
The Moroccan Digital Plan 2013, launched in October 2009, included bringing administration closer to users among its four priorities. This priority has been implemented through a vast E-government program. It entailed a portfolio of 89 projects and services, under the responsibility of different organizations and administrations, for a global investment of more than 2 billion dirhams [25]. According to a recently completed study for the ministry of economy, finance and administration reform [26], there are 453 electronic services in 87 public administrations. 86% of them are technical electronic services, while 14% are support electronic services. 45% of electronic services serve citizens, while 40% target businesses and 15% concern both (Figs. 3 and 4).

Fig. 3. Distribution of electronic services in relation to their strategic importance

Fig. 4. Distribution of "technical" electronic services in relation to the target population

Electronic services can be classified according to their maturity level. Following the policies for the transition to smart government of the government of Dubai, the Moroccan government defined an administrative services repository [27], which presents a scale composed of four levels to evaluate the maturity of electronic services. These levels are determined as follows:
• Level one. Information only: the electronic service provides users with the option to view the detailed description of the service on an electronic channel only.
• Level two. Interaction only: the electronic service offers the user the possibility to send information unilaterally (from the user to the administration) through an electronic channel, e.g. sending forms, observations, or suggestions.
• Level three. Partial dematerialization: the electronic service offers the user the possibility to perform at least one step in a dematerialized way (a step is dematerialized when the interaction between user and administration is two-way).
• Level four. Complete dematerialization: the electronic service offers the user the possibility to complete all the steps in a dematerialized way (Fig. 5).

7.2 New Model of Maturity of Moroccan Electronic Public Service
In this context, the previous model for evaluating the maturity of electronic public services is based on four levels: (i) information only, (ii) interaction only, (iii) partial dematerialization and (iv) complete dematerialization. We propose to adapt it to the current situation of artificial intelligence and Industry 4.0; consequently, we add a fifth level, intelligent services, and the model becomes as shown in Fig. 6 (an illustrative sketch of the extended scale follows the figure):


Fig. 5. Maturity of electronic services (Level 1: Information only; Level 2: Interaction only; Level 3: Partial dematerialization; Level 4: Complete dematerialization)

Fig. 6. Maturity of electronic services in the AI era (Level 1: Information only; Level 2: Interaction only; Level 3: Partial dematerialization; Level 4: Complete dematerialization; Level 5: Intelligent services)
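To make the extended scale concrete, the sketch below encodes the five levels and a simple rule for assigning a level to a service from a few descriptive flags. It is an illustrative reading of the model, not an official classification tool; the flag names are assumptions made for the example.

```python
from enum import IntEnum


class MaturityLevel(IntEnum):
    INFORMATION_ONLY = 1            # service description viewable online
    INTERACTION_ONLY = 2            # one-way sending of forms/suggestions
    PARTIAL_DEMATERIALIZATION = 3   # at least one two-way step online
    COMPLETE_DEMATERIALIZATION = 4  # all steps completed online
    INTELLIGENT_SERVICE = 5         # proposed fifth level: AI-assisted delivery


def classify(service):
    """Assign a maturity level from descriptive flags of an electronic service."""
    if service.get("uses_ai"):
        return MaturityLevel.INTELLIGENT_SERVICE
    if service.get("all_steps_online"):
        return MaturityLevel.COMPLETE_DEMATERIALIZATION
    if service.get("two_way_steps", 0) >= 1:
        return MaturityLevel.PARTIAL_DEMATERIALIZATION
    if service.get("one_way_submission"):
        return MaturityLevel.INTERACTION_ONLY
    return MaturityLevel.INFORMATION_ONLY


# Example: an online pre-complaint service that qualifies complaints with AI.
print(classify({"uses_ai": True, "all_steps_online": True}))
# -> MaturityLevel.INTELLIGENT_SERVICE
```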

8 Conclusion and Future Research
To conclude, many governments benefit from the development of technology by digitizing their public services and their governance; consequently, we find many successful examples of e-government, such as those presented in Sect. 4 (Singapore and European countries). In this context, the Moroccan government has also begun to implement e-government through the Moroccan Digital Plan 2013. Indeed, many services are digitized, such as the biometric passport and the e-consulate. In the same framework, the Moroccan government has defined a scale from 1 to 4 to evaluate electronic services: Level 1, information only; Level 2, interaction only; Level 3, partial dematerialization; and Level 4, complete dematerialization. On the other hand, the web and industry have developed exponentially and we are now beginning to incorporate artificial intelligence.


Many countries have reorganized their structures and tools to develop their e-governance; one example is the algorithmic processing of visa applications from India and China by the Canadian government. Consequently, we proposed a new model for evaluating the maturity of public services, which includes a new component: intelligent services. In the future, these characteristics should enable the development of a successful framework to implement an e-government incorporating revolutionary infrastructural, informational and human resources technologies. However, it is also important to mention that the application of AI in public services must take into consideration many components, as shown in the following diagram (Fig. 7):

Fig. 7. SWOT Analysis of AI application in public service (Strength: refocusing public officials on their core business; Weakness: cost; Opportunity: satisfying users of public services; Threat: being overwhelmed by technology and therefore losing the user's confidence)

References
1. Concha, G., Naser, A.: CEPAL – Colección Documentos de proyectos, "El desafío hacia el gobierno abierto en la hora de la igualdad", Santiago de Chile, Impreso en Naciones Unidas, pp. 11–14 (2012)
2. Rocha, A., Sá, F.: Planning the information architecture: a local public administration organization. Inf. Dev. 30(3), 223–234 (2014)
3. Justice, J., Melitski, J., Smith, D.: E-government as an instrument of fiscal accountability and responsiveness: do the best practitioners employ the best practices? Am. Rev. Pub. Adm. 36(3), 301–322 (2006)
4. Ostašius, E.: Assessing maturity for E-government services. In: 13th Working Conference on Virtual Enterprises (PROVE), pp. 301–309, Bournemouth, October 2012
5. Field, T., Muller, E., Lau, E., Gadriot-Renard, H., Vergez, C.: The case for E-government: excerpts from the OECD report "The E-government Imperative". OECD J. Budgeting 3(1), 61–96 (2003)


6. Carta Iberoamericana de Gobierno Electrónico, Aprobada por la IX Conferencia Iberoamericana de Ministros de Administración Pública y Reforma del Estado, Pucón, Chile, 31 de mayo y 1° de junio de 2007, Adoptada por la XVII Cumbre Iberoamericana de Jefes de Estado y de Gobierno, Santiago de Chile, 10 de noviembre de 2007 (2007)
7. Namkoong, K., Cho, K., Kim, S.: Public Administration and Policy in Korea: Its Evolution and Challenges. Taylor and Francis (2017)
8. Brown, M.M., Brudney, J.L.: Achieving advanced electronic government services: an examination of obstacles and implications from an international perspective. Paper presented at the National Public Management Research Conference, Bloomington, October 2001
9. Papadomichelaki, X., Mentzas, G.: A multiple-item scale for assessing E-government service quality. In: Wimmer, M.A. (ed.) EGOV 2009, vol. 5693, pp. 163–175. Springer, Heidelberg (2009)
10. Alanezi, M.A., Kamil, A., Basri, S.: A proposed instrument dimensions for measuring E-government service quality. Int. J. u- and e-Serv. Sci. Technol. 3(4), 1–18 (2010)
11. Zaidi, S.F.H., Qteishat, M.K.: Assessing E-government service delivery (government to citizen). Int. J. eBusiness eGovernment Stud. 4(1), 45–54 (2012)
12. Hien, N.M.: A study on evaluation of E-government service quality. Int. J. Soc. Manag. Econ. Bus. Eng. 8(1), 16–19 (2014)
13. Watson, R.T., Mundy, B.: A strategic perspective of electronic democracy. Commun. ACM 44(1), 27–30 (2001)
14. eGov Masterplans, January 2016. https://www.tech.gov.sg/media/corporate-publications/egov-masterplans
15. Millard, J., Iversen, J.S., Kubicek, H., Westholm, H., Cimander, R.: Reorganisation of government back-offices for better electronic public services. In: European Good Practices (Back-Office Reorganisation). EC eGovernment Unit, Brussels (2004)
16. O'Reilly, T.: What is web 2.0: design patterns and business models for the next generation of software. Commun. Strat. 65, 17–37 (2007)
17. Sharma, A.: Introducing the concept of Web 3.0. http://www.tweakandtrick.com/2012/05/web-30.html. Accessed 08 Dec 2017
18. Lasi, H., Fettke, P., Kemper, H.G., Feld, T., Hoffmann, M.: Industry 4.0. Bus. Inf. Syst. Eng. 6, 239–242 (2014)
19. Lu, Y.: Industry 4.0: a survey on technologies, applications and open research issues. J. Ind. Inf. Integr. 6, 1–10 (2017)
20. Charalabidis, Y., Loukis, E., Alexopoulos, C., Lachana, Z.: The three generations of electronic government: from service provision to open data and to policy analytics. In: EGOV 2019, vol. 11. Springer (2019)
21. Christian, D.: leSoleil, 19 09 (2019). https://www.lesoleil.com/opinions/point-de-vue/lintelligence-artificielle-et-la-fonction-publique-5a79f9f208a5b8ac303ca. Accessed 01 Jan 2020
22. Ministry of modernisation, 1 November 2019. https://www.modernisation.gouv.fr/mots-cle/intelligence-artificielle
23. Mehr, H.: Artificial Intelligence for Citizen Services and Government. Harvard Ash Center, August 2017
24. Charalabidis, Y., Loukis, E., Alexopoulos, C., Lachana, Z.: The three generations of electronic government: from service provision to open data and to policy analytics. In: EGOV 2019, vol. 11685 (2019)
25. Ministère de l'Industrie, du Commerce et des Nouvelles Technologies (2011). http://egov.ma/en. Accessed 10 Jan 2019


26. Ministry of Economy, Finance and Administration Reform: Service public (2019). http://ereadiness.service-public.ma/niveaux-maturite-electronique. Accessed Jan 2020
27. Ministry of Economy, Finance and Administration Reform: Administrative services repository: definitions, classification and criteria for evaluating the level of electronic maturity. Moroccan government, Rabat (2019)

Planning and Designing Smart Sustainable Cities: The Centrality of Citizens Engagement Sukaina Al-Nasrawi(B) Beirut, Lebanon

Abstract. Smart Sustainable Cities is a concept regarded differently by various relevant stakeholders. This also applies to the concept of engagement throughout the design and planning of these cities to achieve smartness. The perspectives on the centrality of engagement vary between academicians, professionals, private sector, governments, and others. This research focuses on the significance of stakeholders’ engagement in designing and planning SSCs, mainly citizens engagement. It explores the characteristics of a SSC and the relationship between its dimensions, initiatives and projects which require interaction that can only be granted through proper engagement of citizens. It explores pathways of influence for SSCs and the engagement process through the Spectrum of Citizen Engagement. This paper argues that no smartness of cities is achieved without citizens empowerment. It concludes that smartness of cities is more than digital, technical, or technological; Smartness of cities is about People and SSCs are about providing the ability and opportunity for everyone to be an active citizen. Keywords: Smart sustainable city · Citizens engagement · Smartness

1 Introduction
Cities are the hubs for innovation and change and the centre of economic growth. They host the institutions and mechanisms to promote the changes needed to accelerate sustainable development. However, the progressive potential of urbanization can equally be lost in the absence of socially inclusive urban plans and policy decisions that foster wellbeing and leave no one behind. Countries face the challenges of a siloed approach to urban development, data voids, and a lack of policy coherence as they shift to the new paradigm of the city and the right to the city as a macro-level public good, where the economic, social, cultural and environmental rights and quality of life of all inhabitants, of present and future generations, are guaranteed without discrimination of any kind (NUA 2016). Vulnerable urban groups, specifically youth, the poor, persons with disabilities, the homeless, migrants, minorities and women in these categories, are disproportionately affected by gaps in insufficient city services, infrastructure and social development opportunities, as well as by the intensification of climate change related disasters. These inequalities will persist and even increase if not properly addressed when building Smart Sustainable Cities (SSC) or transforming existing cities into smart and sustainable ones and achieving smartness.


Researchers and practitioners refer to smartness differently. In this paper, it is considered as a concept that has conceptual and operational aspects (Al-Nasrawi et al. 2017a) and is defined as a “dynamic process through which Information and Communication Technologies (ICTs) and other means are used to advance innovative multidimensional urban efficiency in line with the principles of sustainable development” (Al-Nasrawi et al. 2016). A core means to achieve smartness is through the engagement of relevant stakeholders, citizens in particular.

2 Characteristics of a Smart Sustainable City
SSC is a cumulative concept in which each of its constituent notions, namely smart, sustainable and cities, is important. Researchers take different stands when identifying the components of a SSC. (Dirks and Keeling 2009) stress the importance of the organic integration of a city's various systems in creating a smart city. (Kanter and Litow 2009) reiterate this view and affirm that introducing intelligence in each subsystem of a city on a one-to-one basis is insufficient to create a smart city, as cities should be dealt with holistically. However, many other researchers, with the intent of clarifying the concept, have separated it into many features and dimensions. (Komninos 2002), in his attempt to describe the features of an intelligent city (referring to a smart city), indicated that it has four possible dimensions. The first dimension concerns the application of a wide range of electronic and digital technologies to create a cyber, digital, wired, informational or knowledge-based city; the second is the use of information technology to transform life and work; the third is to embed ICTs in the city infrastructure; the fourth is to bring ICTs and people together to enhance innovation, learning, and knowledge. (Giffinger et al. 2007) identified four components of a smart city, namely industry, education, participation, and technical infrastructure. This list of four components has since been expanded by the Centre of Regional Science at the Vienna University of Technology, which has identified six main components (Giffinger and Gudrun 2010). These components are smart economy, smart environment, smart governance, smart living, smart mobility, and smart people. The main addition to the previous list that should be highlighted is the inclusion of the "quality of life." This component emphasizes the definition of a SSC as a city that increases the quality of life of its citizens (Giffinger et al. 2007) (ITU-T 2014). However, many researchers argue that a separate dimension should not be attributed to the quality of life, since all the actions taken in the remaining areas identified should aim at enhancing the quality of life of citizens, noting that it could be considered as a basic component (Shapiro 2006). The ITU-T/FG-SSC, through its standardization efforts for the definition of SSC, defined six primary indicators of SSC. These are comparable to the main six dimensions developed by the Centre of Regional Science at the Vienna University of Technology, except for one dimension, smart environment, which ITU refers to as smart environment and sustainability. The six identified dimensions are the ones that are mostly referred to in the literature. These are smart economy, smart environment, smart governance, smart living, smart mobility, and smart people (Al-Nasrawi et al. 2017b).


2.1 Relationship Between a Smart Sustainable City Dimensions, Initiatives and Projects
There exists a distinction between the SSC, its initiatives, projects and applications. In fact, a SSC exhibits many initiatives. Within the SSC initiatives, different actors work together, and innovative ways of collaborating are created with the aim of developing the SSC. Each initiative has its own aims to develop one or some characteristics of the SSC. To achieve these aims, SSC initiatives develop specific projects. It is in these concrete projects that concrete SSC applications are developed. Although many SSC projects are developed within an initiative, this is not compulsory; individual projects can be developed independently of an initiative. Within SSC projects, many different actors and technologies are involved. The relation between a SSC and its distinct aspects, smart city initiatives and projects is represented in Fig. 1.

Fig. 1. Relationship between smart sustainable city dimensions, initiatives and projects

It goes without saying that interactions take place between the SSC, its various aspects, initiatives, and projects, as well as within these levels. This interaction contributes to making a city smart and sustainable. A critical issue to mention when speaking about interaction is that of citizens, as no city exists without its citizens. They are core to urban development. This human perspective has been highlighted by selected researchers (Chourabi et al. 2012), (Hollands 2008) and (Nam and Pardo 2011), but with no indication of the exact group of people to include nor of their specific role in the development lifecycle of a SSC. This paper addresses the central role of citizens in SSC.

2.2 Pathways of Influence for a Smart Sustainable City
The adoption of SSC solutions faces a set of challenges that vary from one region to another and between countries within the same region. These challenges range from social and technological to economic and regulatory. The Arab region suffers in varied degrees from this set of challenges (IEC 2014), (European Commission 2014) and (Ebrahim and Irani 2005). These challenges include a lack of SSC expertise and challenges at the economic, social and governance levels (Al-Nasrawi et al. 2015). SSCs require horizontal integration and the creation of a sustainable system of systems capable of generating opportunities for the city and its citizens. This integration increases the complexity of how to operate, regulate, finance, and plan the SSC. It also includes issues related to integration and convergence; administration; standardization and interoperability; management of open data; data privacy and security; integrity of data and others. At the economic level, the big challenge of any SSC project is the need for a sustainable financial investment to create and/or renovate the technological and physical infrastructure as well as to invest in digital solutions. The latter is highly dependent on the economic status of the country, and thereby of the city. Therefore, the SSC project's plan should identify how the SSC services will be delivered and how these services will be funded. The challenges at the social level refer to the lack of skilled people in the field, the lack of collaboration between research and development, the need for greater citizens' engagement, the misunderstanding of the impact of smart technologies on the city's daily administrative level, insufficient attention to citizens and others. Moreover, in terms of governance, the inherent nature of a SSC as a complex system of systems increases the need for long-term and holistic policies to enable institutional and governance mechanisms for SSC initiatives. The latter requires coordination and integration between public, private and civil bodies, in addition to collaboration with different stakeholders, to make the city function efficiently and effectively as one organism. Moreover, to succeed in the implementation of smarter and more sustainable cities, expert professionals in the field are needed. This refers to urban planners, technology experts, economists, environmentalists and sociologists, among other professionals, who must be prepared to deal with the challenges of the new urban landscape. In addition to being experts in different areas, they need to be aware of all the other aspects that define and shape cities. For example, an urban planner or an environmental expert should also have general knowledge about the capabilities and functioning of the Internet of Things (IoT) applied to cities, and the ICT infrastructure that is needed for that; he/she could thus have a holistic vision of the SSC. Two main points found to be of high influence on the path of a SSC, and thus of impact on its smartness, are also important to mention. These are the low levels of citizen engagement and participation and the growing inequalities. In fact, citizens' engagement in the advancement and development of municipal projects and initiatives is critical for urban development.


As the ultimate users of the provided city services, it is important that city planning strategies include the vision and expectations of the citizenry, and that this aspect is included in the assessment of the performance of the SSC. Modern technologies like mobile applications or social media tools may assist in enabling citizen engagement and participation (ITU-T FG-SSC). As for the growing inequalities, including gender inequality, widening income disparities were ranked as the second most significant global trend by the World Economic Forum in 2014 (WEF 2014). On the same note, Oxfam suggests that "seven out of ten people live in countries where economic inequality has tremendously increased in the last three decades and almost half of the world's wealth is now owned by just one per cent of the population" (Oxfam 2014). Since SSCs strive for social sustainability, it is therefore important that the projects developed include all levels of society. Moreover, the role of ICTs in SSCs should not be neglected. In fact, the ITU-T FG-SSC highlights their crucial existence due to their ability to act as a digital platform from which an information and knowledge network can be created. (Nam and Pardo 2011) refers to technology as a crucial dimension of a smart city. This viewpoint is supported by many authors and practitioners in the field. However, another group of researchers, including the ITU-T FG-SSC, attributes a high importance to the role of ICTs in the establishment of SSCs but does not consider technology as a dimension of the city; it is rather looked at as the component that lies at the core of the SSC and acts as the central nerve orchestrating all the interactions between the different pillars and the infrastructure. It is an indispensable ingredient of the SSC that acts as a glue connecting different daily living services to public infrastructures. It is the orchestrator of the various elements of the SSC, which should coexist (IEEE SC 2015).

3 Stakeholders Engagement in Smart Sustainable Cities: The Focus on Citizens
The nature of the challenges facing SSCs justifies the need to better understand the role of stakeholders in the planning and design processes. This can help overcome obstacles and take advantage of opportunities towards the realization of the goals of the city, thereby affecting its smartness. Stakeholder engagement could be viewed as a means for enhancing the relevance, responsiveness, accountability, transparency, inclusiveness, legitimacy, effectiveness, efficiency, and equitability of the decision-making process. Therefore, the engagement of stakeholders in the implementation of SSCs can contribute significantly to the governance of the city and can affect its performance, noting that citizen engagement ensures citizen satisfaction, which in turn ensures maximum efficiency of urban solutions and good governance (Shankar 2016).

3.1 Types of Stakeholders of a Smart Sustainable City
A stakeholder is any entity with a declared interest at stake in a policy concern (World Bank 2016). Different definitions exist in the literature and they all converge on the same concept. In this paper, a stakeholder is regarded as an entity, which could be an institution or an individual, that has an interest in SSCs. This entity may affect or be affected by the deployment of SSCs.


Different stakeholders have varying roles in a SSC. For example, organizations may promote SSC solutions through the provision of needed funding and expertise. ICT industries may promote the development of SSC solutions. Academia raises awareness, educates professionals and studies trends and new initiatives. Citizens and visitors pay for the smart solutions. Consulting firms may benchmark progress towards smartness. Standardization institutes may develop a common language to be used by all stakeholders to minimize the fuzziness of the concepts and enable harmonized advancement. The ITU-T focus group on SSC compiled a list of stakeholders of SSC and validated it based on the general classification of stakeholders. It classified them into twelve categories, which are not exhaustive. These include Academia, Research Organizations and Specialized Bodies; Citizens and Citizen Organizations; City Services Companies; ICT Companies (Telecom Operators, Start-ups, Software Companies); Industry Associations; International, Regional and Multilateral Organizations; Municipalities, City Council and City Administration; National and Regional Governments; Non-Governmental Organizations; Standardization Bodies; Urban Planners; and Utility Providers. These stakeholders have varying roles and responsibilities when implementing projects relating to the SSC. According to the Logical Framework Approach, a methodology mainly used for designing international development projects, the SSC stakeholders can be seen as drivers or enablers of SSC solutions and can also be looked at as active, beneficiary and affected stakeholders. Observing this classification shows that citizens fall under all listed categories, which highlights their vital role in the performance of the city and connotes their impact on the smartness of SSCs. Also, when trying to explore the interaction amongst the different stakeholders of a SSC, it becomes apparent that citizens and citizens' organizations constitute the basic part of the city that interacts directly or indirectly with all other stakeholders (ITU-T FG-SSC 2015).

3.2 Citizen Engagement in a Smart Sustainable City: Literature Review
A thorough review of the literature indicates that the definitions and interpretations of the concept of citizen engagement vary between academicians and practitioners. There exists a thin line between citizen involvement, participation and engagement, noting that the terms "citizen", "public", "involvement" and "participation" are often used interchangeably (Mize 1972). Citizen engagement is defined as acts of sharing of information, power, and mutual respect between governments and their citizens with defined characteristics (Sheedy 2008). Moreover, citizen engagement denotes the commitment from government to work with its citizens in a reliable and continuous manner (Pham 2016). It requires a commitment from the government to continuously work with its citizens, since through real citizen engagement governments can better understand the issues of people and communities, be exposed to viable solutions and give opportunities for citizens to use their knowledge and skills to help shape policies and plans that affect them (Lukensmeyer and Hasselblad 2006). Therefore, building social capital is essential to ensure that citizens acquire the capabilities and skills needed to meet the challenges of the future and can properly engage.


The engagement of citizens is a fundamental cornerstone of a SSC's governance. There are numerous reasons for governments to engage citizens. These include reducing the "democratic deficit" and contributing to good governance (Axelsson et al. 2010). The latter is measured by the extent to which a government involves its citizens in the overall decision-making process (Shankar 2016). Moreover, engagement helps in creating groups of "expert citizens" who work within community organizations; thus, they can operate well inside the system of governance. The expert citizens are believed to be a "resource or political capital for democracy" because they experience dealing with problems of exclusion based on ethnicity, gender and class on a daily basis (Bang 2009). Addressing the topic of people and communities as part of the SSC, although ignored in most cases, is identified as critical (Chourabi et al. 2012). The social infrastructure, including intellectual and social capital, is essential to the SSC since it contributes to creating an environment that is adequate for building a creative class, considered as a fundamental asset for SSCs (Albino et al. 2015). Creativity and social innovation are regarded as key drivers of SSCs, which means people, education, learning, and knowledge are central (Nam and Pardo 2011) and (TEPSIE 2015). Given the above, it is logical to say that making people smarter could be considered as one of the objectives of SSC initiatives; at the same time, smart people represent a fundamental asset for SSCs, as one of the most relevant resources SSCs can rely on to make cities smarter. In addition to impacting the city's competitiveness, the engine for economic growth, smart, educated, informed and involved people can become active users and engage with the smart city initiatives to the extent that they can influence the effort to be a success or a failure, both by adopting and using the services made available to them and by participating in the governance and the management of the city (Chourabi et al. 2012). Also, by carefully observing existing examples of SSCs, a variance in the way citizens are engaged is noted. Selected cities, as is the case in India, resort to the concept of "Kumbathons" to engage citizens, whereas in Amsterdam an initiative called "Smart Citizen" is implemented with the aim to connect data, people, and knowledge to create more engaged communities. Therefore, citizen engagement is a major factor to consider in assessing the performance of a SSC. Also, there are two types of engagement, namely citizens to city and city to citizens, as shown in Fig. 2. The "city to citizens" type captures the level of satisfaction of citizens with the services offered by the city (top-down approach); this level refers to the perception of citizens about the services offered. The "citizens to city" type of engagement captures the engagement of citizens in providing the solutions themselves (bottom-up approach) or in highlighting the needs and priorities to other city stakeholders. Institutional and practical challenges exist in citizen engagement practices and processes. The challenges include the financial resources, the structures of decision-making processes, the methods/frameworks for citizen engagement, the full inclusion of citizens, the institutional justification for taking on the inputs from the citizens, timing, the sharing of power with representatives, and many more (Boyd and Lukensmeyer 2004) and (Axelsson et al. 2010). One notable practical challenge is the assessment of the level of engagement and of the results of citizen engagement programmes, since their effects are not apparent in the short term (Voorberg et al. 2014).

Fig. 2. Engagement of citizens in smart sustainable city planning and design (city to citizens; citizens to city)

The centrality of citizen engagement was validated through capturing the insights of thirteen experts from different countries around the world, namely Canada, Colombia, Estonia, Germany, India, Italy, Japan, Norway, Russia, the United Arab Emirates (UAE), the United Kingdom (UK), the United States (USA), and South Korea. The selected experts lead smart city projects in their respective countries. The validation questionnaire consisted of a set of questions, one of which focused on the importance of the role of citizens in attaining and achieving smartness. All experts confirmed the centrality of citizen engagement in planning smart cities and in achieving and sustaining the smartness of a SSC. In fact, 54% stated that the role of citizens is very important; 38% stated that the role of citizens is important; and 8% stated that the role of citizens is moderately important (Al-Nasrawi 2019).

4 Frameworks for Citizen Engagement

There are different spectrums of approaches to citizen engagement in the literature, introduced via several frameworks. These are important, as proper citizen engagement connotes sustained and active involvement in the long run through processes that foster shared decision-making and continuous collaboration and learning. In the context of SSCs, a thorough search using terms such as “citizens’ engagement”, “involvement”, “participation”, “framework” and “spectrum” made it clear that the framework developed by the International Association for Public Participation is widely used, even amongst highly renowned entities in the area such as the Smart Cities Council and the European Manifesto on Smart Cities. Therefore, given its adoption by SSC communities, we refer to this framework to analyze the citizen engagement process.

4.1 The Spectrum of Public Participation (The Spectrum)

The Spectrum was designed in 1997 by the International Association for Public Participation (IAP2) to assist with the selection of the level of participation that defines the role of the public in any participatory process. The Spectrum shows that varying levels of participation are valid, as everything depends on the goals, resources, time frames and levels of concern in the decision to be taken. It is used at the international level and in

different contexts. The Spectrum specifies five levels of government/participant engagement and the expected outcomes, with an increasing level of impact from left to right. These levels are “Inform”, “Consult”, “Involve”, “Collaborate”, and “Empower”. Selected scholars argued that the Spectrum has a bias toward direct democracy (Nabatchi 2012). To overcome this challenge and ensure its neutrality, the Spectrum was extended to include supplementary factors capturing the level of communication between government stakeholders and the public. Moving through the framework from left to right represents increased authority in decision-making and a transition in the mode of communication, from one-way to two-way communication to deliberative communication, as shown in Fig. 3.

Fig. 3. Levels of public engagement impact

The first phase, “Inform”, is a one-way communication that is satisfied using fact sheets, websites, open houses, and others. The second phase, “Consult”, is attained through the establishment of focus groups, conducting surveys, and holding public meetings. The “Involve” phase is where two-way communication takes place and the impact of citizens’ engagement increases; this is attained through various means including, to name a few, workshops and deliberative polling. The “Collaborate” phase is the stage where citizens’ advice is sought through citizens’ advisory committees, consensus building, and participatory decision-making. Last but not least, the “Empower” phase is when the final decision-making is placed in the hands of the public. This can be realized through delegated decision-making and deliberative democracy (IAP2 2017).

4.2 Information and Communication Technologies and Citizen Engagement

The growth of ICTs since the late 1990s has enabled local governments to foster citizen engagement (Ferro et al. 2013). Numerous researchers have highlighted the positive impact of modern technologies and ICT applications in helping local and central governments to cultivate a culture of active citizen engagement. The roles of ICTs include the provision of timely information (Fuentes-Bautista 2014), creating effective platforms for citizens to engage in public life (Linders 2011), and facilitating the formation of social networks (Bonsón et al. 2012), as well as contributing to deliberative democracy (Astrom et al. 2012) and (Hong and Nadler 2012). It is worth mentioning that the first role, the provision of timely and reliable information for citizens to

increase their awareness and enable them to make informed decisions, has been progressing since the very first Web 1.0. Scholars found that cities use their websites as a means to provide information about themselves and, at the same time, to help citizens use their time and resources efficiently, for example by paying fees online and saving the cost of transport. Moreover, researchers and practitioners found that ICT applications have also helped governments develop their public services to make them inclusive of the needs of citizens (Hong and Nadler 2012). Regarding the issue of one-way versus two-way communication, the adoption of ICT tools enables two-way public affairs discussions on social media. In fact, information can be broadcast at lower costs compared to traditional forms such as print newspapers and advertisements; moreover, the use of social media assists governments in sensing citizens’ expectations towards innovative ideas, thus helping governments to meet their needs (Ellison and Hardey 2014). Social media improves the two-way dialogue between governments and citizens (Panagiotopoulos et al. 2013), but it might not always be an inclusive approach: factors such as digital skills, access, and generation gaps can prevent groups of citizens from making use of it, which highlights a very important aspect, namely the real opportunity and capability of citizens to engage. Indeed, ICTs offer governments at all levels a series of tools and platforms to adopt novel approaches to establishing greater transparency, fighting corruption, calling for stronger accountability, improving public service efficiency, and enhancing good governance. These technologies enable the engagement and participation of citizens directly in all processes. The latter is conditioned by the true will of governments to be open and their sincere desire to work with citizens for the common good at all levels (Axelsson et al. 2010).

5 Conclusion

This paper explores a concept that is of high importance to researchers and practitioners in the field. It captures the relationship between SSC dimensions, initiatives and projects, and it explores the pathways of influence for SSCs, noting the central role of stakeholders’ engagement with a particular focus on citizens. The uniqueness of this research lies in highlighting the human aspect when capturing the smartness of a SSC, an aspect that is often neglected throughout the SSC development lifecycle despite its vital role. The paper then examined the process of citizen engagement by referencing the Spectrum of Public Participation, which shows that engaging citizens ranges from keeping them informed to empowering them with decision-making. The paper showed that the smartness of cities is more than digital, technical, or technological; the smartness of cities is about people. Smartness is above all inclusive, providing the ability and opportunity for everyone to be an active citizen.

References

Albino, V., Berardi, U., Dangelico, R.: Smart cities: definitions, dimensions, performance, and initiatives. J. Urban Technol. 22, 3–21 (2015)

Al-Nasrawi, S.: A validated model for citizen engagement and smartness of cities. In: 2019 International Conference on Smart Applications, Communications and Networking (SmartNets), Sharm El Sheik, Egypt, 2019, pp. 1–6 (2019). https://doi.org/10.1109/smartnets48225.2019. 9069794 Al-Nasrawi, S., Adams, C., El-Zaart, A.: Assessing smartness of smart sustainable cities: a comparative analysis. In: Proceedings of the International Conference on Sensors, Networks, Smart and Emerging Technologies (SENSET 2017) (2017a) Al-Nasrawi, S., Adams, C., El-Zaart, A.: The anatomy of smartness of smart sustainable cities: an inclusive approach. In: Proceedings of the International Conference on Computer and Applications (ICCA) (2017b) Al-Nasrawi, S., Adams, C., El-Zaart, A.: Smartness of smart sustainable cities: a multidimensional dynamic process fostering sustainable development. In: Proceedings of the 5th International Conference on Smart Cities, Systems, Devices and Technologies (SMART 2016) (2016) Al-Nasrawi, S., Ibrahim, M., El-Zaart, A., Adams, C.: Challenges facing E-government and smart sustainable cities: an Arab region perspective. In: Proceedings of the 15th European Conference on eGovernment (ECEG 2015), pp. 396–402 (2015) Astrom, J., Grönlund, Å.: Online consultations in local government: what works, when, and why. Connecting Democracy: Online Consultation and the Flow of Political Communication, pp. 75–96 (2012) Axelsson, K., Melin, U., Lindgren, I.: Exploring the importance of citizen participation and involvement in e-government projects: practice, incentives, and organization. Transform. Gov.: People Process Policy 4, 299–321 (2010) Bonsón, E., Torres, L., Royo, S., Flores, F.: Local e-government 2.0: social media and corporate transparency in municipalities. Gov. Inf. Q. 29, 123–132 (2012) Bang, H.P.: Yes, we can: identity politics and project politics for a late-modern world. Urban Res. Pract. 2, 1–21 (2009) Boyd, A., Lukensmeyer, C.: Putting the “public” back in management: seven principles for planning meaningful citizen engagement. Public Manag. 86(7), 10–15 (2004) Chourabi, H., Nam, T., Walker, S., Gil-Garcia, R.J., Mellouli, S., Nahon, K., Pardo, T., Scholl, H.: Understanding smart cities: an integrative framework. In: Proceedings of the 45th Hawaii International Conference on System Sciences, pp. 2289–2297. IEEE (2012). https://doi.org/10. 1109/hicss.2012.615 Dirks, S., Keeling, M.: A Vision of Smarter Cities: How Cities Can Lead the Way into a Prosperous and Sustainable Future. IBM Institute for Business Value, Cambridge (2009) Ebrahim, Z., Irani, Z.: E-government adoption: architecture and barriers. Bus. Process Manag. J. 11(5), 589–611 (2005) Ellison, N., Hardey, M.: Social media and local government: citizenship, consumption and democracy. Local Gov. Stud. 40, 21–40 (2014) European Commission: Smart cities: smart cities and sustainability (2014). http://ec.europa.eu/ dgs/connect/en/content/smart-cities-0 Ferro, E., Loukis, E.N., Charalabidis, Y., Osella, M.: Policymaking 2.0: from theory to practice. Gov. Inf. Q. 30, 359–368 (2013) Fuentes-Bautista, M.: Rethinking localism in the broadband era: a participatory community development approach. Gov. Inf. Q. 31, 65–77 (2014) Giffinger, R., Fertner, C., Kramar, H., Kalasek, R., Pichler-Milanovis, N., Meijers, E.: Smart cities: ranking of European medium-sized cities. Centre of Regional Science (SRF), Vienna University of Technology (2007). 
http://www.smart-cities.eu/download/smart_cities_final_report.pdf Giffinger, R., Gudrun, H.: Smart cities ranking: an effective instrument for the positioning of cities. Archit. City Environ. 4(12), 7–25 (2010). http://upcommons.upc.edu/revistes/bitstream/2099/ 8550/7/ACE_12_SA_10.pdf

Hollands, R.: Will the real smart city please stand up? City: Anal. Urban Trends Cult. Theory Policy 12, 303–320 (2008) Hong, S., Nadler, D.: Which candidates do the public discuss online in an election campaign? The use of social media by 2012 presidential candidates and its impact on candidate salience. Gov. Inf. Q. 29, 455–461 (2012) IEEE Smart Cities (IEEE SC): IEEE smart cities (2015). https://smartcities.ieee.org/ International Association for Public Participation (IAP2): IAP2’s public participation spectrum (2017). http://c.ymcdn.com/sites/www.iap2.org/resource/resmgr/foundations_cou rse/IAP2_P2_Spectrum_FINAL.pdf International Electrotechnical Commission (IEC): Orchestrating infrastructure for sustainable smart cities (2014). ISBN 978-2-8322-1833-4 ITU-T FG-SSC: Technical report on smart sustainable cities: an analysis of definitions. United Nations, International Telecommunication Union - Telecommunication Standardization Sector Focus Group on Smart Sustainable Cities (ITU-T FG-SSC) (2014) ITU-T FG-SSC: Setting the stage for stakeholders’ engagement in smart sustainable cities. United Nations, International Telecommunication Union - Telecommunication Standardization Sector Focus Group on Smart Sustainable Cities (ITU-T FG-SSC) (2015) Kanter, R.M., Litow, S.S.: Informed and Interconnected: A Manifesto for Smarter Cities, pp. 9–14. Harvard Business School General Management Unit, Boston (2009) Komninos, N.: Intelligent Cities: Innovation, Knowledge Systems and Digital Spaces. Spon Press, London (2002) Linders, D.: We-government: an anatomy of citizen coproduction in the information age. In: Proceedings of the 2011 Digital Government Society Conference (2011) Lukensmeyer, C.J., Hasselblad, T.L.: Public Deliberation: A Manager’s Guide to Citizen Engagement. IBM Center for the Business of Government, Washington, D.C. (2006) Mize, C.E.: Citizen Participation in Public Decision-Making: A Study of the Willamette National Forest. University of Oregon, Oregon (1972) Nabatchi, Y.: Putting the “public” back in the public values research: designing participation to identify and respond to values. Public Adm. Rev. 72(5), 699–708 (2012) Nam, T., Pardo, T.: Conceptualizing smart city with dimensions of technology, people, and institutions. In: Proceedings of the 12th Annual International Digital Government Research Conference, pp. 282–291 (2011) NUA: The new urban agenda (2016). http://habitat3.org/wp-content/uploads/NUA-English.pdf Oxfam: Working for the few (2014). http://www.oxfam.org/sites/www.oxfam.org/files/bp-wor king-for-few-political-capture-economic-inequality-200114-en.pdf Panagiotopoulos, P., Barnett, J., Brooks, L.: Social media and government responsiveness: the case of the UK food standards agency. In: eGov2013, Lecture Notes in Computer Science, vol. 8074, pp. 310–321 (2013) Pham, L., Mai, T.T., Messy, B.: Key factors for effective citizens engagement in smart city: the case of Cork City. In: IoT and Smart City Challenges and Applications (2016) Shankar, R.: Why smart cities need smart citizens (2016). www.thehindu.com/features/homesand-gardens/why-smart-cities-need-smart-citizens/article8625075.ece Shapiro, J.M.: Smart cities: quality of life, productivity, and the growth effects of human capital. Rev. Econ. Stat. 88(2), 324–335 (2006) Sheedy, A.: Handbook on Citizen Engagement: Beyond Consultation. Canadian Policy Research Networks Inc., Ottawa (2008) The World Economic Forum (WEF): Outlook on the global agenda (2014). http://www.weforum. 
org/reports/outlook-global-agenda-2014 TEPSIE: Growing the field of social innovation in Europe. Deliverable of the project: “the theoretical, empirical and policy foundations for building social innovation in Europe” (TEPSIE). European Commission, DG Research, Brussels (2015)

Voorberg, W., Bekkers, V.J.J.M., Tummers, L.G.: Co-Creation in Social Innovation: A Comparative Case-Study on the Influential Factors and Outcomes of Co-Creation. IRSPM, Ottowa (2014) World Bank: Stakeholder analysis (2016). http://www1.worldbank.org/publicsector/anticorrupt/ PoliticalEconomy/stakeholderanalysis.htm

Review of Learning-Based Techniques of Sentiment Analysis for Security Purposes

Mohammed Boukabous(B) and Mostafa Azizi

MATSI Lab, ESTO, Mohammed First University, Oujda, Morocco
{m.boukabous,azizi.mos}@ump.ac.ma

Abstract. Big data refers not only to datasets that are big but also to their high velocity and variety, which make traditional techniques and tools both insufficient for processing them and unable to propose real solutions for handling them. As the amount of data keeps growing, specific solutions are emerging to manage it and to extract knowledge and significant value from it. Among them, we are interested in sentiment or opinion analysis of social media messages, which provides the most recent and comprehensive information and trends, due to the widespread use of social media and their simplicity and ease of use. The study conducted in this paper provides an overview of the existing literature on learning-based methods in the context of sentiment analysis and security intelligence. To this end, we have systematically reviewed the most recent papers published over the last five years in the area of security threats in exchanged messages based on sentiment analysis techniques. This review and its findings can serve as a potential basis for our future research directions.

Keywords: Security intelligence · Big data · Deep learning · Machine learning · Natural language processing · Sentiment analysis

1 Introduction

Due to the affordability and accessibility of exchanged messages on the Internet, criminals, terrorists, and individuals have constantly used these messages to reach their goals, recruit members, and disseminate their messages, with the intention of committing a criminal act or causing harm or loss to other people. Various governments and organizations have attempted to thwart the use of the World Wide Web by malicious people and terrorist organizations [1]. Being in the age of technology, the data volume and the multitude of data sources have grown exponentially, paving the way for new technical and application challenges; by 2025, data will have grown to around 163 zettabytes according to IDC predictions [2]. These data come from everywhere: online social networks, which have become a key communication platform for more than 3.8 billion people, representing 49% of the world’s population [3]; digital pictures and videos (YouTube users watch more than a billion hours of video per day [4]); forums; blogs; magazines; news; comments; etc. The classical algorithms, frameworks, methods, and tools for

data processing have become insufficient or limited for processing these volumes of data and unable to provide true solutions for handling them [5]. The problem of extracting and managing valuable knowledge from these big data sources is now among the most popular subjects in computing research. Big data is not just about storing or accessing data; it also aims to analyze data in order to understand it and exploit its value [6, 7]. On the one hand, the data continues to get bigger and more voluminous; on the other hand, artificial intelligence (AI) is coming to play a key role in providing predictive analytics solutions for big data [8]. AI aims to study challenging problems and to develop software and machines that can reproduce human-like intelligence to capture high-level abstractions in big data, providing important improvements for numerous tasks, in addition to processing and finding patterns in massive quantities of data, particularly in its application to sentiment analysis [9].

2 Background

2.1 Security Intelligence

Security intelligence is an orientation of security techniques based on AI. These techniques aim at collecting and organizing all information related to threats from cyberspace, in order to detect possible security threats in real time and draw a portrait of the attackers, or to highlight trends (sectors of activity affected, method used, etc.). This profiling makes it possible to better defend ourselves and to anticipate incidents by allowing detection at the beginning of a major attack [10]. Advanced security solutions are now of tremendous importance, and security intelligence leads the way. The intelligence cycle traditionally has five major phases: identifying needs, gathering information, processing raw information, analysis, and dissemination/action. Owing to the simplicity of navigation, anonymity (the liberty to upload any content without revealing one’s identity), and the weakness of the publication system (users only need a valid website account), popular social media websites, forums, and blogs are frequently misused by many hate groups to promote their ideologies (cybercrime [11], cyber-extremism, and cyberhate propaganda) [10]. In fact, social media intelligence (SOCMINT or SMI) refers to the solutions and tools that allow organizations and governments to monitor different platforms and conversations on social media, to react to the different signals received on these social media, and to synthesize these individual reactions to deduce trends and user needs. Social media intelligence uses intrusive or non-intrusive means to collect information from social media sites, with both open and closed access [12, 13]. Research shows that extremists put out hateful speeches, violent and offensive messages, and comments focusing on their missions. Many hate promotion groups use different popular social media websites to promote their ideologies by broadcasting racist content to their viewers. Researchers from various disciplines, such as social sciences, psychology, and computer science, have been continuously developing tools and suggesting techniques to combat and counter these problems of online radicalization, to make it possible to generate early warnings of civil-unrest-related events [14, 15], and to filter and report such content to the right authorities.

2.2 Machine Learning

ML is a subset of artificial intelligence based on mathematical and statistical approaches that give programs the capability to learn from data, identify trends, and make decisions with almost no human intervention [16]. More broadly, it concerns the design, analysis, optimization, development, and implementation of such methods. Machine learning can be globally divided into supervised, unsupervised, and semi-supervised learning. Supervised learning algorithms are trained on labeled examples, i.e., examples whose inputs and outputs are already known. Unsupervised learning is applied to unlabeled data; in this case, the system does not know the correct output for each input, and it is up to the algorithm to explore the data and discover its structure [17]. Semi-supervised learning takes some annotated data as input and some unannotated data; this is a very interesting method that takes advantage of both worlds (supervised and unsupervised), but of course it also brings their respective difficulties. Each of these kinds of learning can be improved by involving reinforcement learning, which is based on an experience/reward cycle and improves performance with each iteration [18]. Some of the best-known machine learning algorithms are Linear Regression, Logistic Regression, Decision Tree (DT), Random Forest, Naive Bayes, k-Nearest Neighbors (kNN), Support Vector Machines (SVM), Artificial Neural Networks (ANN), and the Restricted Boltzmann Machine (RBM).

2.3 Deep Learning

DL is a set of machine learning methods attempting to model data with a high level of abstraction using architectures built from different non-linear transformations. It is based on networks of artificial neurons inspired by the human brain. These networks are made up of tens or even hundreds of layers of neurons, each layer receiving and interpreting information from the previous one [26, 27]. Discoveries in this field have brought significant, rapid, and effective progress in many areas, including facial and speech recognition, computer vision, natural language processing, and security [21]. Deep learning models learn from big datasets using different algorithms, among them:

Convolutional Neural Network (CNN). A CNN is a feed-forward artificial neural network in which the connection pattern between neurons is inspired by the visual cortex of animals [22]; it is widely used in computer vision and image processing applications. A CNN is generally composed of three layers (see Fig. 1).

The Convolution Layer: It is the key to feature extraction. Each convolution kernel convolves with the upper feature map and can extract one feature, and each convolution kernel can be linked with several other feature maps of the upper layer [23]. The formula of the convolution layer is:

x_j^l = f\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l \right)   (1)

where M_j represents a selection of input maps x_i^{l-1} in layer l-1 used to form the output map x_j^l in layer l, and k_{ij}^l refers to the kernel convolving input map x_i^{l-1} to form output map x_j^l; each input map is convolved with a distinct kernel. Each output map is given an additive bias b_j^l, and f is the activation function. The most common activation functions are ReLU, Sigmoid and Tanh, as shown in Fig. 2.

The Pooling Layer: It helps to greatly reduce the feature dimension while retaining the original feature information, and it has the property of translation invariance [24]. The formula of the pooling layer is:

x_j^l = f\left( \beta_j^l \, p\left( x_j^{l-1} \right) + b_j^l \right)   (2)

where p is a pooling function. Different feature maps have different multiplicative deviations \beta and additive deviations b.

The Fully Connected Layer: It operates on flattened inputs where each input is connected to all neurons, and each neuron is connected to all activations in the previous layer.

Fig. 1. Typical CNN architecture [22].

Fig. 2. Activation functions: ReLU, Sigmoid, Tanh.
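To make the three layers above concrete, the following sketch (not taken from any of the reviewed papers) stacks an embedding layer, a convolution layer, a pooling layer, and fully connected layers for binary sentiment classification using the Keras API; the vocabulary size, embedding dimension, and filter counts are placeholder values.

# Minimal 1-D CNN text classifier; hyperparameters below are illustrative placeholders.
import tensorflow as tf

VOCAB_SIZE = 20000  # assumed vocabulary size
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),                    # word embeddings
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),  # convolution layer
    tf.keras.layers.GlobalMaxPooling1D(),                          # pooling layer (global max pooling)
    tf.keras.layers.Dense(32, activation="relu"),                  # fully connected layer
    tf.keras.layers.Dense(1, activation="sigmoid"),                # binary sentiment output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

The Conv1D and pooling layers play the roles described by Eqs. (1) and (2); a global max pooling layer is used here only for brevity.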

Recurrent Neural Network (RNN). An RNN is frequently used with text and other sequential data; it contains feedback connections, loops (bidirectional data flow), and memories to remember previous computations [20] (see Fig. 3). The best-known RNN variant is Long Short-Term Memory (LSTM) [25].

Fig. 3. RNN illustration [26].

For each time step t, the activation a^{<t>} and the output y^{<t>} are expressed as follows:

a^{<t>} = g_1\left( W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_a \right)   (3)

y^{<t>} = g_2\left( W_{ya} a^{<t>} + b_y \right)   (4)

where W_{ax}, W_{aa}, W_{ya}, b_a, b_y are coefficients that are shared temporally, and g_1, g_2 are activation functions.

2.4 Natural Language Processing (NLP)

NLP is a multidisciplinary field involving linguistics, information engineering, computer science, and artificial intelligence, which aims at creating NLP tools for various applications. It is the ability of a program to understand human language [27]. NLP facilitates the development of virtual assistants by making dialogue more intuitive, and it can be used in many ways when it comes to text classification, such as document labeling, spam recognition, and sentiment analysis.

2.5 Sentiment Analysis (SA)

SA, also known as emotion AI or opinion mining, is the analysis of feelings from dematerialized textual sources on large amounts of data (big data). It is the use of NLP, computational linguistics, biometrics, and textual analysis in order to identify, extract, quantify, and study emotional states and personal information [9]. There are three usual granularity levels for opinion mining [28]: the document, the sentence, and the aspect levels [29]. These sentiments can express the author’s opinion, his emotional state (when writing his text), or a deliberate sense of connection (that the author expects to make with readers). Sentiment analysis is widely used in marketing, customer service, healthcare materials, social media, and other areas.
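As a small illustration of the document and sentence granularity levels (using the third-party TextBlob library, which is assumed to be installed and is not necessarily a tool used in the papers reviewed below), polarity can be computed per document and per sentence:

# Document-level vs. sentence-level polarity; TextBlob is assumed to be installed.
from textblob import TextBlob

text = "The service was excellent. However, the waiting time was disappointing."
blob = TextBlob(text)

print("Document polarity:", blob.sentiment.polarity)  # one score for the whole text
for sentence in blob.sentences:                        # sentence-level granularity
    print(sentence, "->", sentence.sentiment.polarity)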

2.6 Multimodal Sentiment Analysis

This is a new dimension of classical text-based sentiment analysis, which goes beyond the analysis of text and includes other modes such as audio and visual data [30]. With the excessive amount of data exchanged online in different forms (such as text, images, and videos), conventional text-based sentiment analysis has evolved into a more complex model, multimodal sentiment analysis [31], which can be applied in several areas such as the development of virtual assistants, the analysis of YouTube movie reviews [32], the analysis of news videos [33], and emotion recognition.

2.7 Rule Based

This technique uses a sentiment lexicon to describe the polarity (negative, positive, or neutral) of a textual content. This approach is more understandable and more easily implemented than ML- or DL-based algorithms. However, its main drawback is that it requires human involvement in the text analysis process: the larger the volume of information, the more challenging the task of sifting through noise, identifying sentiment, and distinguishing useful data from various content sources. The lexicon-based approach can be divided into two categories: (1) the corpus-based approach (using corpus data, with either a statistical or a semantic approach), and (2) the dictionary-based approach (using dictionary words). Generally, we go through the following steps: extract the data, tokenize the text (splitting the text into individual words), remove stop words, remove punctuation, and finally run the lexicon on the preprocessed data.
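A minimal sketch of the lexicon-based steps just listed (tokenization, punctuation and stop-word removal, then a dictionary lookup); the tiny lexicon and stop-word list are invented for illustration and are not a real sentiment dictionary.

# Toy dictionary-based polarity; the lexicon and stop-word list are illustrative only.
import string

LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}
STOP_WORDS = {"the", "a", "is", "was", "and", "but"}

def lexicon_polarity(text):
    # Tokenize, then remove punctuation and stop words.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in cleaned.split() if t not in STOP_WORDS]
    # Run the lexicon on the preprocessed tokens.
    score = sum(LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_polarity("The plot was great but the acting was bad."))  # -> positive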

3 Research Method

We have conducted an in-depth systematic literature review focused on identifying active debates. The review includes a search, a selection, an analysis, and a synthesis process. Our objective was to provide a deep analysis of the field rather than a descriptive overview [34].

3.1 Search and Selection

After several searches, we found that the number of articles devoted to big data and sentiment analysis has increased in recent years. We performed our searches over the following databases: Scopus, Web of Science, ScienceDirect, Google Scholar, and Semantic Scholar. The search terms “Sentiment Analysis” and “Big data” were used on all databases and all results were included (see Table 1). According to Scopus, almost 1010 papers on this topic have been published since 2012. It is clear that this topic is getting progressive attention within the research community (see Fig. 4). We used these search terms in each database with different groups of fields such as title, abstract, keywords, and full text. The main selection process involved two phases. In the first phase, we primarily judged papers based on the title, abstract, and keywords. We included articles that cover one or more of the search keywords in a security intelligence context. In the second phase, because of the limited focus of our search and

Table 1. Results from searched databases.

Library: Total number of results
Scopus: 1010
Web of Science: 505
ScienceDirect: 1106
Google Scholar: 27500
Semantic Scholar: 7240

Fig. 4. Google trends, search strings “Sentiment Analysis” and “big data”.

selection, we inspected the full texts of papers to check whether the terms “sentiment analysis” and “big data” were mentioned in the body of the text or whether the papers dealt with these terms. We excluded many articles through this selection process. Out of 522 potential candidates, we selected merely 132 papers that focus on sentiment analysis and big data analytics in a security intelligence context. From this sample, only 71 papers qualified for inclusion (see Fig. 5).

Search: libraries and searched fields (Scopus: title, abstract, keywords; Web of Science: topic; ScienceDirect: keywords; Google Scholar: full text; Semantic Scholar: full text); excluded: abstract-only items, blogs, workshop proposals, news items, panels/setups and formats, overviews, and demos. This yielded 522 potential articles, primarily judged on abstract, title, and keywords (included: sentiment analysis, big data, security, social media, natural language processing, or deep learning).

Selection: 132 papers on sentiment analysis or big data in a security intelligence context; a full-text search for “Sentiment Analysis” (excluding papers not mentioning “Sentiment Analysis” in the abstract, body, or keywords) resulted in 71 papers.

Fig. 5. Search and selection processes.

Review of Learning-Based Techniques of Sentiment Analysis

103

3.2 Synthesizing the Literature

We have summarized the newest articles that suit the purpose of this literature review. We collected the data from these articles, and the main results of the study are illustrated in Table 2.

4 Findings

In this section, we analyze and compare the results of the study summarized in Table 2. We start by describing the most important comparison criteria:

• Method: With this criterion, we identify the approaches used for sentiment analysis. We can group them into three main categories: knowledge-based techniques (lexicon-based), statistical methods (artificial intelligence), and hybrid approaches (that combine both).
• Context or dataset: the collection of data used to train and test the artificial intelligence model and/or the lexicon-based dictionary.
• Result or feature: this criterion gives one or more results or features of the approach used. For example, the “accuracy” metric is the ratio of accurate predictions to the whole set of input samples. “Precision” is the proportion of relevant cases among the retrieved instances, while “recall” is the proportion of the total number of relevant cases that were actually retrieved. The “F1 score” is a measure of a test’s accuracy that considers both the precision and the recall of the test. The Area Under the Curve (AUC) is equal to the probability that a classifier ranks a random positive instance higher than a random negative one. Other metrics related to the subject studied in each article are also reported. (A small computational illustration of these metrics is given after this list.)
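For reference, the metrics named above can be computed as follows on made-up labels and scores (illustration only; scikit-learn is assumed to be available):

# Computing the comparison metrics on toy predictions (illustration only).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # toy ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # toy hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]   # toy predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))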

As shown in the previous section, researchers express great interest in security intelligence based on sentiment analysis. In Table 2, we mainly focus on the method used, the context or dataset used, and the best performance results. Data play a vital role in all these papers, as shown with the qualitative data analysis tool NVivo: we applied NVivo to these papers to determine the frequency of the most used words, such as “DATA”, “Sentiment”, “Crime” and “Analysis”, that suit the purpose of this research (see Fig. 6).

Fig. 6. Words frequency
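A rough Python analogue of that word-frequency step, on placeholder text rather than the actual reviewed papers, would be:

# Counting word frequencies across a (placeholder) corpus of abstracts.
import re
from collections import Counter

corpus = [
    "Sentiment analysis of crime-related data from Twitter",
    "Big data analysis for crime and threat prediction",
]  # placeholder documents

counts = Counter()
for doc in corpus:
    counts.update(re.findall(r"[a-z]+", doc.lower()))

print(counts.most_common(5))  # frequent words such as 'data', 'crime', 'analysis'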

Area

Crime

Extremism

Hate Speech

Cyber Crime

Disaster

Aggressive behavior, Cyberbullying

Crime

Social Threats

Crime

[42]

[40]

[43]

[55]

[44]

[37]

[45]

[46]

Crime, education, and business

[36]

Paper [35]

Lexicon-based

Lexicon-based

Machine Learning SVM, DT, CNB, and KNN

Lexicon-based

Lexicon-based and Machine Learning

Lexicon-based

Deep Learning LTSM+MLP

Machine Learning Logistic Regression, SVM Lexicon-based

Method Machine Learning SVM, Naïve Bayes

Proposed an algorithm to find a person who is deviating from their normal behavior Predict crime rate directions in a prospective time frame

Detect and classify crimes from Arabic Twitter posts using text mining techniques

Detect aggressive, inappropriate, or antisocial behavior, under the prism of the discussion context.

Develop a crime investigation tool which provides contextual information about crime incidents Framework for opinion mining and extremist content detection in online social media data using Big Data application HaterNet, an intelligent system for the detection and analysis of hate speech in Twitter Analyze the extent of cyberattacks in various countries across the globe Sentiment towards the needs of affected people during disaster

Application Analyze national educational, business and crime rates occurred in Malaysia, Singapore, Vietnam and, Myanmar

Twitter data and crime rates

Twitter, Facebook, Blogger, Instagram

Twitter

Bayzick et all. dataset, MySpace forums

Gathering streaming social media data from Facebook public pages Public dataset on hate speech in Spanish with 6000 tweets Tweets collected relevant cybersecurityrelated hashtags Twitter

Police Department Incidents Dataset of San Francisco

Context/Dataset Real-time social media data about education, business, and crime from Twitter.

Table 2. Summary of reviewed literature Result/Feature

SVM accu.: 0.9155 accu.: 0.8817 - DT accu.: 0.8246 - KNN accu.: 0.7806 Tweets, status, blogs, frequency, duration… F1 scores: 0.55 - CNB

-

- Unig, Big, PP, BBig, BTrig - Accuracy: 0.9543

BOW, POS and lexicon. F1 scores: 0.97

F1 scores: 0.96

Area Under Curve: 0.828

Lexical, word classes, and syntactic features

- SVM Accuracy: 0.9516 - NB Accuracy: 0.9133 F1 scores: 0.80

(continued)


Extremism

Extreme events Terrorism

Extremism

Crime

Extremism

Security Attack

Security Breaches Crime

Crime

Extremism

[48]

[49]

[50]

[51]

[38]

[56]

[39]

[52]

[41]

[54]

[53]

Area CyberEvents

Paper [47]

Lexicon-based

Deep learning

Lexicon-based

Lexicon-based

Lexicon-based and Machine Learning Machine Learning Linear regression

Machine Learning MNB, SVM, RF

Lexicon-based

Lexicon-based

Lexicon-based

Lexicon-based

Method Lexicon-based

Detecting security breaches can be in the earlier stages and by that prevent further destruction Predict future crime on each area using twitter sentiment and weather Predict crime by focusing patterns and trends from various contributing factors Build a self-guiding web crawler to collect data specifically from extremist websites

Webpages

Twitter

Twitter

Twitter

Websites using TENE web crawler Twitter

Classify data collected by the terrorism and extremism network to detect terrorist webpages, and gage the intensity of their content Predict future attacks on the web based on daily collection of tweets

Aspect identification task involving implicit aspect implied by adjectives and verbs for crime tweets

Magazines, videos, and Twitter Twitter crime datasets

Twitter

Online discussion forums Twitter

Context/Dataset Hacking forums

The ways extremists use language in their media, and how using it differs across platforms

Understand the processes of users and patterns of intersubjective sense-making during extreme events Difference between people from Western and Eastern countries on how they view Terrorism

Identifies the most radical users within online forums

Application Predict malicious cyber-events by exploiting malicious actor’s behavior via posts on forums

Table 2. (continued) Result/Feature

92% success rate classifying extremist pages

Frequency of crime: 0.026

Area Under Curve: 0.67

- ISIS R2: 0.4434 - OpIsrael R2: 0.992 Accuracy: 93%

- MNB: 0.83 - SVM: 0.89 - RF: 0.87 Parts-of-Speech, SentiStrength, WEKA

Positive Words: 33% Negative Words: 71% SentiStrength

SentiStrength

- F1 Endpoint Malware: 0,78 - F1 Malicious Destination: 0,75 - F1 scores Malicious Email: 0,71 Parts-of-Speech


We see that the papers described in Table 2 use different methods. For example, papers [35–39] use machine learning approaches, [40, 41] use deep learning approaches, [42–54] use lexicon-based approaches, and [55, 56] use hybrid approaches (machine learning with lexicon-based methods). They obtained different results according to the approach used. By analyzing their results, we found that the best methods are the hybrid approaches that combine machine-learning models with rule-based approaches in order to improve the performance of the model as well as the sentiment scoring; this provides a machine-learning model trained with a labeled corpus. We also notice the rarity of works that deal with hybrid approaches involving deep learning with lexicon-based methods in the security intelligence field. We expect that using deep learning could give even better results than machine learning for processing natural data in its raw form, because conventional ML requires expertise to design a feature extractor that transforms the raw data into a suitable representation from which the classifier can detect or classify patterns in the input [19]. Since each of these researchers built their own dataset or obtained it from an organization with limited access, it will also be worth generating our own datasets using existing APIs and crawlers, so that the data can be relevant to the subject studied.

In addition, we found that most of these papers use explicit approaches. In fact, as more and more users share their thoughts, concerns, and feelings on social media, their user-generated content includes valuable signals, such as socio-behavioral factors, that convey significant information and can be useful for detecting implicit behavior. In implicit speech, the user can use metaphors, which may be one of the most difficult types of sentiment to detect as they contain a lot of semantic information. A direct statement is not a prerequisite for transmitting sentiment [57]: it is possible to transmit sentiment implicitly through expressions in which the speaker alludes to an act or notion without explicitly stating it. The linguistics literature suggests that speech acts that are directive (suggesting that a third party take action), commissive (committing to future action), or assertive (conveying the state of the situation) can also transmit sentiment [58]. We may also use the user’s message history to detect implicit behavior as follows:

• Establish a certified user database by defining a certified user, who should assume the role of posting various pieces of information necessary for information sharing and delivery.
• Define interest categories.
• Establish a database using the interest categories of each user.
• Follow the suggestions of users as the interest category of each user is classified through deep distance metric learning.

Indeed, we can measure the distance between what the user enters and his interests, as sketched below. Learning a similarity measure or a distance metric is indispensable for many tasks such as content-based retrieval. We thus measure the similarity between the certified user’s interests and what he publishes. It also helps to reduce representation redundancy and to alleviate the problem of lacking adequate labeled training data.
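As a simple sketch of this idea (TF-IDF vectors with cosine similarity standing in for a learned deep distance metric; the texts and variable names are hypothetical):

# Measuring how close a new post is to a certified user's interest profile.
# TF-IDF + cosine similarity is a simple stand-in for deep distance metric learning.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

user_interest_profile = "cyber security threat intelligence malware phishing reports"  # hypothetical
new_post = "New phishing campaign spreads malware through fake banking emails"         # hypothetical

vectors = TfidfVectorizer().fit_transform([user_interest_profile, new_post])
similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]

print("Similarity between interest profile and post:", round(similarity, 3))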

5 Conclusion

The purpose of this review was mainly to survey the trends in learning-based security research within the past five years and to explore the possibility of combining it with sentiment analysis. According to our analysis, most of the studied papers in security use machine-learning-based methods, and few of them deal with deep learning, whereas no paper really combines learning-based security with sentiment analysis. Regarding sentiment analysis, all these papers have adopted explicit approaches for interpreting the meaning of text, which may not apply to all cases. Moreover, the suggestions provided were specific to explicit methods, which is not systematically true for implicit approaches. Sentiment analysis covers both explicit and implicit approaches as well as methods to detect user behavior. In this regard, further research should focus on generating new, diverse corpora destined entirely for security intelligence purposes with sentiment analysis, and on combining deep learning with rule-based approaches to improve the performance of the model as well as the sentiment scoring. This is the direction of our future work.

References 1. Kirillova, E.A., Kurbanov, R.A., Svechnikova, N.V., Zul’fugarzade, T.E.D., Zenin, S.S.: Problems of fighting crimes on the internet. J. Adv. Res. Law Econ. 8(3), 849–856 (2017) 2. Reinsel, D., Gantz, J., Rydning, J.: Data age 2025: the digitization of the world from edge to core. International Data Corporation, no. November, p. 28 (2018) 3. Digital around the world in April 2020 - We Are Social. https://wearesocial.com/blog/2020/ 04/digital-around-the-world-in-april-2020. Accessed 21 May 2020 4. YouTube, Presse - YouTube (2019). https://www.youtube.com/about/press/. Accessed 05 Sept 2020 5. Sagiroglu, S., Sinanc, D.: Big data: a review. In: Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013, pp. 42–47 (2013) 6. Wu, X., Zhu, X., Wu, G.Q., Ding, W.: Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014) 7. Cuzzocrea, A., Song, I.Y., Davis, K.C.: Analytics over large-scale multidimensional data: the big data revolution!. In: International Conference on Information and Knowledge Management, Proceedings, pp. 101–103 (2011) 8. O’Leary, D.E.: Artificial intelligence and big data. IEEE Intell. Syst. 28(2), 96–99 (2013) 9. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1–2), 1–135 (2008) 10. Correa, D., Sureka, A.: Solutions to Detect and Analyze Online Radicalization: A Survey, January 2013 11. Kasmi, M.A., Mostafa, A., Lanet, J.L.: Methodology to reverse engineer a scrambled Java card virtual machine using electromagnetic analysis. In: International Conference on Next Generation Networks and Services, NGNS, pp. 278–281 (2014) 12. Norton-Taylor, R.: Former spy chief calls for laws on online snooping. Guard (2013) 13. Vinter, P.: Why we must be allowed to spy on Facebook and Twitter, by former Whitehall intelligence chief. Dly. Mail (2013) 14. McNamee, L.G., Peterson, B.L., Peña, J.: A call to educate, participate, invoke and indict: understanding the communication of online hate groups. Commun. Monogr. 77(2), 257–280 (2010)

15. Agarwal, S., Sureka, A.: A focused crawler for mining hate and extremism promoting videos on YouTube. In: HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media, pp. 294–296 (2014) 16. Bishop, C.M.: Pattern Recognition and Machine Learning, vol. 4, no. 4. Springer (2006) 17. Hinton, G.E., Sejnowski, T.J.: Unsupervised Learning: Foundations of Neural Computation, vol. 38, no. 5–6. MIT Press, Cambridge (1999) 18. Van Otterlo, M., Wiering, M.: Reinforcement learning and Markov decision processes. In: Adaptation, Learning, and Optimization, vol. 12, pp. 3–42. Springer Verlag (2012) 19. Lecun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015) 20. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015) 21. Idrissi, I., Boukabous, M., Azizi, M., Moussaoui, O., El Fadili, H.: Toward a deep learningbased intrusion detection system for IoT against Botnet attacks. IAES Int. J. Artif. Intell. 9(4) (2020) 22. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998) 23. Uchida, K., Tanaka, M., Okutomi, M.: Coupled convolution layer for convolutional neural network. Neural Netw. 105, 197–205 (2018) 24. Scherer, D., Müller, A., Behnke, S.: Evaluation of pooling operations in convolutional architectures for object recognition. In: Lecture Notes in Computer Science (Including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 25. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997) 26. Amidi, A., Amidi, S.: CS 230 - Recurrent Neural Networks Cheatsheet. Stanford. https:// stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks. Accessed 06 Sept 2020 27. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011) 28. Liu, B.: Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 5, 1–167 (2012) 29. Vargas, F.A., Pardo, T.A.S.: Aspect clustering for sentiment analysis. In: Horizons in Computer Science Research: Volume 18, Nova Science, pp. 213–224 (2020) 30. Soleymani, M., Garcia, D., Jou, B., Schuller, B., Chang, S.F., Pantic, M.: A survey of multimodal sentiment analysis. Image Vis. Comput. 65, 3–14 (2017) 31. Poria, S., Cambria, E., Bajpai, R., Hussain, A.: A review of affective computing: from unimodal analysis to multimodal fusion. Inf. Fusion 37, 98–125 (2017) 32. Wollmer, M., et al.: You tube movie reviews: sentiment analysis in an audio-visual context. IEEE Intell. Syst. 28(3), 46–53 (2013) 33. Pereira, M.H.R., Pádua, F.L.C., Pereira, A.C.M., Benevenuto, F., Dalip, D.H.: Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos, April 2016 34. Jones, O., Gatrell, C.: Editorial: the future of writing and reviewing for IJMR. Int. J. Manag. Rev. 16, 249–264 (2014) 35. Naing, H.W., Thwe, P., Mon, A.C., Naw, N.: Analyzing sentiment level of social media data based on SVM and Naïve Bayes algorithms. In: Advances in Intelligent Systems and Computing (2019) 36. Siriaraya, P., et al.: Witnessing crime through tweets: a crime investigation tool based on social media. In: GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems (2019) 37. 
AL-Saif, H., Al-Dossari, H.: Detecting and classifying crimes from Arabic Twitter posts using text mining techniques. Int. J. Adv. Comput. Sci. Appl. 9(10), 377–387 (2018)

38. El Hannach, H., Benkhalifa, M.: WordNet based implicit aspect sentiment analysis for crime identification from Twitter. Int. J. Adv. Comput. Sci. Appl. 9, 150–159 (2018) 39. Hernandez, A., et al.: Security attack prediction based on user sentiment analysis of Twitter data. In: Proceedings of the IEEE International Conference on Industrial Technology (2016) 40. Pereira-Kohatsu, J.C., Quijano-Sánchez, L., Liberatore, F., Camacho-Collados, M.: Detecting and monitoring hate speech in Twitter. Sensors (Switzerland) 19, 4654 (2019) 41. Azeez, J., Aravindhar, D.J.: Hybrid approach to crime prediction using deep learning. In: 2015 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2015 (2015) 42. Mouhssine, E., Khalid, C.: Social big data mining framework for extremist content detection in social networks. In: International Symposium on Advanced Electrical and Communication Technologies, ISAECT 2018 - Proceedings (2019) 43. Sharma, K., Bhasin, S., Bharadwaj Nalini, P.: A worldwide analysis of cyber security and cyber crime using Twitter. Int. J. Eng. Adv. Technol. 8, 1–6 (2019) 44. Ventirozos, F.K., Varlamis, I., Tsatsaronis, G.: Detecting aggressive behavior in discussion threads using text mining. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 45. Jindal, S., Sharma, K.: Intend to analyze Social Media feeds to detect behavioral trends of individuals to proactively act against Social Threats. Proc. Comput. Sci. 132, 218–225 (2018) 46. Aghababaei, S., Makrehchi, M.: Mining Twitter data for crime trend prediction. Intell. Data Anal. 22, 117–141 (2018) 47. Deb, A., Lerman, K., Ferrara, E.: Predicting cyber-events by leveraging hacker sentiment. Information 9, 280 (2018) 48. Scrivens, R., Davies, G., Frank, R.: Searching for signs of extremism on the web: an introduction to Sentiment-based Identification of Radical Authors. Behav. Sci. Terror. Polit. Aggress. 10, 39–59 (2018) 49. Stieglitz, S., Bunker, D., Mirbabaie, M., Ehnis, C.: Sense-making in social media during extreme events. J. Contingencies Cris. Manag. 26, 4–15 (2018) 50. Mansour, S.: Social media analysis of user’s responses to terrorism using sentiment analysis and text mining. Proc. Comput. Sci. 140, 95–103 (2018) 51. Macnair, L., Frank, R.: The mediums and the messages: exploring the language of Islamic State media through sentiment analysis. Crit. Stud. Terror. 11, 438–457 (2018) 52. Hao, J., Dai, H.: Social media content and sentiment analysis on consumer security breaches. J. Financ. Crime 23, 855–869 (2016) 53. Chen, X., Cho, Y., Jang, S.Y.: Crime prediction using Twitter sentiment and weather. In: 2015 Systems and Information Engineering Design Symposium, SIEDS 2015 (2015) 54. Mei, J., Frank, R.: Sentiment crawling: extremist content collection through a sentiment analysis guided web-crawler. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015 (2015) 55. Ragini, J.R., Anand, P.M.R., Bhaskar, V.: Big data analytics for disaster response and recovery through sentiment analysis. Int. J. Inf. Manag. 42, 13–24 (2018) 56. Weir, G.R.S., Dos Santos, E., Cartwright, B., Frank, R.: Positing the problem: enhancing classification of extremist web content through textual analysis. In: 2016 IEEE International Conference on Cybercrime and Computer Forensic, ICCCF 2016 (2016) 57. 
Pinker, S., Nowak, M.A., Lee, J.J.: The logic of indirect speech. Proc. Natl. Acad. Sci. USA 105, 833–838 (2008) 58. Searle, J.R.: Indirect speech acts. In: Speech Acts, pp. 59–82. Brill (1975)

Sentiment Analysis. A Comparative of Machine Learning and Fuzzy Logic in the Study Case of Brexit Sentiment on Social Media

Ihab Moudhich(B), Soumaya Loukili, and Abdelhadi Fennan

LIST Department of Computer Science, Faculty of Sciences and Techniques, UAE, Tangier, Morocco
[email protected], [email protected], [email protected]

Abstract. Sentiment analysis is a case of natural language processing that extracts the sentiment of people about a specific service, product, or topic. Users spend a considerable amount of time scrolling through social media feeds to find something relatable, either emotionally or as a problem in their daily life that they want to solve. By analyzing the content produced by social media users, we can determine whether their opinion about a subject, which may be a service, news, a movie, etc., is relatively optimistic, neutral, or pessimistic. There are multiple approaches to sentiment analysis: first, machine learning with its numerous methods, which can be either supervised or unsupervised; second, the lexicon-based approach; and third, the combination of the two approaches, called a hybrid approach. There are also other techniques that can be added to obtain a more optimized result, such as improving text preprocessing or adding a fuzzy logic approach and subjectivity detection. In this paper, we continue along the same path as our previous work [1]. We use the same ontologies, the same data collected about Brexit, and the same text preprocessing method. We applied subjectivity detection using Long Short-Term Memory so that our model can predict subjectivity in a text. We then applied a fuzzy logic method based on the lexicon-based polarity and subjectivity classifier to classify the text into one of five categories. Finally, we provide a comparison that summarizes every approach with its results.

Keywords: Sentiment analysis · Machine learning · Fuzzy Logic · Classification · Support Vector Machine · Naïve Bayes · Decision Tree

1 Introduction

Every day, social media platforms grow more and more as users publish content every second, reacting to or expressing their emotions towards a specific subject. This data creates new opportunities for companies to understand users’ behaviors and thus propose services to solve their daily problems. This massive amount of data is known as big data. However, before it can be analyzed, it first needs to be structured and understood. Sentiment analysis, or opinion mining, is considered one of the most challenging domains, firstly due to the massive volume of data generated from social media, and secondly due to the complexity of every language.


The fundamental process of sentiment analysis operates at two levels: opinion extraction and sentiment classification. Opinion extraction aims to extract opinion-holding words from a text, while sentiment classification categorizes a text based on its final polarity. In this work, we propose to analyze Brexit data gathered from social media platforms such as Twitter and Reddit. In our previous work [1], we analyzed the same data using machine learning and lexicon methods, which gave us good results in terms of accuracy. We decided to compare the previous analysis with other algorithms to see how they perform.

2 Related Work

Amol S. et al. [2] proposed a system to analyze the sentiment of Twitter data that combines machine learning and ontologies: the sentiment analysis system uses the Naïve Bayes classifier, while the analysis of the features relies on a domain ontology. They also follow a real-time approach so that users can interact with their system in real time. The architecture described by Cristian Bucur [3], which aims to improve services to clients, crawls data from hotel review websites and applies a preprocessing algorithm that cleans the data for SentiWordNet (http://ontotext.fbk.eu/sentiwn.html) before computing the final polarity value. A smart-city event detection approach based on social media analytics was proposed by Aysha Al Nuaimi et al. [4] and is divided into three phases: (i) data capture, which collects a massive amount of data from different types of social media platforms; (ii) data understanding, which cleans the data from noise to facilitate the learning step that predicts the best polarity value; and (iii) presentation, which summarizes the data and evaluates the framework. Another ontology-based system is the one proposed by Efstratios Kontopoulos et al. [5]. It has two phases: (i) creation of the domain ontology, either with Formal Concept Analysis or with Ontology Learning. The former defines extensions, where each extension is a set of objects, and intensions, which are sets of attributes. The Ontology Learning method, also known as ontology acquisition or ontology generation, refers to the automatic creation of an ontology via the extraction of its concepts and of the relations in the dataset. (ii) The second phase is the calculation of the sentiment on a set of tweets. The use of Fuzzy Logic, as described by Karen Howells et al. [6], gives us a good starting point for working with Fuzzy Logic and for shaping a useful framework that extracts the sentiment of a text. They first define a fuzzy set, then explain the if-then rules that describe the interaction of fuzzy sets, present the relation between fuzzy operators and fuzzy sets, and finally transform the values into crisp values through different methods. They categorize the classification into five distinct categories: strongly positive, positive, neutral, negative, and strongly negative tweets.


Using big data with Fuzzy Logic to monitor smart-city services is also an exciting field of research, as proposed by Bahra et al. [7]. They explain the benefits of using real-time information with big data to enhance smart-city services, using Kafka, Apache Spark Streaming, and Cassandra. Their work consists of five layers: data streams, data integration, streaming classification, external storage, and real-time monitoring. A study on building fuzzy ontologies was conducted by Morente-Molinera et al. [8], who give an overview of their idea of ontology creation with heterogeneous linguistic information. Their workflow is based on collecting data, obtaining so-called multi-granular information, preprocessing, and applying a fuzzy ontology so that users can create queries. An application of ontology in sentiment analysis is presented by Pratik Thakor et al. [9], whose methodology consists of two processes: first, they extract the data from the social media platform to build the ontology model; second, they identify the negative sentiments within a tweet to retrieve the problem area. The work starts with data extraction and cleaning, then a search for negative sentiment within tweets, the detection of subjectivity, building the ontology model, and retrieving information from the ontology model. Farman Ali et al. [10] presented classification with a fuzzy domain ontology and support vector machines. Their approach is composed of three phases: first, they collect data and perform semantic analysis, followed by tokenization and word tagging; the second phase consists of the fuzzy domain ontology, feature extraction, and classification with an SVM classifier; in the last step, they obtain the feature classification, the polarity calculation, and the presentation of the hotel features' polarity. The research conducted by Lopamudra Dey et al. [11] aims to determine whether the collected data is suitable to be analyzed with the discussed data mining methods. For that, they used two approaches: the Naïve Bayes classifier and the K-Nearest Neighbour classifier. The results show that Naïve Bayes works well on movie reviews with 80% accuracy, outperforming the K-Nearest Neighbour approach; on hotel reviews, both obtain lower accuracy. According to their conclusion, the Naïve Bayes method is the better choice when working with movie review data. A study on sentiment classification using decision trees was made by A. Suresh et al. [12], in which they use decision tree classification to build a sentiment analysis classifier system. Their results show that Learning Vector Quantization (LVQ) obtains a lower precision for positive opinions and an accuracy of 75%. Finally, they state that the same result is observed with the Naïve Bayes approach.

3 Main Concepts

3.1 Sentiment Analysis
Sentiment analysis, also known as opinion mining, is the process of recognizing the opinions expressed in a text by applying natural language techniques and machine learning approaches. This field is considered one of the essential elements for enhancing economic strategies, smart-city services, and e-commerce, as well as for the detection of hate speech on social media.


In order to analyze the text that an individual posts on social media, we use sentiment analysis, which is categorized as a classification problem. In general, sentiment categorization is split into three types: positive, neutral, and negative. Sentiment analysis can operate at three different levels [13]:
• Document-level sentiment analysis: it takes the whole opinion of the text as a single topic and classifies the document as positive or negative [14].
• Sentence-level sentiment analysis: the sentiment is determined in two steps. The first is the detection of the subjectivity or objectivity of the sentence; the second is the calculation of the sentiment of the sentence [15].
• Feature-level sentiment analysis: it is based on the extraction of features and characteristics, using part-of-speech tagging. For example, in "This is a bad computer," we consider "bad" an opinion word and "computer" a product feature [16].
We can also mention that the sentiment of a text can be extracted using lexicon-based approaches, machine learning, or a combination of the two, which is called the hybrid sentiment analysis approach.

3.2 Ontology (https://en.wikipedia.org/wiki/Ontology)
An ontology is a way of representing a subject's properties and how these concepts are related to each other. Thus, to study a subject, we should define a set of concepts and their categories. Ontologies aim to provide knowledge about specific domains in a way that can be understood both by computers and by researchers or developers. They can be used to explain the relationships between the entities within a domain. Also, an ontology defined for one domain cannot be directly applied to another one.

3.3 Fuzzy Logic
The Fuzzy Logic approach was proposed in the early days of computer science by Lotfi Zadeh in 1965 (https://en.wikipedia.org/wiki/Lotfi_A._Zadeh). Zadeh argued that the real world cannot be represented in a binary manner because there are many grey areas that cannot be considered either white or black. Fuzzy Logic allows researchers to handle variables that are, at best, vague, and enables approximate reasoning when representing a subject. Fuzzy Logic and probability logic are mathematically similar, as both have truth values ranging between 0 and 1, but they are conceptually distinct and lead to different interpretations: probabilities are based on likelihood, whereas Fuzzy Logic corresponds to a degree of truth. To explain further, we present a brief definition of the Fuzzy Logic vocabulary and methods.
• Fuzzy sets: a fuzzy set is a set whose elements have a degree of membership determined by a membership function whose values lie in the real interval [0,1].


For a fuzzy set F of U, defined by a membership function M_F(x), with U being the universe of discourse and x denoting its elements, we have:

  M_F(x) : U → [0, 1], where
  M_F(x) = 1 if x is totally in F,
  M_F(x) = 0 if x is not in F,
  0 < M_F(x) < 1 if x is partially in F.    (1)

• Membership functions: a membership function represents the fuzziness of a fuzzy set; it assigns to each x of U a value in the range [0,1], and M_F(x) is known as the degree of membership of x in F. The most popular membership functions in Fuzzy Logic are the trapezoidal, Gaussian, and triangular functions.
• Linguistic variables: variables whose values take the form of words or sentences rather than numbers; such a value is called a linguistic value of the variable (e.g., very positive, positive, neutral, negative, very negative). This concept was introduced by Zadeh [17].
• Linguistic hedges: an extension of linguistic variables, used when we do not have enough terms to express a fuzzy value. To accomplish this, we modify the linguistic values with adjectives or adverbs (e.g., very, precisely, quite, nearly).
• Fuzzy if-then rules: they provide the output based on predefined input variables. Their formulation is:

  IF <condition> THEN <result>    (2)

Where the IF part is the condition and THEN represents the result.
• Fuzzification and defuzzification: the fuzzification step transforms a real (crisp) value into a fuzzy input using membership functions, whereas the reverse process, which returns the crisp value of a fuzzy output, is called defuzzification. Common defuzzification methods include max-membership, centre of gravity, and the weighted average.
• Fuzzy inference: a combination of if-then rules, membership functions, and operators that maps a given input to an output. The most widely used fuzzy inference systems are Tsukamoto [18], Mamdani [19], and Sugeno [20].

3.4 Support Vector Machine
A supervised machine learning algorithm used for regression and classification problems; it separates the data into two different classes by constructing a hyperplane or a line.

3.5 Naïve Bayes
A learning method based on probability, applying Bayes' theorem:

  posterior = (prior × likelihood) / evidence    (3)

It has three main variants:
• Multinomial: used for data that can easily be turned into counts.
• Bernoulli: used for binary features (i.e., 0 or 1).
• Gaussian: used when we have continuous values.


3.6 Decision Tree
A supervised learning algorithm used for classification problems. A decision tree predicts the target variable based on its inputs. A tree is built from three types of nodes: decision nodes, chance nodes, and end nodes.

3.7 Long Short-Term Memory Network
Generally called LSTM, it is a type of RNN (recurrent neural network) used to remember information over an extended period and so avoid the long-term dependency problem. To define an LSTM, the first step is to decide what type of information to keep, the second is to decide which new information to store, and the last is to decide what the output will be.
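To make the LSTM description above concrete, the following is a minimal sketch of a binary text classifier built around a single LSTM layer in Keras. The vocabulary size, sequence length, layer widths, and the dummy data are illustrative assumptions, not the configuration used in this work.

# Minimal Keras sketch of a binary text classifier with one LSTM layer.
# All sizes below are illustrative assumptions only.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 10000   # assumed vocabulary size
MAX_LEN = 50         # assumed (padded) sequence length

model = Sequential([
    Embedding(input_dim=VOCAB_SIZE, output_dim=64),  # integer word ids -> dense vectors
    LSTM(64),                                        # recurrent layer that keeps long-range context
    Dense(1, activation="sigmoid"),                  # probability of the positive class
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy integer-encoded sequences, only to show the expected input shapes.
X = np.random.randint(1, VOCAB_SIZE, size=(100, MAX_LEN))
y = np.random.randint(0, 2, size=(100,))
model.fit(X, y, epochs=1, batch_size=32, verbose=0)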

4 Our Proposed Work

In this section, we work with the same data we previously collected in [1], as well as the same preprocessing methods. However, we implement new algorithms, namely Support Vector Machine, Naïve Bayes, and Decision Tree, to compare and see which method is the most accurate and optimized for sentiment analysis. Afterward, we add subjectivity detection to the lexicon-based method that we used in [1], which allows us to apply Fuzzy Logic to the results. We also use the same dataset for training our new models.

4.1 Ontology
The term Brexit denotes the exit of the United Kingdom from the European Union. David Cameron, the former Prime Minister, initiated the process by organizing a referendum in 2016, in which 51.9% of voters chose to leave the EU. The fact that some individuals want Brexit to take place while others wish to stay in the EU makes this topic highly suitable for testing the framework: besides provoking many mixed feelings within the population, it also divided the country into communities that share not only the same point of view but also the same idea of the most suitable approach to solve the problem. As a first step, we have to define the keywords used to collect Brexit data on Twitter. This phase is critical because the selection of keywords matters a lot for the accuracy of the framework. We start by studying the Brexit case and the different political parties in the UK. From there, we gathered politicians' usernames, the parties' official usernames, and the hashtags they use. We also collected data using hashtags specific to each side of the movement, such as "LeaveEU," "Remainer," "Revoke," etc. (Fig. 1). We collected the data from March 1, 2019 to June 29, 2019 through the Twitter API. We gathered more than 5 million tweets, retweets, quotes, and replies, and stored them in a Mongo database.


Fig. 1. Brexit domain’s ontology

4.2 Text Preprocessing
Working on sentiment analysis means that we should clean our data and prepare the text for our algorithms. For that, we built a preprocessing module. We start by removing extra whitespace and tab spaces, screen names, numbers, and any existing URLs. The next step is expanding abbreviations (e.g., IDK becomes I Don't Know), converting emojis to their corresponding text, and expanding contractions (e.g., don't becomes do not). Finally, we normalize the text to lower case.

Algorithm: PreprocessingForText
Input: tweet
Variables: Text
Output: clean_tweet
1. Text ← removeSpaces(tweet)
2. Text ← removeScreenName(Text)
3. Text ← removeNumbers(Text)
4. Text ← removeUrls(Text)
5. Text ← changeAbbreviation(Text)
6. Text ← emojisTransformation(Text)
7. Text ← lowerCase(Text)
8. clean_tweet ← Text
9. return clean_tweet

After applying this preprocessing to our tweets, it is easier for our model to predict the sentiment with good accuracy.
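A possible Python rendering of the preprocessing algorithm above is sketched below. The abbreviation and emoji dictionaries are illustrative placeholders; the full mappings used in this work are not listed here.

import re

# Illustrative mappings only; the full abbreviation/emoji dictionaries are not given in the paper.
ABBREVIATIONS = {"idk": "i do not know", "don't": "do not"}
EMOJIS = {":)": "happy", ":(": "sad"}

def preprocess_tweet(tweet: str) -> str:
    text = re.sub(r"\s+", " ", tweet).strip()        # extra whitespace and tabs
    text = re.sub(r"@\w+", "", text)                 # screen names
    text = re.sub(r"http\S+|www\.\S+", "", text)     # URLs
    text = re.sub(r"\d+", "", text)                  # numbers
    for emo, word in EMOJIS.items():                 # emojis -> text
        text = text.replace(emo, f" {word} ")
    text = text.lower()                              # normalize case
    for abbr, full in ABBREVIATIONS.items():         # abbreviations and contractions
        text = re.sub(rf"\b{re.escape(abbr)}\b", full, text)
    return re.sub(r"\s+", " ", text).strip()

print(preprocess_tweet("IDK why @user posted http://t.co/x 5 times :("))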


4.3 Machine Learning Approaches Comparison

Support Vector Machine. This model is based on the SVM approach. First, we tokenize our data with TfidfVectorizer, which transforms the text into feature vectors so that the data is understandable for the machine learning model. Second, we divide the data into training and test sets in a 3-to-1 ratio. Third, we use the SVC classifier, a supervised method built on a kernel function. As the kernel we use a linear function, for two main reasons: first, because in text classification the number of instances and features is large, so we do not need to map our data into a high-dimensional space; second, a linear kernel is the fastest training option in SVM for text classification. For gamma, we use "auto", which means 1/number of features. After training, we obtained an accuracy of 0.86. Finally, we analyzed our data to predict whether it is positive or negative. For this step, we worked with more than five million texts. Weeks 1 to 6 refer to the period from 01/03/2019 to 12/04/2019. As Fig. 2 shows, the majority of social media users are positive about remaining in the EU, and this result is close to the previous work [1].

Fig. 2. Displays the results of the SVM method.
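As an illustration of the SVM setup described above (TF-IDF features, a linear kernel, gamma set to auto, and a 3-to-1 train/test split), the following is a minimal scikit-learn sketch; the toy texts and labels are placeholders for the preprocessed Brexit tweets.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Placeholder data; the real input is the preprocessed Brexit tweets.
texts = ["stay in the eu please", "leave now", "remaining is the best option", "we must leave"]
labels = [1, 0, 1, 0]  # 1 = remain (positive), 0 = leave (negative)

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.25, random_state=42)

# Linear kernel and gamma='auto', as in the setup described above.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear", gamma="auto"))
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))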

Naïve Bayes. This model is based on the Naïve Bayes approach, implementing the multinomial algorithm, which is considered a specialized version of Naïve Bayes for text documents. The simple Naïve Bayes approach models a document depending on the presence or absence of a particular word. In contrast, the multinomial variant is based


on counting the number of times a word was observed and adjusts the calculation to deal with it. For the tokenization, we used TfidfVectorizer and utilized 25% of our data for testing our model, obtaining an accuracy of 0.88. Weeks 1 to 6 refer to the period from 01/03/2019 to 12/04/2019. As Fig. 3 shows, the majority of social media users are positive about remaining in the EU, and this result is close to the lexicon-based approach used in our previous work [1].

Fig. 3. Displays the results of the Naïve Bayes multinomial approach.

Decision Tree. This model is based on the Decision Tree approach. Just like for the previous two methods, we worked with TfidfVectorizer to transform the data into vectors. We split our data into training and testing sets, with 25% reserved for testing. As the algorithm, we used Iterative Dichotomiser 3, also known as ID3, which uses entropy to measure the homogeneity of the samples. As a result, we obtained an accuracy of 0.72. Weeks 1 to 6 refer to the period from 01/03/2019 to 12/04/2019. As Fig. 4 shows, the majority of social media users are positive about remaining in the EU, but the result here is far from the one we obtained in our previous work [1].

Results Comparison. The table below summarizes the results obtained by all the methods applied in [1] and in this article, together with the actual results published by NatCen, the


Fig. 4. Displays the results of the Decision Tree approach.

largest independent social institution in the UK, whose full title is "National Centre for Social Research" (Table 1).

Table 1. Proposed framework's results comparison with NatCen's results and with [1].

                                        Leave the EU | Remain in the EU
NatCen's results                        44.45        | 55.55
Lexicon Based Approach results [1]      35.01        | 64.99
Machine Learning results for LSTM [1]   45.12        | 54.88
Support Vector Machine results          38.38        | 61.62
Naïve Bayes results                     31.88        | 68.11
Decision Tree                           28.30        | 71.70

The results show that the Support Vector Machine approach is the most accurate, as its percentages are very close to the official NatCen results. For Naïve Bayes, the values are closer to those of the lexicon-based approach. We can also observe that the Decision Tree is the only method whose results differ markedly from all the other methods mentioned in this article and the previous one.

Fuzzy Logic Approach. We applied fuzzy-based sentiment analysis on the same data to assess the level of social media users' satisfaction. This approach was used by Ghani et al. [9] to measure customer loyalty.


The proposed architecture in Fig. 5 describes a fuzzy-based sentiment analysis approach with subjectivity detection for analyzing users' sentiment, categorizing the opinions and emotions expressed in their tweets.

Fig. 5. Sentiment classification framework.

For the preprocessing and sentiment classification, we used the same setup as [1], applying SenticNet to determine the polarity and NLTK for text preprocessing. After this step, we moved to subjectivity detection to determine whether a text is subjective or objective. Then we applied the Fuzzy Logic system to classify the text into one of five categories (Very Positive, Positive, Neutral, Negative, Very Negative) based on the polarity and subjectivity.

Subjectivity Detection. In this phase, an input text from social media users is categorized as subjective or objective using a machine learning approach. We trained our model on the dataset from [22], implementing an LSTM layer. We also note that a text that does not contain opinion words is considered objective, whereas a text that has opinion terms is labeled subjective. The subjectivity detection step aims to identify subjective phrases or words by checking for their existence: every sentence is scanned for opinion terms, and each sentence that has one or more opinion words is declared subjective; otherwise, it is considered objective. As an example, in the sentence "This flower is beautiful," the word "beautiful" is an opinion term, so we classify this sentence as subjective using the following equation:

  Tweet_sub_obj = Subjective, if (Wx ∈ OL)
                  Objective,  if (Wx ∉ OL)    (4)


Where OL is an opinion lexicon and Wx is a word from the sentence.

Fuzzy-Based System for User's Sentiment Level. In this phase, to determine the level of user sentiment for a given tweet, we used a Fuzzy Logic system, as described in Fig. 6:

Fig. 6. Fuzzy Logic approach

• Fuzzy sets
In our work, we take the subjectivity detection and the sentiment polarity as linguistic input variables and the user sentiment level as the output, as seen in Table 2.

Table 2. Defining the fuzzy sets variables.

Type   | Linguistic variable
Input  | – Sentiment polarity – Subjectivity detection
Output | – Sentiment level

• Fuzzification
After the identification of the input and output variables, we transform the crisp input values into a fuzzy set to obtain the fuzzified values. Based on the input and output, we determine the associated linguistic terms, as shown in Table 3.
• Membership function
The membership function is used to plot the fuzzy sets. In our work, we used the triangular membership function, as shown in Fig. 7, where "a" is the lower boundary, "b" is the upper boundary, "0" is the degree of membership, and "m" is the center.
• Rules determination
In this phase, we define the Fuzzy Logic rules that describe the results, as shown in Table 4.

Table 3. Determination of the fuzzified values.

        Linguistic variable     | Linguistic terms
Input   Sentiment class         | Positive (P), Neutral (Neu), Negative (N)
        Subjectivity detection  | Subjective (S), Objective (O)
Output  User sentiment level    | Very Positive, Positive, Neutral, Negative, Very Negative

Fig. 7. Triangular membership function

Table 4. Fuzzy Logic rules.

If (polarity > 0 and the sentence is Subjective) Then (the user opinion is Very Positive)
If (polarity > 0 and the sentence is Objective) Then (the user opinion is Positive)
If (polarity < 0 and the sentence is Objective) Then (the user opinion is Negative)
If (polarity < 0 and the sentence is Subjective) Then (the user opinion is Very Negative)
If (polarity = 0) Then (the user opinion is Neutral)


• Defuzzification
Finally, to determine the user sentiment level, the defuzzification function used for transforming the fuzzy values into crisp values is Mamdani's:

  Y = ∫_min^max u(y) · y dy / ∫_min^max u(y) dy    (5)

Where Y is the result of the defuzzification, u(y) is the membership function, y is the output variable, max is the upper limit, and min is the lower limit of the defuzzification.
• Our results
In this phase, we determined the sentiment of social media users about the Brexit subject by implementing subjectivity detection and the lexicon-based approach with Fuzzy Logic. The results are displayed in the figure below (Fig. 8).

Fig. 8. Final result of Fuzzy Logic and subjectivity detection framework.

This approach shows that the majority of users are positive, which means a large amount of the data has a positive polarity but is considered objective; due to the fuzzy rules, we categorized it with the Positive label. Note that positive results mean that people prefer to remain in the EU rather than leave it.
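To make the fuzzy classification above concrete, the sketch below shows a triangular membership function and a simplified, crisp reading of the rule base in Table 4. It is only an illustration under these assumptions, not the full fuzzy inference with fuzzification and Mamdani defuzzification used in this work.

def triangular(x: float, a: float, m: float, b: float) -> float:
    """Triangular membership function with lower bound a, peak m, and upper bound b."""
    if x <= a or x >= b:
        return 0.0
    return (x - a) / (m - a) if x <= m else (b - x) / (b - m)

def sentiment_level(polarity: float, subjective: bool) -> str:
    """Simplified crisp reading of the Table 4 rule base (sketch only)."""
    if polarity == 0:
        return "Neutral"
    if polarity > 0:
        return "Very Positive" if subjective else "Positive"
    return "Very Negative" if subjective else "Negative"

# Example: a mildly positive but objective tweet is labeled "Positive".
print(triangular(0.3, a=0.0, m=0.5, b=1.0))       # membership degree in a "positive" set
print(sentiment_level(0.3, subjective=False))     # Positive
print(sentiment_level(-0.7, subjective=True))     # Very Negative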

5 Conclusion

Sentiment analysis on social media will keep rising as one of the most exciting fields of research, due to the insights it provides about opinions. It gives companies the power to understand their customers' reactions. In our work, we first presented a


comparison between different algorithms, namely Support Vector Machine, Naïve Bayes, and Decision Tree, along with the other methods used in our previous work [1]. Secondly, we created a sentiment analysis framework based on a lexicon-based approach and subjectivity detection, applying the Fuzzy Logic approach. From this paper, we can say that our previous framework [1] and the approaches presented here work well, given that the United Kingdom is now exiting the European Union. As future work, we will continue implementing the Fuzzy Logic approach. We will explore the possibility of preprocessing the text with new standards, work further with ontologies, and see how ontologies and Fuzzy Logic can be integrated. Finally, we aim to implement a big data architecture and real-time sentiment analysis of tweets.

References 1. Ihab, M., Soumaya, L., Mohamed, B., Haytam, H., Abdelhadi, F.: Ontology-based sentiment analysis and community detection on social media: application to Brexit. In: SCA 2019: Proceedings of the 4th International Conference on Smart City Applications, pp. 1–7 (2019). https://doi.org/10.1145/3368756.3369090 2. Gaikwad, A.S., Mokhade, A.S.: Twitter Sentiment Analysis Using Machine Learning and Ontology. ISSN (Print) 2347-6710 3. Bucur, C.: Using Opinion Mining Techniques in Tourism. https://doi.org/10.1016/s2212-567 1(15)00471-2 4. Al Nuaimi, A., Al Shamsi, A., Al Shamsi, A., Badidi, E.: Social media analytics for sentiment analysis and event detection in smart cities. https://doi.org/10.5121/csit.2018.80605 5. Kontopoulos, E., Berberidis, C., Dergiades, T., Bassiliades, N.: Ontology-based sentiment analysis of twitter posts. http://dx.doi.org/10.1016/j.eswa.2013.01.001 6. Howells, K., Ertugan, A.: Applying Fuzzy Logic for sentiment analysis of social media network data in marketing. https://doi.org/10.1016/j.procs.2017.11.293 7. Mohamed, B., Abdelhadi, F., Adil, B., Haytam, H.: Smart City Services Monitoring Framework using Fuzzy Logic Based Sentiment Analysis and Apache Spark. https://doi.org/10. 1109/icssd47982.2019.9002687 8. Morente-Molinera, J.A., Pérez, I.J., Ureña, M.R., Herrera-Viedma, E.: Building and managing fuzzy ontologies with heterogeneous linguistic information. http://dx.doi.org/10.1016/j.kno sys.2015.07.035 9. Thakor, P., Sasi, S.: Ontology-Based Sentiment Analysis Process for Social Media Content. https://doi.org/10.1016/j.procs.2015.07.295 10. Ali, F., Kwak, K.-S., Kim, Y.-G.: Opinion mining based on fuzzy domain ontology and Support Vector Machine: a proposal to automate online review classification. https://doi.org/ 10.1016/j.asoc.2016.06.003 11. Dey, L., Chakraborty, S., Biswas, A., Bose, B., Tiwari, S.: Sentiment analysis of review datasets using Naïve Bayes’ and K-NN classifier. Int. J. Inf. Eng. Electron. Bus. (IJIEEB) 8(4), 54–62 (2016). https://doi.org/10.5815/ijieeb.2016.04.07 12. Suresh, A., Bharathi, C.: Sentiment classification using decision tree based feature selection. Int. J. Control Theory Appl. 9, 419–425 (2016) 13. Kolkur, S., Dantal, G., Mahe, R.: Study of different levels for sentiment analysis. Int. J. Curr. Eng. Technol. 5(2), 768–770 (2015)


14. Moraes, R., Valiati, J.F., Neto, W.P.G.: RDocument-level sentiment classification: an empirical comparison between SVM and ANN 40(2), 621–633 (2013) 15. Jagtap, V.S., Pawar, K.: Analysis of different approaches to sentence level sentiment classification. Int. J. Sci. Eng. Technol. 2(3), 164–170 (2013) 16. Tribhuvan, P.P., Bhirud, S.G., Tribhuvan, A.P.: A peer review of feature based opinion mining and summarization. (IJCSIT) Int. J. Comput. Sci. Inf. Technol. 5(1), 247–250 (2014) 17. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning—I. Inf. Sci. 8(3), 199–249 (1975) 18. Murakami, S.: Application of fuzzy controller to automobile speed control system. IFAC Proc. Vol. 16(13), 43–48 (1983) 19. Mamdani, E.H.: Application of fuzzy algorithms for control of simple dynamic plant. Proc. Inst. Electr. Eng. 121(12), 1585–1588 (1974) 20. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst. Man Cybern. SMC-15(1), 116–132 (1985) 21. https://en.wikipedia.org/wiki/Support_vector_machine 22. Kontopoulos, E., Berberidis, C., Dergiades, T., Bassiliades, N.: Ontology-based sentiment analysis of Twitter posts. Expert Syst. Appl. 40, 4065–4074 (2013). https://doi.org/10.1016/ j.eswa.2013.01.001 23. Cotfas, L.-A., Delcea, C., Roxin, I., Paun, R.: https://doi.org/10.1007/978-3-319-16211-9_14 24. Kontopoulos, E., Berberidis, C., Dergiades, T., Bassiliades, N.: Ontology-based sentiment analysis of Twitter posts. https://doi.org/10.1016/j.eswa.2013.01.001 25. Dragoni, M., Poria, S., Cambria, E.: OntoSenticNet: A Commonsense Ontology for Sentiment Analysis. https://doi.org/10.1109/mis.2018.033001419 26. Ali, F., Kwak, D., Khan, P., Islam, S.M.R., Kim, K.H., Kwak, K.S.: Fuzzy ontology-based sentiment analysis of transportation and city feature reviews for safe traveling. https://doi. org/10.1016/j.trc.2017.01.014 27. Zhang, L., Wang, S., Liu, B.: Deep Learning for Sentiment Analysis: A Survey 28. Polanco, X., San Juan, E.: Text data network analysis using graph approach. In: I International Conference on Multidisciplinary Information Sciences and Technology, October 2006, Mérida, Spain, pp. 586–592 (2006). ffhal00165964f 29. Deitrick, W., Hu, W.: Machine learning-based sentiment analysis for Twitter accounts. J. Data Anal. Inf. Process. 1, 19–29 (2013) 30. Wang, X., Zhang, H., Xu, Z.: Public Sentiments Analysis Based on Fuzzy Logic for Text. https://doi.org/10.1142/S0218194016400076 31. Emadi, M., Rahgozar, M.: Twitter sentiment analysis using fuzzy integral classifier fusion. https://doi.org/10.1177/0165551519828627 32. Angiani, G., Ferrari, L., Fontanini, T., Fornacciari, P., Iotti, E., Magliani, F., Manicardi, S.: A Comparison between Preprocessing Techniques for Sentiment Analysis in Twitter 33. Hemalatha, I., Saradhi Varma, G.P., Govardhan, A.: Preprocessing the Informal Text for efficient Sentiment Analysis. ISSN 2278-6856

Sentiment Analysis and Opinion Mining Using Deep Learning for the Reviews on Google Play

Sercan Sari(B) and Murat Kalender

Department of Computer Engineering, Yeditepe University, Istanbul, Turkey
{ssari,mkalender}@cse.yeditepe.edu.tr

Abstract. Sentiment analysis and opinion mining play an important role in tracing consumer behavior. With the recent advances in machine learning techniques, this issue has been addressed and has come a long way in English. However, in agglutinative languages such as Turkish, it is still one of the hot topics. In this study, we compare several classification methods and deep learning methods to perform sentiment analysis on Turkish reviews that we have collected from Google Play. We obtain 87.30% prediction accuracy for multinomial Naive Bayes and 95.87% prediction accuracy for the deep learning model. We have significant results both from the machine learning classifiers and from the deep learning model. While there is no difference between the machine learning classifiers when we use different vectorizers, there is a difference when we build a deep learning model to predict the target value. This model can also be applied to data from Twitter, Facebook, or any other microblogging platform.

Keywords: Sentiment analysis · Opinion mining · Machine learning · Text classification · Deep learning

1 Introduction

The behavior of online users has transformed with the growing rate of data on the internet. According to the statistics [11], 70% of online users who like to purchase electronics stated that they had read online reviews before purchasing the product. This situation accelerates research on sentiment analysis and opinion mining for online resources such as Twitter and Facebook. There are several studies [4,14,18] that analyze microblogging platforms such as Twitter and Facebook to identify people's opinions and classify them according to sentiment. Various kinds of information can be extracted from these resources. For example, manufacturing companies can collect information about their products, and political parties and social organizations may plan their events according to the results of this research. With the recent advances in machine learning techniques, this issue has been addressed and has come a long way in English. However, in agglutinative languages such as Turkish, it


has still been one of the hot topics [13,16,25]. While there are some advantages to using microblogging, especially Twitter, such as the variety and velocity of the data [18], some tweets contain sarcasm [5], and it can be problematic to label such a huge amount of data by just looking at some of its features [7]. This can lead to inefficient results and degrade the performance of machine learning models. To solve these issues and build models that predict more accurately, we follow a different approach. In the literature, there have been studies that used movie reviews [8–10,19,24], and the approach we follow is similar to these studies: we use reviews on Google Play to get rid of the ambiguity of the texts, because in such statements people directly state their opinions and rate them before submitting. As a result of rating before submitting, the aforementioned problems are largely avoided. Although our study has similarities with the current state of the literature, our main difference is that we build a deep neural network to perform sentiment analysis and opinion mining on a Turkish text corpus. We also contribute to the literature by sharing the corpus that we have prepared for this study [1]. In our study, we apply different machine learning algorithms and build a deep neural network to analyze how the reviews on Google Play in Turkey can be utilized for sentiment analysis and opinion mining purposes. We selected the reviews for the following reasons:
– The application reviews section is used directly to express the opinions of users.
– Application reviews contain an enormous amount of text and are up to date.
– A lot of different people use an enormous number of different applications, which means different contexts.
We collected a corpus of 11000 reviews from Google Play Turkey, balancing the ratings evenly between two sets of reviews:
1. reviews with four and five stars as positive emotions
2. reviews with one and two stars as negative emotions (Table 1).

Table 1. Examples of reviews on Google Play

Rating | Review
5 | Eskiden beri oynuyorum. Oyun muhteşem. Kesinlikle herkese tavsiye ediyorum ("I have been playing it for a long time. The game is amazing. I definitely recommend it to everyone.")
4 | Oyunu çok beğendim. Yüklediğim oyunların en güzellerinden biri ("I liked the game a lot. One of the best games I have installed.")
2 | Telefonuma yaptığım son güncelleme ile fotoğraftaki renkler bozulmaya başladı ("With the last update to my phone, the colors in photos started to get distorted.")
1 | Program sürekli olarak kendini İngilizce yapıyor. Sorunu çözün lütfen ("The program keeps switching itself to English. Please fix the problem.")

The rest of the paper is organized as follows: in Sect. 2 we review the relevant work. In Sect. 3, we describe how we collected the data


for our study. Section 4 gives an analysis of the corpus that we use to train the models. In Sect. 5, we present the model building and the results of our study. Section 6 concludes the paper.

2 Related Works and Background

2.1 Related Works

There have been several studies in the literature that focus on sentiment analysis and opinion mining. The main motivation behind these studies comes from the traceability of consumer behavior and the wide range of available data. In 2012, Kaya et al. [13] integrated sentiment classification techniques into the domain of political news for Turkish news sites. They compare supervised machine learning algorithms, namely Naïve Bayes, Maximum Entropy, SVM, and a character-based N-Gram language model, for sentiment analysis of Turkish political news. Kucuk et al. [16] work on named entity recognition (NER) and report experiments on NER for Turkish tweets. In 2015, Yildirim et al. [25] reported the effects of preprocessing layers on the sentiment classification of Turkish social media texts. While there are some benefits to utilizing microblogging platforms, especially Twitter and Facebook, such as the variety and velocity of the data [18], some microblogging texts contain sarcasm [5], and it can create ambiguity to work with such a huge amount of data by just looking at some of its features [7]. These kinds of characteristics may cause inefficient results. We use reviews on Google Play to get rid of the ambiguity of the texts, because in such statements people directly state their opinions and rate them before submitting, so the aforementioned problems are largely avoided. In the literature, there have been other studies that used movie reviews [8–10,19,24], and the approach we follow is similar to these studies. The background of our study consists of three main parts: feature engineering, machine learning classifiers, and a sequential deep learning model. In the feature engineering part, we explain why we need different representations of the text data to extract information from it. In the second part, we describe the machine learning classifiers that we have used in this study, and finally we give details about the deep neural network used in our research.

2.2 Feature Engineering for NLP

In natural language processing (NLP), we cannot directly use raw text to extract information and build machine learning models. In order to utilize the text information, we need to convert it into numerical values. A simple and adequate model that allows us to use text documents in machine learning is called the Bag-of-Words model, or BoW [12]. This simple model focuses on the occurrence of each word in a text. By using this method, we can easily encode


every text as a fixed-length vector whose length is the size of the known vocabulary. In the scope of this study, we explain the vectorization of text-based features and analyze two of the three different implementations of this model in the scikit-learn library [20], which are CountVectorizer, TfidfVectorizer, and HashingVectorizer. Since we will use CountVectorizer and TfidfVectorizer, the following subsections explain the details about them.

CountVectorizer. The CountVectorizer implements both occurrence counting and tokenization. It builds a vocabulary of known words and uses this vocabulary to encode new text data. Although it is a good solution, there are some drawbacks to using CountVectorizer, such as irrelevant words that occur very many times.

TfidfVectorizer. There is an alternative method to calculate word frequencies, called Term Frequency - Inverse Document Frequency (TF-IDF), which consists of two components in the score assigned to each word:
– Term Frequency: how often a given word appears within a text.
– Inverse Document Frequency: this down-weights words that appear very frequently across texts.
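As a short illustration of the two vectorizers discussed above, the following scikit-learn snippet encodes a few placeholder sentences with both CountVectorizer and TfidfVectorizer (the get_feature_names_out call assumes scikit-learn 1.0 or newer).

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["oyun muhtesem", "oyun cok kotu", "bu oyun harika"]  # placeholder reviews

count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(docs)          # raw term counts per document
print(count_vec.get_feature_names_out())
print(X_counts.toarray())

tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(docs)           # counts re-weighted by inverse document frequency
print(X_tfidf.toarray().round(2))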

2.3 Machine Learning Classifiers

In this subsection, we give some details about the machine learning classifiers that we have used in our study.

Multinomial Naive Bayes. The multinomial Naive Bayes classifier is based on the Naive Bayes theorem [17]. It is a simple baseline approach for building classifiers since it is fast and easy to implement. In [21], the authors discuss that while it has very good efficiency, its assumptions affect the quality of the results. In order to eliminate such drawbacks, they introduce the multinomial Naive Bayes (MNB) method. MNB models the distribution of words in a corpus as a multinomial: the text is treated as a sequence of words, and the position of each word is assumed to be generated independently of the others.

K-Nearest Neighbors. KNN (K-Nearest Neighbors) is one of the simplest and most widely used classification algorithms. KNN is a non-parametric and lazy learning algorithm used for classification and regression [3]. The main idea behind the nearest neighbor method is to find a label for a new point according to the closest points under a distance metric and predict the label from these. The number of neighbors can be defined by the user.


Decision Tree Learning. Decision tree learning is one of the most commonly used methods in predictive analysis [22]. The purpose is to build a model that predicts the right label of the target variable from the input variables. A tree is created by splitting on the input variables, and the classification features play an important role in how the tree is split [23]. There are some advantages to using decision trees: they are simple to interpret and they use a white-box model. In our study, we pick the above classifiers and analyze the effects of different vectorizers on the different models. We have also built a sequential deep learning model to predict the right target value. The deep learning network that we have used in our study can be seen in Fig. 1; we explain it in detail in the Model Building and Results section.

2.4 Sequential Deep Learning Model

Deep learning is a family of machine learning methods based on artificial neural networks. The use of multiple layers makes the learning process deep, which is where the adjective "deep" comes from. The sequential model is one of the simplest ways to build a model in Keras, a deep learning framework [6] built on top of TensorFlow 2.0 [2]; it allows us to build the model layer by layer. In our study, we have built a sequential deep learning model to predict the target value from the textual information.

Fig. 1. Deep neural network with 2-hidden layers

3 Corpus Collection

There are several data collection methods, such as APIs; we used web scraping methods to extract the data from Google Play Turkey. We selected the 112 most popular applications and extracted the 100 most useful reviews from each application. By doing so, we were able to collect the most relevant reviews. We simply collected the text part of the reviews and their ratings, and eliminated reviews with 3-star ratings. We use the procedure below to label reviews according to their ratings:
– Positive label for the reviews with 4 or 5 stars
– Negative label for the reviews with 1 or 2 stars
These two types of labeled data are used to train a classifier to predict whether a given review has a positive or negative sentiment. In our research, we specifically use the Turkish language. The categories of the 112 applications that we selected to extract reviews from can be seen in Table 2 below.

Table 2. Categories of applications in Google Play

Categories
Art & Design      | Dating
Augmented reality | Daydream
Auto & Vehicles   | Education
Beauty            | Entertainment
Books & Reference | Events
Business          | Finance
Comics            | Food & Drink
Communication     | Health & Fitness

4 Corpus Analysis

In order to build a machine learning model, we examine our data and explore new features, such as the length of the review and the capital letter percentage in the review, that we can use for classification. Before modeling, we check the correlation between these extra features and the target value. Figure 2 below depicts the correlation between the percentage of digits and the percentage of capital letters in the reviews. Other features that we extract from our corpus are the digit percentage, exclamation mark usage, length of the review, and capital letter percentage. We see that none of these features are related to the target value. Correlation values


Fig. 2. Distribution between the percentage of digits and the percentage of capital letters in the reviews

Fig. 3. Overall design (stages: Data Extraction from the Google Play Store Turkey, Exploratory Data Analysis, Data Cleaning, Feature Engineering, Algorithm Selection, Model Training)

such as the one in Fig. 2 show us that there is no relation between these features and the target value used for sentiment analysis. Our overall design can be seen in Fig. 3 below. As mentioned above, we collected over 10,000 reviews from Google Play Turkey. After collecting the data, we explored it and shared some of our findings, as in Fig. 3. After these steps, we started to clean our data by applying the following procedures:

– Replacing similar emoticons with a determined keyword
– Removing punctuation
– Replacing some Turkish letters with their corresponding English letters
– Lowercasing the text
– Removing digits
– Removing extra white spaces and punctuation
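A possible Python sketch of these cleaning steps is given below; the emoticon keywords and the Turkish-to-English letter table are illustrative assumptions rather than the exact mappings used in this study.

import re
import string

EMOTICONS = {":)": "positiveemoji", ":(": "negativeemoji"}        # illustrative keywords only
TR_TO_EN = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")          # Turkish letters -> English letters

def clean_review(text: str) -> str:
    for emo, keyword in EMOTICONS.items():                        # emoticons -> determined keyword
        text = text.replace(emo, f" {keyword} ")
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = text.translate(TR_TO_EN)                               # replace Turkish letters
    text = text.lower()                                           # lowercase
    text = re.sub(r"\d+", "", text)                               # remove digits
    return re.sub(r"\s+", " ", text).strip()                      # remove extra whitespace

print(clean_review("Oyunu çok beğendim :) 5 yıldız!"))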

After these procedures, there are also special requirements for extracting features from the text data because of its structure. The text should be parsed into tokens, and after that, the words must be encoded as numerical values, such as integers or floats, to be used in machine learning models. In order to do that, we use the scikit-learn library, a free software machine learning library [20]. Using scikit-learn, we perform both tokenization and feature extraction on our corpus. We used two types of feature extraction methods, CountVectorizer and TfidfVectorizer, and compared their results in terms of their effect on prediction accuracy. In our study, we used both machine learning classifiers and deep learning models for algorithm selection and model training. We give the details about the algorithm selection and model training parts in the following sections.

5 Model Building and Results

We build a classifier using multinomial Naive Bayes, which is based on Bayes' theorem [15]. We also build models with decision trees and K-Nearest Neighbors. As mentioned earlier, we use both CountVectorizer and TfidfVectorizer for feature engineering and analyze the results.
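The following is a hedged sketch of how the three classifiers can be trained and evaluated with either vectorizer in scikit-learn; the toy corpus stands in for the labeled Google Play reviews, and the parameter choices are illustrative.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Placeholder corpus; in the study this is the cleaned Google Play reviews with 0/1 labels.
texts = ["harika oyun", "cok kotu", "mukemmel uygulama", "berbat", "guzel oyun", "hic begenmedim"] * 5
labels = [1, 0, 1, 0, 1, 0] * 5

X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.25, random_state=0, stratify=labels)

vectorizers = {"CountVectorizer": CountVectorizer, "TfidfVectorizer": TfidfVectorizer}
classifiers = {"MNB": MultinomialNB, "DT": DecisionTreeClassifier, "KNN": KNeighborsClassifier}

for vec_name, Vec in vectorizers.items():
    for clf_name, Clf in classifiers.items():
        model = make_pipeline(Vec(), Clf())          # vectorize, then classify
        model.fit(X_tr, y_tr)
        acc = accuracy_score(y_te, model.predict(X_te))
        print(f"{clf_name} + {vec_name}: {acc:.2f}")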

5.1 Results for Machine Learning Classifiers

The bar chart in Fig. 4 shows the prediction accuracy of the machine learning classifiers used in our study. As can be seen from the figure, there is no significant difference in prediction accuracy between CountVectorizer and TfidfVectorizer. The prediction accuracy results can be seen in Table 3: while there is a difference for the multinomial Naive Bayes classifier, there is no difference between the vectorizers for the decision tree and KNN classifiers.

5.2 Results for Deep Learning Model

As mentioned, we have also built a sequential deep learning model. In our model, we have 2 hidden layers and we apply dropout to 20% of the nodes in order to avoid overfitting. Figure 1 depicts the deep learning network that we have used in our study.
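A minimal Keras sketch that matches the described architecture (two hidden layers with 20% dropout on top of vectorized text) is shown below; the layer widths, input size, and dummy data are illustrative assumptions, not the exact configuration used in the study.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

N_FEATURES = 5000  # assumed size of the vectorized (e.g. TF-IDF) vocabulary

model = Sequential([
    Dense(128, activation="relu", input_shape=(N_FEATURES,)),  # hidden layer 1
    Dropout(0.2),                                              # drop 20% of the nodes
    Dense(64, activation="relu"),                              # hidden layer 2
    Dropout(0.2),
    Dense(1, activation="sigmoid"),                            # positive/negative output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy vectorized reviews, only to illustrate the expected input shape.
X = np.random.rand(64, N_FEATURES).astype("float32")
y = np.random.randint(0, 2, size=(64,))
model.fit(X, y, epochs=1, batch_size=16, validation_split=0.2, verbose=0)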

Fig. 4. Prediction accuracy of the classifiers (bar chart of test accuracy per classifier for CountVectorizer and TfidfVectorizer)

Table 3. Prediction accuracy table

      CountVectorizer | TfidfVectorizer
MNB   87.30%          | 83.13%
DT    77.93%          | 78.09%
KNN   74.48%          | 74.15%

As can be seen in Table 4, our results indicate a prediction accuracy of 95.87% on the test data for TfidfVectorizer and 95.71% on the test data for CountVectorizer. However, as can be seen in Fig. 5 and Fig. 6, even though we use exactly the same model for both, there may be overfitting when we use CountVectorizer. Also, when we look at the results in Table 4, it can be seen that there is overfitting in the model: while the prediction accuracy is 99.73% on the training data, the test accuracy is almost 5% lower. This suggests that there can be overfitting when we use the same model with CountVectorizer. In order to get rid of this overfitting, we may need to optimize the parameters of the model.

Table 4. Prediction accuracy results for the deep learning model

                 Training data | Test data
CountVectorizer  99.73%        | 95.71%
TfidfVectorizer  95.97%        | 95.87%


Fig. 5. Training and validation accuracy for TfidfVectorizer

Fig. 6. Training and validation accuracy for CountVectorizer

6 Conclusions

It is conspicuous that sentiment analysis and opinion mining play an important role in tracing consumer behavior. With the recent advances in machine learning techniques, this issue has been addressed and has come a long way in English. However, in agglutinative languages such as Turkish, it is still one of the hot topics. In this study, we compared several classification methods and deep learning


methods to perform sentiment analysis on Turkish reviews that we collected from Google Play. We have explained the machine learning classifiers that we used and depicted our overall design. We obtained significant results both from the machine learning classifiers and from the deep learning model, and our prediction accuracy results are satisfactory: 87.30% prediction accuracy for multinomial Naive Bayes and 95.87% for the deep learning model. Our experiments showed that we can have overfitting when we use different vectorizer techniques. While there is no difference between the machine learning classifiers when we use different vectorizers, there is a difference when we build a deep learning model to predict the target value. Although we built our model from the reviews that we extracted from Google Play, this model can also be applied to data from Twitter, Facebook, or any other microblogging platform.

References 1. https://github.com/ssari-memory/corpus-turkish-reviews 2. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Man´e, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Vi´egas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/. Software available from tensorflow.org 3. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992) 4. Anjaria, M., Guddeti, R.M.R.: Influence factor based opinion mining of Twitter data using supervised learning. In: 2014 Sixth International Conference on Communication Systems and Networks (COMSNETS), pp. 1–8. IEEE (2014) 5. Bouazizi, M., Ohtsuki, T.: Opinion mining in Twitter how to make use of sarcasm to enhance sentiment analysis. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, pp. 1594– 1597 (2015) 6. Chollet, F., et al.: Keras (2015). https://keras.io ¨ Ozyer, ¨ ¨ 7. C ¸ oban, O., B., Ozyer, G.: T¨ urk¸ce twitter mesajlarının duygu analizi. In: Signal Processing and Communications Applications Conference (SIU) (2015) 8. Demirtas, E., Pechenizkiy, M.: Cross-lingual polarity detection with machine translation. In: Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, pp. 1–8 (2013) 9. Gezici, G., Yanıko˘ glu, B.: Sentiment analysis in Turkish. In: Turkish Natural Language Processing, pp. 255–271. Springer (2018) 10. Ghorbel, H., Jacot, D.: Sentiment analysis of French movie reviews. In: Advances in Distributed Agent-Based Retrieval Tools, pp. 97–108. Springer (2011) 11. Gordon, K.: Topic: Online reviews. https://www.statista.com/topics/4381/onlinereviews/ 12. Harris, Z.S.: Distributional structure. Word 10(2–3), 146–162 (1954) 13. Kaya, M., Fidan, G., Toroslu, I.H.: Sentiment analysis of Turkish political news. In: 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 174–180. IEEE (2012)


14. Khan, F.H., Bashir, S., Qamar, U.: TOM: Twitter opinion mining framework using hybrid classification scheme. Decis. Support Syst. 57, 245–257 (2014) 15. Klon, A.E., Glick, M., Davies, J.W.: Combination of a Naive Bayes classifier with consensus scoring improves enrichment of high-throughput docking results. J. Med. Chem. 47(18), 4356–4359 (2004) 16. K¨ u¸cu ¨k, D., Steinberger, R.: Experiments to improve named entity recognition on Turkish tweets. arXiv preprint arXiv:1410.8668 (2014) 17. Maron, M.E.: Automatic indexing: an experimental inquiry. J. ACM (JACM) 8(3), 404–417 (1961) 18. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. LREc 10, 1320–1326 (2010) 19. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing-Volume 10, pp. 79–86. Association for Computational Linguistics (2002) 20. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011) 21. Rennie, J.D., Shih, L., Teevan, J., Karger, D.R.: Tackling the poor assumptions of Naive Bayes text classifiers. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 616–623 (2003) 22. Rokach, L., Maimon, O.Z.: Data Mining with Decision Trees: Theory and Applications, vol. 69. World Scientific, Singapore (2008) 23. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014) 24. Vural, A.G., Cambazoglu, B.B., Senkul, P., Tokgoz, Z.O.: A framework for sentiment analysis in Turkish: application to polarity detection of movie reviews in Turkish. In: Computer and Information Sciences III, pp. 437–445. Springer (2013) 25. Yıldırım, E., C ¸ etin, F.S., Eryi˘ git, G., Temel, T.: The impact of NLP on Turkish sentiment analysis. T¨ urkiye Bili¸sim Vakfı Bilgisayar Bilimleri ve M¨ uhendisli˘ gi Dergisi 7(1), 43–51 (2015)

Smart Guest Virtual Assistant with Automatic Guest Registration

Mohammed Hussain1(B), Abdullah Hussein2, and Mohamed Basel AlMourad1

1 College of Technological Innovation, Zayed University, Dubai 19282, UAE
{mohammed.hussain,basel.almourad}@zu.ac.ae
2 College of Computing and Informatics, University of Sharjah, Sharjah 27272, UAE
[email protected]

Abstract. Virtual assistants are a key component of smart devices and spaces. Such assistants help users access information and accomplish tasks. In this paper, a smart virtual assistant is presented. The goal of the assistant is to identify guests arriving at organizations, converse with the guests and respond to their inquiries. The assistant makes use of face recognition to identify guests and natural language understanding to interact with them. The novelty in the presented smart assistant comes from the automatic registration of guests. Once the organization invites a guest, the assistant is copied in the email. From the invitation email, the assistant applies natural language understanding to extract guest names and affiliations. Such information is then used to acquire an image for each guest from the Internet. The paper describes a working prototype of the smart virtual assistant, which was successfully tested against a set of real invitation emails used by the authors’ institutions.

Keywords: Virtual assistant · Face recognition · Natural language understanding · Chatbots

1 Introduction

Virtual assistants and chatbots enhance the user quality of experience by responding intelligently to user requests. Gartner predicts that virtual assistants powered by artificial intelligence will replace almost 69% of managers' workload by 2024 [13]. Virtual assistants such as Apple's Siri, Amazon's Alexa and Microsoft's Cortana are becoming more accurate, supporting Gartner's prediction [19]. Chatbots, such as IBM Watson Assistant, are increasingly being used by organizations to engage customers [3]. Chatbots provide faster access to information and limit the need for human involvement. Although virtual assistants and chatbots are not new, the application of deep learning [21] in this domain has significantly enhanced the quality of experience they provide and prompted a renewed interest [10]. The inclusion of a facial recognition component to identify users extends the functionality of virtual assistants and chatbots [5]. Beyond the use of virtual assistants in smart devices, organizations may also use virtual assistants

to interact with guests. Hotels and resorts may use them to show guests around [22,25], whereas hospitals and clinics may use them to assist patients [1]. This paper presents a smart virtual guest assistant. The assistant is designed to help an organization recognize invited guests arriving at the organization's premises. The purpose of the assistant is to greet guests upon their arrival at an institution, answer frequent questions and guide guests to the meeting location. The main contribution of this paper is automating the registration of guests in the system. Typically, guests need to identify themselves to the virtual assistant before the assistant can interact with them. One could use face recognition and preload an image of the guest into the system, which would allow the assistant to recognize the guest. In the presented virtual assistant, there is no need to manually register guests. The assistant is designed to use natural language processing to extract the names and affiliations of guests from invitation emails or announcements. It then crawls the web for images of the guests and supplies these images to the face recognition component. The assistant is then ready to recognize guests automatically. A working prototype has been developed that reads the invitation emails used at the authors' institution to announce seminars and events. The assistant is able to extract the names and affiliations of guest speakers. It is also able to supply the face recognition module with guests' images, which are acquired from the web. The authors used the last 20 email announcements as an experiment. The prototype was able to correctly identify all the guest names and their affiliations in 18 emails. To test the prototype's ability to recognize the automatically registered guests, the authors used a new image per guest. All new images were correctly labeled by the prototype. One may note the assumption that correctly labeled public images must exist for the assistant to work. Although this may seem a limitation, it is normal for many companies to have public profiles of their employees. Many doctors, teachers, professors, lawyers and consultants who work at educational institutions, hospitals and law firms have public personal pages. Such pages can be accessed by crawling the institution's website, or by querying public search engines, such as Google Images. At the authors' institutions, all faculty, as well as most of the staff, have public pages on the institution website. Each page has a picture of the employee along with their name and other information. As these pages are indexed by Google, one may also access the employee's image using Google Images. The paper is organized as follows. Section 2 describes relevant background information and technologies used by the presented smart virtual guest assistant. Section 3 presents the architecture of the assistant and illustrates its operation via a case study conducted at the authors' institution. Section 4 compares the paper to recent related research in the area of virtual assistants and discusses the limitations, as well as the ethical concerns. Section 5 then concludes the paper.


Fig. 1. IBM Watson Assistant

2 Virtual Assistants and Deep Learning

Modern virtual assistants and chatbots rely on deep learning [21], which is a type of machine learning. A deep learning model consists of multiple learning layers, where each group of layers is capable of extracting a high-level feature from the input data; the more layers are added, the more features are extracted. Deep learning makes it easier for data scientists to build more accurate models. Convolutional neural networks are an example of deep learning models popular in image processing tasks, such as face recognition [11]. Recurrent neural networks [15] are an example used for natural language processing tasks, such as chatbots [6]. Gartner predicts that by 2022, over 75% of organizations will use deep learning instead of classical machine learning techniques [9]. The following subsection uses IBM Watson Assistant [2] as an example of a chatbot development platform powered by deep learning.

2.1 IBM Watson Assistant

IBM Watson [17] was originally created as a question answering system. Watson now offers a wide range of cloud services including natural language processing, machine learning, knowledge discovery, computer vision and chatbots. IBM Watson Assistant is a platform to build chatbots that can be integrated with other IBM Watson services, such as extracting keywords and user emotions. In this paper, IBM Watson Assistant is used to build the chatbot part of the presented virtual assistant. Figure 1 illustrates the components of a sample assistant created using the IBM Watson Assistant service. User utterances are passed from the user interface to the IBM Watson Assistant service, where the assistant is hosted. The assistant processes the input based on the dialog skills designed by the developer, as described below. The response is then returned to the user. The IBM Watson Assistant service allows a developer to create an assistant with two skills, a search skill and a dialog skill. A search skill allows the assistant

to search using the Watson Discovery service. A dialog skill models the flow of a conversation with the user. For example, one may design the dialog skill to start the conversation by greeting the user and asking how it can assist them. Then, based on the user's response, the dialog skill can branch in different directions, where each branch handles one task. To understand the nature of the user input, one needs to create intents. Each intent describes a goal the user is trying to achieve, for example, asking about the remaining balance, how to open an account, how to apply for a credit card, etc. To create an intent, the developer needs to list many examples of the ways a user may phrase the request. For example, the intent related to credit card applications can have the following example utterances: 'how can I apply for a credit card?', 'I need to apply for a credit card', 'how can I get a credit card', etc. The assistant can be connected to web services to fetch responses to user inputs. The assistant service offers built-in integrations to make assistants accessible through the web and social networks. As noted earlier, a face recognition feature can be added to chatbots to extend their functionality. Amazon, IBM and Microsoft include face recognition services in their cloud offerings that can also predict a person's age, gender and race [18]. There are many open-source face recognition libraries, such as [14], which can be used to build assistants capable of recognizing faces. The next section presents our Smart Virtual Guest Assistant (SGVA), which is built on Watson Assistant and face recognition.
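As a minimal illustration (not the authors' exact setup), the sketch below sends a user utterance to a Watson Assistant skill with the ibm-watson Python SDK and reads back the recognized intents; the API key, service URL, assistant ID and version string are placeholders.

from ibm_watson import AssistantV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('YOUR_API_KEY')            # placeholder credentials
assistant = AssistantV2(version='2021-06-14', authenticator=authenticator)
assistant.set_service_url('https://api.us-south.assistant.watson.cloud.ibm.com')

session = assistant.create_session(assistant_id='YOUR_ASSISTANT_ID').get_result()
response = assistant.message(
    assistant_id='YOUR_ASSISTANT_ID',
    session_id=session['session_id'],
    input={'message_type': 'text', 'text': 'How can I apply for a credit card?'}
).get_result()

# Detected intents (e.g. a credit-card-application intent) and the dialog reply
print(response['output'].get('intents', []))
print(response['output'].get('generic', []))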

Fig. 2. Architecture of the Smart Guest Virtual Assistant

3 Smart Virtual Guest Assistant

The presented Smart Virtual Guest Assistant (SGVA) is built using IBM Watson and an open-source face-recognition library [14]. We make use of IBM Watson for text analysis and for building the chatbot. Specifically, we use the IBM Watson Natural Language Understanding (NLU) and IBM Watson Assistant services. Figure 2 shows the system architecture, which consists of the SGVA application server that parses the text of the organization's event announcements and invitation emails. The SGVA server submits the parsed text

to the Watson NLU service for keyword and entity extraction. The NLU entities include persons and organizations mentioned in the text, as well as keywords, which can help determine the nature of the announcement. The server then matches the extracted persons to the extracted organizations. The SGVA server automatically registers the persons in the database by retrieving a facial image for each person from a potential source, such as Google Images or the person's organization, analyzing the image with the face recognition module, encoding the face and saving the encoded face into the SGVA database. Whenever a guest arrives at the organization's reception, the SGVA client supplies the application server with the guest's image. The server uses the face recognition module to extract and encode the face of the guest. If the encoded face is very similar to one of the encodings in the database, the SGVA loads the event information from the database and activates the chatbot to interact with the guest. The SGVA client may greet the guest, offer information about the event location and timing, and alert the event organizers of the guest's arrival.

3.1 Entities and Keywords Extraction

The SGVA matches the persons and organizations based on the following patterns. (i) The first pattern looks for persons who are affiliated with the organization hosting the event, as their organization is already known. (ii) The second pattern looks for the name of an organization, followed by a list of persons. In this scenario, we assume that all mentioned persons are affiliated with that organization. (iii) The third pattern looks for the name of a person, followed or preceded by the name of an organization within the same sentence. In this scenario, we assume that the person is affiliated with that organization. (iv) The fourth pattern looks for the name of a person and the name of an organization within one sentence of each other. In this scenario, we assume that the person is affiliated with that organization. (v) The fifth pattern looks for persons that are still not assigned to any organization. In this scenario, we assume that these persons are affiliated with the first preceding organization, if any.
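A simplified illustration of the above heuristics (patterns iii and v only), assuming the persons and organizations arrive from Watson NLU as (name, sentence_index) pairs; this is a sketch, not the authors' implementation.

def match_affiliations(persons, organizations, host_org=None):
    """Pair each extracted person with an organization."""
    matches = {}
    for person, p_idx in persons:
        same_sentence = [org for org, o_idx in organizations if o_idx == p_idx]
        preceding = [org for org, o_idx in organizations if o_idx < p_idx]
        if same_sentence:
            matches[person] = same_sentence[0]   # pattern (iii): same sentence
        elif preceding:
            matches[person] = preceding[-1]      # pattern (v): nearest preceding organization
        else:
            matches[person] = host_org           # fall back to the hosting organization
    return matches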

3.2 Automatic Guest Registration and Face Recognition

Once a list of persons and their organizations is obtained with the help of Watson NLU, the SGVA server attempts to find one public image per person on the web. Currently, the developed prototype queries Google Images to retrieve the images. The Google query is https://www.google.com/search?safe=off&site=&tbm=isch&source=hp&q={q}&gs_l=img, where {q} is the query consisting of a person's name and affiliation. We use the first image only.


To recognize the person's face in the retrieved image, we utilize an open-source face recognition library written in Python [14]. The library uses a neural network with 29 convolutional layers, based on the ResNet-34 network [16]. The library achieves an accuracy rate of 99.3% on the Labeled Faces in the Wild benchmark [20]. We use the library to extract and encode the image region containing a face and save that encoding into our database.
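A minimal sketch of this registration step, using the open-source face_recognition library [14]; the save_encoding database helper is hypothetical.

import face_recognition

def register_guest(name, affiliation, image_path, db):
    """Encode the first face found in a retrieved image and store it."""
    image = face_recognition.load_image_file(image_path)
    encodings = face_recognition.face_encodings(image)
    if not encodings:
        return False  # no face detected in the retrieved picture
    # Store the 128-dimensional encoding, not the image itself
    db.save_encoding(name=name, affiliation=affiliation, encoding=encodings[0])
    return True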

3.3 Guest Recognition and Assistance

To assist guests arriving at the organization's reception, the SGVA client needs to be installed on a smart device with a touch screen and camera. The SGVA client sends a video stream to the application server; the server uses the face recognition library to extract faces within the stream, encode the faces and compare their encodings to the existing ones in the SGVA database to detect a potential match. If a match is detected, the server sends the event information to the client. The client uses a Watson Assistant chatbot to interact with and assist the guest.
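The matching step can be sketched as follows, assuming each video frame is an RGB array and that known_encodings/known_names come from the SGVA database (hypothetical names); the 0.6 tolerance is the library's common default, not a value reported by the authors.

import face_recognition

def match_guest(frame, known_encodings, known_names, tolerance=0.6):
    """Return the name of a registered guest visible in the frame, if any."""
    for encoding in face_recognition.face_encodings(frame):
        distances = face_recognition.face_distance(known_encodings, encoding)
        if len(distances) and distances.min() <= tolerance:
            return known_names[distances.argmin()]
    return None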

Fig. 3. Processing of invitation emails, image retrieval and encoding

3.4 Use Case: Invited Speakers

A working prototype of the assistant was developed, with the face recognition module already incorporated. At the authors' institution, seminars as well as other events are announced via email. Such emails are semi-structured and can be processed to learn the names and affiliations of the guest speakers. We developed a prototype that extracts the names and affiliations of guests from invitation emails, retrieves guest pictures from Google Images, encodes guest faces and registers their information in the database. The chatbot is still being developed. Figure 3 shows two

sample email invitations and the output of each processing step. The process was repeated for the last 20 invitation emails sent by the authors' institution, with 18 emails correctly parsed. That is, in those 18 emails all persons were correctly identified with their correct affiliations, their images retrieved, and their face data encoded and saved in the database. The prototype was able to recognize the faces of the registered guests in new images.

4 Related Work

As noted earlier, Amazon, IBM [2] and other major cloud providers offer platforms for building virtual assistants. Companies in all industries are using virtual assistants to serve their customers [10]. In [24], a conceptual framework is proposed which integrates artificial intelligence with business processes in the context of smart tourism. Their framework covers cognitive engagement, process automation, decision support and forecasting. A review of tools used for smart tourism development is found in [12]. In [7], the authors analyzed the interaction of users with their Amazon virtual assistant service (Alexa). Their analysis uncovered useful insights into the lifestyle of the users, such as user interests, as well as sleeping and wake-up patterns. The contribution in our work is the use of natural language understanding to identify events that are planned within an organization and to automatically set up the virtual assistant to recognize guests upon their arrival at the organization. The use of face recognition software raises privacy concerns, as the technology requires streaming a live feed of all individuals passing by a particular location [23]. This is similar to the privacy issues around the use of CCTV to monitor public areas for security and safety purposes. Individuals may or may not be aware that their images are being recorded for the purpose of face recognition, which leads to ethical issues [4]. For example, ClearviewAI [8], which offers a facial recognition service for law enforcement organizations around the world, has built its database by analyzing millions of public pictures from different social networks, without the consent of the social network users or the social network companies. In our work, we do not harness public images other than those of invited guests. The database stores the encoding of the face region; that is, the image is not stored. The encoding itself is deleted immediately once the guest's visit is over.

5 Conclusion

Virtual assistants and chatbots are receiving great attention due to their ability to enhance the user experience. They are used in Android, iOS and Windows smartphones. Organizations rely heavily on virtual assistants and chatbots to engage their clients. The paper presented a smart guest virtual assistant, capable of automating the process of registering guests in its database. A working prototype was successfully tested against a set of real invitation emails used by the authors' institutions. The prototype uses IBM Watson Natural Language Understanding

for entity and keyword extraction, a variant of ResNet-34 for face recognition, and IBM Watson Assistant for developing the chatbot. The presented assistant stores an encoding of the guest image and not the actual image. That is, only a mathematical representation of the face region is stored in the database, and once an event is over, all relevant guest face encodings and personal data are deleted.

References
1. Abashev, A., Grigoryev, R., Grigorian, K., Boyko, V.: Programming tools for messenger-based chatbot system organization: implication for outpatient and translational medicines. BioNanoScience 7(2), 403–407 (2016) 2. About Watson Assistant. IBM. https://cloud.ibm.com/docs/assistant?topic=assistant-index. Accessed August 2020 3. Brandtzaeg, P.B., Følstad, A.: Why people use chatbots. In: Proceedings of the International Conference on Internet Science. LNCS, vol. 10673, pp. 377–392. Springer, Thessaloniki (2017) 4. Brey, P.: Ethical aspects of facial recognition systems in public places. J. Inf. Commun. Ethics Soc. 2, 97–109 (2004) 5. Buhalis, D., Harwood, T., Bogicevic, V., Viglia, G., Beldona, S., Hofacker, C.: Technological disruptions in services: lessons from tourism and hospitality. J. Serv. Manag. 30(4), 485–506 (2019) 6. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014) 7. Chung, H., Lee, S.: Intelligent virtual assistant knows your life. arXiv preprint arXiv:1803.00466 (2018) 8. Clearview AI. https://clearview.ai. Accessed August 2020 9. Costello, K.: Gartner predicts the future of AI technologies, Gartner (2020). https://www.gartner.com/smarterwithgartner/gartner-predicts-the-future-of-aitechnologies. Accessed August 2020 10. Dale, R.: The return of the chatbots. Nat. Lang. Eng. 22(5), 811–817 (2016) 11. Ding, C., Tao, D.: Robust face recognition via multimodal deep face representation. IEEE Trans. Multimed. 17(11), 2049–2058 (2015) 12. Gajdošík, T., Marciš, M.: Artificial intelligence tools for smart tourism development. In: Computer Science Online Conference, pp. 392–402. Springer (2019) 13. Gartner predicts 69% of routine work currently done by managers will be fully automated by 2024, Gartner Press Releases (2020). https://gartner.com/en/newsroom/press-releases/2020-01-23-gartner-predicts-69--of-routine-workcurrently-done-b. Accessed August 2020 14. Geitgey, A.: Modern face recognition with deep learning. Medium (2016). https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-facerecognition-with-deep-learning-c3cffc121d78. Accessed August 2020 15. Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649. IEEE (2013) 16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. IEEE (2016)


17. High, R.: The era of cognitive systems: an inside look at IBM Watson and how it works, pp. 1–16. IBM Corporation, Redbooks (2012) 18. Jung, S.G., An, J., Kwak, H., Salminen, J., Jansen, B.J.: Assessing the accuracy of four popular face recognition tools for inferring gender, age, and race. In: Proceedings of the International Conference on Web and Social Media, pp. 624–627. AAAI (2018) 19. Kepuska, V., Bohouta, G.: Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home). In: Proceedings of the IEEE Annual Computing and Communication Workshop and Conference, pp. 99– 103. IEEE, Las Vegas (2018) 20. Learned-Miller, E., Huang, G., RoyChowdhury, A., Li, H., Hua, G.: Labeled faces in the wild: a survey. In: Kawulok, M., Celebi, M.E., Smolka, B. (eds.) In: Proceeding of the Advances in Face Detection and Facial Image Analysis, pp. 189–248. Springer (2016) 21. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015) 22. Lewis-Kraut, G.: Check in with the velociraptor at the world’s first robot hotel, Wired Magazine (2016). www.wired.com/2016/03/robot-henn-na-hotel-japan/. Accessed August 2020 23. Raji, I.D., Gebru, T., Mitchell, M., Buolamwini, J., Lee, J., Denton, E.: Saving face: investigating the ethical concerns of facial recognition auditing. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pp. 145–151. ACM (2020) 24. Tsaih, R., Hsu, C.C.: Artificial intelligence in smart tourism: a conceptual framework. In: Proceedings of the 18th International Conference on Electronic Business, pp. 124–133 (2018) 25. Zalama, E., Garcia-Bermejo, J.G., Marcos, S., Dominguez, S., Feliz, R., Pinillos, R., Lopez, J.: Sacarino, a service robot in a hotel environment. In: Armada, M.A., et al. (eds.) First Iberian Robotics Conference, Advances in Intelligent Systems and Computing. LNCS, vol. 253, pp. 3–14. Springer, Madrid (2014)

Topic Modeling and Sentiment Analysis with LDA and NMF on Moroccan Tweets

Nassera Habbat(B), Houda Anoun, and Larbi Hassouni

RITM Laboratory, CED ENSEM Ecole Superieure de Technologie, Hassan II University, Casablanca, Morocco
[email protected], [email protected], [email protected]

Abstract. Twitter is one of the most popular social media platforms. Due to its simplicity of use and the services provided by the Twitter API, it is extensively used around the world, including in Morocco. It provides a huge volume of information and is considered a large source of data for opinion mining. The aim of this paper is to analyze Moroccan tweets in order to generate some useful statistics, identify different sentiments, and extract and then visualize the predominant topics. In our research work, we collected 25 146 tweets using the Twitter API and the Python language, and stored them in a MongoDB database. The stored tweets were preprocessed by applying natural language processing (NLP) techniques using the NLTK library. Then, we performed sentiment analysis, which classifies the polarity of Twitter comments into negative, positive, and neutral categories. Finally, we applied topic modeling to the tweets to obtain meaningful information from Twitter, comparing and analyzing the topics detected by two popular topic modeling algorithms: Non-negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA). The observed results show that LDA outperforms NMF in terms of topic coherence.

Keywords: Twitter · Moroccan tweets · Sentiment analysis · Topic modeling · NLP · Python · MongoDB · LDA · NMF

1 Introduction

With the rapid development of Web 2.0, a large number of users publish messages on social networks to express their opinions about different topics. Twitter is one of the most popular social media platforms and plays a fundamental role in the diffusion of information. In fact, about 500 million tweets are published every day and around 200 billion tweets are posted every year [1]. As for Morocco, there are 453.5 thousand monthly active Twitter users [2]. Twitter allows its users to post short messages limited to 280 characters. It is considered an important source for understanding people's emotions (sentiment analysis) and discovering the most discussed topics (topic modeling). In this paper, we focus on Moroccan tweets. We collected 25 146 tweets published by Moroccan users from 26 April 2020 to 08 June 2020 and stored them in a MongoDB database. After analyzing

them, we managed to derive some useful statistics, identify the sentiments contained in these tweets, and extract the most discussed topics, comparing two of the most popular methods: Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF). In this paper we make the following contributions:

– Generation of useful statistics from the collected tweets
– Sentiment analysis of the tweets
– Topic modeling of Moroccan tweets, comparing LDA and NMF

The two main research questions that will be answered through data analysis are:

– What are the feelings expressed in the collected Moroccan tweets, and which are the most discussed subjects in these tweets?
– Which is the better topic model for our use case?

The paper is organized as follows. Section 2 gives a brief literature review. Methods and tools are described in Sect. 3. In Sect. 4, we present our experimental results. We end with a conclusion, where we summarize the paper and outline our future work.

2 Literature Survey

2.1 Sentiment Analysis

There are several studies discussing opinion mining in social media, especially Twitter, which contains posts (tweets) written in different languages. In some studies, the authors focused on sentiment analysis of tweets written in a single language. For example, in [3] the authors analyzed English tweets and classified them into three categories (positive, negative, and neutral) using two classifiers, Naïve Bayes and K-Nearest Neighbors (K-NN). They concluded that K-NN is the better classifier in their setting, because it gives more accurate predictions than Naïve Bayes. Furthermore, the authors in [4] worked on a sentiment lexicon for sentiment analysis of Saudi dialect tweets (SaudiSenti) and compared it to a large Arabic sentiment dictionary (AraSenti). For that, they used a previously labeled dataset of 5400 tweets dealing with various topics. To evaluate their experiments, they used a dataset of 1500 tweets in modern standard Arabic (MSA) and Saudi dialects, distributed over three categories: positive, negative and neutral. The results showed that, because AraSenti identified most of the neutral tweets as either positive or negative, it was outperformed by SaudiSenti in terms of precision, recall, and F-measure. Other studies focused on sentiment analysis of tweets related to a specific event. For example, in [5] the authors analyzed tweets posted during the 10 days of the United Nations Climate Change Conference in Paris in 2015 (COP21). In their analysis, they used data collected between 30 November and 09 December 2015 (a total of 1,602,543 tweets containing keywords related to COP21). They discussed the top Twitter accounts based on in-degree, out-degree, the number of tweets per day, etc. In [6], the Paris attacks of November 2015 are analyzed: the authors evaluated how information about this event spread around the world based on geo-located tweets.


2.2 Topic Modeling

In addition to sentiment analysis on Twitter, some researchers have analyzed the content of tweets (topic modeling) using different methods. In [7], the authors compared three methods: Latent Semantic Indexing (LSI), Non-negative Matrix Factorization (NMF), and Latent Dirichlet Allocation (LDA). In their experiments, they adopted a dataset provided on GitHub, scraped from the websites of two Hindi newspapers named Amar Ujala and Navbharat, and they found that the NMF model performed slightly better than the LDA model, which in turn was better than LSI, using perplexity and coherence as metrics to evaluate the topic models. We outline below some studies that compared two of the most popular topic modeling methods, LDA and NMF. The authors in [8] used the TC-W2V measure, which is generally more sensitive to changes in the top terms used to represent topics, and found that NMF consistently achieves higher coherence scores than LDA in 94.7% of all 300 experiments (results for k = 10 and k = 50 topics); their study was limited to 210,247 English EP (European Parliament) speeches from the official website of the EP. In addition to the metrics already mentioned, there are other tools for the evaluation of topic models, such as the intrinsic and extrinsic scores using UCI and UMass, which are compared in [9] using Wikipedia as an external resource (its index occupies 27 GB of disk space) and 20 Newsgroups as a local resource (its index occupies 38 MB). That work showed that intrinsic evaluation (on 20 Newsgroups) performed better than extrinsic evaluation using Wikipedia as an external resource, mainly because of the index size and disk usage; regarding speed, UCI performed 6% to 8% faster than UMass. In general, UMass and UCI performed reasonably well, both providing satisfactory results when estimating the correlation with human evaluation scores. In [10], the authors combined two text mining techniques (sentiment analysis and topic modeling) to analyze 16 million tweets collected using the streaming API during 51 days from September to October 2019. The authors used the Valence Aware Dictionary and sEntiment Reasoner (VADER) for sentiment analysis around Brexit, relating the results to stock prices and the British pound sterling. To discover the most popular daily topics of discussion on Twitter using "Brexit" as a keyword, they applied the LDA model for topic modeling. The same algorithms were used in [11] to analyze tweets relating to climate change and compare the results over time between different countries. Those results showed that the USA is behind in its discussion of topics concerning policies and plans to address climate change compared to the UK, Canada, and Australia.

3 Tools and Methods

In this section, we present our implemented architecture. Then, we describe the different tools used to generate statistics from Twitter data and to perform sentiment analysis and topic modeling.

3.1 Proposed Architecture

Figure 1 details the proposed architecture of our system. Our system is based on several layers; each layer performs a task, ranging from collecting data (tweets) to visualizing the results. These layers are described in the next sections.


Fig. 1. Proposed architecture

As shown in the figure above, Twitter data is collected and stored in a MongoDB database using the Tweepy and PyMongo libraries. We only selected tweets geolocated in Morocco; those tweets were then preprocessed in order to generate some statistics, perform a sentiment analysis, and visualize the extracted topics, comparing two topic models: LDA and NMF. The different tools and algorithms used in this architecture are described below.

3.2 Sentiment Analysis and Statistics of Twitter Data

Tweepy. Tweepy [12] is an open-source Python library that gives a very convenient way to access the Twitter API with Python. To represent Twitter's models and API endpoints, Tweepy includes a set of classes and methods, and it transparently manages various implementation details, such as:

• Data encoding and decoding
• HTTP requests
• Results pagination
• OAuth authentication
• Rate limits
• Streams
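The collection step described in this section can be sketched as follows, assuming a Tweepy 3.x-style API (where api.search is available; newer versions renamed it search_tweets); the credentials, query, geocode radius and database names are placeholders, not the authors' configuration.

import tweepy
from pymongo import MongoClient

auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')
api = tweepy.API(auth, wait_on_rate_limit=True)

collection = MongoClient('mongodb://localhost:27017')['sca']['tweets']

# Illustrative query: Arabic tweets around a point roughly centred on Morocco
for status in tweepy.Cursor(api.search, q='#Morocco', geocode='31.79,-7.09,600km',
                            lang='ar', tweet_mode='extended').items(100):
    collection.insert_one(status._json)   # store the raw tweet document in MongoDB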

MongoDB. MongoDB [13] is a distributed, general-purpose, document-based database. It is classified

as a NoSQL database program; it has been designed by MongoDB Inc. for modern application developers and is licensed under the Server Side Public License (SSPL). A database in MongoDB is organized into collections, and each collection contains documents. In our case, the tweets are stored in collections. To manipulate our MongoDB database using Python, we used the PyMongo driver presented below.

PyMongo. PyMongo [14] is a Python distribution containing tools for working with MongoDB. This library allows Python scripts to connect to a MongoDB database and perform the CRUD (Create/Read/Update/Delete) operations on it.

Seaborn and Matplotlib. Seaborn [15] is a Python data visualization library based on Matplotlib. It gives a high-level interface to draw attractive and informative statistical graphics. Matplotlib [16] is a 2D plotting library for the Python programming language and its numerical mathematics extension NumPy. It was originally inspired by MATLAB. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. It can be used in:

• Python scripts,
• The Python and IPython shells,
• The Jupyter notebook,
• Web application servers,
• Four graphical user interface toolkits.

Matplotlib allows us to generate plots, histograms, power spectra, bar charts, error charts, scatterplots, etc., using just a few lines of code.

3.3 Topic Modeling Algorithms

LDA. Latent Dirichlet Allocation [17] is a generative probabilistic model used for topic extraction in a given text collection. These topics are not strongly defined, as they are identified on the basis of the probability of co-occurrences of the words contained in them. As shown in Fig. 2, the boxes are "plates" that represent replicates: the upper outer plate represents documents, the upper inner plate denotes the repeated choice of topics and words within a document, and the lower plate marks the latent topics hidden in the document collection. Formally, we define the following terms. A word is the basic unit of discrete data, defined to be an item from a vocabulary indexed by {1, …, V}. Words are represented using unit-basis vectors that have a single component equal to one and all other components equal to zero.


Fig. 2. Hierarchical graphical model for LDA.

Consequently, using superscripts to denote components, the v-th word in the vocabulary is represented by a V-vector w such that w^v = 1 and w^u = 0 for u ≠ v. A sequence of N words is a document, denoted by d = (w_1, w_2, ..., w_N), where w_n is the n-th word in the sequence. A corpus is a collection of M documents d_i, denoted by D = (d_1, d_2, ..., d_M). The generative process for each document in the text archive is as follows:

1. For the m-th (m = 1, 2, ..., M) document d in the whole M-document corpus, choose θ_m ∼ Dirichlet(α);
2. For each word w_{m,n} in the document d:
(a) Choose a topic assignment z_{m,n} ∼ Multinomial(θ_m);
(b) Find the corresponding topic distribution ϕ_{z_{m,n}} ∼ Dirichlet(β);
(c) Sample a word w_{m,n} ∼ Multinomial(ϕ_{z_{m,n}}).

Repeating this generative process M times, once per document (which is plotted in graphical model language with "boxes" and "circles" in Fig. 2, where α and β are the two Dirichlet-prior hyper-parameters), we arrive at the probability of the corpus D:

P(D \mid \alpha, \beta) = \prod_{m=1}^{M} \int P(\theta_m \mid \alpha) \, F(\theta, \phi) \, d\theta_m    (1)

where

F(\theta, \phi) = \prod_{n=1}^{N} \sum_{z_{m,n}} P(z_{m,n} \mid \theta_m) \, P(w_{m,n} \mid \phi_{z_{m,n}})    (2)

NMF. Non-negative Matrix Factorization (NMF) [18] is a linear-algebraic optimization algorithm that factors the original high-dimensional data into a low-dimensional representation with non-negative hidden structures; these data are viewed as coordinate axes in the transformed space, giving a geometric interpretation. Briefly, NMF tries to express the complex source matrix as the product of two matrices of much lower dimensionality. Suppose we factorize a matrix X into two matrices W and H so that X ≈ WH. NMF has an inherent clustering property, such that W and H represent the following information about X:

✓ X (document-word matrix) — the input, which holds which words appear in which documents.
✓ W (basis vectors) — the topics (clusters) discovered from the documents.
✓ H (coefficient matrix) — the membership weights of the topics in each document.

We can calculate W and H by optimizing an objective function (as in the EM algorithm [19]), updating both W and H iteratively until convergence. In the following objective function, we measure the reconstruction error between X and the product of its factors W and H, based on the Euclidean distance:

\frac{1}{2} \| X - WH \|_F^2 = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \left( X_{ij} - (WH)_{ij} \right)^2    (3)

Using this objective function, we get the following values of W and H, which can be derived via the multiplicative update rules:

W_{ic} \leftarrow W_{ic} \cdot \frac{(X H^{\top})_{ic}}{(W H H^{\top})_{ic}}, \qquad H_{cj} \leftarrow H_{cj} \cdot \frac{(W^{\top} X)_{cj}}{(W^{\top} W H)_{cj}}
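As a small sketch of NMF topic extraction on TF-IDF vectors with scikit-learn, where docs is assumed to be the list of pre-processed tweet texts (the parameter values are illustrative, not the authors' settings):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(docs)                 # document-word matrix

nmf = NMF(n_components=10, random_state=0)         # k = 10 topics
doc_topic = nmf.fit_transform(X)                   # document-topic weights
terms = vectorizer.get_feature_names_out()         # get_feature_names() in older versions
for k, topic in enumerate(nmf.components_):        # topic-term matrix
    top = [terms[i] for i in topic.argsort()[-10:][::-1]]
    print(f'Topic #{k + 1:02d}:', ', '.join(top))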

These updated values are calculated in parallel, and the reconstruction error is re-computed with the new W and H; the process is repeated until convergence.

Topic Coherence

UCI. UCI (or the CV measure) [7] is an automatic coherence measure that assesses topics according to their understandability; it treats words as facts and is always based on comparing word pairs. UCI is derived from Pointwise Mutual Information (PMI) [20], which is used to calculate word associations and for word sense disambiguation. PMI measures how much one variable tells about the other and is formally defined as:

PMI(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)\, p(w_j)}    (4)

where the mutual information between words w_i and w_j compares the likelihood of observing the two words together with the likelihood of observing them independently. The UCI measure is then:

Score_{UCI}(w_i, w_j) = \log \frac{p(w_i, w_j) + \varepsilon}{p(w_i)\, p(w_j)}    (5)

where p(w_i) is the probability that word w_i appears in the corpus, p(w_j) the probability that word w_j appears in the corpus, and p(w_i, w_j) the probability that w_i appears together with w_j in the corpus.

UMass Measure. UMass [11] calculates the correlation of words in a given document based on conditional probability. The conditional probability of an event w_i given that event w_j has happened is:

P(w_i \mid w_j) = \frac{p(w_i, w_j)}{p(w_j)}, \quad P(w_j) > 0    (6)

Applying this concept, the UMass measure is defined as:

Score_{UMass}(w_i, w_j) = \log \frac{D(w_i, w_j) + \varepsilon}{D(w_i)}    (7)

where D(w_i, w_j) is the number of documents that contain both w_i and w_j, D(w_i) is the number of documents containing w_i, and w_i is always the more frequent word of the pair.
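A direct, pure-Python illustration of the UMass score in Eq. (7), where docs is assumed to be a list of token sets and (wi, wj) a word pair with wi the more frequent word (wi is assumed to occur in at least one document):

import math

def umass_score(wi, wj, docs, eps=1e-12):
    d_wi = sum(1 for d in docs if wi in d)                  # D(wi)
    d_wi_wj = sum(1 for d in docs if wi in d and wj in d)   # D(wi, wj)
    return math.log((d_wi_wj + eps) / d_wi)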

4 Experiments and Empirical Results

In this section, we describe the collected data and the different steps of pre-processing using NLP techniques. Finally, we present the results of our research work.

4.1 Data Collection and Storage

In this research, we collected 25 146 tweets published by Moroccan users from 26 April 2020 to 08 June 2020 and stored them in a MongoDB database (Table 1).

Table 1. Description of the collected dataset.
Start date | End date | Number of collected tweets
Apr 26, 2020 | June 08, 2020 | 25 146 tweets


We used the Twitter API (provided by the Twitter platform) to pull data from Twitter. After creating an account on https://apps.twitter.com, we were authorized to access the data using four secret keys (consumer key, consumer secret key, access token and access secret token) and collected tweets using the REST API. To get Moroccan tweets written in one of the standard languages (Arabic, French and English), we filtered tweets by place and language. To handle these data, we used the Python library Tweepy, and to store the collected data in a MongoDB database we used the Python library PyMongo.

4.2 Preprocessing Data

Our stored tweets were preprocessed using natural language processing techniques. The most important procedures we applied are the following:

1. Translation of Arabic and French texts to English using a Python script based on Google Translate;
2. Conversion of all letters to lower case;
3. Removal of stopwords and punctuation using the NLTK Python library [21], which provides a list of stopwords as well as punctuation symbols for many languages;
4. Elimination of hyperlinks, hash signs (the tags themselves were preserved) and usernames preceded by "@", which express a reply to another user;
5. Removal of words with less than three characters;
6. Tokenization, which consists of breaking sentences into units (tokens). For instance, the sentence "Never give up!" becomes, after tokenization, {'Never', 'give', 'up', '!'};
7. Stemming, which consists of eliminating suffixes and affixes to obtain the word base. For example: studying, study, student to study; argue, arguing, argues to argue;
8. Lemmatization of words (i.e., converting each word to its base form). We used the WordNet Lemmatizer, and we provide for each word its part-of-speech tag (POS tag, e.g., noun, verb, etc.) using the pos_tag method of the NLTK library. For example, studied to study, learned to learn, etc.

4.3 Experimental Results

In this section, we present the results of our analysis of Moroccan tweets: statistics, sentiment analysis and a comparative study between the NMF and LDA models.

Moroccan Tweets Statistics. We used seaborn [15] to visualize some statistics about the collected Moroccan tweets concerning the languages used (Fig. 3), places (Fig. 4) and the different types of accounts (Fig. 5). Matplotlib enables many other libraries to run and plot on its base, including WordCloud, which was used to generate Fig. 6. By analyzing the collected Moroccan tweets, we found that the most used language is Arabic, followed by French. English comes in third position, as shown in Fig. 3.
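Returning to the preprocessing pipeline of Sect. 4.2, a small illustrative sketch of steps 2–8 with NLTK (lower-casing, stopword and short-token removal, tokenization and POS-tagged lemmatization) is given below; the regular expression for links and mentions is a simplification, and the required NLTK corpora are assumed to be downloaded.

import re
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

lemmatizer = WordNetLemmatizer()
stops = set(stopwords.words('english'))

def to_wordnet_pos(tag):
    # Map Penn Treebank tags to WordNet POS categories (default: noun)
    return {'J': wordnet.ADJ, 'V': wordnet.VERB, 'R': wordnet.ADV}.get(tag[0], wordnet.NOUN)

def preprocess(tweet):
    text = tweet.lower()
    text = re.sub(r'http\S+|@\w+|#', ' ', text)            # drop links, mentions and '#' signs
    tokens = [t for t in word_tokenize(text)
              if t.isalpha() and t not in stops and len(t) >= 3]
    return [lemmatizer.lemmatize(tok, to_wordnet_pos(pos))
            for tok, pos in nltk.pos_tag(tokens)]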


Fig. 3. Statistics about the languages used in the collected tweets

Fig. 4. Statistics about Morocco's most active places in terms of published tweets.

In Fig. 4, the bar graph presents Morocco's most active places in terms of posted tweets. The first ones are Anfa and El Maarif (in the city of Casablanca), followed by Agdal (in Rabat). Tangier, Gueliz (in Marrakech), and Oujda come last.

Fig. 5. Statistics about types of Twitter accounts.

Fig. 6. WordCloud of Moroccan #Hashtags.

Figure 5 shows that 99.7% of the accounts used in the analyzed tweets are individual accounts, whereas 0.3% of them are organization accounts. Finally, Fig. 6 presents the WordCloud of #hashtags mentioned in the collected tweets. It shows that the top cited words are: "Morocco", "Corona_Maroc", "Voyage", "Travel", "Repost", "Ramadan" and "COVID_19".

Sentiment Analysis. In order to determine the general emotion of each collected tweet, we used the TextBlob library [22], which allows us to perform sentiment analysis in a very simple way. TextBlob provides a trained model generated using the Naïve Bayes algorithm. This model calculates a polarity score, which is a float within the range [−1.0, 1.0], where

a negative value indicates text expressing a negative sentiment and a positive value indicates text expressing a positive sentiment. We applied this model to our pre-processed tweets. Table 2 shows some examples of tweets and their calculated polarity.

Table 2. Example of tweets and their calculated polarity.

We created three DataFrames for the three categories: positive for tweets with polarity > 0, negative for tweets with polarity < 0, and neutral for tweets with polarity = 0. Then, we counted the number of tweets within each category and extracted the percentages shown in Table 3. Finally, we classified them by place, as shown in Fig. 7.

Table 3. Percentages of tweets in the three categories.
# of tweets | % Positive | % Negative | % Neutral
25 146 | 51.41% | 35.85% | 12.73%
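A small sketch of this polarity scoring and three-way split, using TextBlob and pandas; tweets is assumed to be the list of pre-processed (English) tweet texts, and the column names are illustrative.

import pandas as pd
from textblob import TextBlob

df = pd.DataFrame({'text': tweets})
df['polarity'] = df['text'].apply(lambda t: TextBlob(t).sentiment.polarity)

positive = df[df['polarity'] > 0]
negative = df[df['polarity'] < 0]
neutral = df[df['polarity'] == 0]
print(len(positive) / len(df), len(negative) / len(df), len(neutral) / len(df))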

Topic Modeling. In order to discover the hidden topics that occur in our preprocessed tweets, we implemented two topic modeling algorithms: LDA and NMF. First, we created a dictionary from the document collection using the Gensim package [23]. The dictionary is a collection of the unique terms in the document collection; it was then used to create a document-term matrix, which is used by each of the models. For NMF, the same pre-processed corpus documents were transformed into log-based Term Frequency-Inverse Document Frequency (TF-IDF) vectors; we used Gensim to find the best number of topics according to the coherence score, and then used that number of topics for the scikit-learn [24] implementation of NMF. In the case of LDA, the MALLET [25] implementation was applied to the sets of document feature sequences. Using the coherence score, we can run the model for different numbers of topics and then keep the one with the highest coherence score.
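The model-selection loop described above can be sketched with Gensim as follows, using Gensim's own LdaModel rather than the MALLET wrapper used by the authors; texts denotes the list of token lists and the training parameters are illustrative.

from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

best_k, best_score, best_model = None, float('-inf'), None
for k in range(5, 75, 5):                                   # k in [5, 70], step 5
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                   passes=5, random_state=0)
    score = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                           coherence='c_v').get_coherence()
    if score > best_score:
        best_k, best_score, best_model = k, score, lda

print(best_k, best_score)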


Fig. 7. Sentiment analysis of tweets by places.

In both cases, topics were discovered for numbers of topics k ∈ [5, 70] (in steps of 5), and we calculated the coherence using CV (UCI) and UMass to compare the NMF and LDA algorithms:

Fig. 8. Comparison in UCI score between LDA and NMF

Fig. 9. Comparison in UMass score between LDA and NMF

We observed in Figs. 8 and 9 that LDA achieves higher topic-coherence scores across all numbers of topics. Finally, we used the scikit-learn and Gensim packages in Python to run the NMF and LDA topic models, respectively. As shown in Tables 4 and 5, we selected the four most frequent topics detected by our models, each topic comprising 10 words. For example, Topic 01 in the NMF model is about events in Morocco, Topic 02 is about the weather, Topic 03 about feelings, and Topic 04 about an event in the USA concerning George Floyd's death. Concerning LDA, Topic 01 is about the coronavirus, Topic 02 is about quarantine, Topic 03 is about George Floyd's death in the USA, and Topic 04 is about holidays. After inspecting the above topics, we noted that the results generated by LDA are more meaningful than the ones modeled by NMF.


Table 4. Topics detected by applying NMF
Topic #01: sahara, laayoune_western, true, story, morocco, meaning, people, quarantine, ramadan, episode
Topic #02: wind_kmh, clouds_humidiy, humidity, temperature, marrakech, overcast, rabat, cloud, agadir, humidity_wind
Topic #03: beautiful, happy, really, good, nature, good_morne, shining, love, come, day
Topic #04: trump, update, think, forever, speedy, strategy, black, racism, wrong, murder

Table 5. Topics detected by applying LDA
Topic #01: family, virus, end, moment, thought, second, together, stay, corona, chance
Topic #02: world, year, still, quarantine, home, job, covid_19, feeling, darkness, morocco
Topic #03: black_live, guy, police, racism, attack, urgent, love, trump, wrong, black
Topic #04: friend, work, thing, nature, well, picture, back, shining, cool, happy

Beyond quality, it is also important to know the computational cost of the topic modeling algorithms. We therefore analyzed their runtimes; for this experiment, we used a dataset limited to English tweets and a fixed number of topics (k = 10). As a result, we observed that the time taken by LDA was 1 min 30.33 s, while NMF took 6.01 s, so NMF was faster than LDA.

5 Conclusion and Future Work

In this paper, we investigated the tweets posted by Moroccan users, calculating some useful statistics and visualizing the results using different graphs. We focused in particular on the distribution of the languages used, the most active places in terms of published tweets, the types of accounts (individual or organization), and the #hashtags most frequently used in these tweets (WordCloud). Moreover, we performed a sentiment classification of tweets into three categories: positive, negative and neutral. We also compared two of the most popular topic models, LDA and NMF, in order to uncover the topic distribution within Moroccan tweets, which will lead to a better understanding of

the Moroccan mood. In our experiment, we evaluated the models using topic coherence (UCI and UMass), the meaningfulness of the topics and the runtime of the algorithms, and we deduced that LDA outperforms NMF if runtime is not a constraint. In brief, the results show that Twitter users in Morocco post mostly neutral tweets (51.41%), Casablanca is the most active place in terms of published tweets, and the most used language in tweets is Arabic. Moreover, the discussed topics are diverse, but some are more prevalent than others; the most remarkable topics concerned COVID-19, George Floyd's death and different events during the studied period, such as Ramadan and the holidays. Future research efforts will be devoted to using Big Data platforms (e.g., Hadoop, Spark) in order to store more tweets in a distributed way and accelerate their processing using different deep learning algorithms.

References
1. Twitter Usage Statistics - Internet Live Stats. https://www.internetlivestats.com/twitter-statistics/. Accessed 19 Feb 2020 2. DataReportal – Global Digital Insights. https://datareportal.com. Accessed 25 Mar 2020 3. Tripathi, P., Vishwakarma, S., Lala, A.: Sentiment analysis of English tweets using rapid miner. In: 2015 International Conference on Computational Intelligence and Communication Networks (CICN), Jabalpur, India, pp. 668–672 (2015). https://doi.org/10.1109/CICN.2015.137 4. Al-Thubaity, A., Alqahtani, Q., Aljandal, A.: Sentiment lexicon for sentiment analysis of Saudi dialect tweets. Proc. Comput. Sci. 142, 301–307 (2018). https://doi.org/10.1016/j.procs.2018.10.494 5. Wang, X., Yu, Y., Lin, L.: Tweeting the United Nations climate change conference in Paris (COP21): an analysis of a social network and factors determining the network influence. Online Soc. Netw. Med. 15, 100059 (2020). https://doi.org/10.1016/j.osnem.2019.100059 6. Cvetojevic, S., Hochmair, H.H.: Analyzing the spread of tweets in response to Paris attacks. Comput. Environ. Urban Syst. 71, 14–26 (2018). https://doi.org/10.1016/j.compenvurbsys.2018.03.010 7. Ray, S.K., Ahmad, A., Kumar, C.A.: Review and implementation of topic modeling in Hindi. Appl. Artif. Intell. 33(11), 979–1007 (2019). https://doi.org/10.1080/08839514.2019.1661576 8. Greene, D., Cross, J.P.: Exploring the political agenda of the European Parliament using a dynamic topic modeling approach. arXiv:1607.03055, July 2016. https://arxiv.org/abs/1607.03055. Accessed 30 May 2020 9. Pasquali, A.R.: Automatic coherence evaluation applied to topic models (2016) 10. Ilyas, S.H.W., Soomro, Z.T., Anwar, A., Shahzad, H., Yaqub, U.: Analyzing Brexit's impact using sentiment analysis and topic modeling on Twitter discussion, p. 7 (2020) 11. Dahal, B., Kumar, S.A.P., Li, Z.: Topic modeling and sentiment analysis of global climate change tweets. Soc. Netw. Anal. Min. 9(1), 24 (2019). https://doi.org/10.1007/s13278-019-0568-8 12. Tweepy. https://www.tweepy.org/. Accessed 25 Nov 2019 13. The most popular database for modern apps, MongoDB. https://www.mongodb.com. Accessed 25 Nov 2019


14. Siddharth, S., Darsini, R., Sujithra, D.M.: Sentiment analysis on Twitter data using machine learning algorithms in Python, p. 15 15. seaborn: statistical data visualization – seaborn 0.10.0 documentation. https://seaborn.pydata.org/. Accessed 12 Feb 2020 16. Matplotlib: Python plotting – Matplotlib 3.1.3 documentation. https://matplotlib.org/. Accessed 12 Feb 2020 17. Blei, D.M.: Latent Dirichlet Allocation, p. 30 18. Chen, Y., et al.: Experimental explorations on short text topic mining between LDA and NMF based schemes. Knowl.-Based Syst. (2018). https://doi.org/10.1016/j.knosys.2018.08.011 19. McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. John Wiley (2007) 20. Nugraha, P., Rifky Yusdiansyah, M., Murfi, H.: Fuzzy C-means in lower dimensional space for topics detection on Indonesian online news. In: Tan, Y., Shi, Y. (eds.) Data Mining and Big Data, vol. 1071, pp. 269–276. Springer, Singapore (2019) 21. Natural Language Toolkit – NLTK 3.4.5 documentation. https://www.nltk.org/. Accessed 17 Feb 2020 22. Loria, S.: textblob Documentation, pp. 1–73 (2018) 23. Rehurek, R.: gensim: Python framework for fast Vector Space Modelling. https://pypi.org/project/gensim/. Accessed 26 Feb 2020 24. scikit-learn: machine learning in Python – scikit-learn 0.23.1 documentation. https://scikit-learn.org/stable/. Accessed 06 June 2020 25. McCallum, A.: MALLET: A Machine Learning for Language Toolkit (2002)

Smart Education and Intelligent Learning Systems

A Deep Learning Model for an Intelligent Chat Bot System: An Application to E-Learning Domain

Ben Ahmed Mohamed, Boudhir Anouar Abdelhakim(B), and Saadna Youness

List Laboratory, Faculty of Sciences and Techniques, Tangier, Morocco
{mbenahmed,aboudhir}@uae.ac.ma, [email protected]

Abstract. Nowadays, the use of chatbots is very popular in a large range of applications, especially in systems that provide intelligent support to the user. In fact, to speed up assistance, such systems are often equipped with chatbots that can interpret user questions and provide the right response quickly and correctly. This paper proposes an intelligent chatbot system able to give a response, in natural language or audio, to an image or natural-language question in different domains of education. The system supports multiple languages (English, French and Arabic). In this system, we used different deep learning architectures (CNN, LSTM, Transformers), transfer learning to extract image feature vectors, and computer vision and natural language processing techniques. Finally, after the implementation of the proposed model, a comparative study was conducted in order to assess the performance of the system's image-response and question-response models using accuracy and the BLEU score as metrics.

Keywords: Chatbot · Artificial intelligence · Education

1 Introduction

Artificial intelligence and its advanced technologies bring many advantages in several domains: industry, economy, agriculture, education and more. Many researchers have concluded that artificial intelligence is important for improving human life, and especially education. Accordingly, the use of chatbots in industry and education has increased considerably in recent years. Most of them are used for customer support or as tutors for students. In both cases, the chatbot is trained to perform a question/response task. On the other hand, existing systems are known to respond to user questions only on a specific topic or level and do not support all types of questions (natural-language question, image). The aim of this paper is to design and implement a chatbot that can cover multiple levels of education, support all types of questions (natural-language question, image) and support multi-language questions (English, French, Arabic). The system acts as an intelligent robot that explains, in different languages, the image or text given as input, returning the response as text or audio.

2 Related Works

In the literature, there are many approaches related to chatbots, in particular for e-learning systems. Since the beginning of the last decade, the use of artificial intelligence to support e-learning has captured the interest of many researchers because of its many applications. One of these research works is [1], in which Farhan M. et al. use a web bot in an e-learning platform to address the lack of real-time responses for students. In fact, when a student asks a question on an e-learning platform, the teacher may answer only at a later stage; with more students and more questions, this delay increases. Web bot is a web-based chatbot that predicts future events based on keywords entered on the Internet. In this work Pandora is used, a bot that stores questions and answers in an XML-style language, the Artificial Intelligence Markup Language (AIML). This bot is trained with a series of questions and answers: when it cannot provide a response to a question, a human user is responsible for responding.

In recent years, further interesting research works can be found. In [2], Niranjan M. et al. discuss an interesting approach that uses Bayesian theory to match a student's request and return the right response. In particular, the chatbot agent accepts the student's question and extracts the keywords from it using a lexical parser; the keywords are then compared with the category list database, and Bayesian probabilities are obtained for all categories in the list. Once a category is selected, the keywords are compared with the questions under that category using Bayesian probability theory. The answer to the question with the highest posterior probability is then fed into the text-to-speech conversion module, and the student receives the answer to his question as a voice response.

In [3], Satu S. et al. analyze several chatbot applications based on AIML; in particular, an integrated platform built on a basic AIML knowledge base is presented. In this project, the chatbot is called Tutorbot because its functionality supports didactics in e-learning environments. It contains features such as natural language management, presentation of contents, and interaction with a search engine. Besides, the e-learning platform is linked to indispensable web services: a continuous monitoring service, a daemon, has been created on the e-learning platform servers and runs on a separate controlling machine.

In [4], Nordhaug O. et al. propose a game-based e-learning tool called The Forensic Challenger (TFC), used to teach digital forensic investigation. A chatbot inside the learning platform helps students. A multiple-choice quiz is implemented for kinesthetic learners, and a pedagogical chatbot agent assists users. It provides easy navigation and interaction within the content. The chatbot is implemented as a pedagogical agent for the users, meant for discussions and help with the topics; it also acts as a navigation tool and can play videos or open the advanced wiki when there is something to ask about.

In [5], Nenkov N. et al. investigate the realization of intelligent agents on the IBM Bluemix platform with IBM Watson technology. These agents, in the form of chatbots, automate the interaction between student and teacher within the Moodle learning management system. Watson is a cognitive system that combines capabilities in natural language processing, analytics, and machine learning techniques. In this case, the Facebook Messenger Bot GUI Builder realizes a chatbot through
Watson is a cognitive system that combines capabilities in Natural Language Processing, analytics, and machine learning. In this case, the Facebook Messenger Bot GUI Builder realizes a chatbot through Facebook Messenger to simplify communication between a teacher and a student; it can be set up by acquiring the Moodle test base.

3 AI Algorithms Background

3.1 Convolutional Neural Network (CNN)

A CNN is used to extract a feature vector from an image. A CNN has two phases: feature extraction and output prediction. In our case, we use only the first phase, feature extraction, with a pre-trained model to extract features from images, which are then fed to our model. Feature extraction involves two kinds of layers: convolution layers and sub-sampling (pooling) layers. After a convolution layer, the obtained features go through an activation function. In the convolutional layer, a series of matrix multiplications is followed by a summation operation.

3.2 Long Short-Term Memory (LSTM)

LSTM [6] resolves the vanishing gradient problem and has been successful in natural language processing applications such as machine translation and speech recognition. An LSTM cell consists mainly of three gates, i.e. the input gate, output gate and forget gate, plus a cell state. Intuitively, the LSTM decides which part of what the network has learned is relevant and what should be forgotten.

Fig. 1. State diagram of LSTM cell.

In Fig. 1, f_t, i_t, o_t denote the forget gate, input gate and output gate, respectively. W_f, W_i, W_o are the corresponding weight matrices and b_f, b_i, b_o the biases. C_{t-1} and h_{t-1} are the previous cell state and previous hidden state at time t-1; C_t is the cell state at time t and h_t is the output vector of the LSTM cell at time t. The cell state is the long-term memory, representing everything learned over time, while the hidden state is like the current memory. The forget gate, also called the remember vector, learns what to forget and what to remember. The input gate, also called the save vector, determines how much of the input enters the cell state, and the output gate, also called the focus vector, is akin to an attention mechanism that decides which part of the data should be focused on. These gates are perceptrons, i.e. single-layer neural networks, so the LSTM essentially forgets, remembers and pays attention to data. The mathematical formulation of the LSTM is the following:

f_t = \sigma(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f)   (1)

i_t = \sigma(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i)   (2)

\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)   (3)  (candidate memory cell state)

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t   (4)

o_t = \sigma(W_o \cdot [C_{t-1}, h_{t-1}, x_t] + b_o)   (5)

h_t = o_t \odot \tanh(C_t)   (6)

In Eq. (1), the output of the forget gate f_t decides what to forget from the cell state by multiplying the corresponding positions of the LSTM state by 0; information is remembered when f_t equals 1. The sigmoid activation function \sigma is applied to the weighted input and previous hidden state. In Eqs. (2) and (3), the output of the input gate i_t is a sigmoid ranging over [0, 1], which is why the sigmoid alone cannot make the cell state forget information, while the output of the input modulation gate \tilde{C}_t uses the tanh activation, ranging over [-1, 1], which permits the cell state to forget information. Equation (4) is the cell state equation: the previous cell state C_{t-1} forgets information through multiplication by the forget gate output f_t and adds new information through the input gate output i_t. Equation (5) is the output gate equation: o_t selects which values from the LSTM state are passed forward to the next hidden state. In Eq. (6), h_t is the hidden state output, which defines what information is carried to the next time step.

3.3 Transformers

Here, the encoder maps an input sequence of symbol representations (x_1, ..., x_n) to a sequence of continuous representations z = (z_1, ..., z_n). Given z, the decoder then generates an output sequence (y_1, ..., y_m) of symbols one element at a time. At each step the model is auto-regressive [10], consuming the previously generated symbols as additional input when generating the next. The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers for both the encoder and decoder, shown in the left and right halves of Fig. 4 in the next section, respectively.
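Before moving on to the system architecture, a minimal NumPy sketch of a single LSTM cell step is given to make Eqs. (1)-(6) of Sect. 3.2 concrete; the function and variable names are illustrative and do not come from the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following Eqs. (1)-(6).

    W and b are dicts holding the weight matrices and biases of the
    forget, input, candidate and output gates (keys 'f', 'i', 'c', 'o').
    """
    # Gate inputs: Eqs. (1), (2) and (5) use [C_{t-1}, h_{t-1}, x_t]
    gate_in = np.concatenate([c_prev, h_prev, x_t])
    cand_in = np.concatenate([h_prev, x_t])          # Eq. (3) uses only [h_{t-1}, x_t]

    f_t = sigmoid(W['f'] @ gate_in + b['f'])         # Eq. (1): what to forget
    i_t = sigmoid(W['i'] @ gate_in + b['i'])         # Eq. (2): what to write
    c_tilde = np.tanh(W['c'] @ cand_in + b['c'])     # Eq. (3): candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde               # Eq. (4): new cell state
    o_t = sigmoid(W['o'] @ gate_in + b['o'])         # Eq. (5): what to expose
    h_t = o_t * np.tanh(c_t)                         # Eq. (6): new hidden state
    return h_t, c_t
```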


4 System and Proposed Architecture

4.1 System Architecture

The architecture of our system (see Fig. 2) is composed of a Front-End, a Back-End, a Model and a Database. The first module represents the presentation layer (front-end) and provides a user-friendly interface; it consists of different kinds of devices such as smartphones, tablets, PCs and so on. The Back-End manages operations that are not seen by the end user, such as:

• User registration and authentication.
• Predicting responses for question or image input.
• Converting text to speech to produce an audio response.
• Handling multiple languages (English, French, Arabic) using a translation API.

This module works in the background to better satisfy user demand: it handles business logic and data storage, working in collaboration with the model part. The Model module is where our deep learning models reside. The Database module is used to store user data and the dataset after cleaning.

Fig. 2. System architecture

4.2 Proposed Architecture

In this sub-section, we discuss each tool used in the model and how they work together to solve this heterogeneous problem.


4.2.1 Image-Response Model

This model combines two families of artificial intelligence: Computer Vision and Natural Language Processing. As shown in Fig. 3, we combine the results of two different models:

Fig. 3. Image-response model

– The response will be pre-processed before being indexed and encoded using our vocabulary built from the pre-processed response tokens of the whole corpus.


– The image will be passed to a pre-trained convolutional neural network (CNN) in order to extract the image feature vector, using the transfer learning "fixed feature extractor" method.
– We then pass the image feature vector and the response vector to our model's encoder.
– The image feature vector passes through a Dropout layer, to avoid overfitting, and a Dense (fully connected) layer to obtain a 256-dimensional output vector.
– The response vector goes through an Embedding layer to capture correlations between words, then a Dropout layer to avoid overfitting, and an LSTM layer to obtain a 256-dimensional output vector.
– Since the two outputs have the same dimension (256), we merge them with an Add layer; the result is the output of our Encoder, which is passed to the Decoder.
– The Decoder consists of two Dense (fully connected) layers. The last Dense layer uses a Softmax activation function to generate a probability distribution over the 2124 words of our vocabulary.

The main idea of this approach is to repeat the image vector n times, where n is the response length, fixed for the whole response corpus; the resulting vectors are passed to an Encoder and a Decoder that generate a response at the end. An Encoder is generally used to encode a sequence; in our case, the sequence consists of two vectors, the image vector and the current prefix of the response vector, which are combined and passed to a Decoder in order to generate a probability distribution. To obtain the next word, we select the word with maximum probability at each time step using a Greedy Search algorithm. A sketch of this encoder and decoder assembly is given at the end of this subsection.

4.2.2 Question-Response Model

Here we use the Transformer architecture [7] without changing anything in the global architecture; we focus on tuning the hyperparameters until we obtain good results and adapt the model to our problem (see Fig. 4).
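Returning to the Image-Response model of Sect. 4.2.1, the encoder and decoder described there could be sketched with the Keras functional API as follows. The layer sizes (256-dimensional branches, a 2124-word vocabulary, responses padded to 114 tokens) follow the text, while the ReLU activations, the flattened CNN feature dimension and the variable names are assumptions rather than the authors' exact code.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, Add
from tensorflow.keras.models import Model

VOCAB_SIZE = 2124   # vocabulary size reported in Sect. 5.4
MAX_LEN = 114       # maximum response length reported in Sect. 5.4
FEAT_DIM = 2048     # illustrative: depends on the pre-trained CNN and pooling used

# Image branch: CNN feature vector -> Dropout -> Dense(256)
img_in = Input(shape=(FEAT_DIM,))
img = Dropout(0.5)(img_in)
img = Dense(256, activation='relu')(img)

# Text branch: response prefix -> Embedding -> Dropout -> LSTM(256)
txt_in = Input(shape=(MAX_LEN,))
txt = Embedding(VOCAB_SIZE, 256, mask_zero=True)(txt_in)
txt = Dropout(0.5)(txt)
txt = LSTM(256)(txt)

# Encoder output: merge the two 256-d vectors, then a two-layer decoder
merged = Add()([img, txt])
dec = Dense(256, activation='relu')(merged)
out = Dense(VOCAB_SIZE, activation='softmax')(dec)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

At inference time, the greedy search described above would repeatedly feed the current response prefix back into such a model and append the arg-max word until "endseq" is produced or the maximum length is reached.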

5 Implementation

5.1 Dataset

To perform the implementation we used the datasets mentioned below:

• SciTail [8]: The SciTail dataset is an entailment dataset created from multiple-choice science exams and web sentences. Each question and the correct answer choice are converted into an assertive statement to form the hypothesis.
• ARC [9]: A new dataset of 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set.


Fig. 4. Question-response model (Transformers architecture)

• SciQ [10]: The SciQ dataset contains 13,679 crowdsourced science exam questions about Physics, Chemistry and Biology, among others. The questions are in multiple-choice format with 4 answer options each.
• Question Answer Dataset [11]: There are three question files, one for each year of students: S08, S09, and S10, as well as 690,000 words worth of cleaned text from Wikipedia that was used to generate the questions.
• Physical IQA [12]: Physical Interaction QA, a commonsense QA benchmark for naive physics reasoning, focusing on how we interact with everyday objects in everyday situations.
• AI2 Science [13]: The AI2 Science Questions dataset consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in 4-way multiple-choice format and may or may not include a diagram element.


• Image Answer Dataset: a dataset that we collected using Google Forms; it contains about 1,200 image-answer pairs across different domains (Physics, Biology, Computer Science, ...) and levels of education (primary school, middle school, high school and university).

5.2 Hardware Component

We trained the Image-Response and the Question-Response models on one machine with the following specification (Table 1):

Table 1. Hardware specification

Item               Value
Processor          i7-8550U
RAM                24 GB
Storage            1 TB HDD + 256 GB SSD
GPU                NVidia GeForce MX130
VRAM               2 GB
Operating System   Windows 10 Pro 64-bit

5.3 Software Component

In this section, we list the languages and libraries used to develop the system. As described in Fig. 2, the software architecture is organized as follows:

• Front-End: to develop the Front-End we used the ReactJS framework together with other front-end tools such as Bootstrap, JavaScript, CSS3 and HTML5.
• Back-End: for the Back-End, we used the Django REST Framework and the Python language. We used additional libraries such as gTTS to convert text to speech and translate-api to handle translation from one language to another (a minimal text-to-speech sketch is shown below).
• Model: in this part, Keras, TensorFlow, Pandas, NumPy, Scikit-Learn and OpenCV are used for preprocessing, creating the models and training them. To evaluate the models with the BLEU score we used the NLTK library.
• Database: to store user data and the cleaned dataset we used the MongoDB database.
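As an illustration of the text-to-speech step handled by the Back-End, a minimal gTTS sketch follows; the helper name and output file are illustrative, not taken from the system's code.

```python
from gtts import gTTS

def synthesize_answer(text: str, lang: str = "en", out_path: str = "answer.mp3") -> str:
    """Convert a chatbot answer to speech and save it as an MP3 file.

    `lang` is an ISO 639-1 code, e.g. "en", "fr" or "ar" for the three
    languages supported by the system.
    """
    tts = gTTS(text=text, lang=lang)
    tts.save(out_path)
    return out_path

# Example: synthesize_answer("Angular is a TypeScript-based framework", lang="en")
```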


5.4 Pre-processing

• Image-Response model

Images are the inputs (X) of our model. Any input to a model must be given in the form of a vector, so we need to convert each image into a fixed-size vector that can then be fed as input to the neural network. To do this, we opt for transfer learning, using pre-trained models such as VGG16 (a convolutional neural network) to extract features from each input image. For feature extraction, we use the pre-trained model up to the 7 × 7 × 512 layer; if we wanted to perform classification, we would use the whole model (Fig. 5).

Fig. 5. Architecture of the VGG16
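A minimal Keras sketch of the feature-extraction step described above, assuming the ImageNet weights bundled with Keras; the helper name and the flattening of the 7 × 7 × 512 map into a single vector are illustrative choices.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Keep only the convolutional base (up to the 7 x 7 x 512 block)
feature_extractor = VGG16(weights="imagenet", include_top=False,
                          input_shape=(224, 224, 3))

def extract_features(img_path: str) -> np.ndarray:
    """Return the 7 x 7 x 512 feature map of one image as a flat vector."""
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = preprocess_input(np.expand_dims(x, axis=0))
    features = feature_extractor.predict(x)   # shape (1, 7, 7, 512)
    return features.reshape(-1)               # flattened feature vector
```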

The model accepts as input an image of size 224 × 224 × 3 and returns as output a feature map of dimension 7 × 7 × 512. We should note that the responses are what we want to predict. Thus, during training, the responses are the target variables (Y) that the model learns to predict. However, the whole response is not predicted at once: we predict the response word by word, so we must encode each word as a fixed-size vector. In simple terms, we represent each unique word in the vocabulary by an integer (index). We have 2124 unique words in the corpus, so each word is represented by an integer index between 1 and 2124. Take the following response as an example: "angular is typescript based open source web application framework". Let us build the vocabulary of this example, adding the two tokens "startseq" and "endseq" to mark the beginning and end of the sequence (suppose the basic cleaning steps have already been done):

vocab = {angular, is, endseq, typescript, based, open source, web, application, framework, startseq}


Let us give an index to each word in the vocabulary; we get:

angular-1, is-4, endseq-3, typescript-9, based-7, open source-8, web-10, application-2, framework-6, startseq-5

Let us take an example where the first image vector, Image_1, contains the logo of the Angular framework and its corresponding response is "startseq angular is typescript based open source web application framework endseq". Remember that the image vector is the input and the response is what we have to predict. The response is predicted as follows: first, we provide the image vector and the first word as input and try to predict the second word, i.e. Input = Image_1 + "startseq"; Output = "angular". We then provide the image vector and the first two words as input and try to predict the third word, i.e. Input = Image_1 + "startseq angular"; Output = "is". And so on. Thus, we can summarize the data matrix for an image and its corresponding response as follows (Table 2):

Table 2. Predicting a response

i    Image feature  First part of the response                                                   Target word
1    Image_1        startseq                                                                     angular
2    Image_1        startseq, angular                                                            is
3    Image_1        startseq, angular, is                                                        typescript
...  ...            ...                                                                          ...
N-1  Image_1        startseq angular is typescript based open source web application            framework
N    Image_1        startseq angular is typescript based open source web application framework  endseq

We will not transmit the English text of the response, but rather the sequence of indexes, where each index represents a unique word. First, we need to make sure that each sequence is of equal length; that is why we add 0's (zeros) at the end of each sequence. We calculate the maximum length of a response, which is 114 words in our case, and therefore pad with zeros so that each sequence has a length of 114 (Table 3).

Table 3. Predicting a response using word indexes and sequence padding

i    Image feature  First part of the response                  Target word
1    Image_1        [5, 0, 0, ..., 0]                           1
2    Image_1        [5, 1, 0, 0, ..., 0]                        4
3    Image_1        [5, 1, 4, 0, 0, ..., 0]                     9
...  ...            ...                                         ...
N-1  Image_1        [5, 1, 4, 9, 7, 8, 10, 2, 0, 0, ..., 0]     6
N    Image_1        [5, 1, 4, 9, 7, 8, 10, 2, 6, 0, 0, ..., 0]  3
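A minimal sketch of how such (image feature, padded prefix, next word) training triples could be generated with Keras utilities; the function name is illustrative, the post-padding matches the zero-padding described above, and the one-hot targets match the categorical cross-entropy loss reported later in Table 4.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

MAX_LEN = 114      # maximum response length (Sect. 5.4)
VOCAB_SIZE = 2124  # vocabulary size (Sect. 5.4)

def build_training_pairs(image_feature, response_indexes):
    """Expand one (image, response) pair into (X_image, X_text, y) triples.

    `response_indexes` is the index-encoded response, e.g.
    [5, 1, 4, 9, 7, 8, 10, 2, 6, 3] for the Angular example above.
    """
    X_img, X_txt, y = [], [], []
    for t in range(1, len(response_indexes)):
        prefix = response_indexes[:t]          # first part of the response
        target = response_indexes[t]           # next word to predict
        padded = pad_sequences([prefix], maxlen=MAX_LEN, padding="post")[0]
        X_img.append(image_feature)
        X_txt.append(padded)
        y.append(to_categorical(target, num_classes=VOCAB_SIZE))
    return np.array(X_img), np.array(X_txt), np.array(y)
```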

• Question-Response model

Since both models have an encoder-decoder architecture, they follow the same principle for predicting a response. The differences are that their encoder-decoder architectures are very different and that, instead of an image feature vector, the input here is a vector that represents the question asked.

5.5 Evaluation Metrics

We used two evaluation metrics: accuracy and BLEU score. Accuracy is calculated by the following equation:

Accuracy = \frac{\text{Items predicted correctly}}{\text{All items predicted}}   (7)

BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine's output and that of a human. BLEU's output is always a number between 0 and 1. This value indicates how similar the candidate text is to the reference texts, with values closer to 1 representing more similar texts. The BLEU score, as defined in [14], is calculated by the following equation:

BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)   (8)
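Since the BLEU score is computed with the NLTK library (Sect. 5.3), the evaluation step could look like the following sketch; the tokenized example sentences are illustrative.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["angular", "is", "typescript", "based", "open", "source",
              "web", "application", "framework"]]
candidate = ["angular", "is", "a", "typescript", "based", "web", "framework"]

# Default weights give uniform 1/4 weight to the 1- to 4-gram precisions (w_n in Eq. 8)
smoothing = SmoothingFunction().method1
score = sentence_bleu(reference, candidate, smoothing_function=smoothing)
print(f"BLEU = {score:.3f}")
```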

6 Results and Comparative Study

In this section, the results of our system are evaluated using the test splits of the datasets described above. We present a comparative analysis with different pre-trained models and hyperparameters with respect to accuracy and BLEU score.

• Image-Response Model


For this model, we use the same hyperparameters but change the pre-trained model used in each training run (Table 4).

Table 4. Used hyperparameters

Hyperparameter  Value
Dropout         0.5
Optimizer       Adam
Learning rate   0.0001
Split data      0.4
Batch size      128
Epochs          30
Loss function   Categorical_crossentropy

Table 5. Results with different pre-trained models

Pre-trained model  Accuracy  BLEU score
VGG16              99.03     86.67
VGG19              99.82     85.24
Xception           99.70     86.99
ResNet50           99.79     91.88
InceptionV3        99.58     85.22

As shown in Table 5, with the pre-trained ResNet50 model we obtained the highest BLEU score and the second highest accuracy. As the BLEU score is a more meaningful evaluation metric than accuracy for text generation, we chose the ResNet50-based model for deployment.

• Question-Response Model

For this model, we fix some hyperparameters and vary the others (Table 6). From Tables 7 and 8, we can see that the best results are obtained with the RMSprop optimizer, 16 heads and 1 layer; therefore, this configuration is the one chosen for deployment.

Table 6. Fixed hyperparameters

Hyperparameter  Value
Dropout         0.5
Learning rate   0.0001
Split data      0.2
Batch size      64
Epochs          150
Loss function   SparseCategoricalCrossentropy

Table 7. Results with the RMSprop optimizer, varying the other model hyperparameters

Number of heads  Number of layers  Accuracy  BLEU score
16               1                 37.89     42.88
16               2                 37.01     27.44
8                1                 36.67     41.63
8                2                 36.36     32.13

Table 8. Results with the Adam optimizer, varying the other model hyperparameters

Number of heads  Number of layers  Accuracy  BLEU score
16               1                 35.18     40.55
16               2                 31.47     26.72
8                1                 34.32     35.50
8                2                 33.53     26.53
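As an illustration only, the retained configuration could map onto Keras building blocks as sketched below; the model dimension and the overall wiring are assumptions, since the paper reuses the Transformer architecture of [7] unchanged and only tunes the hyperparameters.

```python
import tensorflow as tf

# Retained Question-Response configuration: RMSprop, 16 heads, 1 layer (Tables 6-8)
NUM_HEADS = 16
NUM_LAYERS = 1
D_MODEL = 256  # illustrative model dimension, not reported in the paper

attention = tf.keras.layers.MultiHeadAttention(num_heads=NUM_HEADS,
                                               key_dim=D_MODEL // NUM_HEADS)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-4)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# These pieces would be assembled into NUM_LAYERS encoder/decoder blocks as in [7].
```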

7 Conclusion

In this paper, we proposed a chatbot system for educational applications that can support different levels of education and multiple languages (English, French and Arabic). The results show the good performance of the ResNet50-based model compared to the other models for the Image-Response task. For the Question-Response model, the results show the advantage of the RMSprop optimizer over the Adam optimizer for the deployed configuration. The chatbot uses a translation API to handle multiple languages; this is not ideal, because technical terms of a domain are sometimes lost in translation. To improve this, it would be better to have a dedicated dataset, for example in Arabic, and to train the model on it. There are many other possible improvements to this work; in future work, we plan to add the ability to record a question in audio format.


References

1. Farhan, M., Munwar, I.M., Aslam, M., Martinez Enriquez, A.M., Farooq, A., Tanveer, S., Mejia, P.A.: Automated reply to students' queries in e-learning environment using WebBOT. In: Eleventh Mexican International Conference on Artificial Intelligence: Advances in Artificial Intelligence and Applications, Special Session - Revised Paper (2012)
2. Niranjan, M., Saipreethy, M.S., Kumar, G.T.: An intelligent question answering conversational agent using naïve Bayesian classifier. In: International Conference on Technology Enhanced Education (ICTEE) (2012)
3. Satu, S., Parvez, H., Al-Mamun, S.: Review of integrated applications with AIML based chatbot. In: First International Conference on Computer and Information Engineering (ICCIE) (2015)
4. Nordhaug, Ø., Imran, A.S., Alawawdeh, A., Kowalski, S.J.: The forensic challenger. In: International Conference on Web and Open Access to Learning (ICWOAL) (2015)
5. Nenkov, N., Dimitrov, G., Dyachenko, Y., Koeva, K.: Artificial intelligence technologies for personnel learning management systems. In: Eighth International Conference on Intelligent Systems (2015)
6. Sundermeyer, M., Schlüter, R., Ney, H.: LSTM neural networks for language modeling. In: Thirteenth Annual Conference of the International Speech Communication Association, pp. 147–156 (2012)
7. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
8. Khot, T., Sabharwal, A., Clark, P.: SciTail: a textual entailment dataset from science question answering. In: AAAI (2018)
9. Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. arXiv:1803.05457 (2018)
10. Welbl, J., Liu, N.F., Gardner, M.: Crowdsourcing multiple choice science questions. arXiv:1707.06209 (2017)
11. Smith, N.A., Heilman, M., Hwa, R.: Question generation as a competitive undergraduate course project. In: Proceedings of the NSF Workshop on the Question Generation Shared Task and Evaluation Challenge, September 2008
12. Bisk, Y., Zellers, R., Le Bras, R., Gao, J., Choi, Y.: PIQA: reasoning about physical commonsense in natural language. arXiv:1911.11641 (2020)
13. Clark, P.: Elementary school science and math tests as a driver for AI: take the Aristo challenge! In: AAAI (2015)
14. Papineni, K., et al.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, pp. 311–318, July 2002

E-learning at the Service of Professionalization in Higher Education in Morocco: The Case of MOOCs of the Maroc Université Numérique Platform

Nadia Elouesdadi(B) and Sara Rochdi(B)

Laboratory CLHEN: Linguistics, Didactics and Communication, FLSH Mohammed First Oujda, Oujda, Morocco
[email protected], [email protected]

Abstract. Today e-learning is part of information and communication technologies in education (ICT) and allows various activities to be carried out online, on computers or on mobile devices (smartphones, tablets, etc.). E-learning has developed since the beginning of this century. It is a distance learning mode that makes educational content available to learners through the Internet, enabling the learner to do research, create, be entertained and train. Recently, e-learning has revolutionized the world of higher education with the arrival of MOOCs around the world. Morocco has also embarked on the adventure of Massive Open Online Courses with the creation of the "Maroc Université Numérique" platform, which offers a set of MOOCs in different subjects. This manuscript presents a general portrait of the issue of MOOCs (massive open online courses) in university education in Morocco. Our aim is to demonstrate how MOOCs could be sources of educational innovation and facilitators of access to knowledge and interactivity. In this article, we present this learning modality in the world of education, specifically through the MUN platform, along two main lines: the first is a theoretical anchoring on the situation of the Moroccan university according to the last report of the Board of Governors, together with a definition of the key concepts of online learning, namely e-learning and its components, MOOCs, the origins of MOOCs in Morocco, etc. The second is a survey carried out within the MUN platform; it includes testimonials from teacher-designers of some MOOCs and an analysis of the MOOC Methodological Support for University Work of Mohammed Premier University of Oujda.

Keywords: E-learning · MOOCs · MUN · Pedagogy · Interactivity

1 Introduction

In our society, where a digital economy is developing in all areas, information and communication technologies (ICT) are constantly evolving and influencing the formal or informal relationship to knowledge in the Moroccan education system.


ICTs present a very important opportunity to modernize the education system and renovate pedagogical practices. Thanks to technology and the explosion of the Internet, a new way of distance learning has emerged, whether for school, university, work or otherwise. E-learning opens up a multitude of new possibilities, in particular in fields which may be limited by various constraints. Learners try to improve their performance and knowledge by exchanging the necessary information, which gives users full autonomy. Indeed, online learning, or e-learning, allows flexibility and speed in obtaining information. This type of learning has developed since the beginning of this century: it is a distance learning mode providing learners with educational content via the Internet, allowing the learner to follow the course from any place. On January 6, 2003, the European Union argued that "e-Learning is the use of new multimedia technologies from the Internet to improve the quality of learning by facilitating on the one hand access to resources and to services, on the other hand remote exchanges and collaboration". We can therefore say that e-learning is revolutionizing the field of teaching and learning; its main purpose is to organize teaching, facilitate the task of teachers and learners, and facilitate research.

Today, with e-learning, the world of higher education is seeing "the booming arrival of MOOCs". According to some authors, MOOCs are a revolution, a "tsunami" [1], or even "the most important [educational] innovation for 200 years" [2]. This phenomenon has been adopted by several countries, such as the United States, Canada, France, Germany, etc. MOOCs raise and provoke questions in the teaching world of higher education (KARSENTI, 2013) [3] and secondary education (HORN, 2014) [4], as well as among the general public (BRAFMAN, 2014) [5] and more specialized audiences (CHAFKIN, 2014) [6]. Morocco has also embarked on the adventure of Massive Open Online Courses, with the creation of the Maroc Université Numérique platform, which offers a set of MOOCs in different subjects. The use of MOOCs prompted our reflection and led us to our research question, which is as follows:

• How could the use of MOOCs from the Maroc Université Numérique platform be a source of educational innovation and a facilitator of access to knowledge and interactivity?

This manuscript therefore presents a general portrait of the question of MOOCs (massive open online courses) in university education in Morocco. Our goal is to demonstrate how MOOCs could be sources of educational innovation and facilitators of access to knowledge and interactivity. To better frame our research question, we have developed a set of questions that we answer as we go through our research.

Research questions:

1. What are e-learning and MOOCs?
2. How are MOOCs presented?
3. What consequences do MOOCs have on learning?


4. What activities and educational resources are offered?
5. How do online exchanges promote (or not) didactic co-construction between teachers, tutors and learners?
6. How can online collaboration be fostered and tasks turned into action?
7. What is the significance of this modality in the virtual university?

Based on a pre-survey conducted on the MUN platform and our experience in the field of online tutoring, we have developed the following hypotheses:

• The courses, exercises and tests presented on the MUN platform develop the skills of learners.
• Learners can take lessons interactively and collaboratively at their own pace.

This article aims to present this learning modality in the world of education, specifically on the MUN platform. To do this, we study two main axes. The first is a theoretical anchoring that sets out the situation of the Moroccan university according to the last report of the Higher Council, as well as the definition of the key concepts of online learning, namely e-learning and its components, MOOCs, the origins of MOOCs in Morocco, etc. The second axis is devoted to the survey that we carried out within the MUN platform, to the testimonies of the teachers who designed some MOOCs, and to the analysis of the MOOC Methodological Support for University Work of Mohammed Premier University of Oujda.

2 Theoretical Anchoring

2.1 The Moroccan University and Digital Technology

The current situation of higher education in Morocco is marked by what some observers consider an "educational crisis" (Roland et al. 2017) [7]. The latest report, entitled "Reform of higher education: strategic perspectives" [8], is in direct line with the Strategic Vision of the 2015–2030 reform, drawn up at the end of a long and complex process of studies and consultations carried out within the Superior Council of Education, Training and Scientific Research (CSEFRS). It is considered a benchmark for the reform of the different components of the national education system, from preschool to higher education and scientific research. This report identifies the elements that hinder the higher education system through two kinds of determinants.

Endogenous determinants: these are the demographic factors experienced by the Moroccan university in recent years in terms of the number of students enrolled each year; according to the CSEFRS 2019, "the demographic evolution experienced by public higher education during recent decades has seen the number of students enrolled, in particular in open-access courses (Faculties of Sciences, Letters and Humanities, and Law), multiplied by six or more in less than fifteen years". However, this development has resulted in negative consequences for the higher education system: "the consequences of this demographic pressure in terms of reception capacity, level and quality of supervision, level of training, internal profitability of the system, and various other aspects of student and university life in general, are considered the main constraint on higher education, to the point of obscuring the many other dysfunctions and deficits from which the system suffers" (CSEFRS 2019).

Exogenous determinants: these indicate that "higher education has not integrated, in quantity and quality, the opportunities for employability and social advancement made possible by the new dynamics of national economic activity", which "is reflected negatively on the external productivity of the system", seeing the arrival on the market of tens of thousands of graduates whose profile and training do not converge with the needs and expectations of the job market. In addition, we are now witnessing an evolution of the world economy based on the digital revolution, yet this point "has not been sufficiently taken into account by the national higher education system" (CSEFRS 2019).

It should also be noted that the report cites other prominent factors, such as language deficits: "the studies carried out in 2009 by the public authorities on the occasion of the emergency plan highlighted the linguistic problem. It was strongly observed that new students at the university did not master either the language or communication and information techniques (the language of instruction is French in most higher education institutions, while the language of instruction in secondary and primary education is Arabic)". Another factor is the absence of a structuring digital plan for higher education: "What characterizes the current digital situation in Moroccan higher education is the disparate nature of the projects and experiments existing at the level of universities, which have developed, with their own resources, internal applications and platforms for the digital environment. Some universities have designed and experimented with a few MOOCs and timid distance education practices".

All these dysfunctions have led the team of the Board of Governors to react by, on the one hand, revising the curricula and the educational engineering of the training courses. As quoted in the report, "this educational reengineering involves the use of various acquisition and learning methods and the implementation of innovative and interactive teaching methods, in addition to the predominant lecture, such as hybrid pedagogy, role-playing games, study of practical cases, scenarios, simulation, etc." The team added that it is essential "to give priority to initiation into the fundamentals of digital culture, the focus on multidisciplinary approaches, so-called behavioral learning (soft skills) and acquisition of skills for adapting and integrating new developments in the demands of the labor market". And, on the other hand, the report develops recommendations, among which we cite:

• Recommendation 3, which calls for the implementation of a digital strategy to overcome dysfunctions: "In order to boost digital technology at the level of higher education, it would be necessary, as other countries have done, to put this policy into law. Distance education, MOOC validation procedures, hybrid education, online continuing education, e-governance, etc. require the promulgation of regulations, so that digital technology becomes a means to accelerate the process of transformation of universities".
• And Recommendation 5, which aims to develop digital technology for better engineering and educational approaches: "Digital technology must contribute, on the educational level, to reconciling the democratization of higher education and demographic change with quality, by offering hybrid training (face-to-face and online) and by setting up validation procedures for courses or educational content through an educational service that the establishment offers to the learner".

Everything mentioned above shows that higher education in Morocco adopts, in its reform strategy, the use of digital technology through the e-learning modality. In the rest of this article we discover the key concepts of this type of learning.

2.2 E-learning

E-learning is defined by: "the use of new multimedia and Internet technologies in order to improve the quality of education and training through remote access to resources and services, as well as to collaborations and exchanges". According to the SET Lab [9], e-learning is "online learning centered on the development of skills by the learner and structured by interactions with the tutor and peers". The Telematics Education Support Laboratory has structured the definition around simple questions:

• Who? E-learning is aimed at anyone who wants to train, learn, acquire new skills and capacities, or supplement their knowledge and know-how; it is therefore aimed at networks of people, at networked humans.
• What? E-learning focuses on the transfer of knowledge and content and the acquisition of skills and know-how.
• Why? The purposes pursued by e-learning are to make learning more accessible and flexible and to improve the performance and efficiency of learning.
• How? With whom? With what? The methods, strategies and systems put in place are varied, but their common points are that they:
  – are learner-centered;
  – give learners access to up-to-date information;
  – offer them the opportunity to vary their learning;
  – allow them to manage their training by choosing their support methods (alone, accompanied, etc.) or by setting up their own network of resources.

Therefore, e-learning aims at interactive, collaborative and customizable learning, built around learners:

– using networks of people: tutors (experts, trainers, coaches, etc.) and peers (other trainees or learners);
– using a network of material resources: content, multimedia support, etc., integrated into a learning platform.
– When? At the learner's best convenience, throughout his or her life.
– Where? Remotely from the teacher and peers, on a private or public network (Internet, intranet, etc.).


Information and communication technology specialists offer the following definitions. Sandra Bellier (2001) [10] defines e-learning "as a training device that places great emphasis on the Internet or intranets. This includes virtual classes, videoconferences, forums, chats…". As for Didier Paquelin (2004) [11], he defines it as "an affective, cognitive, social and existential co-construction which mobilizes resources and plural training situations within a framework of action allowing the empowerment and regulation of learner self-direction." In this sense, Bernard Blandin (2001) [12] speaks of a set of material and human resources, corresponding to a particular form of socialization intended to facilitate a learning process.

E-learning Models

We distinguish two models:

• Synchronous model: a form of communication that corresponds to live contact times through instant messaging, surveys, interactive whiteboards or screen sharing, application-sharing tools, audio and video conferences, and webcasting; all participants are connected at the same time and communicate directly with each other, e.g. in virtual classes.
• Asynchronous model, also called the "distributed class", where the trainer is in one place and the learners in another. It is characterized by emails, discussion forums, wikis and other shared tools: editing, blogging, webcasting. Exchanges take place through asynchronous messages and forums.

The Components of an E-learning Program

The components found in an e-learning program, as described by B. Ghirardini in "Methodologies for the development of e-learning courses" (2012) [13], are:

• learning content: learning resources (documents, presentations, video or audio), interactive e-lessons, work tools (memos, glossaries, decision support systems);
• e-tutoring, e-coaching, e-mentoring (support and personalized comments);
• collaborative learning (online discussion, collaboration between learners);
• the virtual classroom (live shared whiteboard, etc.).

After having defined e-learning and its components, we move on to the discovery of MOOCs and of the Maroc Université Numérique platform.

2.3 MOOCS

The History of MOOCs

MOOCs are part of an evolutionary dynamic within the "digital revolution" that took place at the end of the 20th century. The word MOOC, which corresponds to the acronym Massive Open Online Course, only really appeared in 2008, the expression having been launched by Dave Cormier, professor at the University of Prince Edward Island, in response to the course entitled Connectivism and Connective Knowledge by George Siemens and Stephen Downes.

What is a MOOC? MOOCs are free online courses, open to all, which have been popularized by major American universities such as Stanford, Harvard or MIT since 2011. Coursera is the largest platform for hosting MOOCs, with more than four million student Internet users. The term MOOC, which appeared in 2008, is an acronym:

• Massive: the course can accommodate an, in principle, unlimited number of participants.
• Open: the course is open to all Internet users.
• Online: the entire course can be taken online: lessons, activities, homework, tests, etc.
• Course: this is a course with educational objectives and one or more educational paths.

OpenupEd, a project supported by the European Commission, proposes the following definition: "MOOCs are courses designed for large numbers of participants, that can be accessed by anyone anywhere as long as they have an internet connection, are open to everyone without entry qualifications, and offer a full/complete course experience online for free" (OPENUPED, 2015, p. 1) [14]. A less sophisticated and more extensive definition is the one presented by the American institution EDUCAUSE (EDUCAUSE LIBRARY, 2013) [15]: "A massive open online course (MOOC) is a model for delivering learning content online to any person who wants to take a course, with no limit on attendance".

MOOCs of Morocco Digital University

As previously reported, Morocco has set up a MOOC platform under the slogan "MUN at the Service of Excellence in Training and Student Success" (see Fig. 1).

Fig. 1. Screenshot of the MUN platform

The Origins of MOOCS in Morocco

According to the site www.MUN.MA: "On July 15, 2016, an agreement creating the platform 'Maroc Université Numérique' was signed between the Moroccan Ministry of Higher Education and Research, France Université Numérique and the French Embassy. This agreement aims to set up a white-label Morocco platform operated by France Digital University to allow Moroccan universities to develop MOOCs (massively open online courses), SPOCs (online courses in small private groups) or any other form of online course; this initiative aims to federate the projects of Moroccan universities and schools to give them international visibility". The MUN platform contains several MOOCs from several universities (see Fig. 2).

Fig. 2. Screenshot of the MUN platform

3 The Methodological Approach of Our Research

In our research, we opted for a qualitative approach based, first, on data collection, namely the testimonials of teacher-designers of MOOCs, and, second, on the analysis of:

• the content of the resources used (videos, lessons, exercises, tests) (observation and analysis grid);
• the construction of knowledge and types of learning;
• interaction and collaboration in forums and wikis (observation and analysis grid);

in order to answer our research question and to confirm or refute our hypotheses.

4 The Results of Our Survey

4.1 The Testimonies of Some Teacher Designers of MOOCs

• "For several years now, our university has given digital technology major importance: online courses, distance training, online diplomas, content design. Our experience has revealed to us the enormous difficulties of our students with regard to academic methodological work, which explains the importance and at the same time the objectives that motivated the establishment of this MOOC" [16]. Professor Khalid JAAFAR, Mohammed I University, Oujda

• "The phonetics MOOC is designed as a support for face-to-face lessons, in a hybrid training method, to cope with the enormous massification experienced by Ibn Zohr University (more than 120,000 students at the start of the 2018–2019 school year). By its technical nature, this MOOC will allow learners to better understand the phonetic and acoustic aspects of the French language with animations, simulations and pronunciation correction activities, very relevant for a good use of French as a foreign language. This is not feasible in practice in classrooms with more than 200 students" [17]. Professor Ahmed ALMAKARI, Ibn Zohr University, Agadir

• "This MOOC aims to introduce the basic concepts of didactics and the theoretical tools to rigorously construct teaching and learning situations… Throughout the week, a forum is made available to you to encourage you to interact with teachers, tutors and peers" [18]. Mohammed Droui, professor of didactics at the FSE in Rabat

• "If you are a student of a management school or university, a professional interested in human resources management, or you want to deepen your knowledge in human resources and obtain everyday HR tools, if you aim to develop your skills and improve the quality of your relationships at work, then this MOOC is for you. The MOOC 'HR practice' will take place over 6 weeks and will cover the main everyday HR practices. A headhunter will reveal the key points of successful recruitment, a psychologist will reveal the secrets of assertiveness in a professional environment, a coach will provide the basic principles of team coaching, and finally an HR manager will explain his job and talk about professional efficiency. Each week you will find video interviews, quizzes to test your knowledge, additional resources and forums to practice and discuss. You can ask questions and share your experience with the other members of this MOOC, so we will meet soon for this MOOC on HR practice" [19]. Professor Sana QARROUTE, Mohammed I University, Oujda

Discussion

From the testimonies collected from the professors who designed MOOCs, we retain the following elements regarding the objectives of MOOC design in Morocco:

• MOOCs help learners who have difficulties in their university career.
• MOOCs keep the Moroccan university open.
• MOOCs are designed as a support for face-to-face lessons, in a hybrid training method, to cope with the enormous massification experienced by the Moroccan university.
• MOOCs allow learners to better understand the phonetic and acoustic aspects of the French language with animations, simulations and pronunciation correction activities, very relevant for a good use of French as a foreign language; this is not feasible in practice in classrooms with more than 200 students.


• MOOCs make it possible to rigorously construct teaching and learning situations.
• MOOC forums allow learners to interact with teachers, tutors and peers.

4.2 Analysis of the MOOC "Methodological Support for University Work" (Fig. 3)

Designed by Doctor Khalid JAAFAR.

Fig. 3. Mooc Methodological support for university work

The Environment of the MOOC AMTU

The MOOC Methodological Support for University Work covers five stages:

• Sequence 1: Tools
• Sequence 2: Scholastic writings
• Sequence 3: Academic and professional writings
• Sequence 4: Academic standards
• Sequence 5: Defense

At the end of each sequence, a test is presented to the learners to check their knowledge. In this MOOC, learning takes place in asynchronous mode, i.e. the sequences are designed around digital resources (videos, lessons, exercises, tests) and collaborative tools, namely the forum and a wiki in the form of a glossary (see Fig. 4).

Fig. 4. The components of the Mooc


Our research objective in this step is to study, on the one hand, the design of the teaching and learning courses and, on the other hand, the use of the collaborative tools by learners.

The Design of Teaching and Learning

The teaching and learning course is designed through various resources (videos, concept maps, online links, etc.). Each week consists of a sequence that begins with an introductory and informative video announcing the objectives and the tasks to be accomplished by the learners (see Fig. 5).

Fig. 5. Introductory video

After the objectives of the sequence are announced, the learner is presented with a panoply of digital resources designed by the teaching team. These resources take the form of texts, images, PPT instructional videos, concept maps, etc. (see the example in Fig. 6).

Fig. 6. Digital resources


Once the learner has discovered the courses presented, he or she moves on to the training activity to mobilize the acquired knowledge. Figure 7 below shows an example of a training activity in which the learner practices on a text accompanied by comprehension questions.

Fig. 7. The training activity

When the learner completes the training activity, he can check his answers using the correction option as shown in Fig. 8.

Fig. 8. The answer

This possibility gives the learner immediate feedback, which allows him or her to move forward in the course while addressing his or her shortcomings.

Evaluation

The MOOC AMTU provides intermediate quizzes at the end of each sequence, i.e. at the end of each week, which allow the learner to obtain a mark. If the learner successfully completes all the stages, a follow-up certificate is issued at the end of the course. Figure 9 shows sample quizzes.


Fig. 9. Quiz

Collaborative Tools

MOOCs are distinguished from other online course formats by the importance given to interactions between participants. Several interaction spaces are used in the MOOC AMTU:

• Discussion forum.
• Social networks.
• The glossary.

These interaction spaces are moderated by the teaching team and allow learners to discuss, exchange and question each other (see Fig. 10 below).

Fig. 10. (a) Forum (b) Wiki


Discussion

Our analysis of the MOOC AMTU has shown that this MOOC offers learners a course covering note-taking, formulation, speaking during the presentation of work, responsiveness to the documents and exercises requested, and efficiency and regularity in personal work; mastery of university exercises (summary, report, synthesis of documents, internship report, dissertation and oral presentation during the defense); and appropriation of the methods of using and exploiting knowledge (documentary research, use of citations and resources, standards for writing a bibliography and presentation of academic work). All of these elements promote the acquisition of autonomy, rigor and motivation in the learner.

• Regarding the educational approach: this MOOC is based on expository methods, aiming to support the acquisition of knowledge, and on explanatory methods to exploit the knowledge acquired, while relying on an action-oriented approach implemented in the form of task-based learning that puts the learner at the center of his or her acquisition process.

5 Conclusion

Throughout this research, we have argued that online learning, especially through MOOCs, is an inevitable and promising new modality in the student curriculum. Based on the theoretical grounding, the testimonies of the designer teachers and the experiment on the MOOC Methodological Support for University Work of Mohammed Premier University, we can confirm our hypotheses above: this research demonstrated that MOOCs are carriers of knowledge through digital resources, since the courses, exercises and tests presented on the MUN platform allow learners to develop their skills. Moreover, the collaborative tools of the MUN platform give learners the possibility to follow the courses interactively and collaboratively at their own pace.

References

1. Auletta, K.: Get Rich U. The New Yorker (2012)
2. Regalado, A.: The most important education technology in 200 years. MIT Technology Review (2012)
3. Karsenti, T.: MOOC: Révolution ou simple effet de mode? In: Conférence au Centre de Recherche Interuniversitaire sur la Formation et la Profession Enseignante, Québec, Canada (2013). https://www.youtube.com/watch?v=nyzn1W-wRQg
4. Horn, M.B.: MOOCs for high school: unlocking opportunities or substandard learning? Education Next, Summer 2014, 14(3) (2014). https://educationnext.org/moocs-high-school/
5. Brafman, N.: La «MOOC-mania» gagne la France. Le Monde.fr (2014). https://www.lemonde.fr/societe/article/2014/03/28/le-cnam-grand-gagnant-des-moocs-francais_4391106_3224.html?xtmc=mooc&xtcr=34


6. Chafkin, M.: Udacity's Sebastian Thrun, godfather of free online education, changes course. Fast Company Tech Forecast (2014). https://www.fastcompany.com/3021473/udacitysebastian-thrun-uphill-climb
7. Roland, N., Stavroulakis, M., François, N., Emplit, P.: MOOC Afrique: Analyse des besoins, étude de faisabilité et recommandations. Rapport de recherche. ULB, Bruxelles (2017)
8. Réforme de l'enseignement supérieur: Perspectives stratégiques. Rapport N°5/2019. Dépôt légal: 2019MO3450, ISBN: 978-9920-785-15-0 (2019)
9. https://www.labset.net/
10. Bellier, S.: Le e-learning. Ed. Liaisons, p. 13 (2001)
11. Paquelin, D.: «Dispositif et Autoformation: quelles convergences?», Dispositifs d'Autoformation Accompagnées: formateur et changement, apprenant et processus d'apprentissage, organisation et jeux d'acteurs. Educagri éditions (2004)
12. Blandin, B.: «Historique de la formation ouverte et à distance». Actualité de la formation permanente, no. 189, pp. 69–71 (2001)
13. Ghirardini, B.: Méthodologies pour le développement de cours e-learning. FAO (2012). https://www.fao.org/docrep/015/i2516f/i2516f.pdf
14. OpenupEd: Definition massive open online courses (MOOCs) (2015). https://www.openuped.eu/images/docs/Definition_Massive_Open_Online_Courses.pdf
15. Educause Library: Massive open online course (MOOC). EDUCAUSE (2013). https://www.educause.edu/library/massive-open-online-course-mooc
16. https://www.youtube.com/watch?v=-LsbIrK1oi0&feature=youtu.be (translated into English)
17. https://youtu.be/9iy0mBpqyOY (translated into English)
18. https://www.youtube.com/watch?v=J1bX6zj8prg&feature=youtu.be (translated into English)
19. https://youtu.be/isU2gYY-IE0 (translated into English)

Methodology to Develop Serious Games for Primary Schools

Younes Alaoui(B), Lotfi El Achaak, and Mohammed Bouhorma

PLIST Laboratory, UAE University, P.O. Box 416, Tangier, Morocco
[email protected]

Abstract. Serious games are video games designed to support learning and are starting to play a role in education. Serious games are used as an e-learning tool to complement traditional education or for distance learning. Researchers have recently tested serious games in preschools and primary schools, and results have started to prove their efficiency. During the Covid-19 study-from-home period in Morocco, we developed a serious game for preschool to help practice logical reasoning and remembering. We used a process to design and develop the serious game with a collaborative team. This process is based on a methodology called GLUPS. During the development process, we had to adapt GLUPS to better fit serious games for schoolchildren and development carried out by developers not always familiar with game vocabulary and game development. GLUPS helped us produce a playable, well-featured serious game quickly.

Keywords: eLearning · Distance learning · Serious game · Development processes · GDSE · Open-source · Unified process · Smart education

1 Introduction

Researchers have conducted different experiments to measure the results of game-based learning, or "serious games". Results suggest that these games achieve learning objectives [1–4]. On March 16, 2020, due to the Covid-19 pandemic, the Ministry of Education in Morocco stated that schoolchildren and students would study from home. Schoolchildren faced the challenge of continuing to learn in full autonomy. They had to use digital technologies to communicate and learn, and they also had to accommodate a lower amount of interactivity, tutoring, practicing, and learning from peers. We thought serious games could help schoolchildren in their learning process by providing them with additional opportunities to interact, learn from peers and practice. In our research, we work on methodologies and frameworks to develop open-source serious games [5]. We call our methodology GLUPS (Gaming and Learning Unified Process to engineer Software). GLUPS is open-source and available under the Eclipse Public License.


Our university offers multiple curricula related to computer science and software development, but none has a major in video game development. Our students usually take an introductory course on video game development and techniques, but they do not have any advanced courses on video game design or development. The same applies to the developers available in our region. However, video game development competency is available in our research community and in our region: some researchers and software developers are self-learners who develop video games, and some are even freelance developers working on cross-border video-game development projects. As we aim to foster a community of contributors from our students and from surrounding developers, we are working on methodologies that guide the work of computer science students or regular software developers to produce playable and meaningful serious games.

During the Covid-19-constrained study-from-home period, we worked on developing games for schoolchildren in preschool and primary school. On our first projects, we had always used GLUPS with some team members already familiar with video-game development technologies or methodologies; these members acted as mentors for the other team members during project meetings or when all team members gathered and worked from the same office. Two teams of students worked on the projects developed during the Covid-19 study-from-home period. For these projects, we had limited access to developers familiar with video-game development and we had to leave a lot of autonomy to the students. We used this opportunity to test GLUPS in this new context and to identify what should be adapted to make it work better for regular software developers. Once adapted, GLUPS helped the students produce playable, well-featured serious games. This paper describes serious game development processes, GLUPS artifacts, the serious games that we developed and that underlie this research, and the lessons learned.

2 Serious Game Development Processes Serious games are software applications running on computers or mobile devices. Salen and Zimmerman [6] define a software game as a software application in which one or more players make decisions by controlling game objects and resources in the pursuit of a goal. The development of serious games involves pedagogy, didactics, learning design, game design, sound, art, artificial intelligence, and software development. Even if the process to develop serious games derives from software development processes, the diversity of disciplines involved makes the serious game development process different from traditional software development [7, 8]. First, the design of serious games must "incorporate sound cognitive and learning principles" [3]. Said differently, serious game design must include a learning design dimension [9]. Second, serious games are video games and their design involves players, decisions, objects, resources, and goals [6]. As a discipline, game design involves multiple competencies and uses different methodologies that usually build on software development methodologies [7].


However, experiences and practices have shown that game development projects face many challenges and issues if the software development team follows just a traditional software development process [10, 11]. Game developers usually select a software methodology that best fits the game under consideration, taking into account team size, team skills, and game context. They also adapt the selected methodology while adopting it. Thus, game development processes are usually proprietary processes of game development companies. However, recent studies have started proposing a Game Development Software Engineering (GDSE) process that builds on experiences and patterns and provides guidelines for game development software engineering [7, 12, 13]. Like a software development process, GDSE defines a lifecycle with phases. Each GDSE process defines a specific number of phases, but we can group these different phases under three main phases: A) pre-production, B) production, and C) post-production. Pre-production encompasses scenario design, scenario testing, and requirement engineering. The production phase involves planning, documentation, and development with sound and graphics. Post-production involves testing and marketing [7].

3 GLUPS GLUPS combines learning design and software engineering. It is currently developed in the LIST laboratory of FST Tangier. GLUPS is open source and available on GitHub. The current version of GLUPS is 0.1. GLUPS is based on the Unified Process [14] methodology (OpenUP) [15] and defines 4 phases of the project lifecycle: Inception, Pre-production, Production, and Transition. Like the Unified Process, GLUPS defines the "Building Blocks" that will produce the intended work: the engineering disciplines that produce (the what) and the tasks to be performed (the how). For this reason, engineering disciplines are also called "workflows". GLUPS represents the phases and engineering disciplines in a matrix: the project phases are the columns; the engineering disciplines are the rows (see Table 1). 3.1 GLUPS Engineering Disciplines GLUPS engineering disciplines cover the tasks to be performed during the project life cycle. They can be considered as the ingredients to be mixed to obtain a result. GLUPS phases cover the "quantity" or "doses" of each activity or ingredient to be involved in that phase. The first phases (Inception or Pre-production) mainly involve modeling, requirement analysis, and design. These first phases may also include some implementation and testing activities. The production phase mainly involves implementation activities. However, it may also involve additional modeling and design tasks. This structure allows for iterative and incremental development of deliverables (Fig. 1). 3.2 GLUPS Artifacts We use GLUPS artifacts to capture the requirements and the design of a serious game, such as the content of the learning task that the learner must follow in order to develop a specific skill.


Table 1. GLUPS phases and engineering disciplines

Phases (columns): Inception | Pre-production | Production | Transition
Engineering disciplines (rows): Learning objectives & didactics | Learning design | Game design | Software design | Implementation | Test | Learning evaluation | Project management

Fig. 1. Iterative and progressive development in GLUPS

GLUPS refers to requirements and designs as descriptions. To capture these descriptions, GLUPS defines a set of content templates. These content templates help a learning designer and a game designer capture important information about the game specification. GLUPS content templates are similar to XML schemas (XSD), JSON schemas, or UML class diagrams. We will also use the name “content classes” to refer to “templates”. A designer will use a GLUPS content template to create a description. A description describes an aspect of a game. A description is similar to an instance created from a template. Descriptions are similar to UML object diagrams, XML documents, or JSON documents. Because of this similarity, we will also use the term “content objects” to refer to “descriptions”.


3.3 GLUPS Templates GLUPS templates are grouped into two main packages:
– Learning Description: this package groups the templates that help describe the learning process;
– Game Description: this package groups the templates that help describe the game structure and logic.
The Learning Description package enables the designer to describe the learning objects and the learning process. Table 2 lists the main templates of this package.

Table 2. Learning description templates

Template name | Description
Learning description | Free text to describe the learner background and the learning objectives
Learning competencies & assessment | Uses a structure called ABCD-LD. Describes a competency with the attributes: (A) Audience, (B) Behavior, (C) Condition, and (D) Degree/Assessment; plus (L) Level of learning and (D) DiDactics
Learning tasks | Sets the competency in a situation (context for learning). The learning tasks to be used to develop the competence. A learning task is linked to a skill, has a background (context), an objective communicated to the learner (Goal), an expected product (Expected Action), and a result that the learner will obtain as a result of his action (Expected Result)
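To make the template/description duality concrete, the sketch below renders the Learning competencies & assessment (ABCD-LD) and Learning tasks templates as Python dataclasses and instantiates one description. This is only an illustration under our own naming; it is not the actual GLUPS schema, and the field values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Competency:
    """Content class for the ABCD-LD structure (illustrative, not the GLUPS schema)."""
    name: str
    audience: str        # (A) Audience
    behavior: str        # (B) Behavior
    condition: str       # (C) Condition
    degree: str          # (D) Degree / assessment
    level: str = ""      # (L) Level of learning
    didactics: str = ""  # (D) DiDactics

@dataclass
class LearningTask:
    """Content class for a learning task: a situation used to develop competencies."""
    skill: str
    context: str
    goal: str                 # objective communicated to the learner
    expected_action: str
    expected_result: str
    competencies: list[Competency] = field(default_factory=list)

# A "description" (content object) is simply an instance of a content class.
memorize = Competency(
    name="C1",
    audience="schoolchildren from 4 to 7 years old",
    behavior="memorize a sequence of 3 actions necessary to obtain a result",
    condition="after the sequence has been demonstrated once",
    degree="reproduces the actions in the correct order",
)
task = LearningTask(
    skill="sequencing",
    context="growing a flower in the game garden",
    goal="make the flower grow",
    expected_action="perform the 3 actions in order",
    expected_result="the flower grows",
    competencies=[memorize],
)
```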

The Game Description package is composed of 4 sub-packages: – Game Concept: this package provides the templates that help describe the general concept of the game, and the high level requirements – Game Structure: this package provides the templates to describe the objects and the characters of the game – Gamification: this package provides the templates to capture the gamification rules of the game – Storyline: this package provides the templates to describe the storyline, the levels and the scenes of the game.

4 Serious Games Built We have applied GLUPS to two projects: the first project to develop a calculation application running on mobile devices, and the second project to build a desktop application that helps preschool students remember and perform a sequence of actions.


We have used GLUPS artifacts to carry out needs analysis and design. We have also used GLUPS to manage the development process (called the production process). Each team successfully managed to deliver a working serious game. The source code of the developed serious games is available on github at https://github.com/FSTT-LIST; projects GLUPS-123 and GLUPS-mem3. 4.1 Examples with GLUPS Templates In this section we illustrate how we have used GLUPS for the Mem3 game. The main objective of Mem3 is to train schoolchildren to memorize a sequence of 3 actions necessary to obtain a result, and to apply this sequence of actions in different contexts (relate, generalize). The description illustrated in Table 3 shows how we have used a GLUPS template to describe the learning objectives of the Mem3 game.

Table 3. The learning objectives of the Mem3 serious game

:Learning description summary
Learner = schoolchildren from 4 to 7 years old
Learning Objectives += Memorize a sequence of 3 actions necessary to obtain a result; apply this sequence of actions and obtain the result
Learning Objectives += Analyze the course of action, identify missing actions, or relate these actions to other knowledge
Learning Objectives += Implement this suite of actions in a slightly different context
Learning Competencies +=

Table 4 shows the description of a competency built into the Mem3 serious game. This template captures how the game develops the competency, how to assess this competency, and the assessment context. 4.2 Game Screenshots This section presents some screenshots from the Mem3 game. The Mem3 game has multiple levels. The screenshot in Fig. 2 shows the screen that enables the player to navigate the level map of the game. The screenshot in Fig. 3 shows one of the playing scenes of the game. The screenshot in Fig. 4 shows the score display screen.

5 Results and Lessons Learned GLUPS proved very helpful and powerful for the development of the two serious games. In general, it enabled a thoughtful specification that remained easy to use by developers with little experience in serious game development. It made it possible to specify important character behavior and gamification rules that make the game challenging and appealing to young players.


Table 4. Description of a Mem3 competency

:Competence
Name = C2
Description = "C2: finds the following actions to do, and grows the flower, after seeing the current action"
Behavior:
  Action = Finds the following actions to do
  Result = Makes the flower grow
Condition = Sees the first actions that have been done (for example, the hole has been dug). Can replay actions already done. A tutor can remind the player what he is doing to grow a flower
Degree = Reproduces the actions to perform in the correct order. In case of difficulty, the tutor switches to <Reproduce> mode and shows again the sequence of actions to perform
Level =
Didactics = … In case of difficulty, we switch to <Reproduce>

Fig. 2. The level map of Mem3

For example, the player selects the playing character "boy" or "girl" as well as a buddy. The game enables players to win points and accumulate stars across playing sessions. All these features made the game playable and appealing from the start. We have also identified two areas that were difficult for developers to grasp or to use from GLUPS:
– Learning tasks developing multiple competencies
– Storyline and levels.


Fig. 3. Scene of the Mem3 game

Fig. 4. Score screen of Mem3

5.1 Competencies and Learning Tasks The templates attached to the learning description enable learning designers or teachers to describe the competencies to develop and the learning tasks that should achieve this objective. In GLUPS, we describe Learning Competencies using User Stories (like UML), but adopting a specific template. Learning tasks are the pedagogical tasks that use didactics and situations to develop a competency with a learner:
– Competencies are developed within learning tasks.
– A learning task can develop one or multiple competencies.
– A complex competency can require multiple tasks.
The design of GLUPS is generic and enables a 1-to-N relationship between Competencies and Learning Tasks. However, we found it difficult for learning designers or software developers to specify how to develop multiple competencies within the same task.


In GLUPS, any task that develops a competency should assess the mastery degree of that competency. Assessing multiple competencies within the same task proved difficult to design and to implement. We have defined a new Learning Task class that inherits from both task and competency and that describes tasks that implement and assess just one competency. This new class proved easier to use (specifications were shorter) and more efficient (development conformed better to the specifications). 5.2 Storyline and Levels A storyline and levels [16] are concepts and terms used by video-game designers and developers. A game level is a section or part of a game. To complete a game level, a gamer usually needs to meet specific goals or perform a specific task to advance to the next level. In puzzle games [17], levels may be similar but become more difficult as you progress through the game. GLUPS provides templates to capture storylines and specify game levels. These templates are defined in the Storyline package. Fig. 5 shows the class diagram of the templates associated with the Storyline package.

Fig. 5. The class diagram of the Storyline sub package
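Before turning to levels in practice, here is a minimal sketch of the single-competency Learning Task refinement described in Sect. 5.1, assuming hypothetical Task and Competency content classes; this is our rendering for illustration, not the GLUPS metamodel.

```python
from dataclasses import dataclass

@dataclass
class Task:
    context: str
    goal: str

@dataclass
class Competency:
    behavior: str
    degree: str  # how mastery is assessed

@dataclass
class SingleCompetencyLearningTask(Task, Competency):
    """One task = one competency: the task both develops and assesses it."""

    def assess(self, observed_behavior: str) -> bool:
        # Mastery is checked locally against the single competency, which keeps
        # both the specification and the implementation short.
        return observed_behavior == self.behavior

# Hypothetical usage:
t = SingleCompetencyLearningTask(
    behavior="performs the 3 actions in order",
    degree="correct order on first try",
    context="garden scene",
    goal="grow the flower",
)
print(t.assess("performs the 3 actions in order"))
```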

In practice, we have discovered that specifying game levels and the level map (level sequence) alone is not enough. Many software developers were not able to easily translate levels into screens or screen flows. Our serious games are mainly puzzle games. As in many puzzle games, multiple levels take place in the same world (or scene) but may involve different objects. We had to specify screen contents and dynamics using wireframes and screen flows to help developers better understand how to implement levels correctly.


Once we had defined the screen layout, screen content, and screen flows, we attached levels to screens. This approach made it easier for developers to understand the requirements and produce the foreseen results. Thus, we modified GLUPS and organized our templates, classes, and hierarchy differently. The new model is described in Table 5.

Table 5. Scene and level modeling hierarchy
– A game has a screen flow
– A screen flow has screens
– A screen has controls (buttons)
– A screen also has a scene
– The scene contains objects and enables the player to act on these objects
– The scene logic implements one level at a given time
– A screen can be reused in the screen flow to implement a different level.
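A minimal sketch of the screen/scene/level hierarchy of Table 5, with hypothetical class names (the actual GLUPS templates may differ):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Scene:
    """Objects the player can act on; the scene logic implements one level at a time."""
    objects: list[str]
    level_id: int

@dataclass
class Screen:
    name: str
    controls: list[str]            # e.g. buttons
    scene: Optional[Scene] = None  # a screen may carry a playable scene

@dataclass
class ScreenFlow:
    screens: list[Screen] = field(default_factory=list)

    def reuse_for_level(self, screen: Screen, level_id: int) -> Screen:
        """A screen can be reused in the flow to implement a different level."""
        assert screen.scene is not None
        return Screen(screen.name, screen.controls, Scene(screen.scene.objects, level_id))

# Hypothetical Mem3-like flow: a level-map screen and one play screen reused for two levels.
play = Screen("play", ["replay", "help"], Scene(["hole", "seed", "water"], level_id=1))
flow = ScreenFlow([Screen("level_map", ["level buttons"]), play])
flow.screens.append(flow.reuse_for_level(play, level_id=2))
```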

6 Conclusion and Future Work To develop serious games for education we need a process. Such a process is even more important when a community conducts the development in a collaborative open-source mode. In this paper, we have described how we developed a serious game during the confinement period using a serious game development and software engineering methodology called GLUPS. We adapted GLUPS to better fit the profile of software developers not necessarily yet savvy in video-game development. GLUPS is open-source. We have licensed the game GLUPS-mem3 presented in this article under a GPL open-source license. The source code is available on GitHub. Other game designers or developers can contribute to enhancing this serious game in the future by specifying additional features using GLUPS or by developing such specified features. Acknowledgments. We gratefully acknowledge the support of the students and all other participants, namely Amal Elhadyne, Lamyae Khairoun, Fatima Zohra Jaanin, Widad Bouhasni, and Amine Belahbib.

References 1. Young, M.F., et al.: Our princess is in another castle: a review of trends in serious gaming for education. Rev. Educ. Res. 82(1), 61–89 (2012). https://doi.org/10.3102/0034654312436980


2. Papanastasiou, G., Drigas, A., Skianis, C.: Serious games in preschool and primary education: benefits and impacts on curriculum course syllabus. Int. J. Emerg. Technol. Learn. 12(01), 44 (2017). https://doi.org/10.3991/ijet.v12i01.6065 3. Liarokapis, F., de Freitas, S. (eds.): A case study of augmented reality serious games. IGI Global (2010) 4. Lotfi, E., Yedri, O.B., Bouhorma, M.: Towards a mobile serious game for learning object oriented programming paradigms. In: Special Issue on Data and Security Engineering, pp. 450–462 (2019) 5. Alaoui, Y., El Achaak, L., Belahbib, A., Bouhorma, M.: Serious games for sustainable education in emerging countries: an open-source pipeline and methodology. In: Emerging Trends in ICT for Sustainable Development: The Proceedings of NICE2020 International Conference. Springer (2021) 6. Salen, K., Tekinba¸s, K.S., Zimmerman, E.: Rules of Play: Game Design Fundamentals. MIT Press, Cambridge (2004) 7. Aleem, S., Capretz, L.F., Ahmed, F.: Game development software engineering process life cycle: a systematic review. J. Softw. Eng. Res. Dev. 4(1), 1–30 (2016). https://doi.org/10. 1186/s40411-016-0032-7 8. Barbosa, A.F.S., Pereira, P.N.M., Dias, J.A.F.F., Silva, F.G.M.: A new methodology of design and development of serious games. Int. J. Comput. Games Technol. 2014, 1–8 (2014). https:// doi.org/10.1155/2014/817167 9. Greitzer, F.L., Kuchar, O.A., Huston, K.: Cognitive science implications for enhancing training effectiveness in a serious gaming context. J. Educ. Resour. Comput. 7(3), 2-es (2007). https://doi.org/10.1145/1281320.1281322 10. Kanode, C.M., Haddad, H.M.: Software engineering challenges in game development. In: 2009 Sixth International Conference on Information Technology: New Generations, Las Vegas, NV, USA, pp. 260–265 (2009). https://doi.org/10.1109/itng.2009.74 11. Murphy-Hill, E., Zimmermann, T., Nagappan, N.: Cowboys, ankle sprains, and keepers of quality: how is video game development different from software development? In: Proceedings of the 36th International Conference on Software Engineering - ICSE 2014, Hyderabad, India, pp. 1–11 (2014). https://doi.org/10.1145/2568225.2568226 12. Ramadan, R., Widyani, Y.: Game development life cycle guidelines. In: 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Sanur Bali, Indonesia, pp. 95–100, September 2013. https://doi.org/10.1109/icacsis.2013.6761558 13. Blitz Games Studios :: Blitz Academy :: Game Development. http://www.blitzgamesstudios. com/blitz_academy/game_dev. Accessed 04 October 2020 14. Kruchten, P.: Le Rational Unified Process®, p. 22 15. Balduino, R.: Introduction to OpenUP (Open Unified Process). https://www.eclipse.org/epf/ general/OpenUP.pdf 16. Lotfi, E.L., Belahbib, A., Bouhorma, M.: Adaptation of rapid prototyping model for serious games development. JCSIT 2, 173–183 (2014) 17. Lotfi, E., Amine, B., Mohammed, B.: Application of analytic hierarchical process method for video game genre selection. IJCA 96(16), 30–37 (2014). https://doi.org/10.5120/168816888. ISSN: 0975 – 8887

Methods and Software Tools for Automated Synthesis of Adaptive Learning Trajectory in Intelligent Online Learning Management Systems Mariia Dutchak1 , Mykola Kozlenko1(B) , Ihor Lazarovych1 Nadiia Lazarovych1 , Mykola Pikuliak1 , and Ivan Savka2


1 Vasyl Stefanyk Precarpathian National University, 57 Shevchenko Street,

Ivano-Frankivsk 76018, Ukraine {mariia.dutchak,mykola.kozlenko,ihor.lazarovych, nadiia.lazarovych,mykola.pikuliak}@pnu.edu.ua 2 Institute for Applied Problems in Mechanics and Mathematics (IAPMM), 3b Naukova, Lviv 79060, Ukraine [email protected]

Abstract. This paper presents a new methodology for the synthesis of an individualized educational trajectory for intelligent online learning management systems in engineering education. It is based on case-based reasoning and self-study methods and a production-based model of knowledge representation. Such quality indicators as the relevance of the synthesized adaptive learning trajectory, integrity of the adapted learning material, quality of knowledge assessment, quality of new material absorption, student achievement goal setting, adequacy of the model for forecasting academic achievements, usability of the user interface, automation of structuring and importing of educational material, teachers' workload, use of equipment, response time, data recoverability, knowledge base integrity, and compatibility were studied. We present the results of the use of the synthesized individual educational trajectory in the form of quality indicators of students' educational achievements. We also report the methods for evaluating the probability of achieving the educational goals based on the parameters of the student model and the lesson model. Keywords: Learning system · Learning trajectory · Adaptive trajectory · Fitness function · Software tools · Software quality

1 Introduction Nowadays, with the rapid development of science and technology, there is a need for lifelong learning. The development of Intelligent Online e-Learning Systems (IOELS) is essential. Such systems should maximize the efficiency of the learning process, adapting it to the educational needs and capabilities of students. At the same time, the Ukrainian higher education system is being reformed. Higher Education Institutions (HEI) are given autonomy in the context of competences and learning outcomes.


In these circumstances, the ability of IOELS to synthesize and implement an adaptive learning trajectory (ALT) for each student can be used as an aid in traditional learning. An ALT depends on the educational goal and on the level of knowledge and abilities of the student. The development and improvement of methods and software tools for synthesizing an ALT improves the quality and efficiency of the educational process and optimizes the use of human and technical resources. This research, aimed at developing and improving the automation of ALT synthesis, is therefore relevant.

2 Background Analysis Nowadays, computer technologies are widely and actively used in the educational process, but mostly as an aid for delivering static educational content to students and for learning outcomes assessment. This does not provide any automation for the synthesis of an ALT, only a change in the material presentation. There is a large number of proprietary and open-source free e-learning platforms. Most existing educational environments are not adaptive [1–3]. A background analysis of the existing learning management systems is presented in [3, 4]. Most of the IOELSs are based on the learning content model, the student model, and the adaptive learning model. There are the following requirements for these models: adequacy to reality, accuracy, and performance. A learning trajectory is the sequence of learning the modules of the course. It is influenced by the parameters of the student model. There are reactive models, agenda models, and models based on fixed and non-fixed plans [4]. From a technological point of view, there are the following classes of models: models based on Petri nets, probabilistic models, and models based on state machines. The most popular adaptive learning models are the following: Knowledge Flow Structure (KFS), Dynamic Content Model (DCM), and Competence-Driven Content Generation Model (CDCGM) [5]. The main weakness of existing models, methods, and technologies is high maintenance and support complexity. This analysis has shown that the studied systems are convenient tools for the delivery of static training materials, knowledge assessment, and report generation. But they do not implement intelligent automated adaptation to the parameters of the student model. At the moment, there is no proposed system that sufficiently constructs a relevant ALT appropriate to the student's learning abilities and goals, taking into account academic requirements. The main advantage of the proposed solution is the reduced complexity. The main difference between the proposed solution and existing ones is the extended number of parameters used in the adaptive algorithm. The authors use a state-of-the-art higher-order quantum genetic algorithm [6] as a solver of optimization tasks. Basically, this article has three main parts. In the Methodology section, we present the new original ALT synthesis method. The Results section contains evidence of the efficiency of the proposed method. The Discussion/Conclusion section contains conclusions and interpretation of the results.


3 Method of ALT Synthesis The ALT synthesis is an adaptive change in the sequence and content of the lessons. The IOELS lesson is a complex system that includes a finite number of subsystems, which are the learning units (LU). Each LU has its own set of parameters. Each lesson is assembled from the LUs of the set. The main parameter of the lesson is complexity. The complexity of the lesson for a particular student depends not only on their level of knowledge, but also on the knowledge absorption degree and the perception speed of new material. The complexity can be determined as follows:

c_i = \frac{1}{3}\left((1 - L_i) + 2 v_i + w_i c_d\right),   (1)

where L_i is the knowledge absorption degree for the i-th lesson, v_i is the coefficient of variation of the absorption degree for the i-th lesson, w_i is the mean of the differences between the average level of absorption degree of the students who passed the given course and the level of absorption degree of the given course, and c_d is the complexity of the course to which the i-th lesson belongs, obtained as follows:

c_d = \frac{1}{3}\left((1 - L_d) + 2 v_d + w_d\right),   (2)

where L_d is the knowledge absorption degree for the course d the i-th lesson belongs to, v_d is the coefficient of variation of the absorption degree for the course, and w_d is the mean of the differences between the average level of absorption degree of the students who passed the given course and the level of absorption degree of the given course d. Thus, the difficulty level is calculated for all disciplines: the closer it is to 1, the more difficult the course. The level of perception can be estimated as follows:

r = \frac{1}{n\_att\_used} \sum_{n=1}^{n\_vuk} \frac{1}{2}\left(1 + \frac{f - n + 1}{f} \cdot \frac{L_n - d}{1 - d}\right),   (3)

where n is the ordinal number of the attempt, L_n is the knowledge absorption degree for the n-th attempt, f is the total number of allowed attempts, d is the lesson pass threshold, and n_att_used is the number of used attempts. The value of r is normalized: 0 ≤ r ≤ 1. As adaptive learning systems are characterized by rather complex and time-consuming computing processes, the most promising direction for their development is the combination of several methods of mathematical modeling of the learning systems and the use of powerful computing systems to obtain a high-quality adaptive learning process [7]. The basis for constructing the ALT model is the case-based reasoning method, a self-study method based on the analysis of statistical data of learning outcomes, and knowledge representation models which represent knowledge by rules of the form "If <condition> then <conclusion>" [8]. In the condition part, the parameters of the particular student model and the parameters of the LU model are compared and analyzed; the conclusion is that there are several possible learning scenarios.
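As a concrete numerical illustration of Eqs. (1)–(3), the following Python sketch computes the lesson complexity and the perception level for hypothetical input values. The function names and sample values are ours and are not part of the authors' implementation.

```python
# Hedged illustration of Eqs. (1)-(3); names and sample values are hypothetical.

def course_complexity(L_d: float, v_d: float, w_d: float) -> float:
    """Eq. (2): complexity of a course from its absorption statistics."""
    return ((1.0 - L_d) + 2.0 * v_d + w_d) / 3.0

def lesson_complexity(L_i: float, v_i: float, w_i: float, c_d: float) -> float:
    """Eq. (1): complexity of a lesson, weighted by the course complexity c_d."""
    return ((1.0 - L_i) + 2.0 * v_i + w_i * c_d) / 3.0

def perception_level(L: list[float], f: int, d: float) -> float:
    """Eq. (3): perception level r from the absorption degrees of the used attempts.

    L -- absorption degree of each used attempt (L[0] is attempt n = 1)
    f -- total number of allowed attempts
    d -- lesson pass threshold
    """
    n_att_used = len(L)
    total = 0.0
    for n, L_n in enumerate(L, start=1):
        total += (1.0 + (f - n + 1) / f * (L_n - d) / (1.0 - d)) / 2.0
    r = total / n_att_used
    return min(max(r, 0.0), 1.0)  # keep r within [0, 1], as stated in the text

if __name__ == "__main__":
    c_d = course_complexity(L_d=0.7, v_d=0.2, w_d=0.1)
    c_i = lesson_complexity(L_i=0.6, v_i=0.25, w_i=0.1, c_d=c_d)
    r = perception_level(L=[0.55, 0.8], f=3, d=0.5)
    print(f"c_d={c_d:.3f}  c_i={c_i:.3f}  r={r:.3f}")
```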


In the process of students passing the courses, the ALT model becomes more refined and consistent with the capabilities and needs of the student. An estimate of the goal achievement probability is used under various training scenarios in order to select the optimal parameters of the ALT. The purpose of training is the basis for generating control actions; it is one of the parameters of the student's model. The changes of the sequence and content of the LUs are made at the superstructure level over the IOELS knowledge base. This is performed by filling in the corresponding slots of the knowledge base elements and launching the software module for ALT generation [9]. According to the proposed methodology, the learning stage is determined by the student's knowledge level. The difficulty of the lesson should correspond to the degree of perception of new knowledge. The educational content and the sequence of its presentation depend on the purpose of the training and the level of knowledge of the key concepts of the topic or discipline as a whole, taking into account the established substantial and qualitative links between the LUs and the importance levels of these LUs. The combination of these parameters affects the duration of the educational process. The essence of the ALT synthesis is as follows: if the student is faced with the task of taking a certain course, then, on the basis of the student's model parameters, IOELS should generate the ALT that is most suitable for the student's capabilities and needs and most conducive to effective absorption of the learning materials. This approach will improve the quality of learning and the efficiency of using technical resources in automated learning systems [10]. The functional model of IOELS operation is as follows:
– generation of the program and educational support of the discipline (teacher module, based on academic requirements);
– automated structuring of educational material and parameter setting (knowledge base generating module);
– goal setting (student module, academic requirements module);
– selection of the course key concepts (CKC) and/or lesson key concepts (LKC) (student module, academic requirements module);
– choice of learning conditions that ensure the goal of completing the course: the volume of educational material, the knowledge level (satisfactory Q1, good Q2, excellent Q3), and the training duration (which can be specified for the course as a whole or separately for each CKC or LKC) (student module, academic requirements module);
– initial knowledge assessment (testing, practical tasks, etc.) to determine the mastering level of each selected CKC, LKC and the corresponding LU (knowledge assessment module);
– student model synthesis (knowledge base module);
– generation of a list of under-studied CKC (LKC), including the LUs the selected CKC are based on (ALT generating module);
– synthesis of the set of ALTs (content module);
– construction of an optimal ALT, based on the estimation of the probability of goal achievement and adaptation function optimization using the higher-order quantum genetic algorithm (ALT generating module);
– passing the first training session (student module);


– reconstruction and correction of the ALT for the next lesson based on the analysis of the previous training results (ALT generating module);
– completing the course learning (student module);
– analysis of the achievement of the goal by means of the generated ALT and saving the ALT into the IOELS base (knowledge base module).
The structural model of IOELS is shown in Fig. 1; an illustrative sketch of this pipeline is given below.

Fig. 1. The structural model of IOELS
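The functional model above can be read as a linear pipeline over the IOELS modules. The sketch below is only an illustration of that control flow; the module interfaces and function names are hypothetical (the authors' actual implementation, described later, uses PHP and PostgreSQL).

```python
# Hypothetical orchestration of the IOELS functional model (illustration only).

def run_course(student, course, academic_requirements, kb, alt_generator, content):
    """One pass of the functional model for a single student and course."""
    # Goal setting and selection of key concepts (student / academic requirements modules).
    goal = student.set_goal(course, academic_requirements)
    key_concepts = student.select_key_concepts(course, academic_requirements)

    # Initial knowledge assessment and student model synthesis (knowledge base module).
    initial_levels = kb.assess_initial_knowledge(student, key_concepts)
    student_model = kb.synthesize_student_model(student, initial_levels)

    # Generate candidate trajectories and pick the optimal one (ALT generating module).
    under_studied = alt_generator.list_under_studied(key_concepts, student_model)
    candidates = content.synthesize_alts(under_studied)
    alt = alt_generator.select_optimal(candidates, student_model, goal)

    # Learning loop with per-lesson correction of the trajectory.
    for lesson in alt:
        result = student.take_lesson(lesson)
        alt = alt_generator.correct(alt, result, student_model)

    # Close the loop: store the trajectory together with the achieved goal level.
    kb.store_alt(alt, goal_achievement=kb.evaluate_goal(student_model, goal))
```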

Let us consider the building and selection of the optimal ALT in more detail. To construct the ALT, probabilistic estimates of goal achievement were used, based on the parameters of the study model (study stage (E_z), difficulty level (I_z), importance degree (C_z), learning degree (L_z), speed of passage (τ_z)) and the parameters of the student model (knowledge level (Q), assimilation degree (L), passage time (t), and perception degree (r)). The purpose of the training may include the following components: increasing the depth of knowledge in particular courses or topics; increasing the volume of the learned educational material; reducing the time for learning the material; developing practical skills, etc. To study a course with a given difficulty level C_i, a student is given a certain number of hours t. Therefore, depending on the study period of single course lessons, the initial level of student knowledge and his perception degree, the course can be passed to different degrees of mastering. In general, the statistics of absorption of a separate portion of knowledge differ between students. Using the indicator of the student's initial knowledge level, his degree of perception, and the conditional probabilities of transitions from one state to another for tasks of three complexity levels, the probability of each hypothesis coming true is evaluated: P_j(t), j = 1, 2, 3 is the probability of reaching the j-th degree of mastering in time t. Learning each lesson requires t_z hours: t_z = 2 + n, where n is the time of independent work in hours, which varies for a particular lesson and from one student to another.


If the initial probabilities p_i(0) for the states L_i, i = 1, 2, 3 are known, then the probability that at time t the system is in state L_j can be obtained as follows:

p_j(t) = \sum_i p_i(0) \, p_{ij}^{t},   (4)

where p_{ij}^{t} characterizes the probability of the system transition from state L_i to state L_j in time t (Fig. 2 and Fig. 3). C_{01} in the figures means the completion of the first level of difficulty by the student with the level of initial knowledge Q_0, etc.

Fig. 2. The dependence of the learning level L on time t and initial knowledge levels Q0 = 0.1 and Q1 = 0.5 for lessons of three difficulty levels C_i, for students with a low perception level.

Fig. 3. The dependence of the learning level L on time t and initial knowledge levels Q0 = 0.1 and Q1 = 0.5 for lessons of three difficulty levels C_i, for students with a high perception level.
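Equation (4) is a standard Markov-chain computation: the distribution over mastering states after t steps is the initial distribution multiplied by the t-th power of the transition matrix. The numpy sketch below illustrates this; the transition probabilities are invented for illustration and are not the values used in the authors' experiments.

```python
import numpy as np

# Illustrative transition matrix between mastering states L1, L2, L3 for one
# complexity level; the numbers are hypothetical, not the authors' data.
P = np.array([
    [0.6, 0.3, 0.1],   # from L1
    [0.0, 0.7, 0.3],   # from L2
    [0.0, 0.0, 1.0],   # from L3 (absorbing)
])

p0 = np.array([1.0, 0.0, 0.0])  # initial probabilities p_i(0)

def mastering_distribution(p0: np.ndarray, P: np.ndarray, t: int) -> np.ndarray:
    """Eq. (4): p_j(t) = sum_i p_i(0) * p_ij^t, i.e. p(0) times the t-step matrix."""
    return p0 @ np.linalg.matrix_power(P, t)

for t in (1, 3, 6):
    print(t, mastering_distribution(p0, P, t).round(3))
```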

Let the goal of the course be the maximum amount of learned material and the attainment of the maximum knowledge level Q within a defined training duration T and for a given list of key competences. The fitness function can then be determined as follows:

Q = \sum_{i=1}^{n} C_i L^{t}_{z_i k} \to \max, \quad \text{subject to } \sum_{i=1}^{n} t_i \le T, \quad L^{t}_{z_i k} > d_i,   (5)

M. Dutchak et al.

where is L tzik probable absorption degree of the i-th lesson of complexity k in time t, n is number of lessons, S i is degree of importance, d i . is minimum value of mastering level for the i-th lesson at which it can be passed through (Table 1). Table 1. Probabilistic parameters of the lesson for the student Lesson id

Z1

Z2



Zn

Initial level of knowlege

Q1

Q2



Qn

Complexity level

C1

C2



Cn

⎡ Absorbtion probability

L0 L0 ⎢ z11 z12 ⎢ L1 L1 ⎢ z11 z12 ⎢ ⎢ L2 L2 ⎣ z11 z12 L3z1 L3z1 1

2

L0z1



⎥ L1z1 ⎥ 3 ⎥ ⎥ L2z1 ⎥ 3 ⎦ L3z1 3

3



L0 L0 ⎢ z21 z22 ⎢ L1 L1 ⎢ z21 z22 ⎢ ⎢ L2 L2 ⎣ z21 z22 L3z2 L3z2 1

2

L0z2



⎥ L1z2 ⎥ 3 ⎥ ⎥ L2z2 ⎥ 3 ⎦ L3z2



3

3



L0zn L0zn ⎢ 11 12 ⎢ Lzn Lzn 2 ⎢ 1 ⎢ 2 ⎣ Lzn1 L2zn2

L0zn3



⎥ L1zn3 ⎥ ⎥ ⎥ L2zn3 ⎦

L3zn1 L3zn2 L3zn3

To optimize the fitness function, higher-order quantum genetic algorithm (QGA) was used. Quantum algorithms are often used for the resource-consuming computational tasks such as cryptography [11]. It allows global solution search with fast convergence and small population size [12]. To find a solution in QGA, the original state superposition is changed by sequential action of quantum operators over the evolution steps. All the information about the problem and the algorithm for its solution is embedded in the quantum gate, so its algorithm is crucial in the construction of QGA. In general, the operation algorithm of the quantum gate operator to the quantum chromosome consists of n quantum registers of size r, can be implemented as shown in Fig. 4. The μ is the algorithm parameter and its value within the range from 0 to 1 is determined by previous research [13]. Thus, each new generation provides an increase in the probability of event that the formed classical individuals will be more similar to the best ones as a result of the observations.

4 Results We present the results of the research in form of quality assessment of IOELS methods and software. Automated ALT constructing and support are the results of the interaction of all IOELS components. This process is the primary task of these systems, so the quality assessment of the ALT constructing module follows from the overall quality assessment of the system. The quality of IOELS can be assessed both by the numerical value of the individual output indicators (criteria, characteristics, metrics), as well as by a summary evaluation of the whole set of indicators, taking into account the rating of their importance [14].

Methods and Software Tools for Automated Synthesis

213

Fig. 4. The algorihtm of quantum gate operator.

The evaluation is carried out directly by the experts on partial criteria. The criteria is further combined with other partial estimates of this hierarchy level and thus generalized (intermediated) estimates of the higher level of hierarchy are obtained. Then, the merging on higher-level is carried out, and so on. This process is continued until the required final quality metric is obtained. The main task of IOELS is the automation and adaptation of the process of knowledge acquisition. Thus, the following quality indicators of IOELS (Table 2) were identified and evaluated in order to investigate the effectiveness of the proposed methods and the developed software. The H (high), M (medium) and L (low) ranks of software quality indicators are given in the Table 2. Quality Score values are unified and normalized by converting them into the range from 0 to 1. The developed IOELS was tested in real experiment within the scope of educational process at the Vasyl Stefanyk Precarpathian National University (city of Ivano-Frankivsk, Ukraine). The Software Engineering and Applied Mathematics students used the IOELS within the “Web Design” and “Web programming” courses. The research of such quality indicators as the relevance of synthesized ALT, the integrity of adapted learning material, the assessment of knowledge quality, and the friendliness and usability of the user interface (UI) was conducted by interviewing the students using a 100-point scale for each parameter. The quality of new knowledge absorption was evaluated by the split testing method. Statistical estimates were obtained based on the sample of 160 students’ educational results in 2018/19 academic year and the fall semester of 2019/20 academic year. The comparison was performed using Student’s t-test and Pearson’s chi-squared test [15]. Firstly, the assessment of the initial knowledge level and the degree of new material perception were performed. Secondly, the ultimate goal of course studying was set. After

214

M. Dutchak et al. Table 2. IOELS quality scores evaluation.

Characteristics

Subcharacteristics

Rating

Title

Code

Title

Value

Functionality

F

Relevance of the constructed ALT

0.96

H

Integrity of adapted learning material

0.9

H

Quality of knowledge assessment

0.94

H

Impact on absorption level

0.9

H

Ease of use

Efficiency

Reliability

Portability

U

E

R

P

Goal achievement

0.95

H

Forecast model adequacy

0.93

H

Usability and friendliness of user interface

0.9

M

Automation of the structuring and importing learning material

0.95

M

Applicability of teachers and experts

0.85

H

Applicability of technical means

0.84

M

System response time

0.83

M

Recoverability of data

0.99

H

Integrity of the knowledge base

0.99

H

Compatibility

0.99

L

that the three groups: A, B, and C were formed. The group A used distance learning system in a classical way. The members of group Group B used adaptive knowledge assessments only. The group C used IOELS with automated generating of the ALT taking into account an individual features and the pace of learning operations. We designed two two-group experiments. The initial average perception level, knowledge of the material, and the ultimate goal of course studying were approximately the same for members within the groups (p-value 0.79 for A and C groups and 0.97 for B and C groups with significance level of 0.05. So, we assumed the same distribution over all groups. Students of all groups were allowed to go through the practical works at their own pace. But for the course a date of final exam and completion of an individual final practical work have been set. The outcomes of the developed methods and software were analyzed using the results of these experiments. This analysis showed the stable and effective operating of the ALT module. The faster rates of practical works defensing, fewer requests for teacher assistance, and a higher level of grades were achieved by the

Methods and Software Tools for Automated Synthesis

215

students who have studied according to the synthesized ALT compared to students who have studied under traditional learning technology. The analysis of the obtained metrics showed that the quality of absorption (percentage of students who have grades from A to C (from 70 to 100 points in 100-point scale)) increased for 20% with the proposed method. The average value of absorption level increased for 6.7% for A and C groups (p-value = 0.047 at significance level of 0.05) and 5.2% for B and C groups (p-value = 0.029). Evaluation of the academic goal achievement P(Ga ) was performed by averaging Ga_i values, those can be expressed as follows:

sa_i , sa_i < sp_i Ga_i = sp_i , (6) 1, sa_i ≥ sp_i where sa_i – parameters vector for i-th student at end of the training, sp_i , – vector of personal goal parameters for i-th student. Parameter values are unified normalized. The best value is 1, the worst is 0. The adequacy of the academic achievements forecasting model was evaluated as mean absolute deviation of prior probabilities P sp_i and actual estimates of the goal achievement Ga_i :

n Ga_i − P sp_i (7) Ea = i=1 n where n is the number of students who completed the training with the developed system. The structuring and importing training material into the IOELS knowledge base is fully automated and requires no additional manipulation by the teachers or experts compared with traditional approach. The developed IOELS is aimed at minimizing the efforts of teachers to maintain the educational process and reduce the use of teacher work. The work of teachers involves creating educational content, quizzes, consulting, as well as evaluating completed practical problems. The maintenance of the educational process is automated. The intermediate and final grades are available for the teacher. The IOELS involves the complicated and time-consuming computations, which are associated with the uncertainty and high dimension and a large number of input and output parameters, and a large amount of the data processed, high requirements for the process quality and learning outcome. Thus, requirements for the technical computing capabilities are high enough. The response time of the system depends upon the hardware computing capabilities and the information channels capacity, as well as upon the complexity of the computational tasks and the amount of data processed. The integrity of the knowledge base and the data recovery are provided with the backup and electronic archiving of the data. The software modules are implemented with the PHP programming language using the PostgreSQL database management system. This allows deployment on any platform that supports PHP scripts execution and provides access to the PostgreSQL.

5 Discussion One of the main research methods was experimental evaluation of the level of goal achievement and technical parameters of the software. An experiment is considered to be

216

M. Dutchak et al.

the automated synthesis and selection of the optimal student-specific ALT, the passing the studies on this trajectory, the including of this ALT in the knowledge base, with indicating the level of goal achievement. Thus, on each passing of the course students refine the procedure of ALT synthesis taking into account accumulated knowledge. The developed IOELS is being improving and is “self-learned” from the student’s learning process.

6 Future Research A promising direction for further research is the improvement of the modules of ALT and IOELS in order to increase the quality of the educational process, obtaining the better usage resources. Also, the authors are planning to study the efficiency of the proposed methods on multidisciplinary international projects [16].

7 Conclusion A new method of automated ALT synthesis in IOELS is developed and tested. It is based on the higher-order quantum genetic algorithm, the case-based reasoning method, the self-study method, various empirical methods, and the production knowledge representation model. It provides the improvement of grades for 15% on the results of the individual final practical task and for 20% on the results of the final exam. Acknowledgment. This work has been supported by the MINDCRAFT AI LLC. The authors gratefully acknowledge the contributions of scientists of the Department of Information Technology of the Vasyl Stefanyk Precarpathian National University for scientific guidance given in discussions and technical assistance helped in the actual research.

Disclosures. The authors declare that there are no conflicts of interest related to this paper.

References 1. Roddy, C., Amiet, D., Chung, J., Holt, C., Shaw, L., McKenzie, S., Garivaldis, F., Lodge, J., Mundy, M.: Applying best practice online learning, teaching, and support to intensive online environments: an integrative review. Front. Educ. 2 (2017). https://doi.org/10.3389/ feduc.2017.00059 2. Rovai, A., Downey, J.: Why some distance education programs fail while others succeed in a global environment. Internet High. Educ. 13(3), 141–147 (2010). https://doi.org/10.1016/j. iheduc.2009.07.001 3. Apoki, U., Al-Chalabi, H., Crisan, G.: From digital learning resources to adaptive learning objects: an overview. In: Simian, D., Stoica, L. (eds.) Modelling and Development of Intelligent Systems. MDIS 2019, vol. 1126, pp. 18–32. Springer, Cham (2020). https://doi.org/10. 1007/978-3-030-39237-6_2 4. Bisikalo, O., Kovalenko, O., Palamarchuk, Y.: Models of behavior of agents in the learning management system. In: 2019 IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, pp. 222–227 (2019). https://doi.org/10. 1109/stc-csit.2019.8929751

Methods and Software Tools for Automated Synthesis

217

5. Ivanova, O., Silkina, N.: Competence-oriented model of representation of educational content. In: Proceedings of the 40th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2017, Opatija, Croatia, 22–26 May 2017, pp. 791–794. IEEE (2017). https://doi.org/10.23919/mipro.2017.7973510 6. Dutchak, M.: Methods and software of automated construction of adaptive trajectory of training. Visn. Vinnitsa Polytech. Inst. (2), 58–66 (2020). https://doi.org/10.31649/1997-92662020-149-2-58-66 7. Terzieva, T., Rahnev, A.: Basic stages in developing an adaptive e-learning scenario. Int. J. Innov. Sci. Eng. Technol. 5, 50–54 (2018) 8. Guevara, C., Aguilar, J., González-Eras, A.: The model of adaptive learning objects for virtual environments instanced by the competencies. Adv. Sci. Technol. Eng. Syst. J. 2(3), 345–355 (2017). https://doi.org/10.25046/aj020344 9. Ennouamani, S., Akharraz, L., Mahani, Z.: Integrating ICT in education: an adaptive learning system based on users’ context in mobile environments. In: Farhaoui, Y., Moussaid, L. (eds.) Big Data and Smart Digital Environment. ICBDSDE 2018. Studies in Big Data, vol. 53, pp. 15–19. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-12048-1_3 10. Tadlaoui, M., Aammou, S., Khaldi, M., Carvalho, R.: Learner modeling in adaptive educational systems: a comparative study. A Comparative Study. Int. J. Mod. Educ. Comput. Sci. 8(3), 1–10 (2016). https://doi.org/10.5815/ijmecs.2016.03.01 11. Iavich, M., Gagnidze, A., Iashvili, G., Gnatyuk S., Vialkova, V.: Lattice based merkle. In: CEUR Workshop Proceedings, vol. 2470, pp. 13–16 (2019) 12. Tkachuk, V., Kozlenko, M., Kuz, M., Lazarovych, I., Dutchak, M.: Function optimization based on higher-order quantum genetic algorithm. Electron. Model. 41(3), 43–58 (2019). https://doi.org/10.15407/emodel.41.03.043 13. Tkachuk, V.: Quantum genetic algorithm on multilevel quantum systems. Math. Probl. Eng. Article ID 9127510 (2018). https://doi.org/10.1155/2018/9127510 14. Kuz, M., Solovko Y., Andreiko, V.: Methodology of formation of generalized software quality criteria under uncertainty. Visnyk Vinnitsa Polytech. Inst. (5), 104–107 (2015) 15. Evans, M., Rosenthal, J.: Probability and Statistics: The Science of Uncertainty. W. H. Freeman (2009) 16. Akerlund, H., Audemard, G., Bollaert, H., Hayenne-Cuvillon, V., Hlobaz, A., Kozlenko, M., Milczarski, P., Monteiro, J., Morais, J., O’Reilly, D., Possemiers, P., Stawska, Z.: Project GGULIVRR: generic game for ubiquitous learning in interactive virtual and real realities. In: EDULEARN20 Proceedings, pp. 5973–5979 (2020). https://doi.org/10.21125/edulearn. 2020.1566

National University of Uzbekistan on the Way to the Smart University A. Karimkhodjaev(B) and M. Nishonov National University of Uzbekistan, Tashkent, Uzbekistan [email protected]

Abstract. The case study of e-learning evolution in National University of Uzbekistan (NUUz) is represented. The review of ICT implementation and Universities infrastructure creation is done. The analysis of projects directed towards e-learning is described. The results on virtual learning environment (VLE) creation process are reviewed. The future development prospects are discussed. Integrating the higher education system of Uzbekistan into the global educational space is noted, based on recent innovative educational reforms. Keywords: Higher education system · Educational space · National qualifications framework · European higher educational area · Bologna process · E-learning · Virtual learning environment (VLE) · LMS MOODLE · Digital competence

1

Introduction

The educational reforms carried out in the independent Uzbekistan as part of its approximation to the ideas of the Bologna Declaration, were aimed at building the Uzbek national education system that would be comparable to the educational systems of Western countries. As a result, the higher education (HE) system Republic of Uzbekistan (RUz) in its present form has become much closer to the ideas of the Bologna declaration. Namely, similar structure of the higher and postgraduate education has been implemented, and a rating system has been established for assessing and monitoring students’ knowledge, in line with the European Credit Transfer System. The accession of Uzbekistan to the Bologna process gives a new impetus to the modernization of higher professional education, and to the integrating of the Uzbek lifelong learning system into the world educational sphere. It opens additional opportunities for Uzbek universities to participate in projects funded by the European Union, and for students and staff of HE institutions to participate in academic exchange with foreign universities. Recently several important decisions were taken by the Uzbek governing bodies that will accelerate the integration of the national HE into the world educational space: c The Author(s), under exclusive license to Springer Nature Switzerland AG 2021  M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 218–229, 2021. https://doi.org/10.1007/978-3-030-66840-2_17

National University of Uzbekistan on the Way to the Smart University

219

1) A number of instructions was published by the Ministry of higher and secondary specialized education of RUz (MHSSE RUz) in the beginning of 2019, describing organizational measures for the Ministry of HE and HEIs that will contribute to the implementation of the Bologna principles in the national HE. Since the academic year 2020/2021, all of the HEIs of the RUz are switching to the credit-modular system. These events were reflected in the Decision of the President on the flagship status of the National University of Uzbekistan [1]. 2) Decree of the President of the RUz that established the concepts of development of the HE system until 2030 [2]. This document defined the strategic goals, the priority directions, the tasks and development stages of the HE for the medium and long term, setting the environment for the development of the industrial and educational policies. The basic principles defined in this decree include: • Maximum academic and economic independence to HEIs and their transition to self-financing; • Credit-modular system for the organization of the educational process in HEIs; • Academic (credit) mobility for students and teachers; • Educational standards and qualification requirements, under phased transition to NQF environment; • HEI ranking; • Creation of branches of foreign universities, double-degree and jointdegree programs; • International scientific and pedagogical cooperation. 3) Resolution of the Cabinet of Ministers RUz [3] on adoption the concept of National Qualifications Frameworks for Continuing education system of RUz necessary to make the national HE degrees comparable to foreign degrees, and one of the key components of the Bologna process [4,5]. The educational process of a modern university is difficult to imagine without the functioning of distance learning. The move into e-Learning requires understanding the technological requirements as well as the pedagogical requirements. These will depend on the nature of the e-learning course – which subject, blended or purely online, which level? Online learning can be used to support our faceto-face teaching, a mode which is often called blended learning. It can also be used as the only delivery mode, typically in the case of distance learning. There are many versions of online learning, and many different tools that can support delivery, as part of VLE. Although ordinary web pages can be used to deliver a complete online learning experience, this requires all participating staff to have a high level of technical expertise at producing advanced web pages. For most teachers, the use of the built-in tools and facilities of a VLE is much more convenient. The following stages of Web Integration into Teaching and Learning process of university are exists:

220

A. Karimkhodjaev and M. Nishonov

– Administrative information only; – Administrative information plus some supplementary course content and resources; – Some materials or activities essential to learning; – Extensive use for materials, discussion, collaboration and assessment; – Occurs entirely online (usually distance learning). During the last decade we work on application of ICT in the learning process at National university of Uzbekistan (NUUz). During this time, we managed to win several NATO, TEMPUS, Soros foundation grant in the area. As a result of enthusiasts’ efforts, we have built computer infrastructure of the NUUz, administer the network, provide Internet access. VLE NUUz is being created based on open recourse Moodle, but our experience of creating e-courses is lack. The limited resources and high cost of telecommunication services, especially broadband network connections, as well as the high cost of the corresponding equipment, the most advanced and sophisticated ways of distance learning, such as real-time video conferences, are not affordable for most universities of Uzbekistan yet, and therefore cannot be the core part of the e-learning services in the framework of today’s E-Learning projects. Moreover, Distance Learning cannot be the sole purpose of ICT application in education. The more appropriate paradigm is so-called “hybrid” form of distance education which combines more cost-efficient tools of e-learning and traditional forms of education (blended learning). Among these relatively inexpensive tools are e-mail, webbased and CD-based multimedia textbooks and training aids, synchronous and asynchronous testing systems, etc. Summarizing the above mentioned the following major directions of application and content development can be realized: – Virtual workspace creation for everyday life based on Intranet, communication and collaboration environment, both for educational and management purposes; – Electronic libraries and archives development; – Network-based e-learning solutions, both web-based online ones and less expensive off-line and asynchronous solutions; – Creation of Internet resources related to education, such as universities websites, file archives, directories, list-servers, etc.; – Resource and process management software for systematic planning, control and development of the learning process. Continuing the international collaboration for adoption and migration of most efficient and proven solutions within the frameworks of most suitable co-operation programs, like Tempus, ERASMUS MUNDUS and NATO Science-Committee. The importance of ICT for education is obvious. This fact is mentioned in the National Program of Personnel Training and the Governmental regulations stating that the development of information technologies and Internet are the most urgent tasks of development of science and education in Uzbekistan. The key role in this activity is to be played by the universities and other higher education institutions as catalyst in this development process.


In the light of these serious and substantial issues hindering ICT development in Uzbek universities, there are still opportunities. One of them is the international cooperation programs for academic and educational institutions, and NUUz is an illustration of this. The management and staff of the University are keen on developing ICT applications in education. This interest has extended from networking and infrastructure creation to e-learning and Virtual University projects. Below we discuss this evolution.

2 Review of ICT Infrastructure Creation at NUUz

From the very beginning, the development of the University in the ICT field was planned according to the Concept of the Corporate Network. The Concept was presented at the 2nd UNESCO Conference on Education Development in Uzbekistan held in Tashkent. Special attention was paid to the further expansion of technical and software capacity to allow on-line lecturing. The gradual construction of the technical infrastructure involved the following major milestones. The first fiber-optic university network in Uzbekistan was created within the framework of the TACIS-Tempus project UZBEKINFO. The pilot network experience was further disseminated to other universities of Uzbekistan during the implementation of another Tempus project, UZNANETU. As a result of these projects, the pilot education network of universities was created in Uzbekistan; we can say that these two Tempus projects created the technical infrastructure of NUUz. The UZBEKINFO project, accomplished in cooperation among NUUz, Fontys University of the Netherlands and the University of Central Lancashire (Preston, Great Britain), aimed at improving university management through the creation of a network based on fiber-optic channels between the three main buildings. Eight faculties and the administrative building of the University were covered by the network, and a Network Operation Centre (NOC) was established. Within the UZBEKINFO Tempus project the NOC was set up and, for this purpose, PCs, servers and communication equipment were purchased and system managers were trained. In the course of the project the top managers of the University also attended a number of training courses, among them: Comparative Study of the VET Systems of European States, ICT in Education, Quality Aspects in Education, and Project Development and Management. The effects of the project on the University were quite positive and these results were further developed. By the years 1999–2001 the first scientific and educational network of Uzbekistan – UZSCINET, http://www.uzsci.net – was already operational, and this allowed using its facilities for communication with the external world. The connection to UZSCINET gave advantages not only to Uzbekistan's higher educational institutions, which could contact Western ones, but also vice versa: the Western audience obtained the opportunity to become familiar with the Oriental mentality. NUUz received access to the Internet, and the educational and research network of Uzbekistan was enlarged by the campus network of NUUz, the largest of its kind. This was an important development, and NUUz, as well as other Tempus beneficiary organizations, became a major participant in the growing educational network.


An example of such cooperation is the joint accomplishment of the NATO Science Committee project UZUNINET ("Uzbekistan Universities Network") by the specialists of the Physics Faculty of NUUz (project co-director: Prof. Robert Janz from the University of Groningen, NL). The objective of the project was the construction of a unified virtual network of the universities of Uzbekistan. This process included the following stages. The initial connection media for all participating universities were Radio-Ethernet technology and leased lines; the full installation was successfully completed by the middle of 2002. During 2003–2004, in the second stage of the project, all buildings of NUUz and TSTU were connected via fiber-optic channels and the campus network was created, while the distant faculties were still using Radio-Ethernet over the node antenna located on the main administrative building of the University. By 2004, in the third stage of the project, the connection to UZSCINET had been installed via fiber-optic channels, and the Radio-Ethernet channel served as the service medium for educational institutions in the north-west of Tashkent. The next stage of UZUNINET was the creation of the Medical Campus network, in which all educational and clinical buildings of the Tashkent Medical Academy are connected by fiber optics and access to the Internet is accomplished via Radio-Ethernet. The infrastructure created within the framework of UZUNINET comprises the largest segment of UZSCINET by the number of users [6,7]. This networking work was also continued throughout the implementation of another Tempus project, the Uzbek National Network of Universities (UZNANETU). This project involved the joint activities of eight Uzbekistan universities combined with the efforts of FONTYS, NL and UCLAN, UK, and was directed at the dissemination of the best practice of the UZBEKINFO project in the eight universities of Uzbekistan. In contrast with the NATO UZUNINET project, this project had internal centralization and developed the internal networks (LANs) of the universities, including fully functional NOCs. The capacity of the universities was increased by the creation of connectivity and by staff development activities. As a result of this project, the eight beneficiary universities had well-trained and highly qualified ICT staff responsible for the sustainability of technical connectivity, and content was developed and offered to the stakeholders. The LAN and Internet connection comprise the major part of the ICT infrastructure, but in order to provide the most efficient use of these structures by students, faculty and staff for free Internet access, the creation of an Internet Public Access Site (Open Learning and Information Center, OLIC) was proposed. This activity was supported by the Open Society Institute Assistance Foundation – Uzbekistan (OSI AF) through its Internet Program. In the period 2000–2005 the following activities were accomplished by the staff of OLIC to achieve the objectives set: 1. Technical solution of Internet access and creation of the campus network 2. Development and implementation of the user DBMS 3. Implementation of the system of electronic communication


4. Establishment of the training process 5. Creation, implementation and management of electronic learning materials. In the indicated period, over 1600 users underwent training in the Centre. Some of the statistics are shown below (Figs. 1 and 2).

Fig. 1. Participants of internet training course.

Fig. 2. Internet users of NUUz.

We can say that the establishment of this type of center in Uzbekistan was a breakthrough, as it combines several functions and provides the widest range of services to all levels of the academic and scientific community of the University. In 2009 the LAN of NUUz was modernized in the framework of the NATO project "e-Workspace", and all distant buildings were connected via single-mode fiber-optic channels (Fig. 3).


Fig. 3. One mode fiber-optic channels of NUUz.

3 Results and Outcomes: E-Learning Environment

The education quality issues were addressed in the UNIQUM project within the framework of the Tempus program, and special software named SAMMER was developed for knowledge support and control as well as for student registration and administration. SAMMER consists of four main components: a knowledge support system, a knowledge testing system, student registration and university administration. The workplaces of the student, administrator, administration officer and lecturer are available depending on the profile of the user and the allowed access levels. SAMMER was programmed exclusively for NUUz, is implemented with open-source web technologies and PHP, and runs on an Apache server. Users access the resources offered by the system with an Internet browser, which makes access very simple and easy to learn. The software follows the framework of the education programs and curricula of Uzbekistan and was developed according to the requirements of the National Program of Professionals Training. This makes the software very convenient for use in the universities of Uzbekistan. The project automated the collection of information on the main parameters of the education process, which made it possible to gauge the quality of education in various elements of learning. The logical continuation of the quality management work was the issue of e-resources management at the University. For this purpose, the project named UNIQERM was submitted to Tempus and approved, and an e-resources management system is being developed and implemented. The contents of the network should be enriched by localized learning materials that are pedagogically and technologically adapted for transmission and delivery over virtual media. Provided the quality of these materials meets the requirements, the issue of resources management will arise.


All the learning materials will comprise the e-learning resources, and the problems related to their timely delivery, everyday management and control of copyrights will be resolved by the e-resources management system. The well-known LMS Moodle was implemented as such a system within the framework of the UNIQERM project [8]. Electronic learning management systems (LMS) make it possible to automate an extensive list of functions for administering the learning process and delivering e-education; Moodle is among the most common and popular LMS. Moodle, like any other learning management system, possesses all the basic capabilities of commercial systems and provides a number of additional features that stem from its initial pedagogical orientation towards the active involvement of students in the learning process. The documentary portal is a simple, fast and effective tool for accessing printed and electronic documents. The end user accesses, via a single interface with simultaneous querying, heterogeneous documentary sources (catalogues, databases, reviews, theses and electronic books, multimedia documents). After consolidation, personalization and localization we now have one portal running at NUUz, including several services. The services at the moment include communication tools (Internet, Intranet, mail, chat, document exchange, etc.); administrative tools (SAMMER QAS, OAS, etc.); and pedagogical tools (Moodle, etc.). As can be seen, the services may be open-source resources as well as commercial software. The software for quality assurance (SAMMER) developed at the University is also integrated into the portal. The next step is to add more courses and disseminate them to students and teachers.

4 Why LMS MOODLE as VLE

Moodle is an acronym for Modular Object-Oriented Dynamic Learning Environment. The open-source software offered by Moodle is designed around constructivist pedagogy. Moodle software is free to the user and is copyrighted under the GNU Public License, allowing the user freedom in copying, modifying and sharing. The software is compatible with both Windows and Mac operating systems and many Linux distributions. Moodle is not a company but rather a collaborative project organization. It is a technological mosaic of activity modules that can be customized as deemed necessary by anyone involved in the learning community. The developers of Moodle include an in-house team in collaboration with a worldwide professional network. This collaborative group continually creates and writes a variety of modules and plug-ins, and the developers meet in online forums and rooms. Documentation that includes software specifications, brainstormed ideas, implementation procedures, necessary standards for use and guidelines can be found on Moodle Docs. The Moodle Roadmap site provides the latest additional features and update information, and users can find out about both negative and positive development issues on Moodle Tracker. The Moodle learning community consists of a collaboration of participants found in wikis and blogs and/or participating in forums and events.


The virtual, global community is measured by registered users who log onto the various sites. This community communicates by postings in forums within "courses" that are accessed through free enrollment. Events, or MoodleMoots, happen in posted rooms where conferencing occurs. To ensure the continual operation of this open-source LMS, Moodle Partners, a worldwide group of service companies, have committed financial backing and support services to the cooperative effort. This group underwrites and provides technical support to the sites that supply needed information to Moodle users. Moodle is among the most user-friendly and flexible free open-source courseware products available worldwide. Moodle is a VLE that lets teachers provide and share documents, assignments, quizzes, forums, chats, etc. with students in an easy-to-learn and user-friendly interface [9]. Moodle is a CMS designed to help educators who want to create quality online courses. It has excellent documentation, strong support for security and administration, and is evolving towards Information Management System/Shareable Content Object Reference Model (IMS/SCORM) standards. Moodle has strong development and a large user community, and users can download and use it on any computer they have at hand. Currently, Moodle has a large and diverse user community with over 1,077,969 users on its site, speaking 86 languages in 112 countries around the world [10]. We list here the most important reasons for choosing this package:
1. Moodle is OSS, which means users are free to download it, use it, modify it and even distribute it under the terms of the GNU license;
2. Moodle is a CMS and VLE, and lets teachers provide and share documents, graded assignments, quizzes, discussion forums, etc. with their students in an easy-to-learn manner and create quality online courses;
3. Moodle can be used on almost all servers that can run PHP;
4. The key to Moodle is that it has been developed with both pedagogy and technology in mind. One of the main advantages of Moodle over other systems is its strong grounding in social constructionist pedagogy with good educational tools;
5. It works well with languages and is currently being used in 86 languages in 112 countries;
6. Users can download and use Moodle on any computer they have at hand;
7. It has excellent documentation, strong support for security and administration, and is easy to upgrade from one version to the next;
8. It has many user-friendly features such as easy installation, customization of options and settings, good support/help and good educational tools;
9. It demonstrates the use of OSS in creating a high-quality e-learning environment that incorporates many other subjects;
10. Moodle is the LMS most often recommended of all the OSS packages, as well as being the most popular;
11. The credibility of Moodle is very high: at present, 52289 web sites from 193 countries have registered with it;


12. The importance of Moodle lies in its good reputation, as shown by reports, its degree of adoption in the community, the number of places where it is used, the languages available, etc.;
13. Moodle can be used in conjunction with other systems. It keeps all files for one course within a single, normal directory on the server. Administrators can provide seamless forms of file-level access for each teacher, such as SMB, FTP, and so on. Currently, more features are planned for future Moodle versions, such as export and import of data using XML that can be integrated visually into other web sites. In addition, a good solution for this integration has been presented, enabling more VLEs to work together by using web services and related techniques (Al-Ajlan et al., 2008); a minimal sketch of such a web-service call is given after this list;
14. Moodle runs without modification on Unix, Linux, Windows, Mac OS X, Netware and any other system that supports PHP;
15. Data is stored in a single database: MySQL or PostgreSQL are best, but Oracle, Access, Interbase, ODBC and others are also supported.
Some universities integrate Moodle with other VLE products; for example, Oxford University has integrated two OSS learning environments, Bodington VLE and Moodle, although they are slightly different from each other.
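To make the web-services integration mentioned in item 13 more concrete, here is a minimal sketch of how an external application (for example, a university portal like the one described in Sect. 3) could query a Moodle site over its REST web-service interface. The site URL and token below are hypothetical placeholders; the endpoint, the wstoken/wsfunction parameters and the core_course_get_courses function are standard Moodle web-service facilities, but error handling and capability checks are simplified here.

```python
import requests

MOODLE_URL = "https://moodle.example.edu/webservice/rest/server.php"  # hypothetical Moodle site
TOKEN = "YOUR_WEBSERVICE_TOKEN"  # issued by the Moodle administrator for a web-service user

def call_moodle(function, **params):
    """Call a Moodle web-service function over REST and return the parsed JSON reply."""
    payload = {
        "wstoken": TOKEN,
        "wsfunction": function,
        "moodlewsrestformat": "json",
        **params,
    }
    response = requests.post(MOODLE_URL, data=payload, timeout=30)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Retrieve the course records the token is allowed to see,
    # e.g. to mirror the course list in an external portal.
    courses = call_moodle("core_course_get_courses")
    for course in courses:
        print(course["id"], course["fullname"])
```

Such calls let a portal pull course lists or contents from Moodle without direct database access, which is one way the "use in conjunction with other systems" point can be realized in practice.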

5 Conclusion

Because of limited resources and the high cost of telecommunication services, especially broadband network connections, as well as the high cost of the corresponding equipment, the most advanced and sophisticated modes of distance learning, such as real-time video conferences, are not yet affordable for most universities of Uzbekistan, and therefore cannot be the core part of the e-learning services in the framework of today's e-learning projects. Moreover, distance learning cannot be the sole purpose of ICT application in education. The more appropriate paradigm is the so-called "hybrid" form of distance education, which combines more cost-efficient tools of e-learning with traditional forms of education. Another issue requiring attention is the institutionalization of distance learning practices with a corresponding legal basis behind it. This touches upon political solutions and managerial concern about the outcomes of combining ICT and education. Summarizing the above, the following major directions of application and content development can be mentioned: virtual workspace creation for everyday life based on an Intranet, as a communication and collaboration environment for both educational and management purposes; electronic libraries and archives development; network-based e-learning solutions, both web-based online ones and less expensive off-line and asynchronous solutions; creation of Internet resources related to education, such as university websites, file archives, directories, list-servers, etc.; and resource and process management software for systematic planning, control and development of the learning process.


While these components are now being developed mostly independently of each other, they can in fact be easily integrated into an efficient virtual learning space within the university intranet because of their web-oriented nature. As noted above, in the light of the innovative transformations in the field of HE, the flagship status calls on NUUz to take appropriate measures: to modernize the infrastructure of the VLE, improve the qualification of the teaching staff, and retain highly qualified technical staff. All of this together affects the quality of the created educational materials. Unfortunately, during the transition period it was not possible to maintain the quality of education, mainly due to financial difficulties. The widespread forced transition to distance learning due to the COVID-19 pandemic exposed these shortcomings in an obvious way (http://webdars.nuu.uz/). To become a smart university, NUUz still has a lot to do. We are ready to cooperate with HEIs within the framework of projects of international donor organizations in the implementation of the following tasks: equipping education and training systems to face the challenges presented by the recent sudden shift to online and distance learning, including supporting teachers in developing digital competences and safeguarding the inclusive nature of learning opportunities.

References 1. President of RUz: Decision No. PP-4358 of the President of the Republic of Uzbekistan "On measures to radically improve the training system of required qualified personnel and develop scientific potential at the National University of Uzbekistan named after Mirzo Ulugbek in 2019-2023" dated 17 June 2019 (2019a). https://lex.uz/pdfs/4380626. Accessed May 2020 2. President of RUz: Decree of the President of the Republic of Uzbekistan No. UP-5847 "On approval of the Concept of development of the higher education system of the Republic of Uzbekistan until 2030" dated 8 October 2019 (2019b). https://lex.uz/ru/docs/4545887. Accessed May 2020 3. Cabinet of Ministers of RUz: Resolution No. 287 "On measures to organize the National system of development of professional qualifications, knowledge and skills in the RUz" dated 15 May 2020 (2020). https://lex.uz/docs/4814154. Accessed May 2020 4. Imamov, E., Khodjaev, A., Karimkhodjaev, A.: Guidelines on the formation of the NQF CES RUz, Nodirabegim, Tashkent, 32 p. (2019a). ISBN 978-9943-5222-6-8. https://ec.europa.eu/programmes/erasmus-plus/project-result-content/1defadc8-d204-43cc-87f2-1d14defa8125/Guidelines NQF final en.pdf. Accessed May 2020 5. Imamov, E., Khodjaev, A., Karimkhodjaev, A.: National qualifications framework of the continuing education system of the Republic of Uzbekistan. Basic Regulations, Nodirabegim, Tashkent, 95 p. (2019b). ISBN 978-9943-5222-5-1. https://ec.europa.eu/programmes/erasmus-plus/project-result-content/af62df3c-7888-4f6d-8a92-370ed92c4658/General Regulations NQF final eng.pdf. Accessed May 2020 6. Karimkhodjaev, A., Garnov, S., Norboev, T.: Implementation of the information technologies on educational process in NUUz. In: Contribution, Second International Workshop on "Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications" (IDAACS 2003), 8–10 September 2003, Lviv, Ukraine (2003)


7. Karimkhodjaev, A., Garnov, S., Norboev, T.: Towards creation of virtual learning environment in Uzbekistan universities. In: First International Congress on Higher Education: Perspectives on University Education in the 21st Century, Fatih University, 27–29 May 2004, Istanbul, Turkey (2004) 8. Karimkhodjaev, A., Akramov, S., van Zantvoort, G.: E-learning: from informatics to digital university. In: Contribution Papers: Tempus III in Uzbekistan, Tashkent, Uzbekistan, pp. 180–188 (2007) 9. Dougiamas, M.: Moodle, 17 June 2011. www.moodle.org 10. Al-Ajlan, A., Zedan, H.: Why Moodle. In: Proceedings of the 12th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS), pp. 58–64. IEEE Press, Kunming (2008)

Smart Pedagogical Knowledge Management Model for Higher Education

Meriyem Chergui1(B), Aziza Chakir1, Hajar Mansouri1, and Adil Sayouti2

1 Hassan II University, Casablanca, Morocco
[email protected]
2 Royal Naval School, Casablanca, Morocco

Abstract. Thanks to technological advances, research on adaptive hypermedia and the rise of big data, personalized training is shifting into second gear with smart and adaptive learning. This new avenue aims to generate in real time, for each learner, the learning path most likely to enable them to achieve their objectives. An intelligent adaptive learning system provides information that, when used wisely, can help the university identify which knowledge and skills it would benefit from developing in students, and it allows the impact of training investments to be assessed. Thanks to the data it collects on the learning path, this system makes it possible to know whether the efforts invested in training are yielding the desired results or whether it is necessary to adjust the course. It also builds a culture of development: since this training system is at the cutting edge of technology, it allows new tools to be acquired without wasting time. In this paper, a smart education knowledge management system adapted to the Moroccan university is proposed, based on knowledge management and artificial intelligence, to cover the specificities of the learning environment. Keywords: Smart education · Adaptive learning · Knowledge management · Pedagogical engineering · E-learning · Artificial intelligence

1 Introduction

At the university, as in any organization, each teacher has his own knowledge and develops skills in his field, as either tacit or explicit knowledge to be shared with learners. However, depending on the nature of the knowledge, and given the peculiarity of the pedagogical flow, the goal of creating collective knowledge for smart education is not easily attainable. During a retirement, a transfer, a change of position or any other event of academic life, knowledge may be lost if it is not formalized. Moreover, educational management has two interdependent and complementary functions: a didactic function and an educational function. The didactic function concerns learning management in the taught subject, while the educational function concerns everything related to classroom management. Several parameters must be taken into account, namely learning planning, the adopted approaches and pedagogies, the choice of learning activities, the methodology and the presentation material. Pedagogical and didactic management is based on the principle according to which the learner must be at the heart of learning. Hence the need to contextualize learning, starting from relevant problem situations that are meaningful for learners, and to involve the learner in the problem situation by taking his prerequisites into consideration and by creating the conditions necessary to promote exchanges and interaction. In this article, educational design and curriculum development are highlighted as two main applications of pedagogical knowledge. The main question this article answers is how pedagogical knowledge can be managed to improve the curriculum and to design a smart education system with reusable learning objects according to students' and teachers' needs. To answer this question, an empirical study was held to evaluate the actual learning system and to question students, as final users, about what they expect. Hypotheses were made about the main functionalities a smart pedagogical system should have in order to manage higher educational knowledge and to satisfy students' and teachers' daily needs. As a result, a pedagogical knowledge management model for smart education is proposed to solve both educational and curriculum problems via artificial intelligence and knowledge management assets. The proposed model was tested by 100 students in a COVID-19 distance learning experience and their feedback was positive. This paper is organized as follows. Section 2 provides the literature review of both pedagogical knowledge management and smart education, and Sect. 3 presents an empirical study of Moroccan university expectations about smart education. The proposed model is discussed in Sect. 4, before the conclusion and perspectives for future work.

2 Smart Education and Knowledge Management Basics

2.1 Smart Learning Systems

Mobile learning is an innovative method currently used for employee training as a smart learning approach [2]. In fact, recent e-learning standardization focuses on the reuse of learning material and functions [3] to make an e-learning system a collection of activities putting learners in interaction with courses and quizzes as resources. Furthermore, in order to identify a learner's current level of understanding [1], online courses mainly develop creativity and cognitive and emotional support [4]. One of the main goals of this new educational research discipline is the development of a new generation of intelligent tools for online learning using recommender systems. A recommender system (RS) in e-learning is a system that assists learners in discovering convenient learning actions that match learners' profiles. It is also able to define the best time, context and way to keep them motivated enough to complete their learning activities in an effective and efficient way. [5] presented many key features of smart education and discussed smart competences. [6] focused on contemporary universities' applications of smart ICT in education. In [7], the authors propose a detailed study of the role of universities and MOOCs in the smart education context. As for [8], the authors introduced outcomes-based education as a part of the smart education paradigm. In [9], instructional design and cognitive science take into consideration learning problems related to material, communication and cognitive competence as resource matters. In [10], the authors deal with smart learning and smart e-learning problems. Many publications analyze different possible ways of implementing the smart learning concept and design models and schemes of smart educational systems [11]. In [12], the authors discuss a conceptual model of the ICT infrastructure of smart educational and e-learning systems as far as databases, standards, learning gadgets and equipment are concerned. The authors in [13] analyzed organizational aspects of smart education, such as smart learning strategies and educational paths.

2.2 Pedagogical Knowledge Management

Definitions

Pedagogical knowledge management was first proposed in [14]. The authors in [15] and [16] also discussed the management of pedagogical knowledge and its impacts. They see knowledge management as a discipline that allows the use of available knowledge for the strategic development of an educational process. In [15], the term "pedagogical knowledge management" was used for the diagnosis of the knowledge management information system of teachers at a technical school. Pedagogical knowledge management is defined as the management of knowledge about the educational process in its different quality aspects: teaching, classroom management, learning experiences and content evaluation. According to [17], academic performance can be improved via the management of pedagogical knowledge. Indeed, knowledge management for university education is widely discussed by the research community, which considers the university as a business where all concepts and principles of enterprise knowledge management are used. Every piece of knowledge in the educational system is pedagogical knowledge, and its management is a necessity to implement and improve smart education in an easy way for both students and teachers, according to [18].

Pedagogical Knowledge Management Applications

In the literature, there are many applications of pedagogical knowledge management, namely:
• Lesson study process: lesson study is a way to improve educational standards and the quality of classroom instruction. It is a process for developing teachers' knowledge and the quality of education in primary and higher education. In the case of higher education, lesson study makes it possible to develop the skills of teachers to improve the quality of education. In this configuration, the institution and all its stakeholders collaborate and interact around the assessment, planning and handling of any problems in class. Also, the Nonaka and Takeuchi knowledge management model, which establishes the relationship between tacit and explicit knowledge, was used to promote pedagogical knowledge. The authors in [19–21] show how lesson study contributes to improving trainings in general and education management models.
• Lifelong learning: lifelong learning is the ability to continuously acquire knowledge and skills throughout life via neurocognitive mechanisms which together contribute to the development and specialization of sensorimotor skills as well as the consolidation and long-term recovery of memory. The authors in [29] define and analyze this application from different points of view, namely technical and pedagogical.


• Teachers' professional development and promotion of professional qualifications: teachers are the main source of knowledge. This source should be continually fed and updated in both a quantitative and a qualitative way. In this perspective, teachers' professional development and qualifications should be treated as a sensitive strategic decision. Pedagogical HR management has been treated in depth by the authors in [22] and [23].
• Curriculum development: in this process, teachers play an essential role in planning and implementing programs. In higher education, the teacher is the ultimate master of the situation. Moreover, curriculum planning is an important component of the pedagogical knowledge required for the identification and analysis of the needs of existing programs and of programs to be implemented. There are several ways to develop the curriculum, such as the planning process in [24], curriculum design and reactive curriculum in [25] and [26], curriculum assessment in [27], and curriculum development in [28].

3 How to Manage Learning Knowledge?

In this article we deal with learning knowledge in its educational design and curriculum development aspects, where educational knowledge is managed like the business and technical knowledge of a company in a dynamic and changing environment. This vision corresponds to smart learning insofar as the educational memory of the school or faculty represents the basis of an effective and efficient educational system. According to the related works, there are three essential levels to design and implement a smart education system: (1) the organizational dimension, (2) the ICT dimension and (3) the educational outcomes. In this work the following questions are answered: How to design an educational strategy that deploys ICT correctly in order to obtain a positive educational outcome? How to manage pedagogical knowledge so as to ensure its correct and effective use independently of the human factor? And how to ensure the performance of a smart learning and e-learning platform compared to international quality learning systems? In a previous article (accepted for publication), an empirical study was conducted (1200 students and 100 teachers from Moroccan universities were questioned) to collect the expectations of higher education students in Morocco about smart education systems, in order to have an overview of the real context, its particularities and its constraints. The main results obtained from this empirical study are clear ideas about students':
• Satisfaction and feedback about actual training contents, tools and methodologies.
• Training needs: e-learning platform, smart education, innovation projects.
• Expected learning performance: conditions and constraints.
• Expectations about university e-learning platforms.
• Ways of communication with teachers and professionals.
• Smart learning evaluation mode and certification.
• E-learning motivation.
• Additional training supports.


4 Proposed Solution

Nowadays, higher education has an essential contribution to make to the creation of highly creative and innovative human resources. It is therefore necessary for higher education establishments to invest in the design of teacher training in order to exercise new functions in the evolving smart education and learning systems. Furthermore, they must have sufficient resources to provide student-centered education and prepare students for labor market integration. Therefore, institutions should maintain an advanced and up-to-date knowledge base and stimulate research and innovation. The pedagogical knowledge management system is a knowledge transfer solution which emphasizes the ability to apply knowledge to real-world problems (industry, technology, etc.). It allows the university to make decisions regarding the strategic development of the educational process. In this article, the proposed architecture consolidates and manages the pedagogical knowledge of a teacher in one or more courses or trainings he provides. The main objective is to preserve this knowledge, which is otherwise lost with the departure of the teacher or his cessation of the course, because a course is not only the slides and the support given to the students; it is also the explanations, the examples, the parallel activities (exercises, research work, projects and mini-projects), and the knowledge, know-how and competence of each teacher in the classroom and beyond the course. Indeed, it is a very recurrent problem in faculties and engineering schools that the same course is provided by several teachers, but one teacher manages to transmit this knowledge to students better than the others. As a result, students master this discipline, are motivated to go into more detail in this area and, especially, make a good impression on the business world. A link between knowledge engineering and pedagogical engineering is essential to capitalize on pedagogical knowledge. In a preceding article [26] we proposed an ICTE-driven pedagogical engineering meta-model of the Moroccan university. This model breaks down each course into a set of concepts and prerequisites governed by a generic approach that can be detailed by several approaches specific to each part of the course or each key concept. A specific approach has implementation phases carried out by the main actors, namely the student, the teacher and the company. Each actor performs one or more activities that are either educational activities or support activities. These activities take place in a learning-object environment and/or online service (Fig. 1). The proposed model is made of three levels:
• The interactive system, which contains teacher and student interfaces to create courses or ask questions.
• The pedagogical engineering engine, which decomposes a course into learning objects according to [26].
• The knowledge management system (KMS), which capitalizes the pedagogical knowledge through courses as a service. The KMS is connected to the educational memory of a faculty or school as an educational knowledge base.
The aim of this solution is to implement the pedagogical engineering model [30] in a smart learning way. With the three levels described before, the learning KMS is able to manage the strategic dimension and the educational outcomes as well.


Fig. 1. Smart education proposed model

In fact, the involvement of the student and the teacher ensures the dynamics of the system and avoids overly theoretical courses which do not really serve the training of the student. The pedagogical engineering engine of the processing system guarantees the conformity of the courses made available to students. It obliges the teacher to align himself with a complete pedagogical model which breaks down a course into concepts and techniques. Each concept is linked to dedicated educational approaches, activities and learning objects; this hierarchy ensures the quality of the course and makes it possible to assess the educational outcomes (a hypothetical data-model sketch of this decomposition is given after Fig. 2 below). The use of information and communication technology is also favored in this system: the teacher and the student are connected to a web/mobile platform. At any time, the teacher can design a course and the student can consult one. The system keeps track of the educational operations carried out, and manages the profiles and the reusable educational objects as well. The use of artificial intelligence to manage pedagogical knowledge is a technical and architectural answer to the problem: artificial intelligence replaces human assistance at several levels of this so-called "smart" educational system. The use of chatbots to interact in real time with users and answer their questions is a major advantage of the solution. The teacher can easily design his course based on an educational engineering model. He can also modify it, evaluate it in relation to the courses existing in his establishment, and draw inspiration from existing reference courses on the web. The use of an ontology for understanding the concepts of the course and their correspondences is also a strong point of the solution, insofar as the knowledge management system will be able to easily identify the entities and the relationships. On the other hand, the service-oriented aspect (course as a service) of the platform ensures easy handling and updating. Our pedagogical knowledge management system proposes a smart and easy use of educational computer systems insofar as the course design is done in a guided manner following the educational meta-model. A course can also be used on-site or remotely by a student requested by the job market. One last question that the scientific community may ask us is how general our results are, and why they should be of interest to everyone. According to Professor Hermann Maurer in his article "Problems and solutions for the use of computers (networks) for education" [7], e-learning configurations have succeeded in many cases, but those aimed at completely replacing ordinary teachers have failed. Hence the importance


of pedagogical knowledge management on the one hand, and of coping with the unavailability of the teacher in the smart learning system on the other hand (two points proposed by our model).

Fig. 2. Pedagogical engineering engine
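To make the decomposition performed by the pedagogical engineering engine (Fig. 2) more concrete, here is a minimal, hypothetical data-model sketch of the meta-model described above (course → concepts → approaches → activities → reusable learning objects). The class and field names are illustrative assumptions, not the authors' actual implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LearningObject:
    """A reusable resource (slide deck, exercise, project brief, quiz, ...)."""
    title: str
    kind: str          # e.g. "exercise", "project", "quiz"
    url: str = ""

@dataclass
class Activity:
    """A pedagogical or support activity carried out by an actor."""
    actor: str         # "student", "teacher" or "company"
    description: str
    objects: List[LearningObject] = field(default_factory=list)

@dataclass
class Concept:
    """A key concept of the course, with its prerequisites and approach."""
    name: str
    prerequisites: List[str] = field(default_factory=list)
    approach: str = "generic"   # specific pedagogical approach, if any
    activities: List[Activity] = field(default_factory=list)

@dataclass
class Course:
    """A course as stored in the pedagogical knowledge base (course as a service)."""
    title: str
    teacher: str
    concepts: List[Concept] = field(default_factory=list)

    def reusable_objects(self) -> List[LearningObject]:
        """Collect every learning object so it can be reused in other courses."""
        return [obj for c in self.concepts for a in c.activities for obj in a.objects]
```

Storing courses in some such structured form is what allows the KMS to index concepts through an ontology and to serve reusable learning objects independently of any single teacher, which is the point the model makes about preserving pedagogical knowledge.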

Otherwise, for the new generation of educational computer systems, according to Khalil and Ebner (2016) and Ebner and Harmandic (2016), major course repositories should be stored and each educational institution has to make them available for both students and teachers, as in the proposed education knowledge base. This is why it is necessary to have a pedagogical knowledge management system for every institution, and the real challenge of educational computer systems is to describe the maximum of learning situations in order to find the right combination of approaches for each situation, as the proposed pedagogical engineering engine does (see Fig. 2). There is therefore not one ideal solution for an educational computer system, but rather an ideal configuration for each situation according to a variety of scenarios and possibilities. The proposed knowledge management model was implemented in Moodle with an AI plugin developed by the authors, and communication was held through three e-learning platforms, namely Google Classroom, MS Teams and Google Meet. It was tested, in its first version, with engineering school students (industrial engineering and computer science students): a total of 100 students, 3 learning groups and 3 learning subjects, "Knowledge Management", "Software Quality Assurance" and "UML". Trained students were satisfied, as shown in Fig. 3. Meanwhile, their results were significantly better than in other school subjects where the model was not applied (the new model was tested from March 2020 to September 2020; for the other school subjects a classical learning system was adopted). Also, their involvement in the end-of-training project was 60% higher than in other school subjects. In Fig. 4 we compare the same learning group in three school subjects: UML, where the new model was adopted, and Java and Process Modeling, where the classical model was used:


Fig. 3. Pedagogical engineering engine

Fig. 4. Students’ results comparison with and without the proposed model

5 Conclusion

In this article, a pedagogical knowledge management model for higher education is proposed. This model deals with the capitalization and consolidation of pedagogical knowledge for the purpose of smart education. It covers the process from training the teacher to training the student through a teaching model based on reusable teaching objects. After a presentation of smart education and a state of the art of pedagogical knowledge management, the scientific work on all its applications is detailed. To respond to the problem of the best use of pedagogical knowledge according to intelligent education and adaptive learning measures, a pedagogical knowledge management model is proposed. Thanks to this solution, pedagogical knowledge is managed and improved by teachers as well as from online learning platforms on the Web, according to a meta-model of pedagogical engineering. This approach ensures the quality and updating of scientific content in a simple manner, as well as the reuse of courses and educational objects, to meet the expectations of students and improve the quality of higher education.

References 1. Raghunathan, V.S., Sathya, A., Devi, N.: Design and analysis of agent architecture for dashboard platform domain services. Adv. Nat. Appl. Sci. 11(8), 424–433 (2017)


2. Mittal, N., Chaudhary, M., Alavi, S.: An evaluative framework for the most suitable theory of mobile learning. In: Managing Social Media Practices in the Digital Economy, pp. 1–24. IGI Global (2020) 3. Shafrir, U.: Meaning equivalence reusable learning objects (MERLO) access to knowledge in early digital era and development of pedagogy for conceptual thinking. In: Pedagogy for Conceptual Thinking and Meaning Equivalence: Emerging Research and Opportunities, pp. 22–53. IGI Global (2020) 4. Mueller, S.: The mature learner: understanding entrepreneurial learning processes of university students from a social constructivist perspective. Doctoral dissertation (2020) 5. Cui, Y., Zhang, H., Liu, S.: Research on teacher teaching competence (TTC) model under smart education. In: 2019 International Joint Conference on Information, Media and Engineering (IJCIME), pp. 38–41. IEEE, December 2019 6. Xi, J., He, W.: Research on innovation and practice ability of computer major students in contemporary universities. In: 4th International Conference on Education, Management, Arts, Economics and Social Science (ICEMAESS 2017). Atlantis Press, December 2017 7. Cabral, P., Paz, J., Teixeira, A.: The impact of a research-based institutional strategy for opening up educational practices: the case of the MOOC-maker project. In: Project and Design Literacy as Cornerstones of Smart Education, pp. 53–65. Springer, Singapore (2020) 8. Das, D.K., Mishra, B.: Exploring the complementarity of problem based learning with outcomes based education in engineering education: a case study in South Africa. In: 7th International Research Symposium on PBL, p. 297 (2018) 9. Verschaffel, L., Van Dooren, W., Star, J.: Applying cognitive psychology based instructional design principles in mathematics teaching and learning: introduction. ZDM 49(4), 491–496 (2017). https://doi.org/10.1007/s11858-017-0861-9 10. Truong, H.M.: Integrating learning styles and adaptive e-learning system: current developments, problems and opportunities. Comput. Hum. Behav. 55, 1185–1193 (2016) 11. Gros, B.: The design of smart educational environments. Smart Learn. Environ. 3(1), 1–11 (2016). https://doi.org/10.1186/s40561-016-0039-x 12. Uskov, V.L., Howlett, R.J., Jain, L.C. (eds.): Smart Education and E-learning, vol. 99. Springer, Heidelberg (2018) 13. Daniela, L., Chui, K.T., Visvizi, A., Lytras, M.D.: On the way to smart education and urban society. In: Knowledge-Intensive Economies and Opportunities for Social, Organizational, and Technological Growth, pp. 1–11. IGI Global (2019) 14. Lezina, O.V., Akhterov, A.V.: Designing of the information component of pedagogical knowledge management system in a chair of technical university. In: 2013 International Conference on Interactive Collaborative Learning (ICL), pp. 544–546. IEEE, September 2013 15. Xu, J., Hou, Q., Niu, C., Wang, Y., Xie, Y.: Process optimization of the university-industry-research collaborative innovation from the perspective of knowledge management. Cogn. Syst. Res. 52, 995–1003 (2018) 16. Yang, S., Liu, Y., Liang, M.: Teachers' personal knowledge management tools and application strategies exploration based on the SECI model. In: 2018 International Joint Conference on Information, Media and Engineering (ICIME), pp. 341–346. IEEE, December 2018 17. Volegzhanina, I.S., Chusovlyanova, S.V., Adolf, V.A., Bykadorova, E.S., Belova, E.N.: Knowledge management as an approach to learning and instructing sector university students in post-Soviet professional education. J. Soc. Stud. Educ. Res. 8(2), 39–61 (2017) 18. Uskov, V.L., Bakken, J.P., Karri, S., Uskov, A.V., Heinemann, C., Rachakonda, R.: Smart university: conceptual modeling and systems' design. In: International Conference on Smart Education and Smart E-Learning, pp. 49–86. Springer, Cham (2017) 19. Cheng, E.C.: Knowledge management strategies for capitalising on school knowledge. VINE J. Inf. Knowl. Manage. Syst. (2017)


20. Mirzaee, S., Ghaffari, A.: Investigating the impact of information systems on knowledge sharing. J. Knowl. Manage. (2018) 21. Farhoush, M., Majedi, P., Behrangi, M.: Application of education management and lesson study in teaching mathematics to students of second grade of public school in district 3 of Tehran. Int. Educ. Stud. 10(2), 104–113 (2017) 22. Kandiah, D.A.: Clinical reasoning and knowledge management in final year medical students: the role of student-led grand rounds. Adv. Med. Educ. Pract. 8, 683 (2017) 23. Hewitt, J.E.: Blended learning for faculty professional development incorporating knowledge management principles (2016) 24. Mohammed, A.A., Hafeez-Baig, A., Gururajan, R.: Talent management as a core source of innovation and social development in higher education [NYP 5/6/2019]. Talent Management as a Core Source of Innovation and Social Development in Higher Education, pp. 1–31 (2018) 25. Correa-Díaz, A.M., Benjumea-Arias, M., Valencia-Arias, A.: Knowledge Management: an alternative to Solve Educational Problems. Rev. Electrón. Educ. 23(2), 1–27 (2019) 26. Ramazanzade, K., Ayati, M., Shokohifard, H., Abedi, F.: Pedagogical knowledge management and its application in medical education: a research synthesis study. Future Med. Educ. J. 9(1), 51–58 (2019) 27. Ngoc-Tan, N., Gregar, A.: Knowledge management and its impacts on organisational performance: an empirical research in public higher education institutions of Vietnam. J. Inf. Knowl. Manage. 18(02), 1950015 (2019) 28. Bisandu, D.B., Datiri, D.D., Onokpasa, E., Mammuam, A.T., Thomas, G.A., Gurumdimma, N.Y., Madugu, J.M.: A framework for the adoption of knowledge management system (KMS) in University of Jos, Nigeria 29. Kaplan, A.: Lifelong learning: conclusions from a literature review. Int. Online J. Primary Educ. (IOJPE) 5(2) (2017). ISSN 1300-915X 30. Chergui, M., Tahiri, A., Chakir, A., Mansouri, H.: Towards a new educational engineering model for Moroccan university based on ICT. Int. J. Eng. Pedagogy (iJEP) 10(3), 49–63 (2020)

The Recommendation of a Practical Guide for Doctoral Students Using Recommendation System Algorithms in the Education Field

Oumaima Stitini(B), Soulaimane Kaloun, and Omar Bencharef

Laboratoire d'ingénierie Informatique et Système, FSTG Marrakech, Cadi Ayyad University, Marrakech, Morocco
[email protected]

Abstract. Recommendation systems provide an approach to facilitate the user's desires. They are helpful for recommending things from various domains such as e-commerce, the service industry and social networking sites. Most of the research we found is based in the trading domain; however, there is limited information on the impact of recommender systems in other domains like education. Recently, recommendation systems have proved to be efficient for the education sector as well. Online recommendation systems have become a trend: nowadays, rather than going out and buying items themselves, users find that online recommendation provides an easier and quicker way to buy items, and transactions are also quick when they are done online. Recommender systems are a powerful new technology that helps users find items they want to buy. A recommendation system is broadly used to recommend the most appropriate products to end users. Thus, the objective of this study is to summarize the current knowledge that is available with regard to recommendation systems that have been employed within the education domain to support educational practices. Our results provide some findings regarding how recommendation systems can be used to support main areas in education, what approaches, techniques or algorithms recommender systems use, and how they address different issues in the academic world. Keywords: Recommender system (RS) · Collaborative filtering · Content-based filtering · Data mining · Machine learning · Deep learning

1 Introduction

Systems that retrieve and filter data through content and similar profiles are known as recommendation systems (RS) [6, 7, 13–16]. These systems are usually used within the e-commerce domain. For example, some websites (such as Amazon), through the application of RS, offer the user recommendations for products that the user may not know and which could be of interest to them. Suggested recommendations help to overcome the distressing search problem for the user [4, 5, 9]. But this technology is not only used to sell products; it is also used to suggest videos (YouTube), movies (Netflix) and friends (Facebook), among others. This demand spans several domains, among which is the educational domain. Every day we are overwhelmed with options and choices. What news or article to read? Which book to read? What thesis topic looks like mine? The answers to all these questions help users to discover and choose resources in an information space of this magnitude, hence the need for a recommendation system in the education sector, whose role is to support teaching and learning activities through better information retrieval [17, 18]. There is limited information on the application of recommender systems in educational environments [9, 19, 20]. This study aims to summarize the current knowledge that is available concerning recommendation systems that have been used to support educational practices. We focus our research on the analysis of our own created dataset, which contains article information obtained after crawling the addresses of all papers on the page "http://aaai.org/Library/AAAI/aaai-library.php". Research work on recommendation systems in the educational domain is very important and understudied. This work aims to provide new insights and analysis on recommending information related to the education domain. In the following sections, we discuss data collection, analysis and results, and future work. We organize this paper as follows: Sect. 2 contains the state of the art of our work, which reviews the relevant approaches to recommend articles to the user. We describe the methodology and overall approach in Sect. 3. Then, in Sect. 4, we elaborate the experimental results of our proposed model. In Sect. 5, we show the problems and challenges. At the end of the work, we conclude in Sect. 6.
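As a concrete illustration of the kind of content-based article recommendation explored in this study, the following is a minimal sketch that ranks papers by the textual similarity of their abstracts. It assumes a small in-memory list of (title, abstract) pairs standing in for the crawled dataset and uses scikit-learn's TF-IDF vectorizer with cosine similarity; it is a simplified example, not the authors' actual pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the crawled article dataset (title, abstract).
papers = [
    ("Deep learning for shoreline extraction", "We apply convolutional networks to satellite imagery ..."),
    ("Sentiment analysis of climate tweets", "A deep learning system for climate change opinions ..."),
    ("Recommender systems in education", "A survey of collaborative and content-based filtering for e-learning ..."),
]

def recommend(query_abstract: str, k: int = 2):
    """Return the k papers whose abstracts are most similar to the query text."""
    corpus = [abstract for _, abstract in papers] + [query_abstract]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
    scores = cosine_similarity(tfidf[-1], tfidf[:-1]).ravel()   # query vs. each paper
    ranked = sorted(zip(scores, papers), key=lambda pair: pair[0], reverse=True)
    return [(title, float(score)) for score, (title, _) in ranked[:k]]

if __name__ == "__main__":
    print(recommend("collaborative filtering approaches for doctoral education"))
```

In a content-based setting like this, only the item descriptions are needed; the collaborative variants discussed in the next section would instead rely on ratings or interactions from similar users.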

2 State of the Art

Research work on recommendation systems is not finished yet; it is still at an early stage and requires a deep search for the right choice of algorithm to use. This paper proposes a novel approach to recommend items to the user; each existing approach raises some problems, and we analyze the fundamental aspects of generic recommender systems. We try to mention and analyze the important research works which we find related to our work. We categorize them into two subsections: the Recommender System category and the Machine Learning category.

2.1 Recommender System Category

The purpose of this category of papers was to establish what review studies had been carried out by other authors previously and to give a brief background of the field of recommender systems. Recommendation systems mainly use two filtering methods to provide personalized recommendations to users, namely collaborative filtering and content-based filtering. The first uses the opinions of users similar to the active user; the second uses only the preferences of the current user. Collaborative filtering is considered to be the most popular and most widespread method in recommender systems [24]. Publications reviewed in this category included two sub-categories:


Collaborative and Content-Based Filtering

In their paper [1], Rouzbeh Meymandpour mentions three major categories of semantic similarity measurement on Linked Data, namely distance-based metrics, feature-based models and statistical methods, as well as hybrid approaches. He also discusses the limitations of each category. The first limitation concerns the Linked Data graph: it is a complex semantic network in which information resources (nodes) are connected by a wide range of semantic relations (edges). Unlike WordNet, Linked Data has a wide range of relations, of which 'is-a' and 'part-of' are two particular types. Therefore, any measure of semantic similarity for Linked Data has to consider its particular characteristics, such as the variety in link types and the direction of the relations. The second limitation is that distance-based metrics deal only with 'is-a' relations, while Linked Data is characterized by many kinds of links, of which the 'is-a' relation is only one. The work in [3] comprehensively reviews the key decisions in evaluating CF recommender systems, demonstrating the importance of recommender system evaluation and claiming that accuracy is a major metric.
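Since the evaluation discussed for [3] centers on MAE/RMSE computations for collaborative filtering, a small worked sketch of these two standard error metrics may help; the rating values below are made up purely for illustration.

```python
import numpy as np

# Hypothetical predicted vs. actual ratings for a handful of user-item pairs.
predicted = np.array([4.2, 3.1, 5.0, 2.4, 3.8])
actual    = np.array([4.0, 3.5, 4.5, 2.0, 4.0])

errors = predicted - actual
mae  = np.mean(np.abs(errors))          # MAE  = (1/n) * sum(|p_i - r_i|)
rmse = np.sqrt(np.mean(errors ** 2))    # RMSE = sqrt((1/n) * sum((p_i - r_i)^2))

print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```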

Table 1. Collaborative and content-based filtering approach comparison.

Methodology: collaborative and content-based filtering

Study [1]
Goal: This paper proposes the partitioned information content (PIC)-based semantic similarity measure, called PICSS, which is a combination of feature- and information content-based approaches. PICSS measures the degree of semantic similarity between two resources based on the PIC of their shared and distinctive features.
Method used: Collaborative filtering, content-based filtering.
Advantages: PICSS combines the advantages of feature- and information content-based measures. It enables applications to perform in-depth semantic analysis of entities based on structured data gained from Linked Open Data.
Limitation: (1) Linked Open Data quality: completeness, redundancy, noise; (2) the negative effects of uniqueness; (3) incorporating distant features into the semantic similarity measure.

Study [3]
Goal: This paper explores the mathematical regularities in the MAE/RMSE computations and develops efficient algorithms to evaluate them. This work is an integral part of the process for the design and implementation of recommender system algorithms.
Method used: Collaborative filtering.
Advantages: Improves the collaborative filtering algorithm.
Limitation: In the big data era, as the size of data becomes ever larger, algorithms need to be not only efficient but also parallelized. Combining efficiency and parallelism to develop improved evaluation algorithms for collaborative filtering is mentioned as future work.

Study [5]
Goal: (1) Identify attractive books for children to motivate reading; (2) motivate young readers by offering them appealing books so that they can enjoy reading and gradually establish a reading habit during their formative years.
Method used: Content-based filtering.
Advantages: Motivates young children to develop reading habits.

Study [7]
Goal: Provide a journal recommendation system to authors interested in publishing their articles in the appropriate journal. The proposed system fully depends on the input data given by the user and on the journal dataset, and is based on a content-based filtering technique. The results show that it helps authors find appropriate journals and speeds up their submission process, further enhancing the user's experience. Assigning a score to the importance of all the keywords is the vital part of the work.
Method used: Preprocessing, singular value decomposition, Euclidean distance.
Advantages: Makes it easier for researchers to choose a journal based on the title, keywords, and abstract entered by the researcher.

Study [13]
Goal: A book recommendation system that recommends books suited to buyers' interests and stores recommendations in the buyer's web profile. The system stores the details of the books users have bought earlier, finds the category of book from the buying history, uses content-based and collaborative filtering, and produces a list of books based on content and ratings.
Method used: Association rules, content-based filtering, collaborative filtering.
Limitation: The major problems faced during implementation were developing a new website application for book selling, implementing the appropriate recommendation module based on the user's interest, and coordinating and implementing content-based filtering and collaborative filtering together.

Study [16]
Goal: The proposed recommendation method is used for recommending computer science publications; only the abstract of a paper is used for the recommendation. The method could also be used by other e-library recommender systems; for example, the results could help readers quickly determine the domain of a paper or retrieve similar papers.
Method used: Softmax regression, chi-square feature selection.
Advantages: Helps researchers identify the domain of each scientific paper.

In their paper [3], Feng Zhang and colleagues explored the mathematical regularities in the MAE/RMSE computations and developed efficient algorithms to evaluate them. Their work is an integral part of the process for the design and implementation of recommender system algorithms. Yiu-Kai Ng [5] developed a content-based recommender system (CBRec), a book recommender tailored to children, which simultaneously considers the reading levels and interests of its users in making personalized suggestions.


CBRec adopts the widely used content-based filtering approach and the user-based collaborative filtering approach and integrates the two in predicting ratings on children's books to make book recommendations [23] (Table 1). In this category, we mention articles that used both collaborative and content-based filtering methods, and we mainly focus on the limitations and benefits of each approach. While all the articles reviewed in this section produced significant results showing that a well-chosen recommender system algorithm can improve recommendations, there are limitations related to collaborative filtering (CF) recommendation systems, which are the most affected by cold start because they generate recommendations based only on ratings. The data sparsity problem is also very frequent in RSs [22]; it leads to a degradation of recommendation quality because of the insufficient number of ratings.

2.2 Machine Learning Category

The second category comprises papers specific to this study on ontology-based recommenders for the education domain; in this category, only journal articles published from 2017 to 2019 were considered. There are two subcategories:

Supervised Learning

Rahman Ali [2] introduced the concept of algorithms' quality meta-metrics (QMM), describing the physical meaning of the evaluation criteria, and developed a classification model with the help of extensive literature to assist experts in selecting suitable evaluation criteria for comparing classifiers. The authors of [9] presented a library recommender system that offers results to users after combining content-based and collaborative approaches. The range of library resources proposed requires classifying and grouping them, which closely resembles the notion of keeping related library books and journals on a shared bookshelf (Table 2).


Table 2. Supervised learning approach comparison.

Methodology: machine learning algorithms ([2], [9], [12]); deep learning algorithms ([10], [21])

Study [2]
Goal: Introduce the concept of algorithms' quality meta-metrics, describing the physical meaning of the evaluation criteria, and develop a classification model with the help of extensive literature to assist experts in selecting suitable evaluation criteria for comparing classifiers; estimate consistent relative weights for the evaluation metrics using expert group-based decision making with the analytic hierarchy process.
Method used: Collaborative filtering, content-based filtering.

Study [9]
Goal: The proposed ontology-based classification referral system uses machine learning to make archives, such as books and journal articles, available to members of the library in an efficient and timely manner.
Advantages: The primary purpose of the system is to offer recommendations to an active user according to his interests.

Study [12]
Goal: The predictor helps to increase placement rates by helping teachers and the placement cell in an institution to coach students.
Method used: Decision tree classifier, logistic regression.

Study [10]
Goal: Propose a K-nearest neighbor (KNN) deep learning and collaborative filtering recommendation framework for e-learning, applied for the first time in the e-learning area; testing and application of the framework have not yet been carried out.
Method used: Deep learning.
Advantages: The advantages of the framework are clear. First, although the training process is time-consuming and complex, it allows a running system to react immediately, unlike conventional recommendation systems. Second, once training is finished, it can recommend new items whose similarity is unknown. A large-scale dataset for KNN is necessary before deploying the deep learning framework.

Study [21]
Goal: Explore a model-driven deep learning framework that can balance flexibility and appositeness in behavior study.
Advantages: The primary advantages of GPS sensor data are: (i) unobtrusive data collection; (ii) large-scale samples, e.g., millions of drivers; (iii) real-time and continuous datasets.

In this category, we concentrated on articles that use supervised learning algorithms; as shown in the table above, using k-nearest neighbors can yield good results and good recommendations.


Unsupervised Learning

Table 3. Unsupervised learning approach comparison.

Methodology: data mining

Study [11]
Goal: This study implements behavior tracking on an LMS (Learning Management System) to determine student learning status in an e-learning course with the MOCLog model. The parameters for determining the process are student actions, quiz submissions, and assignment submissions.
Method used: Association rules.

In this category, we concentrated on articles that use data mining algorithms. A significant advantage of this kind of algorithm is the gradual generation of refined rules: good attributes for discriminating the classes are taken one by one in an iterative process, beginning with the best. The limitation is that one must know how to choose good attributes for classification and then use them progressively, beginning with the best (Table 3).

2.3 Our Approach Goal

The goal of our recommendation system is, first, to make it easier for new doctoral students to search for articles that roughly resemble their thesis subject and to recommend a first, more relevant track of what has already been done and what is novel with respect to the research subject. Efficient and accurate recommendation is very important for a practical guide system that provides good recommendations and useful results for doctoral students.

3 Methodology and Overall Approach

In this section, we present the methodology used to create the models that provided us with the analysis regarding the recommendation system. The purpose is to provide a model that helps users, and especially new researchers, to have a global overview regarding a specific topic.


The dataset comprises articles published in the AAAI Conference on Artificial Intelligence proceedings that were prepared to validate the model. The dataset contains the title, author, link, abstract, introduction, related work, preliminaries, experiments (or experimental results, or results), the proposed model, evaluation, discussion, conclusion and future work (or conclusion and future research), and acknowledgment. The table below shows the description of each column (Table 4).

Table 4. Dataset of article content recommendation details.

No  Title                       Description
1   Title                       The title of the article
2   Author                      The author of the article
3   Link                        The article URL
4   Abstract                    The abstract of the article
5   Introduction                The introduction of the article
6   Related work                The related work of the article
7   Preliminaries               The preliminaries of the article
8   Experiments                 All experiments in the article
9   Proposed model              The proposed model in the article
10  Discussion                  The article discussion
11  Evaluation                  All results found in the article
12  Results                     All results elaborated in the article
13  Acknowledgement             The acknowledgment
14  Conclusion and future work  Conclusion of what is done in the article and what should be done in the future
15  References                  The references on which the article draws

3.1 Proposed Work

Our proposed work involves three working steps: the data collection process, the clustering approach, and the recommendation approach.

Data Collection Process

Web Crawling and Data Scraping. A web crawler is a program, piece of software, or automated script that browses the World Wide Web in a systematic, automated manner. Data scraping refers to retrieving information from any source, not necessarily the web: scraped data can come from a local machine, a database, or the internet.


In our case, we used web crawling to store all PDF links in the first phase of our data collection process, as shown in Fig. 1, and then we scraped data from the PDF files and stored it in the final dataset, as described in phases 2 and 3 (Fig. 1). The number of datasets developed solely for this purpose is limited. For this reason, we collected our own dataset using text mining and web scraping, gathering all article links on the Artificial Intelligence topic published in 2019. The figure below shows the flowchart representation of the data collection process.

Fig. 1. Flow representation data collection.
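To make the crawling phase concrete, the following is a minimal Python sketch of phase 1 (link collection), assuming the requests and BeautifulSoup libraries; the link-selection rule and the output file name are illustrative assumptions, not the authors' actual implementation.

# Minimal sketch of phase 1 (link collection); the selection rule and output
# file are illustrative, not the authors' actual code.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

LIBRARY_URL = "http://aaai.org/Library/AAAI/aaai-library.php"

def collect_pdf_links(index_url: str) -> list[str]:
    """Crawl the library index page and return absolute URLs of linked PDFs."""
    html = requests.get(index_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.lower().endswith(".pdf"):
            links.append(urljoin(index_url, href))
    return links

if __name__ == "__main__":
    pdf_links = collect_pdf_links(LIBRARY_URL)
    with open("pdf_links.txt", "w") as f:
        f.write("\n".join(pdf_links))
    print(f"Collected {len(pdf_links)} PDF links")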

Clustering Approach

We group together all similar items; this approach of grouping similar items (in our case, each item represents one article) is called the clustering approach. We base the similarity on the maximum cluster size and a relevance value. We chose the K-means clustering algorithm to group articles according to the words that form the sentence entered by the user.
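A minimal sketch of this clustering step is shown below, assuming scikit-learn and pandas; the dataset file name, the text column used for vectorization, and the number of clusters are illustrative assumptions rather than the exact implementation.

# Minimal sketch of the clustering step; file name, column name, and cluster
# count are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

articles = pd.read_csv("articles_dataset.csv")        # hypothetical dataset file
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(articles["abstract"])    # vectorize article text

kmeans = KMeans(n_clusters=10, n_init=10, random_state=42)
articles["cluster"] = kmeans.fit_predict(X)

# A user query is mapped to the nearest cluster before recommending within it.
query_vec = vectorizer.transform(["zero shot learning for code education"])
query_cluster = kmeans.predict(query_vec)[0]
candidates = articles[articles["cluster"] == query_cluster]
print(f"{len(candidates)} candidate articles in cluster {query_cluster}")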


Recommendation Approach

We will build a content-based recommender system based on a natural language processing approach, without the need for user preferences. This approach is very useful because we are dealing with articles characterized by an abstract, introduction, related work, preliminaries, results, and conclusion, which are text data in general. The aim of the study is to make it easier for doctoral students to see a condensed summary of the subject sought: for example, what has already been done by other doctoral students? What results were obtained? What is the future work?

3.2 Methodology

To recommend what has already been done and what the future work is for a specific thesis title, the methodology is prepared using a web scraper designed with Python to get all article links and then a text mining algorithm to extract information from the PDF articles. We scraped all articles on Artificial Intelligence published at the AAAI conference to prepare the dataset. This system comprises six stages.

Stage 1: In the first stage, we take the input from the user, which consists of a title, abstract, or keywords. The web portal (Fig. 2) was created for this purpose, and from it the user can give the article input. To illustrate the whole system, the results are evaluated by taking as an example the title "Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference" and recommending a list of articles on the same topic entered by the user (Fig. 3). By clicking on the detail link (Fig. 4), the user is redirected to another page showing the recommendation details of the article as a brief text for each feature (Table 5).

Fig. 2. Web portal of content article recommendation for user.


Fig. 3. List of recommendation articles.

Table 5. The recommendation result.

Technology: Machine learning technique
Problematic: Improve recommendation accuracy
Solution: The proposed attentive multi-view learning framework enables the model to accurately predict ratings and infer multi-level user profiles
Result: Experimental results show that the model performs better than state-of-the-art methods in both accuracy and explainability
Future work: Not specified
Advantages: A deep explicit attentive multi-view learning model (DEAML) performs better than state-of-the-art methods in both accuracy and explainability
Gap indicators: RMSE

Stage 2: Pre-processing is one of the major steps in preparing the raw data for further analysis. At this stage we remove punctuation and stop words, which improves the accuracy of the term matrix.

Stage 3: This stage concerns the clustering approach already described in the methodology and overall approach section; the idea is to group all similar articles into one cluster. It covers the interaction between what the user enters as input and the first step of our Python code, in order to get all articles similar to the user's input.

Stage 4: This stage concerns the recommendation approach; it covers the interaction between the article detail requested by the user and the second part of our Python code. The idea is to recommend to the user a brief text showing what has already been done in the study, the results found, and the future work.

Stage 5: The system recommends all completed work, as well as the points discussed as future work, and can also recommend articles similar to the given input. For this purpose, it needs to find the similarity between the input paper details given by the user and the article details in the dataset.

Stage 6: The purpose of the system is to recommend and predict researchers' papers similar to the user's input and to synthesize the article. We predict similar articles with a percentage of relevance.
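As a hedged illustration of stages 2 through 6, the sketch below removes stop words via TF-IDF, ranks articles by cosine similarity to the user's input, and reports a relevance percentage; the file name and column names (title, abstract) are hypothetical, and the percentage is simply the cosine score scaled to 0-100 rather than the authors' exact relevance measure.

# Illustrative sketch of the similarity-ranking stages; file and column names
# are hypothetical, and "relevance" is cosine similarity scaled to 0-100.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = pd.read_csv("articles_dataset.csv")
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(articles["abstract"])

def recommend(query: str, top_n: int = 5):
    """Rank articles by cosine similarity to the user's query."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, X).ravel()
    ranked = scores.argsort()[::-1][:top_n]
    return [(articles.iloc[i]["title"], round(100 * scores[i], 1)) for i in ranked]

for title, relevance in recommend("Zero Shot Learning for Code Education"):
    print(f"{relevance:>5.1f}%  {title}")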


4 Experiments and Results

In this section, we present the results that emerged from our analysis. We tested our dataset and provide the accuracy for the experiment we conducted. We also present the search string used to elaborate this work and the major journals containing the articles.

4.1 Experimental Results

All articles recommended to the user, according to their vector distance to the input vector, are plotted on a scatter graph (Fig. 4), particularly in the yellow cluster. The graph depicts the 10 clusters used. The system predicted the articles similar to the domain given by the user.

Fig. 4. The cluster partition of the articles.

As shown in Fig. 4, the yellow circle centered on the star represents what the user entered as input, and the circles surrounding it are the items similar to it. The figure below shows a pie chart presenting all similar articles by relevance to the entered input (Fig. 5).


Fig. 5. The distribution of similar articles by relevance.

4.2 Tools Used

To carry out this work, we used the following tools: for the backend, Jupyter Notebook and Flask; for the frontend, Visual Studio.

5 Problems and Challenges

Cold start was the most acute problem encountered. Collaborative filtering (CF) recommender systems are the most affected by cold start, as they generate recommendations relying on ratings only. Hybrid recommender systems try to overcome the lack of ratings by combining CF or other recommendation techniques with association rule mining or other mathematical constructs that extract and use features from items.

Table 6. Problems and possible solutions.

Problem: Cold start
Possible solutions: Use association rule mining on item or user data to find relations which can compensate for the lack of ratings. Mathematical constructs for feature extraction and combinations of different strategies can also be used.
References: [12, 14]

Problem: Sparsity
Possible solutions: Use the few existing ratings or certain item features to generate extra pseudo ratings. Experiment with matrix factorization or dimensionality reduction.
References: [1]

Problem: Accuracy
Possible solutions: Use fuzzy logic or fuzzy clustering in association with CF. Try combining CF with CBF using probabilistic models, Bayesian networks, or other mathematical constructs.
References: [1-3, 5, 8, 16]

Problem: Scalability
Possible solutions: Try to compress or reduce the datasets with clustering or different measures of similarity.
References: [7, 9, 15]

Problem: Diversity
Possible solutions: Try modifying neighborhood creation by relaxing similarity (with a possible loss in accuracy) or use the concept of experts for certain item tastes.
References: [5, 8, 16]


Data sparsity is also a very frequent problem in RSs. It causes a degradation of recommendation quality because of the insufficient number of ratings. Hybrid approaches try to solve it by combining several matrix manipulation techniques with the basic recommendation strategies. They also try to make more use of item features, item reviews, user demographic data, or other known user characteristics (Table 6).
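As a hedged illustration of the matrix-factorization remedy for sparsity listed in Table 6, the sketch below fills missing ratings from a truncated-SVD, low-rank approximation of a small, made-up user-item matrix; it is a minimal example under simplifying assumptions, not the evaluation used in this paper.

# Minimal sketch of filling missing ratings with a low-rank (truncated SVD)
# approximation; the small matrix is illustrative data, 0 marks a missing rating.
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

mask = R > 0
mean = R[mask].mean()
U, s, Vt = np.linalg.svd(np.where(mask, R - mean, 0.0), full_matrices=False)

k = 2                                              # keep the top-k latent factors
R_hat = mean + U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

predicted = np.where(mask, R, R_hat)               # fill only the missing entries
print(np.round(predicted, 2))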

6 Conclusion and Future Work

In summary, a lot of progress has been made on recommender systems in the education domain, whether to recommend books to children or to recommend specific journals to researchers. At present, recommender system algorithms have become mainstream for developing such systems: for example, helping children find books that match their interests and reading level is a task well suited to a web-based recommender, as is making journal articles available to members of a library in an efficient and timely manner. The proposed system fully depends on the input data given by the user and on the article dataset, and it is based on content-based filtering techniques. The results show that it helps authors find all articles similar to the entered input and predicts what has already been done and what the future work is when the input corresponds to a specific article. Assigning a score to the importance of all the keywords is the vital part of the work. For future work, other similarity-measuring techniques can be applied to obtain closer results, also in other domains, for building recommendation systems.

References
1. Meymandpour, R., Davis, J.G.: A Semantic Similarity Measure for Linked Data: An Information Content-Based Approach, pp. 1–29. Elsevier, Amsterdam (2014)
2. Ali, R., Lee, S., Chung, T.C.: Accurate multi-criteria decision making methodology for recommending machine learning algorithm. Expert Syst. Appl. 71, 257–278 (2017)
3. Zhang, F., Gong, T., Lee, V.E., Zhao, G., Rong, C., Qu, G.: Fast algorithms to evaluate collaborative filtering recommender systems. Knowl.-Based Syst. 96, 1–31 (2015)
4. Portugal, I., Alencar, P., Cowan, D.: The use of machine learning algorithms in recommender systems: a systematic review. arXiv, pp. 1–31 (2017)
5. Ng, Y.-K.: Recommending books for children based on the collaborative and content-based filtering approaches, pp. 1–16. Springer (2016)
6. Xiao, J., Wang, M., Jiang, B., Li, J.: A personalized recommendation system with combinational algorithm for online learning, pp. 1–11. Springer (2017)
7. Jain, S., Khangarot, H., Singh, S.: Journal recommendation system using content-based filtering, pp. 1–10. Springer (2019)
8. Motajcsek, T., Le Moine, J.-Y., Larson, M., Kohlsdorf, D., Lommatzsch, A., Tikk, D.: Algorithms aside: recommendation as the lens of life, pp. 1–5. ACM (2016)
9. Shirude, S.B., Kolhe, S.R.: Classification of library resources in recommender system using machine learning techniques, pp. 1–13. Springer (2018)
10. Wang, X., Zhang, Y., Yu, S., Liu, X., Yuan, Y., Wang, F.-Y.: E-learning recommendation framework based on deep learning, pp. 1–6. IEEE (2017)


11. Aviano, D., Putro, B.L., Nugroho, E.P., Siregar, H.: Behavioral tracking analysis on learning management system with apriori association rules algorithm, pp. 1–6. IEEE (2017)
12. Thangavel, S.K., Bkaratki, P.D., Sankar, A.: Student placement analyzer: a recommendation system using machine learning, pp. 1–5. IEEE (2017)
13. Mathew, P., Kuriakose, B., Hegde, V.: Book recommendation system through content based and collaborative filtering method, pp. 1–6. IEEE (2016)
14. Yildiz, O.: Development of content based book recommendation system using genetic algorithm, pp. 1–4. IEEE (2016)
15. Jannach, D., Jugovac, M., Lerche, L.: Supporting the design of machine learning workflows with a recommendation system, pp. 1–35. ACM (2016)
16. Wang, D., Liang, Y., Xu, D., Feng, X., Guan, R.: A content-based recommender system for computer science publications. Knowl.-Based Syst. 157, 1–24 (2018)
17. Simović, A.: A big data smart library recommender system for an educational institution, pp. 1–27. Library Hi Tech (2018)
18. Chau, H., Barria-Pineda, J., Brusilovsky, P.: Learning content recommender system for instructors of programming courses, pp. 1–6. AI in Education (2018)
19. Wu, L., Liu, Q., Zhou, W., Mao, G., Huang, J., Huang, H.: A semantic web-based recommendation framework of educational resources in e-learning, pp. 1–23. Springer (2018)
20. Wan, S., Niu, Z.H.: An e-learning recommendation approach based on the self-organization of learning resource. Knowl.-Based Syst. 160, 1–9 (2018)
21. Guo, J., Liu, Y., Zhang, L., Wang, Y.: Driving behaviour style study with a hybrid deep learning framework based on GPS data. In: MDPI, pp. 1–16 (2018)
22. Su, X., Khoshgoftaar, T.M.: A survey of collaborative filtering techniques. Advances in Artificial Intelligence, pp. 1–19 (2009)
23. Milton, A., Green, M., Keener, A., Ames, J., Ekstrand, M.D., Pera, M.S.: StoryTime: eliciting preferences from children for book recommendations, pp. 1–5. ACM DL Library (2019)
24. Alonso-Betanzos, A., Troncoso, A., Luaces, O.: Peer assessment in MOOCs using preference learning via matrix factorization. Semantic Scholar, pp. 1–7 (2013)

Virtual Reality–Enhanced Soft and Hard Skills Development Environment for Higher Education

Abid Abdelouahab(B)

Faculty of Computer and Information Systems, Islamic University of Madinah, Medina, Saudi Arabia
[email protected]

Abstract. Indeed, the 21st-century dynamic and competitive employment and labor market requires Higher Education (HE) graduates to be equipped with a substantial degree of knowledge and skills. Comparable to knowledge, skills can be domain-specific or domain-general. Both types of skills are crucial for the graduate of a well-crafted HE program. To provide students with key skills and the chance to test, experience, and integrate them, institutions should not merely rely on short work placements and industrial training programs. Students should be given the opportunity to sharpen their competencies and to acquire and develop the right skills, technical and non-technical, throughout their study life. Virtual reality, as a revolutionary technology, can support achieving that. In this paper, a web-based environment enhanced with a VR framework is proposed to enable students to increase their knowledge and experience and to gain employability skills progressively for the duration of their study journey. The environment, accessible starting from the first year, provides students with the opportunity to engage with a professional setting and to be immersed in interactive experiences to gradually, effectively, and affordably experiment with concepts, enhance their skills, and develop new ones.
Keywords: Virtual reality · Learning system · Soft skills · Hard skills

1 Introduction

Further Education (FE) or Higher Education (HE) is post-secondary third-level education presenting the concluding stage of formal learning. Tertiary education has been made accessible to all by most governments since the last century, though higher education institutions appeared centuries ago. Some literature states that what was founded in 859 as a madrasah (a place of learning and studying) and later named Al-Karaouine University, in Morocco, is the oldest degree-awarding university. Since then, universities and higher education have gone through tremendous transformations and development, influenced by many factors, such as international regulations, demographic shifts, and technological advancements. Indeed, technology has a significant impact on education in many aspects. Technology has dramatically expanded access to educational content. Currently, massive volumes of data, information, and multimedia content are available at the learners' fingertips via the Internet. Technology also has affected teaching and learning styles and instructional strategies.


Due to the access to information and educational content that the Internet has enabled, the teachers' role is shifting to a supportive and guiding role as learners take more responsibility for their learning and knowledge assimilation. Engaged learning, interactive learning, and the flipped classroom are among the many terminologies of the 21st-century educational ecosystem. One central element of the current educational ecosystem is the Learning Management System (LMS) [1]. A Learning Management System (LMS) is a digital platform that supports teaching and learning activities. An LMS can be free or commercial, and it can be academic or corporate. In all cases, an LMS will provide tools, functions, and resources to manage participants and learning content. From the institutions' and teachers' perspective, an LMS can be regarded as an automation tool for administration-related activities such as grading, reporting, and analytics, and for academic-related matters such as content delivery and assessments. For the students, as digital natives, an LMS is the campus, the library, the classroom, and a space where they interact with teachers, peers, and the learning content. The Learning Management System market is growing very fast due to the social acceptance of online learning, government regulations, and the interactivity, flexibility, and rich content provided. LMS functions and services keep increasing and becoming more sophisticated, redesigning learning and enabling new educational models, all for the same objective: to deliver knowledge and build competencies and skills most effectively and efficiently [2–6], and [7]. Nevertheless, the concern that the skills students gain and what they learn at HE institutions will not necessarily meet employers' requirements is becoming prevalent. This skill shortage and misalignment are especially evident in the Information and Communication Technology (ICT) fields. These days, it is not unusual to meet a computer-science graduate with the appropriate theoretical knowledge but unable to code or to design and integrate systems (hard skills), not to mention graduates' inability to readily and seamlessly integrate into, meaningfully engage with, and successfully navigate their working environment (soft skills). Knowing that experience is the best teacher, skill-based learning is the right approach to complement traditional knowledge-oriented teaching. Skill-based learning aims to provide students with suitable skills through experimentation and the application of concepts and theoretical knowledge, which develops understanding, strengthens learning, and helps to retain complex concepts. The integration of the connection, collaboration, and interactive capabilities of extended reality technology in an eLearning environment will help to bridge the skills gap. The paper presents a virtual reality enhanced learning environment that provides the student with a corresponding real working setting from the first year until graduation. The environment allows students to be exposed to near-real-life professional scenarios and cases to grow and develop their knowledge and skills during the course of their study. The rest of this paper is organized as follows: after the introduction, Section 2 is about extended reality, followed by a section on industrial training and short industrial placements, then a section on skills and skill categories. Section 5 introduces and describes the learning environment.
Finally, a conclusion is drawn in the last section.


2 Extended Reality Technology

Extended Reality Technology is the set of technologies that can fuse the real and virtual worlds, forming immersive experiences for learning, training, or entertainment. Extended reality (XR) is a relatively new term, though its early roots go back to the 1920s. XR encapsulates Virtual Reality (VR), Augmented Reality (AR), Mixed Reality (MR), and everything that uses technology to supplement and expand reality. Virtuality levels can range from limited sensory inputs to immersive virtuality; therefore, the first flight training simulator, presented in 1929, can be considered an early form of XR [8]. The same can be said of "Sensorama", the multi-sensorial simulator that exhibited a colorful 3D display in addition to using sound, vibration, wind, and a scent emitter to stimulate the senses [9]. Other literature, based on first concepts, traces the roots of extended reality to the 1800s, when practical photography started and when the first stereoscope was invented to project a single image using twin mirrors in the 1830s [10]. Regardless of when XR was conceived, XR is now a reality and a prevailing one. It is clear that XR, in one of its forms, either MR, AR, or VR, has found its way into many applications around us. XR can be found in entertainment and gaming, healthcare, real estate, marketing, and training and education. With technological advancements, many new applications related to Extended Reality will be discovered, especially as affordability, one of XR's main issues, is being addressed progressively. Consequently, in the near future, global growth in the use of XR in education and training is projected. At present, VR is being used in many fields of training and education such as science, engineering, and the arts [11–18], and [19].

3 Industrial Training

Industrial Training or Short Industrial Placements is a supervised, fixed-period training at an organization or working site. It is an integral part of most post-secondary education programs. Its main objective is to provide students, towards the end of their study and before graduation, with sufficient practical knowledge and skills [20, 21]. Through industrial training, higher education institutions seek to lessen the gap between theoretical education and real-life working setups [22].

3.1 General Industrial Training Objectives

HE institutions prepare and provide general references and guidelines on industrial training and short industrial placement practices. The guidelines usually start with the program's description, objectives, and learning outcomes. Industrial Training programs typically have the following objectives:
• Giving students the opportunity to practice their knowledge and skills in a real-life work situation.
• Providing students with hands-on learning from experts in the field of study.
• Exposing students to a real-life work environment, standard practices, employment opportunities, and work ethics in their relevant field.
• Enhancing students' employability skills and professional network.


3.2 Learning Outcomes

By the end of Industrial Training or Short Industrial Placements, students should be able to:
• Apply their gained knowledge and skills relevant to their field of study.
• Relate the theoretical knowledge and skills acquired to the professional environment.

3.3 General Roles and Responsibilities

For a successful implementation of Industrial Training or Short Industrial Placement programs, and regardless of the financial support, institutions, students, and industries have roles and responsibilities that complement each other before, during, and after the training. While the institute provides the systems, the guidelines, and the framework, in addition to the coordination of all related processes and procedures, the following are the main responsibilities of the students and the organization.

Students

Before Industrial Training begins:
• Obtain an Industrial Training placement through any of the available channels.
• Register and confirm the registration for the Industrial Training program.
• Attend any related briefings and be aware of the associated requirements and guidelines.

During the Industrial Training:
• Complete your logbook/attendance.
• Carry out your Industrial Training ethically and professionally, and uphold the reputation of the institution.
• Inform your university supervisor of any problems or issues arising concerning the Industrial Training experience.

After the Industrial Training:
• Submit the feedback through the electronic system or paper form.
• Submit the Industrial Training final report, logbook, and any other documents, if any.

Organization
• Assign a field supervisor for the students.
• Position the student in the appropriate unit, department, or site.
• Provide the student with suitable and adequate opportunities to gain and experiment with relevant knowledge and skills.
• Monitor the student's progress and provide appropriate guidance, job training, needed support, and constructive feedback.
• Evaluate the student's performance and submit the related information within the specified deadline.


3.4 Insights Regarding Industrial Training and Short Industrial Placements

The results of completed industrial training and short industrial placements show that students' performance is very acceptable in general work ethics such as appearance, attitude, character, attendance, teamwork, and relations with coworkers and colleagues. The results also show clear indications of inadequacy in technical (job-related) skills, e.g., the ability to recognize task-related problems and efficiency in completing tasks, in addition to inadequate knowledge of current developments related to the field of study and work. Given these overall findings, the implementation of industrial training and short industrial placement programs faces a set of challenges and constraints [23]. The challenges are mostly related to administrative, resource, and funding issues and the availability of adequate industrial facilities. For the program to achieve its targeted objectives and outcomes, a set of strategies and remedies is suggested in [23] to deal with the challenges. The remedies are not technology-related solutions; they are likewise associated with administrative and resource matters and access to appropriate industrial facilities.

4 Skills Categories

Enabling individuals to contribute to economic growth, creating productive workforces, and preparing youth to add value to their societies are some of the primary purposes of education, through knowledge dissemination and skills development [24, 25]. To be effective, regardless of the targeted sector, the needed set of skills is usually categorized into three main groups, namely: human (soft) skills, technical (hard) skills, and conceptual skills.

4.1 Human (Soft Skills)

Referred to as noncognitive, soft skills are those skills that feature practical functions independent of acquired knowledge. Soft skills are general in nature, and they focus on individual attitude and intuition; examples include emotional intelligence, communication skills, critical thinking, and a positive attitude [26, 27]. Formally, the term was first used in a US Army training manual [28] at the beginning of the seventies.

Soft Skills Development and Measurement: The development of soft skills is a difficult task compared to other skills, as it involves continuous and monitored active interaction with other individuals. A clear, targeted set of soft skills should be identified, based on an analysis of the field and working environment, in order to design a plan for developing the desired soft skills. The plan should be accommodating and tailorable to various situations and individuals. Soft skills are usually assessed through self-assessment or self-reporting tools, in addition to 360-degree feedback methods, to draw a holistic picture from different interaction, communication, and collaboration angles. Well-designed engagement surveys are used to assess student engagement, morale, and performance. Rubrics are handy tools to interpret and score students' progress against criteria, standards, and requirements [29].


4.2 Technical (Hard Skills)

Referred to as cognitive skills, this category includes technical competencies and administrative abilities. They are primarily related to the field of education or training. Formerly, they were regarded by employers as the only skills required to be qualified and successful in a profession. Hard skills are relatively simple to teach and to learn through traditional or technology-enhanced methods.

Technical Skills Development and Measurement: Hard skills can be discovered and learned in classrooms or labs. Many hard skills are relevant to many different jobs and professions. They are typically developed through education, training, and practice. To develop any technical ability, after acquiring the necessary theoretical knowledge, a practice plan should be set and adhered to. Progress should be evaluated and analyzed regularly, and expert feedback should be collected. Technical skills are naturally easy to measure and to prove. Hard skills can be measured through a variety of examination styles, e.g., problem- or case-based exams, scenarios, trial runs, and tests of specifics.

4.3 Conceptual Skills

Conceptual skills are needed and found at the executive level of an organization; they are the "big picture thinking" skills. Having these skills qualifies an employee to understand the working environment, the associations and relations among the organization's components, and how the organization's processes integrate with each other to perform a task, deliver a service, or create a product. Conceptual skills give the ability to deal with abstract ideas critically, reason creatively, and logically foresee problems [30]. Basically, with conceptual skills, an employee is able to have a comprehensive view of the organization and how it functions.

Conceptual Skills Development and Measurement: Conceptual skills can be developed or improved by obtaining additional academic education or professional training, which consequently helps to expand one's perspective and gives a broader worldview; furthermore, it cultivates various ways of thinking and dealing with different situations. Moreover, this type of skill can be developed by constructing solutions to problems that are not necessarily related to the student's field of study. Observing how supervisors, managers, and leaders approach daunting tasks, deal with issues, and make decisions can give similar results. Learning from mentors and reviewing ideas with peers and experts in the field of study is an excellent way to develop conceptual skills. Following field trends and news and reflecting on how these trends influence the student's line of work, and what the possible adaptation plans are, can be very useful in the process of developing these "big picture thinking" skills. Conceptual skills may be measured through conceptual reasoning tests, or targeted diagrammatic and inductive reasoning tests.


5 Skills Development Environment

With reference to the abovementioned general description of industrial training and short industrial placements, all objectives and learning outcomes can be attained through a comprehensive eLearning knowledge and skills development environment, with the structure shown in Fig. 1, which could be made available to students starting from the first year. The environment is accessible through web browsers and through VR equipment (a VR headset) for specific VR content. If accessed via a web browser, the student accesses a portal that mimics a corporate portal to develop related skills such as business communication and writing skills.

Fig. 1. VR - skills development environment structure.

Fig. 2. VR - skills development environment conceptual framework.


In Fig. 1, the shaded components are for future development and research. A general conceptual framework is depicted in Fig. 2, where the learner's prior knowledge and abilities are linked to the learning environment aspects and other factors that influence the content development, the delivery medium, the training sequence, and the assessment strategy. If accessed through a VR or 3D display, the student is immersed in a focused simulation or scenario to develop the appropriate hard skills. Figures 3 and 4 respectively show the login page and the student card (skills development progress dashboard).

Fig. 3. Login page (via VR headset).

Fig. 4. Student card (progress dashboard).


After the login, the student is prompted to select soft skills development or hard skills development to continue (see Fig. 5 and Fig. 8).

Fig. 5. Soft skills and hard skills access screen.

The institute should develop a skills map for each program. An example is shown in Fig. 6. The student is to select a skill to access the related content.

Fig. 6. Soft skills map and student’s performance indicators.


The type and characteristics of the content indicate a suitable level of interactivity and whether the training content should be web content (textual, visual, or aural) or virtual reality content. Figure 7 shows an example of business communication (soft skills) training content.

Fig. 7. Business communication skills development access screen.

Fig. 8. Hard skills selection via VR headset and VR controller.

Similarly to the soft skills map, a hard skills map should be developed for each program. An example is shown in Fig. 9. The student is to select an available hard skill to access the related content. The availability of a given skill is indicated with a green pin, and it is based on the level of student and taken prerequisites and corequisites.
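As a hedged illustration of how such a prerequisite-gated skills map could be represented, the minimal Python sketch below models availability (the "green pin") from the student's year and completed prerequisites; the skill names, prerequisite links, and year thresholds are hypothetical examples, not the paper's actual map.

# Minimal sketch of a prerequisite-gated hard-skills map; all entries are
# hypothetical examples.
hard_skills_map = {
    "engine_disassembly_vr": {"prerequisites": ["thermodynamics_basics"], "min_year": 2},
    "thermodynamics_basics": {"prerequisites": [], "min_year": 1},
    "cnc_programming": {"prerequisites": ["engineering_drawing"], "min_year": 3},
    "engineering_drawing": {"prerequisites": [], "min_year": 1},
}

def available_skills(student_year: int, completed: set[str]) -> list[str]:
    """Return the skills that would be shown with a 'green pin' for this student."""
    return [
        name for name, info in hard_skills_map.items()
        if student_year >= info["min_year"]
        and set(info["prerequisites"]).issubset(completed)
        and name not in completed
    ]

print(available_skills(student_year=2, completed={"thermodynamics_basics"}))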


Fig. 9. Hard skills map and student’s performance indicators.

Fig. 10. Internal combustion engine VR simulation training content (access point).

Figure 10 shows some of the available hard skills for mechanical engineering delivered through VR content. The VR content, depending on the learning requirements and available resources, can offer a simple exploration or a full interaction experience. Although VR content gives the feeling of immersion in a simulated world close to reality, not all training content must be in the VR environment. In many cases, a well-designed, guided, and supervised collaboration or communication activity is the best way to develop the targeted skill, which is achieved through the eLearning web portal in the proposed skills development environment.


6 Conclusion

It is evident that the 21st-century key drivers of change will affect all aspects of life, especially education. Education must cope with continually and fast-changing societies, economies, and working environments. Therefore, teaching and learning methods must be reshaped, programs and curricula must be redesigned, and technology must be adopted wisely and efficiently. This paper proposed a web-based learning environment that makes use of virtual reality technology features, benefits, and functionality. For the learning environment to become successful and remain effective, industrial training should start as early as the first academic year, for early exposure to the working environment. Industrial training should be considered a study-long program and should incorporate the development of all skills categories, particularly hard and soft skills. Scenarios, cases, tasks, and activities will be continuously injected into the learning environment, targeting a specific skill or set of skills. Feedback will be collected, analyzed, and acted upon to improve and enrich the environment. When the learning environment is mature enough, further research can be conducted to incorporate new technologies like Artificial Intelligence (AI), Natural Language Processing, or the Internet of Things. Research can also address academic aspects, such as collaborative distance learning environments, flipped classrooms, and active learning classrooms.

References
1. Sabharwal, R., Hossain, M.R., Chugh, R., Wells, M.: Learning management systems in the workplace: a literature review. In: 2018 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), pp. 387–393 (2018)
2. Pankaja, N.: A comparative study of popular online platforms for E learning solutions. Ph.D. dissertation, Educational Media Research Center, University of Mysore, Mysore (2015)
3. Derouin, R.E., Fritzsche, B.A., Salas, E.: E-learning in organizations. J. Manag. 31(6), 920–940 (2005)
4. Kimiloglu, H., Ozturan, M., Kutlu, B.: Perceptions about and attitude toward the usage of e-learning in corporate training. Comput. Hum. Behav. 72, 339–349 (2017)
5. Zhang, B., Yin, C., David, B., Xiong, Z., Niu, W.: Facilitating professional work-based learning with context-aware mobile system. Sci. Comput. Program. 129, 3–19 (2016)
6. De Smet, C., Schellens, T., De Wever, B., Brandt-Pomares, P., Valcke, M.: The design and implementation of learning paths in a learning management system. Interact. Learn. Environ. 24(6), 1076–1096 (2016)
7. Saidin, S.S., Iskandar, Y.H.P.: Proposed model to evaluate the impact of e-training on work performance among IT employees in Malaysia. In: IEEE Conference on E-Learning, E-Management and E-Services, Langkawi, pp. 17–22 (2016)
8. Page, R.L.: Brief history of flight simulation. In: SimTecT 2000 Proceedings, pp. 11–17. Academic Press (2000)
9. Pelargos, P.E., Nagasawa, D.T., Lagman, C., Tenn, S., Demos, J.V., Lee, S.J., Bari, A.: Utilizing virtual and augmented reality for educational and clinical enhancements in neurosurgery. J. Clin. Neurosci. 35, 1–4 (2017)
10. Plunkett, J.: 'Feeling Seeing': touch, vision and the stereoscope. Hist. Photogr. 37(4), 389–396 (2013)


11. Kamińska, D., Sapiński, T., Wiak, S., Tikk, T., Haamer, R.E., Avots, E., Helmi, A., Ozcinar, C., Anbarjafari, G.: Virtual reality and its applications in education: survey. Information 10, 318 (2019)
12. Stanković, S.: Virtual reality and virtual environments in 10 lectures. Synth. Lect. Image Video Multimed. Process. 8(3), 1–197 (2015)
13. Sternig, C., Spitzer, M., Ebner, M.: Learning in a virtual environment: implementation and evaluation of a VR math-game. In: Mobile Technologies and Augmented Reality in Open Education (2017)
14. Pasaréti, O., Hajdin, H., Matusaka, T., Jambori, A., Molnar, I., Tucsányi-Szabó, M.: Augmented reality in education. In: INFODIDACT 2011 Informatika Szakmódszertani Konferencia (2011)
15. Leitão, R., Rodrigues, J.M.F., Marcos, A.F.: Game-based learning: augmented reality in the teaching of geometric solids. Int. J. Art Cult. Des. Technol. (IJACDT) 4(1), 63–75 (2014)
16. Panciroli, C., Macauda, A., Russo, V.: Educating about art by augmented reality: new didactic mediation perspectives at school and in museums. In: Proceedings, vol. 1, no. 10, p. 1107 (2018)
17. Kavanagh, S., Luxton-Reilly, A., Wuensche, B., Plimmer, B.: A systematic review of virtual reality in education. Themes Sci. Technol. Educ. 10(2), 85–119 (2017)
18. Jensen, C.G.: Collaboration and dialogue in virtual reality. J. Probl. Based Learn. High. Educ. 5(1), 85–110 (2017)
19. Kommetter, C., Ebner, M.: A pedagogical framework for mixed reality in classrooms based on a literature review. In: EdMedia + Innovate Learning, pp. 901–911, Netherlands (2019)
20. Callanan, G., Benzing, C.: Assessing the role of internship in the career-oriented employment of graduating college students. Emerald 46(2), 82–89 (2004)
21. Tanius, E.: Business' students industrial training: performance and employment opportunity. IJSR 5(5), 1–5 (2015)
22. Yusof, N.A., et al.: Improving graduates' employability skills through industrial training: suggestions from employers. J. Educ. Pract. 4(4), 23–29 (2013)
23. Ogbuanya, T.C., Njoku, C.A., Kemi, P.O., Ogunkelu, M.O.: Evaluating the effectiveness of students industrial work experience scheme (SIWES) programme to ensure quality of technical, vocational education and training in technical colleges in Lagos State. Int. J. Vocat. Tech. Educ. 10(7), 61–69 (2018)
24. OECD Skills Outlook 2019: Thriving in a Digital World. OECD Publishing, Paris (2019)
25. Shawcross, J.K.: Manufacturing excellent engineers. In: Department of Engineering, University of Cambridge (2018)
26. Mitchell, G.W., Skinner, L.B., White, B.J.: Essential soft skills for success in the twenty-first century workforce as perceived by business educators. Delta Pi Epsil. J. 52(1), 43–53 (2010)
27. Schulz, B.: The importance of soft skills: education beyond academic knowledge. Nawa J. Lang. Commun. 2, 146–155 (2008)
28. Newman, K.S.: Chutes and Ladders: Navigating the Low-Wage Labor Market. Harvard University Press, Cambridge (2006)
29. Valenzuela, V.: The exploration of employers', educators', and students' perceptions regarding the influence of soft skills for transitioning into the workforce. University of La Verne, ProQuest Dissertations Publishing (2020)
30. Luck Jr., G.E.: Conceptual leadership skills for the twenty-first century, a means of dealing with complexity, ambiguity, uncertainty, and speed. US Army Command and General Staff College, Fort Leavenworth (1998)

A Comparative Study Between K-Nearest Neighbors and K-Means Clustering Techniques of Collaborative Filtering in e-Learning Environment

Rajae Zriaa1(B) and Said Amali2

1 Department of Mathematics and Computer Science, Informatics and Applications Laboratory, Faculty of Science, Moulay Ismail University of Meknes, Meknes, Morocco
[email protected]
2 Department of Mathematics and Computer Science, Informatics and Applications Laboratory, FSJES, Moulay Ismail University of Meknes, Meknes, Morocco
[email protected]

Abstract. Data mining is an important phase for obtaining useful information from a data warehouse through a set of processes that generally rely on specific techniques and algorithms. Data mining is mainly used in recommendation systems based on collaborative filtering, with the aim of extracting useful and specific knowledge about the users of the system in order to recommend personalized items to them. Several data mining techniques have been considered in the literature, namely classification, clustering, regression, association, etc. Each is based on a set of algorithms, e.g., classification with K-Nearest Neighbors, decision trees, naive Bayes, neural networks, etc., and clustering with k-means. Our work is a comparative study between k-NN and k-means clustering in order to identify the most efficient algorithm in terms of prediction in an e-learning recommender system. Mean Absolute Error (MAE) is the most widely used measure of an algorithm's performance in terms of accuracy: the lower the MAE value, the more accurate the prediction model.
Keywords: Data mining · Recommendation system · Collaborative filtering · K-Nearest Neighbors · K-means clustering

1 Introduction

With the emergence of e-learning systems, the number of digital resources is increasing rapidly over time, making the process of filtering information increasingly complex [1]. Recommendation systems are promising new technologies in e-learning environments, as they overcome the problem of information overload through data mining techniques, and they also facilitate access to the information desired by the user through the suggestion of learning objects personalized according to user profiles [2–4]. The user profile contains all the information that describes how the learner prefers to learn according to the learning objects, for example (lesson, exercise, practical work, simulation…).


K-Nearest Neighbors is a classification technique that uses all user data in the classification, which causes a high level of computational complexity. K-means, on the other hand, is a clustering technique that creates groups of similar users and takes the cluster centers as a new sample of users, through which future calculations are made, in order to reduce the computational complexity. In this respect, we propose a comparative study between the two algorithms, K-NN and K-means, in order to deduce the most efficient algorithm in terms of accuracy in the context of recommendation systems based on collaborative filtering. The rest of this paper is organized as follows: Sect. 2 highlights the main approaches considered in recommendation systems. Section 3 describes the data mining techniques used in recommender systems. The results and evaluation of our research are presented in Sect. 4. In the final section, we conclude the paper with a summary of the work, its limitations and future works.

2 Literature Survey 2.1 Recommendation System A recommendation system is defined as a strategy that assists users in making decisions in complex and evolving information spaces [5, 6]. The recommendation system suggests items to the user, which he can assess according to his profile and the targeted domain. In addition, recommendation systems address the problem of information overload and lack of domain knowledge, which users normally encounter, by providing them with personalized, exclusive content and service recommendations. Recommendation systems are classified into the following main approaches (see Fig. 1):

Fig. 1. The extended classification (Burke 2007; Rao and Talwar 2008)

Content-Based Filtering (CBF): proposes learning objects similar to those previously appreciated by the learner [7, 8] (see Fig. 2). Collaborative Filtering (CF): the most common and effective technique in recommendation systems, which compares the learner's ratings with those of other


Fig. 2. Content-based filtering

learners, in order to find learners who are “most similar”, based on a similarity criterion and to recommend learning objects that similar learners have previously liked [9, 10] (See Fig. 3).

Fig. 3. Collaborative filtering

Hybrid Filtering (HF): combines the two approaches mentioned above (see Fig. 4), in order to compensate for the weaknesses of each technique and to make recommendations more efficient and accurate. Several hybridization methods have been


proposed by (Burke 2002) such as: weighted, switched, mixed, combined, cascaded, etc. [11].

Fig. 4. Hybrid filtering
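To make the collaborative-filtering idea above concrete, the short Python sketch below predicts a learner's rating for an unseen learning object as the similarity-weighted average of the ratings given by the other learners. It is an illustrative sketch only: the toy rating matrix, the precomputed similarity matrix and the function name are assumptions, not taken from the paper.

```python
import numpy as np

def predict_rating(ratings, similarities, target, item):
    """Collaborative-filtering prediction: similarity-weighted average of the
    ratings that other learners gave to `item` (learners who rated it only)."""
    num, den = 0.0, 0.0
    for other, rating in enumerate(ratings[:, item]):
        if other == target or rating == 0:          # 0 means "not rated"
            continue
        num += similarities[target, other] * rating
        den += abs(similarities[target, other])
    return num / den if den else 0.0

# Toy data: 3 learners x 4 learning objects, plus a precomputed similarity matrix.
R = np.array([[5, 0, 3, 1],
              [4, 4, 0, 1],
              [1, 5, 4, 5]], dtype=float)
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
print(round(predict_rating(R, S, target=0, item=1), 2))  # 4.18, weighted toward the most similar learner
```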

2.2 Data Mining Data mining is a process based on the exploration and exploitation of the learning objects visited by learners, drawn from a data warehouse. The extracted learning objects can be analyzed to support decision-making processes. Data mining is an important phase for obtaining useful information from raw data through a dataset pre-processing phase; hence the dataset goes through the following steps: dataset cleaning, feature encoding and normalization [12]. Several data mining techniques have been considered in the literature, such as classification, clustering, regression, association, etc. Each one is based on a set of algorithms [13], which are divided into three main categories. Memory-Based Algorithms [Breese et al. 1998]: used to predict users' ratings, relying on average ratings to predict items for other users; the similarity between users is studied through Pearson correlation and vector similarity. Several algorithms have been considered, including K-Nearest Neighbors. Model-Based Algorithms [Sarwar et al. 2000]: provide recommendations by learning a pre-defined model, using the information provided implicitly or explicitly in the system; this model can be built with data mining and machine learning techniques. Model-based methods do not suffer from the drawbacks of memory-based methods, but they can suffer from reduced prediction accuracy. Many algorithms have been envisaged, such as K-means clustering.


Knowledge-Based Systems [Burke 2001]: use knowledge about users and products in order to match both implicit and explicit models; they use methods such as Decision Trees and Case-Based Reasoning (CBR) (Table 1).

Table 1. Comparison between memory-based, model-based algorithms and knowledge-based systems.

Memory-based [14]: simple to implement; operates in online mode; low scalability; more sensitive to sparsity and cold-start problems; difficult to find underlying characteristics in the data.

Model-based [14]: complex to implement; used in offline mode; higher scalability; able to find underlying characteristics in the data; faster in prediction time; less sensitive to sparsity and cold-start problems.

Knowledge-based systems [15]: used to solve problems that do not have a traditional algorithmic solution; implement human heuristic reasoning based on specific techniques, procedures and mechanisms such as case-based reasoning, knowledge bases, neural networks, genetic algorithms, etc.

2.3 Classification Algorithm K-Nearest Neighbors (K-NN) [16–18] is a memory-based algorithm that uses a classification approach, which requires the computation of similarity between users or items. Several similarity measures have been envisaged in the literature, such as Pearson correlation, cosine similarity, adjusted cosine similarity and Jaccard similarity [19]. Based on the analyses that have already been done, cosine similarity offers a higher accuracy compared to Pearson correlation [20] (see Fig. 5 and Table 2). 2.4 Clustering Algorithm Clustering is a type of unsupervised learning which aims to partition the data set into different subsets called clusters; it is one of the data analysis methods that are widely used in data mining [21]. According to Fraley and Raftery (2002), clustering approaches can be divided into two broad categories: hierarchical and partitioning techniques. In addition, Han, Pei and Kamber (2011) proposed three further sub-categories of clustering: density-based


Fig. 5. A survey of accuracy evaluation metrics of recommendation.

Table 2. Pseudo code for the K-Nearest Neighbors algorithm

Input:
  k                          // number of desired neighbors
  L = {l1, l2, l3, ..., ln}  // set of learners
Output: list of the k nearest neighbors
Process:
  // Neighborhood generation
  for each pair of learners (li, lj) do
      // Calculate the similarity
      compute the distance d(li, lj)
  end for
  // Select the nearest neighbors
  1. sort the distances d in increasing order
  2. select the k nearest neighbors
  return the list of the k nearest neighbors
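As an illustration of the neighborhood-generation procedure in Table 2, the following Python sketch selects the k most similar learners using cosine similarity (the measure favoured in Sect. 2.3); sorting by decreasing similarity is equivalent to sorting distances in increasing order. The rating matrix and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two rating vectors (0 if either is all zeros)."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / norm if norm else 0.0

def k_nearest_neighbors(ratings, target, k):
    """Return the indices of the k learners most similar to `target`.

    ratings: 2-D array, one row of item ratings per learner.
    target:  row index of the active learner.
    k:       number of desired neighbors.
    """
    similarities = [
        (other, cosine_similarity(ratings[target], ratings[other]))
        for other in range(len(ratings)) if other != target
    ]
    # Sort by decreasing similarity and keep the k closest learners.
    similarities.sort(key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in similarities[:k]]

# Toy learner-item rating matrix (rows: learners, columns: learning objects).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]])
print(k_nearest_neighbors(R, target=0, k=3))   # [1, 2, 3]
```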

methods, model-based methods and grid-based methods. Figure 6 illustrates the clustering taxonomy of the approach of Dubey and Choubey (2017). In addition, the K-means algorithm is well known for its efficiency and power in clustering large datasets, and it is considered one of the most popular clustering algorithms for unsupervised learning. Shi Na et al. [22] analyse the shortcomings of K-means clustering, in particular the cost of calculating the distance between each data object and all cluster centers in each iteration. The procedure of the K-means algorithm is illustrated in Table 3.


Fig. 6. Taxonomy of clustering approaches

Table 3. Pseudo code for K-means clustering algorithm

Input:
  k                          // number of desired clusters
  L = {l1, l2, l3, ..., ln}  // set of learners
Output: a set of k clusters
Process:
  arbitrarily select k learners from L as the initial cluster centers
  repeat
      1. (re)assign each learner li to the cluster with the closest similar interest,
         based on the mean value of the learners in the cluster
      2. update the cluster means: calculate the new mean value of the learners of each cluster
  until no change
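The procedure in Table 3 can be sketched as follows in Python (Lloyd's algorithm over numeric learner vectors). This is a minimal illustration using assumed toy data, not the code used in the experiments.

```python
import numpy as np

def k_means(learners, k, max_iter=100, seed=0):
    """Cluster learner vectors into k groups, as in Table 3."""
    rng = np.random.default_rng(seed)
    # Arbitrarily select k learners as the initial cluster centers.
    centers = learners[rng.choice(len(learners), size=k, replace=False)]
    for _ in range(max_iter):
        # (Re)assign each learner to the closest center (Euclidean distance).
        distances = np.linalg.norm(learners[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update each center as the mean of the learners assigned to it.
        new_centers = np.array([
            learners[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):   # "until no change"
            break
        centers = new_centers
    return labels, centers

# Toy learner vectors (e.g. encoded preferences / ratings).
L = np.array([[5.0, 1.0], [4.5, 1.2], [1.0, 4.8], [0.8, 5.1]])
labels, centers = k_means(L, k=2)
print(labels)   # two groups of similar learners, e.g. [0 0 1 1]
```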

3 Proposed Approach In this work, we propose an approach based on data mining techniques (see Fig. 7) that exploits log file data, which record the traces of learners during their interactions with the e-learning system. The log file data and explicit learner ratings are used as inputs to the K-means clustering algorithm, chosen for its proven performance over K-Nearest Neighbors, in order to improve the accuracy and diversity of the recommended learning objects and to match the preferences of each learner as closely as possible.


The assignment of learners to clusters is based on a hybridization between implicit preferences deduced from the log file and explicit preferences extracted from the learner's profile. The similarity between learners is calculated through the Euclidean distance. In order to build virtual communities of similar interest, each learner is represented by a vector with the following elements: Learner_id {learning styles, prerequisites, expertise level, performance}. In addition, the learning objects visited by the closest neighbors in the same cluster are taken into account when starting the recommendation process.
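As a small illustration of this representation, the sketch below encodes two learner vectors with hypothetical numeric values for {learning styles, prerequisites, expertise level, performance} and computes the Euclidean distance used to group similar learners; the encoding is assumed for the example and is not taken from the paper.

```python
import numpy as np

# Hypothetical numeric encoding of a learner profile:
# [learning style, prerequisites score, expertise level, performance]
learner_a = np.array([1.0, 0.7, 2.0, 0.85])
learner_b = np.array([1.0, 0.5, 3.0, 0.90])

def euclidean_distance(x, y):
    """Similarity measure used to build communities of similar interest."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

print(round(euclidean_distance(learner_a, learner_b), 3))  # 1.021
```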

Fig. 7. E-learning recommender model

4 Comparison of the K-NN and K-means Algorithms This section presents the performance analysis of the K-NN and K-means algorithms. The participants in this experiment are 100 students from a high school in the delegation of Chefchaouen, Morocco. The students were required to study four modules of the computer science subject, namely 'Generalities on computer systems', 'Software', 'Algorithms and programming' and 'Networks and Internet'; each module consists of a


set of lessons, which are well defined in the pedagogical orientations of computer science in high school. Our first experiment is based on the first module, 'Generalities on computer systems', which contains three lessons: lesson 1 'basic definitions and vocabulary', lesson 2 'basic structure of a computer' and lesson 3 'software and computer application areas' (see Table 4).

Table 4. Module N°1: Generalities on computer systems (schedule and degree of depth per common core stream: Letter & Arts, Original, Science, Technologies)

Definition and basic vocabulary (2h):
- Definition of information: 2, 2, 2, 2
- Definition of treatment: 2, 2, 2, 2
- Definition of computer science: 2, 2, 2, 2
- Definition of the computer system: 2, 2, 2, 2
Basic structure of a computer (4h):
- Functional diagram of a computer: 2, 2, 3, 3
- Peripherals: 2, 2, 3, 3
- Central processing unit: 2, 2, 3, 3
Types of software (1h):
- Basic software: 2, 2, 2, 2
- Application software: 2, 2, 2, 2
Fields of application (1h): 2, 2, 2, 2

The following table expresses the degree of depth for each concept (Table 5).

Table 5. The degree of depth

Degree of depth / Descriptor
1: Initiation
2: Appropriation
3: Master


Several versions of learning objects have been stored in a database and proposed in order to provide a suitable learning environment. The performance of this work is evaluated by comparing a memory-based algorithm, k-Nearest Neighbors (k-NN), with a model-based algorithm, namely K-means clustering, which require the calculation of similarity between the set of learners [23, 24]. The experiments are performed on an HP computer with a Core i5 processor. The Mean Absolute Error (MAE), Eq. (1), is the most widely used technique to measure the efficiency and performance of the two algorithms mentioned above; the smaller the MAE value, the more accurate the prediction.

\[ \mathrm{MAE} = \frac{\sum_{(u,i)\in \mathrm{test}} \left| \mathrm{prediction}_{u,i} - \mathrm{real}_{u,i} \right|}{n_{\mathrm{test}}} \tag{1} \]

Where n_test is the total number of rating-prediction pairs in the test set, prediction_{u,i} is the predicted rating of learner u for learning object i, and real_{u,i} is the actual rating in the real dataset. The Sum of Squared Errors (SSE) is used to find an appropriate k by plotting the number of clusters against the SSE, while evaluating the SSE for different values of k.

\[ \mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{2} \]

Where the sum runs over the n samples of the test set, y_i is the predicted value and \hat{y}_i is the actual value. In the first experiment, we carry out an analysis to find the optimal k value, according to the experimental data, for both algorithms K-NN and K-means clustering.
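For reference, the two evaluation measures in Eqs. (1) and (2) can be computed directly, as in the short Python sketch below; the rating values are toy numbers used only to illustrate the calculation.

```python
import numpy as np

def mae(predicted, actual):
    """Mean Absolute Error over the rating-prediction pairs of the test set (Eq. 1)."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.mean(np.abs(predicted - actual)))

def sse(predicted, actual):
    """Sum of Squared Errors, used here to choose k for K-means (Eq. 2)."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.sum((predicted - actual) ** 2))

predictions = [3.8, 2.1, 4.6]
real_ratings = [4.0, 2.0, 5.0]
print(mae(predictions, real_ratings))  # 0.2333...
print(sse(predictions, real_ratings))  # 0.21
```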

Fig. 8. The performance of K-Nearest Neighbors under different values of K (MAE plotted for K = 2 to 10)


According to the graph in Fig. 8, we can see that 3-NN is more efficient than 2-NN and 4-NN.

Fig. 9. The performance of K-means clustering under different values of K (Sum of Squared Errors plotted for K = 2 to 10)

Table 6. MAE of K-means clustering and K-NN (Mean Absolute Error; input data = percentage of learners used)

K-means clustering
K = 2: 0.183863251 (20%), 0.099087001 (60%), 0.078402631 (100%)
K = 3: 0.17546994 (20%), 0.093133063 (60%), 0.072887544 (100%)
K = 4: 0.171898164 (20%), 0.087160131 (60%), 0.06906422 (100%)
K = 5: 0.159037417 (20%), 0.082727262 (60%), 0.067870945 (100%)
K = 6: 0.145695666 (20%), 0.080488854 (60%), 0.065017698 (100%)

K-NN
K = 2: 2.6431 (20%), 1.7154 (60%), 1.8033 (100%)
K = 3: 2.5943 (20%), 2.189 (60%), 2.2465 (100%)
K = 4: 2.7331 (20%), 2.3658 (60%), 2.4421 (100%)
K = 5: 2.8105 (20%), 2.529 (60%), 2.5418 (100%)
K = 6: 2.9447 (20%), 2.5967 (60%), 2.6225 (100%)

Based on the graph illustrated in Fig. 9, we can see that 4-means is more efficient than 3-means and 2-means. In the second experiment, we compared the accuracy of the K-means clustering algorithm with respect to K-NN, while gradually increasing the number of learners for


each K value. Table 6 depicts the comparative analysis of the performance of K-NN and the proposed algorithm.

Fig. 10. Performance of K-means clustering by varying the number of learners (MAE for 20%, 60% and 100% data input, K = 2 to 6)

According to the results obtained in Fig. 10 and Fig. 11, we can see that as the number of learners increases, the MAE value decreases in K-means clustering but gradually increases in K-NN; it therefore appears that the K-means clustering algorithm is more efficient than the K-NN classifier. Moreover, the experimental results illustrated in Fig. 12 show that the Euclidean distance gives a better accuracy than the Manhattan distance according to the MAE evaluation metric, especially when used in K-means clustering. The Euclidean distance thus offers a higher performance in K-means clustering compared to its use in the K-NN algorithm.


Fig. 11. Performance of K-Nearest Neighbors by varying the number of learners (MAE for 20%, 60% and 100% data input, K = 2 to 6)

Fig. 12. Accuracy of the data mining algorithms using MAE (K-means clustering: Euclidean distance 0.06906422, Manhattan distance 0.119028521; K-NN: Euclidean distance 2.4421, Manhattan distance 2.4535)

5 Conclusion This paper examined the K-NN classification and k-means clustering algorithm. From the results obtained in the previous section, we can see that the K-NN algorithm is


efficient on small data, compared to K-means, which is more accurate even with a large amount of data. K-means is used to divide the data into similar groups in order to obtain an accurate result, whereas K-NN is used to find similar values within the data set. Furthermore, it is shown that the Euclidean distance is more appropriate as a measure of similarity when used in K-means clustering, but it is less suitable with K-NN. Finally, each algorithm has its own specifications, and no single algorithm meets all criteria and requirements. The choice of algorithm depends on the context in which it will be used and on the amount of data to be manipulated. In our future work, we plan to exploit the K-means clustering algorithm in an e-learning recommender system, thanks to its efficiency and accuracy, in order to predict and suggest personalized learning objects to learners based on their preferences in terms of learning styles, skills, centers of interest, etc., so as to decrease the dropout rate and maintain learners' perseverance and motivation.

References 1. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: azsurvey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005) 2. Melville, P., Sindhwani, V.: Recommender systems. In: Encyclopedia of Machine Learning and Data Mining, pp. 829–838 (2010). https://doi.org/10.1007/978-0-387-30164-8_705 3. Joshi, N., Gupta, R.: A personalized web based e-learning recommendation system to enhance and user learning experience. Int. J. Recent Technol. Eng. (IJRTE) 9(1), 1186–1195 (2020). ISSN 2277–3878, https://doi.org/10.35940/ijrte.F9991.059120 4. Aher, S., Lobo, L.M.R.J.: A framework for recommendation of courses in e-learning system. Int. J. Comput. Appl. 35(4), 21–28 (2011) 5. Rashid, A.M., Albert, I., Cosley, D., Lam, S.K., McNee, S.M., Konstan, J.A., et al.: Getting to know you: learning new user preferences in recommender systems. In: Proceedings of the 7th International Conference on Intelligent User Interfaces, pp. 127–134 (2002). https://doi. org/10.1145/502716.502737. 6. Nafea, S.M., Siewe, F., He, Y., et al.: On recommendation of learning objects using feldersilverman learning style model. IEEE Access 7, 163034–163048 (2019). https://doi.org/10. 1109/ACCESS.2019.2935417 7. Pagare, R., Shinde, A.: A study of recommender system techniques. Int. J. Comput. Appl. 47(16) (2012). https://doi.org/10.5120/7269-0078 8. Deepika, P., Parvathi, R.: Location recommendation system on point of interest and place-user similarity. Int. J. Recent Technol. Eng. (IJRTE) 9(1) (2020). ISSN: 2277–3878, https://doi. org/10.35940/ijrte.A2257.059120 9. Elahi, M., Ricci, F., Rubens, N.: A survey of active learning in collaborative filtering recommender systems. Comput. Sci. Rev. 20, 29–50 (2016). https://doi.org/10.1016/j.cosrev.2016. 05.002 10. Çano, E., Morisio, M.: Hybrid recommender systems: a systematic literature reviews. Intell. Data Anal. 21(6), 1487–1524 (2017). https://doi.org/10.3233/IDA-163209 11. Awla, A.H.: Learning styles and their relation to teaching styles. Int. J. Lang. Linguist. 2(3), 241–245 (2014). https://doi.org/10.11648/j.ijll.20140203.23 12. Zohair, L.M.A.: Prediction of Student’s performance by modelling small dataset size. Int. J. Educ. Technol. Higher Educ. 16(1), 27 (2019). https://doi.org/10.1186/s41239-019-0160-3


13. Madni, H.A., Anwar, Z., Shah, M.A.: Data mining techniques and applications—a decade review. In: 2017 23rd International Conference on Automation and Computing (ICAC). IEEE (2017), https://doi.org/10.23919/IConAC.2017.8082090 14. Tatiya, R.V., Vaidya, A.S.: A survey of recommendation algorithms. IOSR J. Comput. Eng 16(6), 16–19 (2014) 15. Ahmed, A., Al-Masri, N., Abu Sultan, Y.S., Akkila, A.N., Almasri, A., Mahmoud, A.Y., Abu-Naser, S.S.: Knowledge-Based Systems Survey (2019) 16. Jiang, L., Cai, Z., Wang, D., Jiang, S.: Survey of improving k-nearest-neighbor for classification. In: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), vol. 1, pp. 679–683. IEEE (2007) 17. Bhatia, N.: Survey of nearest neighbor techniques. arXiv preprint arXiv:1007.0085 18. Agarwal, A., Chauhan, M.: Similarity measures used in recommender systems: a study. Int. J. Eng. Technol. Sci. Res. IJETSR (2010). ISSN 2394–3386 19. Phyu, T.N.: Survey of classification techniques in data mining. In: Proceedings of the International MultiConference of Engineers and Computer Scientists, vol. 1, pp. 18–20 (2009) 20. Gunawardana, A., Shani, G.: A survey of accuracy evaluation metrics of recommendation tasks. J. Mach. Learn. Res. 10(Dec), 2935–2962 (2009) 21. Dubey, A., Choubey, A.: A systematic review on k-means clustering techniques. Int. J. Sci. Res. Eng. Technol. (IJSRE) 6(6) (2017). ISSN 2278–0882 22. Shi, N., Liu, X., Guan, Y.: ìResearch on K-means clustering algorithm: an improved Kmeans clustering algorithm. In: 2010 IEEE Third International Symposium on Intelligent Information Technology and Security Informatics, 2–4 April 2010, pp. 63–67 (2010) 23. Jiang, L., Cai, Z., Wang, D., Jiang, S.: Survey of improving k-nearest-neighbor for classification. In: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), vol. 1, pp. 679–683. IEEE.(2007). https://doi.org/10.1109/FSKD.2007.552 24. Adeniyi, D.A., Wei, Z., Yongquan, Y., et al.: Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method. Appl. Comput. Inf. 12(1), 90–108 (2016). https://doi.org/10.1016/j.aci.2014.10.001

Competence and Lean Management, a Literature Review

Wafae Qjane(B) and Abderazzak Boumane

Laboratory of Innovative Technologies, Technology and Engineering Sciences, Tangier, Morocco
[email protected], [email protected]

Abstract. Lean management promotes the adoption of gradual improvements that are part of a daily search for efficiency and progress in organizations. Companies are increasingly aware of the importance of the human factor and the resulting performance potential. In this context, there is a real need, in both research and industrial fields, for competency management studies in a lean context. This paper presents a literature review of the existing research and works related to competence management and lean concepts. In this work, we define the lean concept through different opinions and identify the lean organizational structure, composed of the steering committee and the lean promotion service. We highlight the impact of lean management on the organizational culture by showing that the transformation of the organizational culture and of employee behaviour are two key elements of the lean spirit. Then, we discuss the competence management concepts according to different researchers' points of view in chronological order, and we present the competence categories and levels. Furthermore, we discuss the essential role of competence according to different researchers' points of view. We also discuss the theoretical reflections concerned with building competency management systems via two approaches, universalist and situationalist. Finally, we describe the general process of competence management, from competence identification to competence usage, and then make the link between competence and knowledge. Keywords: Skills · Lean management · Competence management · Lean organizational structure · Lean culture

1 Introduction In the current industrial environment, many enterprises strive to increase their performance, and most of them consider the improvement and development of team competence as a fundamental progress factor. Consequently, controlling the effects of the competence management plays a significant role in contributing considerably to the company’s progress and its success and constitutes a competitive advantage as well.



Even if the value of competence management activities is recognized as a key source of competitiveness, firms still struggle to find effective and efficient processes and management activities to accomplish competence management, according to Campisi and Costa (2012). Drejer and Riis (1999) affirmed that managing competencies in an enterprise requires an integrative approach including the development of individuals and collective management processes, as well as both informal and formal processes. Actually, as specified by Barney (2001), competencies may be considered as resources and therefore as models of the basic structures of economic organizations and of exchange among actors. A company is not only considered as a production system creating products but also as a system combined with a set of competencies. This requires the accumulation of new knowledge and an advanced understanding of individual perception at each managerial level. Accordingly, the competencies associated with human resources should be recognized, measured and managed, which represents a major change in enterprise management tools and approaches. In addition, lean manufacturing has become the main management philosophy guaranteeing the survival of the enterprise's business in an extremely competitive field. The original lean model, called the Toyota Production System, is credited to the phenomenally successful automaker Toyota. The lean system is a continuous improvement system entailing technical tools and managerial methods. Toyota describes its system as having two pillars. The first one is the continuous and systematic elimination of waste, ensuring that people, equipment, and material work in coordination to respond to customer needs. The second one concerns the principle of respect for people, which is a management philosophy encouraging employee involvement in an environment of trust and respect. Therefore, lean is a confirmed management strategy requiring both knowledge and learning skills. Many researchers insist on the value of lean manufacturing in improving company performance (Dombrowski and Mielke 2014). Consequently, they highlight the need for employees who are able to lead this change in order to achieve the required performance (Autissier and Moutot 2003). Through this paper, we will analyze the existing theoretical reflections and study current approaches related to competence management and lean. The literature indicates that the determination of the essential competencies is related to the organizational maturity level of lean in the company and to team members' personal skills and experience in lean systems. Other researchers insist on the importance of training and certification as key factors for an effective lean implementation. Additionally, we will identify the organization of a lean company and recognize the lean organisational culture, both of which are important for the achievement of lean manufacturing project goals in industrial organizations. Then we will focus on the most common existing concepts regarding competence management and consider the main characteristics of competency and the goals of competence management, before finally focusing on the general process of competence management and its relation with knowledge. The literature review was based on research in databases such as Science Direct, IEEE Journals and Web of Science.


The aim of our work is to review the existing research and works related to competence management and lean concepts in order to establish, in future work, the final list of the required lean competencies. This list will be refined and validated in order to create a lean competence framework. However, little research has addressed the creation of human competence piloting systems in firms and, until now, research concerning a lean competence framework remains rare. Thus, our study subject is of great value in both the scientific and the industrial field.

2 Lean Management 2.1 Definition According to Ohno (Ohno 1988), lean is founded on learning from Toyota who is the leader in industrial market and who increased its share by improving their processes, most notably on the shop floor, but also in design and development, by the application of process accuracy. Lewis (Ohno 2000) confirmed that lean focuses on “value stream” encompassing the company, customers and supply chain partners. Womack et al. (Womack et al. 1996) used the term lean to describe the approach aiming at the elimination of waste and the efficiency improvement. They identified five lean principles as follows: 1. Identify value: Value is defined by the final customer; it literally means what the customer is willing to pay for. It aims at filling customers’ needs by providing products and/or services with a competitive price and lead time. 2. Identify the value stream: The value stream is the set of actions that transform a product or service. At this stage, the goal is to use the customer’s value as an orientation point and recognize all the activities that contribute to these values. Activities without value added to the customer are considered as waste and should be eliminated to ensure that customer is getting precisely what he wants. 3. Create the value flow: After removing the wastes from the value stream, the succeeding action is to ensure that the flow of the remaining steps in the production process run smoothly without stoppage or delays by reducing cycle times and batch sizes to the absolute minimum. 4. Establish pull: Processes or products are produced and delivered on demand from the customers. By following the value stream smoothly we ensure that the products produced will be able to satisfy customer’s need. 5. Pursue perfection: By making lean thinking and continuous improvement a part of the organizational culture. Industry mainly focuses on the shop floor. This focus brought many benefits but only addresses the second and third of Womack and Jones lean principles. It largely ignores the first lean principle, especially how practitioners or academics are to accurately identify customer value or identify and protect their core competences during process changes. While the lean approach may bring process improvement, Moody (Moody 1997) clarified that it could damage customer value or a company’s competitiveness mostly when a process changes effect or lead to the outsourcing of core competence activities.


Moreover, Hines et al. (Hines et al. 2004) make a clear distinction between the lean production system, which means the application of operational tools designed for the development and management of production areas, and the application of lean thinking as a strategic approach. Based on these notions and the definition provided by Shah and Ward (Shah and Ward 2003), researchers can describe lean management as follows: an integrated sociotechnical system of which the primary objective is to efficiently satisfy customer needs by continuously striving for customer value, continuous flow, and waste elimination in processes. Lean system has evolved concurrently with the development of Toyota’s production practices. Researchers and managers have developed their own interpretations of lean management based these practices. Furthermore, lean is a management method that aims at improving business performance through the development of all employees. The dual purpose of lean management is the complete satisfaction of the company’s customers and the success of each employee (manifested by the motivation and commitment) (Boucher 2007). For that, the lean culture is based on four fundamental principles (Delgoulet 2013), including the involvement of workers in improving their work environments. Indeed, workers are encouraged to be engaged in the improvement of their own workstations (Heinemann 2000). On the other hand, the role of management is to support this action of improvement on a daily basis. Lean process must be structured to support progress, maintain it and deploy culture and tools across the organization. This structuring includes (Holden 2007): 1. Develop the vision of the medium-term action Plan: To establish and display the vision and the steps chosen to reach it allows mobilizing the staff around common objectives. 2. Create the Lean function: composed of a lean Expert, lean Leader and lean Practitioner. 3. Creating the Steering committee: Continuous Improvement is a dynamic activity that requires readjustments, support, etc.… In this, the committee must be aware of the company’s overall strategy to align the lean objectives with the company’s overall goals. 4. Creating a culture of excellence: the steering committee and management must also be exemplary in order to create a new culture that is constantly looking for the best. However, deploying a process of progress and creating an autonomous and sustainable dynamic are long and complex tasks for lean companies. Whatever the company is, there will always be resistance to progress (Autissier and Moutot 2003). The initial step of the lean process is the understanding of the need to evolve and change the worker’s mindset. This awareness is the initial condition of lean change. In this context, the management must have a good understanding of the existing behaviors within the lean team in order to help them to be aware of the need for change and support them to have a vision of the future results.


2.2 Lean Organizational Structure The steering committee is a lean service that sits at the same level as the other services and that includes the lean practitioner, who is the pilot of change. On a daily basis, he must encourage involvement and motivate the staff with the goal of continually improving the work. The lean expert, who is the guarantor of the implementation of the tools, proposes and validates the deployment plan with the management. The lean leader supervises the projects that are already launched and drives the complex projects (Black 2008). The lean promotion service is in charge of the deployment of the lean program; it should (Black 2008):
• Set up the lean program as defined with the management and the steering committee.
• Deploy the training plan in agreement with the human resources department.
• Pilot the lean projects on the Gemba.
• Ensure respect of lean thinking and the proper use of lean tools.

The steering committee and the lean promotion service should be created in companies to ensure the success of lean project implementation. A successful and lasting lean transformation cannot happen without the alignment of an organisation's existing culture with a culture that supports lean management, according to Shook (2010). The organisational culture must provide a solid base for the lean management system. In this respect, Liker (2004) affirmed that the values and assumptions that are deeply embedded in the company's daily life must agree with the philosophy behind lean management. 2.3 Lean Organisational Culture The systematic application of lean management can not only provide higher performance but can also modify the actual organisational culture towards one that is fully aligned with the lean concept. Further, the further the original culture is from the company's lean culture, the more difficulties the organisation will encounter during lean implementation. This recognition suggests that organisations should have a good understanding of their own culture and of how it compares with the lean culture before actually implementing change programmes such as lean projects. Indeed, the transformation of organisational culture and employee behaviour is a key message in the lean production literature. Unfortunately, existing tools for assessing the level of a lean system are only able to reflect the degree of implementation of technical tools and some human-related practices. Schein (1992) proposed a conceptual model of organisational culture through the identification of five basic beliefs that impact organisational culture. These beliefs are organized as follows (Fig. 1):


Fig. 1. Schein's five beliefs of organisational culture: the organisation's context (or external environment), human nature (internal integration or motivation), the nature of truth, time orientation, and the nature of human relations.

Researcher considered only the dimensions that are directly linked to the primary elements of Toyota’s culture. They chose Toyota as an example based on the widely accepted notion that the roots of lean management originated in Toyota’s production and management system. Toyota is still the most-cited example of a high-performing lean management system. Thus, they ignored the dimension ‘the nature of truth’ because it does not directly relate to any of Toyota’s cultural elements. They wanted to include all of Toyota’s cultural elements in the model. Consequently, they also added an additional cultural dimension, innovation. Innovation as embodied in continuous improvement plays a central role in the lean concept. But, it is not included in Schein’s basic model. The primary elements of lean organisational culture were identified using the books by Liker (Liker 2004). These books may provide the most complete picture of Toyota’s culture. The primary elements of lean organisational culture are listed in Table 1.

3 Competence Management 3.1 Defining Competence In the research literature, and according to Drejer (1996), the terms competencies, capabilities and technologies all refer to similar concepts. Competence was first introduced in the 1970s by David McClelland, as confirmed by Boyatzis (1982). Core competencies are commonly defined as competencies providing the company with a competitive advantage through their application; they are competencies that are not easy to reproduce and are built over time.


Table 1. Elements of lean organisational culture – based on Toyota’s culture (Liker 2004)

The perception of competence is multidimensional. In industrial engineering, Bennour and Crestani (2007) defined this concept as including the following three elements (Fig. 2):

Fig. 2. Competence elements according to Bennour and Crestani (2007): mutual knowledge (theoretical, contextual, and procedural), behaviour (relational or cognitive attitudes), and know-how (practical and empirically derived).

According to Harzallah and Vernadat (2002), all the existing definitions seem to agree on these three fundamental characteristics.


In Table 2, we specify some definitions of competence, following the chronological order of the scientific research.

Table 2. Competence definition

Hamel and Prahalad (1994): Competence as a package of skills and technologies that enables a company to provide benefits for customers, rather than a distinct skill or technology.
Drejer (1996): Competency as a system of human beings, using (hard) technology in an organized way and under the influence of a culture, to create an output that yields a competitive advantage for the firm.
Le Boterf (2000): Competence as the result of a mixture of individual cognitive resources and resources from the individual's environment.
Torkkeli and Tuominen (2002): Competency as the cross-functional integration and coordination of capabilities.
Tobias and Dietrich (2003): Competency means the personal characteristics manifested as knowledge, skills, and abilities, which remain stable across diverse situations.
Belkadi et al. (2007): Competency is a combination of various resources whose value derives from more than the simple possession of these resources; it is related to an actor, which may be, for instance, a company, a project team or an individual; it is supported by a cognitive structure that organizes the way the activity is performed and that is relatively stable across a full range of situations; and it is a construction that, each time it is activated, may be improved, enriched, and developed in order to be adapted to the changing features of the situation. Competencies are not stereotypes and, although responsive to situational variety, have limits; when the variation exceeds certain limits, new competencies may be developed.

Despite their differences, these definitions highlight key characteristics of competency that are fundamental to understanding and emphasizing the close relationship between competence and the work situation. Researchers defined four generic elements of the competence allowing a good comprehension of competence development and management basis. This definition highlights the internal characteristics of competencies; it is a supplement to the traditional existing definitions and not a replacement. These elements are the following: • Technology it characterizes the tools used by human beings to participate in activities and it is usually the most visible part of a competence. In this context, technologies are view as physical systems or tools with the restriction to the softer perspectives. • Human beings are the central point of competence management, they are important insofar as they are value added creators by using the technologies. • Organization refers to the organizational systems under which human beings function. • Culture refers to the informal organization of the organizational unit within which a competence is expressed. Corporate culture influences the human beings and shapes


their interpretations, understandings and actions via shared values, beliefs and norms that, among other things, guide their actions. 3.2 Competence Category In this part, we will discuss two methodological approaches to competency management based on situationalist and universalist perspectives. In a progressively changing environment and in a situation where organizational performance increasingly depends on human assets of knowledge and skills, firm’s human resources should be considered as skills and individual knowledge owners, rather than limiting their behavior within roles considered via predefined expectations and responsibilities. The dominant vision in managerial practice assumes that competencies are universal concepts, meaning that they are independent from any specific background. The universalist approach guarantees a high grade of efficiency by the standardization of competency. Moreover, the universalist approach is traced back to the famous McClelland surveys (McClelland 1978), where competencies are identified by using statistics to find the manners distinguishing middling from the best performers. In addition, to Spencer and Spencer’s studies (Spencer and Spencer 1993) that are classified as rationalist or deductive and in which general competency profiles are recognized through identification of standard scales and profiles for regular professional figures. The identification is done using standard codebooks in which the description of each competency is ordinarily too general to fit into various contexts. On the other hand, many researchers define competencies using situationalist approaches by sharing the perspective that competencies are particular situated concepts. In the situationalist perspective and according to Le Boterf (Le Boterf 2000), competencies as positioned, particular and totally influenced by social interaction, organizational culture, and the way people perceive their jobs within organizations. Following the situationalist perspective, competency is defined as an individual ability or characteristic that is activated by a worker together with personal, organizational, or environmental resources to cope successfully with specific work situations. Individual abilities and characteristics mean personal aspects like skills, expertise, and traits. Resources are action means such as facilities and tools, and different knowledge sources. Individuals perceive Job situation as typical spaces of action characterized by a certain grouping of expected behaviors and results. In comparing the situational definition to other universalist definitions of competency in which competency is a fundamental individual characteristic causally associated to performance, researchers highlighted a major difference. Indeed, the situationalist approach ignores the psychological aspects of personality and it is more focusing on conditional factors characterizing the socially built nature of competencies and the system of technical, social and moral connections in which a worker is integrated. In addition, there are other studies defining competence as skills ensuring that tasks are performed satisfactorily and safely, it includes appropriate qualifications, training, understanding, behaviour and attitudes, as well as the ability to perform tasks according


to defined performance standards (Rothwell 2000). This definition covers four aspects, namely professional, methodological, social and personal, described in Fig. 3:

Fig. 3. Competence aspects (Rothwell 2000)

Competency management is based on a variety of requirements to be met for proper implementation and in particular in a lean context. These requirements must be realistic and appropriate to the tasks and jobs performed. The competency requirements for a task or job can be graduated for different skill levels such as supervised practitioners, practitioners and experts. 3.3 Competence Levels The competencies represent a key concept, they are a standardized requirement to appropriately achieve a specific work and it includes a grouping of knowledge, skills and behaviour in order to improve the performance. Baets and van der Linden (Baets and van der Linden 2003) confirm that human competencies have a strong organizational pertinence as it represents the general human’s performance related to his behavioral or understanding skills. Consequently, they must be specified considering the link with the tasks performed. An individual owns diverse types of competencies, their identification is essential to understand their impacts. Indeed, individual competencies are classified in diverse methods in literature; studies distinguish between three concepts as follows (Bennour and Crestani 2007): • Knowledge meaning the insight gained over experience and study. • Skills defining the ability learned and established through practice and knowledge application.


• Attitude describing the individual talents, characteristics and behaviour. Moreover, studies proposed a three level classification for competence. The Fig. 4 shows this classification.

Fig. 4. Competence classification (Bennour and Crestani 2007): generic competencies, reflecting the managerial approach; changing competencies, oriented to competence development and the ability to mix resources and technologies; and organic competencies, which are specific to the context and related to the job.

Each activity requests specific competencies to ensure the targeted level of performance. In this perspective, three types of individual competences are identified that are: • Competence-in-stock referring to competences that are previously acquired. • Competence-in-use meaning the skills currently that are practiced. • Competence-in-making that is linked to target competencies. 3.4 General Process of Competence Management Competence management is associated to the managerial techniques that increase the efficiency in recognising the needed core competencies. Related to Sengupta et al. (Sengupta et al. 2013), the performance management process becomes robust when employees are appraised on both the objectives and behavioral performance of their role. This approach is referred to as the mixed model, which develops a shared understanding of what will be monitored and measured, thereby ensuring an understanding of how the work gets done, in addition to knowing what gets done. In general, competency believes to be related to the job done and is considered as the least level of achievement to perform that job efficiently according to Garavan and McGuire (Garavan and McGuire 2001). As Sebastian and Kumar (Sebastian and Kumar 2018) confirmed the gap between the expected and actual level of competencies can predict future performance and measure individual efficacy and human capital adequacy of a department/organization, thus leading to individual performance.


Indeed, various tools and approaches for the quantitative and qualitative measurement of competence need to be implemented. In this paragraph, we recapitulate the general process of competence management according to many researchers following the aspects bellow (Belkadi et al. 2007): • Competence identification, which combines the inventory of competencies required in process including the necessary tasks and assignments, and the individual competences that are acquired by the company’s employees. • Competence allocation, meaning the system of allocating tasks to individuals that are formed according to clear management procedures. • Competence acquisition, which includes recruiting and selecting individuals to meet the company’s requirements for competencies. • Competence mobilization, which covers the operations of putting in place encouraging work conditions to allow the actors to use their competencies for a better results achievement. • Competence development, which contains several methods of training and learning on-the-job. Competence development aims at preserving competencies within the firm and is reinforced by the process of identification of employee’s motivation and coordination. • Competency characterization, which aims at formalizing competencies. • Competence evaluation, which is thoroughly associated to competence identification and characterization, it’s based on criteria established in advance. The evaluation compares the expected results associated to the application of competencies and the actual results. Other researches defined competence management involves several processes that can be categorized in four classes as below (Fig. 5):

Fig. 5. Competence management process

The process of competency acquisition starts from a need in a given context. It may induce the search and the selection of relevant resources. A competence is a way to put into practice some knowledge in a specific context.


3.5 Competence Management Related to Knowledge Knowledge in an organization is defined by Abecker and Decker (Abecker and Decker 1999), as the collection of expertise, experience, and information that individuals and workgroups use during the execution of their task. It is something that human acquires and stores intellectually. Know-how is related to personal experience and working conditions. It is acquired by putting into practice knowledge in a specific context. Behaviors are individual characters that lead someone to act or react in a particular way under particular circumstances. They often condition the way knowledge or know-how is put into practice. Moreover, according to Baugh (Baugh 1997), we can distinguish two types of competencies: 1. Hard competencies identify the basic resources that are required for performing an activity. These resources are generally expressed in terms of knowledge, skills and abilities. 2. Soft competencies correspond to personal behaviors, personal traits and motives as defined by Woodruff for example working with others, leadership, etc. A competency means the characteristic of an individual or group that is required to produce an effective organizational performance. Thus, competency is related to the underlying knowledge and skills needed to perform a role within an organization. In this context, Nonaka (Nonaka 1994) defined the core competencies of an organization as including tacit and explicit knowledge, and being conceived of as a mix of skills and technologies.

4 Synthesis To review, lean manufacturing concept is a significant topic that is adopted as a viable system to improve business performance. Ubiquitous in the industrial sector, lean manufacturing requires a high qualified and multi skilled team to achieve the enterprise’s performance. This paper represents a literature review of the existing researches related to competence management and lean concepts. In this work, we define the lean concept which is a system focusing on value stream and aiming at waste elimination in order to satisfy customer’ order and identify the lean organizational structure consisting of lean practitioner, lean leader and lean expert. We highlight also the impact of lean management on the organizational culture by proofing that the transformation of the organizational culture and employee’s behaviour are two key elements of lean spirit. Then, we discuss the competence management concepts according to different researcher’s points of views through a chronological order as well as we present the competence category and levels. Furthermore, we discussed the essential role of competence according to different researcher’s points of views. In addition, we argued that an effective method for maintaining and developing competencies is potentially a very effective instrument to


gain and maintain competitive advantage. We discussed the theoretical reflections concerned with building competency management systems through both the universalist and the situationalist approach. Finally, we described the general process of competence management, from competence identification to competence usage, and then made the link between competence and knowledge. However, to our knowledge, few research topics have addressed the skills necessary in a lean context.

References Boucher, X.: Competence management in industrial processes, pp. 95–97 (2007) Delgoulet, C., Vidal-Gomel, C.: Le développement des compétences: une condition pour la construction de la santé et de la performance au travail, pp. 17–32 (2013) Black, J.: Lean production: Implementing a World Class System (2008) Butterworth-Heinemann: Building Practitioner Competence, Elsevier, Oxford (2000) Dombrowski, U., Mielke, T.: Lean leadership – 15 rules for a sustainable lean implementation. Procedia CIRP 17, 565–570 (2014) Xu, Y.: The role of leadership in implementing lean manufacturing. In: IEEE Frontiers in Education Conference, pp. 756–761, 2017 Sicilia, J.D.: Continuous Process Improvement/Lean Six Sigma Guidebook, Revision 1, Attachment C. Training and Certification, p. 76 (2006) Hohne, R., King: Human performance improvement. Acad. Manag. J. (2000) Autissier, D., Moutot, J.M.: Practices in the conduct of change (2003) Krichbaum, B.D.: Lean Success factor: 10 Lessons from Lean (2007) Holden, R.J.: Lean Thinking in emergency parts: a critical review (2007) Allan, M., Chisholm, C.U.: Achieving engineering competencies in the global information society through the integration of on-campus and workplace environments. Ind. High. Educ. 22(3), 145–152 (2008) Baets, W.R.J., van der Linden, G.: Virtual Corporate Universities: A Matrix of Knowledge and Learning for the New Digital Dawn. Kluwer Academic, Norwell (2003) Berio, G., Harzallah, M.: Towards an integrated architecture for competence management. Comput. Ind. 58(2), 199–209 (2007) Godbout, A.J.: Managing core competencies: the impact of knowledge management on human resources practices in leading-edge organizations. Knowl. Process Manag. 7(2), 76–86 (2000) Homer, M.: Skills and competence management. Ind. Commer. Training 33(2), 59–62 (2001) Hustad, E., Munkvold, B.E.: IT-supported competence management: a case study at Ericsson. Inf. Syst. Manag. 22(2), 78–88 (2005) Lewis, M.A.: Analysing organizational competences at Aerospace Composite Technologies (ACT). Knowl. Process Manag. 4(3), 163–176 (1997) Ley, T., Ulbrich, A., Scheir, P., Lindstaedt, S.N., Kump, B., Albert, D.: Modeling competencies for supporting work-integrated learning in knowledge work. J. Knowl. Manag. 12(6), 31–47 (2008) Lindgren, R., Henfridsoon, O., Schultze, U.: Design principles for competence management systems: a synthesis of an action results study. MIS Q. 28(3), 435 (2004) McGrath, R.G., MacMillan, I.C., Venkataraman, S.: Defining and developing competence: a strategic process paradigm. Strateg. Manag. J. 16, 251–275 (1995) Makadok, R., Walker, G.: Identifying a distinctive competence: forecasting ability in the money fund industry. Strateg. Manag. J. 21, 853–864 (2000)


Belkadi, F., Bonjour, E., Dulmet, M.: Competency characterization by means of work situation modeling. Comput. Ind. 58, 164–178 (2007) Bennour, M., Crestani, D.: Using competencies in performance estimation: from the activity to the process. Comput. Ind. 58(2), 151–163 (2007) Berio, G., Harzallah, M.: Knowledge management for competence management. J. Univ. Knowl. Manag. 0(1), 21–28 (2005) Boucher, X., Burlat, P.: Vers l’intégration des compétences dans le système de performances de l’entreprise. Journal Européen des Systèmes Automatisés 37(3), 363–390 (2003) Campisi, D., Costa, R.: Intellectual capital and competitive advantage: an analysis of the biotechnology industry. World Academy of Science, Engineering and Technology, vol. 71, pp. 163–168 (2012) Le Boterf, G.: Construire Les Compétences Individuelles et Collectives. Editions L’Organisation, Paris (2000) Sydänmaanlakka, P.: Intelligent Leadership and Leadership Competencies. Developing a Leadership Framework for Intelligent Organizations (2003) Tobias, L., Dietrich, A.: Identifying employee competencies in dynamic work domains: methodological considerations and a case study. J. Univ. Comput. Sci. 9(12), 1500–1518 (2003) Torkkeli, M., Tuominen, M.: The contribution of technology selection to core competencies. Int. J. Prod. Econ. 77, 271–284 (2002) Tsai, H.T., Moskowitz, H., Lee, L.H.: Human resource selection for software development projects using Taguchi’s parameter design. Eur. J. Oper. Res. 151, 167–180 (2003)

Smart Mobility and Intelligent Infrastructures

A New Distributed Strategy to Find a Bi-objective Optimal Path Toward a Parking in the City

Khaoula Hassoune1(B) and Mehdi Hassoune2

1 Systems Architecture Team, Laboratory of Research in Engineering, Hassan II University, ENSEM, Casablanca, Morocco
[email protected]
2 University of Sciences Ibn Zohr, Agadir, Morocco

Abstract. Car production has increased, causing a growing number of cars in cities. At the same time, because parking spaces in the city are poorly organized, drivers increasingly need to find the nearest available parking space in order to avoid traffic jams around these areas. Today, several cities have adopted the Internet of Things (IoT) concept to improve the quality of urban services, and many problems, such as road traffic and parking space management, have been addressed with it. This paper presents a new system that allows a vehicle driver to find the optimal path toward a car park in the city, taking into consideration the number of available places in each car park. Our system is based on a distributed swarm intelligence strategy using the ant colony algorithm and multi-agent systems.

Keywords: Smart city · Parking routing problem · Ant colony optimization (ACO) · Multi-agent system

1 Introduction

Nowadays, finding an available parking space has become a serious problem for drivers, and it has become even more difficult with the increase in the number of vehicles in cities. In order to solve this problem, the new concept of the intelligent city should take into consideration reducing the time spent searching for parking spaces and reducing road traffic. For these reasons, researchers have proposed intelligent solutions that inform drivers about available parking spaces around them based on a set of technologies. Many cities have implemented a set of technologies combined with the IoT concept to control and manage the flow of data in the city. In fact, several systems have been developed to solve the parking problem and allow drivers to receive real-time information about the status of indoor car parks in a specific area through mobile applications; a set of sensors is used to monitor the status of each indoor car park. In this paper, a new architecture is developed to reduce the time taken by the driver to search for an indoor car park with available places. This system will help users find the closest car park and propose the optimal path (in terms of distance, time and number


of available places). In this work, we propose an efficient heuristic algorithm for the parking guidance problem which combines IoT, the ant colony optimization algorithm and multi-agent systems. The system finds the optimal path in terms of time and total distance. Section 2 summarizes the research developed to solve the parking guidance problem. In Sect. 3, we describe the architecture for managing the parking problem in smart cities. The ant colony optimization algorithm is introduced in Sect. 4. Section 5 presents the proposed mathematical model for the parking routing problem based on the ACO algorithm. Section 6 describes the agents of the architecture, and Sect. 7 describes the results of the implementation. The conclusion and perspectives are given in the last section.

2 Previous Works

The authors in [1] propose an algorithm that selects a collective route minimizing the total congestion by applying A-star to the collective routes. When congestion is detected within a route, the algorithm creates a new route which is reintroduced back into the algorithm. Cai et al. [2] propose a parking guidance system based on the Dijkstra optimization algorithm and a wireless sensor network. The system is composed of three elements, parking lot sensors, a sink node and a parking manager, which help drivers find the optimal route toward a parking place. Chen and Chang introduce a parking guidance and information system based on a wireless sensor network [3]. The guidance system uses the sensor network to capture real-time information about all car parks and sends it to the parking manager; the system then displays the information and the position of the parking spaces. A dynamic parking guidance system that combines parking destination switching and real-time traffic routing is proposed in [4]: drivers can switch their parking destinations and routes during their trip to minimize their travel costs. Choe [5] designed a parking guidance system to manage the parking area and collect data; the information on the parking lot and the road condition is displayed on electronic road signs to help the driver choose a parking lot. Song [6] proposed an intelligent parking lot navigation scheme based on the ZigBee technology of the Internet of Things (IoT); it uses ZigBee and ultrasonic sensors to get the vehicle location and improves parking efficiency by modifying the shortest path algorithm. Shin et al. proposed a guidance algorithm that takes into account dynamic parking conditions in the city [7], with results given to validate the proposed algorithm. Zheng et al. [8] developed a parking prediction system using machine learning algorithms and real-time data acquired from two big cities. An intelligent parking approach based on a meta-heuristic is proposed in [9] to find an optimal road toward a parking location. In [10], a parking routing system based on a genetic algorithm (GA) was developed to solve the parking problem in big cities; with this strategy, drivers can route inside the parking facility to reach the best vacant place in minimum time.


The paper [12] proposes a new model of an intelligent traffic system based on the Internet of Things that retrieves information in real time. The data collected by the different intelligent components are monitored and analyzed, and the resulting data are used to assess the state of the road network in a specific area. In the present paper we design a decentralized parking routing solution to help drivers find the optimal route between their real-time position and an indoor car park with available places in a chosen area. Our solution must be optimal in terms of distance, traffic and time. The proposed system architecture is based on a multi-objective ACO, a cloud system and multi-agent systems.

3 Overview of the System Architecture

The architecture of our system is composed of a set of elements that are connected through a cloud platform. These elements are equipped with intelligent devices that collect and transmit data to the cloud for processing. Therefore, in order to have an overview of the number of available parking spaces and of the state of each car park, the car parks must be equipped with intelligent sensors. The information about each car park is transmitted to the cloud by a specific agent. Based on the data received, the system is able to provide the optimal path for the driver, taking into account the distance and the number of spaces available in each car park.

3.1 Parking Infrastructure Description

In order to better respond to drivers' requests, each indoor parking lot must be equipped with a set of sensors that continuously measure the condition of the parking lot and the number of vacant spaces. The data collected by the different devices are sent to a specific agent, which transmits them to the cloud management system via the MQTT protocol [13].

3.2 Overview of the Parking Routing System

The cloud platform provides the resources for the parking routing system. When the cloud receives a request from the driver, the system captures the coordinates of the vehicle's position. Google services are used to build a road network with the coordinates of the relevant places. At this point, the road network for the area around the driver is completely prepared; once all the information is ready, we apply our distributed strategy to find the optimal path between the real-time position of the driver and an indoor car park with available places in the chosen area. Our solution must fit the preferences of the driver in terms of distance, number of available places and time.
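As an illustration of the data flow of Sect. 3.1, the sketch below shows how a car-park agent might publish its number of free places to the cloud over MQTT [13] using the Eclipse Paho Java client. The broker address, client identifier, topic and JSON payload are assumptions made for the example, not the system's actual configuration.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;

public class ParkingSensorPublisher {
    public static void main(String[] args) throws MqttException {
        // Hypothetical broker address and client id; adapt to the actual cloud platform.
        String broker = "tcp://cloud.example.org:1883";
        MqttClient client = new MqttClient(broker, "parking-P6-sensor");
        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(true);
        client.connect(opts);

        int freePlaces = 40;  // value read from the occupancy sensors of the car park
        String payload = "{\"parkingId\":\"P6\",\"freePlaces\":" + freePlaces + "}";
        // QoS 1 (at-least-once) and retained, so late subscribers see the last known state.
        client.publish("smartcity/parking/P6/status", payload.getBytes(), 1, true);
        client.disconnect();
    }
}
```

Publishing with the retained flag means that a routing service connecting later still receives the last known occupancy of the car park.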

4 Ant Colony Optimisation Algorithm

The ant colony optimization (ACO) algorithm is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs.


Ant colony algorithms are among the most successful examples of swarm intelligence systems, and they have been applied to many problems such as the classical traveling salesman problem and routing in telecommunication networks [11]. The behavior of ants is based on a deposited chemical substance called a pheromone. When an ant arrives at a decision node, such as a road intersection, it makes a probabilistic choice based on the amount of pheromone present on the different roads leaving the intersection. Initially, the amount of pheromone on both routes is null, so ants going from the nest to the food source choose one of the two routes with equal probability. The ants that choose the shorter route are the first to reach the food source. When they return to the nest, they are attracted by the pheromone track on the shorter route, so the shorter route is chosen with a higher probability than the longer one. New pheromone is released on the chosen path, making it more attractive to other ants. Over time, pheromone is deposited on the shorter path at a higher rate, making it increasingly likely to be chosen until the end of the iterations.

5 Bi-objective Parking Problem Using ACO Algorithm

In the present study, we propose a mathematical model to reduce the time needed to find an indoor car park while considering a set of constraints (distance and available places in each car park). The ACO algorithm for the parking problem is based on a set of ants working simultaneously to find a good solution. Each ant builds a complete trajectory based on a given map. The construction of the trajectory proceeds as follows: the ants start from the driver's node and select the next node to visit according to a probabilistic transition rule; this choice consists in selecting a car park to visit from the list of available car parks. The search for an optimal path starts again when the ants reach the car park position. The optimal solution is defined based on the quantity of pheromone. In the proposed strategy, the occupancy of the destination car park is related to the quantity of pheromone: when the state of the car parks is sent to the cloud, we compute the number of available places in each car park and use it as the pheromone amount. The main goal of our algorithm is to find the best routes toward a car park with available places over a short distance, so the ants follow the routes toward the car park with more available places, which corresponds to a higher amount of pheromone. The problem is modeled as a road network with distributed nodes. In our case, the nodes represent drivers and car parks, where D = {D1, D2, ..., Dm} is the set of drivers searching for an indoor car park with available places in a specific area and P = {P1, P2, ..., Pn} is the set of indoor car parks. Let G be the graph (Fig. 1) with vertices in D ∪ P and edges in E defined as follows:

E = {Di Pj : 1 ≤ i ≤ m and 1 ≤ j ≤ n}


Fig. 1. The problem graph.

We denote by Di Pj the distance between the vertices Di and Pj (the driver Di and the car park Pj). The purpose of the problem is to determine the optimal route for each driver looking for a car park with available places, taking into account the time of the trip, the distance and the number of available places. In this version of ACO for the parking problem, each ant must create a route that allows the driver to reach a car park while respecting the defined constraints. Each ant chooses the next node to visit based on the value of the probabilistic transition (1):

$$
p^{k}(D_i, P_j) \;=\; \frac{\tau^{\alpha}(D_i, P_j)\,\eta^{\alpha_1}(D_i, P_j)}{\sum_{\beta \in [1,\,p]} \tau^{\alpha}(D_i, P_\beta)\,\eta^{\alpha_1}(D_i, P_\beta)} \tag{1}
$$

where

$$
\eta(D_i, P_j) \;=\; \frac{1}{\overline{D_i P_j}} \tag{2}
$$

and

$$
\tau(D_i, P_j) \;=\; \mathrm{Card}\bigl(P_j^{v}\bigr), \qquad \text{with } P_j = P_j^{v} \cup P_j^{\bar{v}} \text{ and } P_j^{v} \cap P_j^{\bar{v}} = \emptyset,
$$

where Card(P_j^v) is the number of elements of the set P_j^v; P_j^v denotes the available places in the car park Pj and its complement the occupied places of Pj. Thus τ^α(Di, Pj) is the value of the pheromone trail on the edge from Di to Pj, and η(Di, Pj) is the value of the distance cost function from Di to Pj. Each ant performs a local update while building the different trajectories; once the procedure is completed, the ants perform a global update of the pheromone quantity. The following equation describes the local update process:

$$
\tau_{ij} \;=\; (1 - \rho)\,\tau_{ij}^{\mathrm{old}} + \Delta\tau_{ij} \tag{3}
$$


$$
\Delta\tau_{ij} \;=\; \sum_{k=1}^{m} \Delta\tau_{ij}^{k}, \qquad
\Delta\tau_{ij}^{k} \;=\;
\begin{cases}
\dfrac{Q}{t_{mk}} & \text{if ant } k \text{ uses the edge } (i, j)\\[4pt]
0 & \text{otherwise}
\end{cases}
$$

where ρ is the pheromone reduction (evaporation) parameter (0 < ρ < 1), which depends on the information received from the sensors about the number of available places in the car park; m is the number of ants; t_mk is the time needed by ant k to cross the route between the driver Di and the car park Pj; and Q is a positive constant, set to 100.
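To illustrate how Eqs. (1)–(3) fit together, the following is a small, illustrative Java sketch (Java being the implementation language, given the JADE platform used in Sect. 7) of one ant's choice and of the local pheromone update. The parameter values and the distances used in main are assumptions for demonstration only, not the values used in the experiments.

```java
import java.util.Random;

// Pheromone tau_j is initialized with the number of free places of parking j,
// eta_j = 1 / distance_j, and an ant picks parking j with probability
// proportional to tau_j^alpha * eta_j^alpha1 (Eq. (1)).
public class ParkingAnt {
    static final double ALPHA = 1.0, ALPHA1 = 2.0, RHO = 0.3, Q = 100.0; // illustrative values
    static final Random RNG = new Random();

    // Roulette-wheel choice of the next parking according to Eq. (1).
    static int chooseParking(double[] tau, double[] distance) {
        double[] weight = new double[tau.length];
        double total = 0;
        for (int j = 0; j < tau.length; j++) {
            double eta = 1.0 / distance[j];                          // Eq. (2)
            weight[j] = Math.pow(tau[j], ALPHA) * Math.pow(eta, ALPHA1);
            total += weight[j];
        }
        double r = RNG.nextDouble() * total;
        for (int j = 0; j < weight.length; j++) {
            r -= weight[j];
            if (r <= 0) {
                return j;
            }
        }
        return weight.length - 1;
    }

    // Local update of Eq. (3): evaporation plus a deposit inversely
    // proportional to the travel time of the ant that used edge (driver, j).
    static void localUpdate(double[] tau, int j, double travelTime) {
        tau[j] = (1 - RHO) * tau[j] + Q / travelTime;
    }

    public static void main(String[] args) {
        // Scenario-1-like data: free places act as initial pheromone values.
        double[] tau = {5, 15, 8, 30, 20, 40, 25, 18, 13, 27};
        double[] distance = {1.2, 0.9, 2.1, 1.5, 0.7, 0.8, 1.9, 2.4, 1.1, 1.6}; // km, illustrative
        int chosen = chooseParking(tau, distance);
        localUpdate(tau, chosen, 6.0);                               // e.g. 6 minutes of travel
        System.out.println("Ant chose parking P" + (chosen + 1));
    }
}
```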

6 Description of Architecture Agents

Our architecture (Fig. 2) is composed of a set of autonomous entities that collaborate in a distributed way to find the optimal path for the driver. Our strategy is based on a set of agents with different behaviors:

Fig. 2. Multi-agent distributed architecture.

– Sensor agent: captures information about the state of the car park and sends the number of available spaces in each car park to the knowledge base.
– Worker agent: operates on the road network; every worker (WA) chooses the next node to visit based on a probabilistic transition to be calculated. This transition is based on two values: the distance (between the driver's position and a parking lot) and the number of available spaces in the parking lot. At the end of the process the agent finds the optimal solution, which is transmitted to the driver.

A New Distributed Strategy to Find a Bi-objective Optimal Path

307

– Control agent: communicates with the GUI agent in order to obtain information (the coordinates of each node, the distance and the number of places available in the parking lot to be visited) and sends it to the worker agent.
– Driver agent: represented by an application whose role is to send a request for guidance toward a car park with available spaces and to display the optimal path for the driver.
– Master agent: controls the communication between the agents and sends the current best tour to the interface agent.

Fig. 3. The road network graph

– GUI agent: the user initializes the map with the parking locations, the driver location and the service station locations; this agent interacts with the knowledge base to store all this information and then sends it to the master agent. An illustrative sketch of how such an agent could be written in JADE is given after this list.
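Since the implementation relies on JADE (Sect. 7), the agent roles above could be realized as JADE agents. The following is only an illustrative sketch of a sensor agent that periodically reports the free places of one car park to a master agent; the agent names, the reporting period and the message format are assumptions rather than the paper's actual code.

```java
import jade.core.AID;
import jade.core.Agent;
import jade.core.behaviours.TickerBehaviour;
import jade.lang.acl.ACLMessage;

// Hypothetical sensor agent: every 5 s it reports the number of free places
// of one car park to a master agent registered under the local name "master".
public class SensorAgent extends Agent {
    @Override
    protected void setup() {
        final String parkingId = (String) getArguments()[0];   // e.g. "P6"
        addBehaviour(new TickerBehaviour(this, 5000) {
            @Override
            protected void onTick() {
                int freePlaces = readOccupancySensors();        // stubbed below
                ACLMessage msg = new ACLMessage(ACLMessage.INFORM);
                msg.addReceiver(new AID("master", AID.ISLOCALNAME));
                msg.setContent(parkingId + ":" + freePlaces);
                myAgent.send(msg);
            }
        });
    }

    // Stub standing in for the physical sensor reading.
    private int readOccupancySensors() {
        return 40;
    }
}
```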

7 Description of Implementation

We implemented our solution using the Java Agent DEvelopment Framework (JADE) as a distributed and parallel platform. The proposed solution is deployed on several containers. Figure 3 illustrates the GUI agent, which is used to control the user interface and


to draw the graph and the nodes. The user draws his or her own graph through this interface by selecting a set of yellow nodes, which represent indoor car parks, and green nodes, which represent the driver. The driver searches for an optimal path toward a car park taking into account some parameters (distance, time, number of available places). We present two results corresponding to two scenarios. In the first scenario (Table 1), parking P6 has the most available spaces.

Table 1. The parameters of the first scenario.

Parking node    Available places
P1              5
P2              15
P3              8
P4              30
P5              20
P6              40
P7              25
P8              18
P9              13
P10             27

At the beginning, we fix the number of iterations. Each iteration ends when the set of ant agents reaches a car park, and the parameters are initialized at the start of each iteration. In each iteration we collect data (the best cost, the amount of pheromone, the best solution…), and at the end of each round these data are used to find the optimal solution. In this scenario we observe that P6 is the best car park to visit in terms of distance, time and available places (Fig. 4). When the user draws a graph, the master agent coordinates the exchange of messages by sending a command to each agent in the environment to start a new iteration; at the end of each iteration the master agent knows which agents hold the best solution. Figure 4 displays the result of the ant colony optimization algorithm based on multi-agent systems: in this step, the GUI agent draws the best path proposed by the master agent, which allows the user to find a place in an indoor car park taking into account the chosen parameters (distance and number of available places). In the second scenario we work with the same parameters as in the first scenario, but we set the number of available places in parking P6 to 0; the outcome changes, as shown in Fig. 5.


Fig. 4. Bi-objective optimal path (Scenario 1).

Fig. 5. Bi-objective optimal path (Scenario 2).

8 Conclusion

This paper presents a new distributed strategy based on the ACO algorithm and multi-agent systems to solve the bi-objective parking problem. The solution mimics the behavior of real ants to solve the parking routing problem by finding an optimal path in terms of distance and number of available places. The parking problem is represented as a graph with multiple nodes (car parks), each one managed by a control agent. In the proposed architecture, the agents work in parallel to find the best solution for the driver; when one of the agents fails, the system continues to work because a set of agents is available to accomplish the task.


References

1. Rhodes, C., Blewitt, W., Sharp, C., Ushaw, G., Morgan, G.: Smart routing: a novel application of collaborative path-finding to smart parking systems. In: CBI, vol. 1, pp. 119–126 (2014)
2. Cai, W., Zhang, D., Pan, Y.: Implementation of smart parking guidance system based on parking lots sensors networks. In: 2015 IEEE 16th International Conference on Communication Technology (ICCT), pp. 419–424. IEEE (2015)
3. Chen, M., Chang, T.: A parking guidance and information system based on wireless sensor network. In: 2011 IEEE International Conference on Information and Automation, pp. 601–605. IEEE (2011)
4. Chai, H., Ma, R., Zhang, H.M.: Search for parking: a dynamic parking and route guidance system for efficient parking and traffic management. J. Intell. Transp. Syst. 23(6), 541–556 (2019)
5. Choe, H., Gorfman, S., Heidbrink, S., Pietsch, U., Vogt, M., Winter, J., Ziolkowski, M.: Multichannel FPGA-based data-acquisition system for time-resolved synchrotron radiation experiments. IEEE Trans. Nucl. Sci. 64(6), 1320–1326 (2017)
6. Song, Y., Lin, J., Tang, M., Dong, S.: An Internet of energy things based on wireless LPWAN. Engineering 3(4), 460–466 (2017)
7. Shin, J.-H., Jun, H.-B.: A study on smart parking guidance algorithm. Transp. Res. Part C Emerg. Technol. 44, 299–317 (2014)
8. Xiong, X., Choi, B.-J.: Design of genetic algorithm-based parking system for an autonomous vehicle. In: Control and Automation, and Energy System Engineering, pp. 50–57. Springer (2011)
9. Hunkeler, U., Truong, H.L., Stanford-Clark, A.: MQTT-S: a publish/subscribe protocol for wireless sensor networks. In: 2008 3rd International Conference on Communication Systems Software and Middleware and Workshops (COMSWARE 2008), pp. 791–798. IEEE (2008)
10. Dorigo, M., Blum, C.: Ant colony optimization theory: a survey. Theor. Comput. Sci. 344(2–3), 243–278 (2005)
11. Dorigo, M., Gambardella, L.M.: Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans. Evol. Comput. 1(1), 53–66 (1997)
12. Dubey, A., Lakhani, M., Dave, S., Patoliya, J.J.: Internet of Things based adaptive traffic management system as a part of intelligent transportation system (ITS). In: 2017 International Conference on Soft Computing and its Engineering Applications (icSoftComp), pp. 1–6. IEEE (2017)
13. Hunkeler, U., Truong, H.L., Stanford-Clark, A.: MQTT-S: a publish/subscribe protocol for wireless sensor networks. In: 2008 3rd International Conference on Communication Systems Software and Middleware and Workshops (COMSWARE 2008), pp. 791–798. IEEE (2008)

A Novel Mobile CrowdSensing Architecture for Road Safety

Wahiba Abou-zbiba1(B), Hajar El Gadi2, Hanan El Bakkali2, Houda Benbrahim1, and Driss Benhaddou3

1 IRDA, IT Rabat Center, ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
[email protected], [email protected]
2 SSL, IT Rabat Center, ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
[email protected], [email protected]
3 Engineering Technology Department, University of Houston, Houston, TX 77204, USA
[email protected]

Abstract. Intelligent Transportation Systems have become an essential part of today's transportation systems, as they aim to enhance efficiency, safety and mobility. They rely in particular on various communication and sensing technologies to achieve their objectives. At this level, Mobile CrowdSensing presents a cost-efficient solution and provides interesting features for data collection, which is a major component of ITS. However, it still faces challenges such as the lack of incentive mechanisms, data validation, privacy and security. These challenges motivate us to propose a Mobile CrowdSensing architecture for our future SI-CAR (Secure and Intelligent Crowdsensing Application for Road Safety) application that integrates deep learning-based data validation, edge computing-based local processing for data privacy and a gamification-based incentive mechanism.

Keywords: Intelligent Transportation Systems · Road safety · Architecture · Mobile crowdsensing · Edge computing · Machine learning · Gamification · Security and privacy preservation

1 Introduction

The growth of the world's population is impacting transportation systems in Morocco and worldwide. This growth comes with an increase in the number of vehicles on the roads, which creates challenges related to congestion, deterioration of road conditions, safety, health and the environment. Road safety is one of the most important priorities of government agencies. Besides the socio-economic challenges it entails, it is a humanitarian matter. According to the World Health Organization, in 2018 the number of deaths from road injuries reached 1.35 million worldwide, becoming the 11th leading cause of death worldwide,


and the leading cause of death for children and young adults aged 5–29 years [1]. In Morocco, the situation is no better: road accidents are the 5th leading cause of death. Morocco has nevertheless made progress in road safety, decreasing fatalities by 9.73% between 2016 and 2019 thanks to the interventions and reforms on which the authorities are working, and it is ambitious to reduce the number of road deaths by 50% between 2015 and 2026 following the current road safety strategy, which covers the period 2017–26 [12]. Nevertheless, the situation and its challenges compel the need to take into consideration the most notable causes of road accidents, such as road anomalies, driver behaviors, pedestrian behaviors, weather conditions and law violations. They also necessitate that the authorities enhance road safety and security for all road users by improving the infrastructure and road conditions and by involving every member of society through education, awareness and sensitization. To further improve road safety, big data analytics opens an opportunity to collect different varieties of data with high volume and high accuracy to analyze traffic and gain more insight into the causes of road accidents and fatalities, which can help to identify needed reforms early and to make the appropriate decisions. To do so, researchers and practitioners have integrated information technologies with transportation and developed what is known as Intelligent Transportation Systems (ITS) [19,30]. ITS integrate different advanced information systems and electronic technologies into the infrastructure to collect various types of data such as Bluetooth data, images and videos. The collected data can later be shared and processed to identify and prioritize safety problems. However, implementing data collection that spreads over a wide geographical area is a common concern for ITS, as it requires costly equipment and sensors, which involves high deployment and maintenance costs. This is particularly challenging for developing countries owing to their limited budgets. As an alternative, an emergent sensing paradigm known as Mobile CrowdSensing (MCS) [9] is being discussed as a promising mechanism that enables collecting, analyzing and uploading a wide range of real road data using mobile devices, since they are ubiquitous devices equipped with a set of cheap yet powerful and efficient sensors such as a camera, proximity sensor, light sensor, GPS, accelerometer, gyroscope, barometer, microphone, and so forth. The purpose of our work is to contribute to the collection of high amounts and various types of good-quality traffic data, including images, videos and annotations, using MCS solutions, and to increase the amount of sensed data by increasing the level of participation. On these grounds, two main features should be applied to raise the number of data collectors: motivating smartphone users by applying a reward-based incentive mechanism [9], and preserving their privacy and security by protecting their private information and sensitive activities. But since rewards are offered and the privacy of participants is preserved, irrelevant data may be provided unintentionally or maliciously. To avoid such a problem and optimize data quality, a data validation mechanism should be applied.


In this paper, we present our MCS architecture, which leverages a combination of techniques: gamification to provide the incentive mechanism, deep learning to validate collected data and thus supply good-quality data, and edge computing to process data locally in order to preserve data privacy as well as the security and privacy of the smartphone user. The remainder of this paper is organized as follows: Sect. 2 discusses the background and presents relevant related work; Sect. 3 presents the design goals of our work; Sect. 4 introduces the proposed MCS architecture and details it; Sect. 5 brings the conclusion and future work.

2 Background and Related Work

ITS deliver innovative services to improve safety, accessibility, mobility and efficiency. They merge current and advanced technologies: communication technologies, which include wireless and wired systems; computational technologies, which provide platforms and software for real-time applications; data storage and processing technologies, which cover various systems, for instance magnetic storage and compact discs; database management systems such as data warehousing; sensing technologies such as inductive loops, video detection and Bluetooth detection; and other technologies. Moreover, ITS are based on data collection, data transmission and data analysis, and on using the results of the analysis in several operations while providing real-time information to travelers, road users and relevant authorities. More details can be found in [19,22,29,30]. Figure 1 gives an overview of a typical Intelligent Transportation System.

Fig. 1. Intelligent Transportation Systems overview.

The data is collected via various hardware devices. We can distinguish two types of data: inventory data, which describes the elements that do not change over time (the physical elements of the road), and condition data, which describes the condition of the elements that may change over time. These data can be provided by emergency transport services, hospital registries and police records, or can be


collected from the roadside. In general, traffic data can be collected by two different methods, intrusive and non-intrusive. Intrusive methods involve recording data using sensors placed on the road, while non-intrusive methods are based on remote observations, using either human or technological techniques [22]. Increasing the amount of road data and transportation information can improve road safety levels. This is achieved through the use of big data analytics, which helps to predict traffic accidents, detect road anomalies, and much more. Furthermore, the set of deployed sensors can be extended with those integrated into smartphones, such as the camera, proximity sensor, light sensor, GPS, accelerometer, gyroscope, barometer, microphone... Taking advantage of this variety of sensors and of the mobility of smartphone owners, MCS appeared [16]. This mechanism enables smartphones to sense, collect, upload and analyze data. Different phenomena may be sensed, for instance public safety, road condition, water quality and air pollution. Using MCS systems, it is easier to cover a wider sensing area by motivating mobile owners everywhere to collaborate in these tasks, which is relatively easy given the large number of users worldwide. The fact that mobile owners are heterogeneous, possessing devices with different capabilities, sensors and functions, makes the sensing task more advantageous by providing various types of data. Compared to conventional sensing techniques, data collection in MCS benefits from the loose power constraints of mobile devices, whose owners can charge the battery at any time to complete the task [9]. Focusing on road safety, many applications have been developed. In order to estimate risky traffic situations to prevent and reduce accidents, understanding the various road traffic scenarios is crucial; an analysis of collected data describing both crashes and dangerous situations allows this process to be carried out. For this purpose, [2] designed a smartphone platform that collects behavioral sensor data of pedestrians and vehicle drivers, in addition to location features, from each smartphone via cellular networks. These data are then sent to a cloud server to be analyzed using a deep neural model that estimates the traffic state in order to classify the sensed situation and detect near-miss situations. In [26], Joao Soares et al. describe the design, implementation and deployment of a cloud-based road anomaly information management service following a collaborative mobile approach, where participants use smartphones to acquire data during driving activities; the collected data is then processed, transformed and classified using a machine learning model. In [23], Joao G. P. Rodrigues proposed an architecture for a massive multi-sensor urban scanner which acquires large quantities of real-time data using multiple components: sensing technologies, data gathering units and back-end servers. In [17], Qun Liu et al. developed the SAFERNET solution to address road safety and route computation using MCS, Internet of Vehicles (IoV) and cloud computing technologies. In the same manner, a solution called IRide has been developed for ITS road safety [8]. In [3], researchers present a vehicular sensing framework that enables on-demand sensing of traffic conditions. In addition


to these solutions, an urban safety application and a website were developed in [33] to collect structural security information about urban infrastructure. However, most of the current work does not address the challenges of security, privacy, flexibility and extensibility sufficiently to incorporate new functionalities. Zhenyu Zhou et al. [34] have applied some of these patterns by developing a robust MCS framework that integrates deep learning-based data validation and edge computing-based local processing; however, it still faces challenges such as road users' privacy. In this manner, and to support the specific requirements of our project, we design an architecture that includes both local and cloud-based processing to collect data reliably and securely while preserving openness, extensibility and flexibility, combining different safety parameters. Furthermore, it includes an incentive mechanism in the design while preserving privacy, and it enables irrelevant data detection using machine learning-based data validation.

3 Design Goals

Ensuring the good quality and reliability of collected data and motivating mobile users to take part in the participatory sensing task while preserving their privacy and security are the major challenges for an efficient MCS system. Based on these challenges, we propose a new architecture design to ensure that our collaboration for road safety data collection will result in a high amount of reliable and good-quality data from voluntary participants. Therefore, we propose a set of design goals on which our architecture is built: participant incentivization, data quality optimization, and privacy and security protection.

3.1 Participant Incentivization

The success of an MCS system is based on motivating and incentivizing mobile users to collect and communicate relevant data. In this regard, various incentive strategies have been proposed. We adopted the gamification technique, as the process that puts the most emphasis on human motivation [20]. Gamification has gained great prominence since 2010, when several leading companies such as Microsoft started to use this technique in many applications [13]. The term was defined by [6] as the employment of game mechanisms such as leaderboards, badges and point-based level schemes in a non-game environment. The game context is a promising strategy to keep users motivated, given that it is an entertaining one; moreover, players can easily gain satisfaction by experiencing a sense of accomplishment and progress [14]. Gamification has been applied in many MCS contexts, showing that this concept is a key mechanism for data collection. [27] presented a gamification-based participatory sensing framework and showed that the use of gamification as an incentive mechanism increases the participation probability by 20%. To ensure the trustworthiness of sensed big data while fostering users'


participation, [21] designed an MCS framework based on gamification. [7] presented gamification-based rewarding schemes to ensure trustworthiness in MCS and reported satisfactory results. Four game design elements have been proposed for the SI-CAR solution: points, badges, rewards and a leaderboard.

Points: This element is widely used in the gamification context, since it gives a sense of satisfaction for each contribution regardless of the user level. In the SI-CAR architecture, points are rewarded depending on the amount of data: 50 points are rewarded for each relevant image, video or accelerometer record, according to the quality and the GPS coordinates and date associated with each item; 50 points are added to the user's score if an annotation describing the situation is provided; and 350 points are rewarded for each complete set of image, video, accelerometer data and annotation.

Badges: This element marks the user's contribution level. In our architecture, the user can exchange points for a new badge, and a higher-level badge gives the user access to a new list of possible rewards. We propose four different badges. Beginner badge: the participant receives this badge after registration; no rewards are available for this badge. Bronze badge: the participant can exchange 1000 points of their score to get this badge; once obtained, a list of rewards becomes available. Silver badge: to obtain this badge, the participant holding the previous badge must exchange 2000 points of their score; once obtained, a list of additional rewards becomes available. Golden badge: to earn this badge the participant should exchange 3000 points of their score; once obtained, a list of additional rewards becomes available. Since the golden badge is the highest-level badge, the participant then receives a medallion for every 1000 points collected without being exchanged.

Rewards: The participant can choose a reward from an existing list in the application, based on the points he/she owns. Once a reward has been chosen, the user can exchange their points to obtain it.

Leaderboard: To foster competitiveness among the participants, a leaderboard shows the ranking of the best participants each week, based on the amount of contribution in that week, together with the badges they own and their medallions if available.

This game design will help increase the amount of reliably collected data: the user will always aim to get more points for his or her reliable contributions in order to obtain high-level badges and medallions, which provide a sense of satisfaction and victory, and to obtain rewards, which are a good monetary incentive. The leaderboard element will provide a sense of competitiveness among users, so each user will try to contribute more good-quality data in order to be ranked among the best.
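A minimal sketch of these reward rules follows, under one possible reading of them (a complete set is rewarded with a flat 350 points, otherwise 50 points per relevant item plus a 50-point annotation bonus). It is illustrative only and not SI-CAR's actual implementation.

```java
// Encodes the point and badge rules stated above, as one possible interpretation.
public class RewardPolicy {
    public enum Badge { BEGINNER, BRONZE, SILVER, GOLDEN }

    public static int pointsForSubmission(int relevantImages, int relevantVideos,
                                          int relevantAccelRecords, boolean annotated) {
        boolean completeSet = relevantImages > 0 && relevantVideos > 0
                && relevantAccelRecords > 0 && annotated;
        if (completeSet) {
            return 350;   // full set of image + video + accelerometer data + annotation
        }
        int points = 50 * (relevantImages + relevantVideos + relevantAccelRecords);
        if (annotated) {
            points += 50; // annotation bonus
        }
        return points;
    }

    // Cost, in points, of exchanging the current badge for the next one.
    public static int upgradeCost(Badge current) {
        switch (current) {
            case BEGINNER: return 1000;  // -> Bronze
            case BRONZE:   return 2000;  // -> Silver
            case SILVER:   return 3000;  // -> Golden
            default:       return -1;    // Golden is already the highest level
        }
    }
}
```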

3.2 Data Quality Optimization

The aim of our architecture is not only to collect the largest amount of data but also to collect reliable and good-quality data. However, participants may provide low-quality, irrelevant or forged data, unintentionally or deceptively, in particular when rewards are offered. In addition to wasting rewards, providing these types of data reduces the accuracy of analytical decisions, which cannot be tolerated in tasks such as road safety. Hence, data validation mechanisms should be applied to improve data quality by filtering out unqualified data (small, blurred or pixelated images), irrelevant data (data that is not related to road safety) and forged data (data that has been intentionally edited). One of the intricate challenges in MCS is the detection of these types of data; since it is not practical for this step to depend on an expert analyzing each item, machine learning techniques are a very good solution to do the task automatically. Several works have adopted machine learning techniques to detect forged data in ITS. To detect forgery attacks that could adversely affect the needs of travelers in the transportation system, the research in [31] generates fake data using a Generative Adversarial Network (GAN) based on several attack scenarios and then adopts the Long Short-Term Memory (LSTM) regression technique combined with Manhattan similarity to distinguish real data from falsified data. In [34], Convolutional Neural Network (CNN) classification and recognition techniques were used to detect extraneous images in ITS data collection. [11] proposes the implementation of a GAN's discriminator network at the ITS edge components to distinguish fake data generated by an adversary from real data, in order to avoid the manipulation of self-driving vehicles' decisions. The presence of a data validation mechanism will undoubtedly increase the efficiency of the system by improving the reliability of the sensing task. The verification of data quality will not only ensure that the MCS process we propose results in reliable and good-quality data, but will also avoid wasting rewards. Moreover, running this mechanism on the device is more beneficial to our solution, since it reduces transmission latency.
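As a first, lightweight step of the on-device validation described above, clearly unqualified images (too small or blurred) can be filtered before any learned model runs. The sketch below is illustrative only: the resolution thresholds and the simple contrast-based sharpness heuristic are assumptions, not the CNN-based validation the architecture ultimately targets.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Rejects images that are too small or whose average local contrast is so low
// that they are probably blurred. Thresholds are illustrative assumptions.
public class ImageQualityFilter {
    private static final int MIN_WIDTH = 640;
    private static final int MIN_HEIGHT = 480;
    private static final double MIN_SHARPNESS = 8.0;    // assumed threshold

    public static boolean isAcceptable(File imageFile) throws IOException {
        BufferedImage img = ImageIO.read(imageFile);
        if (img == null || img.getWidth() < MIN_WIDTH || img.getHeight() < MIN_HEIGHT) {
            return false;                                // unreadable or too small
        }
        return sharpness(img) >= MIN_SHARPNESS;
    }

    // Mean absolute luminance difference between horizontally adjacent pixels:
    // a crude sharpness proxy (blurred images have weak local contrast).
    private static double sharpness(BufferedImage img) {
        long sum = 0;
        long count = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 1; x < img.getWidth(); x++) {
                sum += Math.abs(luminance(img.getRGB(x, y)) - luminance(img.getRGB(x - 1, y)));
                count++;
            }
        }
        return count == 0 ? 0 : (double) sum / count;
    }

    private static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000;     // integer Rec. 601 weights
    }
}
```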

3.3 Privacy and Security Protection

During the sensing task, sensitive information describing the user's private environment may be uploaded in images or videos and could be obtained by attackers. A data privacy protection mechanism should therefore be applied to the content of the submitted data. Data processing should be used for this task to detect private information about the participant or about other citizens on the road so that it can be hidden. Machine learning techniques [5,24] can be used to detect faces, license plates and other private information in the uploaded data, and the system should then hide all these elements. However, it may be challenging to send all the data to the cloud for processing: although submitting data containing private information may


threaten the confidentiality of the users or of the citizens on the road, the presence of a large amount of data from many participants affects the response time in the cloud. Besides, the transmission of large data such as images and videos for data validation causes high transmission and processing latency. For this purpose, we suggest the use of edge computing, which involves processing data in the proximity of the data sources. Several works have implemented edge nodes between physical sensing devices and the cloud; the edge computing paradigm has been used to collect data from the collector, process and filter it, and then aggregate it to the cloud [18,34]. In our architecture, we consider the mobile phone as an edge server, which can be described as the edge between sensors and cloud [25]. To make data more reliable and meaningful, it should be tagged with spatio-temporal coordinates, which could disclose sensitive information about the mobile user: his or her current position, eventual activities and daily habits. This information presents a serious risk to user security; therefore, it is incontestable that the privacy and security of users must be ensured. A widely used technique is anonymization, which strips the user's identity from the data, but since rewards will be granted based on the data submitted, the identity should not simply be removed. For this reason, the system to be developed should preserve user privacy and security while still enabling the incentive mechanism: the user should be rewarded without being associated with their collected data [15]. In this sense, many techniques have been used. [4] proposes a data anonymization method that adds noise to collected data to avoid disclosing the personal identity information of the user in MCS and builds a crowdsensing coalition under one generalized identity to achieve the k-anonymity property. [28] proposes a cryptographic solution for the data aggregation phase by applying a secure aggregation protocol. [32] applied a pseudonym, cryptographic techniques and a hash function to ensure identity privacy. To solve the privacy, information gathering and fault tolerance issues in MCS networks, [10] proposed a blockchain-based MCS system. Privacy is the most important concern for the participant: if a good incentive mechanism is provided while privacy is ignored, most participants will avoid collaborating in the sensing task. As a result, protecting user privacy may, in itself, increase the amount of collected data. Additionally, in our architecture, we propose privacy protection for the data content, which is not available in any road data collection architecture in the literature. The privacy protection of the data content will ensure that the participant legally collects the data, which will increase involvement in the collaboration. Furthermore, preserving the privacy of both participants and road users will ensure their security against malicious attacks.
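Among the techniques cited above, a salted pseudonym is one simple way to credit rewards to a participant without attaching their real identity to the submitted data. The sketch below illustrates the idea only; the naming, the salt handling and where the pseudonym-to-account mapping is stored are assumptions, not the scheme of [32] nor SI-CAR's actual mechanism.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Data is tagged with a salted SHA-256 pseudonym instead of the account identifier,
// so the cloud can credit points to the pseudonym while the mapping to the real
// account is kept by a separate, access-controlled service (an assumption here).
public class Pseudonymizer {
    public static String pseudonym(String participantId, String serverSecretSalt)
            throws NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha256.digest((serverSecretSalt + ":" + participantId)
                .getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));       // hex-encode the hash
        }
        return hex.toString();
    }
}
```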

4 Our SI-CAR's Proposed Architecture

The main purpose of the Moroccan authorities in our context is to improve road safety, which requires analyzing and understanding the situation well.


An application implemented with the involvement of voluntary Moroccan citizens presents a good solution to this issue by providing the Moroccan authorities with good-quality data describing road conditions, road anomalies, risky situations, accidents, dangerous behaviors of road users, and so on. The application should be flexible and open, and it should enable participants to contribute seamlessly to the collection of data in a fun and meaningful way. The voluntary participant installs the mobile application of our proposed architecture and is invited to register; once the registration is done, a beginner badge is assigned. The participant then collects the data he/she considers useful, either an image, a video or accelerometer data, and can write annotations to share more details. These data are tagged with metadata, including GPS coordinates and dates, for an accurate and reliable description of the situation. The collection task is done offline and the data is stored on the device. When the participant decides to submit the data, privacy preservation is applied to hide private elements such as faces and license plates, and a validation mechanism is then applied to filter out unqualified and irrelevant data. If the system judges that the data can be provided, the user must choose the transmission mode (Wi-Fi, 3G, 4G...) and authorize the transmission; if the system judges that the data cannot be transmitted, the user receives a notification about the failure of the transmission, including the cause of the rejection. Our system's cloud server aggregates the transmitted data and stores it for further processing. During processing, data that is redundant in the system, data with irrelevant GPS coordinates and irrelevant data are deleted. Only the user whose data has been accepted gets points, based on the type and amount of data submitted. Machine learning techniques are then applied to the accepted data to further analyze it and extract more information to send to the stakeholder. The participant can obtain rewards or higher-level badges depending on the points earned, and can also follow the leaderboard available in the application to check the ranking of his/her contribution, which makes the task more competitive. On the other hand, the decision-making stakeholder relies on a specific platform with interfaces that give access to the collected data, the participant list and their interactions with the application, as well as dashboard interfaces. The architecture of the proposed system is illustrated in Fig. 2. It consists of four main layers, the Data Collection Layer, the Data Acquisition Layer, the Data Storage Layer and the Data Processing Layer, and two services, the App Services and the Security Management service. The Data Collection Layer enables the user to collect data (images, videos...) using the mobile device's sensors, along with user annotations that describe the sensed situation or add additional information, or to upload already collected data either via Bluetooth, Wi-Fi or from the device. This layer involves pre-registration and authentication of the mobile user; once the mobile user is registered, he/she is able to supply data and get rewards depending on the quality of the data submitted.


Fig. 2. SI-CAR system architecture.

The Data Acquisition Layer receives data from the Data Collection Layer and stores it on the device in order to apply the data validation mechanism that filters out unqualified data and to hide the private information in the images and videos before submitting them to the cloud; this processing is carried out on the same device. The data is transmitted to the next layer after obtaining the user's authorization, and the user is able to choose the transmission mode. The Data Storage Layer is a cloud server that aggregates the data received from the Data Acquisition Layer. The Data Processing Layer is a cloud computing layer that provides resources to process the aggregated data, to verify its reliability and eliminate redundant data, to decide whether the participant should receive a reward, and to communicate that decision to the user. This layer also applies data analysis, reporting and data visualization services to retrieve as much knowledge as possible before providing it to the stakeholder application. The App Services host the incentive mechanism service: its main role consists of applying the game design elements we proposed for this architecture, acting as a link between the cloud server, which grants points depending on the data submitted, and the users. This service is in charge of ranking all the users according to their interactions in order to generate the leaderboard. The Security Management service is responsible for providing a secure service that combines various security aspects, such as data security in terms of data integrity, data access, authentication, authorization, data privacy..., and road users' privacy concerning faces and other personal and private elements. Our proposed architecture presents a set of solutions for an efficient data collection process, and the combination of all its elements makes it a new solution for traffic data collection. This architecture is based on the involvement of participants; therefore, an incentive mechanism based on game techniques has been


included to motivate them. Also, to guarantee the quality of the data and to filter irrelevant data before submitting it to the cloud server, we have proposed a data validation mechanism in the user's device. To protect the user's privacy and the confidentiality of the data content, we have proposed to implement data processing techniques in the user's device to hide private information before submission to the cloud. By including the privacy aspect of the data content, we propose a new solution for traffic data collection that ensures the data is collected and transmitted legally and that no one is confronted with a risk to their life or safety.
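To make the flow between the layers concrete, a captured item could be carried as a small record such as the following as it moves from the Data Collection Layer to the Data Acquisition Layer; the field names and types are illustrative assumptions, not SI-CAR's actual data model.

```java
// Plain data record carrying one captured item and the metadata the Data
// Collection Layer attaches before on-device validation and anonymization.
public class SensedRecord {
    public enum Type { IMAGE, VIDEO, ACCELEROMETER }

    public final Type type;
    public final String filePath;      // local path of the captured media on the device
    public final double latitude;      // GPS tag
    public final double longitude;
    public final long timestampMillis; // capture date/time
    public final String annotation;    // optional free-text description, may be null

    public SensedRecord(Type type, String filePath, double latitude, double longitude,
                        long timestampMillis, String annotation) {
        this.type = type;
        this.filePath = filePath;
        this.latitude = latitude;
        this.longitude = longitude;
        this.timestampMillis = timestampMillis;
        this.annotation = annotation;
    }
}
```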

5 Conclusion

Traffic data collection is one of the major parts of ITS dedicated to the preservation of road safety. The quality and quantity of the collected data influence the result of the analysis and thereby the decisions of the authorities. Therefore, an efficient architecture must be provided to ensure the collection of a large amount of high-quality data. In this paper, we proposed the SI-CAR architecture: an MCS solution to collect road data in Morocco. The proposed architecture combines different paradigms (gamification, machine learning, edge computing and security aspects) to guarantee the collection of a large quantity of reliable, good-quality data while preserving the security of the data collector and the privacy of road users. This paper introduced ITS and their need for good-quality data to improve the road safety situation; besides, the MCS paradigm was discussed as a promising mechanism to address this issue. For future work, we envisage developing the SI-CAR mobile application based on the architecture proposed in this paper. A further perspective is to verify the performance of each aspect presented in the architecture, with real participants, in order to assess its effectiveness.

Acknowledgment. This research received funding from the Moroccan Ministry of Equipment, Transport and Logistics (METL) and the National Road Safety Agency (NARSA) and was supported by the Moroccan National Center for Scientific and Technical Research (CNRST).

References

1. WHO: Global status report on road safety 2018 (2018). https://www.who.int/violence_injury_prevention/road_safety_status/2018/en/
2. Akikawa, R., Uchiyama, A., Hiromori, A., Yamaguchi, H., Higashino, T., Suzuki, M., Hiehata, Y., Kitahara, T.: Smartphone-based risky traffic situation detection and classification, pp. 1–6 (2020). https://doi.org/10.1109/percomworkshops48775.2020.9156157
3. AlOrabi, W.A., Rahman, S.A., Barachi, M.E., Mourad, A.: Towards on demand road condition monitoring using mobile phone sensing as a service. Procedia Comput. Sci. 83, 345–352 (2016). https://doi.org/10.1016/j.procs.2016.04.135


4. Alsheikh, M.A., Jiao, Y., Niyato, D., Wang, P., Leong, D., Han, Z.: The accuracy-privacy trade-off of mobile crowdsensing. IEEE Commun. Mag. 55(6), 132–139 (2017). https://doi.org/10.1109/MCOM.2017.1600737
5. Cárdenas, R.J., Beltrán, C.A., Gutiérrez, J.C.: Small face detection using deep learning on surveillance videos. Int. J. Mach. Learn. Comput. 9(2), 189–194 (2019). https://doi.org/10.18178/ijmlc.2019.9.2.785
6. Deterding, S., Dixon, D., Khaled, R., Nacke, L.: From game design elements to gamefulness: defining "gamification". In: Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, MindTrek 2011, pp. 9–15 (2011). https://doi.org/10.1145/2181037.2181040
7. El Abdallaoui, H.E.A., El Fazziki, A., Ennaji, F.Z., Sadgal, M.: A gamification and objectivity based approach to improve users motivation in mobile crowd sensing, pp. 153–167 (2018). https://doi.org/10.1007/978-3-030-00856-7_10
8. Elkotob, M., Osipov, E.: iRide: a cooperative sensor and IP multimedia subsystem based architecture and application for ITS road safety. Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, vol. 16 LNICST, pp. 153–162 (2009). https://doi.org/10.1007/978-3-642-11284-3_16
9. Fen, H., Yingying, P., Jingyi, S.: Springer Briefs in Mobile Crowd Sensing: Incentive Mechanism Design (2019). https://doi.org/10.1007/978-3-030-01024-9
10. Feng, W., Yan, Z.: MCS-Chain: decentralized and trustworthy mobile crowdsourcing based on blockchain. Future Gener. Comput. Syst. 95, 649–666 (2019). https://doi.org/10.1016/j.future.2019.01.036
11. Ferdowsi, A., Challita, U., Saad, W.: Deep learning for reliable mobile edge analytics in intelligent transportation systems (2017). http://arxiv.org/abs/1712.04135
12. Forum, I.T.: Road safety annual report 2019: Morocco. Technical report (2019)
13. Furdu, I., Tomozei, C., Kose, U.: Pros and cons gamification and gaming in classroom, pp. 56–62 (2017). http://arxiv.org/abs/1708.09337
14. Garcia-Iruela, M., Fonseca, M.J., Hijon-Neira, R., Chambel, T.: Gamification and computer science students' activity. IEEE Access 8, 96829–96836 (2020). https://doi.org/10.1109/ACCESS.2020.2997038
15. Gisdakis, S., Giannetsos, T., Papadimitratos, P.: Security, privacy, and incentive provision for mobile crowd sensing systems. IEEE Internet Things J. 3(5), 839–853 (2016). https://doi.org/10.1109/JIOT.2016.2560768
16. Liu, J., Shen, H., Zhang, X.: A survey of mobile crowdsensing techniques: a critical component for the Internet of Things. In: 2016 25th International Conference on Computer Communications and Networks, ICCCN 2016, pp. 1–6, August 2016. https://doi.org/10.1109/ICCCN.2016.7568484
17. Liu, Q., Kumar, S., Mago, V.: SafeRNet: safe transportation routing in the era of Internet of Vehicles and mobile crowd sensing. In: 2017 14th IEEE Annual Consumer Communications and Networking Conference, CCNC 2017, pp. 299–304 (2017). https://doi.org/10.1109/CCNC.2017.7983123
18. Marjanovic, M., Antonic, A., Zarko, I.P.: Edge computing architecture for mobile crowdsensing. IEEE Access 6, 10662–10674 (2018). https://doi.org/10.1109/ACCESS.2018.2799707
19. Mishra, A., Priya, A.: A comprehensive study on intelligent transportation systems. Smart Moves J. Ijosci. 4(10), 10 (2018). https://doi.org/10.24113/ijoscience.v4i10.167

A Novel Mobile CrowdSensing Architecture for Road Safety

323

20. Mubin, S.A., Wee Ann Poh, M.: A review on gamification design framework: how they incorporated for Autism children. ICRAIE 2019 - 4th International Conference and Workshops on Recent Advances and Innovations in Engineering: Thriving Technologies, pp. 1–4, November 2019. https://doi.org/10.1109/ICRAIE47735. 2019.9037765 21. Pouryazdan, M., Fiandrino, C., Kantarci, B., Soyata, T., Kliazovich, D., Bouvry, P.: Intelligent gaming for mobile crowd-sensing participants to acquire trustworthy big data in the Internet of Things. IEEE Access 5, 22209–22223 (2017). https:// doi.org/10.1109/ACCESS.2017.2762238 22. Qureshi, K.N., Abdullah, A.H.: A survey on intelligent transportation systems. Middle East J. Sci. Res. 15(5), 629–642 (2013). https://doi.org/10.5829/idosi. mejsr.2013.15.5.11215 23. Rodrigues, J.G., Aguiar, A., Vieira, F., Barros, J., Cunha, J.P.: A mobile sensing architecture for massive urban scanning. In: IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, pp. 1132–1137 (2011). https://doi.org/10. 1109/ITSC.2011.6082958 24. Sawat, D.D., Hegadi, R.S.: Unconstrained face detection: a deep learning and Machine learning combined approach. CSI Trans. ICT 5(2), 195–199 (2017). https://doi.org/10.1007/s40012-016-0149-1 25. Shi, W., Cao, J., Zhang, Q., Li, Y., Xu, L.: Edge computing: vision and challenges. IEEE Internet Things J. 3(5), 637–646 (2016). https://doi.org/10.1109/JIOT.2016. 2579198 26. Soares, J., Silva, N., Shah, V., Rodrigues, H.: A road condition service based on a collaborative mobile sensing approach. In: 2018 IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2018, pp. 639–644 (2018). https://doi.org/10.1109/PERCOMW.2018.8480346 27. Ueyama, Y., Tamai, M., Arakawa, Y., Yasumoto, K.: Gamification-based incentive mechanism for participatory sensing. In: 2014 IEEE International Conference on Pervasive Computing and Communication Workshops, PERCOM WORKSHOPS 2014, pp. 98–103 (2014). https://doi.org/10.1109/PerComW.2014.6815172 28. Wu, H., Wang, L., Xue, G.: Privacy-aware task allocation and data aggregation in fog-assisted spatial crowdsourcing. IEEE Trans. Netw. Sci. Eng. 7(1), 589–602 (2020). https://doi.org/10.1109/TNSE.2019.2892583 29. Xiong, Z., Sheng, H., Rong, W.G., Cooper, D.E.: Intelligent transportation systems for smart cities: a progress review. Sci. China Inf. Sci. 55(12), 2908–2914 (2012). https://doi.org/10.1007/s11432-012-4725-1 30. Yan, X., Zhang, H., Wu, C.: Research and development of intelligent transportation systems. In: Proceedings - 11th International Symposium on Distributed Computing and Applications to Business, Engineering and Science, DCABES 2012, pp. 321–327 (2012). https://doi.org/10.1109/DCABES.2012.107 31. Yunanto, W., Pao, H.K.: Deep neural network-based data forgery detection in transportation system (2019) 32. Zhang, J., Ma, J., Wang, W., Liu, Y.: A novel privacy protection scheme for participatory sensing with incentives. In: Proceedings - 2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems, IEEE CCIS 2012, vol. 3, pp. 1017–1021 (2013). https://doi.org/10.1109/CCIS.2012.6664535

324

W. Abou-zbiba et al.

33. Zhao, X., Wang, N., Han, R., Xie, B., Yu, Y., Li, M., Ou, J.: Urban infrastructure safety system based on mobile crowdsensing. Int. J. Disaster Risk Reduction 27(September 2018), 427–438 (2018). https://doi.org/10.1016/j.ijdrr.2017.11.004 34. Zhou, Z., Liao, H., Gu, B., Huq, K.M.S., Mumtaz, S., Rodriguez, J.: Robust mobile crowd sensing: when deep learning meets edge computing. IEEE Network 32(4), 54–60 (2018). https://doi.org/10.1109/MNET.2018.1700442

An Agent-Based Architecture for Multi-modal Transportation Using Prometheus Methodology Design

Jihane Larioui and Abdeltif El Byed
Hassan II University, Faculty of Sciences Ain Chock, Casablanca, Morocco
[email protected], [email protected]

Abstract. In recent years, the demand for public transport and the need for travel have increased considerably. The movement of travelers is becoming increasingly difficult and complex, hence the need for an intelligent solution that facilitates decision-making and route planning. Intelligent transport systems (ITS) have become essential for the proper management of urban mobility. However, the data processed in ITS is large and diverse, which makes it hard to exploit and delays access to the desired information. The authors therefore propose a semantic approach to locate the various web resources, standardize them, and make them comprehensible and usable by the different agents of the system. The objective of this work is to present a new approach for the development of an architecture based on multi-agent systems coupled with semantic web services (SWS) in order to support decision-making in the context of urban mobility. The design of this architecture is also presented using the Prometheus methodology to effectively manage the route planning process in multimodal transport. Keywords: Urban mobility · Multi-agent system · Intelligent transport system (ITS) · Ontology · Prometheus methodology · Semantic web · Route planning · Multi-modal itinerary · Prometheus design tool

1 Introduction

Multi-modal transportation refers to the use of at least two different types of transport to reach a destination. Hence the need for a real-time information system on departures, routes and traffic conditions before and during the journey [3, 5]. Several researchers and industrialists have been involved in the development of multimodal information services and systems, not only to improve the comfort of travelers, but also to encourage people to use public transport, in order to protect the environment from pollution and respond to the problems of road congestion and urban traffic. The information system presented here is based on a multi-agent architecture that matches user requests with information provided by the different transport operators. The system lets the user choose the modes of transport to be combined and offers routes that meet the request. The traveler thus avoids consulting several transport websites to plan a trip [12, 14], because he can express his preferences between the different modes of transport and define a decreasing order of priority over several criteria such as time, number of connections, cost and safety. However, the data manipulated in ITS is large and diversified, which makes the information less exploitable and leads to a considerable loss of time before obtaining the desired information. In order to manage these different sources of information and their semantic conflicts, and to ensure interoperability between all the sources involved in the multimodal transport network, the authors decided to integrate all this information so that it is unified, flexible and understandable by all the agents of the multimodal information system. In addition, a semantic approach can be effective in supporting knowledge sharing and communication between the different agents of the system. Most research contributions consider a multi-agent system to improve route planning in a multimodal network, but they have not considered using the Semantic Web to allow flexible querying of information. The purpose of the semantic layer is to characterize existing information in order to facilitate the automation of services and to discover, link and infer related knowledge. Its architecture is based on a hierarchy that aims to represent knowledge on the web while meeting the criteria of standardization, interoperability and flexibility. It is also used to efficiently integrate data and information in the system, reduce uncertainty in decision-making, and resolve the semantic conflicts resulting from the cooperation between different data sources. To cope with this situation, decision-making requires an intelligent and efficient modeling methodology to support the main tasks of urban mobility. The multi-agent paradigm breaks complex problems down into small sub-problems that are easy to manage and solve by individual agents in cooperation. However, the design of agents that can work together towards a common goal is one of the major challenges in the field of artificial intelligence. To develop this system, the authors chose the Prometheus methodology because it is a complete, practical and easily applicable methodology designed specifically for multi-agent systems, providing everything necessary for the specification and design of agents [35]. The rest of this article is organized as follows: Sect. 2 begins with a background and review of the literature on multimodal transport systems. The detailed design of the agent-based architecture is described in Sect. 3. Section 4 describes the semantic layer and the approach used for the development of the ontology. Section 5 discusses the Prometheus methodology and presents the design of the proposed agent system. The final section concludes the paper with some remarks and perspectives.

2 Background Urban mobility is the subject of several current studies. The subject is attracting more and more researchers and manufacturers. This subject has often been linked to transport problems, particularly in terms of travel difficulties. Indeed, passenger information is by origin complicated information. This information generally relates to the route itself and is always related to the traveler. The latter plans his trip according to his own needs, criteria and preferences. Consequently, the paradigm of multi-agent information systems for personalization, collection and integration is an effective means of solving the problem of multimodal passenger information.

Fayech et al. [3] created a decision support system whose aim is to carry out traffic monitoring, incident detection, diagnostics and regulation. This system relies on cooperation between an agent approach and an algorithmic approach to define regulatory decisions. Zidi et al. [21] produced an interactive travel assistance system for both the normal mode and the degraded operating mode of the public transport network. This system also aims to minimize the waiting time of passengers at interchanges in degraded mode and to ensure, as far as possible, the continuity of journeys in multimodal networks. Kamoun's work [16–19] addresses the design of an information system for multimodal travel assistance based on a multi-agent approach for searching and composing routes online. He explained the architecture of multimodal information systems, showed the contribution of multi-agent systems to the design and production of information systems, and also focused on search algorithms and route composition. A simulation platform for decision support and regulation of urban transport systems was proposed by Nguyen [31]. The author implemented the software architecture of a simulation tool to help a decision maker in charge of the regulation of an urban transport system in the analysis and evaluation of the impacts of regulatory strategies. The system is based on an agent-based simulator integrating geographic and temporal information to evaluate control scenarios. Responding to user needs and preferences has recently become a major objective of ITS. Wang, Ding and Jiang [15] developed an ontology-based public transport query system to provide an efficient and timely information service. Their approach is based on an improved public transport demand algorithm, which takes into account both user needs and traffic conditions. The problem solved concerns stations of the same name that can be found in different places. Houda et al. [26] developed a passenger planning system based on a transport-domain ontology. The system supports a multi-modal path model, which helps the user find the most convenient path from one position to another with several possible options. The choice of the user depends on route models of interest, shopping routes, etc. The system is limited to roads and railways only. The related works mentioned above give an idea of the ontologies created in the transport field to manage the planning of trips between two sites. However, these works only offer systems that meet travel needs and do not take user preferences into account. To the knowledge of the authors, none of the existing agent-based methodologies has been sufficiently demonstrated to support the tasks of a multimodal transport system. Part of the contribution of this article is therefore the design of the architecture of the multi-agent system with the Prometheus methodology, using the Prometheus Design Tool to identify the agent types and the architecture of each agent. In this article, the authors present an information system for the field of urban mobility based on a semantic approach. This system covers all modes of transport and is capable of managing travel planning as well as answering questions related to transport cost, journey time, route, number of mode changes and safety.

3 Multi-agent Information System: Architecture Detailed Design

3.1 Organization of the Multi-agent Information System

Faced with the complexity of the data manipulated in ITS, which is large and diverse, the authors proposed a new approach combining semantic web technologies and the multi-agent paradigm to design an advanced multi-agent information system for multimodal transport. It is an effective way to communicate and coordinate among the different agents and to improve the decision-making process. In previous work on the development of an advanced and intelligent multimodal information system, a new architecture based on multi-agent systems was proposed. This architecture consists of six layers: the HMI layer, the selection layer, the decision-making layer, the information layer, the semantic layer and the physical layer [13, 14] (Fig. 1).
• The HMI layer is composed of the PTA "Personal Travel Agent", representing the Human Machine Interface.
• The selection layer is composed of the DSA "Directory Selecting Agent", which defines the search domain by specifying the information agents that will work together to plan the route and proposes the different combinations of possible information agents.
• The decision-making layer is composed of three agents:
  • The SA "Sorting Agent", which examines the different routes proposed by the DSA and decides how to treat them according to the preferences of the users.
  • The DMA "Decision Making Agent", which is based on the TOPSIS method ("Technique for Order of Preference by Similarity to Ideal Solution") as an MCDM methodology to facilitate decision-making and choose the itinerary that best satisfies the user's preferences.
  • The CA "Calculating Agent", which takes the routes proposed by the DSA and calculates the necessary parameters for each route. These parameters are calculated based on user preferences (travel time, number of mode changes, cost and safety).
• The information layer is composed of the IA "Information Agents", which are responsible for searching, collecting, integrating and manipulating the information from the different sources of information.
• The semantic layer, which uses semantic web technology to improve the flexibility between the different agents of the system; in other words, the semantic layer acts as a middleware between the physical layer and the information layer. In addition, the semantic layer is used to efficiently integrate data and information in the system, reduce the uncertainty in the decision-making process, and resolve the semantic conflicts generated by the cooperation between the different data sources.
• The physical layer, which encapsulates multiple data sources, for example the databases of the different transport operators (tramway, metro, bus, etc.), the transport network architecture, road ITS applications, sensors, surveillance cameras, etc. These systems produce information on urban networks and are essential when the passenger is moving because they provide multimodal information and facilitate the use of networks.
3.2 Architecture Detailed Design

The authors suggest designing an agent-based information system capable of finding the sources of information necessary to meet the diverse demands of users. This system should be able to produce optimized multimodal information in real time, calculate the requested route, access the databases of the various transport operators, and integrate the results generated by the various agents that compose it. Figure 1 shows the detailed design of the proposed multi-agent system architecture. First, the user sends his request, specifies his departure and arrival stations, and sets his preferences in descending order of priority over several criteria such as time, number of connections, cost and safety.

Fig. 1. Multi-agent information system architecture
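To make the decision-making step more tangible: the DMA described above applies TOPSIS to rank candidate itineraries against the weighted criteria (travel time, number of mode changes, cost, safety). The following minimal Python sketch illustrates that ranking step only; the weights, criterion directions and sample itineraries are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives with TOPSIS; higher closeness is better."""
    m = np.asarray(matrix, dtype=float)          # (alternatives, criteria)
    w = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)

    # 1. Vector-normalize each criterion column, then apply the weights.
    v = (m / np.linalg.norm(m, axis=0)) * w

    # 2. Ideal best / worst points per criterion.
    best = np.where(benefit, v.max(axis=0), v.min(axis=0))
    worst = np.where(benefit, v.min(axis=0), v.max(axis=0))

    # 3. Euclidean distances and relative closeness to the ideal solution.
    d_best = np.linalg.norm(v - best, axis=1)
    d_worst = np.linalg.norm(v - worst, axis=1)
    return d_worst / (d_best + d_worst)

# Hypothetical itineraries scored on [time (min), mode changes, cost, safety rating].
itineraries = ["bus only", "tram + bus", "tram + taxi"]
scores = [[45, 0, 8, 3],
          [35, 1, 10, 4],
          [25, 1, 20, 5]]
weights = [0.4, 0.2, 0.2, 0.2]          # user-defined priority order (assumed)
benefit = [False, False, False, True]   # minimize everything except safety

closeness = topsis(scores, weights, benefit)
print(dict(zip(itineraries, closeness.round(3))),
      "->", itineraries[int(np.argmax(closeness))])
```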

Thus, the main tasks of this system are summarized as:
• Assistance of passengers (keep the passenger informed of his journey in real time)
• Analysis of preference criteria
• Decision making
• Route planning

The system is composed of four modules (Fig. 2); each module groups the agents associated with it:
• Assistance module, responsible for accompanying and assisting the user.
• Analysis and decision-making module, which analyzes the preference criteria expressed by the user.
• Route planning module, which calculates the final route that meets the needs of the user.
• Information source module, with access to all information on the network, including schedules.

Fig. 2. Multi-agent information support system modules

4 Semantic Layer: Methodology for Ontology Development 4.1 Approach The first appearance of the word ontology was in Aristotle’s philosophical essays, where he described the nature and organization of being. On the other hand, in artificial intelligence, an ontology represents a domain of knowledge. Ontology can be defined simply as a hierarchical description of important concepts in a field [22, 23]. The concepts that make up an ontology include several elements: the class, the subclass, the class hierarchy, the instance, the location, the value, the default value, the facet, the type, the cardinality, the inheritance, variable and relation [24]. An ontology has several functions in particular, to share a generic understanding of the structure of information between human and/or software agents, to allow the reuse of domain knowledge, to make domain hypotheses more precise, to create semantics interoperability between different source data, to provide a clear and precise analysis of terms [20].

In multi-agent systems, agents are linked to various data sources. These data sources must be effectively integrated into the system in order to reduce semantic conflicts. The most effective way to remedy this problem is to put in place a global ontology that will be used by all agents. Agents sharing the same ontology can exchange knowledge and communicate fluently because their knowledge representations are compatible with the concepts considered relevant and with the names given to these concepts [15]. In this way, the decision-making process will be set up in better conditions. The semantic layer implemented in the system is made up of four main functionalities: The first concerns connectivity to data sources in order to facilitate access to data. This first step then makes it possible to extract the most relevant concepts and eliminate the misunderstanding between these data sources. Then comes the most important step which allows to generate standard and unified models for these data sources and finally to have a flexible data model understandable by all the agents of the system (Fig. 3).

Fig. 3. Semantic layer process description

The aim of this approach is to formulate all the data and concepts contributing to the semantic definition of the multimodal transport system and to standardize the description models so that they are comprehensible and usable by the various agents of the system [35].

4.2 The Ontology Design Methodology

The authors used a hybrid architecture to define the design methodology for the ontology. This architecture offers a bottom-up process to create global ontologies from local ontologies and to define the correspondence between them [30]. The proposed solution avoids the misunderstandings that can arise during exchanges and communications between agents, and aims to allow them to understand each other when using the information processed by these ontologies. The software used to design the ontologies is Protégé 5.1.0 [27], a free open-source editor and framework that supports the latest OWL 2 Web Ontology Language and RDF specifications of the World Wide Web Consortium. As Fig. 4 shows, the development of the ontology includes three phases:

Fig. 4. Approach for constructing the multimodal transportation network

• The first phase: consists of building local ontologies. Foremost, it is essential to analyze the different data sources, each independently. After that, the concepts considered relevant are defined for each ontology, their relationships and their usage constraints. • The second phase: after having defined the local ontologies, now, it’s time to pull out the global ontology from different defined concepts. This phase includes two stages: first, the analysis of local ontologies, then the selection of all concepts and the resolution of semantic conflicts in order to define the global ontology and its particular concepts.

• The third phase: it consists of defining the mapping between global and local ontologies. The global ontology is constructed from local ontologies using OWL Annotations. To detail the construction of the different ontologies of the systems, the authors focus first on the first phase, which concerns the construction of local ontologies. This article only considers the cases of two data sources, those of the bus and the tram. The global ontology of the multimodal transport network takes up the different concepts and relationships of the local ontologies that is created: Tramway ontology and Bus ontology. The figures below show respectively the local ontologies constructed from the Bus network and that of the tramway. The ontologies created use the modeling of the transport network for each mode by illustrating major concepts such as lines, stations, operator, vehicle … etc. (Fig. 5 and 6).

Fig. 5. Fragment of bus network local ontology

Fig. 6. Fragment of tramway network local ontology

Fig. 7. Bus network local ontology

Fig. 8. Tramway network local ontology

By analyzing these concepts, it is noticeable that each ontology defines its concepts differently; for example, the Station concept defined in the Tramway ontology is equivalent to Bus stop in the Bus ontology. On the other hand, there are concepts that have the same definition, for example Schedule and Operator, as well as the concepts of Vehicles and Lines (Figs. 7 and 8). The second phase consists in extracting the global ontology from the different concepts defined in the local ontologies (local ontology of the tramway network / local ontology of the bus network). After analyzing the local ontologies, the global ontology is determined using OWL annotations. This global ontology, called the transport network ontology, takes up the concepts of the local ontologies and adds new concepts to them in order to meet the needs of the multimodal system, such as the concept of mode of transport or connection links (Fig. 9). In addition, in order to address the concept of safety in the field of transport, city zones were defined as a concept and their ratings were assigned according to the degree of safety in each zone. The concept of travel also joins the other concepts of the global ontology and has as attributes: cost, duration, start point and end point (Fig. 10).

Fig. 9. Fragment of transportation network global ontology

Fig. 10. Transportation network global ontology
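To illustrate the kind of global–local mapping described above (for example, the Tramway Station concept being equivalent to the Bus stop concept), the following rdflib sketch declares such an equivalence and ties both classes to a shared Station concept of a global transport-network ontology. The namespace URIs and class names are hypothetical placeholders, not the identifiers actually used in the authors' Protégé project.

```python
from rdflib import Graph, Namespace, RDF, RDFS, OWL

# Hypothetical namespaces for the local and global ontologies.
TRAM = Namespace("http://example.org/tramway#")
BUS = Namespace("http://example.org/bus#")
NET = Namespace("http://example.org/transport-network#")

g = Graph()
g.bind("tram", TRAM)
g.bind("bus", BUS)
g.bind("net", NET)

# Declare the classes.
for cls in (TRAM.Station, BUS.BusStop, NET.Station, NET.TransportMode):
    g.add((cls, RDF.type, OWL.Class))

# Mapping: the tramway 'Station' and the bus 'BusStop' denote the same concept...
g.add((TRAM.Station, OWL.equivalentClass, BUS.BusStop))
# ...and both specialize the global ontology's Station concept.
g.add((TRAM.Station, RDFS.subClassOf, NET.Station))
g.add((BUS.BusStop, RDFS.subClassOf, NET.Station))

print(g.serialize(format="turtle"))
```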

5 Design Based Prometheus Methodology 5.1 Prometheus Methodology and Tools The Prometheus methodology is an agent oriented software engineering methodology [8]. The Prometheus Method consists of the following three phases: • The system specification phase that focuses on identifying the goals and basic functionalities of the system, along with inputs (percepts) and outputs (actions) • The architectural design phase that uses the outputs from the previous phase to determine which agent types the system will contain and how they will interact • The detailed design phase that looks at the internals of each agent and how it will accomplish its task within the overall system [24]. The following sections summarize the results of the different design phases. Figure 11 indicates the main design artifacts arising from each of these phases as well as some of the intermediary items and relationships between items.

Fig. 11. Prometheus methodology phases

The choice of the Prometheus methodology is supported by the fact that this methodology is complete, practical and easily implementable specifically for the design of multi-agent systems, providing everything necessary to specify and design the agents
[23]. An important practical aspect is that the Prometheus Design Tool (PDT) is available free of charge. In addition, it has integrated checks for completeness and consistency. Prometheus is based on industry standards such as use case scenarios, UML, AUML sequence diagrams and the Rational Unified Process (RUP) [23, 24].

5.2 System Design and Development

Designing a multi-agent system is a complicated and iterative process. These systems can contain hundreds of agents operating in real time, perceiving, communicating, negotiating, making decisions and executing. Such a complex system must be properly designed to include not only all of the functionality necessary to keep the system running and accurate, but also to make it scalable and generic. In order to meet this challenge systematically, a complete and detailed methodology should be used, whose aim is to guide the process of decomposing one complex objective into several smaller, more manageable objectives, so that the system can be fully understood and then designed [36]. The multi-agent system should be able to provide an itinerary according to the user preferences, including travel time, cost, number of mode changes and safety.

System specification. The system specification is the initial phase of the Prometheus methodology. This phase consists of describing the scenarios and goals, and explaining how the defined functionalities should act upon the goals.

System goals. The main goals of the agent system are briefly presented in Fig. 12.

Fig. 12. Goal overview diagram

Scenarios. This section describes the scenarios that take place in the system together with their steps.
• S1: The Assistance Process Scenario: Once the user sends a request, the agent system splits the request into two sub-requests and sends them to Scenarios S2 and S3.
• S2: The Identification Process Scenario: Once the points of departure and arrival are received, the agent system defines the search domain by specifying the information agents that will work together to plan the route and proposes the different combinations of possible information agents. It then sends a list of suggestions to Scenario S4.
• S3: The Data Collection Process Scenario: The agent system searches, collects and integrates data regarding the requested departure and arrival and sends the response to Scenario S2.
• S4: The Sorting Process Scenario: After receiving the user's preferences and the list of suggestions, the agent system examines the different routes proposed and decides how to treat them according to the preferences of the users (Scenarios S5 and S6).
• S5: The Calculation Process Scenario: After getting the list of suggested itineraries, the agent system calculates the necessary parameters for each route. These parameters are calculated based on user preferences (travel time, number of mode changes, cost and safety), and the results are sent to Scenario S6.
• S6: The Decision Making Process Scenario: Once the calculated parameters are defined, the agent system determines the optimal itinerary, using an MCDM methodology, that will satisfy the user's preferences (Fig. 13).

Fig. 13. System scenario overview diagram

Architectural Design. The purpose of this section is to define which agents are to be part of the system, how they interact with each other, and which external connections are required. The artifacts produced during the system specification are used as a basis for developing the high-level design of the agent system. This section presents the architecture of the multi-agent system. It starts with an illustration of how the agents are distributed in the simulated environment and continues with a detailed description of the agent types and the overall interaction between them (Fig. 14).

Fig. 14. Agent role grouping overview

The PTA agent sends messages to the Sorting Agent and the Directory Selecting Agent. The Directory Selecting Agent sends requests to the Information Agents and waits for the response. The Sorting Agent communicates in the same way, and in parallel, with the Calculating Agent and the Decision Making Agent. The agent types of this architecture are defined by considering the roles and scenarios of the system specification.

Fig. 15. Agent acquaintance diagram

The interactions between agents capture the dynamic aspects of the system; Fig. 15 shows an acquaintance diagram that illustrates how the agents are connected.

Detailed Design. The final design phase builds upon the agent descriptors defined in the previous phase and further specifies the behavior of every agent. This includes all triggers the agent responds to, how it responds to them, and which agents it interacts with.
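To make the interactions of Fig. 15 concrete, the toy Python sketch below mocks up the message flow described above (PTA → DSA → IA for gathering candidate routes, then SA delegating to the CA and DMA for scoring and ranking). The class and method names, and the synchronous calls, are illustrative simplifications and not the actual agent implementation.

```python
from dataclasses import dataclass

@dataclass
class Request:
    origin: str
    destination: str
    preferences: dict  # e.g. {"time": 0.4, "changes": 0.2, "cost": 0.2, "safety": 0.2}

class InformationAgent:                      # IA: one per data source (bus, tram, ...)
    def __init__(self, source):
        self.source = source
    def find_routes(self, req):
        # In the real system this would query the operator's data through the semantic layer.
        return [f"{self.source}: {req.origin} -> {req.destination}"]

class DirectorySelectingAgent:               # DSA: picks the relevant IAs and collects candidates
    def __init__(self, information_agents):
        self.information_agents = information_agents
    def candidate_routes(self, req):
        return [r for ia in self.information_agents for r in ia.find_routes(req)]

class SortingAgent:                          # SA: delegates scoring (CA) and ranking (DMA)
    def __init__(self, calculate, decide):
        self.calculate, self.decide = calculate, decide
    def best_route(self, routes, preferences):
        scored = [(r, self.calculate(r)) for r in routes]
        return self.decide(scored, preferences)

class PersonalTravelAgent:                   # PTA: entry point for the user request
    def __init__(self, dsa, sa):
        self.dsa, self.sa = dsa, sa
    def plan(self, req):
        routes = self.dsa.candidate_routes(req)
        return self.sa.best_route(routes, req.preferences)

# Toy wiring: a dummy scoring function and a "decision" that just picks the cheapest route.
pta = PersonalTravelAgent(
    dsa=DirectorySelectingAgent([InformationAgent("bus"), InformationAgent("tram")]),
    sa=SortingAgent(calculate=lambda r: {"cost": len(r)},
                    decide=lambda scored, prefs: min(scored, key=lambda s: s[1]["cost"])[0]),
)
print(pta.plan(Request("Ain Diab", "Casa Port",
                       {"time": 0.4, "changes": 0.2, "cost": 0.2, "safety": 0.2})))
```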

Fig. 16. System overview diagram

The roles and tasks of these agents are described, along with the overall interaction between them. Figure 16 provides an overview of the architecture with the main entities involved. The agent overview diagram was developed as an example using the PDT detailed design process. PDT has the ability to validate each design entity dynamically while the development process is running.

6 Conclusion

This paper proposes an architecture for an advanced information support system for a multimodal transportation network, based on multi-agent systems coupled with semantic web services. The use of these technologies for the development of an intelligent information system for urban mobility was the main approach to address the problems of passenger travel. The proposed multi-agent architecture is efficient and flexible, designed to be easily adapted and to manage disturbances in the transport network. In addition, a focus was placed on the semantic layer of the agent-based multi-modal information system. The purpose of this layer is to manage the different sources of data and to build a flexible data model that can be understood by all the agents in the system. An ontology for the transport system was developed by capturing the common terms used in transport systems and by considering the different preference criteria chosen by the user, such as cost, duration, number of connections and safety. In this context, the design of the main architecture was described using the Prometheus methodology within PDT (Prometheus Design Tool). In future work, it is important to extend the created ontologies in order to take into account all the sources of data, so that these data are comprehensible and accessible for the agents of the system. Moreover, the negotiation and communication between the agents themselves and between the different layers of the system will be detailed. The next step is to integrate an MCDM model to improve the agents' decision-making and to develop the calculation of the multi-modal itinerary of this system.

References 1. Niaraki, A.S., Kim, K.: Ontology based personalized route planning system using a multicriteria decision making approach. Expert Syst. Appl. 36(2), 2250–2259 (2009) 2. Basak, K., Seshadri, R., Lima de Azevedo, C.M.: Sim Mobility Integrated 16. In: Simulation Platform. Research project. Massachusetts Institute of Technology, Cambridge (2018). Accessed 31 Aug 2018 3. Fayech, B.: Régulation des réseaux de transport multimodal: Systèmes multi-agents et algorithmes évolutionnistes. Thèse de doctorat, Ecole Centrale de Lille/Université des Sciences et Technologies de Lille (2003) 4. Dutta, B., Nandini, D., Shahi, G.K.: Mod: metadata for ontology description and publication. In: International Conference on Dublin Core and Metadata Applications, pp. 1–9 (2015) 5. Bernon, C., Cossentino, M., Gleizes, M.P., Turci, P., Zambonelli, F.: A study of some multiagent meta-models. In: Giorgini, P., Mueller, J., Odell, J. (eds.) Fifth International Workshop on Agent-Oriented Software Engineering (AOSE 2004) at AAMAS 2004, New York, ÉtatsUnis, juillet 2004, LNCS, vol. 3382, pp. 62–77. Springer Verlag (2005)

6. Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A, Schneider, L.: Dolce: a descriptive ontology for linguistic and cognitive engineering. Wonder Web Project, Deliverable D17 v2, 1:75–1:105 (2003) 7. Comtois, C., Slack, B.: The geography of transport systems. Routledge Marcel Becker and Stephen F Smith. An ontology for multi-modal transportation planning and scheduling. Technical Report CMU-R I-TR-98–15, 1997 (2009) 8. Lhafiane, F., Elbyed, A., Bouchoum, M.: Reverse logistics information management using ontological approach. World Acad. Sci. Eng. Technol. Int. J. Inf. Commun. Eng. 9(2), 396 (2015) 9. Lhafiane, F., Elbyed, A., Bouchoum, M.: Multi-agent system architecture oriented prometheus methodology design for reverse logistics. World Acad. Sci. Eng. Technol. Int. J. Inf. Commun. Eng. 9(8), 1827–1833 (2015) 10. Giunchiglia, F., Dutta, B.: Dera: a faceted knowledge organization framework. Technical report, University of Trento (2011) 11. Giunchiglia, F., Dutta, B., Maltese, V.: From knowledge organization to knowledge representation. Technical report, Università di Trento (2013) 12. Ben Khaled, I., Kamoun, M.A, Zidi, K., Hammadi, S.: Vers un Système d’information voyageur multimodal (SIM) à base de système multi-agent (SMA), REE revue de la SEE 1, 41–47 (2005) 13. Larioui, J., El Byed, A.: A multi-agent information system architecture for multimodal transportation. In: 1st International Conference on Embedded Systems and Artificial Intelligence ESAI 2019, 2–3 May 2019, Fez, Morocco (2019) 14. Larioui, J., El Byed, A.: An advanced intelligent support system for multi-modal transportation network based on multi-agent architecture. In: Advanced Intelligent Systems for Applied Computing Sciences, vol. 4, pp. 98–106 (2020) 15. Wang, J., Ding, Z., Jiang, C.: An ontology-based public transport query system. In: First International Conference on Semantics, Knowledge and Grid, SKG 2005, pp. 62–62. IEEE (2005) 16. Kamoun, M.A., Hammadi, S.: A multi-agent architecture for a transport’s multi-modal information system. WSEAS Trans. Syst. 5(3), 2062 (2004) 17. Kamoun, M.A, Uster, G., Hammadi, S.: Optimisation des systèmes d’information coopératifs pour l’aide au déplacement multimodal. In: Rencontre avec les doctorants des laboratoires ESTAS, LEOST, LIVIC, LTN, Actes INRESTS (2005) 18. Kamoun, M.A., Uster, G., Hammadi, S.: An agent-based cooperative information system for multimodal transport’s travelers assistance. In: Proceedings of IMACS 2005 Scientific Computation: Applied Mathematics and Simulation, Paris France (2005) 19. Kamoun, M.A.: Designing an information system for multi-modal travelling assistance: a multi-agent approach for an online routes composition. PHD Thesis, Ecole Centrale of Lille (2007) 20. De Oliveira, K.M., Bacha, F., Mnasser, H., Abed, M.: Transportation ontology definition and application for the content personalization of user interfaces. Expert Syst. Appl. 40(8), 3145–3159 (2013) 21. Zidi, K., Hammadi, S.: DMAS: distributed multi-agents system for assist users in the multimodal travels. In: International Conference on Industrial Engineering and Systems Management. IESM 2005, 16–19 May, Marrakech-Morocco (2005) 22. Padgham, L., Winikoff, M.: Prometheus: a pragmatic methodology for engineering intelligent agents. In: Debenham, D.J., Henderson Sellers, B., Jennings, N., Odell, J. (eds.) Proceedings of the OOPSLA 02 - Workshop on Agent-Oriented Methodologies. COTAR (2002)

23. Padgham, L., WINIKOFF, M.: Prometheus: a methodology for developing intelligent agents. In: Giunchiglia, D.F., Odell, J., Weiss, G. (eds.) Agent-Oriented Software Engineering III, Third International Workshop, AOSE 2002, Bologna, Italy, 15 July 2002, Revised Papers and Invited Contributions, vol. 2585 de Lecture Notes in Computer Science (LNCS), pp. 174–185. Springer-Verlag (2003) 24. Padgham, L., Winikoff, M.: Prometheus: a practical agent-oriented methodology. In: Henderson-Sellers, B., Giorgini, P. (eds.) Agent-Oriented Methodologies, pp. 107–135. IDEA Group Publishing (2002) 25. Becker, M., Smith, S.F.: An ontology for multi-modal transportation planning and scheduling. Technical Report CMU-R I-TR-98–15 (1997) 26. Houda, M., Khemaja, M., Oliveira, K., Abed, M.: A public transportation ontology to support user travel planning. In: 2010 Fourth International Conference on Research Challenges in Information Science (RCIS), pp. 127–136. IEEE (2010) 27. Musen, M.A.: The protégé project: a look back and a look forward. AI Matters 1(4), 4–12 (2015) 28. Niaraki, A.S., Kim, K.: Ontology based personalized route planning system using a multicriteria decision making approach. Expert Syst. Appl. 36, 2250–2259 (2007) 29. Gruer, P., Hilaire, V., Koukam, A.: Multi-agent approach to modelling and simulation of urban transportation systems. In: Proceedings of the 2001 IEEE SMC Conference, 6–10 October 2001, Tucson, Arizona, USA, pp. 2499-2504 (2001) 30. Mulholland, P.: Introduction to ontologies, Version 2, Internal Report, code: RichODL-OU3/1999, Enriching ODL by knowledge sharing for collaborative computer- based modeling and simulation (1999) 31. Nguyen, Q.T.: Plateforme de simulation pour l’aide à la décision: Application à la régulation des systèmes de transport urbain. Ph.D. Thesis, Université de la Rohelle (2015) 32. Timpf, S.: Ontologies of wayfinding: a traveler’s perspective. Netw. Spat. Econ. 2(1), 9–33 (2002) 33. Das, S., Giunchiglia, F.: Geoetypes: harmonizing diversity in geospatial data (short paper). In: OTM Confederated International Conferences” On the Move to Meaningful Internet Systems”, pp. 643–653. Springer (2016) 34. Zgaya, H., Hammadi, S.: Multi-agent information system using mobile agent negotiation based on a flexible transport ontology. In: Proceedings of the 2007 International Conference on Autonomous Agents and Multi-agent Systems (AAMAS 2007), 14–18 May, Honolulu, Hawaii (2007) 35. Larioui, J., El Byed, A.: Towards a semantic layer design for an advanced intelligent multimodal transportation system. Int. J. Adv. Trends Comput. Sci. Eng. 9(2), 2471–2478 (2020). https://doi.org/10.30534/ijatcse/2020/236922020 36. Larioui, J., El Byed, A.: Multi-agent system architecture oriented prometheus methodology design for multi modal transportation. Int. J. Emerg. Trends Eng. Res. 8(5), 2118–2125 (2020). https://doi.org/10.30534/ijeter/2020/105852020

An Overview of Real-Time Traffic Sign Detection and Classification

Youssef Taki and Elmoukhtar Zemmouri
ENSAM Meknes, Moulay Ismail University, Meknes, Morocco
[email protected], [email protected]

Abstract. Traffic Sign Recognition (TSR) systems are important components of advanced driver assistance systems (ADAS). They are specifically designed to work in a real-time environment and to increase driver safety by notifying the driver of the various traffic signs, such as speed limits, priorities and restrictions. Nonetheless, in real-world applications, these systems face a range of external challenges such as bad weather, broken signs and insufficient lighting conditions (night-time). The real-time constraint remains the biggest and most critical challenge facing ADAS, as it is present in all of these challenges and increases their complexity, so researchers in this field should give it the highest priority. In this paper, we draw an overview of some recent and efficient real-time methods for the detection and classification of traffic signs. Keywords: Traffic sign detection · Traffic sign recognition · Advanced driver assistance system · Machine learning · Deep learning

1 Introduction

The UN estimates that over the period 2010 to 2020 the number of deaths on roads will rise by 50% [1] to about 1.9 million people. To reverse this trend, the United Nations launched, in 2011, the first "Decade of Action for Road Safety". Driver assistance systems will help to reduce accident rates by automating activities such as lane departure warning and traffic sign recognition [1]. Research in the domain of traffic sign recognition is not a recent activity. In fact, the first research paper appeared in Japan in 1984 [2]; since then, many researchers and companies have become interested in this field, especially with the development of intelligent automotive technology and machine learning. It is even considered a highly important feature of intelligent vehicles [3]. In general, the recognition of traffic signs involves two main stages. The first stage is traffic sign detection (some authors split this stage into two parts: image processing and localization), which is concerned with detecting the position and size of traffic signs in a traffic scene image. The second stage is traffic sign classification, which deals with the automatic recognition of the detected traffic signs.


In this paper we propose an overview of recent and efficient real-time methods for the detection and classification of traffic signs. We mainly present a classification of the different methods used for TSD and TSR and the challenges of each method, and we focus especially on the real-time challenge. In recent years, several review studies have been carried out in this field, with some small differences in their perspectives and orientations. Among the most recent of these we mention the work of Behloul and Saadna [1]. In this work the authors introduced an overview of some recent methods in traffic sign detection and classification. They divided the recognition process into two main parts, detection and classification, and in each part they distinguished several methods, for example the learning methods based on hand-crafted features and the deep learning methods in the classification part. This work gave a general idea of traffic sign detection and classification with the latest research, while the review in [8] gave more details on the problems and challenges that face traffic sign recognition. Among the problems mentioned in this work are bad weather, broken signs, insufficient lighting conditions (night-time), real-time operation, etc. The latter issue is the focus of our review study. We present recent methods dealing with TSD and TSR taking real-time constraints into consideration. We consider research papers between 2010 and 2020 and draw a comparison between them. The rest of our paper is organized as follows: In Sect. 2, various publicly available traffic sign detection datasets are presented. In Sect. 3, we introduce and compare the state-of-the-art detection methods. In Sect. 4, traffic sign classification methods are discussed. Some systems of traffic sign detection based on deep neural networks are presented in Sect. 5. Finally, we present some findings and future work.

2 Traffic Sign Databases

An important element in developing any TSR system is a database, used to train and test the detection and recognition techniques. For a long time there were no public datasets available in this area, but this situation changed in 2011. Nowadays, there are many databases publicly available for use by the research community.

2.1 The German Traffic Signs Benchmark

The first and most widely used dataset is the German Traffic Sign Dataset, which has two subsets:
• The German Traffic Signs Detection Benchmark (GTSDB): a single-image traffic sign detection benchmark. It consists of 900 images of 1360 × 800 pixels, divided into 600 training images and 300 evaluation images, and is annotated with three classes: mandatory, warning, and prohibitory [4].
• The German Traffic Signs Recognition Benchmark (GTSRB): composed of 43 classes containing more than 50,000 images. Each class contains at least 9 traffic signs, with sizes varying from 15 × 15 to 222 × 193 pixels [5].

2.2 The Belgium Traffic Sign Datasets

• The Belgium Traffic Sign Detection Dataset (BTSD): consists of more than 10,000 annotations, and the images are divided into three categories: mandatory, warning, and prohibitory. It also contains four video sequences captured in Belgium, which can be used for tracking experiments [6].
• The Belgium Traffic Sign Classification dataset (BTSC): an extraction of the regions of interest containing traffic signs in the BTSD dataset. It is composed of more than 4000 training images classified into 62 classes and more than 2000 testing images [6].

2.3 The Swedish Traffic Signs Dataset (STSD)

The STSD consists of more than 20,000 images created by recording over 350 km of Swedish highways and city roads; every fifth frame of the sequence is manually annotated [7]. Throughout this review, we focus on the methods that used the German Traffic Sign Detection Benchmark (GTSDB) and the German Traffic Sign Recognition Benchmark (GTSRB), in order to make the comparison between them easier and more reliable.
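For readers who wish to experiment with these benchmarks, GTSRB can be loaded directly through torchvision, as sketched below. The dataset class and its arguments follow recent torchvision releases (≥ 0.12); with an older version or a manual download the loading code would differ.

```python
import torch
from torchvision import datasets, transforms

# Resize because GTSRB images vary from 15 x 15 to 222 x 193 pixels.
preprocess = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])

train_set = datasets.GTSRB(root="data", split="train", download=True, transform=preprocess)
test_set = datasets.GTSRB(root="data", split="test", download=True, transform=preprocess)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
images, labels = next(iter(train_loader))
print(images.shape, labels[:10])   # torch.Size([64, 3, 32, 32]) and class ids in [0, 42]
```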

3 Traffic Sign Detection As discussed above, the detection of traffic signs can be defined as the localization of traffic signs in an input frame. This phase (detection phase) consumes more time compared to the classification phase, especially when applied to a high-resolution image. That is why the detection module has to be able to locate traffic signs in real time. Traffic Sign Detection approaches can be classified into four basic classes: color-based, shape-based, learning-based, and hybrid methods (see Fig. 1).

Fig. 1. Classification of different methods applied for traffic sign detection.

3.1 Color-Based Methods

Traffic signs have a strict color scheme that includes a limited set of colors (generally red, blue, and white), which helps to differentiate traffic signs from the background. Distinguishing a traffic sign from the background is easy for humans, and for a detection system color information is also an important feature. Colors are used to detect regions of interest (ROI) within an input image based on different image-processing methods. Because of this, methods based on color information can be used with a high-resolution dataset but not with grayscale images [8]. In addition, the main problem with using the color parameter is its sensitivity to various factors such as the distance of the target, weather conditions, time of day, as well as reflection, age, and condition of the signs [9]. To overcome this problem, authors work with various color spaces, some of which are represented in Fig. 2.

Fig. 2. Different color spaces

Several methods have been proposed for traffic sign detection based on colors. The most important ones are:

HSI/HSV Transform. In this approach, several authors have chosen to use the HSI space because its hue component is invariant to changes in luminance. Tagunde et al. [10] chose the HSI space to detect the signs because it is simple and gives good results in real-time applications. Gupta and Choudhary [11] chose HSV color space segmentation and achieved 99% accuracy in real time.

Color Indexing. The color indexing method is another simple method that identifies objects entirely on the basis of color: any two images are compared by comparing their color distributions.

Histogram. The advantage of using color histograms is their robustness with respect to geometric changes of the projected objects. However, color indexing is segmentation dependent, and complete, efficient, and reliable segmentation cannot be performed prior to recognition. Thus, color indexing is characterized as an unreliable method [12].
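As a concrete illustration of HSV-based segmentation, the OpenCV sketch below thresholds the red hue range (which wraps around the ends of OpenCV's 8-bit hue scale) to obtain candidate regions for red-rimmed signs. The threshold values and area filter are illustrative and would need tuning for real lighting conditions.

```python
import cv2
import numpy as np

def red_sign_mask(bgr_image):
    """Return a binary mask of red-ish pixels, a rough ROI proposal for red signs."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so two ranges are combined.
    lower_red = cv2.inRange(hsv, (0, 70, 50), (10, 255, 255))
    upper_red = cv2.inRange(hsv, (170, 70, 50), (180, 255, 255))
    mask = cv2.bitwise_or(lower_red, upper_red)
    # Light morphological cleanup before extracting candidate regions.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

image = cv2.imread("scene.jpg")                    # any traffic scene image (assumed to exist)
mask = red_sign_mask(image)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
candidates = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 200]
print(len(candidates), "candidate regions")
```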

3.2 Shape-Based Methods

Traffic signs not only have distinctive colors, they also have very well-defined shapes that can be searched for. In this approach, color is not regarded as a discriminative feature, due to its sensitivity to various factors such as target distance, weather conditions and time of day; instead, shape-based detection methods attempt to find approximate contours. For these methods, the memory and computational requirements for large images are quite high [13]. These methods also have difficulties with damaged or partially obscured signs. The most common shape-based detection methods are:

Hough Transformation. The Hough transformation is the most common shape-based technique, used by Garcia-Garrido et al. to detect sign edges and pick the closed contours that make their technique immune to noise and occlusion [14]. The key benefit of the Hough transformation is that it tolerates differences in the definitions of feature boundaries and is fairly unaffected by image noise [15]. However, its key drawback is its reliance on the input data.

Histogram of Oriented Gradients. The histogram of oriented gradients (HOG) is another shape-based detection process. HOG is constructed following a six-step procedure:
1) Divide the image into small adjacent regions called cells.
2) For every cell, measure the edge orientations of the pixels inside the cell.
3) Discretize each cell according to its gradient orientations into angular bins.
4) Each cell pixel contributes a weighted gradient to its corresponding angular bin (orientation binning).
5) Combine neighboring cells into a block and normalize the resulting block histogram.
6) Concatenating the set of block histograms yields the final descriptor.
Zaklouta and Stanciulescu [16] used HOG descriptors of different sizes and red color enhancement to evaluate the performance of K-d tree classifiers. The detection rate achieved was 94.77% with HOG descriptors in real time. Shao et al. [17] also used HOG with MSERs; the detection rate achieved was 99.63% in 46 ms.

Edge Detection Feature. This method indicates the boundaries of objects within an image by finding a set of connected curves. Shao et al. [17] used Simplified Gabor Wavelets (SGW) to strengthen the edges of the traffic signs, and they are also used in [18] in real time. The main advantages of this method are reliability and high accuracy in real time. Edge detection is a good method, but it cannot be used alone; it can only be used as a complement to another method.
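The six-step HOG procedure above maps directly onto scikit-image's hog function; the sketch below computes a descriptor for a single candidate window. The cell, block and window sizes are typical illustrative values, not those used in the cited papers.

```python
from skimage import io, color, transform
from skimage.feature import hog

patch = io.imread("candidate_sign.png")            # a detected RGB region of interest (assumed)
gray = color.rgb2gray(patch)
gray = transform.resize(gray, (40, 40))            # fixed window size gives a fixed-length descriptor

descriptor = hog(
    gray,
    orientations=9,            # angular bins per cell
    pixels_per_cell=(8, 8),    # step 1: cells
    cells_per_block=(2, 2),    # step 5: block grouping and normalization
    block_norm="L2-Hys",
)
print(descriptor.shape)        # (576,) for a 40 x 40 window with these settings
```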

3.3 Learning-Based Methods

Recently, as learning methods have demonstrated prominent representation capacity and achieved outstanding performance in traffic sign detection, more researchers have applied these technologies to this area. Brkic et al. [19] used the Viola-Jones detector to detect triangular traffic signs. The detector was trained using about 1000 images of relatively poor quality, and it achieved a high true positive rate (ranging from 90 to 96%) depending on the training set and the configuration of the detector. Liu et al. [20] achieved a good execution time by implementing a new approach called Categories-First-Assigned Tree (CFA-Tree), in which the detection and classification phases are integrated in one module; this novel system reaches a good accuracy of about 93.5%. However, this search tree can only detect three categories and has low efficiency in handling high-resolution images [8]. Aghdam et al. [21] prefer to use a ConvNet in the detection phase; they achieved high accuracy in a short time (99.89% in 26.5 ms).

3.4 Hybrid Methods

As mentioned above, some methods cannot be applied alone for sign detection and need to be combined with other methods. Moreover, both color-based and shape-based methods have their own advantages and disadvantages. For this reason, some authors choose to combine two or more methods in the detection phase to obtain good results. For example, Yang et al. [22] use a color probability model together with MSERs to detect traffic signs; the detection rate achieved was 99.63% with a 65 ms execution time. Xu et al. [23] also combine color and shape methods in the detection phase: they first use an adaptive color threshold and then a shape symmetry method. In these studies, different signs with various colors and shapes were detected using different datasets. In Table 1, we summarize the comparison of the previously presented methods:

Table 1. Comparative study of traffic sign detection methods

| Technique | Method | Dataset | Image size | AUC | Execution time (ms) | Paper |
|---|---|---|---|---|---|---|
| Color Based Methods | HSV color segmentation | GTSDB | 1360 × 800 | 99.97 | Real time | [11] |
| Color Based Methods | Hybrid color segmentation | LISA | N/A | 99.89 | N/A | [24] |
| Shape Based Methods | Radial Symmetry transformation | GTSDB | 320 × 240 | 96.00 | 50 | [18] |
| Shape Based Methods | Hough transform | GTSDB | 320 × 320 | 97.20 | 20–200 | [25] |
| Learning based methods | CFA-Tree | GTSDB | 320 × 240 | 93.50 | 100–125 | [20] |
| Learning based methods | ConvNet with sliding windows | GTSDB | N/A | 99.89 | 26.5 | [21] |
| Hybrid Methods | Color probability model + MSERs | GTSDB | N/A | 99.63 | 67 | [22] |
| Hybrid Methods | SGW map + MSERs based HOG | GTSDB | 320 × 320 | 99.60 | 46 | [17] |
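Several of the hybrid detectors compared in Table 1 start from MSER region proposals before applying HOG features or a CNN. A minimal OpenCV sketch of that proposal step is shown below; the size and aspect-ratio filters are illustrative assumptions.

```python
import cv2

image = cv2.imread("scene.jpg")                  # traffic scene image (assumed to exist)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

mser = cv2.MSER_create()                         # default parameters; papers typically tune delta/area limits
regions, boxes = mser.detectRegions(gray)

# Keep boxes whose size and aspect ratio are plausible for a traffic sign.
candidates = [
    (x, y, w, h)
    for (x, y, w, h) in boxes
    if 15 <= w <= 200 and 15 <= h <= 200 and 0.5 <= w / h <= 2.0
]
print(len(candidates), "MSER candidates to pass to the classifier")
```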

4 Traffic Sign Classification After the localization of traffic signs in the detection phase, classification techniques are employed to determine the content of that sign. In this section, we highlight some recent, successful methods in the classification of traffic signs. First, we classify them into three categories: methods based on hand-crafted features, methods using simple artificial neural networks (ANN), and methods using deep learning (see Fig. 3).

Fig. 3. Different categories of traffic sign recognition.

4.1 Methods Based on Hand-Crafted Features

Support Vector Machine (SVM). SVM is a supervised learning method that constructs a hyperplane to separate data into classes. The "support vectors" are the data points that define the maximum margin of the hyperplane. Although SVM is primarily a binary classifier, multiclass classification can be achieved by training one-vs-all SVMs. Some authors use multi-level SVMs to overcome the binary limitation; multi-level SVM is used to recognize different shapes of signs in real-time applications. The main advantage of SVM is that it is fast and highly accurate. Chang et al. [26] presented an advanced SVM method and tested it with grayscale images, achieving high accuracy rates of approximately 95.9% with a 3.5 ms execution time.

Adaptive Boosting. AdaBoost is a combination of multiple learning algorithms that can be utilized for regression or classification [12]. Chen et al. used the AdaBoost method for TSDR in [27]. It achieved more than 94% accuracy in real time, at about 30 ms, on their own created dataset. The main advantages of AdaBoost are its simplicity, high prediction power, and rapidity. However, if the input data have wide variations, the training time increases and the classifier accuracy decreases [12].

Random Forest. This is a machine learning method that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes output by the individual trees. Ellahyani et al. [28] proposed a recognition model based on Random Forest as a classifier; the recognition rate achieved is 97.43% in 125 ms. Tang and Huang [29] also used Random Forest in their model and achieved a good result with a good execution time: around 97.54% in just 8 ms. Xu et al. [23] also used this method and achieved 92.67% in 200 ms.

Template Matching. Template matching algorithms search for similar existing training samples, which are stored in a database, for unknown regions of interest (ROIs). The input is compared with differently shaped sign templates extracted from various ROIs and then grouped under a tree structure based on their similarity. It was used for TSDR by Torresen et al. [30] and Greenhalgh and Mirmehdi [31]. It has the advantages of being fast, straightforward, and accurate (with a hit rate of approximately 90% on their own pictured-images dataset). However, the drawback of this method is that it is very sensitive to noise and occlusions.
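A typical hand-crafted pipeline pairs the HOG descriptor of Sect. 3.2 with one of these classifiers. The scikit-learn sketch below trains a linear, one-vs-rest SVM on precomputed feature vectors; the random arrays are placeholders standing in for HOG descriptors extracted from GTSRB crops.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Placeholder data: in practice X holds HOG descriptors of sign crops, y their class ids (0..42).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 576)).astype(np.float32)
y = rng.integers(0, 43, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))  # one-vs-rest multiclass by default
clf.fit(X_train, y_train)
print(f"accuracy: {clf.score(X_test, y_test):.3f}")       # near chance on this random placeholder data
```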


4.2 Artificial Neural Network Based Methods

Artificial neural networks have gained increasing popularity in recent years due to their ability to recognize and classify objects at the same time while maintaining high speed and accuracy [32], as well as their robustness, adaptability to changes, and flexibility. ANN-based classifiers were used by Yin et al. [25], who reported a rate of 98.62% with a computational time of 2 ms, and by Islam and Raj [24], who reached 99.90% with a computational time of 0.33 s for the whole system (using the Malaysian Traffic Sign Database).

4.3 Deep Learning Based Methods

Another popular approach to traffic sign recognition is deep learning, which has attracted general interest in recent years owing to its high classification performance and its power of representation learning from raw data. Deep learning methods use a cascade of many layers of nonlinear processing units for feature extraction and transformation; each successive layer uses the output of the previous one as input, and higher-level features are derived from lower-level features to form a hierarchical representation [33]. Among deep learning models, convolutional neural networks (CNN), which learn a hierarchy of features by building high-level features from low-level ones, have become the standard for object detection tasks since 2013, and more and more recognition tasks are being solved with CNNs thanks to their high recognition rate and fast execution. Sermanet and LeCun [34] used convolutional networks (ConvNets) to learn invariant features of traffic signs on the GTSRB dataset and reached an accuracy of 99.46% with an ensemble of 25 ConvNets. Jin et al. [35] achieved an accuracy of 99.65% using 20 ConvNets. A new ConvNet architecture proposed by Aghdam et al. [36] reduces the number of parameters compared with the ConvNets used in [34]; it reaches 99.23% on GTSRB with only 2 ConvNets (compact ConvNet) and 99.61% with only 5 ConvNets, which greatly reduces the execution time (2.1 ms and 5.30 ms, respectively). Shao et al. [17] use three CNNs, two to classify the two superclasses of traffic signs independently (circular and triangular) and a third to classify all the traffic signs, achieving 99.43% accuracy in 5 ms; only 3 ms are needed to classify a detected sign, with a high accuracy of 98.24%.
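For concreteness, the following is a minimal, hedged Keras sketch of a small CNN classifier in the spirit of the ConvNets discussed above; it does not reproduce the exact architectures of [34-36], and GTSRB-style 32 × 32 RGB crops with 43 classes are assumed.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_sign_classifier(input_shape=(32, 32, 3), num_classes=43):
    # Three conv/pool blocks followed by a small classification head.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_sign_classifier()
model.summary()
# model.fit(x_train, y_train, epochs=30, validation_split=0.1)  # x_train / y_train assumed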


Table 2 below summarizes the comparison between the presented traffic sign recognition methods.

Table 2. Comparative study of recent traffic sign recognition methods

Technique | Method | Dataset | Image size | AUC | Execution time (ms) | Paper
Methods based on hand-crafted features | Random Forest | GTSRB | 320 × 240 | 97.54 | 8 | [29]
Methods based on hand-crafted features | SVM | GTSRB | 320 × 240 | 98.64 | 40 | [18]
Methods based on hand-crafted features | Adaptive Boosting | Own created | N/A | 94.50 | 30–40 | [27]
Artificial neural network methods | ANN | GTSRB | 320 × 240 | 98.62 | 2 | [15]
Artificial neural network methods | ANN | GTSRB | N/A | 99.90 | N/A | [24]
Deep learning methods | CNN | GTSRB | N/A | 98.24 | 3 | [22]
Deep learning methods | CNN | GTSRB | 320 × 320 | 99.43 | 5 | [17]
Deep learning methods | CNN | GTSRB | N/A | 99.94 | N/A |
Deep learning methods | Single CNN | GTSRB | 440 × 440 | 99.55 | N/A |
Deep learning methods | 3 CNN | GTSRB | 440 × 440 | 99.70 | 0.7 | [21]

5 Traffic Sign Recognition Based on Deep Neural Networks

As mentioned above, more and more object detection and recognition tasks are being solved with deep neural networks, which achieve excellent performance in both phases (detection and classification). There are, however, systems that detect, recognize, and classify objects at the same time; such systems can be qualified as one-stage, end-to-end systems. This section analyses state-of-the-art object detection systems (Faster R-CNN, R-FCN, SSD, and YOLO V2) combined with various feature extractors (Resnet V1 50, Resnet V1 101, Inception V2, Inception Resnet V2, Mobilenet V1, and Darknet-19) that are used for traffic sign recognition.

5.1 Faster Region-Based Convolutional Neural Networks (Faster R-CNN)

To overcome some of the limitations of the earlier R-CNN versions, the authors of Faster R-CNN replaced the use of Selective Search with a Region Proposal Network (RPN) that shares convolutional feature maps with the detection network [37].


5.2 Region-Based Fully Convolutional Networks (R-FCN)

Region-based Fully Convolutional Networks (R-FCN) [38] take over the architecture of Faster R-CNN but use only convolutional layers: the R-FCN approach applies a fully convolutional region-based detector whose computation is shared across the entire image.

5.3 Single Shot MultiBox Detector (SSD)

In comparison with the Faster R-CNN and R-FCN architectures, SSD [39] encapsulates all computation in a single feed-forward convolutional neural network that directly infers box offsets and object category scores.

5.4 You Only Look Once (YOLO)

YOLO V2 [40] is inspired by the RPN of Faster R-CNN: it uses hand-picked anchor boxes and predicts bounding boxes from the offsets to these anchors at every location of a feature map. These detection systems rely on other convolutional neural networks as feature extractors to obtain high-level features from the input images; among these feature extractors are Resnet V1 50, Resnet V1 101, Inception V2, Inception Resnet V2, Mobilenet V1, and Darknet-19 (see Table 3). Arcos-García et al. [41] compared the previously presented detectors with these different feature extractors; Table 3 summarizes their results.

Table 3. GTSDB mean average precision (mAP) results (in %) attained by each traffic sign detector model

Model | Feature extractor | mAP | Execution time (ms)
Faster R-CNN | Inception Resnet V2 | 95.77 | 400
Faster R-CNN | Resnet 101 | 95.08 | 123
Faster R-CNN | Resnet 50 | 91.52 | 110
Faster R-CNN | Inception V2 | 90.62 | 58
R-FCN | Resnet 101 | 95.15 | 90
YOLO V2 | Darknet-19 | 78.83 | 21
SSD | Inception V2 | 66.10 | 23
SSD | Mobilenet V1 | 61.64 | 15
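As a hedged illustration of how such detector/feature-extractor combinations are used in practice, the sketch below runs a generic COCO-pretrained Faster R-CNN with a ResNet-50 FPN backbone from torchvision on a road-scene image. This is only a stand-in for the setups benchmarked in [41]; a real traffic-sign detector would additionally require fine-tuning the detection head on GTSDB, and the input image path is hypothetical.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Generic pretrained two-stage detector (backbone = ResNet-50 FPN).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("road_scene.jpg").convert("RGB")   # hypothetical input frame
with torch.no_grad():
    predictions = model([to_tensor(image)])[0]         # dict with boxes, labels, scores

keep = predictions["scores"] > 0.5                     # simple confidence threshold
print(predictions["boxes"][keep], predictions["labels"][keep])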


6 Conclusion

In this paper, we have presented an overview of recent and efficient real-time traffic sign detection and classification methods. Detection methods are divided into four categories: color-based methods (classified according to the color space), shape-based methods, hybrid methods, and learning-based methods, which include deep learning. The recent detection methods achieve detection rates ranging from 90 to 100% on the available datasets briefly described in this paper, and the best detection methods in terms of accuracy and speed turn out to be the learning-based methods, followed by some hybrid methods. Obtaining a high classification rate requires discriminative features and a powerful classifier: learning methods using hand-crafted features achieve acceptable results, and classification performance is further boosted by deep learning methods such as CNNs, which reach high accuracy rates (more than 95%). A remaining problem with the current state of the art is the lack of a universal dataset containing signs from different regions (including regions that do not adhere to the Vienna Convention on road signs and signals) and captured under different conditions, so more extensive and more challenging universal datasets are needed. The open question is: can recent traffic sign detection and classification methods achieve the same performance in real-world conditions, such as insufficient or nighttime illumination, or on other ground-truth datasets?

References 1. Saadna, Y., Behloul, A.: An overview of traffic sign detection and classification methods. Int. J. Multimedia Inform. Retrieval 6, 193–210 (2017) 2. Biswas, R., Khan, A., Alom, M.Z., Khan, M.: Night mode prohibitory traffic signs detection. In: 2013 International Conference on Informatics, Electronics and Vision (ICIEV), pp. 1–5. IEEE (2013) 3. Guo, J., Lu, J., Qu, Y., Li, C.: Traffic-sign spotting in the wild via deep features. In: 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 120–125. IEEE (2018) 4. Houben, S., Stallkamp, J., Salmen, J., Schlipsing, M., Igel, C.: Detection of traffic signs in real-world images: the german traffic sign detection benchmark. In: The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2013) 5. Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C.: Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 32, 323–332 (2012) 6. Timofte, R., Zimmermann, K., Van Gool, L.: Multi-view traffic sign detection, recognition, and 3D localisation. Mach. Vis. Appl. 25, 633–647 (2014) 7. Larsson, F., Felsberg, M.: Using Fourier descriptors and spatial models for traffic sign recognition. In: Scandinavian Conference on Image Analysis, pp. 238–249. Springer (2011) 8. Wali, S.B., Abdullah, M.A., Hannan, M.A., Hussain, A., Samad, S.A., Ker, P.J., Mansor, M.B.: Vision-based traffic sign detection and recognition systems: current trends and challenges. Sensors 19(9), 2093 (2019) 9. Mogelmose, A., Trivedi, M.M., Moeslund, T.B.: Vision-based traffic sign detection and analysis for intelligent driver assistance systems: perspectives and survey. IEEE Trans. Intell. Transp. Syst. 13, 1484–1497 (2012) 10. Tagunde, G.A., Uke, N.J., Banchhor, C.: Detection, classification and recognition of road traffic signs using color and shape features. Int. J. Adv. Technol. Eng. Res. 2, 202–206 (2012)


11. Gupta, A., Choudhary, A.: A framework for real-time traffic sign detection and recognition using grassmann manifolds. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 274–279. IEEE (2018) 12. Wali, S.B., Abdullah, M.A., Hannan, M.A., Hussain, A., Samad, S.A., Ker, P.J., Mansor, M.B.: Vision-based traffic sign detection and recognition systems: current trends and challenges. Sensors 19, 2093 (2019) 13. Hu, Q., Paisitkriangkrai, S., Shen, C., van den Hengel, A., Porikli, F.: Fast detection of multiple objects in traffic scenes with a common detection framework. IEEE Trans. Intell. Transp. Syst. 17, 1002–1014 (2015) 14. Garcia-Garrido, M.A., Sotelo, M.A., Martin-Gorostiza, E.: Fast traffic sign detection and recognition under changing lighting conditions. In: 2006 IEEE Intelligent Transportation Systems Conference, pp. 811–816. IEEE (2006) 15. Yin, S., Ouyang, P., Liu, L., Guo, Y., Wei, S.: Fast traffic sign recognition with a rotation invariant binary pattern-based feature. Sensors 15, 2161–2180 (2015) 16. Zaklouta, F., Stanciulescu, B.: Real-time traffic sign recognition in three stages. Robot. Auton. Syst. 62, 16–24 (2014) 17. Shao, F., Wang, X., Meng, F., Rui, T., Wang, D., Tang, J.: Real-time traffic sign detection and recognition method based on simplified Gabor wavelets and CNNs. Sensors 18, 3192 (2018) 18. Barnes, N., Zelinsky, A., Fletcher, L.S.: Real-time speed sign detection using the radial symmetry detector. IEEE Trans. Intell. Transp. Syst. 9, 322–332 (2008) 19. Brkic, K., Pinz, A., Šegvic, S.: Traffic sign detection as a component of an auto- mated traffic infrastructure inventory system. In: Proceedings of the Annual Workshop of the Austrian Association for Pattern Recognition. Citeseer (2009) 20. Liu, C., Chang, F., Chen, Z., Li, S.: Rapid traffic sign detection and classification using categories-first-assigned tree. J. Comput. Inf. Syst. 9, 7461–7468 (2013) 21. Aghdam, H.H., Heravi, E.J., Puig, D.: A practical approach for detection and classification of traffic signs using convolutional neural networks. Robot. Auton. Syst. 84, 97–112 (2016) 22. Yang, Y., Luo, H., Xu, H., Wu, F.: Towards real-time traffic sign detection and classification. IEEE Trans. Intell. Transp. Syst. 17, 2022–2031 (2015) 23. Xu, X., Jin, J., Zhang, S., Zhang, L., Pu, S., Chen, Z.: Smart data driven traffic sign detection method based on adaptive color threshold and shape symmetry. Future Gener. Comput. Syst. 94, 381–391 (2019) 24. Islam, K.T., Raj, R.G.: Real-time (vision-based) road sign recognition using an artificial neural network. Sensors 17, 853 (2017) 25. Yin, S., Ouyang, P., Liu, L., Guo, Y., Wei, S.: Fast traffic sign recognition with a rotation invariant binary pattern-based feature. Sensors 15, 2161–2180 (2015) 26. Chang, X., Yu, Y.-L., Yang, Y., Xing, E.P.: Semantic pooling for complex event analysis in untrimmed videos. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1617–1632 (2016) 27. Chen, L., Li, Q., Li, M., Zhang, L., Mao, Q.: Design of a multi-sensor cooperation travel environment perception system for autonomous vehicle. Sensors 12, 12386–12404 (2012) 28. Ellahyani, A., El Ansari, M., El Jaafari, I.: Traffic sign detection and recognition based on random forests. Appl. Soft Comput. 46, 805–815 (2016) 29. Tang, S., Huang, L.-L.: Traffic sign recognition using complementary features. In: 2013 2nd IAPR Asian Conference on Pattern Recognition, pp. 210–214. IEEE (2013) 30. 
Torresen, J., Bakke, J.W., Sekanina, L.: Efficient recognition of speed limit signs. In: Proceedings. The 7th International IEEE Conference on Intelligent Transportation Systems (IEEE Cat. No. 04TH8749), pp. 652–656. IEEE (2004) 31. Greenhalgh, J., Mirmehdi, M.: Traffic sign recognition using MSER and random forests. In: 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), pp. 1935– 1939 (2012)


32. Satılmı¸s, Y., Tufan, F., Sara, ¸ M., Karslı, M., Eken, S., Sayar, A.: CNN based traffic sign recognition for mini autonomous vehicles. In: International Conference on Information Systems Architecture and Technology, pp. 85–94. Springer (2018) 33. Qian, R., Zhang, B., Yue, Y., Wang, Z., Coenen, F.: Robust chinese traffic sign detection and recognition with deep convolutional neural network. In: 2015 11th International Conference on Natural Computation (ICNC), pp. 791–796. IEEE (2015) 34. Sermanet, P., LeCun, Y.: Traffic sign recognition with multi-scale convolutional networks. In: The 2011 International Joint Conference on Neural Networks, pp. 2809–2813. IEEE (2011) 35. Jin, J., Fu, K., Zhang, C.: Traffic sign recognition with hinge loss trained convolutional neural networks. IEEE Trans. Intell. Transp. Syst. 15, 1991–2000 (2014) 36. Aghdam, H.H., Heravi, E.J., Puig, D.: A practical and highly optimized convolutional neural network for classifying traffic signs in real-time. Int. J. Comput. Vis. 122, 246–269 (2017) 37. Girshick, R.: Fast R-CNN. arXiv e-prints (2015). arXiv preprint arXiv:150408083454 38. Dai, J., Li, Y., He, K., Sun, J.: R-fcn: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387 (2016) 39. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37. Springer (2016) 40. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017) 41. Arcos-García, Á., Álvarez-García, J.A., Soria-Morillo, L.M.: Evaluation of deep neural networks for traffic sign detection systems. Neurocomputing 316, 332–344 (2018). https://doi. org/10.1016/j.neucom.2018.08.009

Classification of the Driver’s Emotions Using a Convolutional Neural Network Abdelfettah Soultana(B) , Faouzia Benabbou, and Nawal Sael Laboratory of Modeling and Information Technology, Faculty of Sciences Ben M’SIK, University Hassan II of Casablanca, Casablanca, Morocco [email protected], [email protected], [email protected]

Abstract. The emotional and mental state of the driver is very important in the driving process for safety and security reasons. In fact, several factors affect driving safety, namely fatigue, stress, nervousness, sadness and anger at the wheel. Hence, detecting and understanding the emotional state of the driver is essential to promote driving skills such as attention, good judgment, correct decision making and quick reaction time. This paper presents an approach based on a convolutional neural network (CNN) model for the recognition of emotional expressions, with an accuracy of 66.14% on the public facial expression database FER2013. Afterwards, we built a real-time emotion recognition system by transferring the skills acquired on static images: it continuously detects the driver's face from a video camera and then classifies the emotional state shown by the driver. Keywords: Driver emotions · Driver monitoring · Convolutional neural network · Smart cars · Deep learning · Computer vision

1 Introduction

The notion of a smart car entails the ability to "see", "hear", "understand", "speak" and "think" by monitoring in real time three main components: the driver's state, the vehicle's state and the surrounding physical environment. Nowadays, roads and vehicles have become more reliable; however, the driver remains the most sensitive and influential part of this system. In a driver-vehicle-road system, the state of the driver is critical to achieving safe driving. According to the National Highway Traffic Safety Administration (NHTSA) and the Virginia Tech Transportation Institute (VTTI) [15], some form of driver inattention was involved within three seconds before 80% of crashes and 65% of near-crashes. Hence, it is important to monitor the behaviour of drivers while driving to prevent this kind of accident and improve their safety. In this context, several works have been carried out on the surveillance and assistance of drivers. This research can be classified into three broad categories: detection of drowsiness and fatigue, detection of driver inattention and distraction, and detection of driver behaviour and emotional state. Determining the state of the driver, whether it is the identification of signs of fatigue or drowsiness, is very important for road safety.


The first category aims to identify the signs and symptoms of drowsiness. There are some plain signs which suggest that the driver is drowsy or tired, for instance frequent yawning, inability to keep the eyes open, swaying the head forward, and difficulties resulting from changes in blood flow. The second category deals with distraction by other tasks while driving, such as conversation with passengers, answering the phone or making a call, receiving or sending a message, using one of the several in-vehicle information systems (navigation system, radio), eating or drinking, or smoking. All of these behaviours produce some degree of driver distraction, and detecting them allows the driver to be alerted at the right time, before an incident occurs. The third category concerns the classification of driving behaviour. The objective is to identify the parameters and characteristics that reflect the driver's behaviour, which is very useful for improving that behaviour and making it safer. The emotional factors that affect safe driving are nervousness, stress, sadness and anger at the wheel; such emotions degrade driving skills such as attention, concentration, good judgment, right decision making and quick reaction time. That is why several driver monitoring systems (fatigue/emotions), including intrusive and non-intrusive techniques, have been proposed in the past. The present article proposes a CNN-based model to detect the emotional state of the driver and classify it into seven categories: fear, disgust, anger, happiness, sadness, surprise and neutrality, based on the public dataset FER-2013. The paper is organized as follows: Sect. 2 deals with facial expression recognition, Sect. 3 reviews related work, Sect. 4 provides a CNN background, Sect. 5 presents the CNN-based system for emotion detection, results and discussion are provided in Sect. 6, Sect. 7 presents the real-time detection of the driver's emotional state, and finally Sect. 8 concludes the paper and presents some prospects.

2 Facial Expression Recognition

People's facial expressions generally reflect their emotional state, and this allows other people to recognize their true feelings. Recently, many papers have been devoted to emotion recognition software that detects emotions from either video recordings of human faces or static images. Most of the work focuses on detecting the subset of emotions described by Ekman [3]. These basic emotions include "happiness", "sadness", "anger", "surprise", "fear", "disgust" and "neutral" (the latter indicating a lack of emotion).
• Anger: This emotional state can seriously affect the driver's mood. Road rage is nothing other than aggressive driving due to angry moods [5]. The general observation is that an angry driver tends to drive fast and is willing to take risks at all costs.
• Fear: In this emotional state, the driver becomes panicky and may lose control of the car, which can cause an accident.
• Sadness: The level of alertness is likely to be too low when the driver is sad. This reduces the driver's attention and also increases the driver's reaction time.
• Happiness: The optimal state, viewed as a state of flux, involves a moderate level of arousal, allowing for attention, focus, and productivity. A state of high arousal or extreme positive valence may potentially lead to distraction.


• Disgust: A feeling of aversion to something offensive. We may feel disgusted by something we perceive with our physical senses (sight, smell, touch, sound, taste), by people's actions or appearances, and even by ideas.
• Surprise: Surprising things can happen in and around the car that suddenly increase the risk of an accident. Drivers may see something upsetting, such as a crash on the side of the road or an animal injured by a car. In general, the driver is then susceptible to distraction and may lose control of the car.
Thus, the presence in the car of mechanisms to detect the driver's emotional state makes it possible to monitor the driver's attentiveness and to alert him in critical cases.

3 Related Work

Important work has been done on driver emotion recognition and real-time facial analysis, which can help smart cars better understand what is going on inside the car and increase safety and security while driving. The focus here is on facial expression recognition (FER) based on visual information. Visual FER systems can be divided into two groups: conventional approaches and DNN-based approaches. Conventional FER approaches perform facial expression recognition in three steps: (1) face and facial landmark detection and face alignment, (2) feature extraction, and (3) expression classification. Classification is performed by machine learning algorithms such as k-Nearest Neighbours (KNN), Support Vector Machines (SVM), Linear Discriminant Analysis (LDA), or neural networks (NN) such as the Multi-Layer Perceptron (MLP).

Paschero et al. [1] proposed an emotion recognition system based on classical neural networks (MLP) and neuro-fuzzy classifiers. The proposed algorithm consists of six main steps: face detection, eye and mouth detection, eye centre localization, feature vector extraction, feature vector normalization and preprocessing, and classification. Emotion recognition is then performed in real time from a video stream acquired by a webcam monitoring the driver's face. Dobbins et al. [15] developed a mobile system that measures anxiety and anger during actual driving. It relies on several characteristics such as speed, distance travelled, photographs of the road ahead (traffic density, road complexity, traffic lights, pedestrian crossings, roundabouts, stopped in traffic, weather), and physiological data obtained from an electrocardiogram (ECG); a linear regression (LR) analysis is then used to produce an informative representation of this complex data in a concise and meaningful manner. Azman et al. [23] proposed a real-time driver anger detection system using a webcam. The system first detects the human face in the video stream using the Viola-Jones Haar features and then classifies the images into two categories (angry or not angry) using an SVM. They achieved an accuracy of 97% by focusing only on the emotion of anger, but other classes of emotions are also important. Jeong et al. [20] proposed a facial expression recognition algorithm for monitoring a driver's emotions based on a hierarchical Weighted Random Forest (WRF): geometric features are extracted from the input images and fed to the hierarchical WRF classifier. Using three databases, the extended Cohn-Kanade database (CK+), MMI and the Keimyung University Facial Expression of Drivers (KMU-FED) database, they obtained an accuracy of 92.6% for CK+ and 76.7% for MMI.


Khan et al. [16] proposed an algorithm for facial expression classification using an SVM classifier on two public datasets (MMI and CK+). The input images are first converted into four sub-band images by a Discrete Wavelet Transform (DWT); high-variance features are then selected in a zigzag manner using the Discrete Cosine Transform (DCT), and the final classification step achieved an accuracy of 91.1%. Minhad et al. [21] developed an automotive driver emotion detection and recognition system using biological signals (ECG): based on the root mean square of successive differences and heart rate variability, an SVM was used to classify the happy-anger emotions of the driver with an accuracy of 83.33%. Khai Ooi et al. [19] proposed a system that recognizes stress and anger as primary emotions leading to possible accidents; a simulated driving assignment with preset neutral, stress, and anger scenarios was developed for emotional stimulation, and the work, based on electrodermal activity (EDA), which measures skin conductance, achieved 70% accuracy. Park et al. [22] proposed to recognize the negative emotions (sadness and disgust) affecting driving by using physiological signals (EDA, SKT, ECG and PPG); they used an ensemble of machine learning algorithms (SVM, SOM, KNN, Naïve Bayes, LDA, CART), with SVM showing the highest training accuracy of 100%. Patil et al. [18] considered five specific emotions that a driver may feel, labelled anger, fear, happy, neutral, and sadness; based on SVM and the extended Cohn-Kanade dataset, they obtained a performance of 86.7%. Hsieh et al. [24] proposed a facial expression detection system based on a multi-class SVM to classify six facial expressions (neutral, happiness, surprise, anger, disgust, and fear); the study locates facial components with an active shape model to extract seven dynamic face regions (frown, nose wrinkle, two nasolabial folds, two eyebrows, and mouth), and the recognition rate reached 93.7% on the Cohn-Kanade database. Acevedo et al. proposed a geometric descriptor based on the areas and angles of triangles formed by facial landmarks; an adaptation of the KNN algorithm was used as classifier and they achieved 89.3% accuracy on the CK+ dataset.

The second group of FER approaches is based on Deep Neural Networks (DNNs), such as Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTM), and Generative Adversarial Networks (GANs), for feature extraction, classification, and recognition tasks. Kotsia et al. [7] built several models capable of recognizing seven basic emotions from facial expressions using two public datasets (FER-2013 and CK+); they achieved 45.95% test accuracy using an SVM and 66.67% using a CNN on the FER-2013 dataset, while the accuracy on CK+ was 98.4%. Verma et al. [17] proposed a novel real-time driver emotion monitoring system "in the wild" based on face detection and facial expression analysis: a camera placed inside the vehicle continuously monitors the driver's face, and a pretrained model (VGG16) extracts appearance features from the detected face image. Three public datasets (JAFFE, MMI, CK+) were exploited, and the highest accuracy obtained was 98.7% on the MMI dataset. Pramerdorfer et al. [8] proposed a video classification method using Recurrent Neural Networks (RNN) in addition to a CNN to capture both temporal and spatial features of a video sequence. The methodology was tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and they achieved 61% test accuracy.


The availability of accelerated computing has allowed DNN-based methods to make their way into most fields of image processing and classification, and in facial expression recognition as well, CNN-based approaches typically outperform conventional methods. Deep learning-based approaches, particularly those using CNNs, have been very successful at image-related tasks in recent years, due to their ability to come up with good representations from data (Table 1).

Table 1. Comparative study

Ref. | Objective | Sensors | Dataset | Features | Algorithms | Accuracy
[1] | Emotion recognition: seven emotions according to Ekman | Webcam | Collected data: five different data sets of increasing complexity | Faces, eyes, mouth | MLP | 97%
[15] | Anxiety and anger recognition | Electrocardiogram, accelerometer | Collected data | ECG | LR | N/A
[23] | Anger detection | Webcam | JAFFE database | Faces | SVM | 97%
[20] | Facial expression recognition: six basic expressions (anger, disgust, fear, happiness, sadness, surprise) | NIR camera | CK+, MMI, KMU-FED | Geometric features | WRF | 92.6% for CK+, 76.7% for MMI
[16] | Facial expression analysis: seven generic expression classes | N/A | MMI, CK+ | Faces | SVM | 91.1%
[21] | Emotion recognition: happy-anger emotions | Electrocardiogram | Collected data | ECG | SVM | 83.33%
[19] | Stress and anger recognition | Electrodermal activity (EDA) | Collected data | Skin conductance | SVM | 70%
[7] | Recognizing seven basic emotions from facial expressions | N/A | FER-2013, CK+ | Feature extraction | SVM, CNN | 45.95% (SVM), 66.67% (CNN) on FER-2013
[17] | Real-time driver emotion monitoring system | Camera | MMI, JAFFE, CK+ | Feature extraction | VGG16 | 98.7%
[8] | Recognize facial expressions | N/A | RAVDESS | Feature extraction | CNN, RNN | 61%

The table above shows that there are two ways to work on the analysis and classification of emotions: either by using public databases such as CK+, MMI, JAFFE, RAVDESS or FER2013, or by carrying out real experiments and collecting data with video cameras that monitor the driver's face. We also note that work on the analysis and classification of emotions is not only based on the visual information of the face: other works use approaches based on physiological information such as the electroencephalogram (EEG), the electrocardiogram (ECG) or skin conductance, while other papers rely on the vehicle's internal sensors (e.g., speed and acceleration) or on external information such as photographs of the road. We have also noticed a wide use of machine learning and deep learning algorithms in the classification of emotions; the most used are the Support Vector Machine (SVM) among machine learning algorithms and the CNN among deep learning algorithms. Pantic et al. [6] examined a wide variety of these techniques and found that they all involve three main steps: face detection, a process to extract facial expression information, and a process to classify the given information according to some predefined set of categories.

4 CNN Background

A CNN is built from three basic layer types: the convolutional layer, the pooling layer, and the fully connected layer. As shown in Fig. 1, the input of this architecture is the raw image, which may or may not be pre-processed; the CNN is thus considered an automatic feature extraction method. In mathematical terms, the input to the CNN is a matrix X of dimensions r × r × m, where r is the height and width of the image and m is the number of channels (RGB, grayscale or binary).


– The convolutional layer is responsible for feature extraction. A convolutional layer contains multiple filters so that multiple features are extracted from the input image at each level. Each convolutional layer has k kernels (or filters) of size n × n × q, where n ≤ r and q ≤ m. Each kernel is convolved over the entire image to form k activation maps for the next layers. Each filter has its own set of weights and biases so that the filters can extract different local features.
– The sub-sampling or pooling layer is responsible for reducing the spatial size of the convolved features. It reduces the number of parameters and the computation in the network, controlling overfitting by progressively reducing the spatial size of the representation.
– The fully connected (dense) layer is the last stage of a CNN: it receives the learned features and feeds them to a classification layer, which assigns a label to each class and identifies the input image as belonging to the corresponding class.
– In addition to the layers discussed above, there are two more: the dropout layer and the activation layer.
– The dropout layer is a regularization technique. It randomly selects neurons and disables them during training; the selected nodes are dropped out with a given probability at each weight-update cycle.
– The activation layer: the activation function is a node placed at the end of, or in between, neural networks and helps decide whether a neuron fires or not. There are different types of activation functions: ReLU, tanh, sigmoid and ELU, to name a few.

5 CNN Based System for Emotion Detection

Inspired by the recent success of deep convolutional neural networks (CNN) in visual recognition, we explore an effective deep learning-based method for image emotion classification.

Fig. 1. CNN classifier architecture


5.1 Dataset

We evaluated the suggested method on the well-known, publicly available facial expression database FER2013 (Facial Expression Recognition 2013) [14]. The FER2013 dataset was created using the Google image search API, and the faces have been automatically registered. Faces are labelled with any of the six basic expressions as well as neutral. The resulting database contains 35,887 images, most of which were captured in wild settings. The data consist of 48 × 48 pixel grayscale images of faces, automatically registered so that the face is more or less centred and occupies about the same amount of space in each image (Fig. 2).

Fig. 2. Example images from the FER2013 dataset

The example images above, taken from the FER2013 dataset, illustrate the variability in illumination, age, pose, intensity, and occlusion that occurs under realistic conditions. Images in the same column depict identical expressions, namely anger, disgust, fear, happiness, sadness, surprise, as well as neutrality (Table 2).

5.2 CNN Hyperparameters

The input of the model is the original grayscale image of size 48 × 48 × 1. Convolutional layer conv1 has 64 filters of size 3 × 3 with stride 1, followed by max-pool1 with a 2 × 2 filter. Conv2 has 128 filters of size 3 × 3 with stride 1, followed by max-pool2 (2 × 2). Conv3 has 256 filters of size 3 × 3 with stride 1, followed by max-pool3 (2 × 2). Conv4 has 512 filters of size 3 × 3 with stride 1, followed by max-pool4 (2 × 2). Finally, the multi-class classification of facial expressions is realized using the SoftMax activation function.
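The following is a minimal Keras sketch of the architecture described above (four 3 × 3 convolutional blocks with 64/128/256/512 filters, each followed by 2 × 2 max pooling, and a SoftMax output over the seven classes); the width of the intermediate dense layer and the dropout rate are our assumptions, as they are not specified in the text.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(512, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # assumed width
    layers.Dropout(0.5),                    # assumed rate
    layers.Dense(7, activation="softmax"),  # seven emotion classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])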


Table 2. FER2013 dataset

Class | Training images | Test images | Total
Happy | 7203 | 879 | 8028
Angry | 1972 | 490 | 2462
Disgust | 434 | 55 | 489
Neutral | 3418 | 624 | 4042
Sad | 4765 | 592 | 5357
Fear | 3656 | 527 | 4183
Surprise | 3163 | 415 | 3578
Total | 24611 | 3582 | 28193

6 Results and Discussion

We decided to focus our efforts on the FER-2013 dataset, as we believe it to more accurately reflect the real-time conditions of driver emotional state due to its automatically captured, non-posed photos.

6.1 Performance Metrics

Several performance metrics were used to evaluate the proposed model. The confusion matrix measures the quality of a classification system: each row corresponds to a real class and each column to an estimated class (TP: number of correctly labelled positive samples, FP: number of negative samples incorrectly labelled as positive, TN: number of correctly labelled negative samples, FN: number of positive samples incorrectly labelled as negative).
– Accuracy: the ratio of the number of correct predictions to the total number of input samples (Accuracy = (TP + TN)/(TP + FP + FN + TN))
– Precision: the ratio of correctly predicted positive observations to the total predicted positive observations (Precision = TP/(TP + FP))
– Recall: the ratio of correctly predicted positive observations to all observations of the actual class (Recall = TP/(TP + FN))
– F1-score: the weighted average of Precision and Recall, which therefore takes both false positives and false negatives into account (F1-score = 2 × (Recall × Precision)/(Recall + Precision))
The results are given in Table 3 and Fig. 3. The model can be evaluated on the training dataset and on a held-out validation dataset after each update during training, and plots of the measured performance can be created to show learning curves. Reviewing model accuracy during training can be used to diagnose problems with learning, such as an underfit or overfit model. The loss value indicates how well or badly a model behaves after each optimization iteration; ideally, one expects a reduction in loss after each iteration or every few iterations, since the goal of training is to find a set of weights and biases that have low loss (Table 4).
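Assuming y_true and y_pred hold the integer class labels of the test set, the confusion matrix and the per-class metrics defined above can be computed, for instance, with scikit-learn:

from sklearn.metrics import confusion_matrix, classification_report

labels = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]
# Rows of the confusion matrix are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
# Per-class precision, recall and F1-score, plus overall accuracy.
print(classification_report(y_true, y_pred, target_names=labels, digits=2))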

Table 3. Confusion matrix

True \ Predicted | Angry | Disgust | Fear | Happy | Neutral | Sad | Surprise
Angry | 275 | 9 | 29 | 25 | 69 | 75 | 8
Disgust | 11 | 32 | 1 | 1 | 1 | 7 | 2
Fear | 58 | 5 | 176 | 23 | 75 | 122 | 68
Happy | 15 | 1 | 14 | 771 | 37 | 28 | 13
Neutral | 27 | 1 | 21 | 35 | 443 | 88 | 9
Sad | 47 | 1 | 41 | 37 | 116 | 341 | 9
Surprise | 12 | 1 | 31 | 24 | 14 | 12 | 321

Fig. 3. Model accuracy and model loss during training

Table 4. Classification report

Class | Accuracy | Precision | Recall | F1-score
Angry | 0.56 | 0.62 | 0.56 | 0.59
Disgust | 0.58 | 0.64 | 0.58 | 0.61
Fear | 0.33 | 0.56 | 0.33 | 0.42
Happy | 0.88 | 0.84 | 0.88 | 0.86
Sad | 0.58 | 0.59 | 0.71 | 0.64
Surprise | 0.77 | 0.51 | 0.58 | 0.54
Neutral | 0.75 | 0.77 | 0.76 | 0.71


6.2 Discussion

The overall model accuracy of 66.14% does not tell the whole story, especially with unbalanced data, but by using the different per-class metrics (accuracy, precision, recall, F1-score) we can analyse the performance of the CNN model in more detail. According to these metrics, some classes are well classified (happy, surprise, neutral), others are moderately classified (sad, disgust, angry), and one class is poorly classified (fear).

7 Real Time Detection

We used OpenCV's Haar cascades to detect and extract the face region from a webcam video feed, then classified it using our pretrained CNN model. Real-time classification better exposed our model's strengths: neutral, sad, happy, fear, surprised, and angry were generally well detected (Fig. 4).

Fig. 4. Real-time deployment on video streams.
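The real-time loop described above can be sketched as follows. This is an illustration only: the model path, the class order and the exact preprocessing are assumptions, not the authors' released code.

import cv2
import numpy as np
from tensorflow.keras.models import load_model

emotions = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]  # order assumed
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
model = load_model("fer2013_cnn.h5")          # hypothetical path to the trained model

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        # Crop the face, rescale to the 48x48 input expected by the CNN and normalize.
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        probs = model.predict(roi.reshape(1, 48, 48, 1), verbose=0)[0]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, emotions[int(np.argmax(probs))], (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow("driver emotion", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()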

8 Conclusion

This work presents a deep neural network architecture for automated facial expression recognition. The proposed network consists of four convolutional layers, each followed by max pooling, and then two dense layers. The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral). This work thus presents an approach based on a convolutional neural network (CNN) model for the recognition of emotional expressions while driving a car, and an accuracy of 66.14% has been reached on the public dataset FER-2013. Understanding how emotional states influence driving behavior is crucial for the development of advanced driver assistance systems.


Such systems would improve and enhance safety by flexibly adapting to the current state of the driver. Therefore, the next work will focus on the following question: once the state of the driver is known, what is the best strategy to improve the driver's emotion and optimize driving behavior? As prospects, we plan to improve our CNN model to increase its accuracy, to work on other public datasets and compare the results obtained, and then to focus on the most sensitive emotions involving a negative reaction from the driver and propose a framework, with risk calculation, involving several aspects of the driver's state (sleep, emotion, distraction…).

References 1. Paschero, M., et al.: A real time classifier for emotion and stress recognition in a vehicle driver. In: 2012 IEEE International Symposium on Industrial Electronics, Hangzhou, China, May 2012, pp. 1690–1695 (2012). https://doi.org/10.1109/ISIE.2012.6237345 2. Saatci, Y., Town, C.: Cascaded classification of gender and facial expression using active appearance models. In: 7th International Conference on Automatic Face and Gesture Recognition (FGR06), Southampton, UK, pp. 393–400 (2006). https://doi.org/10.1109/FGR.200 6.29 3. Ekman, P., Friesen, W.V., Ellsworth, P.: Emotion in the Human Face: Guidelines for Research and an Integration of Findings, vol. 11. Elsevier (2013). 4. Agrawal, U., Giripunje, S., Bajaj, P.: Emotion and gesture recognition with soft computing tool for drivers assistance system in human centered transportation. In: 2013 IEEE International Conference on Systems, Man, and Cybernetics, Manchester, October 2013, pp. 4612–4616 (2013). https://doi.org/10.1109/SMC.2013.785 5. Liu, Q., Zhang, J., Xin, Y.: Face expression recognition based on improved convolutional neural network. In: Proceedings of the 2nd International Conference on Artificial Intelligence and Pattern Recognition - AIPR 2019, Beijing, China, pp. 61–65 (2019). https://doi.org/10. 1145/3357254.3357275 6. Pantic, M., Rothkrantz, L.J.M.: Facial action recognition for facial expression analysis from static face images. IEEE Trans. Syst. Man Cybern. B 34(3), 1449–1461 (2004). https://doi. org/10.1109/TSMCB.2004.825931 7. Kotsia, I., Pitas, I.: Facial expression recognition in image sequences using geometric deformation features and support vector machines. IEEE Trans. Image Process. 16(1), 172–187 (2007). https://doi.org/10.1109/TIP.2006.884954 8. Pramerdorfer, C., Kampel, M.: Facial Expression Recognition using Convolutional Neural Networks: State of the Art, p. 7. 9. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition using deep neural networks. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, p. 10, March 2016. https://doi.org/10.1109/WACV. 2016.7477450 10. Chen, M., Zhang, L., Allebach, J.P.: Learning deep features for image emotion classification. In: 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, pp. 4491–4495, September 2015. https://doi.org/10.1109/ICIP.2015.7351656 11. Georgescu, M.-I., Ionescu, R.T., Popescu, M.: Local learning with deep and handcrafted features for facial expression recognition. IEEE Access 7, 64827–64836 (2019). https://doi. org/10.1109/ACCESS.2019.2917266


12. Wang, J., Gong, Y.: Recognition of multiple drivers’ emotional state. In: 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA, pp. 1–4, December 2008. https:// doi.org/10.1109/ICPR.2008.4761904 13. Katsis, C.D., Katertsidis, N., Ganiatsas, G., Fotiadis, D.I.: Toward emotion recognition in car-racing drivers: a biosignal processing approach. IEEE Trans. Syst. Man Cybern. A 38(3), 502–512 (2008). https://doi.org/10.1109/TSMCA.2008.918624 14. Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D., Lee, D.-H., Zhou, Y., Ramaiah, C., Feng, F., Li, R., Wang, X., Athanasakis, D., Shawe-Taylor, J., Milakov, M., Park, J., Ionescu, R., Popescu, M., Grozea, C., Bergstra, J., Xie, J., Romaszko, L., Xu, B., Chuang, Z., Bengio, Y.: Challenges in representation learning: a report on three machine learning contests. Neural Netw. 64, 59–63 (2015). Special Issue on “Deep Learning of Representations” 15. Dobbins, C., Fairclough, S.: A mobile lifelogging platform to measure anxiety and anger during real-life driving. In: 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Kona, HI, pp. 327–332, March 2017. https://doi.org/10.1109/PERCOMW.2017.7917583 16. Khan, S.A., Hussain, S., Xiaoming, S., Yang, S.: An effective framework for driver fatigue recognition based on intelligent facial expressions analysis. IEEE Access 6, 67459–67468 (2018). https://doi.org/10.1109/ACCESS.2018.2878601.B 17. Verma, B., Choudhary, A.: Deep learning based real-time driver emotion monitoring. In: 2018 IEEE International Conference on Vehicular Electronics and Safety (ICVES), Madrid, pp. 1–6, September 2018. https://doi.org/10.1109/ICVES.2018.8519595 18. Patil, M., Veni, S.: Driver emotion recognition for enhancement of human machine interface in vehicles. In: 2019 International Conference on Communication and Signal Processing (ICCSP), Chennai, India, pp. 0420–0424, April 2019. https://doi.org/10.1109/ICCSP.2019. 8698045 19. Khai Ooi, J.S., Ahmad, S.A., Chong, Y.Z., Md Ali, S.H., Ai, G., Wagatsuma, H.: Driver emotion recognition framework based on electrodermal activity measurements during simulated driving conditions. In: 2016 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES), Kuala Lumpur, pp. 365–369, December 2016. https://doi.org/10.1109/ IECBES.2016.7843475 20. Jeong, M., Ko, B.C.: Driver’s facial expression recognition in real-time for safe driving. Sensors 18(12), 4270 (2018). https://doi.org/10.3390/s18124270 21. Minhad, K.N., Ali, S.H.M., Reaz, M.B.I.: Happy-anger emotions classifications from electrocardiogram signal for automobile driving safety and awareness. J. Transp. Health 7, 75–89 (2017). https://doi.org/10.1016/j.jth.2017.11.001 22. Park, B.-J., Yoon, C., Jang, E.-H., Kim, D.-H.: Physiological Signals and Recognition of , p. 3 Negative Emotions 23. Azman, A., et al.: Real time driver anger detection. In: Kim, K.J., Baek, N. (eds.) Information Science and Applications 2018, vol. 514, pp. 157–167. Springer, Singapore (2019) 24. Hsieh, C.-C., Hsih, M.-H., Jiang, M.-K., Cheng, Y.-M., Liang, E.-H.: Effective semantic features for facial expressions recognition using SVM. Multimedia Tools Appl 75(11), 6663– 6682 (2016). https://doi.org/10.1007/s11042-015-2598-1

Deep Learning Based Driver’s Fatigue Detection Framework Zakaria Boucetta1(B) , Abdelaziz El Fazziki2 , and Mohamed El Adnani2 1 Faculty of Science and Technology, Cadi-Ayyad University, Marrakesh, Morocco

[email protected] 2 Faculty of Science, Cadi-Ayyad University, Marrakesh, Morocco

{elfazziki,md-eladnani}@uca.ac.ma

Abstract. Drivers' fatigue is considered one of the main causes of fatal accidents, and its detection can be a challenging task, especially in complex environments. In this work, we suggest a real-time, low-cost framework whose main goal is to augment driver safety by detecting driver fatigue using an embedded camera and deep learning techniques. The proposed approach starts with face and landmark localization using a multi-task convolutional neural network (MTCNN), followed by eye region extraction, eye status recognition using an optimized convolutional neural network, and finally a fatigue judgment model based on an eye-blink counting technique. Our model is trained and tested on public face datasets, which made it possible to reach a higher accuracy (94.84%) than other existing works in the literature. The final outputs of the proposed framework are notifications and alerts sent to the driver in case of drowsiness. Keywords: Fatigue detection · Deep learning · Eye status recognition

1 Introduction

Traffic accidents are among the most common accidents in the world, causing considerable property loss, injuries and deaths, depending on the severity and strength of the accident; there are many types, including collisions with a foreign object, an animal, or another car. Fatigue while driving is considered one of the main causes of extremely disastrous accidents [1, 2] and should be regarded as extremely dangerous both for the driver and for other road users. To overcome this issue, a fatigue detection system for drivers is highly recommended. This field of research has attracted the attention of many researchers and, as a result, multiple strategies have been developed to measure the driver's drowsiness. The strategies can be categorized into three groups:
1. vehicle movement while driving;
2. the driver's mental and physical activities;
3. driver monitoring using computer vision.


In the first category of solutions, a considerable number of existing methods are based on supervising the steering wheel movement [3, 4], while others focus on acceleration or braking time series and lane departure to determine the level of drowsiness [4, 5]. The solutions in the second group rely on the driver's mental and physical activities, using physiological signals such as the electroencephalogram (EEG), which can be processed to estimate brain dynamics [6]. The solutions in the two aforementioned groups have important limitations: the first category of methods is not robust and requires specific driving conditions, while the second offers a very high level of precision for driver fatigue detection but requires electrode contacts attached to the chest or to the driver's head in order to produce the required measurements. Consequently, the third group is becoming popular [7, 8]: computer vision techniques focus on determining the eye status, the yawning frequency, whole-face expressions and head positions. The objective of this paper is to measure a fatigue index based on the driver's eye activity using deep learning techniques. The driver's images are retrieved from a camera embedded in the car and focused on the driver's seat. After detecting the face and its relevant landmarks using the pretrained multi-task convolutional neural network (MTCNN), and based on a highly optimized convolutional neural network (CNN) model, we judge the driver's fatigue. With these technologies and methodologies, our solution detects driver fatigue in real time at a very low cost and can be easily and widely used by drivers. This paper is organized as follows: Sect. 2 presents the related works; Sect. 3 illustrates the proposed approach, the framework architecture, the fatigue detection process and the details of each component of the proposed framework; the case study and the experimental results are presented in Sect. 4; and finally a discussion and a conclusion are presented in Sect. 5 and 6, respectively.

2 Related Works

In this section, we introduce some fatigue detection methods based on the analysis of the driver's facial features. In [9], the authors developed an automatic computer-vision-based control system that judges a yawning condition in real time using a smart embedded camera. A back-projection algorithm is used to detect yawning: it transforms a fully closed mouth picture into a gray-level reference image and computes its histogram; by comparing the histogram of the reference image with the histogram of the candidate mouth region, the yawning state can be determined. The research study presented in [10] suggests an approach to retrieve the blood volume pulse (BVP), eye blinks and yawning states using multi-channel second-order blind identification (SOBI); by analysing the extracted variables in parallel, a combined determination of fatigue is made. The authors of [11, 12] implemented a fatigue judgment method based on eye state determination, using Otsu's algorithm to detect the pupil or iris.


When the eye is closed, the iris circle contains more skin pixels; thus, the eye state is concluded by detecting the amount of skin pixels in the iris circle. In [13] and [14], the authors suggest an easy-to-implement eye state detection method that relies on estimating the iris height and width using integral projection; the eye state is determined from the comparison between the iris height and width. The advantage of this method is that no training is required, but it is severely affected by illumination conditions. In [15], an image processing method is presented that aims to enhance the characteristics of driver face images gathered by cameras under different and dynamically changing illumination conditions: an adaptive attenuation quantification retinex (AAQR) method is suggested to enhance the characteristics of night-time pictures and to highlight the potential of the method for future driver face detection applications. In [16], an SDN-based method is proposed to develop a safety-oriented vehicular controller area network (SOVCAN); its main objective is to detect the driver's fatigue and mood changes in the controller area network to ensure safe driving.

3 Overview of the Proposed Fatigue Detection Approach

3.1 Framework Architecture and Fatigue Detection Process

The markers of the human face contain very relevant information that can be used to predict the driver's state, and especially the state of fatigue. The proposed fatigue detection framework architecture is illustrated in Fig. 1.

Fig. 1. Fatigue detection framework architecture: the input image is processed by a pretrained MTCNN (bounding box and facial landmarks), the eye region is extracted and classified by an optimized CNN trained, tested and validated with the Keras API on public datasets, and a fatigue judgment module sends alerts/notifications to the driver

Five steps compose the fatigue detection process, as shown in Fig. 2:
• Step 1: Face detection is a typical first step in many drivers' surveillance systems to establish a region of interest and reduce computational cost.


Fig. 2. The fatigue detection process

• Step 2: If a face is detected in the previous step, the system proceeds to the eye region extraction.
• Step 3: The extracted region from step two is the input for the recognition of the eye status.
• Step 4: Based on the results of the eye status recognition, the framework judges the state of driver fatigue.
• Step 5: Depending on the state of fatigue, the system triggers the alert or notification process for the driver, or it returns to the first step in the absence of fatigue.

3.2 Input Image

Image capture is done in real time, using high-resolution cameras. As there are 1000 ms in a second and a blink takes between 300 and 400 ms, a blink lasts about a third of a second; although this seems short, relative to a single second it is quite significant. Standard front cameras typically record at 30 fps; for our system, a capture device of 20 frames per second is suitable.

3.3 Face Detection and Landmark Localization

Inspired by the work done in [17], we use their method for joint face detection and alignment. It is a deep learning approach that uses three stages of convolutional neural networks to determine the face and five landmark locations, which help in face alignment. Firstly, the bounding boxes of an image are predicted by a Proposal Network (P-Net) that uses regression techniques; non-maximum suppression (NMS) is then applied to merge highly overlapping candidates. Secondly, the Refine Network (R-Net) takes the output candidates of the P-Net as its input; its main goal is to predict more accurate bounding boxes and eliminate a larger number of false candidates, before NMS is applied again to remove overlapping boxes. Thirdly, another CNN, the Output Network (O-Net), provides the system with five landmark positions after predicting even more accurate bounding boxes (see Fig. 3). In our research, we therefore opted for a multi-task cascaded convolutional neural network (MTCNN) for face detection [17]; it detects the face as well as its key points, namely five key points: the right and left corners of the mouth, the center of the nose, and the centers of the right and left eyes.



Fig. 3. MTCNN process

The stages of the MTCNN process are described in Fig. 4.

Fig. 4. MTCNN stages
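As an illustration of this step, the sketch below runs a pre-trained MTCNN on a single frame and extracts the eye centres. It uses the open-source mtcnn Python package as a stand-in for the authors' implementation of [17]; the file name is a hypothetical placeholder.

```python
# Minimal face and landmark detection sketch using the open-source "mtcnn" package.
import cv2
from mtcnn import MTCNN

detector = MTCNN()                                   # loads the P-Net / R-Net / O-Net cascade

frame = cv2.imread("driver_frame.jpg")               # hypothetical input frame from the camera
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)         # the detector expects RGB images

for face in detector.detect_faces(rgb):
    x, y, w, h = face["box"]                         # face bounding box
    keypoints = face["keypoints"]                    # the five facial landmarks
    left_eye = keypoints["left_eye"]                 # (x, y) of the left eye centre
    right_eye = keypoints["right_eye"]               # (x, y) of the right eye centre
    print(face["confidence"], left_eye, right_eye)
```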

3.4 Eye Region Extraction

In real-life situations the driver's head can be tilted. The central position of the right eye detected by the MTCNN network is p1(x1, y1), the central position of the left eye is p2(x2, y2), the distance between the two eyes is d, the width of the eye image is w and its height is h. The corresponding relationships are as follows:

d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \qquad (1)

w = \frac{5}{3} d \qquad (2)

h = \frac{10}{9} d \qquad (3)
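A sketch of this geometry is given below. The 48 × 32 target size follows the text; the convention of centring the crop on each detected eye centre is an assumption not spelled out in the paper.

```python
# Eye-region extraction following Eqs. (1)-(3): compute the inter-eye distance and
# crop a w x h patch around each eye centre, then resize to the 48 x 32 network input.
import math
import cv2

def eye_patches(frame, right_eye, left_eye, target=(48, 32)):
    x1, y1 = right_eye
    x2, y2 = left_eye
    d = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)   # Eq. (1): distance between the eyes
    w = int(round(5.0 / 3.0 * d))                    # Eq. (2): patch width
    h = int(round(10.0 / 9.0 * d))                   # Eq. (3): patch height
    patches = []
    for (cx, cy) in (right_eye, left_eye):
        x0 = max(int(cx - w // 2), 0)                # assumed centring convention
        y0 = max(int(cy - h // 2), 0)
        patch = frame[y0:y0 + h, x0:x0 + w]
        patches.append(cv2.resize(patch, target))    # adjust to 48 x 32 as in Sect. 3.4
    return patches
```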

Image patches are then extracted around the eyes. Considering that the size of the driver’s eye image can differ according to the driving environment, we firstly adjust the


input image to 48 × 32. Thereafter, the input images are fed to the proposed model for learning and classification.

3.5 Eye Status Recognition

To determine the eye status, this paper uses a convolutional neural network. As shown in Fig. 6, the network is structured over five layers. The stride is 1 and the kernel size 3 × 3 for all convolutional layers. The kernel size of all the pooling layers is 2 × 2, with a stride of 2 and no padding. Each pooling layer halves the image size while retaining the main features, which greatly reduces the number of parameters of the model; the model therefore remains relatively simple, which helps the trained model avoid over-fitting. The nodes of the fully connected layer integrate the features extracted by the previous layers.

Fig. 5. Eye key points
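The sketch below builds the five-layer eye-state CNN described in Sect. 3.5 with Keras. The 3 × 3 kernels with stride 1, the 2 × 2 pooling with stride 2, the 48 × 32 × 3 input, the 512-unit dense layer and the two-class output follow the text; the exact filter counts are assumptions, since the corresponding figure is only partially legible.

```python
# Sketch of the eye-status recognition CNN (filter counts assumed).
from tensorflow.keras import layers, models

def build_eye_state_cnn(input_shape=(48, 32, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),      # 48x32 -> 24x16
        layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),      # 24x16 -> 12x8
        layers.Conv2D(128, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),      # 12x8 -> 6x4
        layers.Flatten(),
        layers.Dense(512, activation="relu"),        # fully connected feature integration
        layers.Dense(2, activation="softmax"),       # open / closed
    ])
    return model
```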

3.6 Fatigue Judgment

Our framework predicts whether the eye is open or closed and flags every transition from the open to the closed state. An eye blinking counter (BC) is incremented by one after each pair of successive transitions, from open to closed and then from closed to open. According to statistics, a driver blinks an average of ten times per minute under normal conditions, blinking every 2 to 6 s and spending 0.2 to 0.4 s on each blink. The proposed method determines the fatigue state of the driver using the blinking counter. In our experiment, the videos in the dataset were sampled at a rate of 20 frames per second to obtain a set of pictures, so the time period of each frame is 50 ms; when the number of consecutive eyes-closed frames exceeds 40, the driver is considered to be in danger. A sketch of this logic is given below.
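The following sketch implements the blink counter and the 40-frame closed-eye threshold described above; the class and method names are illustrative only.

```python
# Fatigue judgment: count blinks on open -> closed -> open cycles and raise an alert
# when the eyes stay closed for more than 40 consecutive frames (2 s at 20 fps).
class FatigueJudge:
    def __init__(self, closed_frames_threshold=40):
        self.blink_count = 0
        self.consecutive_closed = 0
        self.prev_closed = False
        self.threshold = closed_frames_threshold

    def update(self, eye_closed):
        """Feed one per-frame CNN decision (True = closed); return True when fatigue is detected."""
        if eye_closed:
            self.consecutive_closed += 1
        else:
            if self.prev_closed:
                self.blink_count += 1          # a closed -> open transition completes one blink
            self.consecutive_closed = 0
        self.prev_closed = eye_closed
        return self.consecutive_closed > self.threshold
```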

4 Experimentation and Results

In this section, we present the fatigue detection service and discuss its safety benefits for road users in general and for drivers in particular. In addition, the service monitoring the


state of driver fatigue is illustrated by presenting some analysis results which demonstrate the relevance of our solution. Figures 7 and 8 respectively present examples of face images and cropped ocular zones from the CEW [18], ZJU [18] and DDR [19] datasets. The number of ocular images extracted from the datasets is shown in Table 1. In this study, we test our methods using the eye images cropped by Song et al. [18].

[Fig. 6 depicts the feature extraction pipeline: three Conv 3×3 + MaxPool 2×2 stages applied to the 48×32×3 input, followed by a 512-unit fully connected layer and the open/close output.]

Fig. 6. The eye status recognition convolutional neural network.

Fig. 7. Sample face images from dataset

Table 1. Details of used datasets

Dataset             Eye open   Eye close   Total
CEW                 2120       2332        4452
Synthetic dataset   9140       8521        17661

The diversity, quality and size of the training data are essential factors for the efficient estimation of any machine learning problem. In order to meet these criteria, for the estimation of the state of the eyes we combined two types of datasets: the first is the CEW dataset, and the second is a synthetic eye dataset that we created.


CEW dataset: This dataset for the detection of ocular conditions in the wild was first introduced in [18]. It contains a large amount of data and takes into account environmental changes such as lighting, blur, occlusion, and disguise. Our synthetic dataset was generated using UnityEyes [20]. UnityEyes is a method for the fast, varied and high-quality generation of eye region images that can be used as training data (Figs. 8 and 9).

Fig. 8. Sample eye patches in dataset

Fig. 9. The generated eye image with UnityEyes

Data normalization is an important step that ensures that each input parameter (pixel, in this case) has a similar data distribution. It also removes the influence of high- and very low-frequency noise. This makes convergence faster while training the network. The input data are normalized as follows:

N_I = \frac{I - \mathrm{mean}(I)}{\max(I) - \min(I)} \qquad (4)

where N_I denotes the normalized data and I denotes the input data.

Data augmentation is another common preprocessing technique that consists in augmenting the existing dataset with perturbed versions of the existing images in order to expose the neural network to a wide range of variations. In this research work, we used five types of data augmentation:

• image rotations via the rotation_range argument;
• image brightness via the brightness_range argument;
• image shifts via the width_shift_range and height_shift_range arguments;
• image flips via the horizontal_flip and vertical_flip arguments;
• image zoom via the zoom_range argument.

The augmentation phase is implemented using the ImageDataGenerator class from the Keras library, which makes it possible to apply data augmentation automatically when training a model by specifying constructor arguments.

In this research work, we compared the performance of our model with those already proposed in the literature. HOG [18] and Gabor [21] features are used in this comparison. The results of these methods are presented in Table 2.

Table 2. Comparison results of the proposed method and state-of-the-art methods using our dataset

Research    Method   Accuracy (%)
Song [18]   HOG      94.57
Dong [21]   Gabor    94.72
Ours        CNN      94.84

The CNN was implemented in Python [22] with the Keras [23] library. Keras is a high-level neural networks API, written in Python, which handles the way we build models, define layers, or set up multiple-input multiple-output models. At this level, Keras also compiles our model with the loss and optimizer functions and runs the training process with the fit function (Fig. 10).

Fig. 10. Model accuracy on training set and verification set.
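A sketch of this preprocessing and training set-up is given below. It applies the normalization of Eq. (4) and the five augmentation types through the Keras ImageDataGenerator. The hyper-parameters (optimizer, batch size, epochs) are illustrative assumptions; build_eye_state_cnn refers to the architecture sketched in Sect. 3.5, and x_train, y_train, x_val, y_val stand for the prepared 48 × 32 eye patches and their one-hot labels.

```python
# Preprocessing, augmentation and training sketch with Keras.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def normalise(img):                        # Eq. (4): N_I = (I - mean(I)) / (max(I) - min(I))
    img = img.astype("float32")
    return (img - img.mean()) / (img.max() - img.min() + 1e-8)

datagen = ImageDataGenerator(
    preprocessing_function=normalise,
    rotation_range=15,                     # image rotations
    brightness_range=(0.7, 1.3),           # image brightness
    width_shift_range=0.1,                 # horizontal shifts
    height_shift_range=0.1,                # vertical shifts
    horizontal_flip=True,                  # horizontal flips
    vertical_flip=True,                    # vertical flips, as listed in the text
    zoom_range=0.1,                        # image zoom
)

model = build_eye_state_cnn()              # CNN sketched in Sect. 3.5 (assumed available)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(datagen.flow(x_train, y_train, batch_size=32),   # x_train/y_train: prepared dataset
          validation_data=(x_val, y_val), epochs=30)
```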

5 Discussion

The objective of this study is to detect fatigue while driving. Our particular interest in fatigue is not arbitrary, since detecting it can significantly reduce the


number of accidents and avoid serious damage. Based on the data classification result, our framework determines the presence of fatigue by combining several parameters to provide a clear view of the driver's condition. The proposed framework implements a driver fatigue monitoring and notification service based on a CNN. The data are retrieved from a camera embedded in the car and focused mainly on the driver. These data are the input of the proposed framework; however, reducing the cost of processing the retrieved images is a significant challenge in this study. To overcome this problem, the extraction of areas of interest using a pre-trained model was considered. We chose the Multi-Task Cascaded Convolutional Network (MTCNN) as the model for facial detection and the localization of the relevant face landmarks, due to its high reliability and flexibility. Our objective is to classify the images recovered from the embedded camera using an optimized CNN, because it has the capacity to implicitly extract the characteristics and predict the classes of images through the training and testing phases. The size of the training dataset has a strong influence on the accuracy of the model. Since the training data of our CNN model were retrieved from public datasets, a data augmentation phase was adopted to further improve the accuracy of our model. In this study, we tested several CNN techniques and architectures and experimented with them to find the optimal composition of the CNN model. A high classification performance was obtained compared to other solutions, with an average accuracy greater than 94.8%.

6 Conclusion

The contribution of this paper aims to deal with drowsiness while driving, which is considered to be one of the main daily causes of fatal road accidents. To this end, an automatic fatigue detection framework is proposed. It relies on receiving images from a camera embedded in the user's car. A multi-task convolutional neural network is then used to retrieve the facial landmarks. Next, the eye region is extracted and fed into another convolutional neural network for eye status recognition. A fatigue judgment based on eye blinking is then applied to determine whether to send a notification or an alert to the driver. Our solution provides a high accuracy in fatigue detection compared to the other solutions mentioned above. In future work, we will consider other parameters in the fatigue judgment, mainly yawning and the PERCLOS measure.

References
1. Akbar, I.A., Rumagit, A.M., Utsunomiya, M., Morie, T., Igasaki, T.: Three drowsiness categories assessment by electroencephalogram in driving simulator environment, pp. 2904–2907 (2017)
2. Reddy, B., Kim, Y., Yun, S., Seo, C., Jang, J.: Real-time driver drowsiness detection for embedded system using model compression of deep neural networks (2017)
3. Fagerberg, K.: Vehicle-based detection of inattentive driving for integration in an adaptive lane departure warning system: drowsiness detection (2004)
4. Sommer, D.: Steering wheel behavior based estimation of fatigue (2017)
5. Mattsson, K.: In-vehicle prediction of truck driver sleepiness: lane position variables (2007)
6. Mardi, Z., Naghmeh, S., Ashtiani, M., Mikaili, M.: EEG-based drowsiness detection for safe driving using chaotic features and statistical tests, vol. 1, no. 2, pp. 130–137 (2011)
7. Danisman, T., Bilasco, I.M., Djeraba, C., Ihaddadene, N.: Drowsy driver detection system using eye blink patterns. In: 2010 International Conference on Machine and Web Intelligence, ICMWI 2010 - Proceedings, pp. 230–233 (2010)
8. Abtahi, S., Hariri, B., Shirmohammadi, S.: Driver drowsiness monitoring based on yawning detection. In: Conference Record - IEEE Instrumentation and Measurement Technology Conference, pp. 1606–1610 (2011)
9. Omidyeganeh, M., et al.: Yawning detection using embedded smart cameras. IEEE Trans. Instrum. Meas. 65(3), 570–582 (2016)
10. Zhang, C., Wu, X., Zheng, X., Yu, S.: Driver drowsiness detection using multi-channel second order blind identifications. IEEE Access 7, 11829–11843 (2019)
11. Rahman, A., Sirshar, M., Khan, A.: Real time drowsiness detection using eye blink monitoring, pp. 1–7 (2015)
12. Lei, J., Han, Q., Chen, L., Lai, Z., Zeng, L., Liu, X.: A novel side face contour extraction algorithm for driving fatigue statue recognition. IEEE Access 5, 5723–5730 (2017)
13. Lu, Y., Li, C.: Recognition of driver eyes' states based on variance projections function. In: Proceedings - 2010 3rd International Congress on Image and Signal Processing, CISP 2010, vol. 4, pp. 1919–1922 (2010)
14. Omidyeganeh, M., Javadtalab, A., Shirmohammadi, S.: Intelligent driver drowsiness detection through fusion of yawning and eye closure. In: VECIMS 2011 - 2011 IEEE International Conference on Virtual Environments, Human-Computer Interfaces and Measurement Systems Proceedings, pp. 18–23 (2011)
15. Shen, J., et al.: Nighttime driving safety improvement via image enhancement for driver face detection. IEEE Access 6, 45625–45634 (2018)
16. Zhang, Y., Chen, M., Guizani, N., Wu, D., Leung, V.C.M.: SOVCAN: safety-oriented vehicular controller area network. IEEE Commun. Mag. 55(8), 94–99 (2017)
17. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multi-task cascaded convolutional networks, no. 1, pp. 1–5
18. Song, F., Tan, X., Liu, X., Chen, S.: Eyes closeness detection from still images with multi-scale histograms of principal oriented gradients. Pattern Recognit. 47(9), 2825–2838 (2014)
19. Wang, Z., Zhang, G.: Human fatigue expression recognition through image-based dynamic multi-information and bimodal deep learning, vol. 25, no. 5 (2016)
20. Bermudez, C., Plassard, A.J., Davis, L.T., Newton, A.T., Resnick, S.M., Landman, B.A.: Learning an appearance-based gaze estimator from one million synthesised images, pp. 131–138 (2018)
21. Dong, Y., Zhang, Y., Yue, J.: Comparison of random forest, random ferns and support vector machine for eye state classification (2015)
22. Python documentation. https://www.python.org/. Accessed 03 Feb 2020
23. Keras documentation. https://keras.io/. Accessed 10 Feb 2020

DSRC vs LTE V2X for Autonomous Vehicle Connectivity

Kawtar Jellid(B) and Tomader Mazri

National School of Applied Sciences, Kenitra, Morocco
[email protected], [email protected]

Abstract. An autonomous vehicle uses a fully automated driving system that enables the vehicle to respond to the external conditions that a human driver would otherwise manage. V2X plays an important role in allowing vehicles to become more and more autonomous, so V2X communication is essential for autonomous cars. The first difficulty of V2X is to integrate the management of these data on the roads, to transmit them, and then to integrate them into the control system. An autonomous car today uses lidars, cameras, radars and a GPS for its perception. Several access technologies support V2X, for example DSRC and cellular communications such as LTE, and they are promising for reliable and efficient vehicular communications. DSRC enables low-latency direct communications between vehicles (V2V) and between vehicles and roadside units (RSU), i.e. V2I. In addition to DSRC there are other technologies such as 5G, LTE V2X and C-V2X. In this paper we present a comparative study between DSRC and LTE V2X based on several criteria, as well as a simulation to evaluate the packet delivery performance. Keywords: V2X · Autonomous vehicle · DSRC · LTE V2X

1 Introduction

Recent research and advances in detection, automation, computing, communication and networking vehicle technologies promise improvements in road safety, traffic efficiency, fuel consumption and emissions. By exploiting detection and communication capabilities, vehicles can cooperate and extend their awareness of the context beyond the visual field. Cooperative vehicles share their driving intentions with other traffic actors, thus accurately predicting the manoeuvres of other traffic participants and optimizing their own decisions and manoeuvres [1]. In fact, there are five levels of automation leading to an autonomous vehicle, which requires no human intervention and relies on several technologies such as 5G and V2X communication.

DSRC is one of the technologies used for V2X communication. It suffers from link quality degradation in the presence of buildings and vehicles, especially in urban areas, where channel collisions become serious when the density of vehicles is high. On the other hand, while DSRC is still being deployed, cellular V2X


(C-V2X) is catching up thanks to the advancement of radio access technologies as well as the well-maintained infrastructure [2]. We understand that the autonomous vehicle needs to capture billions of data points to know what is happening (and especially what will happen) around it, with a latency of the order of a thousandth of a second, so that it can translate this into a safety benefit. To capture all these data, manufacturers have planned sensors placed on cars capable of detecting what is happening up to 250 m away. To "see" beyond, V2X (Vehicle-to-Everything) comes in, which means that cars and other 5G-connected objects communicate with each other to transmit the information that each has received, and which therefore becomes useful to everyone. In fact, many researchers are looking for ways to improve the performance of DSRC, and many are looking for alternative technologies that could be used in a V2X system. LTE-V is considered to be one of the most promising communication technologies that could replace DSRC. LTE-V is a recent evolution of LTE, which can provide low-latency mobile communication. LTE is now the predominant cellular technology, and the ubiquitously deployed LTE base stations make building the V2X system much easier. 3GPP actively conducts study and specification work on V2X based on LTE. A study item on LTE V2X services has been approved by 3GPP, in which PC5-based V2V had been given the highest priority. This Radio Access Network (RAN) feasibility study has completed the part on the PC5 transport for V2V services [3]. VANET (Vehicular Ad Hoc Network) applications present their own unique requirements and challenges for wireless communication technology; although considered the first standard for VANETs, IEEE 802.11p is still in the field-testing stage. Recently, the LTE V2X (Long-Term Evolution Vehicular to X) protocol appeared as a systematic V2X solution based on 4G TD-LTE (Time Division Long-Term Evolution). In this article, we first present a comparison between the DSRC and LTE V2X technologies based on several criteria, then the principle and architecture of each technology, and finally a simulation to assess the packet delivery success rate in case of congestion.

2 Related Work

Several researchers are exploring complementary or alternative vehicle communication technologies because of the limitations of DSRC. Lately, researchers have become interested in exploiting cellular communication networks as well as 5G for V2X; this technology is also known as Cellular-V2X (C-V2X) and has been standardized by the 3rd Generation Partnership Project (3GPP). Cellular networks are expected to develop and improve the performance of vehicular communication (V2X). In Release 14, 3GPP released C-V2X (also called LTE-V or LTE-V2X), which uses the dedicated LTE PC5 interface for V2V (Vehicle-to-Vehicle) communications. This standard was designed to support cooperative traffic efficiency and safety applications, and it is composed of two modes of operation. In C-V2X Mode 3, vehicles communicate directly with each other; however, communications are handled by the cellular infrastructure, which selects the radio resources or sub-channels for each V2V


transmission. On the other hand, C-V2X Mode 4 does not require the support of the cellular network infrastructure: the vehicles autonomously select the sub-channels or radio resources for their V2V transmissions. This is the reason why the 3GPP standard defines a semi-persistent distributed scheduling scheme that all vehicles must implement. C-V2X Mode 4 is very powerful and efficient because it can support V2V safety applications in the absence of cellular infrastructure coverage. Therefore, a careful configuration of C-V2X Mode 4 is necessary to increase its communication range, efficiency and capacity [4]. Simulation research shows that LTE outperforms DSRC in latency and message delivery rate over a number of parameters such as range, vehicle speed and the number of vehicles on a channel [5]. Several simulation experiments that have been carried out prove that the latency is always less than 100 ms. As the number of vehicles increases, the latency increases, but when there are 150 vehicles on the same channel it does not exceed 60 ms [5]. The research literature also shows that the packet delivery rate of LTE is better than that of DSRC, which at its absolute best is only 80% and decreases rapidly, while that of LTE is 95% or more. The communication range of LTE is greater than that of DSRC: the range of DSRC is considered to be between 300 m and 1 km, while cellular radios, depending on the power and type of cell tower, might have coverage of up to about 10 miles [6].

3 Comparison Between LTE V2X and DSRC

In this section, we mainly focus on a comparison between LTE V2X and DSRC at the physical level. We present the principle of DSRC and LTE V2X, their architecture, physical layer, frame structure, and frequency offset estimation algorithms.

3.1 DSRC (Dedicated Short-Range Communications)

• Definition

DSRC (Dedicated Short Range Communications) is one of the research hotspots and has already become the V2X communication standard in some regions, such as America and Europe. In America, the Federal Communications Commission (FCC) has allocated 75 MHz of bandwidth for DSRC, from 5.850 GHz to 5.925 GHz, which is divided into 7 channels: 6 service channels (SCH) and 1 control channel (CCH). On the basis of the allocated spectrum, the IEEE has published a series of communication standards for the entire protocol stack. The standards are collectively referred to as Wireless Access in Vehicular Environments (WAVE). The WAVE stack includes IEEE 802.11p and IEEE 1609.x; the former defines the physical layer and part of the medium access control (MAC) layer, which is mostly derived from 802.11a. Compared to 802.11a, some procedures such as authentication and acknowledgment are omitted to speed up the access process, and other changes are made to suit the transport environment. The IEEE 1609.x family defines security services, architecture, resource management, networking services, multi-channel operations, and physical access for short-range, low-latency communications in vehicular environments [7].


Fig. 1. DSRC protocol suite [8].

Dedicated Short Range Communications (DSRC), often used in Wireless Access in Vehicular Environments (WAVE), is a suite of protocols dedicated to low-latency networking in vehicular environments. This group of protocols, as can be seen in Fig. 1, looks a lot like TCP/IP over Wi-Fi. In fact, it supports the IPv6 stack in parallel with a network and transport layer protocol called the Wave Short Message Protocol (WSMP), dedicated to the DSRC suite. The WSMP branch of the protocol suite allows faster configuration and more space-efficient transmissions [9].

3.1.1 DSRC Architecture

The DSRC protocol for vehicle-to-beacon communications has been defined as a lightweight OSI communication stack [10].

Fig. 2. DSRC protocol stack

In fact, it consists of three layers: L1, the physical layer; L2, the data link layer; and L7, the application layer (see Fig. 2). This architecture is popular for real-time systems


because it reduces the protocol overhead and meets the timing constraints. The system was designed to support different physical media, multiple applications and scenarios, and an environment containing several channels. This ensures a wide variety of possible fields of application for this technology [10].

3.1.2 Physical Layer Architecture

The layers shown in Fig. 1 allow us to examine the different layers of the DSRC protocol stack in detail, from bottom to top, starting with the physical layer. The DSRC PHY protocol is defined in IEEE 802.11, specifically as modified by IEEE 802.11p [11]. It is divided into two sublayers: the physical medium dependent (PMD) sublayer and the physical layer convergence procedure (PLCP) sublayer. As the name suggests, the PMD interfaces directly with the wireless medium. It uses the familiar orthogonal frequency division multiplexing (OFDM) technique, originally added to 802.11 in the 802.11a amendment. The PLCP represents the mapping between the MAC frame and the basic PHY layer data unit, the OFDM symbol. In 2003, an earlier version of the DSRC PHY was published under the auspices of ASTM International in ASTM E2213-03 [11], which was also based on IEEE 802.11. In 2004, interested parties obtained approval to create the WAVE IEEE 802.11p amendment for DSRC within the IEEE 802.11 working group (WG). The amendment was released in 2010. Deviations from the main 802.11 standard have been minimized to encourage 802.11 silicon vendors to add support for 802.11p, which helps reduce costs by taking advantage of the large volume of 802.11 chips produced annually; there are orders of magnitude more Wi-Fi equipped cell phones sold each year than new vehicles. The automotive industry considers the PHY and MAC parts of ASTM E2213-03 to be obsolete in favor of IEEE 802.11 and 802.11p. The United States Federal Communications Commission (FCC) regulations for DSRC [12, 13], however, still incorporate by reference rules contained in ASTM E2213-03. It was anticipated that the FCC regulations would eventually be updated to instead require compliance with IEEE 802.11 and 802.11p.

3.1.3 Frame Structure

Figure 3 shows the physical layer data frame structure. A1-A10 are ten identical short training symbols, each 16 samples long. A subset of these symbols is used for automatic gain control (AGC) for packet detection and for various diversity combining schemes. The remaining short training symbols are used for the coarse estimation of the frequency offset and the coarse estimation of the symbol timing. These training symbols are followed by two identical long training symbols, C1-C2, which are used for channel estimation and for fine frequency and symbol timing estimation. C1 and C2 are 64 samples long, and the 32-sample-long CP1 is the cyclic prefix which protects against intersymbol interference (ISI) from the short training symbols. After the short and long training symbols come the actual modulated OFDM payload symbols. The first OFDM data symbol is the physical layer header, which is BPSK modulated and specifies the modulation scheme used in the following payload OFDM symbols.


Fig. 3. DSRC PHY frame format

Each OFDM symbol consists of 64 samples, and a 16-sample cyclic prefix is prepended to each OFDM symbol to combat ISI [14].

3.1.4 DSRC Frequency Offset Estimation Algorithm

For the DSRC receiver, there are two steps to estimate and correct frequency errors. The detailed steps can be seen in Algorithm 1:

1: The short training sequences (for coarse frequency offset estimation) and the long training sequence (for fine frequency offset estimation) in the PLCP preamble are used to correct the frequency error; the integer and non-integer parts of the frequency error can be corrected at the same time.
2: Four pilot subcarriers of every OFDM symbol are used for carrier phase tracking to mitigate the residual frequency error and phase noise [15].

3.2 LTE V2X (Long Term Evolution Vehicle to Everything)

• Definition

The LTE-based V2V (Vehicle to Vehicle) work item (WI) was approved in December 2015 [16], and the LTE-based V2X (Vehicle to Everything) WI was approved in September 2016 [17] in 3GPP. LTE-V2X is a relatively new technology and is specifically designed to support vehicular communication scenarios. As already mentioned, the first version of LTE-V2X was released by 3GPP in 2016 under the umbrella of the LTE Release 14 specification, as an extension of the LTE Device-to-Device (D2D) functionality standardized in LTE Release 12. LTE-V2X uses a sidelink which defines the physical channels and is basically based on the waveform of the LTE uplink. In LTE-V2X, there are two communication radio interfaces: firstly, LTE-PC5, also known as the LTE sidelink (PC5 refers to the radio interface where a user equipment (UE) communicates directly with another UE over the direct channel), and secondly, LTE-Uu (the UTRAN (Universal Terrestrial Radio Access Network) radio interface between the eNodeB and the user equipment), as shown in Fig. 4 [18].


Fig. 4. LTE-V2X architecture [15].

3.2.1 Frame Structure

The following figure shows the frame structure of LTE V2X. In the frame structure, there are 14 symbols per TTI (Transmission Time Interval), of which four are DMRS (demodulation reference signals) and one is a GP (guard period); the rest are data symbols (Fig. 5).

Fig. 5. Frame structure of LTE V2X [12]

For V2V, the data frame structure of D2D defined in 3GPP TS 36.211 and 3GPP TS 36.212 is reused: there are 14 symbols in a TTI, which lasts 1 ms, and the last symbol is used as the guard period. In the PSSCH/PSCCH/PSDCH of 3GPP Rel. 12/13 D2D, there are two DMRS per PRB and the DMRS time interval is 0.5 ms. When the speed of the mobile terminal increases, for example to 140 km/h, and the center frequency of the signal is 6.0 GHz, the coherence time of the signal (about 0.277 ms) becomes less than the current DMRS time interval (approximately 0.5 ms). As a consequence, the demodulation performance of the data degrades sharply due to poor channel estimation and the resulting lack of channel information. There is a consensus that the DMRS density in the time domain should be increased to four symbols.

3.2.2 LTE V2X Frequency Offset Estimation Algorithm

1: Timing detection by searching for the peak of the channel estimate transformed to the time domain → d;
2: The local DMRS sequence is transformed to the time domain → P(n);
3: The sequence of Step 2 is shifted according to the timing of Step 1 → P̃(n) = P(mod(n + d, N));
4: The received DMRS symbol is transformed to the time domain → r(n);
5: Correlation is performed between the sequences of Step 3 and Step 4;


6: The frequency offset is estimated by comparing the angle difference between the first half and the second half of the sequence obtained in Step 5:

\Delta f = \frac{1}{2\pi \Delta t} \tan^{-1}\left( \frac{\sum_{n=0}^{N/2-1} \tilde{P}(n + N/2)\, r(n + N/2)}{\sum_{n=0}^{N/2-1} \tilde{P}(n)\, r(n)} \right)
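A minimal numerical sketch of this procedure is given below. It assumes that P_local is the local DMRS sequence, r the received DMRS symbol and channel_est_time the channel estimate, all already in the time domain, and that delta_t is the time spacing between the two correlation halves; the use of the complex conjugate in the correlation is an implementation assumption in the spirit of Steps 1-6 rather than the authors' exact formulation.

```python
# Sketch of the DMRS-based frequency offset estimation (Steps 1-6).
import numpy as np

def estimate_frequency_offset(P_local, r, channel_est_time, delta_t):
    N = len(P_local)
    d = int(np.argmax(np.abs(channel_est_time)))   # Step 1: timing from the channel-estimate peak
    P_shift = np.roll(P_local, -d)                 # Steps 2-3: P~(n) = P(mod(n + d, N))
    c = P_shift.conj() * r                         # Steps 4-5: element-wise correlation
    first_half = np.sum(c[: N // 2])
    second_half = np.sum(c[N // 2 :])
    # Step 6: the phase difference between the two halves gives the residual offset
    return np.angle(second_half * np.conj(first_half)) / (2 * np.pi * delta_t)
```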

To conclude, this part has presented several points of difference between LTE V2X and DSRC, namely the architecture of each of the two technologies, the physical layer and the frame structure. The next section provides further clarification of the comparison between these two technologies based on a simulation that evaluates the packet delivery performance.

4 DSRC vs LTE V2X Comparative Study

In this section, a comparative study is carried out between the DSRC protocol and LTE V2X, and a table is presented which summarizes the comparison between DSRC and LTE V2X based on several criteria, allowing the performance of each protocol to be differentiated. DSRC is based on the IEEE 802.11p standard, which is an amended version of IEEE Std. 802.11a designed to take advantage of the distributed capability and simplicity of operation of 802.11 networks, such as dynamic spectrum access, rapid deployment, and efficient network access. As a matter of fact, various V2X technologies have been developed to support ubiquitous, large-scale, high-performance communication methods for vehicle users, including both IEEE 802.11 V2X and cellular V2X (C-V2X). There are three major stages in the improvement of V2X applications. The first and second stages focus on the areas of ITS telematics and advanced driver assistance, respectively. As the era of 5G approaches, V2X technology moves into the third stage, which can support a wider range of advanced automotive applications, such as autonomous vehicles, remote and cooperative driving, and real-time environmental perception and control for ITS [19]. With its development, LTE has achieved great success around the world, and LTE-V2X can greatly benefit from the design, integration and scale of the LTE market. With the versatile communication types, from one-to-one to one-to-many transmissions in LTE, and the harmonized reuse of the SAE application layer standard, the standardization of LTE-V2X in 3GPP can focus on standard developments for the radio and network layers with a spectrally efficient air interface. Based on simulation results, the performance of LTE V2X can be shown to be superior to that of IEEE 802.11p; LTE-V2X can provide better performance and leverage successful deployments and its ecosystem [20]. The following table shows the comparison points between DSRC and LTE V2X based on several criteria, namely channel width, frequency band, bit rate, range, capacity, coverage, mobility support, and market penetration (Table 1).


Table 1. Comparison between DSRC and LTE-V2X

Feature              DSRC             LTE-V2X
Channel width        10 MHz           10 MHz, up to 20 MHz
Frequency band       5.86–5.92 GHz    3.4–3.8 GHz / 5.9 GHz
Bit rate             3–27 Mbit/s      Up to 1 Gbit/s
Range                Up to 1 km       Up to 30 km/h
Capacity             Medium           Very high
Coverage             Intermittent     Ubiquitous
Mobility support     Medium           Very high
Market penetration   Low              High

To summarize this part, we can see from the comparative table that LTE V2X exceeds DSRC, especially for the channel width, with a very high capacity as well as better mobility support and market penetration.

5 Simulation and Performance Evaluation of the DSRC and LTE V2X Protocols

In this section, the article presents the approaches used to compare DSRC and LTE for vehicular communication. There are several software tools, namely OMNeT++ and ns-3, widely used to develop V2X simulations. For the evaluation of DSRC versus LTE in this article, we used the NS-3 software, a discrete-event network simulator that uses C/C++ programming to create networking scenarios for DSRC and LTE. In order to analyze the results of the simulation, the simulator's raw data output is imported into other software, such as MATLAB, to create a visual representation of the results.

• Motivation for using the NS-3 simulator:

– NS-3 provides a controlled environment to perform experimental evaluation of protocols when equipment is limited.
– NS-3 contains model libraries to simulate the Wireless Access in Vehicular Environments (WAVE) architecture; it is considered one of the most reliable simulators for testing V2X protocols.
– NS-3 is a discrete-event simulator, which means that the simulation time is updated on an event basis.
– NS-3 is developed to study the V2V, V2I and V2X communication models in urban and road environments. In this design, several parameters are taken into account, and each of them must be carefully configured in order to avoid complications.


A. DSRC setup

In order to test the performance of DSRC, various tests were created with parameters such as traffic type, maximum latency, congestion and range. In each scenario, the packet delivery success rate is measured against each parameter. The overall maximum latency was one of the major factors in the packet delivery success rate: it determines the time allowed for delivering a message, and if a message is not delivered within this time it is counted as a failure. Congestion and range measure, respectively, the number of vehicles on the road and the distance over which vehicles deliver messages. The DSRC simulation output is the packet delivery success rate as a percentage at different ranges for each of the different parameters. For the DSRC tests, congestion tests started at 20 vehicles and increased to 160 vehicles in increments of ten. These congestion tests were run with three maximum latencies of 10 ms, 50 ms and 100 ms. These latencies were chosen because the standard requires message delivery within 100 ms.

B. LTE V2X setup

The LTE code settings were not as flexible as the parameters of the DSRC code, and the code also did not provide a user-friendly output. The parameters tested were latency and congestion, using the packet delivery success rate as the test metric. As output, it gives a list of all messages received by the different ports using the LTE communication standard, with the packet delivery success rate attached to each message. For comparison, LTE and DSRC were tested under the same conditions using the congestion and latency parameters. The congestion test was carried out at 200 m with the motorway scenario at the same three latencies. Although there are only three different tests, the data from these tests provide a good comparison with the highway part of the DSRC congestion tests.

C. Simulation Parameters (see Table 2)

Table 2. Simulation parameters

Feature                DSRC                LTE-V2X
Carrier frequency      5 GHz               5 GHz
Time synchronization   Ideal time synch    Ideal time synch
Range                  Up to 200 m         Up to 200 m
Nb vehicles            20–160              20–160
Latencies              10, 50, 100 ms      10, 50, 100 ms


D. Simulation Results

This part presents the results of the LTE vs. DSRC congestion comparison at 10 ms, 50 ms and 100 ms latency, based on the effect of the congestion on the packet delivery success rate. The post-processing sketch below illustrates how such curves can be produced from the simulator output.
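This sketch uses Python and matplotlib in place of the MATLAB post-processing mentioned earlier; the CSV file name and its columns are hypothetical exports of the NS-3 run, not the paper's measured results.

```python
# Plot the packet delivery success rate against congestion from an exported CSV file.
import csv
import matplotlib.pyplot as plt

def plot_pdr(csv_path, title):
    """Assumed CSV columns: vehicles, pdr_dsrc, pdr_lte (one row per congestion level)."""
    vehicles, pdr_dsrc, pdr_lte = [], [], []
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            vehicles.append(int(row["vehicles"]))
            pdr_dsrc.append(float(row["pdr_dsrc"]))
            pdr_lte.append(float(row["pdr_lte"]))
    plt.plot(vehicles, pdr_dsrc, marker="o", label="DSRC")
    plt.plot(vehicles, pdr_lte, marker="s", label="LTE-V2X")
    plt.xlabel("Number of vehicles")
    plt.ylabel("Packet delivery success rate (%)")
    plt.title(title)
    plt.legend()
    plt.show()

plot_pdr("congestion_10ms.csv", "Congestion comparison at 10 ms latency")
```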

Fig. 6. Congestion comparison at 10 ms latency LTE vs. DSRC

It can be seen from Fig. 6 that LTE performs better than DSRC at the lowest latency which is 10ms; this comparison is based on how the packet delivery success rate is affected by the congestion.

Fig. 7. Congestion comparison at 50 ms latency LTE vs. DSRC

For a latency of 50 ms, we can see that DSRC improves and approaches LTE, so the packet delivery success rate improves when the allowed latency increases. From Fig. 8 we can see that for the maximum latency, which is 100 ms, both have the same level of packet delivery rate.


Fig. 8. Congestion comparison at 100 ms latency LTE vs. DSRC

To conclude this part, we see that LTE V2X is more efficient than DSRC; on the other hand, for the simulation with the latency set to the maximum, we obtain the same result for both.

6 Discussion

From the comparative table we notice that, based on the channel width criterion, LTE V2X can reach up to 20 MHz whereas DSRC reaches just 10 MHz. For the frequency band there is parity, because both can reach up to 5.9 GHz. For the bit rate, LTE V2X exceeds DSRC. For the range, LTE V2X can reach up to 30 km/h (and in other simulations it can be tested up to 120 km/h). For the capacity, LTE V2X is more efficient than DSRC, and the same holds for mobility support and market penetration, where LTE V2X has the upper hand over DSRC. Based on the simulation and performance evaluation of the DSRC and LTE V2X protocols, the LTE testing is not as thorough as the DSRC testing; nevertheless, a comparison can be made based on the effect of congestion on the packet delivery success rate. The LTE and DSRC data show that LTE performs very well and better than DSRC for the low maximum latency of 10 ms; however, higher congestion levels decrease the packet delivery success rate. As the authorized maximum latency increases, the performance improves: in Figs. 7 and 8 we see that DSRC improves and approaches LTE until, with the latency set to the maximum of 100 ms, both reach the same level of packet delivery success.

7 Conclusion and Future Works

To conclude, several studies and research works have been carried out to compare the effectiveness of the direct communication technologies LTE-V2X PC5 and 802.11p from the point of view of avoided accidents and the reduction of fatal and serious injuries. These studies show that LTE-V2X achieves a high level of accident avoidance and injury reduction. They also indicate that LTE-V2X achieves a high percentage of successful packet delivery and a large communication range.


From the simulation in our paper, we can notice that LTE V2X exceeds DSRC technology on several levels, but there are also equalities that we observed in the simulation of the effect of congestion on the packet delivery success rate, especially when the maximum latency is set to its highest value. LTE V2X thus offers great capacity and performance; on the other hand, the best solution for an efficient performance of autonomous vehicles is the combination of the two technologies. In our next work, we will propose a coexistence solution: a hybrid approach deploying both DSRC and LTE-V2X, which would combine the advantages of both technologies to generate a more efficient and promising solution for vehicular communication. For example, DSRC supports more robust safety message delivery than LTE-V2X, while a high data transmission rate is provided by LTE V2X. This solution is based on a selection algorithm that enables a heterogeneous LTE/DSRC solution, where LTE and/or DSRC are selected according to the services. Each vehicle is assumed to be equipped with both LTE and DSRC interfaces. This proposed heterogeneous LTE/DSRC approach will rely on the available radio access technologies and infrastructure to support future automated driving with high reliability and low latency requirements. The approach will make it possible to provide low latency for safety-related messages transmitted over DSRC, and high reliability for bandwidth-intensive services over LTE, integrating these two radio access technologies while taking into account the requirements, the services and the network performance in real time.
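A conceptual sketch of such a service-based selection is shown below. The service categories and thresholds are illustrative assumptions, since the actual selection algorithm is left for the authors' future work.

```python
# Illustrative sketch of a heterogeneous LTE/DSRC selection rule for a dual-interface vehicle.
def select_radio(service_type, latency_budget_ms):
    """Return the radio access technology to use for a given service."""
    if service_type == "safety" or latency_budget_ms <= 10:
        return "DSRC"        # robust, low-latency delivery of safety-related messages
    return "LTE-V2X"         # bandwidth-intensive or delay-tolerant services

print(select_radio("safety", 10))          # -> DSRC
print(select_radio("infotainment", 100))   # -> LTE-V2X
```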

References
1. Nardini, G., Virdis, A., Campolo, C., Molinaro, A., Stea, G.: Cellular-V2X communications for platooning: design and evaluation. Sensors (2018)
2. Shen, X., Li, J., Chen, L., Chen, J., He, S.: Heterogeneous LTE/DSRC approach to support real-time vehicular communications. In: 2018 10th International Conference on Advanced Infocomm Technology (2018)
3. Shi, M., Lu, C., Zhang, Y., Yao, D.: DSRC and LTE-V communication performance evaluation and improvement based on typical V2X application at intersection. In: 2017 Chinese Automation Congress (CAC) (2017)
4. Molina-Masegosa, R., Gozalvez, J., Sepulcre, M.: Configuration of the C-V2X Mode 4 sidelink PC5 interface for vehicular communication. In: 2018 14th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN) (2018)
5. Mir, Z.H., Filali, F.: LTE and IEEE 802.11p for vehicular networking: a performance evaluation. EURASIP J. Wirel. Commun. Netw. 1 (2014)
6. Harris, M.: How Cell Towers Work. UNISON, New York (2011)
7. Shi, M., Lu, C., Zhang, Y., Yao, D.: DSRC and LTE-V communication performance evaluation and improvement based on typical V2X application at intersection. In: 2017 Chinese Automation Congress (CAC) (2018)
8. Kenney, J.: Dedicated short-range communications (DSRC) standards in the United States. Proc. IEEE 99(7), 1162–1182 (2011). https://doi.org/10.1109/JPROC.2011.2132790
9. Gao, S., Lima, A., Bevly, D.: An empirical study of DSRC V2V performance in truck platooning scenarios. Digital Communications and Networks. Elsevier (2016)
10. Architecture of the dedicated short-range communications (DSRC) protocol. In: VTC '98. 48th IEEE Vehicular Technology Conference. Pathway to Global Wireless Revolution (Cat. No.98CH36151) (2002)
11. Kenney, J.B.: Dedicated short-range communications (DSRC) standards in the United States. Proc. IEEE 99(7), 1162–1182 (2011). https://doi.org/10.1109/JPROC.2011.2132790
12. Code of Federal Regulations, Title 47, Part 90, Private Land Mobile Radio Services, U.S. FCC, CFR 47 Part 90. https://www.access.gpo.gov/nara/cfr/waisidx_08/47cfr90_08.html
13. Code of Federal Regulations, Title 47, Part 95, Personal Radio Services, U.S. FCC, CFR 47 Part 95. https://www.access.gpo.gov/nara/cfr/waisidx_08/47cfr95_08.html
14. Yin, J., Elbatt, T., Yeung, G., Ryu, B.: Performance evaluation of safety applications over DSRC vehicular ad hoc networks. In: Proceedings of the First International Workshop on Vehicular Ad Hoc Networks, Philadelphia, PA, USA (2004)
15. Hu, J., Chen, S., Zhao, L., Li, Y., Fang, J., Li, B., Shi, Y.: Link level performance comparison between LTE V2X and DSRC. J. Commun. Inf. Netw. (2017)
16. RP-152293, "Support for V2V services based on LTE sidelink," LG Electronics, Huawei, CATT, RAN#70 (2015)
17. RP-161298, "LTE-based V2X services," LG Electronics, Huawei, CATT, RAN#72 (2016)
18. Karoui, M., Freitas, A., Chalhoub, G.: Performance comparison between LTE-V2X and ITS-G5 under realistic urban scenarios. In: 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring) (2020)
19. Zhou, H., Xu, W., Chen, J., Wang, W.: Evolutionary V2X technologies toward the internet of vehicles: challenges and opportunities. Proc. IEEE 108(2) (2020)
20. Zhao, L., Fang, J., Hu, J., Li, Y., Lin, L., Shi, Y., Li, C.: The performance comparison of LTE-V2X and IEEE 802.11p. In: 2018 IEEE 87th Vehicular Technology Conference (VTC Spring) (2018)

Dynamic on Demand Responsive Transport with Time-Dependent Customer Load

Sonia Nasri1(B), Hend Bouziri2,3, and Wassila Aggoune-Mtalaa3

1 Higher Business School of Tunis, Manouba University, Tunis, Tunisia
[email protected]
2 Higher School of Economic and Commercial Sciences Tunis, Tunis University, Tunis, Tunisia
[email protected]
3 Luxembourg Institute of Science and Technology, 4362 Esch/Alzette, Luxembourg
[email protected]

Abstract. Ensuring Dial-A-Ride services in a purely dynamic framework is a challenging task. Indeed, related works that consider both dynamic requests and a time-dependent DARP environment are scarce. In this paper, we propose a new enhanced model for dynamic DARPs with network flows over time. Requests with time-dependent customer loads are introduced. More rigorous travel time computations are achieved based on rate-dependent transit times which vary with the degree of congestion. A new adaptive Tabu Search method is provided to cope with the dynamic requirements. Preliminary results indicate the applicability of the proposed methodology to real-life on-demand transport problems. Promising results demonstrate the ability of the method to dynamically insert requests in a time-dependent environment. Keywords: Transport on demand · Dynamic Dial-A-Ride · Rate-dependent transit time · Time-dependent demand · Flows over time

1 Introduction

Transport on demand (TOD) was introduced in the form of Dial-A-Ride Problems (DARPs) [10], which are NP-hard [14]. The flexibility of such systems lies in their adaptive services providing door-to-door transportation, including that of disabled and elderly people. Although this latter type of transport system concerns a limited social category, it has attracted considerable research interest [1,19,28]. Moreover, it has opened up other areas of application in various fields such as integrated transportation, health care transportation services, transport complementary to public transport, and private on-demand mobility. What differentiates this problem from others is the quality of service


[26] offered to customers. In this respect, the authors in [25] emphasized the synergy between travel costs and service quality in DARPs. They also stated that common service quality terms focus on the maximal ride time and time windows expressed in the form of constraints, such as in the works of Wong et al. [36] and Chassaing et al. [8]. The operational effects of service level designs on operational costs were recently highlighted by Molenbrush et al. in [21]. In this regard, only a few works [8,20,27] address service level designs aligned with customer requirements in DARP models. One service quality design is to define a maximal ride time for each customer, to be exploited for providing time windows restrictions. To our knowledge, the most customized design efforts for service quality are found in [22]. Restricted maximal ride times are proposed for customers. These customer-dependent attributes are then used with other customer specifications, such as the charging time, the desired arrival time, the origin, the destination, and the user inconvenience (the travelling time from an origin to a destination), for designing rigorous and elementary bounds relating to time windows.

In this paper, we are motivated by a real-life application of DARPs arising in a dynamic context. Two classes are addressed in the literature. The first one is related to dynamic requests under a static environment [4] and the second one concerns a dynamic environment with requests known beforehand [15]. In our opinion, to enhance the flexibility and the adaptability of DARPs in real-life frameworks, both classes should be considered while satisfying the high expectations of today's customers. Only a small body of research [33,37] has been dedicated to this issue. For instance, the authors in [33] integrated a time-dependent travel speed in solution methods for the dynamic dial-a-ride problem. A dynamic dial-a-ride problem under a time-dependent environment is studied in [37]. However, in these previous works, the models require more precision to compute travel times closer to reality. Such insight was provided as the rate-dependent transit time in a model first proposed in [13]. This model is useful to give more precision on transit time computations, where a dispatching vehicle must consider the flows at each vehicle's position on an arc for each period of time. Hall and Schilling [13] stated that all rate-dependent transit time models are NP-hard problems.

Our contribution is twofold. Firstly, we propose a new dynamic model for DARPs incorporating customer-oriented service quality to address the expected transport on demand needs. In this new problem, we address a dynamic insertion of requests under a time-dependent environment where transit times are rate-dependent. The originality of the model is that, aside from the gradual insertion of the requests over time, we propose a time-dependent customer load where the number of passengers related to a request may fluctuate over time. Secondly, to solve this new problem, we propose a new Dynamic Tabu Search method (DTS). An insertion heuristic for the dynamic requests is provided. A new neighbourhood strategy is suggested, seeking to improve solutions in such a dynamic framework. Therefore, two scenarios, static and dynamic, are simulated to show the applicability of the model as well as the impact of the dynamic events on the problem. Experiments are investigated based on real-life transportation problems reported in reference [7].


The remainder of this paper is organized as follows. In Sect. 2, a brief literature review is proposed. Next, the problem is modelled in Sect. 3. In Sect. 4, we describe the proposed resolution method for the problem. In Sect. 5, we present our numerical experiments followed by a discussion and concluding remarks in Sect. 6.

2 Related Work

In general, two critical factors impact the vehicle routing and the scheduling of requests in dynamic DARPs: dynamic requests and travel time variation in a dynamic environment. These two factors cover the cases treated in the majority of works. For instance, dynamic requests under a static environment are treated in [4], and a dynamic environment with requests known in advance is the subject of [15]. In [4], a part of the requests is known beforehand and the other requests are scheduled over time. The degree of dynamism is defined by Lund et al. [18] as the proportion of the number of dynamic requests relative to the total number of requests in the system. A purely static system has a degree of dynamism of 0 and a purely dynamic system has a degree of dynamism of 1. In this regard, several degrees of dynamism were considered in various works on DARPs [12,32,36]. Operational techniques acting on the vehicle fleet size and request rejection are proposed in [36] to cope with the variation in the degree of dynamism. The authors of [12] introduced a greedy insertion heuristic measuring the impact of each request's insertion on the others not yet inserted. This tool is useful for the dynamic insertion of requests in DARPs, but a lack of robustness may appear with a higher number of inserted requests. A double dynamic fast algorithm was proposed in [5] for coping with dynamic requests under constant travel costs. The algorithm aims firstly at checking the requests' insertion in an existing schedule, and secondly at trying to optimize the solutions over the search. In [15], the dynamic environment may be simulated through real-time traffic conditions such as congestion, accidents, and weather. These events may be produced at each period of time, simulating time-dependent travel times. For instance, the DARP defined in [16] is based on known demands and time-dependent travel times which are translated into average travel speeds. Furthermore, in routing problems, there are three models of traffic-dependent travel time: inflow-dependent transit times, load-dependent transit times, and rate-dependent transit times. The inflow-dependent and load-dependent models are introduced in [17]. In the inflow-dependent model, the travel time depends on the current rate of congestion on the roads. In the load-dependent model, the travel time depends on both the current and the new incoming traffic on the roads. These two models are enhanced in the model of [13], named the rate-dependent transit time model. In the latter model, the travel time depends on the traffic at each vehicle's position on the arc at each period of time. The only model which considers this rigorous travel time model is that of [23]. The authors proposed a static insertion of requests through a DARP


under a time-dependent environment with a rate-dependent transit time. There are only a few works where both the requests and the environment are dynamic, as this increases the complexity of the problem. In the dynamic DARP of Xiang et al. [37], stochastic events are suggested to simulate a time-dependent network. The travel time is related to a period of time in a day. Various scenarios with different degrees of dynamic requests are investigated, indicating the capability of a proposed scheduling heuristic to cope with the events. In [6], the dynamic insertion of requests is managed in a discrete event environment in which events may produce vehicle stops and delays caused by traffic congestion. A time-dependent DARP with dynamic requests is proposed in [33]. The authors highlighted the relevance of the time-dependent travel speed which influences the solutions. This paper follows the direction of the preceding works addressing dynamic DARPs. It includes the two main factors combined together, namely the dynamic insertion of the requests and the variation of the travel time within a time-dependent environment.

3 The Problem Description and Formulation

3.1 Problem Description

The dynamic DARP with time-dependent customer load is defined on a graph with a set of nodes and a set of arcs having symmetrical distances. This transportation network is treated here in an ongoing manner since transport requests appear over time. A request concerns exactly one pickup node and one delivery node. The number of requests is assumed to equal half of the number of nodes other than the depot. The depot is the point of departure and arrival of the vehicle tours; no demand is assigned to the depot. For this dynamic DARP, a predefined number of homogeneous vehicles is available to service all the requests. The number of passengers related to a request may exceed one. This number is the same on the pickup node and the delivery node. The assumptions on the dynamic requests are summarized as follows:

– The number of requests is predefined and these requests must be fully satisfied over time.
– All requests are dynamic, so no request is known beforehand by the system.
– The number of passengers in a request may vary over time. This fluctuation is possible from its appearance on the pickup node until the arrival of the vehicle at the same node.
– The minimum number of passengers in a request is always assumed to be one.
– Requests cannot be rejected but may be reinserted in another period of time.

Moreover, a dynamic evolution of the information is assumed on the set of arcs connecting the nodes. Dynamic assumptions are set on the environment. At the initial time of the horizon, the arcs are supposed to be empty. Traffic congestion is assumed on the arcs at each period of time. The time-dependent congestion fluctuates according to each vehicle's position on an arc. The quality of service is defined

DDRT

399

from a customer-oriented point of view. Indeed, each node has time windows. Moreover, for these bounds, we consider customer-oriented constraints set as it is in [22,24]. These constraints are defined based on customers’ expectations and needs. Thus, time windows are separately redefined on origins and destinations. In these bounds, initial time windows which are suggested by the transportation system are redefined for each customer either on the pickup or the delivery service. This redefinition is based on a higher level of customers’ specifications such as the desired arrival time, the time to charge or discharge loads (the passengers), the customer related maximal ride time, and the travel time between the origin and the destination. Provided bounds are used for limiting beginning of services on nodes. The only case which produces a waiting time is when a vehicle arrives before the start of the service. 3.2

Problem Formulation

The dynamic DARP with time-dependent customer load is defined on a symmetric graph G = (N, A). Let N be the set of nodes, where N = {0, 1, ..., n, n+1, ..., 2n}. The set of pickup nodes is P = {1, ..., n} and the set of delivery nodes is D = {n+1, ..., 2n}. The depot is represented by the node i = 0. Each vehicle's tour starts from the depot and returns to it. A request corresponds to a set of passengers to be transported from an origin i to a destination i + n. The set of the problem's parameters is described in Table 1.

Table 1. The parameters of the problem

Parameters        Definition
n                 Total number of requests
m                 Total number of vehicles
v                 Vehicle number, where v ∈ {1, ..., m}
N^v               Set of nodes traversed by a vehicle v
A^v               Set of arcs traversed by a vehicle v
x_{i,j}^θ         Load of an arc (i,j) at a period of time θ ∈ T
C_{i,j}^θ         Capacity of an arc (i,j) at a period θ ∈ T
n_p               Total number of positions on arcs
p                 Position on an arc (i,j)
λ_{i,j}(θ, p)     Transit time on an arc (i,j)
α, β              Two constant parameters
x_i^θ             Load to pick up or to deliver at node i at time θ ∈ T
C_i               Maximal number of passengers at node i ∈ N
C                 Maximal vehicle capacity
R_θ               Number of incoming requests at θ ∈ T
γ_θ               Requests' acceptance rate for a period of time θ ∈ T
δ                 Penalty term in the objective function
W_i^v             Waiting time of a vehicle v at node i ∈ N
B_i               Beginning of service at node i
Ar_i^v            Arrival time of vehicle v at node i

Given a time horizon T, each arc (i,j) ∈ A may witness an amount of traffic x_{i,j}^θ for every θ ∈ T. Besides, each arc has a time-dependent practical (residual) capacity C_{i,j}^θ. The set of nodes also undergoes changes over time. Each node i ∈ N has a time-dependent load x_i^θ which is equal to its current number of passengers linked with the same transportation request at θ ∈ T. This customer load may change over time while respecting a maximal bound. A solution S in the search space is a set of vehicle tours. A vehicle tour corresponds to a set of satisfied requests, starting and ending at the depot. The objective function is defined in (1). It is the sum of all the transit times on the arcs traversed by the vehicles and all the penalized waiting times at the visited nodes.

\min f(S) = \sum_{v=1}^{m} \sum_{(i,j) \in A^v} \sum_{p=0}^{n_p} \lambda_{i,j}(\theta, p) + \sum_{v=1}^{m} \sum_{i \in N^v} \delta W_i^v, \quad \forall \theta \in T    (1)
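As a concrete illustration, the following minimal sketch (not the authors' implementation) evaluates the objective (1) for a candidate solution, assuming the per-position transit times and the waiting times for the current period have already been computed; all names are illustrative.

```python
# Illustrative evaluation of objective (1) for a candidate solution S.
# transit_time[(i, j)] holds the fractional transit times lambda_{i,j}(theta, p)
# for the positions p = 0..n_p of arc (i, j) in the current period theta;
# waiting_time[(v, i)] holds W_i^v; delta is the waiting-time penalty.

def objective(tours, transit_time, waiting_time, delta):
    """tours: one list of visited node ids per vehicle; consecutive nodes
    in a tour define the traversed arcs."""
    total = 0.0
    for v, tour in enumerate(tours):
        for i, j in zip(tour[:-1], tour[1:]):        # arcs (i, j) in A^v
            total += sum(transit_time[(i, j)])       # sum over positions p
        for i in tour:                               # nodes in N^v
            total += delta * waiting_time.get((v, i), 0.0)
    return total
```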

The objective of the problem consists in minimizing the travel costs of the vehicle tours. The transit time on an arc is equal to the sum of all the fractional transit times related to the n_p positions, where each position has a particular congestion level at a period of time θ ∈ T. A penalty term δ is applied to the waiting times, which are expressed in Eq. (2).

W_i^v = B_i - Ar_i^v, \quad \forall i \in N^v, \forall v \in \{1, \dots, m\}    (2)

A waiting time occurs when a vehicle's arrival time Ar_i^v precedes the beginning of service B_i. The arrival time Ar_i^v at node i ∈ N^v is equal to the total transit time from the depot (i = 0) to that node. No waiting time is maintained at the depot; thus, W_0^v = 0 at the beginning and at the end of the tour. New constraints are introduced to cope with the requirements of the dynamic insertion of requests under time-dependent loads (see Eqs. (3) to (6)). Other constraints are used as in [11,22,23], namely customer-oriented constraints as well as standard DARP constraints such as precedence between pickups and deliveries, uniqueness of node visits, and maximal tour duration. These constraints are not detailed here, but they are considered in the implementation of the resolution method. For the sake of clarity, we focus here on the rate-dependent transit time constraints [23] (see Eqs. (7) to (9)) coping with dynamic network flows on the arcs. Equation (3) expresses the dynamic arrival of requests over time, whereas Eq. (4) constrains the acceptance rate of the requests.

\sum_{\theta=0}^{T} R_\theta \le n    (3)

R_\theta \le \gamma_\theta \, n, \quad \forall \theta \in T    (4)

Equation (3) states that the total number of incoming requests over the time horizon T must not exceed the total number of requests in the transportation system. Besides, the dynamic allocation of requests is scheduled over time using a parameter γ_θ, as described in (4). It contributes to calibrating the maximal number of incoming requests at each period of time θ ∈ T. This parameter is updated taking into account the remaining requests waiting for allocation and other parameters such as vehicle availabilities and capacities. This operation is useful for avoiding request rejection by the system through a dynamic management of the acceptance rate (see the heuristic in Sect. 4). To simulate a more flexible passenger demand, the time-dependent customer load is assumed to fluctuate over an interval of time. This interval starts at the period θ′ when the request appears at node i and finishes at θ″, which corresponds to the period of the vehicle's arrival at the same node. Thus, a demand at node i must not exceed its maximal capacity. Equation (5) defines the time-dependent customer load at pickup nodes.

1 \le x_i^\theta \le C_i, \quad \forall \theta \in \{\theta', \dots, \theta''\},\ \theta', \theta'' \in T, \quad \forall i \in P    (5)

Equation (5) states that the load of a request at node i ∈ P may exceed one passenger; however, this load must not exceed the maximal demand C_i. The load is considered negative at the delivery nodes in D. Besides, the control of the vehicle capacities over time is expressed by (6).

\sum_{i \in N^v} x_i^\theta \le C, \quad \forall \theta \in T, \forall v \in \{1, \dots, m\}    (6)

In (6), the cumulative load over the nodes visited by a vehicle v must respect its maximal capacity. This constraint considers both pickup and delivery nodes by assigning positive and negative values, respectively. Traffic congestion is controlled through constraints (7) and (8), which control the arc capacities over time.

x_{i,j}^\theta \le C_{i,j}, \quad \forall \theta \in T, \forall (i,j) \in A    (7)

C_{i,j}^\theta = C_{i,j} - x_{i,j}^\theta, \quad \forall \theta \in T, \forall (i,j) \in A    (8)

Equation (7) states that an arc flow x_{i,j}^θ must not exceed the maximal capacity C_{i,j} of arc (i,j) at each period of time. The practical capacity of an arc at time θ corresponds to its maximum possible flow at θ and is derived by (8). Practical capacities of arcs are computed over time, providing the residual arc capacities. Moreover, to compute a more accurate travel time in a time-dependent transportation network, a rate-dependent transit time is included in the problem as in [23]. A rate, expressed as x_{i,j}(θ)/C_{i,j}(θ), indicates the degree of congestion at each period of time θ ∈ T. This rate impacts the transit time λ_{i,j} related to the position p ∈ {0, ..., n_p} of a vehicle and the period of time θ ∈ T. The total number of positions n_p is assumed common to all the arcs. The rate-dependent transit time at a position on an arc is expressed by Eq. (9).

\lambda_{i,j}(\theta, p) = \lambda_{i,j}(0, p) \left( 1 + \alpha \left( \frac{x_{i,j}(\theta)}{C_{i,j}(\theta)} \right)^{\beta} \right), \quad \forall (i,j) \in A, \forall p \in \{0, \dots, n_p\}, \forall \theta \in T    (9)

In Eq. (9), an initial transit time λ_{i,j}(0, p) is assumed for a free-flow (empty) arc at time θ = 0 at a given position p ∈ {0, ..., n_p}. This transit time then changes at each period of time according to the congestion degree on the arc. The fluctuation is governed by two traffic network parameters, α and β, which are constants defined and justified in [35]. These parameters determine the level of transit time fluctuation relative to the case in which the arc is empty.
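The following minimal sketch shows how the rate-dependent transit time of Eq. (9) can be computed; the free-flow transit time, the arc load and residual capacity, and the constants α and β are assumed to be known for the period under consideration, and the numerical values used in the example are purely illustrative.

```python
# Rate-dependent transit time of Eq. (9) for one position p of an arc (i, j).
def rate_dependent_transit_time(lambda0, load, capacity, alpha, beta):
    """lambda0: free-flow transit time lambda_{i,j}(0, p);
    load / capacity: x_{i,j}(theta) and C_{i,j}(theta) for the current period."""
    rate = load / capacity                       # congestion degree on the arc
    return lambda0 * (1.0 + alpha * rate ** beta)

# Example: a 12 s free-flow position on an arc loaded at 60% of its capacity,
# with alpha = 0.15 and beta = 4 (illustrative values only).
print(rate_dependent_transit_time(12.0, 60, 100, 0.15, 4))   # ~12.23 s
```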

4 The Dynamic Tabu Search Method

As exact methods have a limited efficiency on complex optimization problems [2,3,34], we propose here an efficient heuristic to cope with the complexity of the dynamic constraints [29–31]. The robustness of Tabu Search has been proven for various real-life transport-on-demand problems in the static case [11,22]. In [22], a simple insertion heuristic with complexity Θ(n, m) is proposed. This heuristic follows the nearest successors while allocating requests to the dispatched vehicles. The delivery operations are performed according to a list of requests sorted by minimal riding times; pickups are executed first and deliveries last, according to the nearest distances to the depot. Besides, a neighbourhood strategy was proposed to improve the search for better transport-on-demand plans in a static framework. It aimed to optimize the vehicle routing schedules while transporting passengers from their origins to their destinations, the optimization consisting in moving a randomly chosen allocated request to another vehicle path. Given the efficiency of Tabu Search in providing good results for the DARP, we investigate it to solve our problem while adding new settings to cope with the dynamic requirements. Therefore, we propose a new Dynamic Tabu Search (DTS) based on a set of components including a tabu list and a diversification mechanism. In the tabu list, we save the best solutions encountered during the search. These solutions are overlooked for a predetermined number of iterations, which corresponds to the tabu list length. The latter is also used to diversify the search: it is reduced for a random time duration, allowing new promising solutions to appear during the search.
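A schematic sketch of such a tabu loop is given below; it is not the authors' DTS, and the neighbourhood generator (e.g. relocating a random request to another vehicle path), the evaluation function f and the feasibility checks are placeholders that the surrounding DARP model would have to supply.

```python
import random

def tabu_search(initial, neighbours, f, iterations=500, tenure=20, diversify_every=50):
    """initial: starting solution; neighbours(s): candidate solutions around s;
    f(s): cost of a solution (lower is better)."""
    best = current = initial
    tabu = []                                    # recently accepted solutions
    for it in range(1, iterations + 1):
        # diversification: occasionally shrink the effective tabu-list length
        active = random.randint(1, tenure) if it % diversify_every == 0 else tenure
        candidates = [s for s in neighbours(current) if s not in tabu[-active:]]
        if not candidates:
            continue
        current = min(candidates, key=f)         # best admissible neighbour
        tabu.append(current)
        if f(current) < f(best):
            best = current
    return best
```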

4.1 The Insertion Heuristic for the Dynamic Requests

A heuristic for the dynamic insertion of the requests is proposed to construct the initial feasible routing plan. The dynamic nature of the problem requires a periodical check of the constraints. The main steps of the dynamic insertion of the requests are presented in Fig. 1. At the beginning of the process (θ = 0), an acceptance rate for the requests is assumed to satisfy the condition (0

Road classification               Green interval
Major arterial (>60 km/h)         50–70 s
Major arterial (≤60 km/h)         40–60 s
Minor arterial                    30–50 s
Collector & distributor, local    20–40 s

As stated previously, the green interval is set between the minimum and maximum green parameters defined above. However, the main aim of this paper is to suggest the optimal value for the green phase based on the different time intervals of the day and the demand patterns (i.e. vehicle densities), in order to adjust the traffic signal timings. Given the cycle length and the average vehicular density, the optimum value for the green phase can be calculated using the following equation [21]:

G_opt = (d × C) / (1200 × n) + 1    (1)

where d is the average vehicular density, C is the cycle length, n is the number of lanes, and 1200 is the average saturation flow, i.e. the average number of vehicles passing in a dense flow of traffic. Using formula (1), Table 7 lists the calculations done by taking into account all of the parameters mentioned previously, including the cycle lengths set for the different time intervals and the minimum and maximum green intervals:

Table 7. Optimal green duration based on cycle length and vehicular density (for a single-lane, high-capacity freeway)

Density                  Optimal green, Gopt (in seconds), per cycle length (in seconds)
(vehicles/hour/lane)     60     70     80     90     100
10%                      10     10     10     10     10
20%                      11     13     14     16     18
30%                      16     19     21     24     26
40%                      21     24     28     31     34
50%                      26     30     34     39     43
60%                      31     36     41     46     51
70%                      36     42     48     54     59
80%                      41     48     54     61     68
90%                      45     54     61     69     70
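A direct transcription of formula (1) into code is sketched below, with the result clamped to a minimum and maximum green interval as discussed above; the 10 s and 70 s bounds and the example values are assumptions chosen only for illustration and can be replaced by the appropriate row of the road-classification table.

```python
# Optimal green phase from formula (1), clamped to a [g_min, g_max] interval.
def optimal_green(density, cycle, lanes=1, saturation=1200, g_min=10, g_max=70):
    """density: average vehicular density (vehicles/hour/lane);
    cycle: cycle length in seconds; saturation: average saturation flow."""
    g = density * cycle / (saturation * lanes) + 1
    return max(g_min, min(g_max, round(g)))

# Example: 480 vehicles/hour/lane on a single-lane approach with a 90 s cycle.
print(optimal_green(480, 90))   # -> 37
```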


The main objective of this contribution is to optimize the traffic signal timings and, as a result, to minimize traffic congestion. In addition to the suggestions given so far, a simulation was carried out on a 3 × 3 intersection, as illustrated in Fig. 5. The cycle length for this case is set to 80 s, as suggested for rush hours. Given the vehicular densities of the roads, the optimal green durations for each single intersection are defined based on the values from Table 7 and some heuristic adjustment. For instance, in the top-left intersection, the left approach has a 50-s green phase and a 30-s red phase; conversely, the top approach has a 30-s green phase and a 50-s red phase. The values indicated in Fig. 5 are optimal for a coordinated-actuated smart traffic control system based on the cycle length and the demand patterns. These suggestions should considerably improve the operation of the traffic lights, minimize the rate of traffic congestion and maximize driver expectancy. This can also be shown using a particular intersection from the case study above; for example, if the traffic lights were operating in the chaotic mode, the top-middle intersection would have a 40-s green and a 40-s red phase for both of the intersecting roads. Assuming a flow rate of 1 vehicle/second, the total number of vehicles exiting the intersection would be 70 at the end of the cycle. However, as a result of the optimization, the total number of vehicles passing through the intersection is 80.

Fig. 5. The set-up for a 3 × 3 intersection, given the vehicular densities of the roads and the optimal green durations for each single intersection

6 Conclusion

The current state of the traffic control systems' infrastructure, especially in many developed and modern cities, is concerning and should raise public awareness of the security issues in this field. As seen in the studies above, although DES has high performance capabilities, it should not be used for encrypting the signaling of traffic control systems positioned in critical street networks, since it can endanger traffic safety if hacked. Nevertheless, on local roads, where manipulation would not cause much chaos, it can be used as an alternative. On the other hand, AES and especially RSA should be optimized with respect to time constraints in order to serve faster, to prevent congestion, and to operate in the most secure manner, averting accidents in traffic. However, since the encryption and decryption times calculated above are not heavy loads in the operation of traffic lights, they can be minimized, or even eliminated, with the help of different techniques such as pipelining. In this way, traffic control systems will continue to function normally, without any extra burden coming from the encryption and decryption procedures. The last part of this paper discussed enhancements in traffic flow-control strategies, in order to solve or at least reduce the problem of congestion and its effects. As discussed previously, in smart traffic control systems, signal light timings are adjusted based on the demand patterns registered by the detector. Given the cycle length and the average vehicular densities of the roads, the optimal green durations were calculated, as listed in Table 7. The suggestions made in this part will improve the traffic signal timings, especially at intersections of high-capacity freeways with heavy traffic. The advanced time plan suggested in this paper to optimize the traffic signal timings considers only vehicular timing parameters, such as the minimum and maximum vehicular green intervals. Our next research will extend to the inclusion of pedestrian timing parameters, such as the pedestrian walk interval and pedestrian clearance, which will further enhance the optimization of smart traffic control systems.

References

1. Anderson, R.: Security Engineering, 2nd edn. Wiley India Private Ltd., New Delhi (2008)
2. Bujari, D., Aribas, E.: Comparative analysis of cryptographic algorithms and smart traffic control systems. In: International Conference on Engineering and Research & Applications, Istanbul (2017)
3. Singh, S.: The Code Book, 1st edn. Doubleday, New York (1999)
4. Schneier, B.: Applied Cryptography, 2nd edn. Wiley, Hoboken (1996)
5. Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. UI Press, Champaign (2015)
6. Sanayha, W.: Hardware Implementation of the Data Encryption Standard. Department of Telecommunications Engineering, King Mongkut's Institute of Technology (2002)
7. Ratnadewi, B., Adhie, R.P., Hutama, Y., Ahmar, A.S., Setiawan, M.I.: Implementation cryptography data encryption standard (DES) and triple data encryption standard (3DES) method in communication system based near field communication (NFC). J. Phys. Conf. Ser. 954, 012009 (2018)
8. Biham, E.: A fast new DES implementation in software. In: Biham, E. (ed.) Fast Software Encryption. FSE 1997. Lecture Notes in Computer Science, vol. 1267. Springer, Heidelberg (1997)
9. Park, S.J.: Analysis of AES Hardware Implementations. Department of Electrical and Computer Engineering, Oregon State University (2003)
10. Bertoni, G., Breveglieri, L., Fragneto, P., Macchetti, M., Marchesin, S.: Efficient software implementation of AES on 32-bit platforms. In: Kaliski, B.S., Koç, K., Paar, C. (eds.) Cryptographic Hardware and Embedded Systems - CHES 2002. Lecture Notes in Computer Science, vol. 2523. Springer, Heidelberg (2003)
11. Chow, S., Eisen, P., Johnson, H., Van Oorschot, P.C.: White-box cryptography and an AES implementation. In: Nyberg, K., Heys, H. (eds.) Selected Areas in Cryptography. SAC 2002. Lecture Notes in Computer Science, vol. 2595. Springer, Heidelberg (2003)
12. Paar, C., Pelzl, J.: Understanding Cryptography, 1st edn. Springer, Berlin (2009)
13. Giraud, C.: An RSA implementation resistant to fault attacks and to simple power analysis. IEEE Trans. Comput. 55(9), 1116–1120 (2006)
14. Shand, M., Vuillemin, J.: Fast implementations of RSA cryptography. In: Proceedings of IEEE 11th Symposium on Computer Arithmetic, Ontario, Canada, pp. 252–259 (1993)
15. Kahn, D.: The Codebreakers, 1st edn. Scribner, New York (1996)
16. The Electronic Frontier Foundation's DES Cracker Machine. https://web.archive.org/web/20170507231657/w2.eff.org/Privacy/Crypto/Crypto_misc/DESCracker/HTML/19980716_eff_des_faq.html#howsitwork. Accessed 04 Aug 2020
17. Skiena, S.S.: The Algorithm Design Manual, 2nd edn. Springer, London (2008)
18. How Do Traffic Signals Work? https://www.traffic-signal-design.com/how_do_traffic_signals_work.htm. Accessed 04 Aug 2020
19. UTC/SCOOT and Pedestrian Pushbuttons. https://www.greensignals.co.uk/news/utcscootand-pedestrian-pushbuttons. Accessed 04 Aug 2020
20. Traffic Manual: Idaho Supplementary Guidance to the MUTCD. https://apps.itd.idaho.gov/apps/manuals/Traffic_Manual.pdf. Accessed 04 Aug 2020
21. Traffic Signal Timing Manual. https://www.signaltiming.com/The_Signal_Timing_Manual_08082008.pdf. Accessed 04 Aug 2020

Evolutionary Heuristic for Avoiding Traffic Jams in Road Network Using A* Search Algorithm

Safa Belhaous(B), Soumia Chokri, Sohaib Baroud, Khalid Bentaleb, and Mohammed Mestari

SSDIA Laboratory, ENSET, Hassan II University, Mohammedia, Morocco
[email protected], [email protected]

Abstract. Nowadays, population and urban growth have become a serious issue in the world, and traffic jams, or traffic congestion, are part of the problems that citizens experience daily, especially those who live in a big city. This paper proposes a new approach based on the A* search algorithm for avoiding traffic jams. The main idea of this approach is to check, at each roundabout, the state of the next direction; if there is a collision, the heuristic value is set very large to force the system to perform a parallel search for another path using the A* algorithm. The proposed path will be the optimal one. Real-time path change is one of the great challenges for minimizing road congestion.

Keywords: A* search algorithm · Evolutionary heuristic · Traffic jam · Smart city · Parallelism

1 Introduction

From now on, the majority of countries around the world will see a significant increase in the share of their population residing in urban areas. By 2030, it is estimated [2] that the world's urban population will reach about 5 billion. This urbanization will lead to crucial social, economic and environmental transformations. To accompany this demographic growth, the use of information and communication technology (ICT) will become a necessity, improving the lives of citizens by improving the efficiency of services in most of a country's sectors, such as the implementation of intelligent transport systems, intelligent management of administrative services and intelligent management of energy. The integration of ICT and of the different physical devices connected to the Internet of Things (IoT) network to improve city operations and services [3] results in the so-called smart city. There are several definitions of smart cities, ranging from those that focus solely on infrastructure to those that focus more on making the reactions of citizens and communities smarter [1]. This paper adopts the definition provided by the International Telecommunication Union (ITU) and the United Nations Economic Commission for Europe (UNECE) in 2015: "A smart sustainable city is an innovative city that uses ICTs to improve quality of life, the efficiency of urban operations and services and competitiveness, while ensuring that it meets the needs of present and future generations with respect to economic, social, environmental and cultural aspects." Thanks to the intelligent technologies used in a smart city, traffic jams and the accident rate will be effectively reduced; therefore, the environment and the citizens' quality of life will improve significantly. The basic idea of this article is to avoid traffic jams by changing the road for drivers, especially in case of congestion. Finding the optimal path from a start node to a goal node is computed in parallel using the A* search algorithm. To achieve parallelism, several threads are used, depending on the number of neighbors of the start node, to find the path to the goal node. Each thread returns a path, and the optimal one corresponds to the path found with the minimal cost. The organization of this paper is as follows: Sect. 2 is devoted to explaining the main concepts used in this article. Section 3 examines the state of the art concerning path optimization and traffic management. Section 4 presents the parallel implementation of the A* algorithm using an evolutionary heuristic to find the optimal path. The results of the real execution are shown in Sect. 5. Finally, the last section is devoted to the conclusion.

2 Background

This section presents the main concepts used in this work, namely the A* search algorithm and parallel programming, and gives more detail about them.

2.1 A* Search Algorithm

In general, a pathfinding algorithm requires two inputs, a source state and a destination state, and produces a single output: the shortest path from the source state to the goal. An error is produced when the algorithm is unable to find the destination [10]. Among the existing pathfinding algorithms is the A* algorithm, described in 1968 by Hart, Nilsson, and Raphael. A* is a well-known pathfinding algorithm based on a heuristic function [11] and an evaluation function f(n) used to select the next node to be expanded. For a node n, f(n) is computed as follows:

f(n) = g(n) + h(n)    (1)

where g(n) is the cost to reach node n from the source state, and h(n) is the estimated cost of reaching the destination node from node n. In this paper, we applied the A* algorithm (see Algorithm 1) on a grid, which is our search environment, to find the best path between two cells as shown in Fig. 1. The blue square corresponds to the start node, and the yellow square is the goal node. The white squares are accessible cells, and the black squares represent obstacles.

Fig. 1. Grid with obstacles.

The search process presented in Algorithm 1 needs two lists, open and closed [12]. In the open list, all the nodes are sorted by their f(n) values and the first one, with the lowest f(n), is expanded, while the closed list contains the set of nodes already expanded [13]. In general, an efficient search process requires that its heuristic function [21] be admissible, i.e., h(n) is never greater than the actual cost to the destination node. If the search graph [22] is not a tree, a stronger condition, called consistency, is required. A heuristic is consistent if, for each node n and each of its successors n′ generated by an action, the following holds:

h(n) ≤ c(n, n′) + h(n′)    (2)

where c(n, n′) represents the cost from n to n′. Consistency guarantees that once a node is extracted from the open list, the path found to it is optimal [14]. The heuristic h(n) [15] can be estimated in different ways, some of which are the following:

Manhattan Distance. It allows moves in four directions (North, South, East and West). The following equation is used to calculate the Manhattan distance:

h(n) = C ∗ (|node.x − goal.x| + |node.y − goal.y|)    (3)

where C represents the cost of moving from one node to one of its neighbors; in the simplest case, C can be set to 1. The advantage of the Manhattan heuristic is that it runs faster than the other distance measures; its main disadvantage is that, although it is used to find the shortest path to the goal, an optimal solution is not guaranteed with this approach. Figure 2 shows three paths; the blue one corresponds to the Manhattan heuristic.

Algorithm 1: Pseudo-code of the A* search algorithm.

OPEN is the set of nodes to be evaluated
CLOSED is the set of nodes already evaluated
Add the start node to OPEN
while the OPEN list is not empty && goal not found do
    node current = node in OPEN with the lowest f_cost()
    remove current from OPEN
    add current to CLOSED
    if current is the goal node then
        return path
    end if
    for each neighbor of the current node do
        if neighbor is obstacle OR neighbor is in CLOSED then
            skip to the next neighbor
        end if
        if new path to neighbor is shorter OR neighbor is not in OPEN then
            set f_cost of neighbor
            set parent of neighbor to current
            if neighbor is not in OPEN then
                add neighbor to OPEN
            end if
        end if
    end for
end while
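A minimal runnable transcription of Algorithm 1 is sketched below for a 2-D grid, assuming 4-connected moves of unit cost and a Manhattan heuristic; it is an illustration, not the authors' implementation.

```python
import heapq

def a_star(grid, start, goal):
    """grid: 2-D list where 0 = free cell and 1 = obstacle; start, goal: (row, col)."""
    rows, cols = len(grid), len(grid[0])
    h = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])   # Manhattan heuristic
    open_heap = [(h(start), 0, start)]                        # entries are (f, g, node)
    parent, g_cost, closed = {start: None}, {start: 0}, set()
    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:                                   # reconstruct the path
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        if current in closed:
            continue
        closed.add(current)
        r, c = current
        for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = neighbor
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue                                      # obstacle or out of bounds
            new_g = g + 1
            if neighbor not in g_cost or new_g < g_cost[neighbor]:
                g_cost[neighbor] = new_g
                parent[neighbor] = current
                heapq.heappush(open_heap, (new_g + h(neighbor), new_g, neighbor))
    return None                                               # no path found

print(a_star([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0)))
```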

Diagonal Distance. With this method, the computation is slower than with the Manhattan method. The equation used to calculate the diagonal distance is as follows:

h(n) = C ∗ max(|node.x − goal.x|, |node.y − goal.y|)    (4)

The green path in Fig. 2 corresponds to the output path of the A* algorithm using the diagonal distance.

Euclidean Distance. The Euclidean heuristic is admissible; however, it can underestimate the real cost by a significant amount. It is also costly to apply compared with the Manhattan heuristic, as it additionally involves two multiplication operations and the calculation of a square root [16]. This distance is shown in Fig. 2 as the red path. The equation used to calculate the Euclidean distance is as follows:

h(n) = C ∗ √((node.x − goal.x)² + (node.y − goal.y)²)    (5)

In this paper, the Euclidean distance was implemented as the function computing the heuristic cost of each node of the grid, provided that there is no problem preventing traffic. Otherwise, the value of h(n) receives the square of the largest h calculated, in order to eliminate that path and force the system to find another one.

Fig. 2. A* algorithm with different heuristics.

In this paper, the heuristic used is as follows:

h(n) = { Euclidean distance, if the road is empty
       { H²,                 if the road is busy           (6)

where H is the largest h calculated.
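For illustration, the heuristics of Eqs. (3)–(6) can be written as small Python functions; the road-occupancy test and the running maximum H are assumptions to be supplied by the traffic model, and C is the unit move cost.

```python
import math

def manhattan(node, goal, C=1):                  # Eq. (3)
    return C * (abs(node[0] - goal[0]) + abs(node[1] - goal[1]))

def diagonal(node, goal, C=1):                   # Eq. (4)
    return C * max(abs(node[0] - goal[0]), abs(node[1] - goal[1]))

def euclidean(node, goal, C=1):                  # Eq. (5)
    return C * math.hypot(node[0] - goal[0], node[1] - goal[1])

def evolutionary_h(node, goal, road_is_busy, H):
    """Eq. (6): Euclidean distance on an empty road, H**2 on a busy one,
    where H is the largest heuristic value computed so far."""
    return H ** 2 if road_is_busy(node) else euclidean(node, goal)
```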

2.2 Parallel Programming

Parallel hardware has been ubiquitous for some time; it is difficult to find a desktop or server that does not use a multicore processor [20]. There are two main types of parallel systems, namely shared-memory systems and distributed-memory systems. We focus on the first one, in which the cores share access to the computer's memory; in principle, each core can read and write each memory location. Pthreads (POSIX threads) and OpenMP are both APIs for shared-memory programming, but they have numerous differences. Pthreads requires that the programmer explicitly specify the job of each thread. Conversely, OpenMP in some cases allows the programmer simply to indicate which block of the program must be executed in parallel, leaving the precise assignment of work to threads to the runtime. This suggests a further distinction between OpenMP and Pthreads: Pthreads (like MPI for distributed-memory systems) is a library of functions that can be linked with a C compiler, whereas OpenMP requires compiler support for certain operations, and consequently it is entirely possible to come across a C compiler that cannot compile OpenMP programs into parallel programs [17].

3 Related Work

Many approaches and systems have been proposed in previous years to improve road traffic. The authors in [8] proposed a multiobjective path optimization (MOPO) model to perform a more exact simulation of the decision-making behavior of drivers' route choice. Three single-objective path optimization (SOPO) models were considered to set up the MOPO model, related to cumulative distance (shortest-distance path), the number of passed intersections (least-node path, LNP) and the number of turns (minimum-turn path, MTP). To solve the proposed MOPO problem, a two-phase method that combines a path genetic algorithm (PGA) and a weight-sum method was programmed. To exhibit the benefits of the MOPO model in helping drivers with path determination, several experimental studies were performed using two real road networks with several road types and numbers of nodes and connections. The empirical results showed that the optimal paths of the MOPO and SOPO problems can be effectively identified by the PGA in a matter of seconds, even though these problems are highly complex and hard to solve manually. Kponyo et al. [9] demonstrated that ant colony optimization can effectively improve the traffic situation in a metropolitan environment. A Dynamic Travel Path Optimization Framework (DTPOS) based on Ant Colony Optimization (ACO) is proposed for the prediction of the best path to a given destination. The proposed method is demonstrated in NetLogo. The simulation results showed that the DTPOS model can largely decrease the average travel time of vehicles in metropolitan scenarios and improves the mean travel time by 47% when compared to similar models where the vehicles select their way without ACO. Chen [5] surveyed solutions and methods for cooperative intersections, focusing on non-signalized intersections. This survey discussed in detail cooperative methods, trajectory planning, and virtual traffic lights; vehicle collision warning and avoidance techniques are also discussed to deal with uncertainties, and, concerning vulnerable road users, pedestrian collision avoidance methods are covered. A further discussion of the surveyed works highlights future research themes. This work serves as an exhaustive study of the field, aiming to stimulate new techniques and accelerate the progress of automated and cooperative intersections. Meghana et al. [6] proposed a Comprehensive Traffic Management System (CTMS) based on Radio Frequency Identification (RFID) and analytics for real-time implementation. The system is both practical and simple to implement. The results are dynamic traffic signal timers that operate based on car density, deviation of cars at preceding intersections in case of blockage, traffic signal control for the passage of emergency vehicles, detection of red-signal violations, detection of road accidents and vehicle breakdowns to provide quick help, and vehicle tracking. In another work [7], the authors presented an auto-optimized heuristic search algorithm named EHA*, which is able to design and optimize a multi-weighted heuristic function. EHA* overcomes the difficulty of designing highly complex heuristic functions. The experimental results showed the capability of EHA* to optimize a complex heuristic function within a reasonable amount of time. EHA* is tested against previous works on various benchmarks, demonstrating that it balances solution optimality, memory occupation, and optimization time. The simplicity and effectiveness of EHA* show its potential to tackle complex planning problems.

4 Parallel Implementation of A* Algorithm

This paper implements the A* search algorithm using the Python language (see Fig. 3). The start node is expanded in order to obtain all of its neighbors; then, each neighbor is assigned to a thread and becomes a new start node. In other words, every thread calls the sequential version of A* using a new start node, while the destination node remains the same for all threads. Finally, we rank the paths returned by all the created threads according to their f(n) score, and the path with the best score is chosen as the final path.

Fig. 3. Pseudo-code of parallel A* algorithm in Python.
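The authors' actual code is the one shown in Fig. 3; the sketch below only illustrates the described scheme, reusing an a_star(grid, start, goal) function such as the one given after Algorithm 1, and assuming unit move costs so that the cheapest path is the shortest one.

```python
from concurrent.futures import ThreadPoolExecutor

def free_neighbours(grid, node):
    """4-connected, in-bounds, non-obstacle neighbours of a grid cell."""
    r, c = node
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)

def parallel_a_star(grid, start, goal):
    starts = list(free_neighbours(grid, start))
    if not starts:
        return None
    # one thread per neighbour of the start node, all aiming at the same goal
    with ThreadPoolExecutor(max_workers=len(starts)) as pool:
        results = list(pool.map(lambda s: a_star(grid, s, goal), starts))
    paths = [[start] + p for p in results if p is not None]
    return min(paths, key=len, default=None)      # cheapest path under unit costs
```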

5 Results

This section presents several experiments that evaluate the parallel implementation of the A* algorithm presented in the previous section.

The experimental environment comprises a single physical processor (CPU), an Intel(R) i5-6300U with 2 cores at 2.40 GHz, i.e. a dual-core processor, giving a total of 4 logical processors. The memory (RAM) used is about 8 GB, and the operating system installed is Windows 10. The experimental runs focused on five problem sizes, starting with a grid of 2000*2000 and going up to 6000*6000. We ran each version ten times for every problem size and then took the average of the times measured during the search process. For the parallel version, the number of created threads was limited to eight, according to the presence of many obstacles in the grid.

Table 1. Execution time using Python

Grid size    Sequential A* (s)   Parallel A* (s)   Speedup
2000*2000    1.930               0.505             3.821
3000*3000    7.488               2.038             3.674
4000*4000    9.619               2.795             3.441
5000*5000    19.227              9.283             2.071
6000*6000    26.572              13.186            2.015

Table 1 displays the execution time, in seconds, to find the optimal path for different grid sizes using Python. The environment used to develop our programs is Anaconda, because it is a well-known standard platform. As the problem size increases, the execution time automatically increases. We noted that, in all experiments, the parallel version was the fastest. In order to evaluate the proposed program, the speedup is necessary, and we calculate the speedup metric as follows:

speedup = S / P    (7)

The speedup is defined as the ratio of the execution time achieved when using the sequential A* algorithm (S) to the execution time required by the parallel A* (P), as shown in (7). Usually, the execution time is the total time required by an algorithm to solve a problem. The sequential A* algorithm's execution time is the time spent from the beginning of the search process until an answer is found. For the parallel A* algorithm, this metric is defined as the total time required by the search process from the start of the first phase until the second phase of the algorithm is finished. Therefore, the execution time incorporates the time required for creating and finishing concurrent threads in addition to the time for serializing the algorithm [16]. The speedup provides an evaluation of the performance improvement of the parallel A* algorithm over the sequential A* algorithm [18].
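As a quick worked example, Eq. (7) applied to the measurements of Table 1 can be reproduced as follows (values rounded to two decimals here).

```python
# Speedup = S / P, computed from the Table 1 measurements.
sequential = {2000: 1.930, 3000: 7.488, 4000: 9.619, 5000: 19.227, 6000: 26.572}
parallel = {2000: 0.505, 3000: 2.038, 4000: 2.795, 5000: 9.283, 6000: 13.186}

for size in sequential:
    print(size, round(sequential[size] / parallel[size], 2))
# -> 3.82, 3.67, 3.44, 2.07, 2.02
```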


Fig. 4. Relationship between change in speedup and number of threads.

Figure 4 shows the speedup gained when increasing the number of threads for each problem size. In our experiments, the grid sizes used are represented by lines with different colors. From the figure, it is clear that by increasing the number of threads, the speedup also increases.

Fig. 5. Speedup against grid size.

However, the speedup reaches a saturation point at a certain number of threads, and we can observe that there is no difference between the grids of 5000 and 6000 cells in terms of speedup. Figure 5 displays the speedup obtained for each number of threads as the grid size increases. The x-axis and y-axis represent, respectively, the problem size and the speedup. As the curve chart shows, each line represents a possible number of threads, going from 1 to 8, depending on the obstacles that surround the starting cell. The graph clearly shows that the effectiveness of parallelism was most significant when the maximum number of threads was used. We can also observe that the acceleration gained was maximal for the maximum number of threads, and that the best gain obtained corresponded to the smallest problem size (3000*3000), due to the increased number of threads used on a reduced grid. The last curve chart in this subsection shows the execution time for the different problem sizes. The x-axis and y-axis represent, respectively, the size of the problem and the running time; each line shows the result obtained using a different number of threads (see Fig. 6).

Fig. 6. Running time according to different grid size.

From Fig. 6, we can deduce that there is a significant reduction in the running time due to parallelism. Table 2 presents a comparison between the contribution proposed in this paper and another work from the literature [16]. It is clear that the implementation presented in this paper using Python (see Sect. 4) is the fastest one. The contribution presented in this paper aims to improve on a previous study [19] in the same context. The previous study was also based on the A* search algorithm to find the optimal path among Moroccan cities using a graph. Its parallel implementation, based on graph data, was developed in Java, with many threads running simultaneously to find the path from each neighbor of the start node to the goal node. The number of threads generated by the program depends on the number of neighbors of the start node. Each thread ran the A* function from its new start node, but the goal was the same for all threads. That study presented experimental results using two different graphs and a comparison between sequential and parallel versions of the A* algorithm to evaluate the path-search performance between two nodes in a graph.


Table 2. Positioning of the proposed implementation

Grid size    Python implementation (s)   Java implementation [16] (s)
2000*2000    0.505                       16.752
3000*3000    2.038                       52.725
4000*4000    2.795                       123.189

6 Conclusion

Efficient road traffic management plays a relevant role in building a smart city. Traffic jams cause problems every day, affecting the productivity of people across the world. In this paper, a parallel implementation of the A* search algorithm was proposed to find the optimal path between two nodes. The contribution of this implementation focuses on the heuristic function, which is calculated in two different ways depending on the state of the road. In this way, the system can automatically change the path for the other drivers when the traffic becomes congested, by changing the value of the heuristic function. This work is not yet complete: a real study of road traffic data must be carried out, and a simulation with multi-agent systems must also be performed to better validate this contribution.

References

1. Lea, R.: Smart cities: an overview of the technology trends driving smart cities. IEEE 3(March), 1–16 (2017)
2. United Nations Population Fund, Urbanization. https://www.unfpa.org/urbanization. Accessed 10 Aug 2020
3. Peris-Ortiz, M., Bennett, D.R., Yábar, D.P.: Sustainable Smart Cities: Creating Spaces for Technological, Social and Business Development. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40895-8
4. International Telecommunication Union (ITU), Smart sustainable cities. https://www.itu.int/en/mediacentre/backgrounders/Pages/smart-sustainable-cities.aspx. Accessed 11 Aug 2020
5. Chen, L., Englund, C.: Cooperative intersection management: a survey. IEEE Trans. Intell. Transp. Syst. 17(2), 570–586 (2016)
6. Meghana, B.S., Kumari, S., Pushphavathi, T.P.: Comprehensive traffic management system: real-time traffic data analysis using RFID. In: Proceedings of the International Conference on Electronics, Communication and Aerospace Technology, ICECA 2017, January 2017, pp. 168–171 (2017)
7. Yiu, Y.F., Du, J., Mahapatra, R.: Evolutionary heuristic A* search: heuristic function optimization via genetic algorithm. In: Proceedings - 2018 1st IEEE International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2018, pp. 25–32 (2018)
8. Chiu, C.S.: A genetic algorithm for multiobjective path optimisation problem. In: Proceedings - 2010 6th International Conference on Natural Computation, ICNC 2010, 5(ICNC), pp. 2217–2222 (2010)
9. Kponyo, J., Kuang, Y., Zhang, E.: Dynamic travel path optimization system using ant colony optimization. In: Proceedings - UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, UKSim 2014, pp. 142–147 (2014)
10. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Pearson Education, London (2003)
11. Chen, K.: Heuristic search and computer game playing IV. Inf. Sci. 175(4), 245–246 (2005)
12. Hart, P., Nilsson, N., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4(2), 100–107 (1968)
13. Mahafzah, B.A.: Performance evaluation of parallel multithreaded A* heuristic search algorithm. J. Inf. Sci. 40(3), 363–375 (2014)
14. Pearl, J.: Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley Longman Publishing Co., Inc., Boston (1984)
15. Kilinçarslan, M.: Implementation of a Path Finding Algorithm for the Navigation of Visually Impaired People (2007)
16. Zaghloul, S.S., Al-Jami, H., Bakalla, M., Al-Jebreen, L., Arshad, M., Al-Issa, A.: Parallelizing A* path finding algorithm. Int. J. Eng. Comput. Sci. 6(9), 22469–22476 (2017)
17. Pacheco, P.: An Introduction to Parallel Programming. Elsevier, Amsterdam (2011)
18. Hennesy, J., Patterson, D.: Computer Architecture: A Quantitative Approach, 3rd edn. Morgan Kaufmann, San Francisco (2003)
19. Belhaous, S., Baroud, S., Chokri, S., Hidila, Z., Naji, A., Mestari, M.: Parallel implementation of a search algorithm for road network. In: 3rd International Conference on Intelligent Computing in Data Sciences (ICDS'19) (2019)
20. Chokri, S., Baroud, S., Belhaous, S., Khouil, M., Youssfi, M.E., Mestari, M.: Impact of communication volume on the maximum speedup in parallel computing based on graph partitioning. In: 3rd International Conference on Intelligent Computing in Data Sciences (ICDS'19) (2019)
21. Chokri, S., Baroud, S., Belhaous, S., Bentaleb, M., Mestari, M., El Youssfi, M.: Heuristics for dynamic load balancing in parallel computing. In: Proceedings of the 2018 International Conference on Optimization and Applications (ICOA'18) (2018)
22. Hidila, Z., Belhaous, S., Bentaleb, M., Naji, A., Mestari, M.: Airspace sectorization and comparison via computational geometry, OpenCV and NetworkX. In: 2019 3rd International Conference on Intelligent Computing in Data Sciences (ICDS'19) (2019)

Geometric Feature Extraction of Road from UAV Based Point Cloud Data

Mustafa Zeybek1(B) and Serkan Biçici2

1 Engineering Faculty, Geomatics Engineering Department, Artvin Coruh University, Artvin, Turkey
[email protected]
2 Engineering Faculty, Civil Engineering Department, Artvin Coruh University, Artvin, Turkey

Abstract. This study presents a new approach for achieving high-accuracy geometric feature extraction of the road surface automatically from UAV-based images. The proposed methodology begins with the automatic extraction of the road surface from the point cloud. The extraction of the road is based on point clouds and a machine learning classification algorithm. Then, road boundaries are derived from the extracted road surface points and are used to estimate the road centerline. The point clouds are then used to create digital elevation models to extract profile and cross-section elevations at specified intervals, referenced to the estimated smooth road centerline. The accuracy of the road surface classification is evaluated by comparison with manually classified points. According to the results, precise road extraction, road centerline, profile, and cross-sections are produced with high accuracy using the proposed approach.

Keywords: UAV · Road · Centerline · Profile · Cross-section

1 Introduction

Roads are among the key urban features within topographic objects, connecting distant places to each other safely and quickly. Roads also serve numerous additional purposes, such as traffic control, finding the shortest path in emergencies, navigation of unmanned vehicles, and urban planning. For this reason, the road environment database needs to be updated periodically. In addition, the current geometric conditions of existing roads are critical for ensuring road safety, planning and comfort [1]. Planning road designs along specific routes or investigating road conditions is a highly relevant subject [2]. To deal with these problems, numerous techniques such as ground measurement systems, mobile measurement systems, and satellite-based systems have, from past to present, been actively used in building and modernizing road databases [3,4]. In particular, databases prepared from remotely acquired images make a significant contribution to map production. Nevertheless, manually editing and updating these optical databases is time-consuming and expensive.


Several methods for automated and semi-automatic road network extraction from high-resolution satellite images have been proposed [5–7]. The fact that the images are of very high resolution and volume raises the expectation of highly accurate feature extraction results. It has become quite challenging to make automatic extraction faster and more detailed, mainly to produce up-to-date maps and construct geometric designs. In addition, weather conditions or satellite services may not be able to provide these data at the demanded time and resolution. In recent years, feature extraction studies using various machine learning methods from computer vision and image processing, relying on texture and shape characteristics, have become widespread [8]. These classification and clustering algorithms are supported by high-performance computing, especially for processing large data sets, and their performance is increasing day by day. Road extraction studies can be classified into two groups: road centerline determination and road surface extraction. Road centerline extraction is a vital road process needed to analyze the alignment of roads and to provide information about the road network. On the other hand, the extraction of the road surface from images focuses, in the literature, on identifying the pixels describing the road surface with image segmentation algorithms [9]. However, the extraction of the profiles and cross-sections that form the basis of geometric road designs is not very common. In order to make new road designs over existing roads, especially in the production of geometric designs, automated processes should be sufficiently demonstrated from the dense point cloud data provided by low-altitude unmanned aerial vehicle (UAV) images. The two most commonly used road features for these designs are the road profile and the road cross-sections. This study aims to perform automatic extraction of road profiles and road cross-sections from dense point clouds derived from UAV images for up-to-date geometric designs of the existing road corridor. The remainder of this paper is organized as follows. First, related works are summarized in Sect. 2. Then, the proposed data pipeline is presented in Sect. 3. Specifically, data pre-processing is presented in Sect. 3.1. Then, ground and non-ground classification is summarized in Sect. 3.2. Road surface and centerline extraction are given in Sect. 3.3 and 3.4, respectively. Similarly, the extraction of the road profile and cross-sections is presented in Sect. 3.5 and 3.6, respectively. The results and discussion are given in Sect. 4. Finally, conclusions are presented in Sect. 5.

2 Related Works

Studies have collected information regarding road conditions and features using different types of data sources. Satellite images and aerial photographs have been commonly used to collect road information [3,9–11]. However, they only provide pixels and two-dimensional (2D) information; that is, the images have to be of high resolution for road information to be collected successfully using these methodologies. Moreover, several studies had serious problems in separating road and non-road sections [3,10].


UAVs are a currently growing technology for collecting information regarding road conditions [1,12–14] as well as in other fields [15,16], since they are very flexible and easy to maneuver. Abdollahi et al. [1] used UAV images to extract road features in three steps. Firstly, trainable Weka segmentation was used for image segmentation. Then, the level set (LS) method was applied for the road surface extraction. Finally, morphological operators were applied to the images to improve the extraction accuracy. The road extraction accuracy was assessed by comparison with manually digitized road layers, and 94%, 86% and 81% were obtained for completeness, correctness and quality, respectively. Similarly, Cao and Yan [13] used images obtained from a UAV to automatically extract the road network in a mountainous area. Specifically, region-growing algorithms were applied for segmentation after arranging the UAV images based on their generation time. Then, parallelism criteria were used to identify potential road segments. Finally, the authors proposed an algorithm to connect broken road segments. Bulatov et al. [12] also used UAV images for road feature extraction. Their methodology is based on a classification of road pixels from a combination of topographical elevation features and radiometric data. They proposed a pipeline with a five-step procedure for the acquisition of digitized roads; the main steps of their algorithm are preprocessing, thinning, polygonization, filtering, and generalization. Up to 85% was achieved for the correctness and completeness values. The three-dimensional (3D) point cloud is another data source used in the extraction of road features. Light detection and ranging (LiDAR) scanning and mobile laser scanning (MLS) are usually the most commonly used technologies to produce point clouds, since they capture high-resolution 3D topography data quickly, which is an advantage over traditional roadway data collection methods [17]. For example, Wang et al. [18] proposed an algorithm to identify and extract traffic signs, light poles, roadside furniture and other objects from MLS-based point clouds. Similarly, Guan et al. [4] and Rodriguez-Guenca et al. [19] used MLS point clouds to extract curbs and the road surface. Further, road geometry elements such as horizontal alignment and super-elevation data were also extracted in several studies [20,21]. However, MLS is a high-cost technology for surveying practitioners. In this study, a 3D point cloud obtained from UAV images, rather than from LiDAR or MLS, was used to extract the geometric features of the road. Specifically, image processing algorithms were used to produce 3D point clouds from UAV images. Then, the cloth simulation filtering (CSF) algorithm was applied to classify ground and non-ground points. In addition, the random forest algorithm was applied to extract the road surface from the point cloud. Then, the road centerline was extracted using the Voronoi diagram (or Thiessen polygons) algorithm. Finally, the road profile and several cross-sections were determined from the extracted road centerline.

3 Material and Method

3.1 Data Acquisition and Processing

Aerial images were collected using a fully automated UAV in February 2020 according to the mission planned over the study area. The study area is located on the Seyitler campus of Artvin Çoruh University in Artvin, Turkey (Latitude: 41.1984172°, Longitude: 41.8495734°). The DJI Phantom 4 RTK mapping UAV system was used as a complete end-user model. The UAV system has a three-band camera which captures RGB values (R: Red, G: Green, B: Blue). A spatial resolution of 1.5 cm per pixel was produced on average over the study area (at 50 m flight altitude); this is a much higher resolution than that of satellite or other platform imagery. The raw UAV images were processed in the Pix4D Mapper software to generate a 3D point cloud (containing RGB values), which is used to detect the roads' geometric condition. The detailed processing steps can be found in Zeybek and Şanlıoğlu [22]. The initial processing results and the processing parameters for dense reconstruction are given in Table 1.

Table 1. Initial processing results and dense point cloud reconstruction parameters.

Properties                               Value
Avg. Ground Sampling Distance (GSD)      1.55 cm
Area covered                             1.8441 ha
Time for initial processing              4 m:57 s
Flight altitude                          50 m
Image scale                              1/2 (Half image size), multiscale
Point density                            Optimal
Minimum number of matches                3
Time for point cloud densification       09 m:32 s

Image processing was completed with structure from motion (SfM) algorithms. The SfM processing pipeline makes use of the EXIF information, the optimization of the camera calibration parameters, and the estimation of the external orientation parameters. The next step produces sparse 3D points (tie points) that determine the camera locations and orientations. With SfM, hundreds of thousands of tie points were created in each image, and these corresponding points are matched across consecutive images. Thanks to this automatic matching step, the camera locations and orientations can be estimated with photogrammetric equations. Once all the positional and orientation information of the cameras was computed, multi-view stereo matching of similar pixels across the images was used to produce dense point clouds. Finally, dense point clouds of millions of points with known positions were obtained.

3.2 Ground and Non-ground Classification

The first step of the proposed pipeline is the classification (or filtering) of the points; that is, points are assigned as ground or non-ground. This classification is required since the most fundamental requirement in the formation of the roads is that each point corresponding to the road has to be in the ground class. There are many point cloud filtering methods available in the literature [22]. The cloth simulation filtering (CSF) algorithm [23] was applied in this study for the ground and non-ground classification step. This algorithm is based on a physical principle: the point cloud is inverted with respect to the Z-axis and a simulated cloth is draped over the inverted points. The points intersecting with the cloth are then considered ground; that is, points within specified limit values with respect to the cloth grid model, allowing for a tolerance, are selected as ground points. Non-intersecting points are assigned as non-ground. The CSF plugin of the open-source software CloudCompare was used for this classification step [24]. Figure 1a shows the raw dense point cloud data (in RGB) of the study area, and the result of the CSF algorithm is shown in Fig. 1b.
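The CSF implementation itself is provided by the CloudCompare plugin; purely to illustrate the ground/non-ground split, the deliberately simplified grid-minimum filter below labels as ground the points lying close to the lowest elevation of their planimetric cell (a crude stand-in, not the CSF algorithm; the cell size and threshold are illustrative).

```python
import numpy as np

def simple_ground_filter(xyz, cell=1.0, threshold=0.2):
    """xyz: (N, 3) array of point coordinates in metres.
    Returns a boolean mask: True = ground, False = non-ground."""
    ij = np.floor(xyz[:, :2] / cell).astype(int)          # planimetric cell of each point
    _, inverse = np.unique(ij, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    cell_min = np.full(inverse.max() + 1, np.inf)
    np.minimum.at(cell_min, inverse, xyz[:, 2])            # lowest elevation per cell
    return xyz[:, 2] <= cell_min[inverse] + threshold      # near the cell minimum
```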

Fig. 1. Classification and training sample data, a) raw dense point cloud data in RGB, b) ground (green) and non-ground (blue) classification result, c) training data samples.

3.3 Road Surface Extraction

After the ground and non-ground classification step, another classification algorithm was required in this study to extract the road surface. Higher classification accuracy is often obtained for the road surface than for other objects such as buildings and trees. The main reason is that the geometric structure of road surface points is more regular during the point generation and filtering phases. Therefore, the overall accuracy value is high compared to the classification of other objects. In this study, points are classified using both geometric and color (RGB) band properties to obtain an accurate road surface. The random forest (RF) algorithm, a machine learning algorithm, was applied in this study. This algorithm is commonly used to obtain the road class from point cloud data and yields highly accurate classification results [25]. Basically, the RF classifier is an ensemble of decision trees. The decision trees in the RF algorithm are constructed from subsets of the training data by a bagging procedure. The training data set is manually selected from the dense point cloud. Figure 1c shows the training data samples in the study area. Bagging randomly selects about two-thirds of the samples from the training data to train the decision trees. Then, the remaining samples of the training data are used in an internal cross-validation technique to estimate the RF algorithm's performance. This process is repeated many times, which means that the same samples can be selected several times, while others may not be selected at all. Two basic input parameters are needed to use an RF classifier: the number of trees (n_tree) and the number of features (n_feat). Ground and non-ground labels, planarity, surface normals, curvature, omnivariance, linearity, surface variation, anisotropy, above ground level, and RGB features were used in the RF algorithm to train the classifier model. The classification result of the sample set is determined by a simple vote as follows [8]:

M(x) = \max_{C} \sum_{i=1}^{k} I(h_i(x) = C)    (1)

where M(x) is the RF model, h_i(x) is a decision tree, x denotes the input data, C is the classification target, k is the number of trees in the model, and I(·) is the indicator function. The class labels are binary: road points were assigned to class 1, while other points were assigned to class 0. CloudCompare software was used for the preparation of the training data. The RF classifier algorithm was implemented with the caret package in the R programming environment [26]. Ground truth data (reference class) classified by the operator is used to test the accuracy of the RF classification. The confusion matrix consists of the true positive (TP), false negative (FN), true negative (TN), and false positive (FP) values [26]. Then, the completeness, correctness and quality measures are produced based on the confusion matrix [1].
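A hedged R sketch of this step is given below; it is illustrative rather than the authors' published code, and the CSV file name, feature column names (planarity, linearity, curvature, omnivariance, anisotropy, R, G, B) and the binary class column are assumptions about how the training samples might be organized.

```r
## Train a binary road / non-road random forest with repeated 10-fold
## cross-validation (caret) and derive completeness, correctness and quality
## from the resulting confusion matrix.
library(caret)
library(randomForest)

train_df <- read.csv("training_samples.csv")     # hypothetical exported training samples
train_df$class <- factor(train_df$class)         # 1 = road, 0 = other

ctrl   <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
rf_fit <- train(class ~ planarity + linearity + curvature + omnivariance +
                  anisotropy + R + G + B,
                data = train_df, method = "rf",
                trControl = ctrl, ntree = 500)

pred <- predict(rf_fit, newdata = train_df)
cm   <- confusionMatrix(pred, train_df$class, positive = "1")

TP <- cm$table["1", "1"]                          # predicted road, reference road
FP <- cm$table["1", "0"]                          # predicted road, reference other
FN <- cm$table["0", "1"]                          # predicted other, reference road

completeness <- TP / (TP + FN)
correctness  <- TP / (TP + FP)
quality      <- TP / (TP + FP + FN)
```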


Noisy classification results are likely to occur after the RF classifier because points of different classes can share similar characteristics in dense point clouds. Thus, high-density road data was extracted using the connected component algorithm, a clustering and distance-based algorithm, to determine the optimum road points [27,28].
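The paper performs this filtering with CloudCompare's connected components tool. As an illustrative stand-in, a distance-based clustering with the dbscan R package can mimic the same idea of keeping only large, spatially connected groups of road points; the 0.08 m distance and the 15000-point cluster threshold echo the OL/CL values reported in Sect. 4 and are otherwise assumptions.

```r
## Hedged sketch: density/distance clustering as an analogue of the connected
## component filtering; `road_pts` is assumed to hold the x/y/z coordinates of
## points the RF classifier labelled as road.
library(dbscan)

cl    <- dbscan(road_pts[, c("x", "y", "z")], eps = 0.08, minPts = 4)
sizes <- table(cl$cluster[cl$cluster != 0])          # cluster 0 is DBSCAN noise
keep  <- as.integer(names(sizes)[sizes >= 15000])    # minimum number of points per cluster

road_clean <- road_pts[cl$cluster %in% keep, ]       # retained road surface points
```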

3.4 Road Centerline Extraction

Road centerlines are vector line data representing the geographic situation of the road in transportation networks. In this study, the centerline was extracted using the following steps. In the first step, the 3D structure of the point cloud was converted into road edge polygons with the help of 2D road boundaries. Then, the small gaps left in the road boundary polygons were identified according to varying length threshold values and closed according to topological rules. The concave hull algorithm [29], also known as the α-shape, was used to determine the boundaries of the road edges. To extract the centerline, the Voronoi diagram (or Thiessen polygon) algorithm [30,31] was implemented to create Voronoi cells. Then, the corners of each Voronoi cell were converted into individual points. A Voronoi diagram is one of the most critical structures in computational geometry; it lays out the spatial relation between objects. Let P = {P_1, P_2, ..., P_n} be a set of points in the 2D plane. For a particular point Q in the plane, the closest neighbor of Q is the point of P at minimum distance from Q. V(P_i), the Voronoi polygon of P_i, is defined as the set of points in the plane for which P_i is the closest neighbor. Denoting the distance between P and Q by |PQ|, this can be expressed as:

V(P_i) = \{\, Q : |P_i - Q| \le |P_j - Q| \;\; \forall j \ne i \,\}    (2)

Voronoi polygons split the plane into segments adjacent to each other (these polygons can be infinite and extend indefinitely near the border points). The collection of Voronoi polygons is called the Voronoi diagram of P, denoted VD(P). It has been assumed that the road centerline can be represented by midpoints only. The candidate midpoints are the Voronoi points obtained from the intersection of points perpendicular to the road boundaries with the buffer areas. In the next step, the points obtained on the centerline were converted into spatial lines. The Voronoi points also form the start and end points of the Voronoi lines. Therefore, the buffer operation allows the removal of Voronoi points outside the road area. Along the center of the road alignment, numerous small lines with sharp corners appear, caused by errors in the road boundaries of the Voronoi and Thiessen polygons. In such cases, the centerline is wrinkled due to the inconsistency of the road boundary points at the roadsides. For this reason, a fifth-degree polynomial curve fitting method was implemented. The sub-center lines with high curvature or sharp features and the lines that incorrectly appeared on the centerline were smoothed.
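The Voronoi-plus-smoothing idea can be illustrated compactly in R as below. This is a sketch, not the published RRoad code; it assumes ordered road boundary points in a hypothetical CSV and a road that runs roughly monotonically along x, so that a polynomial in x can stand in for the smoothing step.

```r
## Minimal sketch: Voronoi vertices of the boundary points, clipped to the road
## polygon, then smoothed with a fifth-degree polynomial fit.
library(deldir)
library(sp)

bnd  <- read.csv("road_boundary_points.csv")   # assumed ordered boundary vertices (x, y)
vd   <- deldir(bnd$x, bnd$y)                   # Voronoi (Dirichlet) tessellation
segs <- vd$dirsgs                              # Voronoi edge segments

vx <- c(segs$x1, segs$x2)
vy <- c(segs$y1, segs$y2)

inside <- point.in.polygon(vx, vy, bnd$x, bnd$y) > 0   # keep vertices inside the road
cx <- vx[inside]; cy <- vy[inside]

fit <- lm(cy ~ poly(cx, 5))                    # fifth-degree polynomial smoothing
xs  <- sort(cx)
centerline <- data.frame(x = xs, y = predict(fit, newdata = data.frame(cx = xs)))
```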

3.5 Profiling from Point Clouds

Profiling refers to the vertical section taken along the alignment axis (centerline) of the road. In traditional surveying, piqué stakes are usually driven along the road axis every 50 m and at points where the slope of the terrain changes; in this way, the road axis is set out in the field. To determine the distance of these points from the beginning, distance stakes are driven, and chainage values are assigned to all points along the axis, taking the starting kilometer as 0 + 000. Then, profile leveling is completed. The vertical plane that passes along the axis and its intersection with the ground surface is called the "profile" or "profile section". With the help of a computer-aided design (CAD) program, design and construction decisions proceed from the obtained profile. In this study, the automatically extracted road centerlines were obtained on the road surface. Road profiles were extracted through the digital surface model (DSM) after the road centerline was determined. The DSM data was produced from the point cloud with the inverse distance weighted (IDW) algorithm [32]. The DSM is height data recorded as a raster grid. The Z component of the centerline was extracted from the DSM along the line at the corresponding (x, y) positions. These steps were implemented in R using the sp and raster packages. Finally, the profile section plot was generated.
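A hedged R sketch of this profile extraction is shown below; the file names and the assumption that the centerline vertices are already ordered are illustrative, not the exact workflow of the paper.

```r
## Sample the DSM along the centerline and plot elevation against chainage.
library(raster)

dsm <- raster("dsm_idw.tif")                    # DSM produced by IDW interpolation
cl  <- read.csv("centerline_points.csv")        # ordered centerline vertices (x, y)

d <- c(0, cumsum(sqrt(diff(cl$x)^2 + diff(cl$y)^2)))   # cumulative chainage (m)
z <- extract(dsm, cbind(cl$x, cl$y))                    # DSM height under each vertex

plot(d, z, type = "l",
     xlab = "Distance along centerline (m)", ylab = "Elevation (m)",
     main = "Road profile")
```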

3.6 Cross-Sectioning from Point Clouds

After the road centerline determination, cross-sections are taken transversely at regular intervals, usually every 50 m. The distance values are chosen according to the designed road width, or the width desired for the existing road plan. In order to make a cross-sectional drawing or to provide cross-section data, the road surface must be precisely defined. The profile related to the road centerline obtained in Sect. 3.5 was accepted as a 2D route at this stage, and cross-sections can be produced at desired intervals along this route. The cross-section lines are positioned at 90° and 270° (100 gradians and 300 gradians) from the profile line. In this way, spatial information about the road alignment is provided at an absolute distance perpendicular to the road alignment. The directions of the cross-sections at the points determined along the profile were calculated as the 100-gradian and 300-gradian offsets from the azimuth between the corresponding profile point and the next point. Using coordinate geometry functions with these directions, the endpoints of the cross-sections were computed according to the cross-section distance. A line is then defined between the two cross-section endpoints at each cross-section point, and the Z values of the corresponding DSM data are extracted along this line. Each cross-section was exported as a graphic of the Z value against the distance of each point. Thanks to the cross-sections obtained, high accuracy 2D information about the road surface and the road environment is provided.
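The coordinate geometry of the endpoint computation can be sketched in a few lines of R; this is a hedged illustration, and the 10 m half-width simply matches the 20 m wide sections used later in the paper.

```r
## For each profile point, offset the azimuth to the next point by +/-100 gradians
## (90 degrees) and place the left/right endpoints at half the cross-section width.
cross_section_ends <- function(x, y, x_next, y_next, half_width = 10) {
  az    <- atan2(x_next - x, y_next - y)        # azimuth from north, in radians
  left  <- c(x + half_width * sin(az - pi / 2), y + half_width * cos(az - pi / 2))
  right <- c(x + half_width * sin(az + pi / 2), y + half_width * cos(az + pi / 2))
  rbind(left = left, right = right)
}

# Example: profile point (0, 0) heading due north towards (0, 10),
# 20 m wide cross-section (10 m on each side) -> endpoints at (-10, 0) and (10, 0).
cross_section_ends(0, 0, 0, 10, half_width = 10)
```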

4 Results and Discussion

The proposed methodology was carried out on a road close to the Seyitler campus of Artvin Coruh University. Visual inspection shows that the road is sloped and that some parts of the asphalt surface are damaged. In addition, there are buildings and a retaining wall around the road. Therefore, the proposed methodology was examined under these circumstances. To extract the road surface and centerline, dense point clouds obtained from UAV images were used. The number of neighbors within a 1 m radius was calculated for each point. The minimum was 1 point and the maximum 2867 points within a 1 m radius; on average, more than 1000 points per 1 m radius were acquired on the road surface of the study area. As the flight mission was flown at a fixed altitude over the study area, point densities increase on high-elevation objects (buildings, hills, etc.) where the effective GSD of the images improves, whereas little variation in density is observed on the road surface. To summarize the software used in the proposed methodology, Pix4D Mapper was used to produce the point cloud from the UAV images, and the R programming language was used to process the point cloud. In addition, Global Mapper GIS software and the open-source CloudCompare software were used for data arrangement and visualization. In the proposed methodology, ground and non-ground points were classified first. Figure 1b shows the ground and non-ground classification result. Then, the RF algorithm was applied to the point cloud. As explained in Sect. 3.3, for large data sets the raw data is partitioned into a training set to train the classifier model, a validation set to validate the produced model, and a test set to evaluate the trained model. However, cross-validation (CV) is also beneficial with small data sets to improve performance. CV is valuable for selecting a model and estimating its prediction error. When a model is selected using CV, it can be trained on a specific data set and its hyperparameters can be chosen by comparing it with other candidate models. In this study, the data was split into two parts, used for training and verification purposes. In addition, repeated 10-fold CV was applied, and 98% overall accuracy was achieved according to the confusion matrix of the selected model. Figure 2 shows the points after the RF classification. Specifically, Fig. 2b presents points representing the road together with a large number of noise points. The high-noise RF results are filtered with the CC algorithm at an optimum level, and the road surface is obtained. The CC algorithm parameters were obtained by trial and error. Figure 3 presents several cases under different octree levels (OL) and minimum numbers of points per cluster (CL). It is seen that each case in Fig. 3 eliminates noise points differently. Visually, Fig. 3e was selected as the optimum road points. The road extraction accuracy was assessed by comparison with manually digitized road layers, and 94%, 76% and 73% were obtained for the completeness, correctness and quality measures, respectively. To give insight into the proposed


Fig. 2. Random forest classification result, a) road (class: 1) and non-road (class: 0) points, b) points classified as road with a large number of noise points, c) optimum road points.

Fig. 3. Connected components (CC) algorithm results for different values of octree level (OL) and minimum points per cluster (CL), a) OL: 8 (0.68 m) CL: 1000, b) OL: 9 (0.34 m) CL: 1000, c) OL: 11 (0.08 m) CL: 1000, d) OL: 11 (0.08 m) CL: 3000, e) OL: 11 (0.08 m) CL: 15000, f) OL: 12 (0.04 m) CL: 1000

methodology, these acquired measures are compared with other studies. Figure 4 shows the three measures across different studies. The proposed methodology produces one of the highest completeness measures compared with the other studies. However, its correctness and quality measures are in the middle range. Several error sources might cause the lower correctness and quality measures. For example, there are several vehicles parked on the side of the road, as seen in Fig. 5. In addition, there is a part on the side of the road which is not the main road but was identified as road by the proposed methodology. These errors reduce the performance of the proposed methodology. After extracting the road surface, the centerline was determined as summarized in Sect. 3.4. Three different centerline methods were compared in this study: the GIS software Global Mapper algorithm, manual path drawing (7 m wide parallel line), and the centerline algorithm proposed in this study. Figure 6 shows


[Bar chart: corresponding values (0–1) of completeness, correctness and quality for the proposed work and for Shi 2013, Sujatha 2015, Abdollahi 2019, Kamangir 2017, Singh 2012, Zhang 2018, Bulatov 2016, Liu 2019 and Wang 2015.]

Fig. 4. Comparing performance of the proposed methodology with other studies.

Fig. 5. Identifying several error sources.

three centerlines along with the road boundaries obtained by the proposed algorithm and manually. As can be seen in Fig. 6b, the centerline obtained from Global Mapper contains many irregularities and very rough lines. On the other hand, the proposed methodology's results are smoother and retain the road properties. They are also closer to the road alignment and curve formation. In addition, the proposed methodology's results are consistent with the manually drawn centerline.


Fig. 6. Centerline comparison with various methods, a) general view of orthomosaics with details of measurements, b) close view of region of interest (ROI).

In the proposed methodology, the profile was also obtained through the digital surface model (DSM) after the road centerline extraction, as discussed in Sect. 3.5. Figure 7a illustrates the DSM raster data with the optimum obtained centerline, and Fig. 7b presents the profile of the road. The profile results are reasonable given the knowledge of the area.

Fig. 7. Elevation extraction of the profile according to the centerline, a) centerline overlaid onto the DSM, b) elevation and distance values of the profile.

In addition to the profile, fifteen cross-sections with 20 m width were also produced along the profile at 10 m intervals from the beginning point (0 + 000 km). Figure 8a and b show the locations of the fifteen cross-sections on the profile and on the DSM, respectively. Two cross-section examples are also presented in Fig. 8c and d. Again, these cross-sections are reasonable given the knowledge of the area.


Fig. 8. Cross-sections designed at 10 m distance intervals and 20 m width on the right and left sides, a) plotted cross-sections, each section with a different color, b) designed cross-sections overlaid on the DSM, c) cross-section from the start point, d) sample cross-section with a building on the left side.

5 Conclusion

This study presents a new approach for achieving high accuracy geometric feature extraction of the road surface automatically from UAV based images. The proposed pipeline consists of five steps. First, ground and non-ground points are classified. Then, the road surface is extracted using a supervised machine learning algorithm. Road boundaries are derived from the extracted road surface points and are used to estimate the road centerline in the next step. Finally, using the road centerline, the profile and cross-sections are obtained. The proposed algorithm was tested on a road that is not built to high highway standards. However, compared with other studies, the proposed pipeline produces promising results.

Acknowledgements. This work was financed by the Artvin Coruh University Scientific Research Projects Coordinatorship (Grant No. 2019.F40.02.02). We want to acknowledge the Turkish General Directorate of Highways for supporting our study on the road. The developed R codes can be downloaded from https://github.com/mzeybek583/RRoad.git.

References

1. Abdollahi, A., Pradhan, B., Shukla, N.: Extraction of road features from UAV images using a novel level set segmentation approach. Int. J. Urban Sci. 23(3), 391–405 (2019). https://doi.org/10.1080/12265934.2019.1596040
2. Zeybek, M., Biçici, S.: Road distress measurements using UAV. Turk. J. Remote Sens. GIS 1(1), 13–23 (2020)
3. Zhang, J., Chen, L., Zhuo, L., Geng, W., Wang, C.: Multiple saliency features based automatic road extraction from high-resolution multispectral satellite images. Chin. J. Electron. 27(1), 133–139 (2018). https://doi.org/10.1049/cje.2017.11.008


4. Guan, H., Li, J., Yu, Y., Wang, C., Chapman, M., Yang, B.: Using mobile laser scanning data for automated extraction of road markings. ISPRS J. Photogramm. Remote Sens. 87, 93–107 (2014). https://doi.org/10.1016/j.isprsjprs.2013.11.005
5. Kamangir, H., Momeni, M., Satari, M.: Automatic centerline extraction of covered roads by surrounding objects from high resolution satellite images. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 42(4), 111–116 (2017). https://doi.org/10.5194/isprs-archives-XLII-4-W4-111-2017
6. Shi, W., Miao, Z., Wang, Q., Zhang, H.: Spectral-spatial classification and shape features for urban road centerline extraction. IEEE Geosci. Remote Sens. Lett. 11(4), 788–792 (2013). https://doi.org/10.1109/LGRS.2013.2279034
7. Wang, J., Song, J., Chen, M., Yang, Z.: Road network extraction: a neural-dynamic framework based on deep learning and a finite state machine. Int. J. Remote Sens. 36(12), 3144–3169 (2015)
8. Li, Z., Cheng, C., Kwan, M.P., Tong, X., Tian, S.: Identifying asphalt pavement distress using UAV lidar point cloud data and random forest classification. ISPRS Int. J. Geo-Inf. 8(1), 39 (2019). https://doi.org/10.3390/ijgi8010039
9. Sujatha, C., Selvathi, D.: Connected component-based technique for automatic extraction of road centerline in high resolution satellite images. EURASIP J. Image Video Process. 2015(1), 8 (2015). https://doi.org/10.1186/s13640-015-0062-9
10. Singh, P.P., Garg, R.: Automatic road extraction from high resolution satellite image using adaptive global thresholding and morphological operations. J. Indian Soc. Remote Sens. 41(3), 631–640 (2013). https://doi.org/10.1007/s12524-012-0241-4
11. Liu, R., Miao, Q., Song, J., Quan, Y., Li, Y., Xu, P., Dai, J.: Multiscale road centerlines extraction from high-resolution aerial imagery. Neurocomputing 329, 384–396 (2019)
12. Bulatov, D., Häufel, G., Pohl, M.: Vectorization of road data extracted from aerial and UAV imagery. ISPRS - Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. XLI-B3, 567–574 (2016). https://doi.org/10.5194/isprs-archives-XLI-B3-567-2016
13. Cao, Y., Yan, L.: Automatic road network extraction from UAV image in mountain area. In: 2012 5th International Congress on Image and Signal Processing, pp. 1024–1028. IEEE (2012)
14. Tan, Y., Li, Y.: UAV photogrammetry-based 3D road distress detection. ISPRS Int. J. Geo-Inf. 8(9), 409 (2019). https://doi.org/10.3390/ijgi8090409
15. Agüera-Vega, F., Carvajal-Ramírez, F., Martínez-Carricondo, P., López, J.S.H., Mesas-Carrascosa, F.J., García-Ferrer, A., Pérez-Porras, F.J.: Reconstruction of extreme topography from UAV structure from motion photogrammetry. Measurement 121, 127–138 (2018). https://doi.org/10.1016/j.measurement.2018.02.062
16. Wójcik, A., Klapa, P., Mitka, B., Piech, I.: The use of TLS and UAV methods for measurement of the repose angle of granular materials in terrain conditions. Measurement 146, 780–791 (2019). https://doi.org/10.1016/j.measurement.2019.07.015
17. Puente, I., González-Jorge, H., Martínez-Sánchez, J., Arias, P.: Review of mobile mapping and surveying technologies. Measurement 46(7), 2127–2145 (2013). https://doi.org/10.1016/j.measurement.2013.03.006
18. Wang, J., Lindenbergh, R., Menenti, M.: SigVox - a 3D feature matching algorithm for automatic street object recognition in mobile laser scanning point clouds. ISPRS J. Photogramm. Remote Sens. 128, 111–129 (2017). https://doi.org/10.1016/j.isprsjprs.2017.03.012


19. Rodriguez-Cuenca, B., Garcia-Cortes, S., Ordóñez, C., Alonso, M.C.: An approach to detect and delineate street curbs from MLS 3D point cloud data. Autom. Constr. 51, 103–112 (2015). https://doi.org/10.1016/j.autcon.2014.12.009
20. Holgado-Barco, A., Riveiro, B., González-Aguilera, D., Arias, P.: Automatic inventory of road cross-sections from mobile laser scanning system. Comput. Aided Civ. Infrastruct. Eng. 32(1), 3–17 (2017). https://doi.org/10.1111/mice.12213
21. Shams, A., Sarasua, W.A., Famili, A., Davis, W.J., Ogle, J.H., Cassule, L., Mammadrahimli, A.: Highway cross-slope measurement using mobile lidar. Transp. Res. Rec. 2672(39), 88–97 (2018). https://doi.org/10.1177/0361198118756371
22. Zeybek, M., Şanlıoğlu, İ.: Point cloud filtering on UAV based point cloud. Measurement 133, 99–111 (2019). https://doi.org/10.1016/j.measurement.2018.10.013
23. Zhang, W., Qi, J., Wan, P., Wang, H., Xie, D., Wang, X., Yan, G.: An easy-to-use airborne lidar data filtering method based on cloth simulation. Remote Sens. 8(6), 501 (2016). https://doi.org/10.3390/rs8060501
24. Girardeau-Montaut, D.: CloudCompare point cloud software (2019)
25. Ni, H., Lin, X., Zhang, J.: Classification of ALS point cloud with improved point cloud segmentation and random forests. Remote Sens. 9(3), 288 (2017). https://doi.org/10.3390/rs9030288
26. Ghatak, A.: Machine Learning with R. Springer, Singapore (2017)
27. Lumia, R., Shapiro, L., Zuniga, O.: A new connected components algorithm for virtual memory computers. Comput. Vision Graph. Image Process. 22(2), 287–300 (1983). https://doi.org/10.1016/0734-189X(83)90071-3
28. Griffioen, S.: A voxel-based methodology to detect (clustered) outliers in aerial lidar point clouds. Master's thesis, Delft University of Technology (2018)
29. Edelsbrunner, H., Kirkpatrick, D., Seidel, R.: On the shape of a set of points in the plane. IEEE Trans. Inf. Theory 29(4), 551–559 (1983)
30. Bowyer, A.: Computing Dirichlet tessellations. Comput. J. 24(2), 162–166 (1981)
31. Panigrahi, N.: Computing in Geographic Information Systems. CRC Press, Boca Raton (2014)
32. Bartier, P.M., Keller, C.P.: Multivariate interpolation to incorporate thematic surface data using inverse distance weighting (IDW). Comput. Geosci. 22(7), 795–799 (1996). https://doi.org/10.1016/0098-3004(96)00021-0

Parking Availability Prediction in Smart City

El Arbi Abdellaoui Alaoui 1,2(B) and Stephane Cedric Koumetio Tekouabou 3,4

1 EIGSI-Casablanca, 282 Route of the Oasis, Casablanca, Morocco
  [email protected]
2 Department of Computer Science, Faculty of Sciences and Technologies, My Ismail University, Errachidia, Morocco
3 Laboratory LAROSERI, Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, B.P. 20, 24000 El Jadida, Morocco
  [email protected]
4 Center of Urban Systems (CUS), Mohamed VI Polytechnic University (UM6P), Hay Moulay Rachid, 43150 Ben Guerir, Morocco

Abstract. Smart cities are part of the continuous advance of technology aimed at providing a better quality of life for their inhabitants. Important topics in smart cities include urban mobility. Due to the increasing number of vehicles passing through cities, traffic congestion is becoming a common problem. In addition, finding a place to park in the city center is not a simple task for drivers. In fact, drivers looking for parking spaces contribute up to 30% of traffic congestion. In this context, this paper proposes a model for predicting the availability of parking spaces in car parks using different data sources, regression techniques and machine learning. The objective is to propose a model capable of making an informed estimate of the number of free parking places.

Keywords: Smart parking systems · Internet of Things · Smart city · Machine learning

1 Introduction

The continuous growth of the population, the gradual migration of the population to cities and the progress of information and communication technologies (ICT) have given rise to the phenomenon of "smart cities" [1]. In particular, smart cities are a growing global trend that aims to integrate ICT solutions to improve the quality of life of their citizens and their interaction with government officials. Since traffic and urban mobility are among the major problems of city development, cities face the challenge of sustainable mobility in limited physical space. This growing demand is normally constrained by the limited physical capacity of a city's transportation system, traffic and parking [3,11]. Urban mobility refers to the management of the means of transportation in the city and of the costs, in time and money, incurred by citizens who move from


one place to another to carry out their daily activities [11]. A smart city must engage in efficient and multimodal use of transport, promoting public transport and options with less impact on environmental pollution. In large cities, a large number of citizens use public systems (for example, collective transport) and private means (e.g., private cars), and many of them depend on how well these systems work. These almost routine concerns of people, shaped by their experience of transportation services, are key factors when considering alternatives for improvement. Urban mobility is one of the typical applications of a smart city, for example via public transport applications or the provision of customized routes to a user. For the design of these applications (usually with mobile device support), valuable information must be provided to users to enable intelligent travel choices [5]. At the same time, transport companies must commit to improving the quality of the services provided, which also depends on the quantity and quality of information provided by users. With regard to traffic, the traffic flow makes driving and parking in various areas of an intermediate city, such as the downtown area, tedious. Currently, some tools can help drivers by reporting traffic flows to avoid crowded areas or by reporting accidents that prevent the normal operation of streets, roads or highways [2,6]. However, one of the aspects that most affects driver activity is the search for parking spaces. This activity not only affects drivers who want to park, but also the fluidity of traffic in the city. For example, according to [4], 30% of traffic jams are caused by vehicles in search of parking. In this context, if drivers could know in advance which parking spaces were available, it would make parking easier for them and would also help to manage the flow of traffic in a more orderly manner. In this work, a data-centric approach is proposed that aims to predict the percentage of availability of parking spaces for a given city car park. This approach uses machine learning techniques and also integrates different sources of information. The main source of data used for the development of our model is the car park data of the city of Birmingham. For the predictions, different regression techniques were used to predict a numeric block occupancy value for a given date and time, and interesting results were obtained with these predictors.

1.1 Main Contributions

The key contributions of this work are:

– We show the interest of using the Smart City, IoT and ITS concepts in public transport;
– We propose an efficient architecture for smart parking;
– We detail a model for predicting the availability of places in a smart parking;
– We use new machine learning techniques to predict the availability of places in a smart parking;
– We evaluate the proposed model using performance metrics.


The rest of this paper is organized as follows: Sect. 2 presents the preliminary concepts we will need. Section 3 presents the architecture of our system. In Sect. 4, we present our methodology for predicting the availability of parking. Section 5 presents the experimental results and discussion. We conclude the paper in Sect. 6.

2 Smart Cities and Parking Prediction Challenges

In the literature we can find many definitions of a "smart city" [7,9]. The definition we adopt is the one taken from [10] because it seems to us the most appropriate for our conception of such a system. The authors define a "smart city" as an instrumented, interconnected and intelligent city. Instrumentation enables the capture and integration of real-world data through sensors, counters, vending machines, personal digital devices, image acquisition systems, smart phones, implanted medical devices, the web and other data acquisition systems, including social networks as human sensor networks. Interconnection means the integration of this data into a company's IT platform and the communication of this information among the various services. Intelligence refers to the presence of complex analyses, modeling, optimization and visualization in operational business processes to make the best operational decisions (e.g., using machine learning). Many concepts are linked to the concept of "smart city". These concepts have varying definitions and have both complicated and independent uses. They can nevertheless be classified according to three dimensions [8]: technological, human and institutional. These three dimensions are mutually connected to each other to give the concept of "smart city" (Fig. 3). The technological dimension focuses on infrastructure and software, the human dimension on creativity, diversity and education, and the institutional dimension on politics and governance. With regard to the technological dimension, a properly functioning infrastructure is absolutely necessary but not sufficient to become a "smart city". An ICT-based structure is a prerequisite, but without a real commitment and willingness to collaborate between public institutions, the private sector, associations, educational institutions and citizens, there would be no "smart city". Generally, a smart city is based on the following three flows:

– Logistics flow: the logistics flow in a "smart city" corresponds to Urban Traffic Systems (UTS).
– Energy flow: urban energy flows correspond to all energy transfers between production sources on the one hand and storage systems and/or loads (housing, public lighting, charging stations, etc.) on the other hand.
– Data flow: with the arrival of new technologies in cities (smart phones, sensor networks, domotic and immotic systems, etc.), a lot of data from different applications is stored and can be transformed into knowledge.


Fig. 1. Components of smart city

A smart city can be improved with the mechanisms of IoT and ITS. Indeed, the concepts of IoT and ITS can offer valuable real-time information to smart city players. For example, the various services provided by ITS in a smart city are shown in Fig. 2. Smart car parks are a privileged application area for ITS and IoT, due to the need for good articulation between the many players involved in the exchange of information. Indeed, ITS and IoT are a key to implementing efficient modes of public transport as well as to responding to the timing and quality requirements demanded by customers (Fig. 1).

2.1 Intelligent Transport System (ITS)

Public transport in the city is a privileged field of application of intelligent transport systems (ITS), because of the need for good articulation between many actors in the exchange of information. Indeed, the intelligent transport system (ITS) is a central key to implementing efficient public transport modes as well as to answering the timing and quality issues demanded by customers [12]. Intelligent transport systems (ITS) use new information and communication technologies (NTICs) to make transport more automated, thereby increasing the performance of these systems beyond the sensory limits of human drivers. An ITS has the potential to increase traffic safety, efficiency and driving comfort, and to reduce the negative impact of transport on the environment. With these different benefits, the development of ITS is actively pushed by governments, automobile manufacturers, global regulators and standardization bodies [12]. An example of an ITS application is alerting road users to a critical event on the road (accident, breakdown, impassable pavement, road works, etc.). Other examples are electronic tolls, traffic information systems, and navigation systems. In all these cases, the most important technology to meet these needs is wireless communication. This allows road users to share information


and manage their behavior cooperatively. The various services provided by ITS are illustrated in Fig. 2:

– Mobility assistance: Concerning modal choice assistance, the new ITS helps users to determine the appropriate mode of transport according to their needs (speed, cost, safety, etc.). For route selection assistance, they offer routes that match user expectations. For schedule choice assistance, they inform users of travel schedules (for example, peak hours). Booking systems also simplify the management of reservations in real time and remotely (car rental, bus, parking, etc.). Concerning fleet management, ITS allows controlling the management of products according to the logistics of the company.
– Real-time travel assistance: Electronic payment for ticketing, e-tolls, parking or other services allows people to adjust their costs, save substantial time, etc. Real-time travel help and advice is a real-time traffic information service (probable journey time, etc.). Fleet monitoring allows controlling access to certain sensitive or forbidden areas and increasing the safety and security of the transport of dangerous goods.
– Road safety assistance: Driving assistance is used to improve the safety of users and to ensure the comfort of people. These applications include the automatic gearbox, programmable electronic stabilizer, speed limiter, anti-collision system, GPS navigation assistance, etc. For knowledge of regulations, geolocation systems are used to identify the regulations in force in urban areas. Regarding the enforcement of regulations, examples include speed radars and the use of cameras in public transport to fight fraud.

2.2 Internet of Things (IoT)

The Internet of Things (IoT) has the potential to transform the transportation industry by profoundly modifying how data is gathered, how users are connected with, and how processes are automated. IoT consists of networking physical objects that, through the use of embedded sensors, actuators, and other devices, can collect and transmit information about network activity in real time. The data collected from this equipment can then be analyzed by transport agencies to:

– Improve the passenger experience with more reliable transportation, better customer service, and more accurate communication and information.
– Increase safety, including the operation of the transport system, through data sensors that detect anomalies in train speed, pavement temperatures, the condition of aircraft parts, or the number of cars waiting at an intersection.
– Reduce energy use and congestion through real-time data mining, which facilitates matching resources to the operators who need them to meet demand, reacting quickly to changing traffic patterns, and addressing the impact of traffic on fuel consumption, the environment and regional economic competitiveness.


Fig. 2. Different services provided by ITS

3 System Model of Smart Parking

An intelligent parking system uses detection devices to determine the occupancy of parking spaces. It helps the driver to park safely and informs him/her of the availability of parking spaces through appropriate vehicle management. Thanks to intelligent technologies, optimized parking can be brought to the city centre. A sensor system indicating to drivers where the nearest free parking space is located has already been successfully tested in multi-storey car parks, and street tests are currently underway. In San Francisco, CA, 6,000 sensors have been embedded in the asphalt and work in conjunction with an application and GPS. An advanced smart parking system architecture needs to have the following elements:

– Sensor: In each and every parking facility, a set of RF sensors has to be installed. These sensors relay real-time data about the availability (or otherwise) of slots to nearby drivers. These 'occupancy sensors' need to run on battery (wired connectivity is impractical, given the high number of sensors that have to be used), have long-range RF capabilities (either with a mesh network or with LoRa technology), and offer excellent accuracy. The number of 'false positives' generated by a sensor has to be minimal.
– Gateway hardware: Changes in the availability of parking slots will be reflected through a status change of the sensors – and that, in turn, will be


Fig. 3. Smart parking system

collected in a gateway. Unlike the sensors, the gateways have to be operational round-the-clock (the sensors are functional only when their status changes). The data collected in the gateway are then sent on to the centralized server.

– Server: The server in a smart parking setup needs to support a two-way communication protocol. On the one hand, it receives the data from the gateways to generate updated information about the free and occupied parking slots in any facility. On the other hand, it sends real-time notifications to the dedicated mobile app on the user's device to guide them to the parking area and the particular 'vacant' parking slot. A detailed map of the parking facility is also sent from the server to the user for additional guidance.
– Mobile application: The final component of the smart parking architecture is the dedicated mobile app, which serves as the touchpoint for the final users (the app can be installed on smartphones and tablets). After receiving notifications from the server, the app helps the driver navigate to the empty parking slot (no more confused driving around and the resulting frustration). For this navigation, the application relies either on the tools placed in the parking slots (off-street parking) or on the phone GPS (on-street parking).
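As a purely illustrative aside (not part of the paper's implementation), the server-side aggregation step of this architecture can be mocked in a few lines of R; all field names (facility, slot, occupied) are assumptions.

```r
## Fold sensor status messages relayed by the gateways into per-facility
## availability counts, i.e., the figure the mobile app would display.
messages <- data.frame(
  facility = c("P1", "P1", "P1", "P2", "P2"),
  slot     = c("A1", "A2", "A3", "B1", "B2"),
  occupied = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

availability <- aggregate(occupied ~ facility, data = messages,
                          FUN = function(s) sum(!s))   # count free slots
names(availability)[2] <- "free_slots"
availability
```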

4 Methodology

The prediction process is shown in Fig. 4 and consists of several phases, including data collection and preprocessing, application of machine learning models, evaluation, and prediction.

4.1 About Ensemble-Based Models

Ensemble-based prediction methods combine several independent base models, which are in most cases decision trees or neural networks. Each of these base models provides an alternative prediction of the problem, and the final prediction is a combination (usually by weighted or unweighted vote) of the alternative predictions. Combining the predictions of a set of individual base models generally allows a more stable and accurate output prediction, because the error is much smaller than that provided by any one of the individual base models forming the overall model. Indeed, the final ensemble-based model corrects the errors made individually by the base models so as to drastically reduce the total error. To be so effective, the base models should be forced to fulfill two conditions, namely to be independent and to be weak models. The initial idea was to divide the training data D into n subsets to train n models m_1, m_2, ..., m_n. But this technique was quickly superseded because it promotes underfitting when n becomes large. To overcome this limitation, resampling methods draw n larger, independent subsamples from the training data to generate the weak models. Various techniques exist for this, the best known being bagging and boosting.

Fig. 4. Global ensemble-based System for Real-time Parking availability Prediction

4.2 Ensemble-Based Models for Regression

The general idea of the ensemble methods is summarized in Fig. 5, which shows that these models are based on three main stages, namely bootstrapping, intermediate modeling and aggregation. The bootstrapping consists in dividing the


Fig. 5. Process of ensemble-based model

data D into n data sets D_1, D_2, ..., D_n. From each data set D_i an intermediate regressor R_i is constructed, and the final regressor is an aggregation of the intermediate regressors R_i. From this general idea several methods were born, among which the most powerful are bagging, used in the random forest algorithm, and boosting, used by Gradient Boosting and Adaptive Boosting.

4.3 Random Forest Regressor

Random forest is nothing more than a particular bagging method consisting of an aggregation of trees based on random variables. Most often, the trees are built with the classification and regression tree (CART) algorithm, whose principle is to recursively partition the space generated by the explanatory variables in a dyadic way. More precisely, at each stage of the partitioning, a part of the space is cut into two sub-parts according to a variable X_j.

Algorithm 1. Random Forest Regressor (RFR)
Input:
– x, the observation to predict;
– d_n, the training observations;
– B, the number of trees;
– m ∈ N, the number of candidate variables to cut a node.
Output: \hat{h}(x) = \frac{1}{B} \sum_{k=1}^{B} h(x, k)
1: for k = 1 to B do
2:   Draw a bootstrap sample from d_n.
3:   Construct a CART tree on this bootstrap sample; each cut is selected by minimizing the CART cost function over a set of m variables randomly selected among the p. Denote h(·, k) the built tree.
4: end for
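A minimal R sketch of Algorithm 1 with the randomForest package is given below; the toy data frame `dn` with target `y` is an assumption used only to make the snippet runnable, and B and m map to the ntree and mtry arguments.

```r
## Illustrative random forest regression on synthetic data.
library(randomForest)

set.seed(1)
dn   <- data.frame(x1 = runif(200), x2 = runif(200))
dn$y <- 2 * dn$x1 - dn$x2 + rnorm(200, sd = 0.1)

rfr <- randomForest(y ~ ., data = dn, ntree = 500, mtry = 2)  # B = 500 trees, m = 2
predict(rfr, newdata = data.frame(x1 = 0.5, x2 = 0.2))        # aggregated prediction
```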


Gradient Boosting Regression (GBR). Another very popular boosting algorithm is gradient boosting. Gradient Boosting works similarly to AdaBoost by sequentially adding predictors to an ensemble, so that each one tries to correct the errors of its predecessor. However, instead of adjusting the instance weights at each iteration, as AdaBoost does, this method tries to fit the new predictor to the residual errors committed by the previous one [6].
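A hedged sketch of this idea with the gbm package follows; each new tree is fit to the residuals of the current ensemble, and the toy data frame `dn` from the previous snippet is reused (hyperparameter values are assumptions, not the paper's settings).

```r
## Illustrative gradient boosting regression on the same synthetic data.
library(gbm)

gbr <- gbm(y ~ x1 + x2, data = dn, distribution = "gaussian",
           n.trees = 300, interaction.depth = 2, shrinkage = 0.05)

predict(gbr, newdata = data.frame(x1 = 0.5, x2 = 0.2), n.trees = 300)
```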

5 Experimentation Results and Discussion

In this section, we detail the experiments of the parking availability prediction process based on ensemble methods.

Algorithm 2. Gradient Boosting Regression (GBR)
Input:
– x, the observation to predict;
– h, a weak rule;
– d_n = (x_1, y_1), ..., (x_n, y_n), the sample;
– λ, a regularization parameter such that 0 < λ < 1;
– M, the number of iterations.
Output: the estimator \hat{g}_M(x) = \sum_{m=1}^{M} \alpha_m g_m(x)
1: Initialization: initialize the weight distribution of the training data by w_i = \frac{1}{N}, i = 1, 2, ..., N.
2: for m = 1 to M do
3:   Fit the weak rule on the sample d_n weighted by the weights w_1, ..., w_N; denote g_m(x) the estimator resulting from this adjustment.
4:   Compute e_m = \frac{\sum_{i=1}^{N} w_i \, \mathbf{1}_{y_i \ne g_m(x_i)}}{\sum_{i=1}^{N} w_i}.
5:   Compute \alpha_m = \log\left(\frac{1 - e_m}{e_m}\right).
6:   Readjust the weights: w_i \leftarrow w_i \exp\left(\alpha_m \, \mathbf{1}_{y_i \ne g_m(x_i)}\right), i = 1, ..., N.
7: end for

5.1 Dataset

As shown in Fig. 4, our global predictive system consists of several phases. The first step consists of collecting data from the sensors installed in the different smart parkings. At this level, the data is collected in a parking database as a CSV file. The data analyzed in this paper come from the Birmingham car parks and were first used in (ref 1 and 2), comprising valid occupancy rates of 29 car parks operated by NCP (National Car Parks) in the city of Birmingham in the U.K. Birmingham is a major city in the West Midlands of England, standing on the small River Rea. It is the largest and most populous British city outside London, with an estimated population of 1,124,569 as of 2016 [13]. Several cities in the


U.K. have been publishing their open data to be used not only by researchers and companies, but also by citizens, so that they can better know the place where they live. The Birmingham data set is licensed under the Open Government License v3.0 and is updated every 15 min from 8:00 AM to 4:30 PM (18 occupancy values per car park and day). In our study, we worked with data collected from Oct 4, 2016 to Dec 19, 2016 (11 weeks), which is available in the UCI Machine Learning Repository. The selection of relevant data consists in eliminating irrelevant and redundant information. For the Birmingham parking database, the features considered relevant to the problem are:

– SystemCodeNumber: an alphanumeric code that identifies a car park.
– LastUpdated: the date and time of the last update of the occupancy data for each parking block. Schedules are recorded between 8:30 am and 6:30 pm.
– Capacity: the capacity of each car park.
– Occupancy: the occupancy of each car park, updated every 30 min.

Other features, such as the fill rate and the exit rate of each block, were not considered in this work. From these features we generated a specific feature called the availability rate, noted AVR, which is the ratio of the capacity minus the occupancy at time t of date d to the capacity of the parking block. In our case, it is calculated by the following formula:

AVR_p(d, t) = \frac{Capacity_p - Occupancy_p(d, t)}{Capacity_p}
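A short R sketch of this preprocessing step is given below; the column names follow the Birmingham dataset (SystemCodeNumber, LastUpdated, Capacity, Occupancy), while the file name is an assumption.

```r
## Compute the availability rate AVR from the raw occupancy CSV.
park <- read.csv("birmingham_parking.csv")
park$SystemCodeNumber <- factor(park$SystemCodeNumber)

park$AVR <- (park$Capacity - park$Occupancy) / park$Capacity

head(park[, c("SystemCodeNumber", "LastUpdated", "Capacity", "Occupancy", "AVR")])
```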

5.2 Performance Measures

In order to build an optimal approach, we compared the performance of the different models using three main measures: the mean absolute error (MAE), the coefficient of determination (R²) and the root mean square error (RMSE). The three measures judge the difference between the real and the predicted parking availability rate in different respects. They are calculated as:

RMSE = \sqrt{\frac{\sum_{i=1}^{N} (AVR_{ip} - AVR_i)^2}{N - 1}}

MAE = \frac{\sum_{i=1}^{N} |AVR_{ip} - AVR_i|}{N}

R^2 = 1 - \frac{\sum_{i=1}^{N} (AVR_{ip} - AVR_i)^2}{\sum_{i=1}^{N} (AVR_i - \overline{AVR})^2}

where N is the total number of instances, AVR_{ip} is the predicted availability rate of instance i, AVR_i is the real availability rate of this instance, and \overline{AVR} is the mean of the real availability rates. The choice of a single measure may not always allow the models to be separated. While the RMSE shows the error, characterized by the variance and mean, between


Fig. 6. Comparison of techniques

Table 1. Comparison of techniques

Regressor                    RMSE     MAE      R2
Bayesian Ridge Regressor     0.1737   0.1416   0.5879
KNeighbors Regression        0.0014   0.0008   1.0000
Gradient Boost Regression    0.0348   0.0259   0.9835
Random Forest Regression     0.0017   0.0006   1.0000
Extra Trees Regression       0.0027   0.0010   0.9999

the predicted and the real values, favoring the effect of large deviations, the absolute error reflects the precision of the prediction of the availability rate, and R² shows us the proportion of the actual availability rate that has been correctly predicted. The optimal model will result from the homogeneity between these three measures.
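The three measures can be written directly as R functions; this is a minimal transcription of the formulas above, with R² taken as the usual coefficient of determination.

```r
## pred = predicted availability rate AVR_ip, obs = real availability rate AVR_i.
rmse <- function(pred, obs) sqrt(sum((pred - obs)^2) / (length(obs) - 1))
mae  <- function(pred, obs) mean(abs(pred - obs))
r2   <- function(pred, obs) 1 - sum((pred - obs)^2) / sum((obs - mean(obs))^2)

# Toy example
obs  <- c(0.50, 0.62, 0.71)
pred <- c(0.48, 0.65, 0.70)
rmse(pred, obs); mae(pred, obs); r2(pred, obs)
```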

5.3 Results Analysis and Discussion

In order to find the model that gives the best prediction performance, we tested several regression models, starting with linear models (Bayesian Ridge Regressor), which did not prove very effective although very fast, with an RMSE of 0.1737, an MAE of 0.1416 and an R² of 0.5879. To improve these performances we tested other stochastic and probabilistic models, whose results are shown in Fig. 6 and Table 1. According to these results, KNeighbors and Random Forest give the best, near-perfect performances, reaching respectively 0.0014 and 0.0017 in terms of RMSE, 0.0008 and 0.0006 in terms of MAE, and an R² of 100%. They are followed by the Extra Trees Regressor, which gave an RMSE of 0.0027, an MAE of 0.0010 and an R² of 0.9999. With an RMSE of 0.0348, an MAE of 0.0259 and an R² of 0.9835, Gradient Boost Regression is in the last position, just ahead of the much less efficient linear algorithms.

6 Conclusion

Urban mobility is one of the most important components of smart cities, and one of those that can directly benefit citizens. With the prediction of parking places, citizens can reduce the time they spend searching for a parking spot. In this paper we have proposed a model for predicting the availability of parking spaces for the city of Birmingham. Specifically, the proposed prediction techniques showed better results than those of the baseline predictors.

References

1. Bélissent, J.: Getting clever about smart cities: new opportunities require new business models. Cambridge, Massachusetts, USA (2010)
2. Tang, S., Gao, H.: Traffic-incident detection algorithm based on nonparametric regression. IEEE Trans. Intell. Transp. Syst. 6, 38–42 (2005)
3. De Fabritiis, C., Ragona, R., Valenti, G.: Traffic estimation and prediction based on real time floating car data. In: 11th International IEEE Conference on Intelligent Transportation Systems, ITSC 2008 (2008)
4. Zheng, Y., Rajasegarar, S., Leckie, C.: Parking availability prediction for sensor-enabled car parks in smart cities. In: IEEE Tenth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP). IEEE (2015)
5. Tooraj, R., Ioannou, P.A.: On-street and off-street parking availability prediction using multivariate spatiotemporal models. IEEE Trans. Intell. Transp. Syst., 2913–2924 (2015)
6. Lin, T., Rivano, H., Le Mouël, F.: A survey of smart parking solutions. IEEE Trans. Intell. Transp. Syst. 18, 3229–3253 (2017)
7. De Almeida, P.R., et al.: PKLot – a robust dataset for parking lot classification. Exp. Syst. Appl. 42, 4937–4949 (2015)
8. Lee, J.H., Hancock, M.G., Hu, M.C.: Towards an effective framework for building smart cities: lessons from Seoul and San Francisco. Technol. Forecast. Soc. Change 89, 80–99 (2014)
9. Nam, T., Pardo, T.A.: Conceptualizing smart city with dimensions of technology, people, and institutions. In: Proceedings of the 12th Annual International Digital Government Research Conference: Digital Government Innovation in Challenging Times, pp. 282–291. ACM (2011)
10. Sharad, S., Sivakumar, P.B., Narayanan, V.A.: The smart bus for a smart city – a real-time implementation. In: 2016 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS), pp. 1–6. IEEE (2016)
11. Haitao, X., Ying, J.: Bus arrival time prediction with real-time and historic data. Cluster Comput. 20(4), 3099–3106 (2017)
12. Rajabioun, T., Ioannou, P.A.: On-street and off-street parking availability prediction using multivariate spatiotemporal models. IEEE Trans. Intell. Transp. Syst. 16(5), 2913–2924 (2015)
13. Camero, A., Toutouh, J., Stolfi, D.H., Alba, E.: Evolutionary deep learning for car park occupancy prediction in smart cities. In: International Conference on Learning and Intelligent Optimization, pp. 386–401. Springer, Cham (2018)

Smart Infrastructure and Integrated Platform of Smart and Sustainable Cities

Maysoun Ibrahim(B)
Researcher, Ramallah, Palestine
[email protected]

Abstract. Developing a city into a Smart and Sustainable City (SSC) requires an adequate smart infrastructure and integrated platform. The smart infrastructure aims at responding intelligently to business and public-sector needs, user demands and other infrastructures. It connects the city's physical, social, business, and digital infrastructures in a way that allows the city to integrate, analyze, gather, optimize, and make decisions based on detailed operational data. In turn, the integrated platform is a digital platform from which all needed information and knowledge can be created. Such a platform should be designed in a way that not only facilitates the aggregation of city data and information analysis, but also facilitates collaboration between different city levels and a better understanding of how the city is functioning in terms of services, resource consumption, and lifestyle. This paper aims at exploring in detail the main components that should be considered when embarking on the journey of developing a city into a smarter one in relation to its infrastructure, data manipulation and sharing, and connectivity issues. It constitutes a contribution to knowledge by showing the complexity of SSC performance as a multidimensional system-of-systems and highlighting the elements to be considered when developing a SSC.

Keywords: Smart cities · Sustainable cities · Smart infrastructure · Integrated platform · Transformation process · Smart solutions

1 Introduction

More than half of the world population currently lives in urban areas. This number is expected to increase and reach around 65–75% by 2050 [1]. This unprecedented urbanization growth forces many governments and city administrators to find the most appropriate ways to keep their cities sustainable and capable of providing high-quality services to their residents while maintaining the ecosystems of these cities. As a result, the concept of smart cities has emerged, which then expanded to be known as Smart and Sustainable Cities (SSCs). The concept of SSCs is getting global attention as it forms the desired goal for present and future urban development. The International Telecommunication Union (ITU) defines a SSC as "an innovative city that uses Information and Communication Technologies (ICTs) and other means to improve quality of life, efficiency of urban operation and services, and competitiveness, while ensuring that it meets the needs of present and future generations with respect


to economic, social and environmental as well as cultural aspects” [2]. This city is developed over six main dimensions, namely [3], Smart Economy (Competitiveness), Smart Environment (Natural Resources), Smart Governance (Participation), Smart Living (Quality of life), Smart Mobility (Transport and ICTs), and Smart People (Social and Human Capital). The Information and Communication Technologies (ICTs) are used in this city as an enabler that offers better quality of life for citizens through environmentally friendly and viable smart services [4, 5]. It has a major role in realizing each of the six dimensions through a list of appropriate and digitally smart solutions. A city that can be transformed into a SSC can provide various types of benefits to its citizens, businesses, institutions, and administration bodies. This includes: (1) better city governance (2) better quality of life for citizens (3) more convenient and better urban services and operations (4) better and sustainable environmental conditions (5) more intelligent and smarter infrastructures (6) modern and innovative industry, and (7) better, innovative, and dynamic economy. To realize these benefits, cities needs to be transformed from their current traditional form into smart, sustainable ones, following a coherent, systematic transformation process. One of the issues to be considered during this transformation process includes the development of the SSC smart infrastructure and integrated platform. The latter aims at helping in developing the needed services and solutions to residents as well as connecting different types of systems at all city levels, creating a “system-of-systems” city. The main purpose of the smart infrastructure is to response intelligently to business and public-sector needs [6], user demands and other infrastructures [7]. It connects the city’s physical infrastructure, social infrastructure, business infrastructure, and ICT infrastructure (i.e. digital infrastructure) in a way that allow the city to integrate, analyze, gather, optimize, and make decisions based on detailed operational data [8]. A smart infrastructure provides the needed foundation to the six dimensions of a SSC, noting that its components are context-specific, depending on the developing level of a city. While developing a smart infrastructure, the recommendation is to reuse existing infrastructures, enhance them if possible, and add new ones that are resulted from emerged technologies [2, 6, 7, 9, 10]. In turn, the SSC’s integrated platform is a technological architecture, available across the city to all its members and community. It is a digital platform from which all needed information and knowledge could be created. Such a platform should be designed in a way not only to facilitate the aggregation of city data and information analysis, but also to facilitate collaboration between different city levels and to better understand how the city is functioning in terms of services, resource consumption, and lifestyle. City administrators and relevant stakeholders can use the information made available by the integrated platform to take actions and create policy and regulation directions that would help in improving the quality of life of citizens and society as a whole [2, 7, 11]. This research attempts to contribute to knowledge by demonstrating the need of developing the SSC smart infrastructure and integrated platform in a way to support the development and the connectivity of different services and systems at all city levels. 
It proposes and highlights the minimum required components to be considered during the development of these two main components and proposes a coherent, high-level design of the SSC’s smart digital infrastructure architecture. It is worth noting that none of the existing studies in the literature considers or discusses all the types of infrastructure that need to be improved to realize the benefits of implementing SSCs. Most existing studies focus on the digital (ICT) infrastructure. However, the concept of SSCs is far from being limited to the application of different types of technologies to cities. It includes many other issues, such as the hard and soft infrastructures, that should be considered during the transformation process. The rest of this paper is organized as follows. In the next section, a brief literature review is presented. Section 3 is devoted to the research findings regarding the development of the SSC’s smart infrastructure and integrated platform. A list of recommendations is provided in Sect. 4. The paper concludes in Sect. 5 with a brief summary and ideas for possible future directions.

2 Literature Review

Cities are the result of the agglomeration of hard and soft infrastructures in addition to their ICT or digital infrastructure. These infrastructures are urban features that have been installed by human activities and are essential for a city to operate. The hard infrastructure refers to tangible (i.e. physical) structures such as buildings, roads, pipes, wires, shared spaces, bridges, and ports, among others. The soft infrastructure, in turn, refers to intangible structures such as laws, regulations, rules, conventions, financial systems, government systems, healthcare systems, education systems, human capital, and business environments, among others [12, 13]. These two structures complement each other. For instance, an airport, as a hard infrastructure of a city, cannot function without a set of soft infrastructures that provide rules about the minimum acceptable size of runways, the required distance between landing and departing planes, conventions regarding passenger loading and unloading, and others. The hard and soft infrastructures are also increasingly becoming interlinked with, and operated using, new technologies, such as sensor devices, online services, GPS trackers, and computing systems. This interlinking offers opportunities for improving existing city services and systems and creating new sustainable ones.

Various studies in the literature agree on the necessity of developing the SSC smart infrastructure and integrated platform. The former forms the foundation of the services to be provided under each of the six dimensions of a SSC, while the latter facilitates the aggregation and sharing of data between various SSC services [2, 6, 7, 9–11, 14–16]. However, none of these studies considers all the types of infrastructure that need to be developed and/or improved, namely the hard, soft, and digital infrastructures, all of which should be considered during the transformation process without neglecting any of them. For instance, the British Standards Institution [9] emphasizes the necessity of connecting all city systems through a strong, well-developed ICT infrastructure. The ITU-T FG-SSC [2], in turn, focuses on the need for investing in the ICT infrastructure, considering this infrastructure a critical component of a SSC transformation process. The ITU-T divides the most important technologies needed for a SSC into three categories, namely (1) Network Facilities, such as data centers, the communication layer, and access networks; (2) ICT Facilities, such as network management software, cloud computing and the data platform; and (3) the Terminals, Sensing & Multi-device layer, which includes terminals & gateways, sensors and the Internet of Things. In relation to standardization, the ISO/IEC [6] highlights the necessity of developing smarter and more intelligent ICT infrastructures than the existing ones. This may require revisiting and revising existing ICT infrastructure standards to cover the components newly added to the SSC. The ISO/IEC shows that the ICT infrastructure standards should consider (1) infrastructure and supply chain; (2) the built environment, such as smart buildings; (3) transportation, logistics, and services, such as electric/hybrid vehicles and the utility grid; (4) security; (5) education and training, such as the standards needed for distance learning; and (6) emergency planning and response. It is worth noting that a full, comprehensive comparison between related existing studies and the solutions proposed in this research (Sect. 3) is available in [19, 20].

3 Proposed Smart Infrastructure and Integrated Platform

3.1 Developing the Smart Infrastructure: Proposed Solution

A smart infrastructure allows new innovative services and solutions to be implemented to address urbanization challenges at all city levels and to respond to the sustainable development needs of society. For example, the data collected from the smart mobility infrastructure could be used to generate information to redesign the transportation networks of a city and to build new smart mobility applications for the benefit of citizens. A smart infrastructure is highly context-specific, and the nature of its components is determined by the development level of cities and their specific development challenges. For instance, existing transportation systems in developed countries are more advanced than those in developing countries or Least Developed Countries (LDCs). This includes the availability of trains, metros, trams, water transport, and others. In relation to challenges, one of the development challenges may be related to the shortage of financial resources needed to improve existing infrastructure or to develop new ones. The level of this financial shortage and the existing resources for securing it vary between developed countries, developing countries, and LDCs, and even between cities within the same country.

This research divides a city infrastructure into two categories, namely the smart physical infrastructure and the smart digital infrastructure. The reason behind this classification is that the city infrastructure is often categorized into the hard and soft infrastructures (referred to as the non-ICT-based infrastructure) and the digital infrastructure (referred to as the ICT-based infrastructure), as follows:

1. Smart Physical Infrastructure (non-ICT-based infrastructure): includes (a) the hard infrastructure, such as buildings, roads and bridges; and (b) the soft infrastructure, such as existing financial systems, laws, and regulations.
2. Smart Digital Infrastructure (ICT-based infrastructure): includes the hardware and software components of a city, such as the network infrastructure, access devices, sensors, and social applications.

The combination of these infrastructures is known as a traditional city infrastructure, whose components are rarely connected to each other and have very limited


operational functionalities for controlling and monitoring them [14]. This traditional infrastructure must be improved to a level never previously achieved, where the ICT infrastructure and capacities are used as an enabler to increase its smartness level. Thus, to differentiate it from the traditional infrastructure, the SSC smart infrastructure, as the foundation of the development of SSCs, is broadly divided into two categories: (1) the Smart Physical Infrastructure and (2) the Smart Digital Infrastructure [2, 6, 7, 10, 11, 15, 17].

Smart Physical Infrastructure. The smart physical infrastructure refers to the SSC assets of the hard and soft infrastructures and the use of the city’s digital infrastructure (i.e. ICT networks and systems) to improve these assets. For the hard infrastructure, a list of innovative change activities should be planned and implemented to efficiently improve the current city’s hard infrastructure systems through the use of appropriate technologies. This includes the development and use of smart water management systems, smart grids for smart energy, smart waste management systems, smart healthcare systems, Intelligent Transportation Systems (ITS), e-government solutions and services, smart safety and security management systems, e-business solutions and services, smart buildings and homes, and smart systems for education and tourism management. Table 1 provides examples of some SSC hard infrastructure solutions and how they could be used to address urbanization challenges, aiming to enhance the sustainable development of a city and provide a high quality of life for its citizens.

Table 1. Use of SSC smart infrastructure to meet challenges of urbanization

Sustainability Need/Challenge | Example of the solution | Description
Improve Energy and Utility infrastructure | Smart Grids | Use of digital technology to improve the flexibility, resiliency, efficiency, and reliability of the electric delivery system through applications of smart appliances, smart meters, and renewable energy resources
Improve Energy and Utility infrastructure | Smart Meters | Real-time electric devices to measure electricity, water, and natural gas consumption
Buildings, parking, and streets | Smart buildings | Use of technologies and sensors to improve security, safety, usability, and energy efficiency
Buildings, parking, and streets | Smart Parking | Use of technologies and sensors to provide real-time information about car parking capacities and locations to citizens
Buildings, parking, and streets | Smart traffic lights | Use of technologies and sensors to manage vehicle and pedestrian traffic
Improve Environment Performance | Environmental Sensor Network | Use of technologies and sensors to continuously collect data about the condition and level of pollutants of water, air, and soil
Improve Environment Performance | Smart waste management | Use of technologies and sensors to continuously monitor the efficiency and performance of waste collection; appropriate technologies can be used to provide waste recycling and disposal solutions
Improve Education and Health Services | Online education and remote healthcare | Use of technologies to facilitate remote access to education and health services
Increase efficiency of City Management | SSC operation centers | To monitor and manage a range of transport, government, environmental, and emergency services
Ensure Public Safety and Security | Video security | Use of sensor and camera networks for crowd management, public safety and people counting
Improve government services | e-Government solutions | Use of ICTs to improve public sector organizations’ processes and provide public services to citizens, for example, through online portals

The soft infrastructure is an intangible infrastructure that refers to the different types of human institutions that maintain the country’s, and thereby the city’s, core social and economic standards. This includes laws, regulations, rules, conventions, financial systems, government systems, healthcare systems, education systems, human capital, business environments, and others [12, 13]. These standards should be improved in a way that allows the SSC objectives and goals to be achieved. Without improving a city’s soft infrastructure, the hard infrastructure of a city will not be able to communicate properly [18]. For instance, the soft infrastructure forms a foundation on which to achieve interoperability, i.e. to make SSC systems operate together and communicate with each other following agreed rules. Improving existing interoperability standards or creating new ones would ensure smooth and secure interactions between various SSC systems. The SSC soft infrastructure is also affected by the use of ICTs as an enabler of the various solutions to be provided by a SSC project. With this extensive use of ICTs, it becomes highly important to create and agree on standards that control the usage of vital public services and of the critical components of a SSC smart infrastructure created by the ICTs. Central governments should ensure that the local needs to be realized through ICT solutions are met by relevant standards. For example, a SSC allows different types of data sharing between various city systems, including citizens’ information. A SSC soft infrastructure should have special standardizations to guarantee information-sharing security. In addition, the use of cloud computing techniques should be managed using special standards for data exchange, security issues, and the types of contracts to be established.


Categorizing all city soft infrastructure is not a simple task due to the huge number of standards and systems at all city levels. Therefore, it is recommended that each SSC project team benefit from the information and data analysis collected during the phase of checking the “City Readiness” for change regarding the current status of its hard and soft infrastructures, and decide either to enhance existing systems and standards or to introduce new innovative ones [19, 20].

Smart Digital Infrastructure. The ICTs play a crucial role in developing cities into smart and sustainable ones. They provide the needed tools to improve the smartness level of a city through smart services over each dimension of a SSC. The ICTs also allow various SSC systems to capture and share information in a timely manner. Without the ability to provide and share accurate information quickly and in real time, the city would not be able to take potential actions to solve specific problems, such as traffic congestion, rapidly and before the problem begins to escalate. This in turn improves the quality of life of citizens, as it allows them to be more informed about different local situations and to make decisions about next actions more easily. Moreover, the digital infrastructure has a great impact on a SSC soft infrastructure. The digital integrated platform of a SSC is used to create an information and knowledge network, which can be used to better understand how the city is functioning [21]. City administrators and relevant stakeholders can use this information to upgrade the existing soft infrastructure and improve the quality of life of citizens. The smart digital infrastructure, as a result, should be designed in a way that facilitates the development of different SSC services and solutions.

In SSC projects, the main issue that should be considered during the transformation process is to maximize the potential for the reuse of the existing digital resources of a city before developing new ones. Existing digital assets should be measured during the “City Readiness” phase of the transformation roadmap and/or framework [19, 20]. These resources are then prioritized based on the planned transformational change solutions with the greatest potential for reuse, including the establishment of usage policies and governance processes to ensure a more efficient use of these resources.

The SSC smart digital infrastructure consists of four main layers, namely the (1) sensing layer, (2) network communication layer, (3) platform layer (data and support layer), and (4) (smart) application layer. The first three layers form the hardware infrastructure of the SSC’s smart digital infrastructure, while the smart application layer contains the software applications developed to realize different types of SSC services. These four layers in turn are connected to a SSC smart physical infrastructure, providing the needed services and solutions to its hard and soft components [2, 6, 7, 22]. This research adopted the ITU-T FG-SSC (2016) architecture design of a SSC digital infrastructure, as it is the only architecture related to SSCs with a clear design that is referenced by related studies in the literature. This architecture design has been upgraded, as illustrated in Fig. 1, based either on recommendations from various studies in the literature or on this research’s point of view, aiming to provide a holistic view of a SSC smart digital infrastructure. The latter includes making the smart application layer focus on all smart solutions to be implemented under the six dimensions of a SSC without limiting it to a specific list of solutions. This layer is named the ‘smart application layer’ instead of the ‘application layer’ to differentiate it from the one being used in a traditional city infrastructure design and to highlight the term ‘smart’ as a concept to be considered while developing cities into SSCs. Moreover, based on the ISO/IEC smart cities preliminary report [6], the network communication layer is divided into two main types, namely public networks and private networks, to make sure that all types of networks in a SSC are connected to its smart digital infrastructure. The communication protocols for each network determine the type of communication channels to be established with these networks and the type of data to be shared, taking security issues into consideration.

Fig. 1. Proposed smart sustainable cities smart digital infrastructure architecture design

On another note, after a wide-ranging review of the role of the integrated platform in SSCs and of how and where it should be illustrated in the smart digital infrastructure architecture design, this research renamed the “Data and Support Layer” proposed by the ITU-T FG-SSC the “Platform Layer”. Although one of the main roles of the integrated platform is to deal with the city big data collected from various systems and applications [2], its holistic role goes far beyond this particular function [6, 23–25]. The roles of the integrated platform are discussed later. Finally, the ‘Operation and Maintenance Systems’, along with the three types of end users that would be able to access the SSC services, are adopted from the ISO/IEC [6]. The end users include public citizens, enterprises and other institutions, and the public sector.
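To make the layered design in Fig. 1 more concrete, the following minimal Python sketch models how a reading could flow from the sensing layer, through the network communication layer, to the platform layer, and finally be consumed by a smart application. All class names, method names and the example data are assumptions made for this illustration only; they are not part of the ITU-T FG-SSC or ISO/IEC designs.

from dataclasses import dataclass, field

# Illustrative model of the four layers in Fig. 1; all names are assumptions
# made for this sketch, not part of the referenced standards.

@dataclass
class SensingLayer:
    """Terminal nodes (sensors, RFID readers, GPS trackers) plus a capillary network."""
    readings: list = field(default_factory=list)

    def sense(self, node: str, metric: str, value: float) -> dict:
        reading = {"node": node, "metric": metric, "value": value}
        self.readings.append(reading)
        return reading

@dataclass
class PlatformLayer:
    """Data and support layer: stores city data and exposes it to applications."""
    datastore: list = field(default_factory=list)

    def ingest(self, reading: dict) -> None:
        self.datastore.append(reading)

    def query(self, metric: str) -> list:
        return [r for r in self.datastore if r["metric"] == metric]

@dataclass
class NetworkCommunicationLayer:
    """Public or private network transporting readings to the platform layer."""
    network_type: str = "public"

    def transmit(self, reading: dict, platform: PlatformLayer) -> None:
        platform.ingest(reading)

@dataclass
class SmartApplicationLayer:
    """Smart applications serving citizens, enterprises/institutions and the public sector."""
    platform: PlatformLayer

    def average(self, metric: str) -> float:
        rows = self.platform.query(metric)
        return sum(r["value"] for r in rows) / len(rows)

# Wiring the layers together for a single (hypothetical) air-quality reading
sensing, network, platform = SensingLayer(), NetworkCommunicationLayer(), PlatformLayer()
apps = SmartApplicationLayer(platform)
network.transmit(sensing.sense("air-sensor-01", "pm2.5", 12.4), platform)
print(apps.average("pm2.5"))  # -> 12.4

In a real deployment, the transmit and ingest steps would of course be backed by the networks, data centers and analytical tools described below; the sketch only mirrors the direction of the data flow.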


The purpose and functionality of each layer in a smart digital infrastructure are summarized below:

1. Sensing Layer: contains smart devices that are used to monitor and measure different types of parameters related to water, air quality, humidity, energy, temperature, occupancy, solar flux, and the state of equipment, among others. This layer consists of two types of components, namely the Terminal Node and the Capillary Network. The terminal nodes, such as Radio-Frequency Identification (RFID) readers, cameras, actuators, barcode symbols, transducers, Global Positioning System (GPS) trackers, etc., are used for sensing the hard physical infrastructure of a city. They have the ability to detect, monitor and control the environment of this infrastructure intelligently. The capillary network connects the various terminal nodes to the network communication layer, allowing real-time and continuous sharing and exchange of data and information. It includes video surveillance, Supervisory Control and Data Acquisition (SCADA), GPS-related networks, Highway Addressable Remote Transducer (HART) and RFID networks, among others.
2. Network Communication Layer: this layer provides the needed framework and technology foundation for managing, designing, and building a SSC communication network. It consists of two types of networks, namely the public network, with an open access feature, and the private network, which is often dedicated to a specific group(s) of users. This layer provides both wired and wireless connectivity via different types of services such as 2G/3G/4G/5G, Wi-Fi, General Packet Radio Service (GPRS), fiber (FTTx), Ethernet, and all types of Digital Subscriber Lines (xDSL), among others. This layer not only connects objects that harvest data and information from the environment using sensors, but also uses existing Internet standards and protocols to provide services for data analytics, data transfer, applications and communications. It could be seen as a superhighway layer that transfers huge amounts of data (known as big data) to a SSC integrated platform.
3. Platform Layer (Data and Support Layer): the data collected from the network of interlinked objects, such as energy, water, transport, and public areas, is made accessible to various services and applications at all city levels through this layer. The aim of this layer is to provide the needed tools for big data analysis and management as well as to ensure the support capacities for different city-level services and applications. It contains different types of data centers and databases from enterprises, industries, institutions, and government and/or municipalities. It also includes huge amounts of data collected from the clouds and Internet of Things (IoT) objects, such as smart devices, vehicles, buildings, and heart monitors, among others. It includes data warehouses that are established for the realization of data processing, such as the use of analytical tools and application support systems. The available analytical tools facilitate the analysis of large structured and unstructured data (i.e. big data) to generate insights about what type of data to use for each service and application at all city levels, such as classifying the data to be used for public safety, transportation planning, water management systems, and others. The management of data enables the citywide consolidation and exchange of these data across different sectors and owners. The latter include citizens, the public sector, the private sector, international organizations, and NGOs, among others.


4. Smart Application Layer: this layer comprises the smart applications that enable the SSC’s six dimensions and their related domains, such as energy, water, transportation, buildings, healthcare, and education. It also includes the core enterprise applications such as Enterprise Resource Planning (ERP), Business Process Management (BPM), and performance management. In a SSC, these services should be designed to be used across different types of delivery channels (i.e. through mobiles, kiosks, websites, call centers, interactive voice responses, points-of-sale, etc.). This layer aims at increasing both the speed of service delivery and cost savings, as well as improving the quality of life of citizens, the efficiency of urban operations and services, and competitiveness.

The SSC smart digital infrastructure should be designed as a fully integrated service-based architecture that enables future capacities and solutions to be easily added to the overall system, and that collects, manipulates, and exchanges data in an efficient way. With this architecture design, newly designed and implemented services can quickly be added and interlinked with other services, creating composite services and enhancing the city processes. In addition, the data collected could be used to make better decisions, improve economic competitiveness, provide a green and sustainable environment, and facilitate social inclusiveness and citizens’ engagement. The core component of this desired architecture is the integrated platform, which is responsible for interlinking the various components of a SSC smart infrastructure and for collecting data and information from different parties of a city. Due to its importance, the integrated platform, which has been represented as the “Platform Layer” in the introduced SSC smart digital infrastructure architecture design, is discussed further below.

3.2 Developing the Integrated Platform: Proposed Solution

A city’s traditional operating model is often based around functionally oriented service providers. These services operate as unconnected and isolated silos and are often built without considering the end users’ needs [2, 9, 19, 20]. A SSC, in turn, aims to develop a new operating model to support collaboration and innovation across these silos. To highlight the difference between these two models, this research introduces a graphical representation of the operating model of a traditional city and of a SSC. To start with, Fig. 2 illustrates how the traditional city operating model systems are not interlinked with each other and how each system controls and exchanges its data and applications in an isolated silo. It is clear from the illustration that individual citizens, businesses and public-sector institutions must engage with each service separately. The potential for collaboration and innovation across a city is limited, as the data and information are often locked within these silos. Adopting this type of model also limits the ability to drive city-scale change at speed.

Fig. 2. The traditional operating model of a city

On the other hand, a SSC operating model allows the city data to be unlocked from individual silos, ensuring that the digital assets (i.e. digital data, applications and services) of a city are available in real time and on an open and interoperable basis. This enables a real-time integration and optimization of city resources. Instead of having a complete separation between the different silos of a city, a logical separation between data, services, and users is used, which is represented as separate layers in a SSC operating model, as illustrated in Fig. 3. In this figure, a blended application refers to an application that uses data from more than one resource. This model allows both externally-driven and internally-driven innovations within a city. The former enables a new and centralized marketplace for city information and data and enables residents, such as citizens, social entrepreneurs, and small and medium-sized enterprises, to co-create public services and create new value using the city’s data. The internally-driven innovation aims at improving and integrating the delivery of services and the optimization of resources. It provides the end users with public services that are accessible in one stop, over multiple channels. This allows end users to be directly engaged in the creation of services based on their needs, instead of building these services around the city’s organizational structure. This type of model also has the ability to drive city-scale change at speed.

The SSC integrated platform forms the heart of a SSC operating model [19, 20]. It allows the different systems of a city to be connected together and enables new applications and services to be developed across these systems, forming a true representation of a SSC as a “system of systems”. It is responsible for various functions, such as management, coordination, storage, mining, computing, analysis, and the provision of public services to end users. This includes data management and processing systems, network management systems, business management systems, and service delivery management systems. Using these functions, a SSC integrated platform can combine data from a large number of different resources distributed around a city. It can generate new insights on how to create new services based on the city’s needs and local interests and how to enhance existing ones, resulting in an improved quality of life and better services for society.
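As a small illustration of the “blended application” idea in Fig. 3, the sketch below shows how an integrated platform could expose data services from formerly isolated silos to a single application. The service names, data values and decision rule are hypothetical and serve only to illustrate the operating model, not any particular city deployment.

from typing import Callable

class IntegratedPlatform:
    """Minimal sketch of a city integrated platform: a registry of data services
    that blended applications can query across former silos."""

    def __init__(self) -> None:
        self._services: dict = {}

    def register(self, name: str, provider: Callable[[], dict]) -> None:
        self._services[name] = provider

    def fetch(self, name: str) -> dict:
        return self._services[name]()

# Hypothetical data providers representing formerly isolated city systems
platform = IntegratedPlatform()
platform.register("transport", lambda: {"congestion_index": 0.62})
platform.register("environment", lambda: {"pm2_5": 18.3})

def blended_mobility_advice(p: IntegratedPlatform) -> str:
    """A blended application: combines data from more than one source (Fig. 3)."""
    congestion = p.fetch("transport")["congestion_index"]
    air_quality = p.fetch("environment")["pm2_5"]
    if congestion > 0.5 and air_quality > 15:
        return "Recommend public transport and reroute traffic away from hotspots."
    return "Normal operations."

print(blended_mobility_advice(platform))

The design point mirrored here is the logical separation of data, services, and users: the application never talks to the silos directly, only to the platform layer.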


Fig. 3. A SSC operating model

4 Recommendations

To ensure an effective and efficient development of the smart infrastructure and integrated platform during the SSC transformation process, this research provides the following list of related recommendations:

1. Perform a city state and needs analysis regarding the needed smart infrastructure and integrated platform: a current city state report that sheds light on existing infrastructure challenges, including an overall analysis of the city's needs and initial suggestions for the integrated platform and related smart solutions, should be created. It is recommended to work with the related stakeholders to identify the initial required SSC deliverables that represent quick wins for the city and will be implemented over the smart infrastructure and integrated platform [26, 27]. This helps in identifying the needed assets under these two components.
2. Benefit from the city's existing assets: the hard and digital infrastructures are known to be costly and complex. Therefore, it is highly recommended to find appropriate ways to reuse the existing hard and digital resources of a city by improving their capabilities before developing new ones.
3. Secure adequate financing for the development of the smart infrastructure, integrated platform, and smart solutions: to ensure the continuity of a SSC transformation process, there is a necessity to find adequate financial resources to support the implementation of the SSC smart infrastructure and integrated platform. Without this financial support, a transformation process may stall or, worse, fail.
4. Follow an incremental development approach: investments in smart infrastructure, especially the hard and digital ones, are never one-offs. The idea is to start with small improvements, based on the context and the needs of a city, and to continue providing further improvements over time, as the transformation process is a long-term journey of continuous improvement that cannot be achieved overnight.

5 Conclusion

This paper has sought to advance understanding of transformational change within the area of Smart and Sustainable Cities (SSCs), particularly by proposing and highlighting the minimum required components to be considered during the development of the SSC smart infrastructure and integrated platform. It demonstrates the need to develop these two main structures of a SSC in a way that supports the implementation and connectivity of different services and systems at all levels of a city. It proposes a coherent, high-level architecture design of a SSC's smart digital infrastructure. It also provides a list of recommendations that could be considered by governments and by city administrators and planners who are planning to transform their cities into smart and sustainable ones. As future work, this paper opens the door to proposing a generic list of smart solutions to be considered by city planners and administrators under each of the six dimensions of SSCs.

References

1. UN: Report of the World Urbanization Prospects: the 2014 Revision, Highlights. United Nations, Department of Economic and Social Affairs, Publication Division (ST/ESA/SER.A/352) (2014)
2. ITU-T: Shaping Smarter and More Sustainable Cities: Striving for Sustainable Development Goals. International Telecommunication Union Telecommunication Standardization Sector (ITU-T), Geneva, Switzerland (2016)
3. Giffinger, R., Fertner, C., Kramar, H., Kalasek, R., Pichler-Milanovic, N., Meijers, E.: Smart Cities: Ranking of European Medium-sized Cities. Center of Regional Science (SRF), Vienna University of Technology, Graz, Austria (2007)
4. Ibrahim, M., El-Zaart, A., Adams, C.: Paving the way to smart sustainable cities: transformation models and challenges. J. Inform. Syst. Technol. Manag. (JISTEM) 12(3), 559–576 (2015)
5. Ibrahim, M., El-Zaart, A., Adams, C.: Smart sustainable cities: a new perspective on transformation, roadmap, and framework concepts. In: The Fifth International Conference on Smart Cities, Systems, Devices and Technologies, pp. 8–14 (2016)
6. ISO/IEC: Smart Cities – Preliminary Report (2014). International Organization for Standardization and International Electrotechnical Commission (ISO/IEC), Geneva, Switzerland (2015)
7. UNCTAD: Smart Cities and Infrastructure. United Nations Commission on Science and Technology for Development (UNCTAD), Budapest, Hungary (2016)
8. Harrison, C., Eckman, B., Hamilton, R., Hartswick, P., Kalagnanam, J., Paraszczak, J.: Foundations for smarter cities. IBM J. Res. Dev. 54(4), 1–16 (2010)
9. BSI: Smart City Framework – Guide to Establishing Strategies for Smart Cities and Communities. British Standards Institution (BSI), BSI Standards Publication, London, United Kingdom (2014)
10. KPMG: Dubai – A New Paradigm for Smart Cities. KPMG, Dubai, UAE (2015)


11. Escher Group: Five ICT Essential for Smart Cities. Escher Group, Ireland (2015)
12. Pincetl, S.: Cities as novel biomes: recognizing urban ecosystem services as anthropogenic. Front. Ecol. Evol. 3, 140 (2015)
13. Anderton, D.: Science in the city region: establishing Liverpool's life science ecology. Reg. Stud. Reg. Sci. 3(1), 434–444 (2016)
14. Al-Hader, M., Rodzi, A.: The smart city infrastructure development & monitoring. Theor. Empirical Res. Urban Manag. 4(2), 87–94 (2009)
15. IEC: Orchestrating Infrastructure for Sustainable Smart Cities. International Electrotechnical Commission (IEC), Geneva, Switzerland (2014)
16. EIP-SCC: The European Innovation Partnership on Smart Cities and Communities – Strategic Implementation Plan. The European Innovation Partnership on Smart Cities and Communities (EIP-SCC), Brussels, Belgium (2015)
17. Soom, E.V.: Measuring levels of supply and demand for e-services and e-government: a toolkit for cities. The Interreg IVB North Sea Region Programme, Smart Cities Research Brief, 3, Brussels, Belgium (2009)
18. GOS: ICT for Everyone – A Digital Agenda for Sweden. Government Offices for Sweden (GOS), Ministry of Enterprise, Energy and Communication, Article N2011.19, 2011/342/ITP (2011)
19. Ibrahim, M., El-Zaart, A., Adams, C.: Smart sustainable cities roadmap: readiness for transformation towards urban sustainability. Sustain. Cities Soc. 37, 530–540 (2018)
20. Ibrahim, M.: Developing smart sustainable cities: a validated transformation framework. In: 2019 International Conference on Smart Applications, Communications and Networking (SmartNets), pp. 1–5. IEEE (2019)
21. Geyer, H.S.: International Handbook of Urban Policy: Issues in the Developed World, vol. 2. Edward Elgar Publishing Limited, Cheltenham (2009)
22. Vedashree, R., Bose, M.: Integrated ICT and Geospatial Technologies: Framework for 100 Smart Cities Mission. NASSCOM Publications, International Youth Center, New Delhi, India (2015)
23. Atzori, L., Iera, A., Morabito, G.: The internet of things: a survey. J. Comput. Netw. 54, 2787–2805 (2010)
24. Li, Y., Lin, Y., Geertman, S.: The development of smart cities in China. In: 13th International Conference on Computers in Urban Planning and Urban Management, Cambridge, MA, USA (2015)
25. ZTE: ZTE iCity Solution: Sharing Wisdom, Enjoying Life. ZTE, Shenzhen, China (2014)
26. Ibrahim, M., El-Zaart, A., Adams, C.: Stakeholders engagement in smart sustainable cities: a proposed model. In: 2017 International Conference on Sensors, Networks, Smart and Engineering (SENSET), pp. 1–4. IEEE (2017)
27. Ibrahim, M., El-Zaart, A., Adams, C.: Theory of change for the transformation towards smart sustainable cities. In: 2017 International Conference on Computer and Applications, pp. 342–347. IEEE (2017)

Study to Reduce the Costs of International Trade Operations Through Container Traffic in a Smart Port

Ouail El Imrani(B)

Abdelmalek Essaadi University, Tetouan, Morocco
[email protected]

Abstract. Containerisation is considered a key component of the global maritime transport network, and the maritime transport of containerised goods has experienced a significant growth rate in recent years. In this context, optimising logistics costs is an important objective for any port authority in order to achieve port efficiency. This research work makes an important contribution in this respect by considering the case of the port of Tangier Med in Morocco. The latter enjoys a certain advantage due to its unique geographical location, its proximity to other continents and its good connectivity with several countries and ports, which gives it favourable commercial opportunities. At the same time, it faces high cost challenges due to operational inefficiency, which may lead to increased container logistics costs. In this context, this research work identifies the main challenges faced by carriers in this port.

Keywords: Competitiveness · International trade · Efficiency · Maritime traffic productivity

1 Introduction

In this research work, the results and interpretations of the primary data collected and evaluated are presented. Several inter-variable tests (reliability, correlation, regression and ANOVA tests) were performed based on primary data collected from employees of different shipping companies. Reliability tests were performed to show the internal consistency between the variables in the study, while the correlation, regression and ANOVA tests provided an analysis of the existing situation.

2 Container Port Performance

The economic development of businesses in general is based on their ability to offer high-quality goods and services. The project will deal with how digital transformation could act in favor of the prosperity of these companies; in other words, it draws the attention of entrepreneurs to the impact of the digitalization of the economy on the future of businesses, and thus to the challenges of digitalization for the country's economy, the stages of the digital transformation of the business climate, the priority axes for a successful digital transformation of a company and, finally, a roadmap on the choice of a digitalization tool that will improve business within smart ports. In this context, we will directly deal with the elements relating to the planning of flows, which are fully digitalized.

Container port performance is by and large categorized under three operational heads in the available literature, namely seaside operations, terminal operations and landside operations, which together determine the port's performance efficiency. In this section, all three operational areas related to container port performance are discussed by critically reviewing the varied issues faced at each level and the solutions proposed by different research scholars.

2.1 Seaside Operations

Carlo et al. (2013) in their study discussed the increasing significance of seaside operations owing to the increasing volume of containers being managed at ports during the last 10–12 years. They proclaim that, out of all the operation-related issues, seaside operations act as a bottleneck, and that this bottleneck is faced by almost all ports across the globe. In the same year, Ruiz et al. (2013, p. 28) defined seaside operations as "Those arising in the quay area of a maritime container terminal and directly related to the service of container vessels". Ruiz et al. (2013) further elaborate upon the activities undertaken as part of the seaside operations of any port, while studying the seaside operations efficiency of port Meisel. The varied activities are:

• Stage 1 - Allocation of a berthing location in the quay on arrival of a container vessel, based upon features like container measurements, storage products, and container layout, to name a few.
• Stage 2 - Allocation of cranes on the quay to the vessel based on container requirements and crane availability.
• Stage 3 - Activities of loading cargo onto and unloading cargo from the container vessel are undertaken.

Ruiz et al. (2013) in this particular study focused on two major issues that disrupt the efficiency of seaside operations, namely the Tactical Berth Allocation Problem (TBAP) and the Quay Crane Scheduling Problem (QCSP). Berth allocation problems are related to the time required for a ship to wait and to be handled (also known as the ship's stay time at the port). Such problems arise when ports fail to minimize the time required for such activities and thus fail to achieve optimal time for seaside operations (Gharehgozli et al. 2016; Ruiz et al. 2013). Quay crane problems are those arising due to inappropriate scheduling. Due to this problem, a larger number of cranes is put to use and higher crane travel times are experienced. By resolving this problem, ports will be in a position to control the QC setups assigned to a particular container as well as optimize the time required for travel by each QC (Gharehgozli et al. 2016; Ruiz et al. 2013). Ruiz et al. (2013) further claimed that these issues emerge primarily because of a lack of assimilation and interfacing between the differing activities of seaside operations. As a solution, the study proposed Variable Neighborhood Search (VNS), which is a simple


yet effective approach for overcoming the prevailing issues. VNS is a metaheuristic approach which helps in optimising such issues, thereby enhancing the planning of vessels at the port based upon time schedules. Hu (2010), like Ruiz et al. (2013), delineated that the issues in seaside operations basically arise due to problems related to berth allotment and the assignment of quay cranes. They also deduced that it is usually integration issues that are behind such operational problems faced at the seaside. The proposed solution to this problem was the use of a commercial operational software package with the ability to integrate both activities in a manner that makes them perform optimally with a nominal number of shifts. The study also emphasized that such software needs to be integrated with the main system to derive better results. A pictorial representation of both problems has been provided in the study by Ambrosino and Tànfani (2012), wherein a real case study of the Southern European Container Hub (SECH) container terminal, a medium-sized container terminal located in the Port of Genoa, Italy, was chosen to investigate seaside operational problems. The study emphasized that not only the problems occurring in seaside operations, but also the relationships among them, should be stressed. This research, unlike other studies, focuses on seaside operational problems in a comprehensive manner rather than treating them individually. In the research study by Gharehgozli et al. (2016), a detailed analysis of seaside operation problems was undertaken. This study too concluded that seaside operational problems were mainly related to berth allocation and quay cranes. Gharehgozli et al. (2016) proposed designing an effective stowage plan to overcome seaside operation problems. This would enable ports to enhance their efficiency by controlling and minimizing the time required for a ship to stay at a port, guarantee steadiness and compliance with a maximum value of stress operations of a ship, and capitalize on QC employment.

2.2 Terminal Operations

With the globalization of industries and the world emerging as a single marketplace, trading activities across the globe have escalated by leaps and bounds. This has also resulted in an exponential increase in container traffic across differing ports. Kulak et al. (2008) noted that the already existing terminals are subjected to elevated container turnover. In order to handle this humongous amount of business, an array of new terminals is being opened. The ports are thus forced to enhance their overall terminal operations and logistics-related performance to improve container port performance. Vacca et al. (2007, p. 3) defined a container terminal as "a zone of the port where sea-freight dock on a berth and containers are loaded, unloaded and stored in a buffer area called a yard." In this study, they demarcated a terminal into two major areas, namely the quayside and the yard. The quayside is the section of the terminal that comprises berths for differing container vessels and the quay cranes which are mainly required for moving containers. The yard is mainly the section that is kept as an extra area for pursuing activities such as loading and unloading cargo from the container vessels and transshipping containers.


In a detailed study on container terminals and their operations, Günther and Kim (2007) developed a model which depicted the varied stages involved in the effective planning of terminals to support effectual port performance. While developing this model, Günther and Kim (2007) determined that the most significant problems of terminal operations are those related to the planning and control of the logistics of these container terminals. They emphasized that when the planning goes wrong during the preliminary stages of terminal design, the overall operation goes haywire, resulting in work inefficiencies and increased wastage. Hence designing effective terminals was considered an imperative to support proficient terminal operations. Some of the major issues identified by Günther and Kim (2007) in relation to terminal operations are those associated with the allocation of berths, the duty and split of cranes, the planning and aligning of stowage, rules and regulations related to storage and stacking, and the planning and aligning of human resources. Stahlbock and Voß (2008) in their study agreed on the presence of such problems as highlighted by Günther and Kim (2007). The study proposed the deployment of integrated approaches, like analytical approaches, simulation approaches and multi-agent approaches, and advanced automation, to overcome the multiple problems occurring in terminal operations. They proclaimed that analytical approaches will enable ports to make better decisions regarding the optimal utilization of storage spaces, minimize the time required for berthing, support the efficient usage of resources including manpower, control traffic clogging at terminals (both in and out), and also control the waiting time of outside vehicles, especially trucks. Through simulation approaches, the ports will be in a position to determine the need for automation and identify other inefficiencies that can be improved upon, thereby enhancing port performance and optimizing costs. With regard to the simulation approach, Park and Dragović (2009) also accepted its importance in their study on Korean container terminals. They affirmed that a simulation model wherein Arena was employed led to successful results in forecasting real terminal operations. The overall outcome of such an approach was a reduction in the standard time spent by a ship at the port, thereby enhancing overall terminal passing time, especially at Korean port terminals. The study, however, fails to confirm the universal applicability of this approach. The multi-agent approach as proposed by Stahlbock and Voß (2008) is based on the principle of continuous improvement to overcome the inefficiencies arising in terminal operations and thereby affecting overall port performance. Overall modernization and automation of the differing aspects handling terminal operations were proposed in this study to overcome terminal inefficiencies. This automation was mainly related to information and communication systems and intelligent routing and scheduling devices, to name a few. In another study, Bierwirth and Meisel (2010), however, stated that there were two major problems, namely those related to the allocation of berths and the duty and split of cranes. They specified that all other problems are related to these primary problems. Nonetheless, this study also accepted the need for automation and the deployment of modern machinery and techniques for enhancing terminal operations, thereby augmenting container port performance.

In a study by Kemme (2013), it is argued that container terminals are challenging because they are simultaneously confronted with the restrictions as well as the varied


demands of the stakeholders, which lead to multiple performance indicators for container terminals. They explicated their point by stating that while staff look for work security, residents demand lower pollution; the authorities necessitate law compliance; truckers want minimum processing times; shipping lines favor shorter and flexible vessel turnaround times and economy in the loading, discharging and storage of containers; whereas shareholders are generally interested in a high shareholder value. They however concluded that the storage function of a container terminal is of utmost importance when determining its operational efficiency. Modernization has been a primary solution for enhancing container port performance by dealing with terminal operations. This was also established in a recent study by Chen et al. (2015), wherein they put forward the use of technologies like GPS tracking and maritime open data rather than depending on data collected by hand. Through an integrated system wherein data is collected from across the globe annually and the movement of ships is tracked accurately, such issues of inefficient terminal operations can be controlled.

2.3 Landside Operations

Out of the three significant operational areas of a container port, the management of landside operations is equally crucial for ensuring higher levels of port performance, as the landside operations encompass the loading and unloading of ships. The cargo initially passes through the landside operations for loading onto the ship, and after the unloading of containers from ships, the landside operations again come into the picture (Steenken et al., 2004). The containers are loaded onto and unloaded from trucks or trains, as the case may be. Brinkmann (2011, p. 25) defined landside operations as those "including the gate, parking, office buildings, customs facilities, container freight station with an area for stuffing and stripping, empty container storage, container maintenance and repair area etc.". The operational efficiency at the landside is primarily affected by three significant resources: the storage space at the yard, the trucks, and the cranes at the yard. Further, Steenken et al. (2004) claim that the success of landside operations is evaluated on the basis of the overall landside transportation system and whether it has been aligned with loading and unloading schedules. In a paper presented at a conference, Ng and Ge (2006) establish scheduling as one of the most intricate and complex issues affecting the operational efficiency of landside operations to a considerable extent, thus necessitating overcoming issues primarily related to the arrangement and time schedules of trucks and yard cranes. This will inherently assist ports in overcoming issues affecting the optimization of performance at the container terminals. They proposed a projected and planned fuzzy heuristic approach to surpass scheduling issues, thus improving port performance. While studying the landside operational issues at the Port Botany terminal in Sydney, Froyland et al. (2008) delineated that, through time-scale decomposition wherein long-term planning is executed based upon real-time data, ports and container lines would be in a position to get rid of landside operational issues. The major issues identified in this study wherein this solution can be implemented are crane arrangements and schedules, problems arising due to the inappropriate control of container stacking for a


short period of time, and the inappropriate allotment of delivery sites for trucks and additional transporters. In a recent study by Joerss et al. (2016) for McKinsey, they laid stress upon forming strategic alliances for ensuring higher levels of operational efficiency at the landside of container port terminals. Through these alliances, container lines are in a position to rule out hurdles faced due to operational intricacies at the landside and thus enhance landside operational efficiencies. Such alliances would further ensure higher levels of economies of scale, thereby expanding the scope and minimizing the costs of operating at sea. They elaborated that strategic alliances would aid in dealing with increased trucking at the terminals and delivery problems arising due to increasing work pressures during peak seasons, as well as in handling problems like the lack of control of shipping lines over trucking. Joerss et al. (2016) justified their point with the example of CKYHE, which has been considered to be the largest coalition on the Asia-to-North America trade route. Although the parties forming the alliance are of diminutive size compared to the standard magnitude of the significant lines, CKYHE has benefitted to a large extent from the alliance.
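To make the berth-allocation and scheduling issues surveyed in Sect. 2.1 more tangible, the following toy Python sketch allocates vessels to berths on a first-come-first-served basis and reports waiting times. It is an illustration only, far simpler than the TBAP/QCSP or VNS models cited above, and the vessel data are invented.

import heapq

def fcfs_berth_allocation(arrivals: list, n_berths: int) -> dict:
    """Assign vessels (name, arrival_time, handling_time) to the earliest free berth
    in order of arrival and return each vessel's waiting time (toy FCFS heuristic)."""
    berth_free_at = [0] * n_berths          # time at which each berth becomes free
    heapq.heapify(berth_free_at)
    waiting = {}
    for name, arrival, handling in sorted(arrivals, key=lambda v: v[1]):
        free_at = heapq.heappop(berth_free_at)   # earliest-available berth
        start = max(arrival, free_at)
        waiting[name] = start - arrival
        heapq.heappush(berth_free_at, start + handling)
    return waiting

# Hypothetical vessel calls: (vessel, arrival hour, handling hours)
calls = [("V1", 0, 5), ("V2", 1, 3), ("V3", 2, 4), ("V4", 3, 2)]
print(fcfs_berth_allocation(calls, n_berths=2))  # -> {'V1': 0, 'V2': 0, 'V3': 2, 'V4': 2}

Metaheuristics such as VNS improve on this kind of baseline by searching over alternative berth and crane assignments instead of committing to the first feasible one.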

3 Research Methodology

For the statistical analysis of the quantitative aspect of the study, statistical precision must be taken into account. Statistical precision reflects the narrowness of the margin of error in the study. For greater accuracy, a larger sample size is required. However, the accuracy increases slowly due to the square root of n in the denominator of the formula. Thus, to reduce the margin of error by half, the sample size would have to be multiplied by four. The margin of error is also influenced by the level of significance or confidence, but this tends to remain fixed within a field of study. The inputs are the assumed or estimated value of the proportion, the desired level of confidence, the desired precision of the estimate and, for limited population sizes, the size of the population. The desired precision of the estimate (also sometimes referred to as the permissible or acceptable error in the estimate) is half the width of the desired confidence interval. Statisticians use a confidence interval to express the degree of uncertainty associated with a sample statistic. A 99% confidence interval will be wider, and thus less precise, than a 95% confidence interval. Thus, in this study, a 95% confidence level was taken into account. A 95% confidence interval has a 0.95 probability of containing the population parameter, i.e. 95% of the population distribution is contained within the confidence interval.

3.1 Sampling Plan

A sampling plan is a simple format that indicates what action should be taken, in which study, at what time, in which way and by whom (Rajasekar et al. 2006). Sampling plans should specify a sample size that can provide results representative of the entire population. Sampling methods can be divided into two types: probability and non-probability sampling methods. The probability sampling method can be divided into simple random, stratified, clustered, systematic and multi-stage sampling methods (Yin 2009). These are mainly used for the analysis of primary or quantitative data, while the


non-probability sampling method is used for the analysis of secondary or qualitative data. The target population considered for this study comprised Chief Executive Officers, Department Heads, Agents and Coordinators, as it was perceived that these employees would provide a better quality of response to the type of questionnaire administered. Thus, in this study, a simple random survey was conducted among employees of the designated companies in 6 different types of organization, namely shipping companies, carriers, the port operator, the port authority, freight forwarders and ships at the port of Tangier Med.

3.2 Data Analysis Procedure

Quantitative data from 50 port employees at the Tangier Med port were analyzed using statistical tools such as SPSS software and Microsoft Excel. Descriptive analysis was done for both the demographic variables and the port performance and service factors, including frequency tests, means, variances and standard deviations. Further, inferential analysis was done, including normality tests, reliability tests, regression, correlation and ANOVA. The formula used for sampling the population size is:

ss = Z² p(1 − p) / C²    (1)

Where, Z = Z value (1.96 for the 95% confidence level), p = estimated proportion (0.5 used for the sample size needed), C = confidence interval (margin of error), expressed as a decimal.

Means, medians and standard deviations are used in the frequency distribution statistics. Their formulas are:

Mean:    M = (x₁ + x₂ + … + x_N) / N    (2)

Where, M = arithmetic mean, x = the values or variables, N = number of values or variables.

Median:    Median = value of the ((n + 1)/2)th term    (3)

Where, n = number of variables.

Standard deviation:    σ = √[(1/N) Σ (xi − μ)²], summed over i = 1, …, N    (4)

Where, σ = standard deviation, μ = mean of all variables, xi = individual values, N = number of variables.

The formula used in the analysis of the Pearson correlation is:

r = [N Σxy − (Σx)(Σy)] / √{[N Σx² − (Σx)²][N Σy² − (Σy)²]}    (5)

Where, N = number of pairs of scores, Σxy = sum of the products of paired scores, Σx = sum of x scores, Σy = sum of y scores.

For testing the elements, regression and ANOVA tests are done. The formulas used are, for regression:

a = [(Σy)(Σx²) − (Σx)(Σxy)] / [n Σx² − (Σx)²]    (6)

b = [n Σxy − (Σx)(Σy)] / [n Σx² − (Σx)²]    (7)

Where, a and b = regression coefficients relating the variables, x = independent variable, y = dependent variable. This is followed by the regression equation, which gives the result of the regression analysis:

y = a + bx    (8)

For ANOVA, the main formula is:

F = MST / MSE    (9)

Where, F = ANOVA coefficient, MST = mean sum of squares due to treatment, MSE = mean sum of squares due to error. MST is given by:

MST = Σ n(x̄ − X̄)² / (p − 1)    (10)

Where, p = total number of populations, n = total number of samples in a population, x̄ = mean of each sample and X̄ = grand mean. MSE is given by:

MSE = Σ (n − 1)S² / (N − p)    (11)

Where, S = standard deviation of the samples and N = total number of observations.
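As a rough illustration only, the following Python sketch (not part of the original study, which used SPSS and Excel) applies formulas (1), (2), (4) and (5) to a handful of hypothetical questionnaire scores; the score values and the 5% margin of error are assumptions.

```python
import math
import statistics

def sample_size(z=1.96, p=0.5, c=0.05):
    """Formula (1): ss = Z^2 * p * (1 - p) / C^2."""
    return z ** 2 * p * (1 - p) / c ** 2

def pearson_r(x, y):
    """Formula (5): Pearson correlation between two lists of paired scores."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

scores_x = [3, 4, 4, 5, 2]                        # hypothetical Likert-scale responses
scores_y = [2, 4, 5, 5, 3]
print(round(sample_size()))                       # about 384 respondents for an unlimited population
print(statistics.mean(scores_x), statistics.median(scores_x), statistics.pstdev(scores_x))
print(round(pearson_r(scores_x, scores_y), 3))
```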

4 Data Analysis and Interpretations - Primary Data

The primary data analysis in the present section presents the port performance as perceived by the employees. The purpose of the Cronbach's Alpha reliability test is to assess the internal consistency among the variables in the study. It is also done to check the inter-relationships among the variables, i.e. how closely the variables are related to each other (Tavakol and Dennick 2011). However, high reliability does not demonstrate the unidimensionality of a variable; it is simply a test of the consistency of a measure. Moreover, another assumption test was done to check the validity of the data that was collected. A normality test was performed to assess whether the data set was drawn from a normal distribution (Rochon et al. 2012). When performing the normality test, it is required to check whether the Shapiro-Wilk statistic (W) of a variable is less than 1, which can be interpreted as meaning that the observed distribution does not fit the normal distribution (Kieser and Rochon 2011). It is also required to check that, when the W value is small enough, the p-value is also less than 0.05 (5% level of significance) (Ghasemi and Zahediasl 2012).

4.1 Normality Test

Table 1 shows the Shapiro-Wilk normality test for the respondents working at the Tangier Med port. The normality test shows that the port operational factors are not normally distributed. This is because the p-values for Technical (0.015), Organisation (0.003) and Governance (0.003) are less than 0.05, so these variables do not follow a normal distribution. Thus, from the normality test it can be assumed that the data set collected does not have a normal distribution. The graphical and histogram representations of the non-normality of the data set are given in Appendix II.

4.2 Reliability

A reliability test was done to check the stability and consistency of the Technical, Organizational and Governmental factors that influence port operations. It was undertaken to determine the consistency of the data collected during the survey. The Cronbach's alpha coefficients, as presented in Table 2, were all found to be above 0.6 (0.887 for Technical, 0.890 for Organization and 0.847 for Government), thus showing relatively high consistency. It can therefore be interpreted that the factors are highly stable and consistent.

Table 1. Shapiro-Wilk normality test for the data set collected from the port employees

Variable            Shapiro-Wilk Statistic   df   Sig.
Level of Service    0.864                    50   0.000
Technical           0.941                    50   0.015
Organisation        0.924                    50   0.003
Governance          0.922                    50   0.003

Table 2. Coefficients of the Cronbach's Alpha reliability test of the level of services at the Tangier Med port

Factors         Cronbach's Alpha   No. of Items
Technical       0.887              10
Organization    0.890              12
Government      0.847              5
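For reproducibility, a minimal Python sketch of the two tests above is given below; it assumes the survey items are loaded into a pandas DataFrame and uses synthetic Likert-scale data, so its output will not match the SPSS values reported in Tables 1 and 2.

```python
import numpy as np
import pandas as pd
from scipy import stats

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for k item columns, one row per respondent."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

rng = np.random.default_rng(0)
technical = pd.DataFrame(rng.integers(1, 6, size=(50, 10)))   # synthetic 50 x 10 Likert items

w, p = stats.shapiro(technical.mean(axis=1))      # Shapiro-Wilk on the averaged factor score
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")   # p < 0.05 would reject normality
print(f"Cronbach's alpha = {cronbach_alpha(technical):.3f}")
```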

References Abd El-nasser Said, G.A., Mahmoud, A.M., El-horbaty, E.M.: solving container terminals problems using computer-based modeling, Ain shams university Akyeampong, E.K., Gates, H.I.: Dictionary of African Biography. Oxford University Press (2012) Alphaliner Alphaliner - Top 100 : Operated fleets as per 12 October 2010, Alphaliner-TOP 100 (2017) Ambrosino, D., Tànfani, E.: ‘An integrated simulation and optimization approach for seaside terminal operations’. In: Proceedings 26th European Conference on Modelling and Simulation (2012) Amineh, A., Khaddam, H.J., Irtaimeh, Basema, S.B.: The effect of supply chain management on competitive advantage: the mediating role of information technology. Uncertain Supply Chain Management, vol. 8, pp. 547–562. Growing Science (2020) Brinkmann, B.: Operations systems of container terminals: a compendious overview. In: Böse, J.W. (ed.) Handbook of Terminal Planning, Spring (2011) Bourekkadi, S., Imrani, E.L., Kandili, O., Slimani, M.E.L., Khoulji, S., Babounia, A.: Intelligent solution based on information technologies - the correct value of the business in economic organization isintangible asset. In: Proceedings of the 33rd International Business Information Caldeirinha, V., Felício, J.A., Dionísio, A.: Effect of the Container Terminal Characteristics on Performance. Évora University (UÉ) CEFAGE-UE, CEFAGE-UÉ (2013) Gogor, A., et al.: Strategic fit implication of technological innovation capabilities for SMEs with new product development, Management Science Letters, vol. 10, pp. 2875–2882. Growing Science (2020) Elferjani, A.: Examination of port performance in a developing economy : a case study of Libyan ports. RMIT University (2015)

El Imrani, O., Aziz, B.: “Tangier med port: what role for the moroccan economy and the international trade?”. Int. J. Res. Manage. Appl. Econ. (MaLoGEA), 384(2), 73–81 (2016) El Imrani, O., Babounia, A.: Benchmark and competitive analysis of port performances model: algeciras bay, rotterdam, new york-new jersey and tangier med. Int. J. Res. Manage. Appl. Econ. (MaLoGEA) 384(2), 36–49 (2018) Esmer, Soner: Performance measurements of container terminal operations. Dokuz Eylül Üniversitesi Sosyal Bilimler Enstitüsü Dergisi Cilt 10, 1 (2008) Esmer, S.: Performance Measurements of Container Terminal Operations, Dokuz Eylül Üniversitesi Sosyal Bilimler Enstitüsü Dergisi Cilt, 10(1) (2008) Froyland, G., Schwalb, M., Padberg, K., Dellnitz, M.: A transfer operator based numerical investigation of coherent structures in three-dimensional Southern Ocean circulation. In: Proceedings of the International Symposium on Nonlinear Theory and its Applications (NOLTA 2008), Budapest, Hungary, pp. 313–316 (2008) Gharehgozli, A., Roy, D., de Koster, R.: Sea container terminals: new technologies and OR models. Marit. Econo. Logist. 18, 103–140 (2016). https://doi.org/10.1057/mel.2015.3 Ghasemi, Asghar, Zahediasl, Saleh: Normality tests for statistical analysis: a guide for nonstatisticians. Int. J. Endocrinol. Metabolism 10(2), 486–489 (2012) Günther, H.O., Kim, K.H.: Container terminals and automated transport systems: logistics control issues and quantitative decision support. Springer, Heidelberg (2005) Hu W.: A study of consumer’s risk percetpion on food safety. Ph.D. thesis, Zhejiang University, China (2010). (in Chinese). Joerss, M., Schröder, J., Neuhaus, F., Klink, C., Mann, F.: Parcel Delivery. The Future of Last Mile, McKinsey & Company (2016) Kemme, N.: Design and Operation of Autamated Container Storage System, Springer, Heidelberg, New York, Dordrecht London (2013). https://doi.org/10.1007/978-3-7908-2151-2 Kieser, M., Rochon, J.: A closer look at the effect of preliminary goodness-of-fit testing for normality for the one-sample t-test. Br. J. Math. Stat. Psychol. 64(3), 410–26 (2011). https:// doi.org/10.1348/2044-8317.002003 Laissaoui, M.A., Imrani, E.L., Babounia, O.A.: Benchmaking in logistics: literature Review. Int. J. Emerging Technol. 11(4), 225–232 (2020) Layti, M.B.M., El Imrani, O., Medouri, A., Rajaa, M.: Logistics information systems and traceability of pharmaceutical products in public hospitals in morocco: what solutions to improve the supply chain? In: International Conference on Advanced Intelligent Systems for Sustainable Development, pp. 429–438. Springer, Cham (2019) Park, N.-K., Dragovic, B.: A study of container terminal planning. FME Trans. 37, 203–209 (2009a) Rajasekar, S., Philominathan, P., Chinnathambi, V.: Research methodology. Methods, 23, 531 (2006) Rochon, J., Gondan, M., Kieser, M.: To test or not to test: preliminary assessment of normality when comparing two independent samples. BMC Med. Res. Methodol. 12(1), 81 (2012) Ruiz-Torres, A.J., Mahmoodi, F., Zeng, A.Z.: Supplier selection model with contingency planning for supplier failures. Comput. Ind. Eng. 66, 374–382 (2013) Steenken, D., Voss, S.: Stahlbock, R: Container terminal operation and operations research - a classification and literature review. OR Spectr. 26, 3–49 (2004) Tavakol, M., Dennick, R.: Making sense of cronbach’s alpha. Int. J. Med. Educ. 2, 53–55 (2011) Tongzon, J.L.: Determinants of port performance and efficiency. Transport. Res. 
Part A 29(3), 245–252 (1995) Tseng, Y., Wen, L.Y., Michael, A.P.T.: The role of transportation in logistics chain. In: Proceedings of the Eastern Asia Society for Transportation Studies 5, 1657–1672 (2005) Park, N.-K., Dragovic, B.: A study of container terminal planning. FME Trans. 37, 203–209 (2009b)

Port Finance International (2014) Morocco unlocks port investment opportunities – PFI Morocco conference, Port Finance International Vacca, I., Bierlaire, M., Salani, M.: Optimization at container terminals: status, trends and perspectives. In: Swiss Transport Research Conference. Ascona: Swiss Transport Research Conference, p. 21 (2007) Wu, J.: The development of port and the container transport chain -a case study of Tianjin Port. University of Gavle (2011) Yin, R.K.: Case Study Research: Design and Methods, 4th edn., Sage, Thousand Oaks (2009). https://doi.org/10.33524/cjar.v14i1.73

The Global Performance of a Service Supply Chain: A Simulation-Optimization Under Arena

Badr Bentalha1(B), Aziz Hmioui1, and Lhoussaine Alla2

1 National School of Business and Management, Sidi Mohammed Ben Abdellah University, Fez, Morocco
[email protected], [email protected]
2 National School of Applied Sciences, Sidi Mohammed Ben Abdellah University, Fez, Morocco
[email protected]

Abstract. The service supply chain is a set of partners linked to service activities. It aims to achieve operational efficiency and organizational excellence through the fluid management of different resources and skills. This chain incurs various costs and involves several processes. Knowing and evaluating the different costs of a service supply chain makes it possible to better manage the performance of the supply chain and subsequently improve the overall performance of the service company. We have established a conceptual basis for the different costs of a pharmaceutical distribution company. The proposed model allowed us to synthesize the different costs of a service supply chain. Furthermore, through simulation-optimization with Arena, we improved the operating scheme of this service chain. The optimization was carried out according to several scenarios and allowed us to improve three main performance indicators: the average response time per hour, the average response rate and the cycle time per hour. Keywords: Supply chain service · Service logistics · Arena · Optimization · Simulation · Logistics costs · Global performance

1 Introduction

Initially designed for industry, logistics management methods are now being opened up to services, given the growing weight of the sector in the world economy, both in industrialized and low-income countries. Indeed, the sector mobilizes around 70% of the working population and nearly 75% of the world's market activities [1]. Service management has several intrinsic specificities and therefore requires in-depth reflection on the sector and on the best practices to adopt in supply chain management, within the framework of service supply chain management (SSCM). There is a dearth of research on SSCM in contrast to the abundance of studies on the manufacturing supply chain [2]. SSCM appears to be a dual concept combining traditional supply operations and activities to coordinate various resources in order to achieve customer satisfaction, while integrating constraints of time, capacity, shared resources and co-production.

Faced with the investment and operating costs involved, companies are more concerned than ever with constantly assessing the impact of logistics processes on performance, particularly at the commercial level. Indeed, logistics costs remain a major component of product and service costs, and thus indirectly influence sales prices. This field of research is increasingly investigated by many authors [3–6], both as an object of study and as a performance lever. Performance is no longer measured simply in terms of quality, time and cost, but also in terms of responsiveness, agility, efficiency and positive externalities in the territories. Meeting the challenge of reducing costs and simultaneously increasing customer value on a global scale therefore requires a radically different approach from responding exclusively to the market. Service logistics management has to deal with the integration of activities, the complexity of flows and the constraints of optimization. It is therefore a necessary factor in controlling the density of supply and demand through cost and price parameters. These optimizations seem to be more complex in service supply chains. Hence the research question of this study: how can the overall cost of the supply chain management (SCM) process in the service enterprise be optimized? The purpose of this article is to estimate the overall cost of the SSCM process and to measure its impact on the final price of the service. To address this issue, we carry out empirical modeling based on a case study of a service company located in Morocco. It is therefore necessary to assess, through this modelling, the impact of SSCM on the performance of the company under study. After a conceptual review of SCM and SSCM, we consider a theoretical framework for modelling the costs of an SSCM. Finally, we present a real example of a simulation of an SSCM through a case study.

2 Conceptual Framework

2.1 From Supply Chain to Supply Chain Management

The logistics chain includes all the operations carried out to manufacture a product or service, from the extraction of the raw material to the delivery to the end customer, including the processing, storage and distribution stages. In addition to the flow of materials, the supply chain involves information flows and financial flows. [7] give an operational view of supply chains: "a network of facilities that perform the functions of sourcing raw materials, processing these raw materials into components and then into finished products, and distributing the finished product to the customer". [8] state that "the supply chain is a global network of organizations that cooperate to improve the flow of materials and information between suppliers and customers at the lowest cost and at the highest speed. The goal of the supply chain is customer satisfaction". This definition implies that the supply chain encompasses independent partners with a single global strategy. The vertical structure refers to the number of suppliers and customers in each link. Thus, a distinction can be made [9]:

– Product-related supply chains.
– Local supply chains: These are the small company supply chains through which the product flows. They are local (corresponding to a focal company) and are, for example, made up of the different workshops of the same factory. These can be considered as customers and suppliers of each other.
– Intra-organizational supply chains: large companies with sites located in different countries. However, this term can be extended to companies with several locations in one country.
– Inter-organizational supply chains: These include at least two independent companies.
– International supply chains: These are supply chains where one or more organizational units operate in different countries.

In modern business management, individual companies no longer compete as independent entities, but rather as active members of a broader supply chain involving a network of enterprises and multiple relationships [10]. As such, supply chains operate in an ever-changing environment and are vulnerable to a myriad of risks at all levels [11]. SCM emerged in the 1980s and became widely promoted in the 1990s. It is difficult to identify and advocate for a single, standardized definition of SCM. Indeed, as a unifying concept, its approach is close to a management philosophy or vision of the networked enterprise that aims to reduce costs and improve the quality of service to the consumer. [12] define it as "the systemic, strategic coordination of traditional operational functions and their respective tactics within a single company and between partners in the supply chain, with the aim of improving the long-term performance of each member company and the entire chain". The management of a supply chain must be analyzed at two distinct levels: on the one hand, strategic management, consisting of forecasting demand (final or intermediate) and planning medium- and long-term resource requirements; on the other hand, operational management, which consists of constantly adapting the scheduling of activities according to unforeseen events and hazards [13]. Despite its relative newness, the concept of SCM has been the subject of a rich literature. Numerous literature reviews have been conducted to examine trends in the publications, methods and theories of SCM [14]. According to [15], it could be said that there is at least a consensus that SCM is an evolving discipline or branch of knowledge, and it is natural that, in the process, researchers would disagree on the meaning of the term SCM and its applications. While the concept of SCM has emerged and flourished in the context of industry, at both the academic and practitioner levels, a growing body of research is addressing the issue in the services field.

2.2 From Supply Chain Management to Service Supply Chain Management

[16] define the Service Supply Chain (SSC) as a network of suppliers, service providers, consumers and other support units that perform the functions of transacting the resources required to produce services, transforming these resources into core or support services, and delivering these services to customers. Thus, a service supply chain represents an institutionalized configuration of one or more service providers engaging with one or more service customers for a common purpose [17]. SSCM is the management of information, processes and resources throughout the service supply chain so that services or products are delivered effectively to customers [18]. The structure of the SSC has some similarities to that of the product supply chain, as services are created, purchased and transferred from one element to another in a chain form [19]. The structure of the SSC is a complex network, combining direct or indirect services around the service integrator [20]. The structural difference in a service supply
chain is essentially due to the unique characteristics of services, which distinguish them from goods. These differences also change the nature of service operations in practice. The main distinguishing feature of services is intangibility. Services cannot be seen, touched, smelled or tasted. This intangibility of services is the main reason why a number of logistics activities cannot be applied to service supply chains. In a service supply chain, it is not inherently possible to ensure the physical delivery of the service from the supplier to the producer and then to the consumer. Simultaneity reflects the fact that customers must be present for the service to be provided. In a service environment, the customer generally contributes to the production process, and once production is achieved, it is followed by instantaneous consumption in a simultaneous manner. Heterogeneity takes into account the fact that it is not easy to standardize services. Each client experiences a different service each time he or she receives it, depending on his or her perceptions, mood and the service atmosphere. This is one of the main reasons for the complexity of planning and analyzing service production and measurement. Services are perishable, and if a service is not consumed when it is available, there is no chance of storing it for future use. Unused capacity is lost forever. This characteristic makes it impossible to store services in a warehouse, which means that the storage function is completely inapplicable in service supply chains. Finally, service industries are labor-intensive. Thus, the impact of the human aspect in service operations is remarkable, along with the complexity it creates. In the service sector, supply chain management focuses on dyadic customer-supplier relationships rather than the unidirectional movement of physical goods. The service supply chain indicates that service providers have relationships with other service companies that contribute to customer satisfaction. It is important to note that the performance of each firm depends on the activities and performance of other firms, and therefore research studies should move from dyadic business relationships to triads and commercial networks. In services, the customer-supplier duality implies that production flows not only from suppliers to customers, but also from customers to suppliers. As a result, the production flow is bi-directional, which is a key factor in linking traditional supply chain concepts to the realities of the service process. The simplest form of a bi-directional supply chain is for customers to provide their inputs to the service provider, who converts the input into an output that is delivered to customers [21]. Thus, several operations are concomitant in an SSC. In the SSC model [16], there are seven main activities (Fig. 1):

– Demand management;
– Capacity and resource management;
– Customer relationship management;
– Supplier relationship management;
– Order process management;
– Service performance management;
– And information and technology management.

The Global Performance of a Service Supply Chain

493

Fig. 1. The IUE-SSC model [16]

In a service supply chain, it is important that each member of the supply chain cares for and works with the other members of the system [22]. In fact, the key spirit of modern supply chain management, which differentiates it from more traditional logistics management, is the emphasis on coordination and collaboration among supply chain members [23]. Coordination is therefore essential in the supply chain management of services [24]. Adapting the supply chain approach to different service sectors is therefore essential. To help supply chain actors meet customer demand at the best conditions, the SSC must offer several tools and levers for action in the service of operational excellence through cost control.

3 SSCM Cost Estimation Model

3.1 General Principles of Service Supply Chain Management Modelling

The problem at issue here is the modelling and simulation of the overall cost of an SCM, in order to inform the relevance of operational or tactical logistics management decisions in the context of a service activity. [25], the founder of system dynamics modelling, defines this methodology as an approach for representing company policies and as a tool to help solve top management problems. It describes the management of changing systems in which flows circulate, the changes being represented by differential equations. The design of a dynamic Forrester-type system model consists in defining the boundary between the system and its environment.

In this sense, several SCM modeling works have already been published. First, [26] analyzed a survey of supply chain management in two small manufacturing firms in Campo Limpo. The objective of the authors was to create the conditions for strategic decision making to prioritize the implementation of SCM in small firms and to achieve agility, flexibility and cost reduction for the operating system through supply chain cost modeling. [27] also chose SCM modeling, specifically for pharmaceutical distribution. He presented extensions of the inventory optimization model of [28] by complementing the vertical cooperation between the supplier and its customers with a horizontal alliance and an exchange of information between customers. [29] developed a simulation model using a combination of ARENA and OptQuest. The multi-echelon system studied is subject to several assumptions, such as stochastic demand and capacity and deterministic supply time. [30] estimated supply chain expenditures in the cost of the final product. The authors evaluated the main supply chain processes and their components through the relationship between supply chain expenditures and the price of the final product, classifying supply chain costs and minimizing them as a precondition for final price competitiveness. For [31], future research directions include developing models for forecasting service demand, combining time series and causal methods, developing models for service resource planning, examining the relationships between service provision, service capacity and service resources, and strengthening the quantitative assessment of the supply chain performance of products and services from a systems perspective. Our analytical model (Fig. 2) is based on that of [32]. This choice is justified by the authors' interest in the integration and transversality of logistics activities in order to better account for the requirements of overall performance.

Fig. 2. Estimate logistics costs [32]

3.2 Proposal for Modelling the Cost of Service Supply Chain Management

Our objective is therefore to determine the cost impact of supply chain processes on the entire operations of the service company. To this end, we used a case study of a Moroccan drug distribution company to model the cost determination process of the SSCM. The pharmaceutical supply chain must make it possible to deliver the products administered to patients as efficiently as possible, in conditions that guarantee safety
and traceability while complying with the many regulations surrounding pharmaceutical products and their dispensing [33]. We conducted interviews with those responsible for this drug supply chain and modelled the chain according to the service chain flow diagram (Fig. 3).

Fig. 3. Scheme of operation of an SSCM (Authors)

While being inspired by the above model and taking into account the specificities of the logistic process of the selected company (urgency, perishability, availability, etc.), we propose the following simulation model: TCSSCM = CS + CRC + CSW + COP + CIM + TC With: TCSSCM = Total Cost of Service Supply Chain Management CS = Cost of supply CRC = Cost of reception and control CSW = Cost of storage and warehousing COP = Cost of Order Processing CIM = Cost of Information Management TC = Transportation Cost The components of this equation make the task of achieving effective supply chain cost management difficult. Indeed, the total cost must be optimized, taking into account the service level defined for the company, based on the approach chosen by the company. We have broken down the various cost parameters of SSCM in Table 1: It therefore appears that the overall cost of SSCM remains multi-dimensional, including different parameters corresponding to the different phases of the reference company’s logistics process. An integrated approach to these costs is therefore necessary to develop the competitiveness and therefore the performance of the company. However, the relevance of such a model requires a significant weighting of the different cost parameters.

Table 1. Decomposition of SSCM costs (Authors)

Elements   Criteria
CS         Cost of supplier selection; Other procurement costs
CRC        Delivery handling costs; Cost of control (internal/external)
CSW        Cost of stock ownership (insurance, security, rent, depreciation, labour, …)
COP        Cost of placing supplier orders; Supplier order tracking cost; Cost of preparing customer orders
CIM        Cost of managing a client file; Supplier file management cost; Amortization cost (plant, equipment, material and software…); Operating costs (insurance and IT security)
TC         Cost of transport on purchases; Cost of transport on sales
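As a simple illustration of the proposed equation, the sketch below (not from the paper) sums hypothetical monthly figures for the six cost components of TCSSCM; the amounts are assumptions used only to show how the decomposition can be operationalized.

```python
from dataclasses import dataclass

@dataclass
class SSCMCosts:
    cs: float    # cost of supply
    crc: float   # cost of reception and control
    csw: float   # cost of storage and warehousing
    cop: float   # cost of order processing
    cim: float   # cost of information management
    tc: float    # transportation cost

    def total(self) -> float:
        # TC_SSCM = CS + CRC + CSW + COP + CIM + TC
        return self.cs + self.crc + self.csw + self.cop + self.cim + self.tc

monthly = SSCMCosts(cs=120_000, crc=35_000, csw=80_000, cop=45_000, cim=25_000, tc=95_000)
print(f"TC_SSCM = {monthly.total():,.0f}")
print(f"Transport share of the total = {monthly.tc / monthly.total():.1%}")
```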

4 Service Supply Chain Modeling and Optimization with Arena 4.1 Simulation and Optimization of the Drug Supply Chain with Arena The growing importance of SCM is encouraging research to find techniques to better analyze it. Modelling and simulation are among these tools [34]. The choice of simulation is justified when we seek to reproduce the operation of a system, a logistics distribution network, and to test scenarios to propose possible improvements. The main advantages of simulation are flexibility, speed and low cost. In addition, simulation has become for years a very good decision support solution [35]. Simulation offers the possibility to study the behavior of complex phenomena. It takes into account stochastic variables and their evolution over time [36]. It also offers the possibility of studying scenarios by varying the simulation parameters. Simulation and optimization are two powerful tools that are widely used in a wide range of industrial and engineering applications. On the one hand, simulation refers to the reproduction of real-world processes or systems over time [37], while optimization seeks to find the best (solution) of a given solution space with respect to certain criteria [38]. Arena is a powerful discrete-event simulation software. Systems are described from the perspective of entities that are traversed by flows using available resources. ARENA models are structured hierarchically and modularly. Elementary modeling components, called modules, can be selected from model panels, such as Basic Process, Advanced Process, and Transfer Process, and placed on a canvas being built [39]. ARENA uses a hierarchical architecture for simulation modeling, i.e., modules are defined using other modules. ARENA makes it possible to build a model by proposing more or less detailed representation primitives (later called blocks or modules). It also allows the creation of graphic animations to visualize the behavior of the model during the simulation. The

blocks are grouped in different libraries (templates). ARENA offers a great flexibility of use and combination of objects. It has more than 5000 complex animation objects included in the Arena animation library. It also allows for data compatibility and, unlike other tools that use proprietary scripting languages, Arena uses a standard VBA editor and the Arena object model to build custom user interfaces and custom data interfaces to Arena models. As far as model building and execution is concerned, a collection of panels is provided. We have performed two simulations. The first one is performed with initial data from the interviews conducted. The second simulation is done with the integration of the desired optimizations after several scenarios. We have presented only the optimized version of the model. This optimization concerned:

– Reduction of procurement lead times for suppliers and distribution center;
– Increasing the size of lots purchased;
– Introducing two sales lines in each distributor;
– Centralization of purchasing and delivery functions.

The model was simulated on a one-year horizon. Stock levels are monitored in real time by the logistics service provider and the supplier concerned. The logistics service provider still sends a replenishment order to the vendor if necessary, to confirm the requested quantity. The set of flow charts made it possible to depict the important business processes of the multi-step distribution system, from the placing of orders to their dispatch. The simulation tool is used to analyze the different management strategies and network configurations. The stock is kept in a central distribution center (DC) from where it can be moved or stored. It is assumed that there are only two stores served by the DC and that there is only one product in the system. Each customer orders only one unit. Both the DC and the stores work 7 days a week and 12 h a day. The first model relates to the sales operation in the retailer's store. The distribution process is triggered by an order from the customer at retailer level. At the beginning, the customer unit allows the generation of customer demand throughout the simulation. The number of customers arriving at store A per hour follows a Poisson distribution with a mean of 10. The number of customers arriving at store B per hour follows a Poisson distribution with a mean of 4. Then, for each customer demand, the value of the quantity ordered is assigned by the demand variable. When the demand for each product is defined, the retail stock check is done by the Stock Control variable. If the retailer has sufficient stock, the customer demand is met; otherwise, a replenishment order is triggered in the block Launch Distributor Order (Fig. 4) and a purchase order is automatically initiated at the DC. If the demand is satisfied, the delivered quantity is deducted from the stock amount (inv_store [Type] - Demand). The replenishment lead time for a purchase order from the vendor to the DC follows a uniform distribution with a mean of 3 days. The replenishment lead time for a purchase order from the DC to either of the stores follows a uniform distribution between 4 and 24 h.
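To make this logic concrete, here is a minimal plain-Python sketch (not the Arena model) of store A's loop: Poisson(10) hourly arrivals, a stock check, and replenishment orders whose lead time is drawn uniformly between 4 and 24 h. The initial stock, reorder point and lot size are hypothetical values chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
HOURS = 12 * 365                                  # 12 h per day over the one-year horizon
stock, reorder_point, lot_size = 300, 150, 300    # hypothetical (s, Q) inventory policy
pending = []                                      # (delivery_hour, quantity) of open orders
served = demand = 0

for hour in range(HOURS):
    # receive replenishments whose uniform(4, 24) h lead time has elapsed
    stock += sum(q for t, q in pending if t <= hour)
    pending = [(t, q) for t, q in pending if t > hour]

    # Poisson(10) customer arrivals per hour at store A, each ordering one unit
    arrivals = rng.poisson(10)
    demand += arrivals
    sold = min(arrivals, stock)
    served += sold
    stock -= sold

    # trigger a distributor order when the inventory position crosses the reorder point
    on_order = sum(q for _, q in pending)
    if stock + on_order <= reorder_point:
        pending.append((hour + rng.uniform(4, 24), lot_size))

print(f"Average response (fill) rate: {served / demand:.1%}")
```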

Fig. 4. Distributor model (Authors)

At the level of the central distributor, an entity is generated to trigger the arrival of retail orders in the module Arrival of Retail Orders. This entity allows the creation of supply orders for the products requested by retailers. Once the supply order is created, the stock check process (the StockDistributor variable) is activated. If the stock is sufficient, the retailer order is fulfilled after a delivery time ensured by the block Delayed Supply Distributor (following a normal distribution); otherwise, the quantity not fulfilled is reserved to be fulfilled in the next replenishment (Reserve Unsatisfied Quantity Distributor). The process continues, in the case of sufficient stock, by updating the retailer stock; otherwise, an order is launched at the supplier level (Launch Supplier Order) (Fig. 5).

Fig. 5. Central distributor model (DC) (Authors)

In the vendor segment, the procurement process is triggered by the receipt of distributor orders, and since the vendor has no capacity restrictions, the purchase order goes through the Vendor lead time block representing the vendor’s lead time. Then a check of the distributor stock is made through the variable StockDistributor, to update its inventory (Fig. 6).

Fig. 6. Supplier model (Authors)

4.2 Results and Discussion

The performance indicators taken into account by this simulation model are:

– Satisfaction rate: rate of orders fulfilled within the deadlines;
– Average response time: response time to a sales order;
– Cycle time between retailer and DC.

The four proposals led to a significant improvement in the three selected indicators (Table 2). The results we have obtained coincide with [34] and [35]. Indeed, the response rate allows a reduction in inventory, which improves the performance of the supply chain. The indicators used are to be compared with the performance indicators taken into account by the simulation model of [9]. The difference in performance observed may be due to the difference between the two models or to the divergence of periods and study horizons. In this sense, the model we have proposed in this study places more emphasis on managerial variables and seems to be more explicit and comprehensive than other works. Also, the model obtained seems to be adaptable to different contexts and can be adapted to several future circumstances.

Table 2. Performance indicators before and after optimization (Authors)

Indicators                     Average response time per hour   Average response rate   Cycle time per hour
Averages before optimization   0.7                              67.52%                  3.2
Averages after optimization    0.4                              77.41%                  2.8

We find that the logistics costs of the drug distribution company are strongly determined by the costs of the chain and the organization of the SSC. In this sense, there are several measures to reduce the costs of the SSCM: using Electronic Data Interchange (EDI) and new Big-Data technologies to reduce transaction costs and cycle time while improving predictive and instantaneous analysis of information. Use new forecasting

and planning tools to centralize this information. It is essential to have a database that is both fast and comprehensive. Also, it is necessary to produce according to a production schedule that aims to optimize the balance between profit and customer service. As a result, production, inventory planning, customer service, distribution and transportation functions should be integrated to improve the visibility of information, reduce inventory and improve service. In this sense, alliances and collaborative chains are an opportunity for service chain optimization. Centralizing SSCM support functions, such as central purchasing, to achieve economies of scale, reduce staff and increase productivity also reduces transaction costs. The application of the model has enabled the implementation of a path to visualize the operational and supply chain mapping related to the operation of services. In this research, the focus was on the details of pharmaceutical service delivery in order to identify existing gaps in the system. This led us to propose several improvements in relation to the results obtained. The main objective of implementing an effective logistics management system is to reduce the overall time spent in the system. In addition to providing a solution for time management and reduction, Research Plus encourages an improvement in the time spent by the customer and rapid inventory management. The difficulty of modeling is compensated by the great flexibility offered by this simulation tool, thanks to the block principle. This principle, close to programming languages, does not limit the modeler to a standard vision of the entities making up the supply chain, because the modeler has direct access to the smallest details of the model, and consequently the designed chain is highly customizable. Whether in terms of products and services, or processes, it is essential to "de-complexify" everything that can be de-complexified in order to reduce or eliminate potential interruptions in the chain. The future belongs to increasingly differentiated and flexible supply chains.

5 Conclusion The objective of this research paper was to discuss the nature of the service supply chain through SSC modeling. The work consists in identifying the different cost parameters of the SSCM and their impacts on the final price of the product. We chose to model SSCM in order to identify the main characteristics of performance management. Thus, we have modelled an SSC in order to propose ways to optimize costs by controlling the operations of the supply chain and the functional coordination of the different parameters of the chain. In view of the differences in weight of the components of logistics costs of service companies, we intend to estimate their weighting coefficients, based on a quantitative study of companies in the sector. The relevance of such a simulation is likely to help company managers to rationalize their logistics projects and to improve the relevance of investment, outsourcing and collaboration decisions in this area. In the literature, several works have dealt with supply chain simulation via the Arena tool, addressing the development of modeling strategies or ready-to-use packages, but none of these works proposes a complete model for full-scale cases. Our model comes from reality and seems to be consistent with the interviews conducted. The optimizations

carried out have allowed considerable progress on the different performance indicators selected. This methodological and empirical originality is a key contribution of this work. Nevertheless, the great difficulty of modelling is compensated by the flexibility offered by this simulation tool. This does not hide the intrinsic limitations of this model, such as the importance of a global vision of the company, the relativity of the data or the introduction of macro-economic data. These different perspectives offer new prospects for future research.

References 1. Vargo, S.L., Lusch, R.F.: It’s All B2B … and beyond: toward a systems perspective of the market. Ind. Mark. Manage. 40(1), 181–187 (2011) 2. Colin, J.: Le supply chain management existe-t-il réellement? Revue française de gestion 156(3), 135–149 (2005) 3. Liu, W., Bai, E., Liu, L., Wei, W.: A framework of sustainable service supply chain management: a literature review and research agenda. Sustainability 9, 421 (2017) 4. Altuntas Vural, C.: Service-dominant logic and supply chain management: a systematic literature review. J. Bus. Ind. Mark. 32(8), 1109–1124 (2017) 5. Liu, W., Wang, D., Long, S., Shen, X., Shi, V.: Service supply chain management: a behavioural operations perspective. Mod. Supply Chain Res. Appl. 1(1), 28–53 (2019) 6. Chehbi-Gamoura, S., Derrouiche, R., Damand, D., Barth, M.: Insights from big data analytics in supply chain management: an all-inclusive literature review using the SCOR model. Product. Plann. Control 31(5), 1–27 (2019) 7. Lee, H.L., Billington, C.: Material management in decentralized supply chains. Oper. Res. 41, 835–847 (1993) 8. Govil, M., Proth, J.-M.: Supply Chain Design and Management: Strategic and Tactical Perspectives. Academic Press, Cambridge (2002) 9. Valla, A.: Une méthodologie de diagnostic de la performance d’une chaîne logistique. Thèse de doctorat en Informatique, Lyon INSA (2008) 10. Lambert, D.M., Cooper, M.C., Pagh, J.D.: Supply chain management: implementation issues and research opportunities. Int. J. Logistics Manag. 9(1), 1–20 (1998) 11. Ben-Daya, M., Hassini, E., Bahroun, Z.: Internet of things and supply chain management: a literature review. Int. J. Product. Res. 57(15–16), 1–24 (2017) 12. Mentzer, J.T., DeWitt, W., Keebler, J.S., Min, S., Nix, N.W., Smith, C.D., Zacharia, Z.G.: Defining supply chain management. J. Bus. Logistics 22(2), 1–25 (2001) 13. Meurier, B., Paché, G.: Capacités dynamiques au sein des chaînes logistiques : Proposition d’une grille de lecture. Les Actes des RIRL 2018. Colloque AIRL-SCM (2018) 14. Ellram, L.M., Monique, L.: Ueltschy murfield: supply chain management in industrial marketing—relationships matter. Ind. Mark. Manage. 79(1), 36–45 (2019) 15. Gibson, B.J., Mentzer, J.T., Cook, R.L.: Supply chain management: the pursuit of a concensus definition. J. Bus. Logistics 26(2), 17–25 (2005) 16. Baltacioglu, T., Ada, E., Kaplan, M., Yurt, O., Kaplan, C.: A new framework for service supply chains. Serv. Ind. J. 27(2), 105–124 (2007) 17. Bentalha, B., Hmioui, A., Alla, L.: Digital service supply chain management: current realities and prospective visions. In: Ben Ahmed, M., Boudhir, A., Santos, D., El Aroussi, M., Karas, ˙I. (eds) Innovations in Smart Cities Applications Edition 3. SCA 2019. Lecture Notes in Intelligent Transportation and Infrastructure. Springer, Cham (2020)

18. Lin, Y., Shi, Y., Zhou, L.: Service supply chain: nature, evolution, and operational implications. In: Proceedings of the 6th CIRP-Sponsored International Conference on Digital Enterprise Technology, pp. 1189–1204 (2010) 19. Song, D., Y, Xu: Integrated design of service supply chain in the perspective of producer service outsourcing. In: International Conference on Management and Service Science (MASS), pp. 1–4. IEEE (2011) 20. Chowdhury, Y., Alam, M.Z., Habib, M.: Supply chain management practices in services industry: an empirical investigation on some selected services sector of Bangladesh. Int. J. Supply Chain Manage. 6(1), 152–162 (2017) 21. Shahin, A.: SSCM: service supply chain management. Int. J. Logistics Syst. Manage. 6(1), 60–75 (2010) 22. Xinping, C.: Application of the IUE-SSC model in the information service industry. Inf. Technol. J. 12(1), 5512–5518 (2013) 23. Bentalha, B.: Big-Data et service supply chain management: challenges et opportunités. Int. J. Bus. Technol. Stud. 1(3) (2020) 24. Wang, Y., Wallace, S.W., Shen, B., Choi, T.-M.: Service supply chain management: a review of operational models. Euro. J. Oper. Res. 247(3), 685–698 (2015) 25. Forrester, J.W.: Principes des Systèmes. 3e édition, Presses universitaires de Lyon (1984) 26. Hamilton, P., Tachizawa, T., Getulio Kazue, A., Washington, S.: Supply chain management as a competitive strategy for costs reduction: a case study in two small manufacturing companies, (2013) 27. El Azizi, M.B.: Modélisation multi-agents de la coopération au sein des chaînes logistiques à deux échelons: application à la distribution de produits pharmaceutiques au Maroc. Gestion et management. Université Paris-Nord — Paris XIII, (2014) 28. Zhu, Z., Nakata, C.: Re-examining the link between customer orientation and business performance: the role of information systems. J. Mark. Theor. Pract. 15(3), 187–203 (2007) 29. Niranjan, S., Ciarallo, W.: Supply performance in multi-echelon inventory systems with intermediate product demand: a perspective on allocation. Decis. Sci. 42(3), 575–617 (2009) 30. Lapinskait˙e, I., Kuckailyt˙e, J.: The impact of supply chain cost on the price of the final product. Bus. Manage. Educ. 12(1), 109–126 (2014) 31. Xu, Z., Elomri, A., Zhang, Q., Liu, C., Shi, L.: Status review and research strategies on product-service supply chain. Proc. Inst. Mech. Eng. Part B: J. Eng. Manuf. (2020) 32. LaLonde, B.J., Pohlen, T.L.: Issues in supply chain costing. Int. J. Logistics Manage. 7(1), 1–12 (1996) 33. Di Martinelly, C., Guinet, A., Riane, F.: Chaîne logistique en milieu hospitalier: modélisation des processus de distribution de la pharmacie. 6e Congrès international de génie industriel. Besançon (France), pp. 1–8 (2005) 34. Rouibi, S., Burlat, P., Ouzrout, Y., Frein, Y.: La modélisation Arena comme outil d’étude de l’influence du VMI sur les niveaux de stocks des chaînes logistiques. 8e Conférence Internationale de Modélisation et SIMulation. Hammamet, Tunisie. (2010) 35. Madelin, G.: Modélisation et amélioration d’un réseau de distribution logistique en milieu hospitalier. Mémoire de maîtrise, École Polytechnique de Montréal, France (2017) 36. Thierry, C., Thomas, A., Bel, G.: Simulation for supply chain management. CAM, control systems, robotics ans manufacturing series. Wiley (2008) 37. Kelton, W.D., Sadowski, R.P., Sadowski, D.A.: Simulation with Arena, 2nd edn. McGrawHill, Inc., New York, NY, USA (2002) 38. 
Borodin, V., Bourtembourg, J., Hnaien, F., Labadie, N.: COTS software integration for simulation optimization coupling: case of ARENA and CPLEX products. Int. J. Model. Simul. 39(3), 178–189 (2018) 39. Rockwell Automation, Inc. Users guide. Allen Bradley—Rockwell Software (2012)

Traffic Signs Detection and Recognition System in Snowy Environment Using Deep Learning

Hamou Chehri1, Abdellah Chehri2(B), and Rachid Saadane3

1 Bell-Canada, 671 Rue de La Gauchetière Ouest, Montréal, Québec H3B 2M8, Canada
[email protected]
2 Department of Applied Sciences, University of Québec in Chicoutimi, Chicoutimi, Canada
[email protected]
3 SIRC/LaGeS-EHTP, EHTP, Km 7 Route El Jadida, Oasis, Morocco
[email protected]

Abstract. A fully autonomous car does not yet exist, but vehicles have continued to gain in autonomy in recent years. The main reason? The dazzling progress made in artificial intelligence, in particular by specific algorithms known as machine learning. These example-based machine learning methods are used in particular for recognizing objects in photos. The algorithms developed for detection and identification must respond robustly to the various disturbances observed and take into account the variability in the signs' appearance. Variations in illumination generate changes in apparent color, shadows, reflections, or backlighting. Besides, geometric distortions or rotations may appear depending on the viewing angle and the signs' scale. Their appearance may also vary depending on their state of wear and possible dirt or damage. In this work, to improve the accuracy of detection and classification of road signs partially covered by snow, we use the Fast Region-based Convolutional Network (Fast R-CNN) model. To train the detection model, we collect an image dataset composed of multiple classes of road signs. Our model can detect multiple classes of road signs simultaneously in near real-time. Keywords: Deep learning · Automatic classification · Traffic sign · Detection

1 Introduction

The detection and recognition of road signs is a major issue in the analysis of road scenes by image processing. There are many applications, such as route calculation with estimated travel times, the development of tools for the management and maintenance of road assets, real-time driving assistance systems or, in connection with robotics, vehicle automation or even the development of a new generation of multimedia tools on the web for geographic 3D navigation. Whatever the application, detection and recognition methods come up against the difficulties linked both to the uncontrolled nature of the images used and to the variability in appearance of the objects sought. Signs are manufactured and standardized objects whose shape, dimensions, color and position are fixed by standards. Variations in illumination
generate changes in apparent color, shadows, reflections or backlighting. In addition, geometric distortions or rotations may appear depending on the viewing angle and the scale of the signs. Their appearance may also vary depending on their state of wear and possible dirt, damage or partial coverage by snow. The algorithms developed for the detection and identification of road signs must respond in a robust manner to the various disturbances observed and take into account the variability in appearance of the signs. These algorithms, which generally rely on image processing methods, consist of two steps: the detection of signs in the road scene and the recognition of their type. They can be roughly classified into three categories: color-based, shape-based, and machine learning-based methods. Segmentation, which is a low-level process, is defined as a process of partitioning the image into homogeneous regions, each of which groups together a set of points with common properties. Many segmentation methods have been proposed in the literature. Still, two main categories can be distinguished in the context of mathematical morphology: contour approaches and region approaches. The first category includes techniques that detect rapid changes in contrast in the image. These techniques generally assume an a priori model of the discontinuities sought and operate in a very localized manner. In color-based methods, segmentation is applied to detect regions of interest. Color image segmentation is a low-level process that remains an open problem in image analysis. This task aims to create a partition of an image into disjoint and connected subsets, called regions, having common attributes. Image segmentation is an essential task in any image processing and analysis process. This task is present in several computer vision applications such as medical imaging, video analysis, remote sensing, assistance in the operation of video surveillance systems in transport, etc. In remote sensing, the segmentation process is becoming highly used with the advent of very high spatial resolution satellite and aerial images. Several segmentation approaches dedicated to remote sensing images have been proposed in the literature. The robustness of each of these methods depends significantly on the acquired image (image resolution, presence of noise, etc.). Providing a state of the art of image segmentation methods is a delicate task, given the number of articles published in the literature over the past thirty years. Most of the methods have been developed for specific applications [1–3]. It emerges from the literature that two inseparable aspects coexist in the problem of automating the extraction of information from images. The first aspect concerns image processing techniques (in most cases, segmentation), while the second involves processing complex data in analysis and data mining (classification, for example). These two aspects benefit from many years of research already carried out in image segmentation and data classification. Two primary analysis approaches can then be identified: supervised analysis and unsupervised analysis. They differ mainly in their purpose but also in the algorithms implemented. Supervised analysis requires processing chains based on supervised segmentation and classification techniques. Supervised segmentation integrates a priori knowledge (shapes, spectral information, etc.)
of the objects to be extracted during the segmentation process.

Supervised classification uses a set of objects of known classes, called a training set, to produce a model of the classes characterizing these data that can, above all, be generalized to a larger data set [4]. Convolutional Neural Networks (CNNs) share the overall mechanism of a Multi-Layer Perceptron (MLP) [5, 6]. CNNs are defined by a set of layers forming a hierarchy of features. The main operation in a CNN is convolution. Its role is to extract features from the layer that precedes it and to produce the result in a feature map. The following paragraph explains this mechanism [7]. On a computer, a 2D image is a matrix where each element represents a pixel intensity. This matrix can have 3 dimensions if the image is in color; each dimension represents a channel corresponding to red, blue, or green. Taking a training image as input, the network performs convolution operations using several filters on each channel with a particular stride. Each filter is a matrix whose dimensions are fixed in each layer. The filters slide over the input image or map, and each produces an output channel where the result of each convolution is recorded in its corresponding location [8–10]. Detection and recognition of traffic signs in a snowy environment remains an open question. Various previous works have addressed traffic-sign recognition and detection; however, several of them focused only on traffic-sign detection in clear visibility. This paper attempts to train a convolutional neural network for partially covered traffic signs. First, a complete dataset of road signs with 20 classes was built. Next, the sample images were preprocessed through Faster R-CNN to realize the intelligent detection of the objects. Finally, we obtain a program that can identify and draw boxes around specific objects in pictures, videos, or a webcam feed. This paper is organized as follows. The related work on road sign detection and classification is given in Sect. 2. In Sect. 3, we give an overview of the object detection techniques. The sampling and image preprocessing procedures are given in Sect. 4. Section 5 concludes the paper.
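To make the convolution mechanism above concrete, here is a short NumPy sketch (not from the paper) that slides a 3×3 filter over a single-channel image with stride 1 and fills a feature map; the image values and the hand-crafted edge filter are purely illustrative, since a real CNN learns its filter weights.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide a kernel over a single-channel image and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            feature_map[i, j] = np.sum(patch * kernel)   # element-wise product, then sum
    return feature_map

image = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(float)
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])            # illustrative filter; CNNs learn these weights
print(conv2d(image, vertical_edge, stride=1).shape)      # (6, 6) feature map
```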

2 Related Works

Since 1989 [11], researchers have started to apply machine learning methods for sign detection. The subject of machine learning is to study how to use computers to imitate human learning activities, and to discover methods by which computers can improve themselves in order to gather new skills and new knowledge, organize existing knowledge, and deliberately improve their performance [12]. Over the recent few years, deep learning has had unprecedented success in fields like speech recognition, image classification, etc. Deep learning is a subset of machine learning in Artificial Intelligence (AI) whose networks are capable of learning, in an unsupervised way, from data that is unstructured or unlabeled; it is about learning multiple levels of representation and abstraction that help to make sense of data such as images, sound, and text [13]. This makes these methods very effective for traffic sign detection, especially when tracking objects in a real video sequence where the objects are non-rigid, the background of the scene is not fixed, and several objects appear in the same scene.


Solving a segmentation problem with discrimination is very popular. The earliest CNN-based approaches to segmentation were simply pixel-by-pixel classifications, in which a pixel was classified based on the pixel values in its neighborhood. At inference, a filter in the form of a sliding window centered on one pixel traverses the entire image; each pixel is classified individually, and the operation is repeated as many times as there are pixels in the image.

CNNs have been shown to be very robust in solving image classification problems, and several architectures have been proposed to increase their performance. These works demonstrated the importance of choosing hyper-parameter values and showed that a network's depth is critical to achieving better performance, provided an extensive training database is available. The architectures mentioned above have achieved the best performance in the state of the art, but their use comes at a price: training data. Models with such a large capacity need large amounts of data to learn; otherwise, the gradient shrinks at each step until it vanishes in the network, which penalizes the optimization of the parameters [14–16].

Xiong et al. [17] trained a traffic sign detection model based on deep CNNs using the Region Proposal Network (RPN) of Faster R-CNN. Running on an NVIDIA GTX980Ti 6 GB GPU, the average detection time is about 51.5 ms per image, with a detection rate above 99% on continuous image sequences. The database they used is a Chinese traffic sign dataset with 7 main categories. Another Faster R-CNN-based model was proposed in [18], with two parts: selective search is first used to detect candidate regions, and CNNs are then used to extract features, perform classification and adjust parameters. Peng et al. [19] proposed another way to detect traffic signs: using Faster R-CNN based on Region Proposal Networks, they achieve 90% accuracy on the GTSDB dataset with an NVIDIA GTX 1070 8 GB GPU, an Intel Core i5 and 16 GB of RAM.

3 Machine Learning-Based Methods

Machine learning algorithms are constantly evolving and improving at a breakneck pace. The opportunities in intelligent transport and autonomous vehicles are growing, provided a representative database of sufficient size is assembled. Indeed, these representation learning methods rely on the availability of numerous data, which must be representative of the problem we wish to solve. Algorithms such as Faster R-CNN have therefore been developed to find these occurrences, and to find them fast.

3.1 R-CNN Algorithm

Where the methods described above perform a classification, that is to say, assign a unique class to an image, other algorithms, also based on CNNs, can detect several objects of interest in an image (detection/localization phase) and then assign each of these objects a class (identification phase). CNNs dedicated to localization and identification are divided into algorithms that carry out localization and identification jointly (one-step, or single-shot, algorithms) and those that carry them out in two successive stages (two-step algorithms).


One-step architectures (YOLO, SSD, MobileNet, RetinaNet) offer much lower computation times than two-step architectures (R-CNN, Faster R-CNN, etc.) but yield weaker results. In a preliminary study, we were able to corroborate these results: a model created from a Faster R-CNN architecture [20] performed better than models created using SSD and RetinaNet architectures with the same training. Our study did not cover cases where computing power and processing time are limited (no real-time processing constraint). We are therefore exclusively interested in two-step architectures, which are detailed below. The architectures performing the two tasks separately (region-based CNN, R-CNN) include a first phase that searches for a certain number of areas of interest (region proposals), at all possible sizes, initially by means of a selective search method. The number of proposed areas of interest is defined a priori by the user (e.g., 2000 regions for the original R-CNN architecture) (see Fig. 1).

Fig. 1. R-CNN [21].

3.2 Fast R-CNN

The selective search algorithm begins by initializing small homogeneous regions in the image. A merging procedure then combines these regions: a similarity index between neighboring regions is calculated, comparing colors (in several color representation domains, e.g., RGB, HSV, grayscale…), textures, sizes, and region overlap (two areas with a high overlap rate are more likely to merge than distant areas). At each merger, the indices are recalculated with the newly obtained areas until all regions are merged. This method thus makes it possible to search for objects of all dimensions. Once the regions have been acquired, a second step proceeds in the same way as a conventional CNN, and each proposed area is assigned a class (labeled). Finally, the algorithm returns a class and four values corresponding to the dimensions of the region. Improvements were subsequently made to this method, whose main drawback is the computing time required for each iteration. The authors of [19] therefore propose to calculate the feature vector of the entire image rather than computing it independently on each region (Fig. 2), which saves considerable time compared to the previous method, which had to recompute the feature vector for each object of interest.
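For reference, a region-proposal step in this spirit can be prototyped with the selective search implementation shipped in the opencv-contrib-python package; this is only a sketch of the proposal stage, not the pipeline used in this paper, and the image path is an assumption.

```python
import cv2

image = cv2.imread("road_scene.jpg")  # hypothetical input frame

# Selective search is provided by the ximgproc module of opencv-contrib-python.
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()  # trades proposal quality for speed

# Each rect is (x, y, w, h); a detector would then classify the top proposals.
rects = ss.process()
proposals = rects[:2000]
print(f"{len(rects)} region proposals generated, keeping the first {len(proposals)}")
```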


Fig. 2. Fast R-CNN [21].

More recently, articles [20, 21] propose to remove the use of selective search and replace it with Region Proposal Networks (RPNs). The learning phases then alternate between modifying the RPN parameters and modifying the identification parameters [22]. The RPN is used in the same way as the selective search algorithm, taking an image as input and returning a list of bounding boxes, each associated with a score (object score). Since the RPN is made up of hidden layers, it benefits from backpropagation and the learning procedure, which can then adapt specifically to the training base (Fig. 3). Algorithms based on RPNs are, to date, the most efficient in object detection tasks, their main weakness being the long training computation time.

Fig. 3. Faster R-CNN [21].

Figure 4 shows that Faster R-CNN is much faster than its predecessors; it can therefore even be used for real-time object detection. The TensorFlow detection model zoo provides a collection of pretrained detection models. Some models offer high speed with lower accuracy, while other models, such as Faster R-CNN, are slower but more accurate.
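A typical way to run one of these pretrained models with the TensorFlow 1.x Object Detection API is sketched below; the frozen-graph path is an assumption, and the tensor names follow the API's usual export convention rather than anything specific to this work.

```python
import numpy as np
import tensorflow as tf

# Assumption: a frozen inference graph exported by the Object Detection API,
# e.g. a Faster R-CNN model downloaded from the model zoo.
PATH_TO_FROZEN_GRAPH = "faster_rcnn_inception_v2/frozen_inference_graph.pb"

detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=detection_graph) as sess:
    frame = np.zeros((1, 480, 640, 3), dtype=np.uint8)  # placeholder image batch
    boxes, scores, classes = sess.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": frame})
    print(scores[0][:5])  # confidence of the top detections
```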


Fig. 4. Comparison of test-time speed of object detection algorithms [20].

4 Sampling and Image Preprocessing

4.1 Experimental Environment

Software environment: Windows 10 64-bit operating system, TensorFlow 1.15.1, Python 3.8.0 64-bit. Hardware environment: Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz processor, 16.00 GB memory, NVIDIA GeForce GTX 1050, 512 GB SSD hard disk.

4.2 Data Collection

General image datasets such as ImageNet [23] and Microsoft COCO have been generated by downloading Internet images retrieved by search engines using keywords. To mimic a real-world application scenario, we selected images of road signs partially covered by snow [24, 25]. Traffic signs in Canada follow international patterns and can be classified into three categories: warnings (mostly yellow rectangles with a black boundary and information), prohibitions (mostly white surrounded by a red circle and possibly having a diagonal bar), and mandatory signs (mostly green circles with white information) [26]. Other signs exist that resemble traffic signs but are in fact not, like the one illustrated in Fig. 5(d). Such signs are placed in an 'other' class of a particular category. The data samples comprise a total of 600 images, 150 images for each class; the dataset contains pictures of 4 classes of road signs.

4.3 Sampling

All implementation in this part has been done in an OpenCV environment. Pre-implemented functionality from the OpenCV library has been used to keep the program robust and is stated when presented. This section shows a step-by-step solution of the image processing, from source images to the extracted feature data of each sign. From the collected images, 80% were randomly selected and allocated to the training set, and the remaining 20% were allocated to the test set.
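A simple way to realize this 80/20 split over the collected image files is sketched below; the directory layout, file extension and random seed are assumptions.

```python
import random
from pathlib import Path

random.seed(42)  # assumption: fixed seed for a reproducible split

# Assumption: images are stored as dataset/<class_name>/<image>.jpg
dataset_dir = Path("dataset")
train, test = [], []
for class_dir in sorted(dataset_dir.iterdir()):
    images = sorted(class_dir.glob("*.jpg"))
    random.shuffle(images)
    split = int(0.8 * len(images))   # 80% training, 20% testing per class
    train.extend(images[:split])
    test.extend(images[split:])

print(f"{len(train)} training images, {len(test)} test images")
```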


Fig. 5. The different classes in our data set.

4.4 Labeling

The collected images were next annotated by hand. We used the LabelImg tool to label the desired objects in every picture, drawing a box around each object in each image. LabelImg saves a .xml file containing the label data for each image. These .xml files are used to generate TFRecords, which are one of the inputs to the TensorFlow trainer.

4.5 Generate Training Data

After labeling all objects, TFRecords were generated, serving as input data to the TensorFlow training model. The image .xml data were used to create .csv files containing all the data for the train and test images. This creates a train_labels.csv and a test_labels.csv file in the training folder.

4.6 Create Label Map and Configure Training

The last thing to do before training is to create a label map and edit the training configuration file. The label map tells the trainer what each object is by defining the mapping of class names to class ID numbers. The label map ID numbers should be the same as those defined in the generate_tfrecord.py file. The detection pipeline for traffic signs must then be configured; it defines which model and what parameters will be used for training. There are several changes to make to the .config file, mainly changing the number of classes and examples, and adding the file paths to the training data (Fig. 6).
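The label map itself is a small protobuf text file; a minimal sketch that writes one is given below, with hypothetical class names standing in for the actual sign classes of the dataset.

```python
# Write a label map (.pbtxt) for the TensorFlow Object Detection API.
# Class names here are hypothetical placeholders; the IDs must match
# those used in generate_tfrecord.py.
classes = ["stop", "yield", "speed_limit", "no_entry"]

with open("training/labelmap.pbtxt", "w") as f:
    for idx, name in enumerate(classes, start=1):  # IDs start at 1; 0 is reserved
        f.write("item {\n")
        f.write(f"  id: {idx}\n")
        f.write(f"  name: '{name}'\n")
        f.write("}\n\n")
```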


Fig. 6. Example of labeling a traffic sign using the LabelImg tool.

4.7 Run and Time the Data Training

Each step of training reports the loss; it starts high and gets lower and lower as training progresses [27]. For our training on the Faster-RCNN-Inception-V2 model, the loss started above 1 and dropped below 0.5 after 2200 steps. The model was left to train until the loss consistently dropped below 0.10, which took about 5150 steps (Fig. 7).

Fig. 7. The loss graph, showing the overall loss of the classifier over time.

5 Conclusion

The system architecture has proved to be a promising approach at this stage of development. Although major parts of the system are still to be developed, its modularity makes it easy to develop and understand. The implemented program works well and the training mode is fully functional.


However, a classifier to be used in the classification part of the system is still to be implemented. In this paper, Python was used to examine the different classifiers, in order to investigate their results and performance before choosing a classifier and implementing it into the program. In addition, a framework for the program has been created that is easy to develop further.

References

1. De La Escalera, A., Moreno, L.E., Salichs, M.A., Armingol, J.M.: Road traffic sign detection and classification. IEEE Trans. Ind. Electron. 44(6), 848–859 (1997)
2. Yakimov, P.: Traffic signs detection using tracking with prediction. In: International Conference on E-Business and Telecommunications, Colmar, France, pp. 454–467. Springer (2015)
3. De la Escalera, A., Armingol, J.M., Mata, M.: Traffic sign recognition and analysis for intelligent vehicles. Image Vis. Comput. 21(3), 247–258 (2003)
4. Ruta, A., Li, Y., Liu, X.: Real-time traffic sign recognition from video by class-specific discriminative features. Pattern Recogn. 43(1), 416–430 (2010)
5. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of ICLR Workshops Track (2013)
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of NIPS, pp. 1106–1114 (2012)
7. Russakovsky, O., Deng, J., Su, J., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
8. Dean, J., Corrado, G.S., Monga, R., Chen, K., Devin, M., Le, Q.V., Mao, M.P., Ranzato, M., Senior, A., Tucker, P., Yang, K., Ng, A.Y.: Large scale distributed deep networks. In: Proceedings of NIPS, pp. 1232–1240 (2012)
9. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (2016)
10. Molenaar, D., Sadler, B.A.: Anode rodding basics. In: Grandfield, J. (ed.) Light Metals 2014. Springer (2014)
11. Lubin, J., Kornhauser, A.: Using back-propagation networks to assess several image representation schemes for object recognition. In: International Joint Conference on Neural Networks (1989)
12. Wang, H., et al.: A brief review of machine learning and its application. In: 2009 International Conference on Information Engineering and Computer Science (2009)
13. Investopedia: Definition of "Deep Learning". http://www.investopedia.com/terms/d/deep-learning.asp
14. Aghdam, H.H., Heravi, E.J., Puig, D.: A practical approach for detection and classification of traffic signs using convolutional neural networks. Robot. Auton. Syst. 84, 97–112 (2016)
15. Wu, Y., Liu, Y., Li, J., Liu, H., Hu, X.: Traffic sign detection based on convolutional neural networks. In: The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–7 (2013)
16. Zang, D., Zhang, J., Zhang, D., Bao, M., Cheng, J., Tang, K.: Traffic sign detection based on cascaded convolutional neural networks. In: 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 201–206. IEEE (2016)
17. Xiong, C., Wang, C., Ma, W., Shan, Y.: A traffic sign detection algorithm based on deep convolutional neural network. In: IEEE ICSIP (2016)
18. Zuo, Z., Yu, K., Zhou, X., Wang, X., Li, T.: Traffic signs detection based on faster R-CNN. In: 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (2017)


19. Peng, E., Chen, F., Song, X.: Traffic sign detection with convolutional neural networks. In: ICCSIP. Springer, Singapore (2017)
20. Gandhi, R.: R-CNN, Fast R-CNN, Faster R-CNN, YOLO — object detection algorithms (2018)
21. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: 28th International Conference on Neural Information Processing Systems, pp. 91–99 (2015)
22. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR 2009 (2009)
23. Lin, T., Maire, M., Belongie, S., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft COCO: common objects in context. CoRR, abs/1405.0312 (2014)
24. Chehri, A., Fortier, P.: Low-cost localization and tracking system with wireless sensor networks in snowy environments. In: Chen, Y.W., Zimmermann, A., Howlett, R., Jain, L. (eds.) Innovation in Medicine and Healthcare Systems, and Multimedia. Smart Innovation, Systems and Technologies, vol. 145. Springer, Singapore (2019)
25. Chehri, A., Fortier, P.: Wireless positioning and tracking for Internet of Things in heavy snow regions. In: Zimmermann, A., Howlett, R., Jain, L. (eds.) Human Centred Intelligent Systems. Smart Innovation, Systems and Technologies, vol. 189. Springer, Singapore (2020)
26. Chehri, A., Fortier, P., Tardif, P.M.: Geo-location with wireless sensor networks using non-linear optimization. International Journal of Computer Science and Network Security (IJCSNS), pp. 145–154, January 2008
27. Chehri, A., Mouftah, H.T.: Autonomous vehicles in the sustainable cities, the beginning of a green adventure. Sustain. Cities Soc. 51, 101751 (2019)
28. Chehri, H., Chehri, A., Kiss, L., Zimmerman, A.: Automatic anode rod inspection in aluminum smelters using deep-learning techniques: a case study. Procedia Comput. Sci. 176, 3536–3544 (2020)

Smart Healthcare

A Machine Learning Approach for Initial Screening of Polycystic Ovarian Syndrome (PCOS)

Joshua Rei Jaralba(B), Renann Baldovino, and Homer Co

Manufacturing Engineering and Management Department, Gokongwei College of Engineering, De La Salle University, 2401 Taft Avenue, 0922 Manila, Philippines
[email protected]

Abstract. Polycystic ovarian syndrome (PCOS) is one of the most common gynecological disorders affecting women globally. The economic repercussions brought by this disease and its associated comorbidities require the development of tools and procedures that will enable its early and accurate diagnosis. However, early diagnosis, especially for adolescent women, remains a persistent challenge. In this paper, machine learning (ML) algorithms were evaluated for their performance in screening PCOS using 23 non-invasive screening parameters. The purpose of this research is to identify suitable machine learning algorithms that can be used to screen PCOS patients without the use of invasive tests. The dataset used consists of clinical data of 540 patients from different hospitals in Kerala, India, 378 of which were used for training and 162 for testing. Performance of the ML models was measured and compared based on accuracy, precision, sensitivity, specificity, ROC and AUC scores, and the effect of data imbalance was also explored. It was shown that these models offer great promise as a screening tool for PCOS in terms of sensitivity and ROC, with some models performing better than other proposed methods utilizing invasive tests.

Keywords: Classification · Machine learning · PCOS · Diagnosis

1 Introduction

1.1 Background

Polycystic ovary syndrome (PCOS) is an endocrine disorder affecting women of reproductive age, characterized by an excess of the male hormone androgen and resulting in a number of symptoms such as obesity, irregular menstrual cycles, acne, hirsutism and the abnormal development of ovarian follicles [1]. These follicles are fluid-filled sacs in the ovaries that are responsible for the release of hormones necessary for the menstrual cycle and egg cell development. Around 400,000 follicles are present in a woman's ovary at the onset of puberty. Follicles are characterized by their sizes: sizes below 18 mm are considered antral, while those between 18 and 28 mm are considered dominant and are capable of releasing mature egg cells during ovulation.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 517–529, 2021. https://doi.org/10.1007/978-3-030-66840-2_39


In a normal ovary, around 5 to 9 of these follicles grow to within the 2 to 28 mm range. In polycystic ovaries, antral follicles stop growing at 5 to 7 mm, resulting in a buildup of immature ovarian follicles, hence the term polycystic [2]. According to WHO, around 116 million or 3.4% of women worldwide are affected by this dysfunction. Aside from the primary symptoms, women with PCOS experience increased risk of type-2 diabetes mellitus, cardiovascular diseases, hypertension, gynecological cancer and first-trimester miscarriage [3, 4]. PCOS patients were also found to be twice as likely to be admitted to hospital as the average individual [5]. The economic repercussions of this disease are therefore not minimal: the Australian health system spends 800 million dollars annually on the disease, and in the US it is estimated that the costs associated with screening and treatment add up to 4.36 billion dollars yearly [6]. Hence, accurate and early diagnosis and treatment are necessary in order to prevent the associated complications of the disease.

The diagnosis of PCOS is complicated by its unknown etiology and the variation in its prevalence among affected women [7]. Diagnostic criteria are offered by three groups. The most recognized is the Rotterdam criteria defined by the European Society for Human Reproduction and Embryology (ESHRE) and the American Society for Reproductive Medicine (ASRM). Under these criteria, three main symptoms, namely anovulation, hyperandrogenism and polycystic ovaries, were identified; the presence of 2 out of the 3 main symptoms is required for diagnosis [8]. Diagnostic procedures include a thorough examination of the patient's medical history, an assessment of the metabolic status of the patient, laboratory screening to examine hormone levels, and ultrasound imaging to check the follicle count. Despite this, diagnostic criteria tend to be inconsistent among endocrinologists and gynecologists. Diagnosis for children and adolescents also tends to be challenging due to similarities between the symptoms and the physiological changes associated with pubertal development. This overlap can cause overdiagnosis among adolescents, leading to unnecessary treatment costs along with psychological repercussions [9].

1.2 Machine Learning (ML) Approach

In light of this issue, ML approaches have been used for PCOS screening and diagnosis. The works of [2] and [10] used feature extraction, image descriptors and ML to detect and classify polycystic follicles from ultrasound images. Mehrotra et al. used logistic regression and Bayesian classification for automated PCOS screening from metabolic and laboratory data of 250 patients [11]. Meena et al. employed data mining techniques, feature selection using information gain subset evaluation (IGSE), neural fuzzy rough subset evaluation, and decision tree classifiers to determine relevant attributes for PCOS screening [12]. Anuhya et al. used multiple ML techniques to predict PCOS occurrence using 18 attributes comprising risk factors and patient medical history [13]. However, these studies failed to consider the hierarchy of diagnostic procedures. In addition, one of them utilized data with no information about the attributes being considered.

In this paper, five different ML approaches, Naïve-Bayes classification, k-nearest neighbors (k-NN), artificial neural network (ANN), support vector machine (SVM) and random forest (RF), along with an ensemble algorithm, were evaluated in terms of accuracy for the screening of PCOS patients, using an updated dataset from Kaggle composed of 40 features.


Of the 40 features, 23 were categorized as screening parameters. These attributes exclusively include low-level features such as the patient's history and metabolic and physiological data.

2 Methodology

2.1 Data Collection and Preparation

The data used in this paper is a subset of the dataset prepared by Kottarathil on Kaggle [14]. The data consists of diagnostic and clinical information of 541 patients collected from 10 different hospitals in Kerala, India. The original dataset consists of two files. The first file contains the physical and clinical PCOS parameters and the second file contains infertility data of the 541 patients. Only the first file was used since it focuses primarily on screening and diagnosis. This data consists of 42 parameters, two of which were determined to be unique identifier values and were removed. Of the remaining 40 parameters, marriage status was identified to be extraneous. The target variable "PCOS" is composed of binary data, with 1 indicating positive and 0 negative. Data cleaning was performed by removing data from a patient with erratic, non-numeric information, leaving 540 entries in the final dataset. Two empty fields in the attribute "marriage status" and one empty field in "Fast food" were populated with the median of the data.

2.2 Feature Selection

The 38 remaining attributes were manually examined to classify them as either screening or diagnostic parameters [15]. Screening parameters are sets of non-invasive diagnostics that do not require patient sample extraction. As shown in Table 1, the screening parameters include the history, physiological and metabolic data of the patient.

Table 1. Classification of attributes in the PCOS dataset

Attribute | Unit of measurement | Classification
Age | Years | Screening
Weight | kg | Screening
Height | cm | Screening
Body mass index | kg/m2 | Screening
Blood group | Type | Screening
Pulse rate | bpm | Screening
Respiratory rate | breaths/min | Screening
Cycle regularity | R/I | Screening
Cycle length | Days | Screening
Pregnant | Binary | Screening
No. of abortions | Count | Screening
Hip | Inch | Screening
Waist | Inch | Screening
Waist-to-hip ratio | Ratio | Screening
Weight gain | Binary | Screening
Hair growth | Binary | Screening
Skin darkening | Binary | Screening
Hair loss | Binary | Screening
Pimples | Binary | Screening
Fast food | Binary | Screening
Regular exercise | Binary | Screening
BP_systolic | mmHg | Screening
BP_diastolic | mmHg | Screening
Hb | g/dl | Diagnostic
FSH | mIU/mL | Diagnostic
LH | mIU/mL | Diagnostic
FSH/LH | Ratio | Diagnostic
TSH | mIU/mL | Diagnostic
AMH | ng/mL | Diagnostic
PRL | ng/mL | Diagnostic
Vit D3 | ng/mL | Diagnostic
PRG | ng/mL | Diagnostic
RBS | mg/dl | Diagnostic
Follicle No. (L) | Count | Diagnostic
Follicle No. (R) | Count | Diagnostic
Avg. F size (L) | mm | Diagnostic
Avg. F size (R) | mm | Diagnostic
Endometrium | mm | Diagnostic

On the other hand, a diagnostic parameter is defined as a further test that requires either a fluid sample or an invasive vaginal ultrasound. The remaining 15 attributes are all hormonal and gynecological data from invasive tests. Definitions were not provided by the original author but were interpreted from the literature on diagnostic procedures [8, 16].


2.3 Training and Testing

The cleaned data was split into 70% training and 30% testing sets using the train_test_split function. Since there is some imbalance in the dataset, stratification was applied to preserve the ratio of positive and negative values in the training and testing sets. The data imbalance is reflected in Fig. 1.
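A minimal sketch of this stratified split with scikit-learn is given below; the file name and column names are assumptions about the cleaned Kaggle file, and all screening attributes are assumed to be numerically encoded.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: the cleaned screening parameters are stored in this CSV,
# with the binary target in a column named "PCOS".
data = pd.read_csv("pcos_screening.csv")
X = data.drop(columns=["PCOS"])
y = data["PCOS"]

# 70/30 split; stratify=y preserves the positive/negative ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
print(y_train.value_counts(), y_test.value_counts(), sep="\n")
```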

Fig. 1. Count of each class in the dataset for the unbalanced data

In the dataset, there are around 360 instances of negative values, while there are only 180 instances of positive values. Scaling was applied to the training and testing data using StandardScaler(), and five ML algorithms, NB, k-NN, ANN, SVM and RF, were implemented. Individual hyperparameter optimization was performed using GridSearchCV. The accuracy and robustness of each model were evaluated using stratified 10-fold cross-validation. Then, performance scores such as accuracy, precision, sensitivity, specificity, receiver operating characteristic (ROC) and area under the curve (AUC) scores against the test dataset were measured. Finally, an ensemble learner using VotingClassifier was generated from two classifiers selected based on performance and ROC.

2.4 Oversampling

Random oversampling (see Fig. 2) was applied to the training data to investigate the effect of data imbalance. This was performed after splitting the data in order to preserve the original data for testing. The same hyperparameters were used for both the unbalanced and oversampled datasets.
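The steps above map naturally onto scikit-learn. The hedged sketch below, which continues the previous snippet, illustrates the scaling, a grid search for one model, a soft-voting ensemble and random oversampling of the training set; the parameter grid and chosen pair of classifiers are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)   # X_train/X_test from the previous sketch
X_test_s = scaler.transform(X_test)

# Hyperparameter search for one of the five models (grid is illustrative).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    {"n_estimators": [100, 300], "max_depth": [None, 10]},
                    cv=cv, scoring="accuracy")
grid.fit(X_train_s, y_train)
rf = grid.best_estimator_

# Soft-voting ensemble of two complementary classifiers.
ensemble = VotingClassifier([("rf", rf), ("nb", GaussianNB())], voting="soft")
ensemble.fit(X_train_s, y_train)

# Random oversampling of the minority (positive) class, training set only.
pos_mask = (y_train == 1).to_numpy()
pos, neg = X_train_s[pos_mask], X_train_s[~pos_mask]
pos_up = resample(pos, replace=True, n_samples=len(neg), random_state=42)
X_bal = np.vstack([neg, pos_up])
y_bal = np.array([0] * len(neg) + [1] * len(pos_up))
```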


Fig. 2. Count of each class in the dataset for the balanced data

2.5 Performance Evaluation

Six performance parameters were measured for each model and compared between models, namely accuracy, ROC and AUC scores, sensitivity, negative predictive value (NPV), specificity and positive predictive value (PPV). Accuracy is a measure of the overall correctness of the model, which takes into account both true positive and true negative predictions against the entire set of predictions. NPV and PPV are precision measures for all negative and positive predictions, respectively. Precision is a measure of quality and is calculated by dividing the number of true positives or negatives by the total number of corresponding predictions. Sensitivity is a measure of the proportion of actual positive cases that are detected as positive; it is also termed the true positive rate (TPR) of the model. In this paper, this equates to the percentage of patients that were correctly diagnosed with PCOS. Specificity, on the other hand, is the true negative rate (TNR), the proportion of PCOS-free patients that were correctly identified as PCOS-free. Compared to the other parameters, sensitivity and specificity are of higher importance in measuring the performance of the models because they reflect the actual capacity of the model to determine the actual positives (sensitivity) and the actual negatives (specificity). This translates to the ability of the ML models to accurately determine whether a patient has or does not have PCOS [17]. Lastly, the ROC visualizes the performance of the models: it shows how the TPR changes in response to FPR changes. As a summary, the ROC yields the AUC score, which represents the overall accuracy of the model [18].
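These quantities follow directly from the confusion matrix; a hedged sketch of their computation on the test set, continuing the earlier snippets, is shown below.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

y_pred = ensemble.predict(X_test_s)              # model from the previous sketch
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
ppv         = tp / (tp + fp)   # precision of positive predictions
npv         = tn / (tn + fn)   # precision of negative predictions
auc         = roc_auc_score(y_test, ensemble.predict_proba(X_test_s)[:, 1])

print(f"acc={accuracy:.2f} sens={sensitivity:.2f} spec={specificity:.2f} "
      f"ppv={ppv:.2f} npv={npv:.2f} auc={auc:.2f}")
```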

3 Results and Discussion

3.1 Cross-Validation Results

As discussed in the previous section, cross-validation was performed to check for overfitting and to evaluate the robustness of the models. Results of the 10-fold stratified cross-validation performed on the training set, for the models using unbalanced data and oversampled data, are shown in Figs. 3 and 4, respectively.


Fig. 3. Cross-validation results using unbalanced data

Fig. 4. Cross-validation results using balanced data

The blue bars represent the average accuracy over the 10 folds, while the orange bars represent the standard deviation. For models using unbalanced data, except for ANN and Naïve-Bayes, all other models fared comparably in terms of average accuracy, with average accuracy values greater than 80%. In terms of standard deviation, which serves as a measure of the model's robustness to new data, k-NN performed best with a standard deviation of approximately 4%, followed by RF and the ensemble algorithm, with approximately 5%. Nevertheless, all models have standard deviations of less than 8%, indicating good generalization capability and robustness to new data. For models using oversampled data, cross-validation results show comparable accuracy of just over 80% for all models except for k-NN and RF, which each have an average accuracy of over 85%. Standard deviations for oversampled data were also found to be generally lower compared to unbalanced data. ANN and RF showed better robustness in comparison to the others. After training, the remaining 30% of the data was used to evaluate the models' performance.


3.2 Model Performance

In terms of accuracy, as shown in Fig. 5, all models exhibit performance similar to the cross-validation results, with most models performing comparably at around 80% accuracy, except for ANN. SVM performed best with 84% accuracy, while ANN performed worst with only 76%.

Fig. 5. Summary of performance of the ML models using unbalanced data

However, as discussed earlier, accuracy is a poor indicator of model performance, as it says little about the actual capability of the model to diagnose or misdiagnose patients. A better measure of performance for diagnostic tools is their sensitivity and specificity scores. In terms of sensitivity, ANN and Naïve-Bayes performed best, with sensitivity scores of 81% and 78%, respectively, coupled with specificity scores of 72% and 83%, respectively. There is an observed inverse relationship between specificity and sensitivity, with an increase in sensitivity resulting in a decrease in specificity and vice versa. This is also observed for the other models, where characteristically low sensitivities in the range of 55 to 68% and very high specificities in the range of 90 to 93% are observed. This phenomenon can be attributed to the tendency of the models to over-reject in order to maximize accuracy, and is a possible consequence of data imbalance. This tendency is what caused the spike in specificity, as over-rejection can have a good impact on specificity. Since the models' performance during training is measured on the basis of accuracy, maximizing accuracy by leaning towards rejection increases the rate of false negative predictions for imbalanced data, while also improving specificity.

As discussed earlier, another way to measure performance is precision, expressed in this paper as the NPV and PPV. A good screening test should be able to correctly identify negative cases and limit the occurrence of false negatives. Higher tolerance is placed on false positives, as these can be screened out in later stages of the diagnostic procedure. NPV is considered the more relevant measure of precision: it represents the fraction of true negative predictions against the total number of negative predictions. NPV, however, can be susceptible to data imbalance, especially for diseases with low prevalence. Nevertheless, it is a good indicator of performance, and examining the effects of data imbalance can help elucidate this uncertainty.


NPV values for all models were within the range of 80 to 89%. The NPV results correlate well with the specificity results, with ANN performing best and k-NN performing worst. To further examine the performance of the models, the ROC for each model was generated. Figure 6 shows the ROC of the ML models trained using the unbalanced data.

Fig. 6. ROC of all models trained using unbalanced data

The ROC shows the change in sensitivity values, or TPR, in response to changes in the FPR, which is equivalent to 1 − specificity. Hence, the ROC shows how the model can improve in response to changes in the discrimination threshold. The dashed diagonal line in the figure represents the graph for totally random predictions; the further away the curves are from this line, the better the performance of the model. Based on the ROC, all models performed comparably well, with some models performing better than others. AUC scores range from 0.844 for ANN to 0.880 for RF and SVM. RF performs best on the basis of the high sensitivity it can achieve with minimal adjustment to the discrimination threshold. Based on its ROC, allowing for an FPR of 30% increases the TPR to approximately 95%, and setting the threshold to 82% completely eliminates false negative predictions. This is followed by the Naïve-Bayes algorithm, where an FPR of 46% yields approximately 95% true positives and an FPR of 90% eliminates all false negative predictions.

The ensemble algorithm was developed by trial and error using a combination of different classifiers. The best result was obtained by combining the high-specificity, low-sensitivity RF classifier and the low-specificity, high-sensitivity Naïve-Bayes algorithm. The resulting ensemble classifier has the best qualities of both models, with 89% specificity and 78% sensitivity scores. The AUC score of the ensemble is slightly higher than that of the RF classifier, at 0.884, and the resulting ROC performs better than the Naïve-Bayes algorithm alone. Based on the ROC, allowing for an FPR of around 40% increases the TPR to 95%, and setting the threshold to 85% eliminates all false negative predictions.


3.3 Effect of Data Imbalance

The effect of data imbalance was investigated by performing random oversampling on the training data (see Fig. 7). A minimal increase in accuracy was observed for most models, except ANN and SVM, when oversampling was applied; however, accuracy values remained within the same range, from 77% to 81%. In terms of sensitivity, all models recorded an increase, while ANN and Naïve-Bayes remained consistent with high sensitivity values.

Fig. 7. Summary of performance of the ML models using balanced data

Interestingly, a remarkable increase was observed in the other models with characteristically low sensitivities: k-NN, from 55% to 70%; SVM, from 68% to 79%; and RF, from 58% to 70%. In terms of specificity, a slight decrease was observed for most models except ANN, where a slight increase was observed. It is important to note, however, that most of the specificity values remained high, except for k-NN, where a drop from 91% to 80% was observed. This increase in sensitivity, compensated by a decrease in specificity, can be attributed to the influence of oversampling on these models. In the unbalanced case, the three models with characteristically low sensitivities and high specificities can be said to have maximized accuracy by over-rejection; oversampling eliminated this tendency, resulting in an increase in sensitivity. Meanwhile, the lack of significant change in the performance measures of the Naïve-Bayes algorithm indicates the model's robustness to imbalance in the data. The explanation as to why these models behaved the way they did, however, is beyond the scope of this paper.

As discussed earlier, precision scores become a good measure of performance when dealing with balanced data. In terms of NPV, there was a general increase for all models except Naïve-Bayes and ANN, which remained at the same level. This result correlates well with the changes in sensitivity scores, confirming that the three models with low sensitivities described earlier are sensitive to the data imbalance. All models performed well in terms of NPV, with values ranging from 84 to 90%; three models, Naïve-Bayes, ANN and SVM, performed best. In terms of PPV, no recognizable trend was observed across the models.


Instead, the values tend to approach a certain middle value, with a decrease observed for models with higher PPV and an increase for models with lower PPV after oversampling. In terms of ROC (see Fig. 8), SVM performed best among non-ensemble models and performed comparably with the ensemble model. Based on their ROC, adjusting the discrimination threshold to 77% completely discriminates all true positive predictions while allowing 23% of all negative cases to be correctly identified. The AUC scores are also in agreement, with a minimal difference of 0.001 between the two models.

Fig. 8. ROC of all models trained using balanced data

3.4 Comparison to Other Studies

Sahmay et al. proposed a PCOS screening tool that combines the clinical symptoms and AMH level of patients to diagnose PCOS [19]. This screening tool achieved 99% specificity and 73% sensitivity when combined with hyperandrogenism (HA). When combined with either oligo/amenorrhea (OA) or HA, the sensitivity increased to 83% and the specificity to 100%. While high specificity is generally favorable for a model's performance, higher sensitivity is preferred for screening models. The purpose of such a model is to identify as many patients as possible who manifest symptoms of a particular disease, and a low sensitivity means a high chance that the disease will not be detected, which for a screening test eliminates the opportunity to conduct further tests. Higher tolerance is given to false positives, as these can be screened out in later stages of the diagnosis. Hence, some of the ML models were able to outperform this tool in terms of sensitivity, including the unbalanced Naïve-Bayes, ANN and ensemble models, and all balanced models except for k-NN and RF.

In the study of Bedrick et al., a self-administered questionnaire was proposed to screen PCOS based on clinical symptoms [20]. The questionnaire includes symptoms of hirsutism, whether the person shaves, waxes or bleaches (SWB), and a self-assessment


of bodily hair based on the modified Ferriman-Gallwey (sFG) scale. Based on the results, symptoms of hirsutism achieved a sensitivity of 76% and a specificity of 70% in diagnosing PCOS, while having either a positive sFG score or SWB achieved 83% sensitivity but only 66% specificity. The same algorithms in both the unbalanced and balanced models outperformed the hirsutism test in terms of both sensitivity and specificity. In comparison to the sFG-SWB test, none of the models performed better in terms of sensitivity, but most unbalanced models and all of the balanced models performed better in terms of specificity.

4 Conclusion

ML models showed promising results in screening patients for PCOS. For models trained with unbalanced data, Naïve-Bayes and ANN performed best in terms of sensitivity. The effect of data imbalance was explored by performing oversampling, and it was shown that these two models were more robust than the others. In terms of ROC, RF performed best on the basis of the lower discrimination threshold it requires to achieve 100% sensitivity. For models utilizing balanced data, SVM performed best with a good balance of sensitivity and specificity and high NPV and AUC scores. Furthermore, it was shown that some models performed better, in terms of sensitivity, than other screening tests proposed in the literature, including one that involves an invasive blood test.

Acknowledgement. The authors would like to thank the Engineering Research and Development for Technology (ERDT) of the Department of Science and Technology (DOST) for the dissemination support.

References

1. Saravanan, A., Sathiamoorthy, S.: Detection of polycystic ovarian syndrome: a literature survey. 7, 46–51 (2018)
2. Purnama, B., Wisesti, U.N., Adiwijaya, Nhita, F., Gayatri, A., Mutiah, T.: A classification of polycystic ovary syndrome based on follicle detection of ultrasound images. In: 2015 3rd International Conference on Information and Communication Technology, ICoICT 2015, pp. 396–401 (2015). https://doi.org/10.1109/ICoICT.2015.7231458
3. Vidya Bharathi, R., Swetha, S., Neerajaa, J., Varsha Madhavica, J., Janani, D.M., Rekha, S.N., Ramya, S., Usha, B.: An epidemiological survey: effect of predisposing factors for PCOS in Indian urban and rural population. Middle East Fertil. Soc. J. 22, 313–316 (2017). https://doi.org/10.1016/j.mefs.2017.05.007
4. Zhang, X.Z., Pang, Y.L., Wang, X., Li, Y.H.: Computational characterization and identification of human polycystic ovary syndrome genes. Sci. Rep. 8, 1–7 (2018). https://doi.org/10.1038/s41598-018-31110-4
5. Hart, R., Doherty, D.A.: The potential implications of a PCOS diagnosis on a woman's long-term health using data linkage. J. Clin. Endocrinol. Metab. 100, 911–919 (2015). https://doi.org/10.1210/jc.2014-3886
6. Azziz, R., Marin, C., Hoq, L., Badamgarav, E., Song, P.: Health care-related economic burden of the polycystic ovary syndrome during the reproductive life span. J. Clin. Endocrinol. Metab. 90, 4650–4658 (2005). https://doi.org/10.1210/jc.2005-0628


7. Shan, B., Cai, J.H., Yang, S.Y., Li, Z.R.: Risk factors of polycystic ovarian syndrome among Li people. Asian Pac. J. Trop. Med. 8, 590–593 (2015). https://doi.org/10.1016/j.apjtm.2015.07.001
8. Williams, T., Mortada, R., Porter, S.: Diagnosis and treatment of polycystic ovary syndrome. Am. Fam. Physician 94, 106–113 (2016). https://doi.org/10.3803/jkes.2007.22.4.252
9. El Hayek, S., Bitar, L., Hamdar, L.H., Mirza, F.G., Daoud, G.: Poly cystic ovarian syndrome: an updated overview. Front. Physiol. 7, 1–15 (2016). https://doi.org/10.3389/fphys.2016.00124
10. Dewi, R.M., Adiwijaya, Wisesty, U.N., Jondri: Classification of polycystic ovary based on ultrasound images using competitive neural network. J. Phys. Conf. Ser. 971 (2018). https://doi.org/10.1088/1742-6596/971/1/012005
11. Mehrotra, P., Chatterjee, J., Chakraborty, C., Ghoshdastidar, B., Ghoshdastidar, S.: Automated screening of polycystic ovary syndrome using machine learning techniques. In: Proceedings of the 2011 Annual IEEE India Conference, INDICON 2011 (2011). https://doi.org/10.1109/INDCON.2011.6139331
12. Meena, K., Manimekalai, M., Rethinavalli, S.: A novel framework for filtering the PCOS attributes using data mining techniques. Int. J. Eng. Res. Technol. 4, 702–706 (2015)
13. Anuhya, B.S., Chilla, M., Sarangi, S.: A critical study of polycystic ovarian syndrome (PCOS) classification techniques. IJCEM Int. J. Comput. Eng. Manag. 21, 2230–7893 (2018)
14. Kottarathil, P.: Polycystic ovary syndrome (PCOS). https://www.kaggle.com/prasoonkottarathil/polycystic-ovary-syndrome-pcos
15. Muir Gray, J.A.: The first report of the national screening committee. J. Med. Screen. 5, 169 (1998). https://doi.org/10.1136/jms.5.4.169
16. Ndefo, U.A., Eaton, A., Green, M.R.: Polycystic ovary syndrome: a review of treatment options with a focus on pharmacological approaches. P T. 38, 336–355 (2013)
17. Gilbert, R., Logan, S., Moyer, V.A., Elliott, E.J.: Assessing diagnostic and screening tests: Part 1, concepts. West. J. Med. 174, 405–409 (2001). https://doi.org/10.1136/ewjm.174.6.405
18. Cüvitoğlu, A., Işik, Z.: Evaluation of machine-learning approaches for classification of cryotherapy and immunotherapy datasets. Int. J. Mach. Learn. Comput. 8, 331–335 (2018). https://doi.org/10.18178/ijmlc.2018.8.4.707
19. Sahmay, S., Aydin, Y., Oncul, M., Senturk, L.M.: Diagnosis of polycystic ovary syndrome: AMH in combination with clinical symptoms. J. Assist. Reprod. Genet. 31, 213–220 (2014). https://doi.org/10.1007/s10815-013-0149-0
20. Bedrick, B.S., Eskew, A.M., Chavarro, J.E., Jungheim, E.S.: Self-administered questionnaire to screen for polycystic ovarian syndrome. Fertil. Steril. 111, e41–e42 (2019). https://doi.org/10.1016/j.fertnstert.2019.02.099

Patient Transport and Mobile Health Workforce: Framework and Research Perspectives

Yosra Lahmer1(B), Hend Bouziri2, and Wassila Aggoune-Mtalaa3

1 Tunis University, ESSECT, 1089 Montfleury, Tunis, Tunisia
[email protected]
2 LARODEC Laboratory, ESSECT, Tunis University, 1089 Montfleury, Tunis, Tunisia
[email protected]
3 Luxembourg Institute of Science and Technology, 4362 Esch sur Alzette, Luxembourg
[email protected]

Abstract. The issue of personnel planning is a topic that continues to attract the interest of researchers in various fields of application. This issue is at the core of health decision-makers' concerns regarding the deficit in human resources. As the shortfall in health personnel is one of the obstacles to achieving Universal Health Coverage, it is a priority to be addressed by health managers and policy makers as part of the plans of the broader Human Resources for Health strategy. To help better manage such a strategy, our analysis focuses on exploring the flow of people in hospital logistics and reviewing the contribution of operational research on patient transport and mobile health workers, particularly approaches using the Dial-A-Ride system. In this study, we provide a comprehensive description of the problem of patient transport and mobile health care personnel in relation to any health care service, focusing on the main contributions of operational research to current optimization problems in this area.

Keywords: Patient transport · Mobile health workforce · Operational research

1 Introduction

The personnel scheduling problem is a topic that continues to capture the interest of researchers in different fields of application because of its ad hoc characteristics and the specificities of each field. Operational research has paid a lot of attention to this problem, has proved its worth in various industrial and service areas, and has developed considerably in health care over the years [1]. Various literature reviews reflect the work of the research community, such as the article in [2], which presents different classification models of the personnel scheduling problem, listing multiple fields of application related to this problem in the service, transportation, military, manufacturing, retail and general sectors.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 530–545, 2021. https://doi.org/10.1007/978-3-030-66840-2_40


Based on an analysis of a set of 291 articles, the author points out that the staff rostering problem in services receives more attention than those in a production environment, and that nurse rostering is the most explored, with 64 articles. Other health care services are presented in 23 articles, and the theme of protection and emergency is the subject of 6 articles. A second research paper [3] presents a review and classification of the literature that takes into consideration the skills of the staff in solving the planning problem. Several other constraints are the subject of the work illustrated in the review of [4], which deals with the nurse rostering problem.

The interest of the research community is in line with statements published in the World Health Report 2006, "Working Together for Health" [5], which highlighted a global shortfall of nearly 3.4 million physicians, midwives, nurses and support staff. According to reports from the World Health Organization (WHO), including its guideline on health policy and system support to optimize community health worker programmes, an additional 9 million nurses and midwives will be needed by 2030 to achieve Sustainable Development Goal 3 on health. A second publication from the same source indicates a projected shortfall of 18 million health workers by 2030, mostly in low- and lower-middle-income countries [6]. Owing to the complexity of the health sector context and the interest in the issue of Human Resources for Health (HRH), the Global Health Workforce Network was established in 2016; it operates within WHO and contributes to supporting the implementation of the Global Strategy on Human Resources for Health and the recommendations of the Commission [7]. As one of the obstacles to achieving Universal Health Coverage (UHC), the shortfall in health personnel represents a priority for health managers and decision-makers, and a challenge for the operational research community. For a better management of the available HRH, we propose in the following a review of the literature on workforce planning.

The research methodology adopted is as follows. All papers presented in this review were found using the Scopus and Google Scholar databases, with the following keywords: health care planning, patient transport, resource scheduling and routing problem, vehicle routing problem, dial-a-ride problem, nurse rostering problem and hospital logistics. This work mainly includes publications from the last two decades and examines the contribution of each paper on patient transport and the mobile healthcare workforce.

The remainder of the paper is structured as follows: a definition of hospital logistics and associated concerns is given in Sect. 2. Section 3 gives a classification of variants of the problems studied in operational research and presents relevant models based on DARPs for modeling patient flows. The last section provides conclusions and recommendations for future research.

2 Hospital Logistics

2.1 Definition

Several definitions of hospital logistics exist in the literature. For Blua et al. [8], hospital logistics is the field of study and optimization of two types of physical hospital flows: the flow of people, which consists of the movement of patients in the care unit, as well as that of staff and visitors, and the material flow, for which a difference can be made between incoming and outgoing flows in the care unit. This widely used definition is also taken up in the research work of [9]. In a second definition, the authors in [10] specify that internal logistics plays a fundamental role in the daily activities of any hospital.


Several internal logistics flows can be distinguished: these flows concern the retrieval of waste, dirty laundry, food trays and used surgical instruments, and the transport of patients to and from consultations, medical imaging and surgeries [11–13]. The diversity of these logistics flows, each with its own rules and complexities, makes managing this process a challenging task. The authors of [13] note that patient trajectories within the hospital are the source of all underlying interactions. With regard to patient flows, one can distinguish three possible developments in logistics, see [9]:

1. Segmented patient flow management handles patients as they arrive (except in emergencies), without any real attempt to influence their waiting time, since the patient must adapt to the structure and take responsibility for moving. Patients are autonomous in their movements between the various services for their treatment.
2. Global management of the patient trajectory is based on a coherent vision of all the movements of each patient, within the same hospital and between care structures. It may be static logistics, which focuses on the infrastructure to guide flows, or dynamic logistics, which involves managing flows and linking demand and capacity. An information system is then crucial for planning special medical services; a simplification of admission formalities and anticipation of examinations according to the patient's pathology can make flows more fluid.
3. Full patient management describes the situation where it is no longer the patient who adapts to the structure, but the opposite. Travel is reduced to a strict minimum: everything is done at the patient's bedside (X-rays, samples, care, etc.). This is made possible by a very flexible organization (small, versatile teams).

We propose to extend these definitions beyond the internal management of patient movement within the hospital structure and to consider any health service supplying the patient with the equipment and health personnel necessary to transport the patient within or out of the hospital facility.

2.2 Problem Description

This section proposes to classify the patient transport problem, or the mobility of health care personnel, in relation to the levels of decision-making. The operational level deals with the mobile healthcare workforce that assists the patient on his path through the hospital, such as X-rays, blood tests and care units; the resulting problems concern the mobility of the patient or medical staff within the hospital. The tactical level represents issues identified at the hospital unit level, including intra-hospital transfers such as a transfer from the emergency department (ED) to a specialist department. The strategic level represents, at the national level, two types of extra-hospital services:

– Upstream:
• Pre-hospital: ensures the patient's movement to the associated emergency service (ES).
• Inter-hospital: refers to the inter-hospital transfer process.


– Downstream:
• Inter-hospital: represents the transfer requested by the receiving structure.
• Home Health Care (HHC): concerns the mobility of the healthcare workforce.

The diversity of these logistical flows, each with its own rules of direction (segmented, global or full care) and complexities, in addition to the patient's health status (autonomous or disabled), makes the management of patient transport a difficult process. Figure 1 shows the patient transport problem model in its different possible forms.

Fig. 1. Patient transport model.

In the following, an overview of operational research works applied to the above-mentioned themes is provided.

3 Relevant Literature on Operational Research on Patient Transport and Mobile Healthcare Workforce

The characteristics of the patient transport problem and the mobile health worker problem depend on the context and framework in which the problem is described. There are two main situations: extra-hospital or intra-hospital assistance.

3.1 Extra-Hospital Services

These are health care services that are provided outside a hospital facility. In this context, three cases can be described:


1. A medical team rescues a remote patient: this is the case of a pre-hospital service.
2. A medical team ensures the transfer of a patient from a hospital to a host hospital: the case of inter-hospital transfers.
3. A medical team visits patients requesting care services at their own homes: Home Health Care (HHC).

The three cases require the management and planning of both the vehicle that transports the medical staff and the caregiver. These problems are studied separately in operational research. The literature reviewed in [12] addresses maximizing the coverage of injured patients, planning the ambulance service and locating the best bases for a limited number of ambulance vehicles. Reference [14] examines the problem of deployment and redeployment of ambulance vehicles in the management of a pre-hospital emergency service. The work in [8] studies the case of product transport in the context of hospital logistics at the Champagne South hospital in France. Three variants of the problem of pick-up and delivery within a short period of time are solved: first a homogeneous fleet of vehicles is considered, then a heterogeneous fleet of vehicles of finite size, and last a heterogeneous fleet of vehicles of infinite size. These problems are tackled from the viewpoint of the wellbeing of the driver, including constraints on breaks, rest time and driving time.

Within HHC research, Gutiérrez and Vidal [15] separate transport management from personnel management. First, transport management determines the selection of the fleet, its size and its allocation. Then, workforce management starts at the strategic level with staffing, continues at the tactical level with staff planning, and finishes at the operational level with staff assignment. The authors note that the staff allocation problem is closely related to the staff routing problem, as decisions are generally made simultaneously, but some cases in HHC require the decisions to be sequential. Finally, inventory management is divided into supplier selection, inventory policy, and inventory control over three planning horizons. The authors point out that decision-making creates a top-down hierarchy of logistics functions, which imposes constraints at the lower levels and influences the performance of the service. They define a fourth level, the real-time level, which refers to operations undertaken or modified in a very short period depending on the actual execution of service processes: this is the dynamic aspect of the system.

In a similar perspective, Demirbilek et al. [16] discuss the nurse routing problem in a HHC environment. Their contribution presents a heuristic for the dynamic nurse routing and scheduling problem in HHC that anticipates future demand implications and can manage multiple nurses and different skill levels. The objective of this problem is to maximize the average number of daily visits made by nurses. They propose several scenarios such as grouped service areas, different service times and service horizons. The health care provider must decide whether to accept the patient and, if accepted, assign suitable appointment days, times and a nurse. The problem is dynamic, and acceptance and assignment decisions must be made as soon as patient requests arrive. Empirical insights show that it is better to plan nurses' routing and scheduling without restricting nurses to districts.
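The accept-and-assign logic described above can be pictured with a small sketch. This is not the heuristic of Demirbilek et al. [16]; the request fields, the Euclidean travel times, the fixed depot and the greedy cheapest-insertion rule are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified data model: each request asks for one home visit of a
# given duration inside a time window on a given day.
@dataclass
class Request:
    patient: str
    day: int
    earliest: float    # time window start (hours)
    latest: float      # time window end (hours)
    duration: float    # visit duration (hours)
    location: tuple    # (x, y) coordinates

@dataclass
class NurseSchedule:
    nurse: str
    day: int
    visits: List[Request] = field(default_factory=list)

DEPOT = (0.0, 0.0)
SHIFT_START = 8.0

def travel_time(a, b, speed=30.0):
    """Euclidean distance converted to hours at a nominal speed."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 / speed

def route_travel(visits):
    """Total travel time of a route that starts at the depot."""
    total, loc = 0.0, DEPOT
    for v in visits:
        total += travel_time(loc, v.location)
        loc = v.location
    return total

def insertion_cost(schedule, req, pos):
    """Extra travel time if req is inserted at position pos, or None if a time
    window is violated along the resulting route."""
    seq = schedule.visits[:pos] + [req] + schedule.visits[pos:]
    t, loc = SHIFT_START, DEPOT
    for v in seq:
        t = max(t + travel_time(loc, v.location), v.earliest)
        if t > v.latest:
            return None
        t += v.duration
        loc = v.location
    return route_travel(seq) - route_travel(schedule.visits)

def accept_and_assign(schedules, req):
    """Greedy rule: accept the request if some nurse working that day can take
    it, choosing the feasible insertion with the least extra travel time."""
    best = None
    for s in (s for s in schedules if s.day == req.day):
        for pos in range(len(s.visits) + 1):
            cost = insertion_cost(s, req, pos)
            if cost is not None and (best is None or cost < best[0]):
                best = (cost, s, pos)
    if best is None:
        return False   # reject: no feasible insertion found
    best[1].visits.insert(best[2], req)
    return True
```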


Furthermore, the work in [17] plans the visits to the patient's home for a given number of days. The sub-problems are as follows. First, a set of operators is assigned to a set of patients whilst considering the care plans of each patient over a planning horizon composed of several planning periods (week/month). Then, the days on which operators are scheduled to visit the assigned patients are determined. Finally, the routing problem specifies the sequence in which patients are visited by each operator. Moreover, [18] notes whether the problem formulations are based on the VRP (Vehicle Routing Problem), the TSP (Travelling Salesman Problem) or the DARP (Dial-A-Ride Problem). Table 1 presents several formulations for the problems mentioned.

Table 1. Types of problems

| References | Problem | Horizon | Event/information | Case instance | Solution | Objective |
| [Begur et al. 97] [19] | Assignment & scheduling pb | M | Static/Deter | R/RG | MIP | Min travel time |
| [Cheng et al. 98] [20] | multi-depot VRP-TW | S | Static/Deter | – | MIP | Min total cost of overtime hours of nurses |
| [Bertels et al. 06] [21] | Assignment & scheduling pb | M | Static/Deter | RG | 2PA | Min total travel cost; Max satisfaction of patients and operators |
| [Eveborn et al. 06] [22] | scheduling pb | S | Static/Deter | R | SP | Min total travel cost |
| [Thomsen, 06] [23] | VRP-TW | S | Static/Deter | – | 2PA | Min total travel cost; Min the unshared-unlocked visits; Min the shared-unlocked visits |
| [Akjiratikarl et al. 07] [24] | VRP-TW | S | Static/Deter | R | PSO | Min total travel distance |
| [Bredstorm et al. 08] [25] | VRPTW-SP | S | Static/Deter | RG | SP | Min preferences; Min travel time; Min max difference workload |
| [Dohn et al. 08] [26] | VRP-TW | S | Static/Deter | R | B&P&C | Min total cost; Max level services |
| [Elbenani et al. 08] [27] | VRP | S | Static/Deter | – | TS | Min total travel cost |
| [Ben bachouch et al. 09] [28] | VRP-TW | S | Static/Deter | – | ILP | Min total travel distance |
| [Chahed et al. 09] [29] | TSP & scheduling pb | S | Static/Deter | – | B&P&C | Min delivery cost; Max the visits |
| [Hertz et al. 09] [30] | bi-objective pb | M | Static/Deter | R | MIP | Max balanced workload |
| [Kergosien et al. 09] [31] | MTSP-TW | S | Static/Deter | RG | ILP | Min total travel cost |
| [Trautsamwieser et al. 11] [32] | VRP | S | Static/Deter | R | VNS | Min nurses travel time; Min the dissatisfaction level of clients and nurses |
| [Barrera et al. 12] [33] | TTP & CSP | M | Static/Deter | R/RG | MIP | Min workers required |
| [Braysy et al. 12] [34] | VRP | S | Static/Deter | R | A/VNS | cost, service level and balanced work |
| [Nickel et al. 12] [35] | MSP | M | Static/Deter | R | 2PA: LNS/TS | generating a medium term and weekly MSP |
| [Rasmussen et al. 12] [36] | multi-depot VRP-TW | S | Static/Deter | R/VRPTW [25] | B&P&C | Min total distance cost; Min uncovered visit; Max operator-visit preference |
| [Shao et al. 12] [37] | MTSP-TW | M | Static/Deter | R | MIP/GRASP | Min rehabilitative services cost |
| [Liu et al. 13] [38] | VRP-TW | M | Static/Deter | VRPTW | TS | Min routing costs |
| [Allaoua et al. 14] [39] | VRP & rostering pb | S | Static/Deter | RG | ILP | Min routes & Min caregiver |
| [Bard et al. 14] [40] | MTSP-TW | M | Static/Deter | R/RG | MIP/ASGRASP | Min the travel, treatment, and administrative costs |
| [Lanzarone et al. 14] [41] | Assignment pb | M | Static/Stoc | R | exact solution | Max care continuity; Min overtimes of nurses |
| [Mankowska et al. 14] [42] | Scheduling pb | S | Static/Deter | RG | AVNS | Min total travel cost |
| [Mutingi et al. 14] [43] | Scheduling pb | S | Static/Deter | RG | PSO | Max workload balance |
| [Trautsamwieser et al. 14] [44] | VRP | M | Static/Deter | R | B&P&C | Min total nurses working time |
| [Cappanera et al. 15] [45] | routing and scheduling pb | M | Static/Deter | R | ILP | Max min OR Min max (operator utilization factor) |
| [Yuan et al. 15] [46] | VRP | S | Static/Stoc | VRPTW | B&P&C | Min total cost |
| [Braekers et al. 16] [47] | bi-objective pb | S | Static/Deter | R | LNS | Min operating costs; Max patient satisfaction |
| [Cissé et al. 16] [17] | assignment pb, sub-pb TSP | M | Static/Deter | R/VRP | MIP | Min overlapped visit; Max equity among operators |
| [Redjem et al. 16] [48] | VRP | S | Static/Deter | R | 2PA | Min vehicle tours |
| [Rest et al. 16] [49] | VRP | S | Dyn/Deter | R | TS | Min travel and waiting times; Max satisfaction clients, nurses; Min additional shifts |
| [Wirnitzer et al. 16] [50] | rostering pb | M | Static/Deter | R | MIP | Min different assignment nurses to patient |
| [Alves et al. 19] [51] | P-VRP | M | Static/Deter | R | ILP | Min total costs |
| [Haddadene et al. 19] [52] | VRP-TW-SP | S | Static/Deter | VRPTW [25] | LS | Min travel costs care providers; Max client preferences |

Legend: master schedule problem (MSP) / Set Partitioning (SP) - Horizon: Single (S)/Multi (M) period - Events: Static/Dynamic - Information: Deterministic/Stochastic - Instance: R: Real case / RG: Randomly Generated

Regarding the solution approaches used, some works have experimented with exact solution procedures, where the Branch-and-Price algorithm is predominant [17], but these have limited efficiency for complex optimization problems [53–55]. Most often, metaheuristics are preferred [56–58], with procedures that include population-based algorithms or local search procedures [17]. Hiermann et al. [59] apply several metaheuristics to the same set of instances, but the comparison is difficult to perform due to differing objectives and constraints.

3.2 Intra-Hospital Service

Organizing and providing intra-hospital transport of patients, supplies and equipment is part of the daily logistical activities carried out in a hospital. Although these auxiliary services may seem simple and straightforward, they have a significant impact on the quality of health care and on hospital costs, according to [60]. For example, the late arrival of a patient at an expensive service facility, such as an operating room or a magnetic resonance imaging department, leads to the under-utilization of valuable resources (staff and equipment). In addition, it disrupts the scheduling of these units. On the other hand, if a patient misses an appointment, the hospital unit must reschedule it, which has a negative impact on patient satisfaction. The work in [61] refers to a study carried out by the MEAH (Mission Nationale d'Expertise et d'Audit Hospitaliers, the French national hospital expertise and audit mission) which consisted in assigning a stretcher-bearer exclusively responsible for patient transports from the emergency room to the imaging department, in order to keep the routes upstream and downstream of this department fluid. The inter-departmental patient transport function is fully integrated into the management of patients in the hospital. Extremely dependent on appropriate coordination, the efficiency of its organization has an impact on the quality and safety of the patient's visit, the operation of clinical services and the technical platform, and the satisfaction of healthcare professionals [62]. At the university hospital campus of Québec-Université Laval, the authors of [63] conducted studies on patient transport between the various clinical services of the Hôpital de l'Enfant-Jésus. A mixed integer linear model was used to determine the optimal number of employees on each route in order to minimize completion time. The results are promising because they show that it is possible to simply rearrange the schedule of employees to reduce the time required to process requests.
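The mixed integer model of [63] is not reproduced here; the following is only a minimal sketch of the kind of formulation involved, assuming hypothetical transport jobs with known handling times that must be shared among a small pool of porters so that the latest completion time (makespan) is minimized. The job data, the porter pool and the use of the PuLP library are illustrative assumptions, not the model of [63].

```python
import pulp

# Hypothetical transport jobs (id -> handling time in minutes) and porters.
jobs = {"ed_to_mri": 18, "ward3_to_or": 12, "or_to_recovery": 10, "ed_to_radiology": 15, "lab_run": 8}
porters = ["P1", "P2"]
job_ids = list(jobs)

model = pulp.LpProblem("intra_hospital_transport", pulp.LpMinimize)

# x[j][p] = 1 if job j is assigned to porter p; T is the makespan to minimize.
x = pulp.LpVariable.dicts("x", (job_ids, porters), cat="Binary")
T = pulp.LpVariable("makespan", lowBound=0)
model += T  # objective: minimize the makespan

# Each job is carried out by exactly one porter.
for j in job_ids:
    model += pulp.lpSum(x[j][p] for p in porters) == 1

# A porter's total workload bounds the makespan from below.
for p in porters:
    model += pulp.lpSum(jobs[j] * x[j][p] for j in job_ids) <= T

model.solve(pulp.PULP_CBC_CMD(msg=False))
print("makespan (min):", pulp.value(T))
for p in porters:
    print(p, "->", [j for j in job_ids if pulp.value(x[j][p]) > 0.5])
```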


Vancroonenburg et al. [10] present the organization of logistics transport in hospitals as a model of logistics flow management based on the dynamic pick-up and delivery problem (DPDP). The authors point out that in health care only a limited number of dynamic systems have been developed, mainly for patient transport due to its strongly ad hoc nature. Two scheduling policies are developed: the first applies a least-cost insertion matching heuristic; the second uses a local search to improve the current schedule. Fiegl and Pontow [64] discuss the development of an algorithm that dynamically schedules general pick-up and delivery tasks (patients, but also lab results and materials) in hospitals. Their approach minimizes the weighted average throughput time, allowing a high task throughput. The scope of the paper is patient transport within a hospital, but it is also applicable to equipment that can be combined with patient transports. In the intra-hospital context, the problem is at the tactical level, where the decision-maker oversees the assignment of tasks according to the profile and skills of the health care workers. It is a nurse rostering problem, as shown in Table 1.

3.3 Relevant Models on DARPs for Patient Flows Modeling

The Dial-a-Ride Problem (DARP) consists in designing vehicle routes and schedules for users who specify pick-up and drop-off requests between origins and destinations. Very often the same user has two requests during the same day: an outbound request from home to a destination and an inbound request for the return trip. The aim is to plan a set of minimum-cost vehicle routes capable of accommodating as many requests as possible, under a set of constraints [65]. This definition corresponds to what is known as the Home Health Care Routing and Scheduling Problem (HHCRSP). This problem notably describes all the extra-hospital patient transport services shown in Sect. 3.1 and can also address the problem of caregiver mobility in intra-hospital services. The HHCRSP differs from the well-known multiple Traveling Salesman Problem with Time Windows (mTSPTW) in the following aspects:

1. First, in the HHCRSP there exist patients that must be visited more than once.
2. Secondly, the caregivers possess different skills and qualifications.
3. Third, temporal interdependencies of double services necessitate a careful synchronization of the interdependent working plans of the staff [42].
4. The scheduling and routing may be periodic (multi-periodic) [51].
5. The HHC problem differs from a traditional resource allocation problem since the degree of constraint satisfaction is decisive for the quality of the service.

The Dial-A-Ride Problem includes constraints on the quality of service offered to customers. This type of constraint does not exist in other vehicle routing problems. Originally, a transport (ride) time constraint was added to model the quality of service. This constraint is not enough to cover all the dimensions of quality of service associated with passenger transport, especially since transport on demand is often associated with the transport of elderly or disabled people. Therefore, many versions of the DARP have emerged in the literature with other types of constraints for modelling the quality of service.
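As a point of reference, the following is a minimal sketch of the core constraints that most DARP formulations share, written for a request picked up at node i and dropped off at node n+i. The symbols (B_i for the beginning of service, [e_i, l_i] for the time window, d_i for the service duration, t_{i,j} for the travel time, L_i for the user ride time and Q for the vehicle capacity) follow common usage in the DARP literature rather than any specific formulation cited here.

```latex
\begin{align*}
e_i \le B_i \le l_i &\quad \text{(time window at node } i\text{)} \\
B_{n+i} \ge B_i + d_i + t_{i,\,n+i} &\quad \text{(pairing and precedence of pickup and drop-off)} \\
L_i = B_{n+i} - (B_i + d_i) \le L^{\max} &\quad \text{(maximum user ride time)} \\
0 \le Q_k \le Q &\quad \text{(vehicle load after each stop } k\text{)}
\end{align*}
```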

Table 2 presents a classification of the constraints related to patient transport and health worker mobility found in the literature, revealing that operational constraints vary by actor and type. Some constraints can work towards the same objective function, while other, contradictory, ones require priority treatment.

Table 2. DARP constraints of the literature for HHC

| Actors | Temporal constraints | Assignment constraints | Geographic constraints | Costs |
| Patient | Starting hour; Hard/soft time windows; Preference days; Disjunction; Synchronization; Frequency of visits | Patient's preferences; Same caregiver; Limited number of caregivers | Type of network between home locations | – |
| Caregiver | Maximum working time for the week/day; Availability time windows; Availability days; Breaks; Days off; Employment contracts (full-time/part-time) | Caregiver preferences; Qualification | Starting/finishing at fixed point | Overtime |
| Driver | Starting hour; Break; Rest time; Driving time | Driver's preferences; Same route/district | Only district/all routes | – |
| Provider | Time windows; Ride time | Single (S)/Multi (M) objectives; Single (S)/Multi (M) depots; Single (S)/Multi (M) trips; Single (S)/Multi (M) vehicles; Homogeneous (HO)/Heterogeneous (HE) fleet; Vehicle capacity; Selective visits | – | Number of caregivers; Number of vehicles; Number of requests; Working time/overtime; Adding new patient to the planning; Rejection; Travelling distance |

DARPs can be classified considering two main aspects. If all relevant information for decision making is available at the beginning of the operations, the DARP is static; when the decision makers adjust existing plans in response to new information received during execution, the DARP is dynamic. If the information is known with certainty, the DARP is deterministic, whereas when some uncertainty remains at the time of the decisions, it is said to be stochastic. A survey in [66] presents DARPs as approaches prompted by real-life applications and highlights hospital specificities such as time urgency and equipment/staff compatibilities, staff and maintenance planning, the non-sharing of ambulances for isolated patients, accompanying staff/equipment, the specific pick-up and delivery sequence of doctors and patients, and priority management (urgent or normal) of requests. For non-emergency transport of patients to/from hospitals, constraints are mainly related to vehicle capacity, vehicle type (staff seats, patient seats, stretchers and wheelchairs) and driver-vehicle compatibility. Other DARPs are treated in Parragh et al. [67] and in Schilde et al. [68], who studied the Austrian Red Cross in Graz. The choice not to serve some clients is allowed in Zhang et al. [69], Liu et al. [70] and Lim et al. [71]. In Molenbruch et al. [72], restrictions on user-user and user-driver combinations are considered. In the application in Tuscany studied by Detti et al. [73], a patient can choose the transport provider among different non-profit organizations. Fikar and Hirsch [74] compared trip and car sharing concepts in HHC with the traditional case of transport by an individual car. The problem is defined as an extended many-to-many multi-trip DARP over a single period. Their objective was to minimize the total traveling time of the nurses and drivers and to obtain sustainable solutions for healthcare operators in Austria. A discrete-event driven biased-randomized metaheuristic solution approach was developed for dynamic home service routing with synchronized trip sharing, including dynamic events such as cancellations or new requests.


For intra-hospital transport, [66] lists a number of different problems that can be formulated as DARPs, such as patient transport and the displacement of supplies and equipment for diagnostic or therapeutic reasons. Vancroonenburg [10] adds that the organization of logistical transport in hospitals naturally belongs to the DARP class. Indeed, certain routes or corridors in a hospital may be excluded for the transport of patients while being allowed for the transport of goods. Thus, the transport time depends on what is transported. Another example is that several goods can be transported together, while patient transport is ideally not combined with other transports. The dynamic aspect of the problem is also relevant, as it is characterized by high demand arrival rates and short transport times between pick-ups and deliveries within a hospital. As a result, assignment decisions must be made within a relatively short time frame. The authors of [60] discuss the application of a "Dynamic DARP" (DDARP), a variant of the DPDP, to organize the intra-hospital transport of patients in a German hospital. Their study focuses mainly on the transport between buildings of a multi-building hospital site. This application led to the development of Opti-TRANS, a computerized planning system that supports all parties involved in the transport workflow, including nurses, transport staff, dispatchers, logistics managers and financial controllers.
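The hospital-specific rules mentioned above (paired pick-up and drop-off, limited capacity, and the preference not to combine patient transports with other loads) can be captured by a small feasibility check. The following sketch is not the Opti-TRANS system nor the model of [60]; the request fields and the strict "patients travel alone" rule are illustrative assumptions drawn from the discussion above.

```python
from dataclasses import dataclass
from typing import List

# Illustrative request: a paired pickup/drop-off carrying a load (one patient
# or several goods items), flagged as a patient transport or not.
@dataclass
class TransportRequest:
    rid: str
    pickup: str        # pickup location, e.g. "ED"
    dropoff: str       # drop-off location, e.g. "MRI"
    load: int
    is_patient: bool

def route_is_feasible(stops: List[tuple], requests: dict, capacity: int) -> bool:
    """Check a porter/vehicle route given as a list of (request_id, action)
    with action in {"pick", "drop"}: every pickup precedes its drop-off, the
    on-board load never exceeds capacity, and a patient never shares the
    vehicle with another transport."""
    on_board, load = set(), 0
    for rid, action in stops:
        req = requests[rid]
        if action == "pick":
            if req.is_patient and on_board:
                return False
            if any(requests[o].is_patient for o in on_board):
                return False
            load += req.load
            if load > capacity:
                return False
            on_board.add(rid)
        else:  # "drop"
            if rid not in on_board:
                return False  # drop-off scheduled before its pickup
            load -= req.load
            on_board.remove(rid)
    return not on_board  # everything picked up must also be delivered

# Example: combining two goods requests is fine, mixing in a patient is not.
reqs = {
    "r1": TransportRequest("r1", "Pharmacy", "Ward3", 2, False),
    "r2": TransportRequest("r2", "Lab", "Ward5", 1, False),
    "r3": TransportRequest("r3", "ED", "MRI", 1, True),
}
print(route_is_feasible([("r1", "pick"), ("r2", "pick"), ("r1", "drop"), ("r2", "drop")], reqs, 4))  # True
print(route_is_feasible([("r1", "pick"), ("r3", "pick"), ("r3", "drop"), ("r1", "drop")], reqs, 4))  # False
```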

4 Conclusion

The organization and optimization of intra- and extra-hospital flows of people play a fundamental role in a better governance of Human Resources for Health in a context of global shortage of health workers. The flow of people in health services is reflected in the transport of patients and the mobility of health personnel. Patient flow logistics can be described by three steering models: segmented patient trajectory management, global management or full patient care. The problem of patient transport and mobile healthcare workforce is generic to all intra-hospital and extra-hospital care service structures. Nevertheless, limited operational research has focused directly on this problem: it is usually treated as a sub-problem of health workforce planning or of transport management, studied separately. Moreover, the optimization of this transport problem fits into the Dial-a-Ride Problem class. Although the application of operations research techniques to support this class of problems is not new, more attention should be directed to the problem of patient transport and mobile healthcare workforce. With different stakeholders, DARP systems often have multiple (and sometimes conflicting) objectives, requiring multi-criteria models in a dynamic healthcare context.

References

1. Aggoune-Mtalaa, W., Aggoune, R.: An optimization algorithm to schedule care for the elderly at home. Int. J. Inf. Sci. Intell. Syst. 3(3), 41–50 (2014) 2. Van den Bergh, J., Beliën, J., De Bruecker, P., Demeulemeester, E., De Boeck, L.: Personnel scheduling: a literature review. Eur. J. Oper. Res. 226, 367–385 (2013) 3. De Bruecker, P., Van den Bergh, J., Beliën, J., Demeulemeester, E.: Workforce planning incorporating skills: State of the art. Eur. J. Oper. Res. 243, 1–16 (2015)


4. Cheang, B., Li, H., Lim, A., Rodrigues, B.: Nurse rostering problems—a bibliographic survey. Eur. J. Oper. Res. 151, 447–460 (2003) 5. OMS|Travailler ensemble pour la santé - rapport sur la santé dans le monde (2006). https:// www.who.int/whr/2006/fr/ 6. Health workforce. https://www.who.int/westernpacific/health-topics/health-workforce 7. WHO|Global Health Workforce Network. http://www.who.int/hrh/network/en/ 8. Blua, P., Yalaoui, F., Amodeo, L., Laplance, D., De Block, M. (eds.): Hospital Logistics and e-Management. ISTE Ltd/John Wiley and Sons Inc., Hoboken (2019) 9. Kahla-Touil, I.B.: Gestion des risques et aide à la décision dans la chaîne logistique hospitalière : cas des blocs opératoires du CHU Sahloul (2011). https://tel.archives-ouvertes.fr/tel-007 14925 10. Vancroonenburg, W., Esprit, E., Smet, P., Berghe, G.V.: Optimizing internal logistic flows in hospitals by dynamic pick-up and delivery models, vol. 13 11. Benanteur, Y.: La sous-traitance de fonctions logistiques en milieu hospitalier: un enjeu complexe dans un contexte budgétaire constraint et structurant. Logistique Manage. 12, 41–48 (2004) 12. Rais, A., Viana, A.: Operations research in healthcare: a survey. Int. Trans. Oper. Res. 18, 1–31 (2011) 13. Serrou, D., Abouabdellah, A.: Mesure de la performance de la chaîne logistique hospitalière en intégrant les dimensions: Coûts, Sécurité et Qualité: Application en cas du regroupement des pharmacies, vol. 11 14. Bélanger, V., Ruiz, A., Soriano, P.: Déploiement et Redéploiement des Véhicules Ambulanciers dans la Gestion d’un Service Préhospitalier d’Urgence. INFOR: Inf. Syst. Oper. Res. 50, 1–30 (2012) 15. Gutiérrez, E.V., Vidal, C.J.: Home Health Care Logistics Management: Framework and Research Perspectives, vol. 9 16. Demirbilek, M., Branke, J., Strauss, A.: Dynamically accepting and scheduling patients for home healthcare. Health Care Manag. Sci. 22, 140–155 (2019) 17. Cissé, M., Yalçında˘g, S., Kergosien, Y., Sahin, ¸ E., Lenté, C., Matta, A.: OR problems related to Home Health Care: A review of relevant routing and scheduling problems. Oper. Res. Health Care 13–14, 1–22 (2017) 18. Hirsch, P.: Recent planning approaches and mobility concepts for home health care services in Austria – A review. Die Bodenkultur: J. Land Manage. Food Environ. 68, 205–222 (2018) 19. Begur, S.V., Miller, D.M., Weaver, J.R.: An integrated sdss for scheduling and routing homehealth-care nurses. INFORMS J. Appl. Anal. 27, 35–48 (1997) 20. Cheng, E., Rich, J.L.: A Home Health Care Routing and Scheduling Problem (1998) 21. Bertels, S., Fahle, T.: A hybrid setup for a hybrid scenario: combining heuristics for the home health care problem. Comput. Oper. Res. 33, 2866–2890 (2006) 22. Eveborn, P., Flisberg, P., Rönnqvist, M.: Laps Care—an operational system for staff planning of home care. Eur. J. Oper. Res. 171, 962–976 (2006) 23. Thomsen, K.: Optimization on Home Care, vol. 182 24. Akjiratikarl, C., Yenradee, P., Drake, P.R.: PSO-based algorithm for home care worker scheduling in the UK. Comput. Ind. Eng. 53, 559–583 (2007) 25. Bredström, D., Rönnqvist, M.: Combined vehicle routing and scheduling with temporal precedence and synchronization constraints. Eur. J. Oper. Res. 191, 19–31 (2008) 26. Dohn, A, Rasmussen, M.S., Justesen, T., Larsen, J.: The home care crew scheduling problem. In: Sheibani, K. (ed) Proceedings of the 1st International Conference on Applied Operational Research (ICAOR 2008), Teheran, pp 1–8 (2008) 27. 
Elbenani, B., Ferland, J.A., Gascon, V.: Mathematical programming approach for routing home care nurses. In: 2008 IEEE International Conference on Industrial Engineering and Engineering Management, pp. 107–111 (2008)


28. Ben Bachouch, R., Guinet, A., Hajri-Gabouj, S.: A model for scheduling drug deliveries in a french homecare (2009). https://hal.archives-ouvertes.fr/hal-00385516 29. Chahal, K., Eldabi, T.: Applicability of hybrid simulation to different modes of governance in UK healthcare. In: 2008 Winter Simulation Conference, Miami, FL, USA, pp. 1469–1477. IEEE (2008) 30. Hertz, A., Lahrichi, N.: A patient assignment algorithm for home care services. J. Oper. Res. Soc. 60, 481–495 (2009) 31. Kergosien, Y., Lenté, C., Billaut, J.-C.: An extended multiple Traveling Salesman Problem, vol. 8 (2009) 32. Trautsamwieser, A., Gronalt, M., Hirsch, P.: Securing home health care in times of natural disasters. OR Spectrum 33, 787–813 (2011) 33. Barrera, D., Velasco, N., Amaya, C.: A network-based approach to the multi-activity combined timetabling and crew scheduling problem: workforce scheduling for public health policy implementation. Comput. Ind. Eng. 63, 802–812 (2012) 34. Bräysy, O., Arola, J., Dullaert, W., Väisänen, J.: Planning Strategies for Home Care services, vol. 26 35. Nickel, S., Schröder, M., Steeg, J.: Mid-term and short-term planning support for home health care services. Eur. J. Oper. Res. 219, 574–587 (2012) 36. Rasmussen, M.S., Justesen, T., Dohn, A., Larsen, J.: The home care crew scheduling problem: preference-based visit clustering and temporal dependencies. Eur. J. Oper. Res. 219, 598–610 (2012) 37. Shao, Y., Bard, J.F., Jarrah, A.I.: The therapist routing and scheduling problem. IIE Trans. 44, 868–893 (2012) 38. Liu, M., Zhang, P.: Three-level and dynamic optimization model for allocating medical resources based on epidemic diffusion model. In: Zhang, Z., Zhang, R., Zhang, J. (eds.) LISS 2012, pp. 241–246. Springer, Berlin, Heidelberg (2013) 39. Allaoua, H.: Routage et planification des personnels pour l’hospitalisation à domicile (2014). https://www.theses.fr/2014PA132060 40. Bard, J.F., Shao, Y., Jarrah, A.I.: A sequential GRASP for the therapist routing and scheduling problem. J. Sched. 17, 109–133 (2014) 41. Lanzarone, E., Matta, A.: Robust nurse-to-patient assignment in home care services to minimize overtimes under continuity of care. Oper. Res. Health Care 3, 48–58 (2014) 42. Mankowska, D.S., Meisel, F., Bierwirth, C.: The home health care routing and scheduling problem with interdependent services. Health Care Manag. Sci. 17, 15–30 (2014) 43. Mutingi, M., Mbohwa, C.: Home Healthcare Staff Scheduling: A Clustering Particle Swarm Optimization Approach, vol. 10 44. Trautsamwieser, A., Hirsch, P.: A branch-price-and-cut approach for solving the medium-term home health care planning problem. Networks 64, 143–159 (2014) 45. Cappanera, P., Scutellà, M.G.: Joint Assignment, Scheduling, and Routing Models to Home Care Optimization: A Pattern-Based Approach. Transp. Sci. 49, 830–852 (2015) 46. Yuan, B., Liu, R., Jiang, Z.: A branch-and-price algorithm for the home health care scheduling and routing problem with stochastic service times and skill requirements. Int. J. Prod. Res. 53, 7450–7464 (2015) 47. Braekers, K., Hartl, R.F., Parragh, S.N., Tricoire, F.: A bi-objective home care scheduling problem: analyzing the trade-off between costs and client inconvenience. Eur. J. Oper. Res. 248, 428–443 (2016) 48. Redjem, R., Marcon, E.: Operations management in the home care services: a heuristic for the caregivers’ routing problem. Flex. Serv. Manuf. J. 28, 280–303 (2016) 49. Rest, K.-D., Hirsch, P.: Daily scheduling of home health care services using time-dependent public transport. Flex. 
Serv. Manuf. J. 28, 495–525 (2016)


50. Wirnitzer, J., Heckmann, I., Meyer, A., Nickel, S.: Patient-based nurse rostering in home care. Oper. Res. Health Care 8, 91–102 (2016) 51. Alves, F., Alvelos, F., Rocha, A., Pereira, A., Leitão, P.: Periodic vehicle routing problem in a health unit: In: Proceedings of the 8th International Conference on Operations Research and Enterprise Systems, pp. 384–389. SCITEPRESS - Science and Technology Publications, Prague, Czech Republic (2019) 52. Ait Haddadene, S.R., Labadie, N., Prodhon, C.: Bicriteria VRP with preferences and timing constraints in home health careservices. Algorithms 12, 152 (2019) 53. Amroun, K., Habbas, Z., Aggoune-Mtalaa, W.: A compressed generalized hypertree decomposition-based solving technique for non-binary constraint satisfaction problems. AI Commun. 29, 371–392 (2016) 54. Bennekrouf, M., Aggoune-Mtalaa, W., Sari, Z.: A generic model for network design including remanufacturing activities. Supply Chain Forum Int. J. 14, 4–17 (2013) 55. Serrano, C., Aggoune-Mtalaa, W., Sauer, N.: Dynamic models for green logistic networks design. In: IFAC Proceedings, vol. 46, pp. 736–741 (2013) 56. Rezgui, D., Chaouachi-Siala, J., Aggoune-Mtalaa, W., Bouziri, H: Application of a memetic algorithm to the fleet size and mix VRP with electric modular vehicles|Proceedings of the Genetic and Evolutionary Computation Conference Companion 57. Rezgui, D., Siala, J.C., Aggoune-Mtalaa, W., Bouziri, H.: Towards smart urban freight distribution using fleets of modular electric vehicles. In: Ben Ahmed, M., Boudhir, A.A. (eds.) Innovations in Smart Cities and Applications, pp. 602–612. Springer International Publishing, Cham (2018) 58. Rezgui, D., Chaouachi Siala, J., Aggoune-Mtalaa, W., Bouziri, H.: Application of a variable neighborhood search algorithm to a fleet size and mix vehicle routing problem with electric modular vehicles. Comput. Ind. Eng. 130, 537–550 (2019) 59. Hiermann, G., Prandtstetter, M., Rendl, A., Puchinger, J., Raidl, G.R.: Metaheuristics for solving a multimodal HHC scheduling problem. Cent. Eur. J. Oper. Res. 23, 89–113 (2015) 60. Hanne, T., Melo, T., Nickel, S.: Bringing robustness to patient flow management through optimized patient transports in hospitals. Interfaces 39, 241–255 (2009) 61. Glaa, B.: Contribution à la conception et l’optimisation d’un système d’aide à la gestion des urgences (2008). https://tel.archives-ouvertes.fr/tel-00359607 62. Mission Nationale d’Expertise et d’Audit Hospitaliers. Guide pratique: Décrire et Analyser un processus de prise en charge (2005) 63. Hassani, R., Desaulniers, G., El Hallaoui, I.: Réoptimisation multi-objectif en temps réel suite à une petite perturbation, Rapport technique, Les Cahiers du GERAD G{2018{47. GERAD, HEC Montréal, Canada (2018) 64. Fiegl, C., Pontow, C.: Online scheduling of pick-up and delivery tasks in hospitals. J. Biomed. Inform. 42, 624–632 (2009) 65. Cordeau, J.-F., Laporte, G.: The Dial-a-Ride Problem (DARP): Variants, modeling issues and algorithms. 4OR, vol. 1 (2003) 66. Ho, S.C., Szeto, W.Y., Kuo, Y.-H., Leung, J.M.Y., Petering, M., Tou, T.W.H.: A survey of diala-ride problems: Literature review and recent developments. Transp. Res. Part B: Methodol. 111, 395–421 (2018) 67. Parragh, S.N., Cordeau, J.-F., Doerner, K.F., Hartl, R.F.: Models and algorithms for the heterogeneous DARP with driver-related constraints. OR Spectrum 34, 593–633 (2012) 68. Schilde, M., Doerner, K.F., Hartl, R.F.: Metaheuristics for the dynamic stochastic dial-a-ride problem with expected return transports. Comput. Oper. Res. 
38, 1719–1730 (2011) 69. Zhang, Z., Liu, M., Lim, A.: A memetic algorithm for the patient transportation problem. Omega 54, 60–71 (2015) 70. Liu, M., Luo, Z., Lim, A.: A branch-and-cut algorithm for a realistic dial-a-ride problem. Transp. Res. Part B: Methodol. 81, 267–288 (2015)


71. Lim, A., Zhang, Z., Qin, H.: Pickup and delivery service with manpower planning in Hong Kong Public hospitals. Transp. Sci. 51, 688–705 (2016) 72. Molenbruch, Y., Braekers, K., Caris, A., Vanden Berghe, G.: Multi-directional local search for a bi-objective dial-a-ride problem in patient transportation. Comput. Oper. Res. 77, 58–71 (2017) 73. Detti, P., Papalini, F., de Lara, G.Z.M.: A multi-depot dial-a-ride problem with heterogeneous vehicles and compatibility constraints in healthcare. Omega 70, 1–14 (2017) 74. Fikar, C., Hirsch, P.: Home health care routing and scheduling: A review. Comput. Oper. Res. 77, 86–95 (2017)

Semantic Web and Healthcare System in IoT Enabled Smart Cities

Barakat A. Dawood1,2(B) and Melike Sah1

1 Research Centre for AI and IoT, Near East University, via Mersin 10, Nicosia, North Cyprus, Turkey
[email protected], [email protected]
2 Computer Engineering Department, Near East University, via Mersin 10, Nicosia, North Cyprus, Turkey

Abstract. The ultimate objective of healthcare systems is to improve healthcare services. The development of better biomedical products relies to a great extent upon the capacity to share and link the wealth of gathered clinical information. The major issue concerning this goal is not just enabling the integration of the information, but also making information analysis and interfaces easy to understand. To achieve this, Semantic Web (SW) technologies can be utilized, which provide standards and interoperable rich semantic metadata for intelligent applications. This paper overviews and examines different predominant research directions in the areas of SW and healthcare. In particular, we summarize (1) ontologies in healthcare, (2) the usage of SW technologies for representing patient records, (3) healthcare approaches that utilize SW, (4) the interoperability issue caused by heterogeneous healthcare data, (5) the integration of IoT and artificial intelligence techniques into smart semantic healthcare systems and (6) SW based security methodologies. Finally, we conclude with future research challenges.

Keywords: Semantic web · Ontology · Artificial intelligence · Healthcare · IoT · EHR

1 Introduction

The Semantic Web (SW) provides technologies for representing the meaning of information in a machine-processable manner, so that machines can process, transform, aggregate, and even act upon the data in a useful way. To achieve this, first, a shared agreed vocabulary (such as RDF Schema [1]) or an ontology [2, 3] needs to be created. Then, based on the ontology/schema, information can be extracted and represented in a standard way that can be automatically consumed, inferred over, and presented by intelligent applications. In particular, semantic rules can be applied to derive new data, or SPARQL queries can be applied to query incomplete data flexibly. On the other hand, SW technologies can be applied to any kind of content [4]: to web pages, text documents, slides, videos, speech files, etc. In the context of the SW, data and metadata are separated. Semantic data (metadata) about any resource (text, pdf, image, etc.) can be generated by applying data mining techniques and by structuring the extracted data according to a vocabulary.


In particular, the extracted semantic data can be stored in a triple format (subject, predicate, object), called the Resource Description Framework (RDF) [1]. RDF data can be saved in a variety of syntaxes such as RDF/XML, N3, Turtle, JSON-LD and RDFa (embedded into HTML documents). Since RDF is separated from the source content, this knowledge can be distributed, queried in a flexible manner using SPARQL, and consumed easily. As a result, SW enables interoperability in intelligent applications, such as healthcare systems. In an intelligent healthcare system, information can reside in different locales (i.e. in a hospital facility, in a smart home, in a clinic, etc.), in different formats (sensor data, databases, health records, documents, etc.), and be processed by different applications, as shown in Fig. 1. Therefore, there is a need for a common language, such as the one SW offers, for intelligent healthcare systems to operate in heterogeneous environments. In this context, ontologies, RDF, SPARQL queries, and the inference support of SW can help to build stronger and more interoperable healthcare frameworks, as discussed in [5–7]. Ontologies can support the requirements of a healthcare process for re-using, transmitting, and sharing individual patient information. The use of ontologies in medicine is fundamentally centered on the representation and organization of medical terminologies.

Fig. 1. Interoperability in heterogeneous healthcare environments [8]
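To make the triple format and SPARQL querying mentioned above concrete, the following is a minimal, self-contained sketch using the Python rdflib library. The ex: vocabulary, the patient identifier and the property names are hypothetical examples, not terms from any standard medical ontology.

```python
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, XSD

# Hypothetical example vocabulary; real systems would reuse standard ontologies.
EX = Namespace("http://example.org/health#")

g = Graph()
patient = URIRef("http://example.org/patient/p42")

# Triples: (subject, predicate, object)
g.add((patient, RDF.type, EX.Patient))
g.add((patient, EX.hasName, Literal("Jane Doe")))
g.add((patient, EX.hasDiagnosis, EX.Hypertension))
g.add((patient, EX.systolicBP, Literal(151, datatype=XSD.integer)))

# SPARQL query: find patients with a recorded systolic blood pressure above 140.
query = """
PREFIX ex: <http://example.org/health#>
SELECT ?p ?bp WHERE {
    ?p a ex:Patient ;
       ex:systolicBP ?bp .
    FILTER(?bp > 140)
}
"""
for row in g.query(query):
    print(row.p, row.bp)

# The same graph can be serialized in several syntaxes, e.g. Turtle:
print(g.serialize(format="turtle"))
```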

One of the significant challenges in intelligent healthcare systems is the interoperability of health and clinical information. The created information originates from different sources, yet it is stored in various and distributed forms, it lives within different administrative domains and it is inconsistent in naming, structure, and format. A critical prerequisite is not only to capture significant information, but also to make it universally accessible to others around the globe in a form that is easy to access, precise, and suitable for efficient information processing and integration with other systems. In addition to the data integration issue, user interaction with the information is another difficulty [9].

The challenge lies in the retrieval of information and information routing. Another important need is to give a consistent presentation of that information and to permit simple navigation and capable data exploration, even from different devices. This challenge can be addressed by SW technologies, since the information is represented by standards (RDF) conforming to ontologies and, in addition, data and presentation are separated, which allows intelligent applications to use, distribute and query this structured data on different systems. Another challenge is utilizing this gigantic amount of information to discover significant new patterns and turn such information into meaningful knowledge, leading to possible improvements in resource use, patient health, and the advancement of biomedical products. For example, a data mining application could examine patient data, symptoms, conditions, or family history to recognize the causes of illnesses as well as to recommend a clinical answer for the patient at a lower cost. The World Wide Web Consortium has set up the Semantic Web for Health Care and Life Sciences Interest Group [10], whose purpose is to establish, advance, and encourage the use of SW technologies in health services and related fields [11]. Also, IoT enabled sensors can be used to assist in gathering clinical and patient information from smart homes, health sensor networks, etc. Again, this sensor information can be mediated between different healthcare systems by representing it using SW technologies.

In this paper, we review the present status of how the SW is utilized in the health information unification and interoperability issues of healthcare systems. We also discuss how structured information can help the searching and retrieval of clinical information, and the use of IoT and artificial intelligence (AI) methods for the future of smart healthcare systems. The main discussions of the paper can be summarized as follows:

• This survey provides an overview of the SW in the healthcare sector; ontologies, interoperability, heterogeneous data sources, data mining, and other aspects are discussed.
• We outline the potential usage of IoT and artificial intelligence using semantic data in the healthcare sector.
• We discuss SW based security models for healthcare systems.
• Finally, open research challenges and issues are discussed.

The paper is organized as follows. Section 2 overviews the combination of SW approaches with healthcare systems. Section 3 explores how IoT and machine intelligence can be incorporated into the SW and healthcare domains. Section 4 describes the security issues concerning both worlds. Finally, Sect. 5 closes the work and we ponder the possible impact of this paradigm.

2 Semantic Web and Healthcare

SW is applied in different fields of healthcare systems [12]. In this section, we briefly summarize these efforts.


2.1 Ontology in Healthcare

The use of ontology in medicine essentially focuses on the representation and organization of medical terminologies. Specialists have developed specialized languages to help them maintain and communicate general medical data and patient information effectively [1, 13–15]. Such terminologies, which are streamlined for human processing, carry a lot of knowledge that is not made explicit. Meanwhile, medical information systems should be able to convey complex and precise medical concepts. This can be accomplished by building an ontology of the medical field to underpin medical terminology systems [16]. The advantages of ontologies in this area are the following: (1) Ontologies can help build stronger systems and higher interoperability of data in healthcare, where interoperability is the ability of different information systems to exchange data. (2) Ontologies can support the requirement of a healthcare process to transmit and reuse patient records. (3) Ontologies can support the integration of knowledge and data, which can be considered the most significant advantage they bring to healthcare systems.

2.2 Semantically Enhanced Patient and Clinical Information

The ultimate goal of improving healthcare practices and developing better biomedical products largely depends upon the ability to share and link the wealth of gathered clinical information. The principal difficulty in pursuing this ambitious objective lies not only in enabling the integration of information spread over heterogeneous data sources and formats, but also in the development of tools and standards for flexible querying, data analysis, and user-friendly interfaces [17]. The computerization of health, for example the daily use of information systems and innovative clinical devices in hospitals and other clinical institutions, has already delivered and will continue to deliver a huge amount of data from clinical records, patient monitoring, and clinical imaging. This explosive growth of health-related data should be analyzed appropriately to uncover noteworthy information and to turn such data into meaningful knowledge that could lead to improved healthcare practices and the development of better biomedical products.

SW allows patient records and clinical data to be structured and stored across scattered domains. Healthcare data, system applications and information are scattered across different sites, offices, branches, and so forth. As with most distributed applications today, information sharing and the combination of scattered components in healthcare systems are ordinarily done in ad hoc ways. There are no rules in place for data transfer, data sharing, and data communication between distributed sites. New data sharing standards, for example HL7 [18] for health data formats, and web services for data exchange, are recent advances, and they are not yet consolidated in most health systems [19]. For these and various reasons, the use of information from health systems is extremely hard for structures and applications outside of the domain. Semantic web technologies, on the other hand, provide more extensible and more adaptable data storage and interoperability choices for any information structure, including health data systems.

Information in distributed systems can be easily linked using a standard linking mechanism such as URIs/URLs, just like in the web space, and the semantic information model is entirely extensible. In [20, 21], the authors provide semantically enhanced patient records, i.e. electronic health records (EHR), in which, for example, significant events are recorded during a medical procedure in a real-time setting. In [22], the authors describe the security challenges in HL7 [23] EHR, discuss the use of semantics, and describe how much information is required to achieve semantic disclosure; the work also describes the security loopholes present in HL7. In [24], telemonitoring home medical devices were used for on-demand uniform telemonitoring services, with the information accessed from different devices. The architecture manages EHR and maintains the patient's security and end-user verification, but it does not deal with the patient's awareness and there is no decision support system to check the incoming reports.

2.3 Using Semantic Web in Healthcare Systems

The major issues in healthcare systems are information sharing and interoperability. RDF and triple stores (semantic databases) are utilized to implement a European patient summary system [25]. The European patient summary system is part of the European Union's health strategy, as the unification of the member countries requires such activities. One of the significant objectives of this project is to enable the mobility and sharing of patient information in a safe, security-ensuring, and privacy-aware way, supporting multiple languages. Related work was done in the US as well: the US Centers for Disease Control (CDC) formed the Public Health Information Network (PHIN) for healthcare systems, and SW usage in these systems is suggested [26]. Sensor systems research aims to deliver significant applicable technologies in various areas, including the healthcare sector. Here, the SW can be utilized to attach semantics to the sensor data so that heterogeneous data from various platforms and formats can be gathered and utilized. There is also a proposal to handle the issue of health image data gathering and fusion using SW technologies [27].

Data and system integration is one of the long-standing issues of all information systems, and data and system integration in health is likewise a current issue actively being examined. One of the continuing advances in this area is service-oriented integration techniques and programming models. Systems offer their functionalities to other systems by means of services defined on those systems as interfaces. Web services technology includes various straightforward protocols for service description, message formatting, and service publication. Healthcare systems are one of the critical application areas for web services, where heterogeneous distributed healthcare systems can be combined using standard protocols and message formats. Nevertheless, there is a growing issue here, too: as an ever-expanding number of systems are joined to this overall health framework through their own services, finding and using services from this continuously growing set of health systems is becoming an issue of its own. Here, new technologies are offered to resolve the heterogeneity of web service offerings, specifically semantic web services.

Two significant solutions in these areas are OWL-S (the OWL-based web service ontology) and WSMO (the Web Service Modeling Ontology) [28, 29]. Both WSMO and OWL-S are being utilized in the customized and dynamic discovery of web services in health systems. An EU-supported project, CASCOM, investigated the development of a framework for a role-based description and use of services, utilizing OWL-S and agents [30]. In this work, health services for user roles such as patient, doctor, nurse, and so on are defined and the framework automatically finds the most suitable service (service matching). SW is furthermore utilized in health business processes: the authors of [31] proposed to utilize the semantic web in the diagnosis process and demonstrated their work on the diagnosis of Mitral Valve Prolapse disease. A model system was developed for the customized diagnosis of heart disease by building a clinical knowledge base utilizing SW [32]. SW is also utilized in the representation and execution of clinical procedures [33]. Another noteworthy theme in this area is the semantic interoperability of distributed health systems: an ontology-based framework was produced for semantic interoperability [34], whose aim is to semantically join particular ontologies (ontology mapping) towards interoperable systems.

2.4 Interoperability in Healthcare

Healthcare practice has utilized electronic methods for administrative tasks for a long time, yet the healthcare industry has more gradually embraced technology as a way to improve the delivery of its services, and for legitimate reasons. Different standardization efforts [35] are ongoing to address this interoperability issue, for instance EHRcom, openEHR, and HL7 [36]. HL7 is one of the earliest and most active standards organizations bringing electronic systems to the healthcare industry. HL7 version 2 is the most widely implemented healthcare informatics standard in the world today. However, being HL7 Version 2 compliant does not imply direct interoperability between healthcare systems. Version 2 messages contain numerous optional data fields. This adaptability gives phenomenal flexibility, yet it requires detailed bilateral agreements among the healthcare systems to achieve interoperability [37]. This issue has been recognized in HL7 since 1996, when the development of a reference information model (RIM) began; the RIM became a foundation of HL7 version 3. In version 3, the RIM is a complete source from which all protocol specification standards draw their data-related content. HL7 Version 2.x, like a number of other application protocols in different domains, has no explicit data model; the model is implicit, not explicit. Thus, on the one hand, the different HL7 messages in v2.x are like programming language structures, but without formal assignments and important object-oriented concepts such as generalization-specialization hierarchies. On the other hand, version 2.x has no proper binding of standard vocabularies to structures: the bindings are ad hoc and invariably site-specific. HL7 version 3 places in the RIM an explicit information semantics model from which the messages are derived natively and top-down. This facilitates reuse over different settings. Furthermore, the RIM has formalisms for vocabulary support.

It has a solid semantic foundation in explicitly defined concept spaces drawn from the best terminologies [38] (SNOMED, LOINC, CPT, ICD, and so on) which, in the opinion of the HL7 RIM working group, makes semantic interoperability possible. Nonetheless, it is not reasonable to expect all healthcare organizations and institutions to conform to a single standard. Moreover, different versions of the same standard (for example, HL7 Version 2 and Version 3), and even different uses of the same standard, for example some HL7 Version 2 implementations, do not interoperate. Accordingly, there is a need to address the interoperability issue at the semantic level using SW technologies, such as in the work of [39], where the authors test EHR interoperability through the use of an ontology.
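A common pattern for the semantic-level interoperability discussed above is to lift records exported by heterogeneous systems into RDF against a shared vocabulary. The following is a minimal sketch of such a lifting step, assuming a hypothetical flat export (a Python dict) and a hypothetical ex: vocabulary; it is not an HL7 mapping or a standard terminology binding.

```python
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/health#")  # hypothetical shared vocabulary

# Hypothetical flat export from a legacy system (e.g. parsed from a CSV dump).
legacy_record = {"id": "p42", "name": "Jane Doe", "dob": "1980-05-17", "dx_code": "I10"}

# Mapping from legacy field names to properties of the shared vocabulary.
FIELD_MAP = {"name": EX.hasName, "dob": EX.dateOfBirth, "dx_code": EX.diagnosisCode}

def lift_to_rdf(record: dict) -> Graph:
    """Turn one flat record into RDF triples conforming to the shared vocabulary,
    so that records from different systems can be merged and queried together."""
    g = Graph()
    subject = URIRef(f"http://example.org/patient/{record['id']}")
    g.add((subject, RDF.type, EX.Patient))
    for field, prop in FIELD_MAP.items():
        if field in record:
            g.add((subject, prop, Literal(record[field])))
    return g

# Graphs lifted from different source systems can simply be merged.
merged = Graph()
merged += lift_to_rdf(legacy_record)
print(merged.serialize(format="turtle"))
```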

3 Intelligence and IoT in SW and Healthcare

IoT and related technologies are increasingly pervasive in society and in businesses, and are now being applied to healthcare systems [40]. As shown in Fig. 2, using IoT sensors, cross-domain sensor data about patients can be gathered continuously to monitor and understand their wellbeing. One of the challenges for IoT empowered healthcare systems is interoperability. SW can be used to solve interoperability issues, since data is formatted in a machine-understandable way using RDF according to ontologies. Secondly, semantically tagged sensor data can be re-used, processed, and inferred over automatically. Using SW and IoT, smart healthcare services become possible.

Fig. 2. Combining IoT data from different domains [16]


Semantic methods can be utilized in IoT empowered healthcare systems for inferencing, information description, and the filtering or aggregation of data. The authors in [41] suggested a semantic method to describe structured medical service rules that raise an alarm by watching the sensed data from healthcare sensors, for instance heart rate, lipids, and blood pressure. The authors in [42] suggested a semantic method for a healthcare framework by describing the device and location data, along with rules, to facilitate appropriate services. In addition to IoT, AI technologies can also be integrated into smart IoT empowered healthcare systems, as explained below.

Natural Language Processing (NLP) can be used to improve intelligent healthcare systems, such as the system presented in [43], where the authors introduce a clinical decision support system (CDSS) for patient administration. They introduce reasoning and an ontological model for the content of clinical practice guidelines. This methodology can help analysts and clinicians better comprehend and analyze patients' opinions and needs regarding different health topics. Rule-based expert systems have been used in healthcare for decades and can also be integrated with SW for interoperability, such as in the works of [44] and [45]: the former applied SW strategies in a framework for enhancing alert specificity in critical care scenarios, and the latter proposed a rule-based antidiabetic drug selection recommendation framework. However, with the recent advances in machine learning (ML), more autonomous methods are being introduced to aid, assist, and complement smart IoT enabled healthcare systems. Such an ML method is used in healthcare services in [46]: it is a drug forecasting method that uses patient characteristics and the medication setting. Recently, deep learning methods have become the most significant advancement for the detection [47], diagnosis [48], and analysis of diseases. Hence, in the future, deep learning will be an integral part of the decision support of smart healthcare systems, which can use sensor data, semantic data, and other forms of patient information to assist radiologists and doctors and to increase the efficiency of smart healthcare systems [49].
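To illustrate how sensor readings can be tagged semantically and then checked by a simple rule, the following is a minimal sketch using rdflib and the W3C SOSA/SSN vocabulary for observations. The patient URI, the blood-pressure property and the alert threshold of 140 mmHg are illustrative assumptions, not values or rules taken from the cited works.

```python
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, XSD

SOSA = Namespace("http://www.w3.org/ns/sosa/")   # W3C sensor/observation vocabulary
EX = Namespace("http://example.org/health#")     # hypothetical application vocabulary

g = Graph()
obs = URIRef("http://example.org/obs/1")
patient = URIRef("http://example.org/patient/p42")

# Annotate one sensor reading as a SOSA observation of systolic blood pressure.
g.add((obs, RDF.type, SOSA.Observation))
g.add((obs, SOSA.hasFeatureOfInterest, patient))
g.add((obs, SOSA.observedProperty, EX.SystolicBloodPressure))
g.add((obs, SOSA.hasSimpleResult, Literal(152, datatype=XSD.integer)))
g.add((obs, SOSA.resultTime, Literal("2020-06-01T10:15:00", datatype=XSD.dateTime)))

# A simple alert rule expressed as a SPARQL query over the annotated data.
alert_query = """
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX ex: <http://example.org/health#>
SELECT ?patient ?value WHERE {
    ?o a sosa:Observation ;
       sosa:observedProperty ex:SystolicBloodPressure ;
       sosa:hasFeatureOfInterest ?patient ;
       sosa:hasSimpleResult ?value .
    FILTER(?value > 140)
}
"""
for row in g.query(alert_query):
    print("ALERT: high systolic blood pressure", row.value, "for", row.patient)
```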

4 Security in Semantically Enhanced Healthcare Systems

Policy, privacy, and security in semantic healthcare systems are highly related to one another. Each field has a range of challenges and difficulties, for which an assortment of solutions are implemented in different areas, and not all of these issues are covered by SW technologies. Here, we briefly summarize a few of the efforts that address security issues using SW.

4.1 A Taxonomy of Privacy

This approach, proposed by Solove [50], refers to the subjects that are used to classify privacy. Solove argues that privacy is an equivocal, polysemic and often emotional term that can therefore not be reduced to a basic concept, and especially cannot be taken purely from the perspective of regulation. Rather than giving a definition of privacy, Solove focuses on privacy threats which, he argues, can be listed and characterized in a more concrete way.

554

B. A. Dawood and M. Sah

than giving a meaning for security, Solove centers around protection dangers which, he contends, can be recorded and characterized in an increasingly strong way. This scientific classification of security problems where data-based exercises that are known to make issues are partitioned into four primary classes: data assortment, data preparation, data dispersal, and attack. 4.2 Availability, Authentication, Authorization, and Trustworthiness The foreseen semantic web condition can comprise of sensor hub facilitated administrations. Henceforth, it is profoundly critical that these administrations exist from wherever whenever to deliver data-based semantic. To fulfill this property, a security convention isn’t present. Be that as it may, different realistic measures might be acknowledged to guarantee the accessibility [51]. It is applied to the character check. In semantic web condition, shared confirmation is required because semantic web condition information is applied in activating procedures and dynamic. Along these lines, the administration customer and the specialist co-op should be guaranteed that the administration is drawn closer by a valid assistance and a client is given by a real source. When utilizing any verification procedure, it is required to enroll the characters of client and asset challenges of semantic web condition objects which causes limited limitations to enable the strategy of validation [51]. This model is utilized to depict the entrance strategies that typically designate particular benefits to subjects. Semantic web condition needs encouraging re-useable, dynamic, refreshing, fine-grained, and simple to utilize strategies depicting the system. Consequently, it is huge to externalize the meaning of approach and requirement procedure of semantic web condition administrations [51]. A few fragile applications, for example, human services administrations, security basic administrations, require to break down the reliability of different elements reveled. From a semantic web condition application point of view, examining the dependability of sensors information and sensor is huge. Non-dependable sensor information or vindictive sensor hubs can prompt a catastrophe in wellbeing basic spots. Untrusted sensor information may be gotten from a confidant in the sensor hub. Non-reliable nature may consist of two expectations: unexpected mistakes and deliberate trouble making. It tends to be simpler to guarantee the dependability of semantic web conditions by fusing reliability examination [51].

5 Conclusions and Future Works

The healthcare sector continuously generates enormous volumes of heterogeneous information collected from different domains, such as sensor data, patient records, and databases. Because of this heterogeneity, current information systems cannot exploit such data effectively and efficiently. As a result, retrieving and linking information from different sources remains a difficult task, and much of the information held in these databases remains hidden. The Semantic Web (SW) has already gained popularity as a platform for knowledge representation, linkage, and analysis in the health domain. This paper gives an overview of recent developments using SW in the healthcare sector. In particular, (1) we explained the advantages of using ontologies in healthcare, (2) discussed the use of SW technologies for representing patient records, (3) briefly summarized healthcare approaches that utilize SW, (4) explained how SW can address interoperability issues when handling heterogeneous healthcare data, (5) investigated the integration of IoT and artificial intelligence techniques into smart semantic healthcare systems, and (6) outlined SW-based security methodologies.
Integration of SW and healthcare is a popular research field, and it will continue to attract interest in the future. One of the challenges of combining these fields is keeping up with ontology development, the creation of semantic data, and the mapping of semantic data in heterogeneous environments. With the Linked Open Data movement, more and more clinical, patient, and disease datasets will be made semantically available and linked to other relevant data, so intelligent processing will become possible on a large scale. On the other hand, the integration of IoT data with SW will provide mechanisms for truly intelligent healthcare applications that understand the meaning of data and can apply AI methods (e.g. deep learning) to make automated judgments. Another future direction is the availability of multi-modal healthcare user interfaces for different purposes: for example, a mobile application to monitor an elderly relative remotely, an intelligent medical-record search interface to find similar patients for a possible analysis, or a personal assistant that monitors activities and makes suggestions based on semantically enhanced sensor data. The future will therefore feature IoT-enabled semantic healthcare systems. Additionally, a new architecture based on the Semantic Web and IoT can be introduced and evaluated with user studies.

References 1. Brickley, D., Guha R.V. (eds.): RDF vocabulary description language 1.0: RDF schema (2004). http://www.w3.org/TR/rdf-schema/ 2. Gruber, T.R: A Translation approach to portable ontology specifications. Knowl. Acquis. 5, 199–220 (1993). http://dx.doi.org/10.1006/knac.1993.1008 3. Dean, M., Schreiber, G., Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, L., Stein, L.A.: OWL Web ontology language reference. W3C Recommendation 10 February 2004. http://www.w3.org/TR/owl-ref/ 4. Hatirnaz, E., Sah, M., Direkoglu, C.: A novel framework and concept-based semantic search Interface for abnormal crowd behavior analysis in surveillance videos. Multimedia Tools Appl. 1–39 (2020) 5. Thomas, G.R.: A translation approach to portable ontology specifications. Knowl. Acquis. (2005) 6. Da Silveria, M., Guelfi, N.: A survey of Interoperability in E-health systems: the European approach. In: International Conference on Health Informatics and Health Info (2008) 7. Graben wegar, J., Deftsch mid, G.: Ontologies and their application in EHR. Med. Inf. (2008) 8. Ali, S., Chong, I.: Semantic mediation model to promote improved data sharing using representation learning in heterogeneous healthcare service environments. Appl. Sci. 9, 4175 (2019) 9. FabianeBizinellanardon, Moura, L.A.: Knowledge sharing and information integration in healthcare using ontologies and deductive databases, MEDINFO (2004) 10. Laleci, G.B., Dogac, A.: A semantically enriched clinical guideline model enabling deployment in heterogenous healthcare environment. IEEE Trans. Inf. Technol. Biomed. 13 (2009)


11. Arguellocasteleivo, M., Des, J.: Executing medical guidelines on the web towards next generation healthcare. In: Knowledge Based Systems, vol. 22, pp. 545–551 (2009) 12. Erdogan, D.: Semantic web in eHealth. In: Conference: Proceedings of the 47th Annual Southeast Regional Conference, Clemson, South Carolina, USA, March 19–21, 2009. https:// doi.org/10.1145/1566445.1566542 13. Hyungjiklee, E.J., Lee, J.W.: Ontology and CDSS based Intelligent health data management in healthcare server (2007). http://www.waset.org/pwaset/v23/v2322.pdf 14. Da Silveria, M., Guelfi, N.: A survey of interoperability in E-health systems: the European approach. In: International Conference on Health Informatics and Healthinfo (2008) 15. Mirhaji, P., Casscells, S.W., Allemang, D., Coyne, R.: Improving the public health information network through semantic modeling. Intell. Syst. 22(3), pp. 13–17. IEEE, May-June 2007 16. Gyrard, A.: Designing Cross-Domain Semantic Web of Things Applications, https://www.sli deshare.net/AmlieGyrard/designing-crossdomain-semantic-web-of-things-applications 17. Xhemal, Z., Bujar, R., Florije, I., Jaumin, A.: State of the art of semantic web for healthcare. In: World Conference on Technology, Innovation and Entrepreneurship (2015) 18. HL7: Health Level 7. www.hl7.org 19. Dogdu, E.: Service-oriented approach for the information integration in eHealth applications. In: 2nd EHealth Conference, Antalya, Turkey (2007) 20. Kalra, D.: Electronic health records and systems. In: Expert Workshop on Semantic Health Consoritum (2008) 21. Eichel berg, M.: A distributed patient identification protocol based on control numbers with semantic annotation. Int. J. Semantic Web Inf. Syst. (2008) 22. Sahay, R., Akhtar, W., Foze, R.: PPEPR plug & play electronic patient records. In: ACM Symposium on Applied Computing (2008) 23. Frohlich, N., Helin, H., Laamanen, H.: Semantic service co-ordination for emergency assistance in mobile e-health environments. In: Sixth International Semantic Web Conference, Workshop on Semantic Web in Ubiquitous Healthcare (2007) 24. Agarwal, S.K.: Context Aware system to create Electronic medical encounter records (2016). http://ebiquity.umbc.edu/_file_directory_/papers/285.pdf 25. Commission of the EC. An action plan for a European eHealth Area – COM (2009), vol. 356 (2004) 26. Schuldt, H., Brett Lecker, G.: Sensor Data stream processing in health monitoring. Mobile Datenbanken und Informations system (2003) 27. Mirza, A.R.: Data fusion architectures for sensor platforms. In: Aerospace Conference, pp. 1– 13. IEEE, 1–8 March 2008 28. Salvadores, M., Horridge, M., Alexander, P.R., Fergerson, R.W., Musen, M.A., Noy, N.F.: Using SPARQL to query bioportal ontologies and metadata. In: International Semantic Web Conference, vol. 7650, pp. 180195. LNCS, Boston, US (2012) 29. WSMO, Web service modeling ontology. http://www.wsmo.org/ 30. Caceres, C., Fernandez, A., Ossowski, S., Vasirani, M.: Agent-based semantic service discovery for healthcare: an organizational approach. Intell. Syst. 21(6), 11–20. IEEE, November–December 2006 31. Podgorelec, V., Pavlic, L.: Managing diagnostic process data using semantic web. In: CBMS 2007 Twentieth IEEE International Symposium on Computer-Based Medical Systems, pp. 127–134, 20–22 June 2007 32. Kim, K-H., Choi, H-J.: Design of a clinical knowledge base for heart disease detection. In: CIT 2007 7th IEEE International Conference on Computer and Information Technology, pp. 610–615, 16–19 October 2007


33. Arguello, M., Des, J., Fernandez-Prieto, M.J., Perez, R., Paniagua, H.: Enabling reasoning on the web: introducing a test-bed simulation framework. In: EMS 2008 Second UKSIM European Symposium on Computer Modeling and Simulation, pp. 469–475, 8–10 September 2008 34. Ganguly, P., Chattopadhyay, S., Paramesh, N., Ray, P.: An ontology-based framework for managing semantic interoperability issues in eHealth. In: HealthCom 2008 10th International Conference on eHealth Networking, Applications and Services, pp. 73–78, 7–9 July 2008 35. EHRcom. http://www.centc251.org/TCMeet/doclist/TCdoc00/N00048.pdf, openEHR: http://www.openehr.org/ and HL7 http://www.hl7.org 36. Patterson, R.S.: Security & Authorization issues in HL7 EHRS-A semantic web service based approach (2006). http://lsdis.cs.uga.edu/rsp/patterson_richard_s_200612.pdf 37. Bicer, V., Laleci, G., Dogac, A., Kabak, Y.: Artemis message exchange framework: semantic interoperability of exchanged messages in the healthcare domain. ACM Sigmod. Record, 34(2), June 2005 38. SNOMED (http://www.snomed.org), CPT (http://www.aacap.org/clinical/cptcode.html), ICD (http://www.who.int/classifications/icd/en/), LOINC (http://www.loinc.org/) 39. Abdullah, U., Ahmad, J., Ahmed, A.: Analysis of effectiveness of apriori algorithm in medical billing data mining. In: ICET 2008 4th International Conference on Emerging Technologies, pp. 327–331. IEEE, October 2008 40. Zavyalova, Y.V., Korzun, D.G., Meigal, A.Y., Borodin, A.V.: Towards the development of smart spaces-based socio-cyber-medicine systems. Int. J. Embed. Real Time Commun. Syst. 8, 45–63 (2017) 41. Li, G., Zhang, C., Zhang, Y., Xing, C., Yang, J.: SemanMedical: a kind of semantic medical monitoring system model based on the IoT sensors. In: Proceedings of the 2012 IEEE 14th International Conference on e-Health Networking, Applications and Services (Healthcom), Chengdu, China, 9 November 2012 42. Sezer, E., Bursa, O., Can, O., Unalir, M.O.: Semantic web technologies for IoT-based health care information systems. In: SEMAPRO 2016 The Tenth International Conference on Advances in Semantic Processing, IARIA, (2016). ISBN: 978–1-61208-507-4 43. Galopin, A., Bouaud, J., Pereira, S., Seroussi, B.: An ontology-based clinical decision support system for the management of patients with multiple chronic disorders. In: MEDINFO 2015: eHealth-enabled Health, pp. 275–279 (2015) 44. Nocedal, A.S., Gerrikagoitia, J.K., Huerga, I.: Supporting clinical processes with semantic web technologies: a case in breast cancer treatment. Int. J. Metadata Semant. Ontol. 5(4), 309–320 (2010) 45. Chen, R.C., Huang, Y.-H., Bau, C.-T., Chen, S.-M.: A recommendation system based on domain ontology and SWRL for anti-diabetic drugs selection. Expert Syst. Appl. 39(4), 3995–4006 (2012) 46. Lee, S.I., Celik, S., Logsdon, B.A.: A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nat. Commun. 9, 42 (2018) 47. Isın, A., Direkoglu, C., Sah, M.: Review of MRI-based brain tumor image segmentation using deep learning methods. In: 12th International Conference Applied Fuzzy System Soft Computing ICAFS, Procedia Computer Science, Vienna, Austria, pp. 317–324 (2016) 48. Vial, A., Stirling, D., Field, M.: The role of deep learning and radiomic feature extraction in cancer-specific predictive modelling: A review. Trans. Cancer Res. 7, 803–816 (2018) 49. Hussain, A.A., Bouachir, O., Al-Turjman, F., Aloqaily, M.: AI Techniques for COVID-19. IEEE Access 8, 128776–128795 (2020). 
https://doi.org/10.1109/ACCESS.2020.3007939 50. Solove, D.J.: A taxonomy of privacy. Univ. Pennsylvania Law Rev. 154, 477–560 (2005). https://doi.org/10.2307/40041279 51. Mozzaquatro, B.A., Agostinho, C., Goncalves, D., Ricardo, J.M., An ontology-based cybersecurity framework for the internet of things. Sensors, 18, 3053 (2018)

Skin Cancer Prediction and Diagnosis Using Convolutional Neural Network (CNN) Deep Learning Algorithm Hajar Mousannif1 , Hiba Asri1,2(B) , Mohamed Mansoura1 , Anas Mourahhib1 , Yassine Isaouy1 , and Mouad Marmouchi1 1 Engineering Informatics Systems Laboratory (LISI), Faculty of Sciences Semlalia,

Cadi Ayyad University, Marrakech, Morocco [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] 2 Informatics and Applied Science Laboratory (LIMA), National School of Applied Science, Ibn Zohr University, Agadir, Morocco

Abstract. Artificial intelligence (AI) has recently surpassed human performance in several areas, and there is great hope that in the medical field AI can enable better prevention, detection, diagnosis and treatment of diseases. Cancer today remains the second leading cause of death. Most, if not all, professionals in the medical field agree that early detection of cancer offers great chances of healing or disease control. Image processing is a widely used method for detecting skin cancer as soon as an affected area appears. This motivated the idea of creating an artificial intelligence system, based on the Convolutional Neural Network (CNN) deep learning algorithm, that diagnoses skin diseases simply from an image captured by a mobile application. A mobile application and a web application were created to provide access to this diagnosis in the most isolated places, where dermatologists are almost non-existent. The proposed system predicts skin cancer and identifies malignant skin lesions. A dataset from the International Skin Imaging Collaboration (ISIC) Dermoscopic Archive, which covers several types of skin cancer, is used to train and test the proposed model. Keywords: CNN · Big data · Skin cancer · Deep learning · Predictive analytics

1 Introduction

Skin cancer is the most common form of cancer. It occurs when there is an uncontrollable growth of abnormal cells in a layer of the skin and most frequently develops on skin exposed to the sun. However, this type of cancer may also occur on areas of the skin not normally exposed to daylight. There are three major varieties of skin cancer: basal cell carcinoma, squamous cell carcinoma and melanoma. Early detection of skin cancer offers the best likelihood of successful treatment [1].


About a third of cancers worldwide are skin cancers, and the rate continues to increase; in fact, skin cancer cases around the world have risen by 50% due to ozone layer depletion. Because analyzing skin patterns manually is a very time-consuming process for experts and not always accurate, and because some lesion types look similar [2], it is necessary to find computer-aided solutions that make the process more accurate and more efficient [5, 6, 22]. Applying machine learning, deep learning and image processing increases the accuracy of skin cancer prediction [14]. These tools have shown their power and effectiveness in many prediction tasks, such as breast cancer prediction, obesity prediction and miscarriage prediction, among others [7, 15]. The goal is not to replace the doctor with the machine, but to help the doctor analyze and interpret the huge volumes of data collected. Big data, predictive analytics tools and artificial intelligence help to promote good diagnosis and fight medical errors by generating differential diagnoses based on real data [8]. For this purpose, we propose an intelligent system that can predict cancer.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents our experiment, including the dataset used, the pre-processing step, the proposed model and the experimental results. The deployment of the proposed model is discussed in Sect. 4, Sect. 5 presents some possible future work, and Sect. 6 concludes the paper.

2 Related Work

Multiple studies have proposed models for skin cancer prediction using Convolutional Neural Networks. The authors in [23] proposed a modified convolutional neural network model to improve the classification of skin cancer; the revised network reached a prediction accuracy of 91.92% on the training set and 89.5% on the test set, demonstrating that the method performs well and can be used to predict whether a skin lesion is benign or malignant. In [24], an Intelligent Prognostics Model for Disease Prediction and Classification (IPM-DPC) from dermoscopy images is presented, combining a Convolutional Neural Network (CNN) with Particle Swarm Optimization (PSO). Experimental results yielded a diagnostic accuracy as high as 99.46% with the IPM-DPC approach, a 14.94% improvement over a system that does not use PSO as a filter layer in the CNN. The authors in [25] propose a classification model to improve the performance of skin lesion classification using a deep CNN and data augmentation; they demonstrate the use of image data augmentation for overcoming data limitation and examine the influence of different numbers of augmented samples on the performance of different classifiers. In [26], the proposed solution consists of six convolutional blocks with batch normalization followed by a fully connected layer that performs binary classification; a custom CNN model is similar to the proposed model but without batch normalization and with dropout at the fully connected layer. Experimental results for the proposed model provided a better accuracy of 89.30%.


3 Experiment

3.1 Dataset
In the healthcare field, multiple datasets are available for training and validating models, covering conditions such as stress, heart attack, miscarriage and breast cancer [4, 19]. In this study, the dataset comes from an open-access dermatology repository, the International Skin Imaging Collaboration (ISIC) Dermoscopic Archive [13]. The images from this online open-access repository are annotated by dermatologists, not necessarily through biopsy. The ISIC Archive data used here contains melanocytic lesions that are biopsy-proven and classified into two labels: malignant or benign. Several types of skin cancer are covered in the ISIC Dermoscopic Archive dataset: melanocytic nevi, melanoma, benign keratosis-like lesions, basal cell carcinoma, actinic keratosis, vascular lesions and dermatofibroma [21] (see Fig. 1). The dataset contains 33,346 samples, of which 25,009 are used for training and 8,334 for validating the model (a loading sketch is given after Fig. 1).

Fig. 1. Sample images from ISIC dataset for cancer types (a) Actinic keratosis (b) Basal cell carcinoma (c) Benign keratosis-like lesions (d) dermatofibroma (e) Melanocytic nevi (f) Melanoma.
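A hedged sketch of how the ISIC images might be loaded and split roughly 75/25 into training and validation sets with TensorFlow. The local folder layout (one subfolder per label) and the 64 × 64 input size are assumptions, since the paper does not specify them.

```python
# Minimal sketch: loading a local copy of the ISIC images and reproducing an
# approximately 75/25 train/validation split. Paths and sizes are illustrative.
import tensorflow as tf

DATA_DIR = "isic_archive/"   # hypothetical folder with benign/ and malignant/ subfolders
IMG_SIZE = (64, 64)          # assumed input resolution

train_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.25, subset="training", seed=42,
    image_size=IMG_SIZE, batch_size=32, label_mode="binary")

val_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.25, subset="validation", seed=42,
    image_size=IMG_SIZE, batch_size=32, label_mode="binary")
```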

3.2 Proposed Method

Data Preparation
Data Augmentation. Deep learning models perform better when large datasets are available [20]. The common way to enlarge a dataset is data augmentation, and increasing the dataset also reduces the problem of overfitting. The idea is to generate new data from the training data by applying several transformations to the lesion images: horizontal reversal, vertical flips, random cropping and rotations. Enlarging the training set allows us to create a very robust model that learns better and produces accurate results [10].
Data Normalization. Normalization is an important process in image classification [2]. It attempts to deal with external sources of variation that affect the pixel values. In deep learning, each dataset requires standardization, but only when the features have different ranges; the objective of normalization is to bring the values of a dataset onto a common scale. In our experiment we compute the mean intensity over all images and then subtract it from each pixel of every image (a code sketch of these augmentation and normalization steps is given after the list below).
Data Training. After the generation and normalization of the data, we separated it into:
• a training set representative of the global data,
• an evaluation set for assessing the model,
• and a test set.
Convolutional Neural Network. Deep learning is the process of analyzing different types of data to extract patterns and knowledge using different data mining tools. Many applications rely on data mining and clustering, such as healthcare data analysis, education systems and networks, among others [9]. The CNN is the most widely used deep learning algorithm in computer vision tasks such as face recognition and image classification, and deep CNNs outperform the average dermatologist at skin cancer classification using photographic and dermoscopic images [12, 18]. The choice of a CNN is due to many reasons:
• It is specially designed to process input images.
• It is able to learn relevant features from an image.
• It is more efficient in terms of memory and complexity.
• It has high statistical efficiency (it requires few labels to learn).
• It has high computational efficiency (it requires fewer operations to learn).
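The following sketch illustrates the augmentation and normalization steps above with Keras' ImageDataGenerator; the parameter values and array shapes are assumptions, and a random array stands in for the prepared ISIC images.

```python
# Minimal sketch of the data preparation described above. featurewise_center
# implements the mean-subtraction normalization; flips, rotations, shifts and
# zoom approximate the listed transformations (shifts/zoom stand in for cropping).
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Placeholder stand-in for the prepared ISIC training images (N, H, W, C).
x_train = np.random.rand(128, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 2, size=(128,))

datagen = ImageDataGenerator(
    featurewise_center=True,   # subtract the mean intensity computed over all images
    rotation_range=20,         # random rotations
    horizontal_flip=True,      # horizontal reversal
    vertical_flip=True,        # vertical flips
    width_shift_range=0.1,     # random shifts/zoom approximate random cropping
    height_shift_range=0.1,
    zoom_range=0.1,
)
datagen.fit(x_train)           # computes the dataset mean used by featurewise_center

# Each batch yields newly transformed copies of the training images.
augmented_batches = datagen.flow(x_train, y_train, batch_size=32)
```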

A CNN is a multi-layer neural network composed of convolutional layers for feature extraction and sub-sampling layers for feature processing. The bottom layer of the network receives the input data and the top layer outputs the recognition result (see Fig. 2). The first layer type is the convolutional layer (Conv2D), which acts like a set of learnable filters. We define 32 filters for the first two Conv2D layers and 64 filters for the last two. Each filter transforms a part of the image (defined by the kernel size) using the kernel filter, and the kernel filter matrix is applied across the entire image. Filters can thus be seen as transformations of the image.


Fig. 2. Convolution Neural Network architecture layers.

Fig. 3. Experiment results.

The second important layer of a CNN is the pooling layer (MaxPool2D), which simply acts as a subsampling filter: it looks at neighboring pixels (a 2 × 2 window) and keeps the maximum value. Pooling layers are used to reduce computational cost and, to some extent, also to reduce overfitting. Dropout is a regularization method in which a proportion of the nodes in a layer are randomly ignored (by setting their weights to zero) for each training sample. This randomly removes units from the network and forces it to learn features in a distributed manner; the technique also improves generalization and reduces overfitting.


The activation functions commonly used in CNNs are ReLU, AbsVal, and Tanh. The rectified linear unit (ReLU) is the most frequently used, and it is the one adopted in our case study. The activation function adds non-linearity to the network.

3.3 Experiment Results
After preparing our dataset and applying all the necessary changes to obtain accurate results, we built the CNN model to predict new skin cancer samples, as shown in Fig. 4. All the layers of the CNN are implemented. The Python language is used to train the model; in fact, most big data solutions for pattern prediction use Python as the programming language because of its performance and efficiency [3].

Fig. 4. Model implementation.

In this part we trained our model with a batch size of 32 for 25 epochs. Using the proposed model we reach an accuracy of around 86% with insignificant loss (see Fig. 3 and Fig. 4), which can be considered good precision and accuracy for obtaining faithful results.
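The self-contained sketch below follows the description above: two Conv2D layers with 32 filters and two with 64, MaxPool2D, Dropout and ReLU, trained with a batch size of 32 for 25 epochs. The kernel sizes, dropout rates, dense-layer width, 64 × 64 × 3 input and the binary (malignant/benign) output head are assumptions rather than the authors' exact configuration, and random arrays stand in for the prepared data.

```python
# Minimal sketch of the CNN and training loop described above (assumed details noted inline).
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

# Placeholder arrays standing in for the prepared ISIC data (shapes are assumptions).
x_train = np.random.rand(256, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 2, size=(256,))
x_val = np.random.rand(64, 64, 64, 3).astype("float32")
y_val = np.random.randint(0, 2, size=(64,))

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(64, 64, 3)),
    Conv2D(32, (3, 3), activation="relu", padding="same"),
    MaxPool2D((2, 2)),
    Dropout(0.25),
    Conv2D(64, (3, 3), activation="relu", padding="same"),
    Conv2D(64, (3, 3), activation="relu", padding="same"),
    MaxPool2D((2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),   # binary head: malignant vs. benign (assumed)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Batch size 32 and 25 epochs, as stated in the text.
history = model.fit(x_train, y_train, batch_size=32, epochs=25,
                    validation_data=(x_val, y_val))
```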

4 Model Deployment

Although we obtain good accuracy with the proposed model, it still has to be validated and tested. In several studies, models are tested and validated through real applications. In our study, two applications were deployed to test the proposed CNN model and to facilitate its use:


4.1 Mobile Application
The mobile application we created can predict skin cancer from pictures captured by the phone camera. Once the picture is captured, the application shows the result of the prediction, which can be malignant or benign together with the corresponding type of skin cancer (see Fig. 5).

Fig. 5. The mobile application interface.

Fig. 6. Difference between dermoscopic image and clinical image.

Using this application remains very simple: you just have to take a dermoscopic photo (see Fig. 6 and Fig. 7) and then import it into the mobile application to be analyzed.
4.2 Web Application
The proposed model can also be used and tested through a web application that we created (see Fig. 8). To use it, you only need to upload a new image of the skin lesion; once the image is uploaded, the model analyzes it and returns its predictions, as seen in Fig. 8. The application was developed using TensorFlow 2.0, OpenCV, scikit-learn and the Flask framework [17].
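A minimal sketch of how such a Flask-based web endpoint could serve the trained model. The model file name, endpoint path and 64 × 64 input size are hypothetical, and the simple rescaling used here would be replaced by whatever preprocessing was applied at training time.

```python
# Minimal sketch of a prediction endpoint using Flask, OpenCV and a saved Keras model.
import cv2
import numpy as np
from flask import Flask, request, jsonify
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model("skin_cancer_cnn.h5")   # hypothetical saved model file

@app.route("/predict", methods=["POST"])
def predict():
    # Decode the uploaded image, resize it to the network input, scale to [0, 1].
    data = np.frombuffer(request.files["image"].read(), dtype=np.uint8)
    img = cv2.cvtColor(cv2.imdecode(data, cv2.IMREAD_COLOR), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (64, 64)).astype("float32") / 255.0
    prob = float(model.predict(img[np.newaxis, ...])[0][0])
    return jsonify({"malignant_probability": prob,
                    "prediction": "malignant" if prob >= 0.5 else "benign"})

if __name__ == "__main__":
    app.run(debug=True)
```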


Fig. 7. Smartphone accessory for dermoscopy.

Fig. 8. The Web application interface.

5 Future Work

In this section, we present some possible future research directions that we believe have the potential to generate a system that is more performant and efficient at skin cancer prediction. We aim to enhance the CNN algorithm and to work on its layers in order to propose a new model architecture that generates more accurate and faithful results. Working on the model itself is challenging, since existing deep learning algorithms already perform well at predicting patterns in many fields such as healthcare, education and agriculture, among others. The model will then be applied in many case studies for validation.


6 Conclusion

The rate of skin cancer has been increasing rapidly over the past decades; the need of the hour is to move toward performant systems that predict skin cancer with high accuracy and speed. The diagnosis of skin cancer is very challenging due to the variability in the appearance of skin lesions. Nowadays, deep learning shows its power in visual recognition tasks including medical diagnosis, gaming and object recognition. In this paper, we illustrate the efficiency and effectiveness of deep learning in predicting skin cancer. A convolutional neural network (CNN) is trained on a dataset from an open-access dermatology repository, the International Skin Imaging Collaboration (ISIC) Dermoscopic Archive. The model shows a categorical accuracy of around 86%, which can be considered good precision for obtaining faithful results. Two applications were created to test and validate the proposed model:
• a mobile application,
• and a web application.
Both applications are easy to use through their simple interfaces. Through these interfaces:
• you upload a picture of the skin,
• the image is analyzed by the proposed model,
• and finally, the predicted result is shown to the end user.

References 1. Prof. Bholane, S., Patil, S., Rajput, G., Patil, G., Gunjalkar, S.: Skin cancer prediction using image processing and deep learning. Int. Res. J. Eng. Technol. (IRJET) 07(02) (2020). (https:// ns67209122217.a2dns.com/archives/V7/i2/IRJET-V7I223.pdf). Accessed 2 Mar 2020 2. Jianfeng, H., Dong, Q., Yi, S.: Prediction of Skin Cancer Based on Convolutional Neural Network. pp. 1223–1229 (2019). (http://link.springer.com/10.1007/978-3-030-00214-5_150). Accessed 2 March 2020 3. Asri, H., Mousannif, H., Moatassime, H.A.: Real-time miscarriage prediction with SPARK. Procedia Comput. Sci. 113, 423–428 (2017) 4. Asri, H., Mousannif, H., Al Moatassime, H.: Comprehensive Miscarriage Dataset for an Early Miscarriage Prediction. Data Brief, 19 (2018) 5. Asri, H., H. Mousannif, H. Al Moatassime, H., Noel, T.: Big data in healthcare: challenges and opportunities. In: Proceedings of 2015 International Conference on Cloud Computing Technologies and Applications, CloudTech 2015 (2015) 6. Asri, H., Mousannif, H., Al Moatassime, H., Noel, T.: Big data analytics in healthcare: case study-miscarriage prediction. Int. J. Distrib. Syst. Technol. (IJDST) 10(4), 14 (2019) 7. Asri, H., Mousannif, H., Al Moatassim, H.: A Hybrid Data Mining Classifier for Breast Cancer Prediction, pp. 9–16 (2020) 8. Asri, H., Mousannif, Al Moatassime, H.: Reality mining and predictive analytics for building smart applications. J. Big Data 6(1), 66 (2019). (https://journalofbigdata.springeropen.com/ articles/10.1186/s40537-019-0227-y)


9. Asri, H., Mousannif, H., Al Moatassime, H., Noel, T.: Using machine learning algorithms for breast cancer risk prediction and diagnosis. Procedia Comput. Sci. 83, 1064–1069 (2016). (http://linkinghub.elsevier.com/retrieve/pii/S1877050916302575) 10. Chougrad, H., Zouaki, H., Alheyane, O.: Deep convolutional neural networks for breast cancer screening. Comput. Methods Programs Biomed. 157, 19–30 (2018) 11. (https://linkinghub.elsevier.com/retrieve/pii/S0169260717301451). March 2, 2020 12. Grassi, G., Grieco, L.A.: Object-oriented image analysis using the CNN universal machine: new analogic cnn algorithms for motion compensation, image synthesis, and consistency observation. IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 50(4), 488–99 (2003). (http://ieeexplore.ieee.org/document/1196447/) 13. ISIC 2020. Dermoscopedia. (https://dermoscopedia.org/Main_Page) 14. Learning, Deep. Deep Learning 简 介一、什么是Deep Learning ? 29, 1–73 (2019) 15. Mayer-Schönberger, V., and Kenneth, C.: Big Data : A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt (2013) 16. (https://books.google.fr/books?hl=fr&lr=&id=uy4lhWEhhIC&oi=fnd&pg=PP1&dq=big+ data+predictions&ots=Jsl4hgEPIN&sig=_BTyntsT3QMpVvRHDeJziSPn4Mg#v=one page&q&f=false) 17. Mufid, M.R., Arif Basofi, M. Al Rasyid, U.H., Rochimansyah, I.F.-H., Rokhim, A.: 2019. Design an MVC model using python for flask framework development. In: 2019 International Elec-tronics Symposium (IES), pp. 214–219. IEEE (2019). (https://ieeexplore.ieee.org/doc ument/8901656/) 18. Pham, T.-C., Luong, C.-M., Visani, M., Hoang, V.-D.: Deep CNN and Data Augmentation for Skin Lesion Classification, pp. 573–582 (2018). (http://link.springer.com/10.1007/978-3319-75420-8_54) 19. Salama, G.I., Abdelhalim, M.B., Zeid, MA.-e.: Breast cancer diagnosis on three different datasets, using multi- classifiers. Int. J. Comput. Inf. Technol. 01, 2277–3076 (2012) 20. Shin, H.-C., et al.: Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging, 35(5), 1285–1298 (2016). (http://ieeexplore.ieee.org/document/7404017/) 21. Weber, P., Tschandl, P., Sinz, C., Kittler, H.: Der-matoscopy of neoplastic skin lesions: recent advances, updates, and revisions. Curr. Treatment Options Oncol. 19(11), 56 (2018). (http:// link.springer.com/10.1007/s11864-018-0573-6) 22. Asri, H., Mousannif, H., Al Moatassime, H., Zahir, J.: Big data and reality mining in healthcare: promise and potential. In: International Conference on Image and Signal Processing, pp. 122–129. Springer, Cham, June 2020 23. He, J., Dong, Q., Yi, S.: Prediction of skin cancer based on convolutional neural network. In: International Conference on Mechatronics and Intelligent Robotics, pp. 1223–1229. Springer, Cham, May 2018 24. Tyagi, A., Mehra, R.: An optimized CNN based intelligent prognostics model for disease prediction and classification from Dermoscopy images. Multimedia Tools and Applications, pp. 1–19 25. Pham, T.C., Luong, C.M., Visani, M., Hoang, V.D.: Deep CNN and data augmentation for skin lesion classification. In: Asian Conference on Intelligent Information and Database Systems, pp. 573–582. Springer, Cham, March 2018 26. Jayalakshmi, G.S., Kumar, V.S.: Performance analysis of convolutional neural Network (CNN) based cancerous skin lesion detection system. In: 2019 International Conference on Computational Intelligence in Data Science (ICCIDS), pp. 1–6. IEEE, February 2019

Smart Earth Environment and Agriculture

Climate-Smart Landscapes for Sustainable Cities Canan Cengiz , Bülent Cengiz , and Aybüke Özge Boz(B) Faculty of Engineering, Architecture and Design, Department of Landscape Architecture, Bartın University, Bartın, Turkey {canancengiz,bcengiz,aboz}@bartin.edu.tr

Abstract. Landscape has a multifunctional structure. It is defined as a mosaic of ecological, social and socio-ecological processes and of the different uses of land based on the interactions between these processes. In this context, the landscape scale takes the different uses of land and their interactions into consideration in order to determine the synergies between multiple targets. Within the framework of the Climate-Smart Landscape approach, social, economic and environmental goals are considered alongside landscape actions and processes that encompass climate change mitigation and climate-compatible goals; further goals relate to the functioning of ecosystem services and the quality of life. In this scope, smart applications at the agricultural landscape scale, diversity of land use across the landscape for resilience, and management of land-use interactions for social, economic and ecological effects are defined as the characteristics of Climate-Smart Landscapes (CSLs). In this regard, integrated landscape management is important for obtaining sustainable landscapes. CSLs are multifunctional and operate in accordance with the principles of integrated landscape management, focusing on adaptation to climate change and the development of low-emission pathways. The gestion de terroirs (GT) approach, applied to set goals related to food production, biological diversity, the protection of ecosystems and rural livelihoods, supports and encourages CSL applications. The synergies between climate change and smart landscapes are of significant importance for the smart and sustainable city concept, both for research and for applications. Climate change adaptation gives priority to innovative design approaches that make smart use of resources while also ensuring technological integration. In this context, coastal resilience, rain gardens, biophilic design, xeriscape and climate-smart agriculture applications are examined in the present research through a climate-smart landscape design approach for sustainable cities, with a focus on innovative solutions. In conclusion, the contributions of CSLs to the climate change adaptation process are evaluated along the dimensions of reducing energy consumption, saving water, reducing the effects of climate change, reducing carbon emissions, providing human-environment interaction and increasing the quality of life. Keywords: Climate-smart landscapes · Smart city · Sustainable cities · Climate change · Innovative approaches



1 Climate-Smart Landscapes and Climate-Smart Landscape Approach

As a means of taking the various aspects of climate change into consideration, landscapes attract attention from both a political and a scientific perspective. Climate-smart landscapes operate in accordance with the principle of integrated landscape management, which creates synergy between climate adaptation in the target landscape and the mitigation of climate change impacts while also encompassing ecological, social and economic actions [1]. Landscapes have many climate-related functions (ecological, social and economic) and are governed by the principles of integrated landscape management, with a focus on reducing emissions, adapting to climate change and developing low-emission pathways [2]. As an example, Scherr et al. [1] characterize climate-smart landscapes by three primary features that build general resilience as well as adaptation to and mitigation of climate change [2]:
• Climate-smart practices at the field and farm scale, such as mixed crop-tree systems and integrated soil and nutrient management;
• Diversifying land uses across the landscape;
• Sustainable management of land-use interactions between fields, forests, grasslands and other land uses at the landscape scale.
According to Bernard et al. [3], since understanding processes at the landscape scale requires an understanding of interactions both inside and outside the landscape, working across more than one scale is a primary characteristic of climate-smart landscapes [2].
With regard to climate-smart landscapes, integrated landscape management requires long-term cooperation between stakeholders in order to reach the multiple goals demanded of the landscape. Since landscapes are interconnected socio-ecological systems, complexity and change are intrinsic features that require management. Typical goals include agricultural production; the provision of ecosystem services (such as water flow regulation and quality, pollination, climate change mitigation and adaptation, and cultural values); the protection of biodiversity, landscape beauty, identity and recreation value; and local livelihoods, human health and well-being. In this scope, stakeholders strive to resolve common problems and to make use of new opportunities that strengthen synergies across different landscapes [4]. This approach to attaining sustainable landscapes, which prioritizes cooperation among multiple stakeholders, is known as "Integrated Landscape Management" (ILM) [5]. The term "climate-smart" emerged as a development concept following the framing of sustainable landscape management practices with regard to both adaptation and their potential co-benefits [6].
Climate-Smart Landscape Approaches (CSLAs) focus on the spatial use of climate projections at the landscape scale, addressing both the risks of short-term extreme climate events and the mitigation of long-term climate change impacts, and they can help in understanding and defining potential climate-smart actions. In these processes it is important that the landscape scale is used to understand and work with the dynamics of both environmental and social change; the landscape scale should also be sensitive to social and cultural norms in order to identify locally suitable and acceptable approaches. A CSLA is generally characterized as [7]:
• Using the landscape scale to define and address more than one goal;
• Ensuring that at least one goal concerns climate change mitigation or adaptation, in accordance with the general goal of attaining multi-functionality;
• Cooperating with the relevant stakeholders, ideally through a participatory approach; and
• Implementing an iterative process supporting applied learning and, where appropriate, social learning.
Even though these points are less comprehensive than other definitions of landscape approaches, they are considered the minimum founding criteria for a CSLA.
Gestion de Terroir (GT) is of significant importance for the development of climate-smart landscape approaches; Fig. 1 presents the key stages of its implementation [2]. According to Cleary [8], the GT approach focuses on a socially and geographically defined area for meeting the needs of communities [2]. According to Bassett et al. [9], terroir is not only a physical geography concept but also the primary management unit of rural development, taking into account physical data as well as the socio-economic and cultural context; terroir represents the socio-natural heritage of a local community with its social organization and resource-use model [2]. In addition to being multisectoral, multidisciplinary and multi-stakeholder, the climate-smart landscape and GT approaches share a series of similarities, including the aim of reaching more than one goal (Table 1) [2].

Fig. 1. Key steps for implementation of GT [2].


Table 1. Key similarities and differences between the GT and climate-smart landscape approaches [2].

Feature | Gestion de terroir (GT) approach | Climate-smart landscape approach
Management unit | Terroir, often limited to the village or inter-village scale | The landscape
Land-use patterns | Homogenous (in the sense that farmers' practices are similar) | Heterogeneous in land uses and land-use patterns
Sectoral focus | Multisectoral (but with a strong bias towards agriculture) | Multisectoral
Field of expertise | Multidisciplinary | Multidisciplinary
Key objectives | Restoration and improvement of natural resources, soil fertility and food production; security of land rights; capacity-building of individuals and reinforcement and strengthening of local-level institutions at the terroir level | Human well-being; food and fiber production; climate change adaptation and mitigation; conservation of biodiversity and ecosystem services
Stakeholder involvement | Incorporation of local knowledge and identification of local priority concerns; involvement of local stakeholders in planning, development and decision-making processes; empowerment of local communities through training and education | Emphasis on participatory processes, multi-stakeholder negotiation and recognition of local communities
Integration of actions | Drawing on synergies between actions to more efficiently generate benefits | Ecological, social and economic interactions among different parts of the landscape managed to seek positive synergies among interests and actors or reduce negative trade-offs
Flexibility | Accommodates changing needs; iterative process | Promotion of adaptive strategies based on dynamic social and economic changes
Linkages with other scales | Focus at the micro level | Linkages between the micro, meso and macro levels


2 Climate-Smart Landscapes with Regard to Sustainable and Smart Cities

Sustainable and smart cities are innovative cities that focus on sustainable development: they meet the needs of today without compromising the needs of future generations, develop smart solutions to achieve savings under changing environmental conditions, and improve the quality of life. Dhingra and Chattopadhyay [10] state that smart and sustainable cities have aims that can be attained in an adaptable, reliable, scalable, accessible and resilient manner, such as [11]:
• Improving the quality of life of their citizens;
• Ensuring economic growth with better employment opportunities;
• Improving the well-being of their citizens by ensuring access to social and community services;
• Establishing an environmentally responsible and sustainable approach to development;
• Ensuring efficient delivery of basic services and infrastructure such as public transportation, water supply and drainage, telecommunication and other utilities;
• The ability to address climate change and environmental issues; and
• Providing an effective regulatory and local governance mechanism ensuring equitable policies.

3 Climate-Smart Landscape Design Approaches for Sustainable Cities

Greater integration of climate-smart landscape design approaches into urban design is of significant importance for the climate change adaptation process within the scope of the aims of sustainable and smart cities. The climate-smart landscape design approach makes important contributions to adapting to the many effects of climate change, such as rising urban temperatures, increasing energy demand, more frequent storms, reduced agricultural yields, loss of biodiversity and a decrease in the quality of human life. In this regard, coastal resilience, rain gardens, biophilic design, xeriscape and climate-smart agriculture applications are examined in this study through a climate-smart landscape design approach for sustainable cities, with a focus on innovative solutions.

3.1 Coastal Resilience
Extreme weather events such as heavy rains, storms and tornadoes, together with rising sea levels and heat waves resulting from climate change, threaten the social, spatial and biological systems of today and tomorrow [12]. Globally, people face severe risks from natural hazards, especially in coastal regions with high population densities, and the infrastructure of coastal communities is becoming ever more vulnerable to natural disasters as sea levels rise due to global warming. Natural habitats are of significant importance for coastal resilience, since they can act as a force of recovery in the face of losses of coastal functionality [13]. Worldwide, coastal populations face growing risks from inundation and erosion due to rising sea levels [13]. According to Hallegatte et al. [14], Woodruff et al. [15], Reed et al. [16], Wood et al. [17] and Jones [18], thousands of people die and hundreds of coastal communities are destroyed by sea-level rise (SLR) coupled with tsunamis, hurricanes and other hazards [13]. According to Claudia et al. [19], Hallegatte et al. [14], Woodruff et al. [15] and Xu et al. [20, 21], the increased risk of natural hazards results in life-threatening flooding, destruction of infrastructure, and the decline of economic and ecological systems in coastal urban areas [13]. According to Cutter and Finch [22] and Nicholls and Cazenave [23], this is critical for low-lying coastal regions, especially in developing nations that are ill-equipped to deal with current and future climate change [13]. According to Bijlsma et al. [24] and Klein et al. [25], sustainability requires survival in the face of both expected and unexpected conditions; hence it is a smart approach to consider the potential to improve the resilience of the system in addition to its resistance to change. Resistance is defined as the ability to stop (or resist) change, while resilience is the ability of the system to organize itself and withstand change [26]. A more resistant system will, by its nature, be more flexible and sustainable. Climate-induced effects have created the need to develop both conceptual and practical approaches for effectively integrating the physical, social, ecological and economic processes at the coastlines [26]. Coastal resilience, considered in its environmental and technological dimensions, is a sustainable and smart approach that is adaptable, characterized by its capacity for self-organization, and has taken center stage in minimizing the effects of climate change on coastlines. Studies are being conducted to develop sustainable planning and design ideas with adaptation goals for creating spaces that are more resistant to rising sea levels and floods.


• According to Jaber et al. [32], decreasing the amount of surface runoff reaching the wastewater channel and reducing the rate of surface flow;
• According to Doğangönül and Doğangönül [33], recharging groundwater, providing solutions for drainage problems in the areas where they are applied, and making aesthetic contributions to cities;
• According to the Department of Environmental Protection Bureau of Watershed Management [34], increasing evapotranspiration (transpiration + evaporation);
• According to Demir [35], cleansing the surface flow of pollutants (oil, heavy metals, etc.) and thus increasing water quality (pollution in receiving waters can be reduced by 30%); and
• Contributing to urban ecology by forming habitats for urban fauna.

3.3 Biophilic Design
The biophilia hypothesis is grounded in biological science and human needs: from a biological standpoint, humans are part of nature and attune themselves to environmental conditions [36]. Improving urban quality of life, adapting to climate change and ensuring sustainability are among the primary subjects on which both biophilic cities and smart cities focus, although environmental sustainability often remains in the background for smart cities. Integrating biophilic cities with smart cities can fill this gap by adding green areas and surfaces at the building, neighborhood and city scale, thus reducing environmental problems [37]. The power and value of biophilic urbanism is apparent as a strategy for urban resilience and sustainability. Biophilic investments such as trees, urban forests, wetlands and river systems are among the dynamics that increase urban resilience. These dynamics help to build adaptive capacity against the "perfect storm" that global cities may face from heat waves, droughts and other effects of climate change and global warming, as well as a series of potential resource shocks such as natural disasters and water and food scarcity [38]. Biophilic design supports low-energy buildings that mitigate climate change, while also contributing to the diversity and protection of species and to human comfort and mental, emotional and social health in the face of climate change [39]. The biophilic design approach also has various potential benefits in extremely cold climates, and it is among the contemporary approaches for improving positive relations and interactions between humans and nature in built environments [36].

3.4 Xeriscape
As water resources decline, new methods have been sought for their effective use. The fact that water consumption has reached significant levels, especially in outdoor areas such as parks and gardens, has made it necessary to develop new forms of landscaping in which a minimum amount of water is used more effectively. In this regard, new landscaping concepts that differ from the classical understanding of landscaping have been developed under the general heading of "Water-Efficient Landscaping", such as "Water-Wise", "Water-Smart", "Low-Water" and "Natural Landscaping". Even though each of these concepts differs slightly in its philosophy and approach, all are based on the same primary principles and may generally be used interchangeably. "Xeriscape" is one of the first conceptual approaches developed through the formulation of these principles [40]. In addition to saving water, xeriscape produces landscapes that are more resilient to disturbances resulting from drought and water scarcity, and it enables the development of high-quality landscapes through ecological landscaping applications [41]. Xeriscape is based on seven principles for using water more efficiently and for saving time and money [41, 42]: planning and design, soil improvement, low-water-use plants, limited grass areas, efficient irrigation, mulch and proper maintenance.

3.5 Climate-Smart Agriculture
The integrated landscape approach provides a strategy for reaching climate-smart agriculture goals. Through climate-smart agricultural landscapes, coordinated actions can be carried out at the farm and landscape scales to reduce the impacts of agricultural production, support adaptation to climate change, and develop important synergies with other sources of income and environmental goals [1]. CSA aims to reduce the exposure of farmers to short-term risks and to improve their resilience by strengthening their capacity to adapt to long-term stressors. The protection of ecosystem services is important for sustaining production and adapting to changes in climate [43] (Fig. 2).

Fig. 2. Framework for Climate-Smart Agriculture (CSA) [44].

The key characteristics of CSA are that it addresses climate change, integrates multiple goals and manages trade-offs, maintains ecosystem services, has multiple entry points at different levels, and is context specific [43].


As Leslie Lipper et al. put it in Nature Climate Change, "CSA is an approach for transforming and reorienting agricultural systems to support food security under the new realities of climate change". The approach was developed in 2010 by the United Nations Food and Agriculture Organization (FAO) in response to the need to transform agricultural development in the face of climate change related difficulties, and it addresses these difficulties in an inclusive manner. According to the Intergovernmental Panel on Climate Change (IPCC), a temperature increase of 2 °C may affect current agricultural practices and yields by as much as fifteen percent, and the FAO states that 60% more food will be required by 2050 in order to meet the growing demand [45]. The CSA approach is designed to define and operationalize sustainable agricultural development within the parameters of climate change. It integrates the three dimensions of sustainable development (economic, social and environmental) by addressing food security and climate challenges together [46]. It is composed of three main pillars [46]:
1. Sustainably increasing agricultural productivity and incomes;
2. Adapting and building resilience to climate change;
3. Reducing and/or removing greenhouse gas emissions, where possible.
In short, changes in rainfall and extreme weather events, together with changes in water resources, rising sea levels and salinization, significantly affect irrigation and thereby have an adverse impact on food production. In this regard, CSA applications play a primary role in adaptation and mitigation.

4 Discussion

Climate-compatible planning and design approaches, and the interactions between them, are important for sustainability because they reduce the negative effects on landscapes that are being changed and transformed by climate change. Climate-smart landscapes are evaluated and characterized in terms of social, ecological, technological and environmental dimensions with regard to adaptation, mitigation and resilience. The climate-smart landscape is a promising approach for encouraging sustainable transformations by establishing climate-resilient pathways [7]. Since the effects of climate change will continue to increase, it is especially important for cities to adopt climate-compatible approaches and applications in order to reduce these negative effects and support the adaptation process. If such approaches are not implemented, climate change related landscape transformations will take place, with inevitable adverse effects on ecosystem services, natural and cultural resources, and the quality of life.


References 1. Scherr, S.J., Shames, S., Friedman, R.: From climate-smart agriculture to climate-smart landscapes. Agric. Food Secur. 1(12), 1–15 (2012). https://doi.org/10.1186/2048-7010-1-12 2. Bernard, F.: What can climate-smart agricultural landscapes learn from the gestion de terroirs approach? In: Minang, P.A., van Noordwijk, M., Freeman, O.E., Mbow, C., de Leeuw, J., Catacutan, D. (eds.) Climate-Smart Landscapes: Multifunctionality In Practice, pp. 51–61. World Agroforestry Centre (ICRAF), Nairobi, Kenya (2015) 3. Bernard, F., Minang, P.A., van Noordwijk, M., Freeman, O. E., Duguma, L.A. (eds.).: Towards a landscape approach for reducing emissions: substantive report of Reducing Emissions from All Land Uses (REALU) project. World Agroforestry Centre (ICRAF), Nairobi, Kenya (2013) 4. Scherr, S.J., Shames, S., Friedman, R.: Defining integrated landscape management for policy makers. Ecoagriculture Policy Focus 10, 1–6 (2013) 5. Shames, S.A., Heiner, K., Scherr, S.J.: Public policy guidelines for integrated landscape management. EcoAgriculture Partners, and Landscapes for people, Food and Nature, Washington, DC, USA (2017). https://ecoagriculture.org/publication/public-policy-guidelines-forintegrated-landscape-management/. Accessed 09 Sep 2020 6. Gichenje, H., Godinho, S.: A climate-smart approach to the implementation of land degradation neutrality within a water catchment area in Kenya. Climate 7(12), 136 (2019). https:// doi.org/10.3390/cli7120136 7. Freeman, O.E.: Characterising multifunctionality in climate-smart landscapes. In: Minang, P.A., van Noordwijk, M., Freeman, O.E., Mbow, C., de Leeuw, J., Catacutan, D. (eds.) ClimateSmart Landscapes: Multifunctionality in Practice, pp. 37–49. World Agroforestry Centre (ICRAF), Nairobi, Kenya (2015) 8. Cleary, D.: People-centred Approaches: A Brief Literature Review and Comparison of Types. Food and Agriculture Organization of the United Nationas (FAO), Italy, Rome (2003) 9. Bassett, T.J., Blanc-Pamard, C., Boutrais, J.: Constructing locality: the terroir approach in west Africa. Africa 77, 104–129 (2007). https://doi.org/10.3366/afr.2007.77.1.104 10. Dhingra, M., Chattopadhyay, S.: Advancing smartness of traditional settlements-case analysis of Indian and Arab old cities. Int. J. Sustain. Built Environ. 5(2), 549–563 (2016) 11. Trindade, E.P., Hinnig, M.P.F., Moreira da Costa, E., Marques, J.S., Bastos, R.C., Yigitcanlar, T.: Sustainable development of smart cities: a systematic review of the literature. J. Open Innovation: Technol. Market Complex. 3(3), 11 (2017). https://doi.org/10.1186/s40852-0170063-2 12. IPCC.: Climate Change 2007: Impacts, adaptation and vulnerability. In: Parry, M.L., Canziani, O.F., Palutikof, J.P., van der Linden, P.J., Hanson C.E. (eds.) Contribution of Working Group II to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, p. 976. Cambridge University Press, Cambridge, UK (2007) 13. Sajjad, M., Li, Y., Tang, Z., Cao, L., Liu, X.: Assessing hazard vulnerability, habitat conservation, and restoration for the enhancement of mainland China’s coastal resilience. Earth’s Future 6, 326–338 (2018). https://doi.org/10.1002/2017EF000676 14. Hallegatte, S., Green, C., Nicholls, R.J., Corfee-Morlot, J.: Future flood losses in major coastal cities. Nat. Climate Change 3(9), 802–806 (2013). https://doi.org/10.1038/nclimate1979 15. Woodruff, J.D., Irish, J.L., Camargo, S.J.: Coastal flooding by tropical cyclones and sea-level rise. Nature 504(7478), 44–52 (2013). 
https://doi.org/10.1038/nature12855 16. Reed, A.J., Mann, M.E., Emanuel, K.A., Lin, N., Horton, B.P., Kemp, A.C., Donnelly, J.P.: Increased threat of tropical cyclones and coastal flooding to New York City during the anthropogenic era. Proc. Nat. Acad. Sci. USA 112(41), 12610–12615 (2015). https://doi.org/10. 1073/pnas.1513127112


17. Wood, N.J., Jones, J., Spielman, S., Schmidtlein, M.C.: Community clusters of tsunami vulnerability in the US Pacific Northwest. Proc. Nat. Acad. Sci. USA 112(17), 5354–5359 (2015). https://doi.org/10.1073/pnas.1420309112 18. Jones, B.: Natural disasters: cities build their vulnerability. Nat. Climate Change 7(4), 237–238 (2017). https://doi.org/10.1038/nclimate3261 19. Claudia, T., Benjamin, H.S., Chris, E.Z.: Modelling sea level rise impacts on storm surges along US coasts. Environ. Res. Lett. 7(1) (2012) 20. Xu, B., Feng, G., Li, Z., Wang, Q., Wang, C., Xie, R.: Coastal subsidence monitoring associated with land reclamation using the point target based SBAS-InSAR method: A case study of Shenzhen, China. Remote Sens. 8, 652 (2016). https://doi.org/10.3390/rs8080652 21. Xu, L., He, Y., Huang, W., Cui, S.: A multi-dimensional integrated approach to assess flood risks on a coastal city, induced by sea-level rise and storm tides. Environ. Res. Lett. 11 (2016). https://doi.org/10.1088/1748-9326/11/1/014001 22. Cutter, S.L., Finch, C.: Temporal and spatial changes in social vulnerability to natural hazards. Proc. Nat. Acad. Sci. USA 105(7), 2301–2306 (2008). https://doi.org/10.1073/pnas.071037 5105 23. Nicholls, R.J., Cazenave, A.: Sea-level rise and its impact on coastal zones. Science 328(5985), 1517–1520 (2010). https://doi.org/10.1126/science.1185782 24. Bijlsma, L., Ehler, C.N., Klein, RJ.T., Kulshrestha, S.M., McLean, R.F., Mimura, N., Nicholls, RJ., Nurse, L.A., Perez Nieto, H., Stakhiv, E.Z., Turner, R.K., Warrick, R.A.: Coastal zones and small islands. In: Watson, R.T., Zinyowera, M.C., Moss, R.H. (eds.) Impacts, adaptations and mitigation of climate change: scientfic-technical anases, pp. 289–324, Cambridge (1996) 25. Klein, R.J.T., Smit, M.J., Goosen, H., Hulsbergen, C.H.: Resilience and vulnerability: coastal dynamics or Dutch dikes? Geogr. J. 164(3), 259–268 (1998). https://doi.org/10.2307/3060615 26. Nicholls, R.J., Branson, J.: Coastal resilience and planning for an uncertain future: an introduction. Geogr. J. 164(3), 255–258 (1998) 27. Kristvik, E., Kleiven, G.H., Lohne, J., Muthanna, M.T.: Assessing the robustness of raingardens under climate change using SDSM and temporal downscaling. Water Sci. Technol. 77(6), 1640–1650 (2018). https://doi.org/10.2166/wst.2018.043 28. Barbosa, A.E., Fernandes, J.N., David, L.M.: Key issues for sustainable urban stormwater management. Water Res. 46(20), 6787–6798 (2012). https://doi.org/10.1016/j.watres.2012. 05.029 29. Ek¸si, M., Yılmaz, M., Özden, Ö.: Ya˘gmur bahçelerinin nicel de˘gerlendirilmesi: ˙Istanbul Üniversitesi Orman Fakültesi örne˘gi. J. Faculty Eng. Archit. Gazi Univ. 31(4), 1113–1123 (2016). https://doi.org/10.17341/gazimmfd.278467 30. Jennings, A.A.: Residential rain garden performance in the climate zones of the contiguous United States. J. Environ. Eng. 142(12) (2016). https://doi.org/10.1061/(asce)ee.1943-7870. 0001143 31. Müftüo˘glu, V., Perçin, H.: Sürdürülebilir kentsel ya˘gmur suyu yönetimi kapsaminda ya˘gmur bahçesi. ˙Inönü Üniversitesi Sanat ve Tasarım Dergisi 5(11), 27–37 (2015). https://doi.org/10. 16950/std.34364 32. Jaber, F., Woodson, D., LaChance, C., York, C.: Stormwater management: Rain gardens. The Department of Soil and Crop Sciences and Texas A&M AgriLife Communications, The Texas A&M System, USA (2012) 33. Do˘gangönül, Ö., Do˘gangönül, C.: Küçük ve orta ölçekli ya˘gmursuyu kullanımı. 2. Baskı. Teknik Yayınevi, Ankara (2008) 34. 
Department of Environmental Protection Bureau of Watershed Management.: Pennsylvania Stormwater Best Management Practices Manual. USA (2006)


35. Demir, D.: Konvansiyonel ya˘gmursuyu yönetim sistemleri ile sürdürülebilir ya˘gmur-suyu yönetim sistemlerinin kar¸sıla¸stırılması: ˙ITÜ Ayaza˘ga Yerle¸skesi örne˘gi. Yüksek Lisans Tezi, ˙Istanbul Teknik Üniversitesi Fen Bilimleri Enstitüsü, Çevre Mühendisli˘gi Anabilim Dalı, Çevre Bilimleri ve Mühendisli˘gi Programı, ˙Istanbul (2012) 36. Parsaee, M., Demers, C.M., Hébert, M., Lalonde, J.F., Potvin, A.: A photobiological approach to biophilic design in extreme climates. Build. Environ. 154, 211–226 (2019). https://doi.org/ 10.1016/j.buildenv.2019.03.027 37. Boz, A.Ö., Cengiz, C.: Biophilic smart cities in ecological sustainability – smart city interaction. In: Kaya, L.G. (ed.) New Horizons in Architecture, Planning and Design, pp. 32–51. Gece Publishing, Ankara (2019) 38. Beatley, T., Newman, P.: Biophilic cities are sustainable, resilient cities. Sustainability 5, 3328–3345 (2013). https://doi.org/10.3390/su5083328 39. Africa, J., Heerwagen, J., Loftness, V., Ryan Balagtas, C.: Biophilic design and climate change: performance parameters for health. Front. Built Environ. 5 (2019). https://doi.org/ 10.3389/fbuil.2019.00028 40. Çorbacı, Ö.L., Özyavuz, M., Yazgan, M.E.: Peyzaj mimarlı˘gında suyun akıllı kullanımı: Xeriscape. Tarım Bilimleri Ara¸stırma Dergisi 4(1), 25–31 (2011) 41. Welsh, D.F.: Xeriscape North Carolina. National Zeriscape Council, USA (2000) 42. Özyavuz, A., Özyavuz, M.: Xeriscape in landscape design. In: Özyavuz, M. (ed.) Landscape Planning, pp. 353–360. IntechOpen (2012). https://doi.org/10.5772/38989 43. Dwivedi, A., Naresh, R., Kumar, R., Kumar, P., Kumar, R.: Climate Smart Agriculture, pp. 20– 42. Parmar publishers and distributors, Dhanbad, Jharkhand (2017) 44. Shedekar, V.: Decision tools for climate-smart agriculture. In: Verma, D., Shedekar, V., Murumkar, A., Sharma, R., Kumar, A., Rani, V., Yusuf, F., Brajendra. (eds.) Climate Smart Agriculture: Training Manual, Certified Training workshop during 2nd International Conference on Food & Agriculture 2018, pp. 23–34. Dhanbad, Jharkhand (2018) 45. Campbell, B.M.: Climate-smart agriculture - what is it? Rural 21 Int. J. Rural Dev. 51(4), 14–16 (2017) 46. Palombi, L., Sessa, R.: Climate-smart agriculture: Sourcebook. Food and Agriculture Organization of the United Nations (FAO) (2013). http://www.fao.org/docrep/018/i3325e/i3325e. pdf. Accessed 11 Sep 2020

Diversity and Seasonal Occurrence of Sand Flies and the Impact of Climatic Change in Aichoune Locality, Central Morocco

Fatima Zahra Talbi6(B), Mohamed Najy2, Mouhcine Fadil3, Nordine Nouayti4, and Abdelhakim El Ouali Lalami1,5

1 Laboratory Biotechnology and Preservation of Natural Resources, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, 30000 Fez, Morocco
2 Laboratory of Agro-Physiology, Biotechnology, Environment and Quality, Department of Biology, University Ibn Tofail, Faculty of Science, BP 133, 14000 Kenitra, Morocco
3 Physico-Chemical Laboratory of Inorganic and Organic Materials, Materials Science Center (MSC), Ecole Normale Supérieure, Mohammed V University in Rabat, Rabat, Morocco
4 Applied Sciences Laboratory, Water and Environmental Engineering Team, National School of Applied Sciences, Abdelmalek Essaadi University, Tetouan, Morocco
5 Higher Institute of Nursing Professions and Health Techniques of Fez, Regional Health Directorate Fez-Meknes, EL Ghassani Hospital, 30000 Fez, Morocco
6 Laboratory of Biochemistry, Neurosciences, Natural Resources and Environment, Faculty of Sciences and Technologies, Hassan First University, BP 577, Settat, Morocco
[email protected]

Abstract. Cutaneous leishmaniasis is endemic in Morocco. The commune of Tazouta (locality of Aichoune) has been reported as a leishmaniasis risk area. Studying the impact of climatic factors on the epidemiology of leishmaniasis has become essential given the climate change experienced by the various leishmaniasis foci: changes in any of these parameters contribute to the creation of optimal ecological niches for the biological life of the vectors. We therefore undertook this entomological study to better understand the distribution and epidemiology of the disease. The entomological survey was carried out in the locality of Aichoune for one year, from October 2013 to September 2014, using sticky traps. The results showed the predominance of Phlebotomus sergenti (80.08%), followed by Ph. perniciosus (11.63%), Ph. papatasi (6.69%) and Ph. longicuspis (1.43%), although Ph. sergenti was active for only six months. These results could explain the increased incidence of cutaneous leishmaniasis caused by L. tropica in the study area. Ph. perniciosus and Ph. longicuspis also exhibited a wide distribution and a long period of activity, indicating a potentially high risk of transmission of Leishmania infantum. The highest abundance of sand flies was recorded in June, when the mean temperature reached 25.8 °C. The results show a positive correlation between temperature and the four vector species and a negative correlation with precipitation. These results could help the authorities prevent the risk of leishmaniasis; indeed, medium-term climate forecasts are essential tools for developing a leishmaniasis warning system.


Keywords: Aichoune · Entomological survey · Density · Precipitation · Sand flies · Seasonal occurrence · Temperature

1 Introduction

Leishmaniasis is one of the emerging diseases posing a serious public health problem. Since the beginning of the 1970s, the leishmaniasis situation in Morocco has been worrying, which led to a Franco-Moroccan research program studying the different leishmaniasis foci by trapping sand flies along three meridian transects, from the Rif to the Sahara. Currently, the leishmaniases are serious diseases that must be taken into consideration and reported promptly (ministerial decree no. 683-95 of March 31st, 1995), and they still constitute a real public health problem [1]. The two observed clinical entities, visceral and cutaneous, are widely distributed throughout the territory. These diseases occur as three complex epidemiological entities: zoonotic cutaneous leishmaniasis (ZCL) caused by Leishmania major (L. major), anthroponotic cutaneous leishmaniasis (ACL) caused by Leishmania tropica (L. tropica), and visceral leishmaniasis (VL) caused by Leishmania infantum (L. infantum). Leishmania parasites are transmitted by sand flies: L. major is transmitted by Phlebotomus (Phlebotomus) papatasi (Scopoli), L. tropica by Phlebotomus (Paraphlebotomus) sergenti Parrot [2], and for L. infantum, Ph. ariasi, Ph. perniciosus and Ph. longicuspis are the usual vectors [3, 4]. Sand flies are small insects belonging to the order Diptera, family Psychodidae, subfamily Phlebotominae. Their development passes through the egg, four larval instars, the pupa and the adult. In general, the habitat of sand flies is conditioned by three closely related vital needs: a vertebrate host on which the females must feed, a dark, humid and temperate place in which egg laying takes place, and an appropriate larval nutrient medium [5]. It is known that several biotic and abiotic variables affect the seasonal distribution of sand flies. Climate, urbanization, proximity of humans and domestic animals, organic matter in the soil, and vegetation type all play a significant role in the distribution and abundance of sand fly populations [6]. Sand fly activity is reduced or suspended by excessive heat, cold, rain and, more importantly, by wind [7]. The structure of leishmaniasis foci depends on the size of the sand fly population. The impact of climatic factors on the epidemiology of diseases in general, and of leishmaniasis in particular, continues to be the subject of research and analysis. Changes in any of these parameters contribute to the creation of ecological conditions favorable to the multiplication of the vectors of certain diseases. It is therefore with the entomological aspect that it is appropriate to begin studying the functioning of the foci previously delimited by the census of human cases. In our case, the study area was selected from the sectors with a high incidence of CL cases in the Province of Sefrou. In order to determine the effect of temperature and precipitation on the dynamics of leishmaniasis vectors, a study of the seasonal fluctuation of the sand fly population and its relationship with climatic factors was carried out from October 2013


to September 2014 in the sector of Tazouta (locality of Aichoune). Leishmania parasites are transmitted from one mammal to another (including humans) by the bite of an infected female sand fly [8]. Knowledge of the impact of climatic factors on the epidemiology of leishmaniasis, together with the distribution of the sand fly population, can contribute to efficient control methods. This study will therefore be of great interest to the hygiene and sand fly surveillance services of the Ministry of Health.

2 Materials and Methods

2.1 The Study Zone

This study was carried out in the locality of Aichoune (33°39′N, 04°38′W) within the Tazouta sector, Province of Sefrou, region of Fès-Meknes (Fig. 1). The locality lies northwest of the Middle Atlas Mountains and is a recognized endemic focus of L. tropica [9]. Aichoune has a semi-arid climate with an average annual rainfall of about 450 mm. The average altitude is 750 m.

Fig. 1. Presentation of Province Sefrou, Commune of Tazouta (Aichoun locality)

2.2 Period of Entomological Surveys

Entomological and bioecological monitoring was carried out for 12 months, from October 2013 to September 2014, on a bimonthly basis.


2.3 Criterion for Choosing the Biotope

Several types of biotopes were prospected during this study. To choose them, we looked for conditions appropriate for the development of the pre-imaginal stages (namely sites rich in organic matter, with high humidity and free of factors limiting proliferation), which also serve as resting places for adult vectors.

2.4 Sand Fly Capture Techniques

We used sticky traps, the method best adapted to the qualitative and quantitative recording of sand flies in the Mediterranean region and the one most used in Morocco to obtain dead sand flies. Sheets of white paper (25 × 20 cm) coated with castor oil were used; A4 paper (29.7 × 21 cm) proved less suitable. The traps were placed in the evening, at sunset, in the places where sand flies rest or reproduce, and removed the next morning. After the papers were stripped, the recovered insects were immediately placed in small tubes containing 70% alcohol, labeled with the date and the capture station.

2.5 Identification of Sand Flies

After dissection of the specimens, the species were identified under a microscope using the determination key established in the guide of the Ministry of Health [10–12].

2.6 Entomological Data Analyses

Recognized ecological indices were used to carry out a quantitative and qualitative evaluation of the collected data and to compare the composition of the different stations in the locality of Aichoune.

• Sex ratio. The sex ratio is the number of males divided by the number of females.

• Relative frequency. The ratio of the number of trapped individuals of a species to the total number of individuals in the stand is the relative frequency of that species.

Relative frequency (Rf) = (ni × 100) / N    (1)

where ni is the number of individuals of species i and N is the total number of collected sand flies.


• Density of sand flies. This index is calculated from the sticky-trap sampling and is expressed as the number of individuals of the species concerned at the station per m2 of trap and per night of trapping. When calculating the trapped area, it must be taken into account that both sides of the trap are functional. Calculating the density of sand flies allows us to detect seasonal fluctuations in the density of each species in a given biotope and to know the duration of its occurrence. A short computational sketch of these indices is given at the end of this section.

Density (D) = Number of sand flies captured / (Trapped area (m2) × Number of nights of capture)    (2)

where: Trapped area = Trap area × Number of traps used × 2.

2.7 Study of Meteorological Parameters

Throughout the study year, the ambient temperature (°C) was measured regularly with a thermometer, and two data loggers were installed and programmed to record the temperature in the study area throughout the year. Each month, these records were retrieved from the data loggers and the monthly averages were computed.

2.8 Statistical Analysis

All analyses were carried out with SPSS (version 20.0) and Excel (version 2010). Principal component analysis (PCA) was used to look for correlations between temperature, precipitation and the collected species, and to identify groups of similar months; the PCA was performed with Unscrambler software (version 9.7).
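To make the three indices above concrete, the following Python sketch (not part of the original study, which used SPSS and Excel) implements the sex ratio, the relative frequency of Eq. (1) and the trap density of Eq. (2); the counts and trap dimensions in the example are hypothetical placeholders.

```python
# Illustrative sketch of the ecological indices described in Sect. 2.6:
# sex ratio, relative frequency (Eq. 1) and trap density (Eq. 2).
# All numbers in the example are made-up placeholders, not study data.

def sex_ratio(males: int, females: int) -> float:
    """Sex ratio = number of males / number of females."""
    return males / females

def relative_frequency(n_i: int, n_total: int) -> float:
    """Eq. (1): Rf = ni * 100 / N, expressed in percent."""
    return n_i * 100 / n_total

def density(n_captured: int, trap_area_m2: float, n_traps: int, n_nights: int = 1) -> float:
    """Eq. (2): sand flies per m2 of trap and per night of trapping.
    Both sides of each sticky sheet are functional, hence the factor 2."""
    trapped_area = trap_area_m2 * n_traps * 2
    return n_captured / (trapped_area * n_nights)

if __name__ == "__main__":
    # Hypothetical example: 40 specimens of one species out of 120 collected,
    # caught on 10 sheets of 25 cm x 20 cm (0.05 m2 each) over one night.
    print(round(sex_ratio(males=30, females=10), 2))             # 3.0
    print(round(relative_frequency(40, 120), 2))                 # 33.33
    print(round(density(40, trap_area_m2=0.05, n_traps=10), 2))  # 40.0
```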

3 Results and Discussion

3.1 Characteristics of the Study Population

• Diversity and relative frequency of sand flies. The catches in the locality of Aichoune show that the distribution of the species varies from one station to another. During the study period, the sticky-trap captures generally contained more males than females, a result well documented by other studies, which confirmed the predominance of males over females [13, 14]. Of a total of 1883 specimens collected from the four study stations, 230 were females (12.21%) and 1653 were males (87.78%). Three subgenera of the genus Phlebotomus and one subgenus of the genus Sergentomyia were encountered during the study period; in total, five species were identified (Table 1).


Table 1. Systematic inventory of collected sand flies by sticky traps at Aichoun locality between October 2013 and September 2014.

Genus          Subgenus          Species           N      F     M      Sex ratio   Rf (%)
Phlebotomus    Phlebotomus       Ph. papatasi      126    27    99     3.6         6.69
Phlebotomus    Larroussius       Ph. perniciosus   219    59    160    2.71        11.63
Phlebotomus    Larroussius       Ph. longicuspis   27     19    8      0.42        1.43
Phlebotomus    Paraphlebotomus   Ph. sergenti      1508   122   1386   11.36       80.08
Sergentomyia   –                 S. minuta         3      3     0      0           0.15
Total                                              1883   230   1653   7.18        100

N: Total number, F: Female, M: Male, Rf: Relative frequency

A comparative study of the sex ratio by species shows that males outnumber females in all species of the genus Phlebotomus except Ph. longicuspis (0.42). Of the 23 species recorded in Morocco, five were identified in the study region. If we compare our results with those found at the Ouled Aid focus of cutaneous leishmaniasis, a neighbouring locality in the Province of Sefrou [15, 16], we can deduce that the faunistic inventory we obtained is the same, but with different densities. In the subgenus Larroussius, we recorded two species, Ph. longicuspis and Ph. perniciosus, which are proven vectors of L. infantum [17]. In the subgenus Paraphlebotomus, we identified only one species, Ph. sergenti, the proven vector of L. tropica in Morocco [2]; this species occurs in arid and semi-arid areas [18]. Ph. papatasi was the only species of the subgenus Phlebotomus in our collection; it is the proven vector of L. major in Morocco [17]. The last species, belonging to the genus Sergentomyia, is S. minuta, which was poorly represented; species of this genus are not of medical interest for humans [19]. Of the vector species described above, Ph. sergenti, Ph. perniciosus, Ph. longicuspis and Ph. papatasi were found in this study. Surveys carried out at Tazouta (locality of Aichoune) in 2011 and 2012 showed that almost all the leishmaniasis vector species persist, still with a predominance of Ph. sergenti [20, 21]. Several studies have also shown that these species are the most common sand flies in Morocco [22, 23].

• Seasonal occurrence of sand flies. The density of each sand fly species and its seasonal fluctuations are illustrated in Fig. 2. In Aichoune, the seasonal activity of sand flies extended from May to October (6 months). This period corresponds to the activity period of sand flies in temperate zones [24]. The majority of sand flies, dominated by Ph. sergenti, were collected between May and August. Ph. sergenti shows a biphasic evolution, with a first, larger peak in June and a second peak in August. The same species


Fig. 2. Seasonal variations of sand fly species collected by sticky traps in Aichoun locality between October 2013 and September 2014.

was collected from Taza, a semi-arid zone in northern Morocco, with seasonal activity from June to November and two very distinct density peaks [25].

3.2 Characteristics of Climatic Factors

The Aichoune region is characterized by a semi-arid climate in which the ecological parameters temperature and precipitation play an important role; these abiotic factors affect the ecological life of the biocenosis. The locality of Aichoune received 270 mm of precipitation over the study year (2013–2014), with monthly precipitation ranging from 0 mm to 73 mm. From the meteorological data recorded at the Mdez station in the Tazouta sector, we drew the ombrothermal diagram (Fig. 3), which revealed the coexistence of two seasons: a cold, wet period of five months (November to March) and a dry period of seven months (April to October). August was the driest month, while January was the wettest. The period of significant sand fly activity in our study, between May and October, coincides with the dry period of the year, characterized by an increase in temperature and a decrease in precipitation.

3.3 Statistical Analysis of the Impact of Abiotic Factors

To determine the number of components to retain, we adopted Kaiser’s criterion, which states that in a normalized PCA the components whose eigenvalues are greater than 1 are retained. The table of explained variability shows that the first three components have the largest eigenvalues.

Fig. 3. Ombrothermal diagram of the Aichoune locality (2013–2014): monthly precipitation (mm) and temperature (°C).
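For readers who wish to redraw an ombrothermal diagram of the kind shown in Fig. 3, the following matplotlib sketch illustrates the usual Bagnouls–Gaussen convention, in which the precipitation axis is scaled to twice the temperature axis so that dry months appear where the precipitation curve falls below the temperature curve. The monthly values are placeholders, not the Mdez station data.

```python
# Hedged sketch: drawing an ombrothermal (Bagnouls-Gaussen) diagram.
# The precipitation scale is set to twice the temperature scale, so months
# with P < 2T (the "dry" months) show the P series below the T series.
# Monthly values below are illustrative placeholders only.
import matplotlib.pyplot as plt

months = ["O", "N", "D", "J", "F", "M", "A", "M", "J", "J", "A", "S"]
precip_mm = [30, 55, 60, 73, 50, 45, 35, 20, 8, 1, 0, 5]    # placeholder values
temp_c = [18, 13, 10, 9, 11, 13, 15, 19, 24, 27, 26, 23]    # placeholder values

fig, ax_t = plt.subplots()
ax_p = ax_t.twinx()
ax_p.bar(months, precip_mm, color="steelblue", alpha=0.6)   # precipitation bars
ax_t.plot(months, temp_c, color="red", marker="o")          # temperature curve
ax_t.set_ylim(0, 40)     # temperature axis (degrees C)
ax_p.set_ylim(0, 80)     # precipitation axis = 2 x temperature axis (mm)
ax_t.set_ylabel("Temperature (°C)")
ax_p.set_ylabel("Precipitation (mm)")
ax_t.set_title("Ombrothermal diagram (illustrative data)")
fig.tight_layout()
plt.show()
```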

This implies that these three components can be considered for explaining the variability of the data. The table also displays the percentage of variability explained by each component and the cumulative percentages, and the graphical representation of these results is illustrated by the eigenvalue plot (Table 2).

Table 2. Contribution of the components to the total variance obtained from the PCA.

Number component   Eigenvalue   Percentage of variance   Cumulative percentage
1                  4.17148      59.593                   59.593
2                  1.28663      18.380                   77.973
3                  0.740901     10.584                   88.557
4                  0.521052     7.444                    96.001
5                  0.160626     2.295                    98.296
6                  0.108447     1.549                    99.845
7                  0.0108594    0.155                    100.000

The correlation circle (Fig. 4) shows that five of the seven variables taken into account in the PCA contribute to the definition of the factorial plane F1 × F2. The plot of the parameters reveals correlations between certain variables: for the species of medical interest to humans, the correlations between Ph. sergenti, Ph. perniciosus, Ph. longicuspis and Ph. papatasi, and between these species and temperature, are positive, whereas the correlation between this group and precipitation is negative.
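The normalized PCA and Kaiser criterion described above can be reproduced with standard numerical tools. The sketch below uses placeholder monthly data and numpy instead of the Unscrambler/SPSS software used by the authors; it computes the eigenvalues of the correlation matrix, applies the eigenvalue > 1 rule, and derives the variable loadings that would be plotted on a correlation circle such as Fig. 4.

```python
# Hedged sketch (placeholder data): normalized PCA with the Kaiser criterion
# on a months x variables matrix of species counts, temperature and precipitation.
import numpy as np

# Rows = months, columns = [sergenti, perniciosus, papatasi, longicuspis, T, P]
X = np.array([
    [  2,  1,  0, 0, 14.0, 60.0],
    [ 40,  5,  3, 1, 21.5, 25.0],
    [310, 30, 20, 4, 25.8,  5.0],
    [150, 25, 10, 3, 27.0,  0.0],
    [200, 28, 15, 5, 26.0,  2.0],
    [ 20, 10,  5, 1, 23.2, 12.0],
])  # illustrative values only

R = np.corrcoef(X, rowvar=False)        # correlation matrix (variables standardized)
eigvals, eigvecs = np.linalg.eigh(R)    # eigen-decomposition of the symmetric matrix
order = np.argsort(eigvals)[::-1]       # sort components by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = 100 * eigvals / eigvals.sum()
kept = eigvals > 1                      # Kaiser criterion
print("eigenvalues:", np.round(eigvals, 3))
print("% variance :", np.round(explained, 1))
print("components retained (eigenvalue > 1):", int(kept.sum()))

# Loadings of the variables on F1/F2 (their coordinates on the correlation circle)
loadings_f1_f2 = eigvecs[:, :2] * np.sqrt(eigvals[:2])
print("F1/F2 loadings:\n", np.round(loadings_f1_f2, 2))
```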


Fig. 4. The representation of variables on the factorial plane F1 and F2

Temperature has a significant impact on the activity of sand flies at a given time. This abiotic parameter can act, at low values, as a factor limiting the abundance of sand flies [26]. We noted that low temperatures and very hot periods both limit sand fly activity. The maximum number of sand flies was collected in June, when the average temperature was 25.8 °C, while the minimum number was sampled in September, with an average temperature of 23.19 °C. This result corroborates the data found in Spain [27].

4 Conclusion Changes in meteorological characteristics could impact the bioecological life of vector sand flies and consequently affect the Leishmanian risk. The statistical study of climatic data and the seasonal activity of sand flies shows that the rainy season is a limiting factor. On the other hand, the hot season with an optimal interval favors the biological activity of leishmaniasis vectors. Acknowledgments. We are grateful to National Institute of Hygiene. The authors thank the Health regional directorate of the Fes-Boulemane, provincial delegate of Ministry of Health, Sefrou province, and the staff of Aichoun locality for their cooperation, assistance, information, and help.

Conflicts of interest. The authors declare that they have no conflicts of interest


References 1. Ministère de la Santé Publique.: Etat d’avancement des programmes de lutte contre les maladies parasitaires. Direction de l’épidémiologie et de lutte contre les maladies. Rabat, Maroc (2001). Author, F.: Contribution title. In: 9th International Proceedings on Proceedings, pp. 1–2. Publisher, Location (2010) 2. Guilvard, E., Rioux, J.A., Gallego, M., Pratlong, F., Mahjour, J., Martinez-Ortega, E., Dereure, J., Saddiki, A., Martini, A.: Leishmania tropica au Maroc III-Role de Phlebotomus sergenti. A propos de 89 isolats1. Ann Parasitol Hum Comp. 66, 96–99 (1991) 3. Duran-Martinez, M., Ferroglio, E., Acevedo, P., et al.: Leishmania infantum (Trypanosomatida: Trypanosomatidae) phlebotomine sand fly vectors in continental Mediterranean Spain. Environ. Entomol. 42(6), 1157–1165 (2013) 4. Killick-Kendrick, R.: Phlebotomine vectors of the leishmaniases: a review. Med. Vet. Entomol. 4(1), 1–24 (1990) 5. Abonnec, E.: Sandflies of the Ethiopian region (DIPTERA, SYCHODIDAE). Paris: ORSTOM, 289 (1972) 6. Guernaoui, S., Boumezzough, A.: Habitat preferences of phlebotomine sand flies (Diptera: Psychodidae) in Southwestern Morocco. J. Med. Entomol. 46(5), 1187–1194 (2009) 7. Killick-Kendrick, R.: Investigation of Phlebotomine Sandflies vectors of leishmaniasis, IndoUK Workshop on Leishmaniasis, pp. 72–83 (1983) 8. Rogers, M.E., Ilg, T., Nikolaev, A.V., Ferguson, M.A., Bates, P.A.: Transmission of cutaneous Leishmaniasis by sand flies is enhanced by regurgitation of fPPG. Nature 430, 463–467 (2004) 9. Talbi, F.Z., Janati Idrissi, A., Sandoudi, A., El Ouali Lalami, A.: Spatial distribution of incidence of leishmaniasis of different communes of Sefrou Province (2007–2010), central North of Morocco. In: Proceedings of the 4th International Conference on Smart City Applications (SCA’2019), Morocco, October 2019 Article No: 17, pp. 1–8, October 2019. https://doi.org/ 10.1145/3368756.3368992 10. Anonymous.: Fight against leishmaniasis. Guide to activities. Department of Epidemiology and Disease Control. Parasitic Diseases Service. Ministry of Health. Morocco (2010) 11. Lewis, D.J.: The phlebotomine sandflies (Diptera: Psychodidae) of the oriental region. Bull. Br. Mus. (Nat. Hist.) (Entomol.) 37, 217–343 (1978) 12. Killick-Kendrick, R., Tang, Y., Killick-Kendrick, M., et al.: The identification of females sandflies of the subgenus Larroussius by the morphology of the spermathecal ducts. Parassitologia 33, 335–347 (1991) 13. El Omari, H., Chahlaoui, A., Ourrak, K., et al.: Surveillance of Leishmaniasis: Inventory and Seasonal Fluctuation of Phlebotomine Sandflies (Diptera: Psychodidae), at the Prefecture of Meknes (Center of Morocco). Bull. Soc. Pathol. Exot. 111(5), 309–315 (2018) 14. Lahouiti, K., Bekhti, K., Fadil, M., et al.: Entomological investigations in moulay Yaacoub, leishmaniasis focus in the center of Morocco. Asian J. Pharm. Clin. Res. 9, 340–345 (2016) 15. Lahouiti, K., El Ouali Lalami, A., Hmamouch, A., Bekhti, K.: Phototropism of sand flies species (Diptera: Psychodidae) collected in a rural locality in Central Morocco. J. Parasitol. Vector Biol. 6(5), 66–74 (2014) 16. Lahouiti, K., El Ouali Lalami, A., Maniar, S., Bekhti, K.: Seasonal fluctuations of phlebotomines sand fly populations (Diptera: Psychodidae) in the Moulay Yacoub Province, centre Morocco: effect of ecological factors. Asian J. Pharm. Clin. Res. 7(11), 1028–1036 (2013) 17. 
Rioux, J.A., Lanotte, G., Petter, F., et al.: Cutaneous leishmaniases of the western Mediterranean basin, from enzymatic identification to eco-epidemiological analysis. The example of three “hearths”, Tunisian, Moroccan and French. Leishmania, Taxonomy and Phylogenesis. Eco-Epidemiological Applications, pp. 2365–6395. International Colloquium CNRS/ INSERM/ OMS, Mediterranean Institute of Epidemiological and Ecological Studies, Montpellier, France (1986)


18. Rioux, J.A.: Eco-epidemiology of leishmaniasis in Morocco: review of 30 years of cooperation. DELM. Epidemiol. Bull. 37, 2–10 (1999) 19. Ramaoui, K., Guernaoui, S., Boumezzough, A.: Entomological and epidemiological study of a new focus of cutaneous leishmaniasis in Morocco. Parasitol. Res. 103(4), 859–863 (2008) 20. Talbi, F.Z., El Ouali Lalami, A., Janati Idrissi, A., Sebti, F., Faraj, C.: Leishmaniasis in central Morocco: seasonal fluctuations of phlebotomine sand fly in Aichoun locality, from Sefrou province, Pathology research international, Volume 2015, Article ID 438749, p. 4 (2015). http://dx.doi.org/10.1155/2015/438749 21. Talbi, F.Z., Faraj, C., EL-Akhal, F., El Khayyat, F., Chenfour, D., Janati Idrissi, A., El Ouali Lalami, A.: Diversity and dynamics of sand flies (Diptera: Psychodidae) of two cutaneous Leishmaniasis foci in the fes-boulemane region of Northern Morocco. Int. J. Zool. 2015, Article ID 497169, 6 pages (2015). http://dx.doi.org/10.1155/2015/497169 22. Boussaa, S., Neffa, M., Pesson, B., Boumezzough, A.: Phlebotomine sandfliies (Diptera: Psychodidae) of southern Morocco: results of entomological surveys along the MarrakechOuarzazat and Marrakech-Azilal roads. Ann. Trop. Med. Parasitol. 104(2), 163–170 (2010) 23. Rioux, J.A.: Trente ans de coopération franco-marocaine sur les leishmanioses: dépistage et analyse des foyers. Facteurs de risque. Changements climatiques et dynamique nosogéographique. Association des Anciens ELèves de L’Institut Pasteur, 168, 90–101 (2001) 24. Rioux, J.A., Golvan, Y.J., Croset, H.: Écologie des leishmanioses dans le sud de la France. Les Phlébotomes, échantillonnage, éthologie. Ann. Parasitol. Hum. Comp 42(6), 561–603 (1967) 25. Guessous-Idrissi, N., Chiheb, S., Hamdani, A., et al.: Cutaneous leishmaniasis: an emerging epidemic focus of Leishmania tropica in North Morocco. Trans. Royal Soc. Trop. Med. Hyg. 91(6), 660–663 (1997) 26. Guernaoui, S., Boumezzough, A., Pesson, B., Pichon, G.: Entomological investigations in Chichaoua: an emerging epidemic focus of cutaneous leishmaniasis in Morocco. J. Med. Entomol. 42(4), 697–701 (2005) 27. Gálvez, R., Descalz, M.A., Jiménez, M., et al.: Seasonal trends and spatial relations between environmental/meteorological factors and leishmaniosis sand fly vector abundances in Central Spain. ActaTropica 115, 95–102 (2010)

Environmental Challenges of Solid Waste Management in Moroccan Cities

A. El Atmani1,2, H. Chiguer1,2, I. Belhaili1,2, S. Aitsi1,2, D. Elkhachine1,2, K. Elkharrim1,2, El Borjy Aziz1,2, and Belghyti Driss1,2(B)

1 Laboratory of Environment and Sustainable Development, Faculty of Sciences, University Ibn Tofail, B.P. 133, 14000 Kenitra, Morocco
[email protected], [email protected]
2 RAK, Autonomous Office of Water and Electricity, Kénitra, Morocco

Abstract. In Morocco as everywhere in the world, the enormous amount of solid waste continues to increase in the metropolises, large, medium and small cities of the kingdom. Solid urban waste is heterogeneous and its qualitative composition varies enormously depending on the area, from one society to another, from one country to another, from one city to another, depending on the day of the week (holidays and others) and the wet or dry season. The lack of data on the characterization of waste, which is a prerequisite for any management, recycling and recovery strategy, is one of the main constraints to the implementation of an effective and sustainable waste management policy. Unfortunately in illegal or controlled landfills, odors and gas fumes have a real impact on neighboring dwellings. As a model landfill, in the OumAzza landfill the leachate pH values recorded during the study period ranged from neutrality to 7.3 and basic pH to 8.8 with an average of around 8.06. Temperature measurements give average values of 24.52 °C, maximum of 40 °C and minimum of 10 °C. At the Oum Azza landfill in Rabat-Sale, there are high concentrations of nitrogen (1204 to 5804 mg/L) and ammonia (644 to 4480 mg/L). Likewise, the phosphate concentration is high and varies between 8.1 and 75.8 mg/L. During the campaigns, the electrical conductivity is higher and exceeds the standard of 2700 µS/cm and varies from 20115 to 47100 µs/cm. The concentration of Na+ varies from 1308 to 4630 mg/L and that of Cl− also varies from 2340 to 7100 mg/L. At the same time, the concentration of sulphates is relatively modest and varies from 16 to 55 mg/L. The leachate from the Oum Azza landfill has significant concentrations of suspended solids, ranging from 88 to 1480 mg/L. These high loads generate high measurements of BOD5 and COD. The BOD5 varies considerably between 761 to 12976 and 48801 mg/L and the COD varies from 7296 to 23789 and even reaches 71.880 mg/L. The results of this study will constitute the basis and the foundations for the development of a framework for the treatment and recovery of waste generated in Moroccan cities, and therefore, to avoid the storage and burial of all this waste in controlled landfills. Keywords: Solids · Waste · Landfill · Leachates · Pollution · Rabat · Morocco © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 594–606, 2021. https://doi.org/10.1007/978-3-030-66840-2_45


1 Introduction Waste management in general, and municipal solid waste in particular, is one of the main challenges facing local communities and all stakeholders in the sector. The combination of a set of interrelated factors, ranging from population growth, to urban expansion, to the development of socio-economic and production activities, as well as to changes in lifestyles and consumption, generates a growing source of waste. These wastes give rise to direct and indirect negative effects, linked to the nature and quantity, treatment techniques and disposal methods. The methods of management and treatment of household and similar waste have diversified. Previously, illegal dumping remained the only way to get rid of solid waste adopted by the country. Almost all of these landfills are uncontrolled in the open, the waste is dumped there in mixed form (household, industrial and hospital) without any prior treatment. The Moroccan government, through the municipal councils, has encouraged the concession of the solid waste management sector to private individuals who have set the objective of rehabilitating the old illegal dumps and the creation of new controlled landfill sites with recycling and recovery of the material and treatment. of the effluents generated, especially the leachate after the technical landfill of the waste. In fact, following the promulgation and publication of Law 28–00 in the official bulletin n° 5480 of December 7, 2006 relating to the management of waste and its elimination and the implementing decrees, most of the big cities of the Kingdom are equipped with controlled landfills. Morocco currently has about 20 landfills built and in full operation and other projects that are under development. It is a huge effort to enforce this new law to better preserve and protect the environment. The production of household and similar waste in Morocco is rapid, it is estimated at 6.51 million tonnes per year with an average of 0.75 kg/inhabitant/day with a highly humid character. For the area of our study, the daily average of waste produced is about 1 kg/inhabitant/day, it is higher than that recorded at the national level. The Rabat landfill (Oum Azza landfill) is significantly chosen from among the many landfills in Morocco according to a specific spatial distribution representing the landfill in central Morocco. This discharge produces a leachate which is suspected of polluting the environment both at the level of groundwater and at the level of odor threatening the ambient air. Thus the statistical tools used have helped to highlight and explain this spatio-temporal pollution. The main objective of our research work consists of an environmental assessment of all controlled landfills in Morocco through the Rabat landfill by characterizing the raw leachate from this technical landfill center of Oum Azza. In a first part, this work will highlight the anomalies and failures raised with a description of the operating and operating modes of controlled landfills in Morocco. In a second part, the study will focus on the characterization and comparison of the physicochemical composition of the leachate generated at several sites of the controlled landfill in Rabat.


2 Study Area

The region of Rabat-Salé-Zemmour-Zaër, which covers an area of 18194 km2, or 1.3% of the country, is bounded (Fig. 1):

Fig. 1. Location of the Rabat-Sale-Zemmour-Zaër region (MATEE 1997).

• to the North and Northeast by the Gharb-Chrarda-Beni Hssen Region;
• to the West by the Atlantic Ocean;
• to the East and Southeast by the Meknes-Tafilalt Region;
• to the South and Southwest by the Chaouia-Ouardigha Region.

The population of the Rabat-Sale-Kenitra region amounts to 4552585 inhabitants, or 8.07% of the total national population, of whom 3172955 live in urban areas and 1379630 in rural areas. The average population density of the region is 251.8 inhabitants/km2, against a national average of 41.7 inhabitants/km2 [4]. The Oum Azza landfill was chosen from the large number of landfills in Morocco; it is located in Rabat-Sale. This landfill produces a leachate that is suspected of polluting groundwater and surface water, as well as the ambient air through the propagation of highly unpleasant, toxic and allergenic odors [5].


The waste comes mainly from the transfer centers of Rabat, Sale and Temara and is transported by large trucks carrying 20 to 25 tons. Municipalities close to the site and some private organizations or companies bring their waste directly to the Oum Azza site. The treated waste consists of household refuse, residues from the composting of household waste and ordinary industrial waste. Production was estimated at 500 000 tons during 2011 [6].

2.1 Rainfall

The rainfall recorded by the Rabat-Sale airport weather station in the region is shown in Table 1. The total annual water depth in Rabat is about 555 mm. The summer months (June, July, August and September) are marked by very low rainfall; in contrast, November, December, January and February are marked by heavy rains [7].

Table 1. Mean monthly rainfall (mm) in Rabat over 25 years.

Station   Jan    Feb    Mar    April   May    Jun   Jul    Aug    Sept   Oct    Nov    Dec
Rabat     85.1   72.6   64.9   54.6    19.8   6.5   0.48   1.05   5.5    42.3   79.5   111

2.2 Temperature

The analysis of the monthly temperatures (Table 2) indicates that the Rabat region is one of the most temperate in Morocco, since the annual range between the mean maximum and mean minimum monthly temperatures is about 9.5 °C [7].

Table 2. Monthly average temperatures (°C) in Rabat.

Jan    Feb    Mar    April   May    Jun    Jul    Aug    Sept   Oct    Nov    Dec
12.6   13.1   14.4   15.0    17.3   19.9   22.0   22.5   22.2   18.2   15.8   13.0

2.3 Wind

According to the wind rose provided by the Rabat weather station at Rabat-Salé airport, the prevailing winds in Rabat come from the western sector in winter, spring and autumn, followed closely by winds from the north and south. Only the winds from the eastern sector play a relatively negligible role during the wet season. Note that, in the Atlantic region, the Gharbi, a westerly wind (actually blowing from northwest to southwest), blows in any season on the western coast; always cool, it is also a source of moisture and precipitation. The desiccating Chergui is of little relevance to the Rabat region (Fig. 2).


Fig. 2. Wind frequency distribution (wind rose) from 1995 to 2004 (A) and 2017 (B).

2.4 Landfill of Oum Azza

The landfill is located in the rural commune of Oum Azza, on the Aïn Aouda plateau 20 km from Rabat, between the Akrach river to the west and the reservoir of the Sidi Mohamed Ben Abdellah dam to the east, at an elevation of between 160 and 200 m (NGM). Its area is about 110 ha. The purpose of the landfill is to treat household and similar waste from 13 urban and rural communes, serving a population of 572717 inhabitants, with a maximum annual intake of about 700,000 tons per year (Figs. 3 and 4) [8].

Fig. 3. Leachate storage pond at the Oum Azza landfill.


Fig. 4. Location of the Oum Azza landfill.

3 Material and Method

The physicochemical analyses were carried out as follows [9–12]:

In the field:
• the pH of the samples was measured using a Hanna pH meter;
• the temperature and conductivity of the samples were determined with a Cond 315i/SET conductivity meter (WTW 82362).

In the environmental laboratory of the Faculty of Science, Kenitra:
• the chemical oxygen demand (COD) was determined using a COD reactor;
• the biological oxygen demand consumed over 5 days (BOD5) was measured with a BOD meter;
• suspended solids (TSS) were measured by filtration and weighing;
• ammonium and nitrogen were measured by the Parnas-Wagner distillation method using a distiller;
• sodium was measured using a flame photometer (JENWAY CLINICAL PFP7);
• sulfates (mg/L) were determined by the colorimetric method;
• chlorides (Cl−) were determined by argentometric titration.

4 Results and Discussion

The pH is an indicator of water pollution. The pH values recorded during the study period ranged from near neutrality (7.3) to a basic pH of 8.8, with an average of about 8.06. Measurements of the leachate temperature gave a mean value of 24.52 °C, a maximum of 40 °C and a minimum of 10 °C (Tables 3 and 4). Thermal exchanges between the atmosphere and the surface of the cells or compartments are balanced [13], so the rise of temperature during the summer and its fall during the winter are in concordance with the seasonal variation of the atmospheric temperature. The maximum leachate temperature exceeds 30 °C, the limit value for direct discharge into the receiving environment [14].


Table 3. Physicochemical parameters of the leachate from the Rabat (Oum Azza) landfill, by sampling campaign (month/year).

Sample     T water (°C)   T air (°C)   pH     EC (µS/cm)   TSS    COD     BOD5    Cl−    Na+    NH4+   TKN    TP     SO42−
Av11Azza   20.6           23.4         8.1    32800        392    13296   6126    6898   3446   1958   1960   17.8   37
Av12Azza   18.5           15           8.1    47100        211    18048   6526    3470   3010   3976   4340   8.1    47
Av13Azza   22             12           7.8    30100        298    18144   9638    2886   3020   2576   4844   75.8   21
M14Azza    33.3           27           8.3    36900        662    10944   2956    4570   3200   4368   5880   40.8   45
S10Azza    33.6           30.5         8.2    20115        567    7872    2406    4590   3020   3881   4095   46.4   42
S12Azza    28.5           32           7.8    28000        99.5   7296    2756    5020   3470   3010   4396   19.9   34
S14Azza1   38             35           8.5    38500        308    11328   2876    7100   4630   4480   5908   42.9   49
S14Azza2   38             35           8.5    38500        308    11328   2876    4890   3450   4480   5908   42.9   54
Ao11Azza   28             31.2         7.9    26600        161    10708   6461    5480   3680   2728   2982   22.7   55
Ao13Azza   36             40           7.56   40200        730    23789   12301   5930   4270   3626   4480   26.7   37
J10Azza    24             15.1         7.34   27060        1467   10160   12976   3028   1308   3330   3770   43.7   26
J11Azza    30.2           18           7.9    35000        1480   7488    4800    2970   2560   3986   4410   45.2   20
J12Azza    21.5           20.4         8.1    35720        88     7910    761     2560   2340   2499   3710   37.9   25
J13Azza    20             22           7.9    35700        415    17203   7581    2800   2030   3850   4494   48.6   16
F15Azza    10.5           10           8.8    37000        204    13065   3086    2340   2160   644    1204   16.5   22

Concentrations other than pH, EC and temperatures are expressed in mg/L. TSS: total suspended solids (MES); COD (DCO); BOD5 (DBO5); TKN: total Kjeldahl nitrogen (NTK); TP: total phosphorus (PT).

Table 4. Descriptive statistics of the leachate parameters.

Variable       Observations   Minimum   Maximum   Mean         Std. deviation
T water (°C)   15             10.5      38.0      26.8467      8.0980
T air (°C)     15             10.0      40.0      24.4400      9.3186
pH             15             7.34      8.80      8.0533       0.3776
EC (µS/cm)     15             20115     47100     33953.0      6671.5927
TSS (mg/L)     15             88        1480      492.70       441.4219
COD (mg/L)     15             7296      23789     12571.9333   4792.9517
BOD5 (mg/L)    15             761       12976     5608.40      3702.4140
Cl− (mg/L)     15             2340      7100      4302.1333    1577.8990
Na+ (mg/L)     15             1308      4630      3039.60      866.4566
NH4+ (mg/L)    15             644       4480      3292.80      1070.1074
TKN (mg/L)     15             1204      5908      4158.7333    1342.3390
TP (mg/L)      15             8.1       75.8      35.7267      17.1717
SO42− (mg/L)   15             16        55        35.3333      13.0639
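The descriptive statistics of Table 4 can be recomputed directly from the raw campaign values of Table 3. The pandas sketch below (the authors used SPSS and Excel) does this for three of the thirteen parameters as an illustration; the remaining columns follow the same pattern.

```python
# Hedged sketch: reproducing Table 4 (count, min, max, mean, std) from the raw
# Table 3 measurements with pandas. Only pH, EC and TSS are typed in here.
import pandas as pd

data = {
    "pH":  [8.1, 8.1, 7.8, 8.3, 8.2, 7.8, 8.5, 8.5, 7.9, 7.56, 7.34, 7.9, 8.1, 7.9, 8.8],
    "EC":  [32800, 47100, 30100, 36900, 20115, 28000, 38500, 38500, 26600,
            40200, 27060, 35000, 35720, 35700, 37000],
    "TSS": [392, 211, 298, 662, 567, 99.5, 308, 308, 161, 730, 1467, 1480, 88, 415, 204],
}
df = pd.DataFrame(data)

summary = df.agg(["count", "min", "max", "mean", "std"]).T
summary.columns = ["Observations", "Minimum", "Maximum", "Mean", "Std. deviation"]
print(summary.round(4))
# Expected means (cf. Table 4): pH ~ 8.0533, EC ~ 33953, TSS ~ 492.7
```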


At the Oum Azza landfill in Rabat-Sale, there are high concentrations of nitrogen (1204 to 5804 mg/L) and ammonia (644 to 4480 mg/L). Similarly, the phosphate concentration is high and varies between 8.1 and 75.8 mg/L. In the leachate of the Rabat landfill, concentrations remain high at the site concerned. During the sampling campaigns, the electrical conductivity is high, exceeding the standard of 2700 µS/cm and varying from 20115 to 47100 µS/cm. The concentration of Na+ varies from 1308 to 4630 mg/L and that of Cl− varies from 2340 to 7100 mg/L. At the same time, the concentration of sulphates is relatively modest and ranges from 16 to 55 mg/L. The leachate from the Oum Azza landfill shows significant concentrations of suspended solids, ranging from 88 to 1480 mg/L. These high loads generate high BOD5 and COD values: the BOD5 varies considerably, from 761 to 12976 and up to 48801 mg/L, and the COD varies from 7296 to 23789 and even reaches 71880 mg/L. Overall, the physico-chemical composition of the leachate from the studied landfill revealed a very significant organic pollutant load in terms of COD and BOD5, with average values of about 15466.26 and 6586.88 mg O2/L, respectively. The resulting BOD5/COD ratio of 0.4 shows that these liquids are biodegradable and that biological treatment is the predominant requirement, as indicated by the COD/BOD5 ratio of 2.35, which is less than 3. The average value of the electrical conductivity is of the order of 36536 µS/cm. The nitrogenous load, represented by Kjeldahl nitrogen (TKN), is 3983.2 mg/L, and NH4+ is 2966.4 mg/L. In addition, the concentrations of Na+ and Cl− are also very high and are proportional to the salinity; they are of the order of 2989 mg/L and 4797 mg/L, respectively (Fig. 5). The physico-chemical analyses carried out at the controlled landfill of Oum Azza in Rabat show that the parameters analysed greatly exceed the standards for direct discharge and even those used for irrigation (Fig. 6). Indeed, these results reveal that these leachates are of poorer quality than ordinary wastewater and are characterized by a high pollutant load, which is worrying for the preservation of the environment and human health (Fig. 7). As a result, the installation of treatment plants implementing high-performance processes becomes an environmental necessity in order to avoid risks and damage to the environment and to the living conditions of present and future generations (Table 5). At the Rabat landfill there is a high concentration of Kjeldahl nitrogen and ammonia during the 2014 campaign; conversely, this concentration is low during the 2015 campaign. For the other years, between 2011 and 2013, these concentrations remain unchanged.
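As a small illustration of the biodegradability argument above, the following sketch computes the BOD5/COD and COD/BOD5 ratios from the reported averages. The 0.2–0.4 cut-offs used for the intermediate classes are common conventions in the leachate literature and are not taken from this paper.

```python
# Hedged sketch of the biodegradability check discussed above.
# Thresholds other than "BOD5/COD ~ 0.4 -> biodegradable" and
# "COD/BOD5 < 3 -> biological treatment" are illustrative conventions.
def biodegradability(bod5_mg_l: float, cod_mg_l: float) -> str:
    ratio = bod5_mg_l / cod_mg_l
    if ratio >= 0.4:
        return f"BOD5/COD = {ratio:.2f}: readily biodegradable"
    if ratio >= 0.2:
        return f"BOD5/COD = {ratio:.2f}: moderately biodegradable"
    return f"BOD5/COD = {ratio:.2f}: poorly biodegradable (stabilized leachate)"

bod5_avg, cod_avg = 6586.88, 15466.26   # average values reported above (mg O2/L)
print(biodegradability(bod5_avg, cod_avg))
print(f"COD/BOD5 = {cod_avg / bod5_avg:.2f} (< 3: biological treatment suitable)")
```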


Fig. 5. Eigenvalue diagram (scree plot).

Fig. 6. Projection of the variables on the factorial plane F1 × F2 (56.66%).


Fig. 7. Projection of the individuals and the variables on the factorial plane F1 × F2 (65.31%).

Table 5. Eigenvalues of the principal component analysis.

                   F1        F2
Eigenvalue         4.5706    2.7946
Variability (%)    35.1586   21.4973
Cumulative %       35.1586   56.6559
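A quick arithmetic check (not in the paper) shows how the eigenvalues of Table 5 relate to the quoted variance percentages, assuming the PCA was run on the thirteen standardized parameters of Table 3.

```python
# Illustrative check: with 13 standardized variables, each axis explains
# eigenvalue / 13 of the total variance (the assumption of 13 variables
# corresponds to the 13 parameters of Table 3).
eigenvalues = {"F1": 4.5706, "F2": 2.7946}
n_variables = 13
cumulative = 0.0
for axis, ev in eigenvalues.items():
    pct = 100 * ev / n_variables
    cumulative += pct
    print(f"{axis}: {pct:.4f}% of variance, cumulative {cumulative:.4f}%")
# F1: 35.1585%, F2: 21.4969%, cumulative 56.6554% -- matching Table 5 up to rounding
```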

The concentration of Na+ is very high (almost 4000 mg/L) in the summer season at the Oum Azza landfill in Rabat, which produces a remarkable increase in electrical conductivity (Table 6). At the same time, the concentration of Cl− varies considerably, by between 1500 and 2000 mg/L, from one season to another. There is a greater variation in the concentration of total phosphorus at the Oum Azza landfill; the concentrations of TP and SO42− are low in the summer and winter seasons.


Table 6. Seasonal variations in the Oum Azza leachate.

Years/Sites   Code   T water   T air   pH    EC      TSS     COD      BOD5     Cl−      Na+      NH4+     TKN      TP      SO42−
Azza 2011     A11    27.3      23.6    7.9   28315   813.4   9904.8   6553.8   4593.2   2802.8   3176.6   3443.4   35.2    36
Azza 2012     A12    25        26.2    7.9   31860   93.8    7603     1758.5   3790     2905     2754.5   4053     28.9    29.5
Azza 2013     A13    26        24.7    7.8   35333   481     19712    9840     3872     3106.7   3350.7   4606     50.3    24.7
Azza 2014     A14    36.4      32.3    8.4   37967   426     11200    2902.6   5520     3760     4442.7   5898.7   42.2    49.3
Azza 2015     A15    10.5      10      8.8   37000   204     13065    3086     2340     2160     644      1204     16.53   22

Seasons/Sites   Code   T water   T air   pH    EC      TSS      COD     BOD5   Cl−    Na+      NH4+   TKN      TP     SO42−
Spring          PR     23.6      19.35   8     36725   390.75   15108   6312   4456   3169     3220   4256     35.6   37.5
Autumn          AR     34.53     33.13   8.2   31279   320.6    9456    2729   5400   3642.5   3963   5076.8   38     44.8
Summer          ER     32        35.6    7.7   33400   445.5    17249   9381   5705   3975     3177   3731     24.7   46
Winter          HR     21.24     17.1    8     34096   730.8    11165   5841   2740   2079.6   2862   3517.6   38.4   21.8

Temperatures in °C, EC in µS/cm, other parameters in mg/L.
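The seasonal rows of Table 6 correspond to averages of the monthly campaigns of Table 3 grouped by season. The pandas sketch below illustrates that aggregation for two parameters; the season assignment (Av/M = spring, S = autumn, Ao = summer, J/F = winter) is an assumption reconstructed here from the campaign codes.

```python
# Hedged sketch: deriving seasonal means (as in Table 6) from the monthly
# samples of Table 3. Only pH and EC are typed in to keep the example short.
import pandas as pd

samples = pd.DataFrame({
    "code":   ["Av11", "Av12", "Av13", "M14", "S10", "S12", "S14a", "S14b",
               "Ao11", "Ao13", "J10", "J11", "J12", "J13", "F15"],
    "season": ["spring"] * 4 + ["autumn"] * 4 + ["summer"] * 2 + ["winter"] * 5,
    "pH":     [8.1, 8.1, 7.8, 8.3, 8.2, 7.8, 8.5, 8.5, 7.9, 7.56, 7.34, 7.9, 8.1, 7.9, 8.8],
    "EC":     [32800, 47100, 30100, 36900, 20115, 28000, 38500, 38500,
               26600, 40200, 27060, 35000, 35720, 35700, 37000],
})

seasonal_means = samples.groupby("season")[["pH", "EC"]].mean().round(2)
print(seasonal_means)
# Expected EC (cf. Table 6): spring ~ 36725, autumn ~ 31279,
# summer ~ 33400, winter ~ 34096
```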

5 Conclusion

Among the impacts of illegal dumping is the production of leachate, which accounts for a large part of the diffuse pollution of soil and water. Unlike biogas, which is dispersed into the atmosphere, leachate, because of its liquid nature, is a concentrated source of biological and chemical pollutants. The greatest risk associated with leachate production is the contamination of the water table, which leads to the pollution of the groundwater network and therefore to the danger of exposing populations to health problems. In order to safeguard the natural environment, Morocco launched in 2008 a program to reform and upgrade the solid waste management sector: the National Program for the Management of Household and Assimilated Waste (PNDM). The objectives of this program, which runs until 2023 at a cost of 40 billion dirhams, include the rehabilitation and closure of all illegal dumps and the development and opening of new controlled landfills.


Evaluation of the Purification Performance of the WWTP by Aerated Lagooning of the City of Oujda (Morocco)

Belhaili Isslam1(B), Alemad Ali1, Aissati Touria2, Elatmani Ayoub1, Elkharrim Khadija1, and Belghyti Driss1

1 Laboratory of Environment and Natural Resources, Faculty of Sciences, University Ibn Tofail, BP 133, Kenitra, Morocco
[email protected], [email protected], [email protected], [email protected]
2 Laboratory of Geosciences, Water and Environment (LGEE), Faculty of Sciences Rabat, Rabat, Morocco
[email protected]

Abstract. Under the combined effect of socio-economic progress, demographic growth and the concentration of populations, sanitation has over the years become a concern of the public authorities at both the national and the regional level. The increase in nuisances linked to the poor disposal of wastewater, the deterioration of the environment and the scarcity of natural water resources have prompted the various actors to safeguard the environment, to purify wastewater and to reuse it for multiple purposes, including the irrigation of cultivable areas in the region. It is for this purpose that the treatment plant was implemented; it provides collective sanitation of the domestic and industrial wastewater of the city of Oujda. It treats 40,000 m3/day, the equivalent of 14 million m3 of wastewater per year that was previously discharged raw into the wadi. This paper focuses on the operation of the wastewater treatment plant of the city of Oujda after three years of operation. The results show that:
* Most of the studied parameters meet the objectives expected of the Oujda wastewater treatment plant by aerated lagooning: pH, temperature, dissolved oxygen, conductivity, BOD5, etc. The abatement rates of 84% for BOD5, 85% for COD and 80% for SS are in line with the station's objectives.
* The COD/BOD5 ratio of the wastewater at the entrance to the treatment plant is of the order of 1.95 < 2 (monitoring report for the period August 2012 to March 2013) and 1.75 < 2 (results of the analysis campaign of 05/06/2013), which indicates that the effluent to be treated is domestic wastewater and therefore biodegradable. The Oujda wastewater treatment system by aerated lagoon is thus well suited to this type of effluent.

Keywords: Wastewaters · Aerated lagoon · Treatment plant · Sanitation · Biodegradability · Oujda · Morocco

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 607–619, 2021. https://doi.org/10.1007/978-3-030-66840-2_46


1 Introduction
Water resources constitute a fundamental and necessary element for the development of all human, economic and social activity. As a key to sustainable development, water control is more than ever a global challenge. Indeed, the rapidity of demographic growth, the development of the industrial sector and the slow improvement in coverage rates mean that the number of people without access to drinking water or sanitation keeps increasing [1]. As a result, the volume of wastewater continues to grow, following the intense development of urbanization and industrialization and the evolution of consumption patterns, which are the main origins of the various sources of environmental pollution. Among these sources, the production of wastewater and its discharge into the receiving environment (sea, river, soil, etc.) without prior treatment generates numerous waterborne diseases and the spread of epidemics [1]. These waters carry pollutants, in solution or in suspension, of a chemical (organic molecules, heavy metals, nutrient salts, etc.) or microbiological (bacteria, parasites) nature which, beyond certain thresholds, lead to an imbalance in the natural functioning of aquatic ecosystems. To cope with this situation, wastewater must undergo treatment before being discharged into the receiving environment [2]. The applied process is mixed lagooning (with mechanical aeration), which is considered the most widely adopted technology meeting the needs of Morocco according to the National Master Plan for Liquid Sanitation [3]. In this work, we examine the operating mode and performance of the Oujda (Morocco) wastewater treatment plant in order to determine its main characteristics and issues.

2 Materials and Methods
2.1 General Description of the Study Site
The city of Oujda is located at the north-eastern limit of Morocco, 12 km from the Algerian border and 60 km from the Mediterranean coast. According to the General Population and Housing Census (RGPH) of 2004, the population of the Prefecture counts 477,100 inhabitants, made up of 410,808 urban (86%) and 66,292 rural (14%) residents; the density thus exceeds 245 inhabitants/km² (High Commission for Planning, 2004). The sanitation network of the city of Oujda is a combined (unitary) system made up of nine collectors that evacuate wastewater and rainwater to the treatment plant [4]. The RADEEO (Autonomous Distribution Authority for Water and Electricity) has been managing the liquid sanitation network since October 4, 2001. The sanitation network extends over about 890 km across the entire urban area of the city of Oujda and serves almost 96% of the city's districts.


2.2 Description of the Oujda WWTP
The Oujda wastewater treatment plant, located approximately 2.7 km north of the limit of the city's urban perimeter, was put into service on May 23, 2010. It is designed for 530,000 population equivalents, which corresponds to a nominal inflow of 40,000 m3/d. The purification of wastewater at the Oujda WWTP is provided by a series of basins which share the characteristics listed below; Fig. 1 shows the different basins that make up the WWTP:

Fig. 1. Synoptic diagram of Oujda WWTP. 1 - Anaerobic basins; 2 - Aerated basins; 3 - Maturation ponds; 4 - Aerated basins; 5 - Drying beds.

• Earthen construction;
• Watertightness of the dikes (internal part) and of the bottom is ensured by a 40 cm layer of compacted clay;
• The dikes have a slope of h/v = 2/1 for the exterior and interior facings;
• The width of the dikes at the crest is 5 m, topped by a 40 cm thick layer of all-in fill so that it is drivable.

2.3 Sampling Method
Samples were taken at specific locations (see Fig. 2): at the pre-treatment works, in the anaerobic basin, in the aeration basin and at the outlet of the maturation basin. The results of the analyses carried out in the RADEEO laboratory were used to evaluate the purification performance of the WWTP over a period of 8 months, from August 2012 to March 2013.

2.4 Parameters Analysed
All the analyses and measurements necessary to quantify organic pollutants are standardized according to Moroccan standards, similar to the French AFNOR standards, according

610

B. Isslam et al.

Fig. 2. Different sampling points (anaerobic basin, aerated basin and maturation basin).

to the techniques recommended by [5]. Table 1 groups together the methods of analysis of the various pollution parameters studied.

Table 1. Techniques for analysing the physicochemical parameters of water pollution.

Parameter                        Method
Temperature of water (°C)        Mercury thermometer, 0.1 °C
pH                               Field pH meter, type WTW HI 991003
Suspended solids SM (mg/l)       Membrane filtration, Millipore (0.45 µm), oven drying and weighing (AFNOR T90-105)
COD (mg O2/l)                    Potassium dichromate oxidisability (AFNOR T90-101)
BOD5 (mg O2/l)                   AFNOR T90-103
Conductivity (µS/cm)             Conductivity meter, type WTW LF 330
Dissolved O2 (mg O2/l)           Oximeter, type WTW Oxi 315i/SET

3 Results and Discussion
3.1 Operating Parameters
3.1.1 The Flow
The flow measurements are carried out at the entrance to the WWTP using ultrasonic flow meters permanently installed after the pre-treatment. Figure 3 shows the results obtained.



Fig. 3. Variations in inflow and outflow at the treatment plant during the period from August 2012 to March 2013

Figure 3 shows that the daily flow of wastewater is variable: it ranges between 30,737 m3/d and 33,867 m3/d, with an average of 31,926.37 m3/d. This hydraulic load is significant; it represents 79.81% of the nominal hydraulic design load (40,000 m3/d).

3.1.2 Quality Parameters Characterizing the Raw Wastewater at the Entrance to the Station
The results of the analysis of the raw water quality parameters at the entrance to the Oujda WWTP are shown in Table 2.

Table 2. Values of the main parameters of organic pollution of the wastewater at the entrance to the Oujda WWTP.

Parameter (mg/l)   Value at the entrance to the WWTP   Characteristic of domestic wastewater in large cities (>100,000 inhabitants)
COD                698.58                              850
BOD5               357.18                              300
SM                 493.14                              300


Table 2 shows that the values of the main quality parameters characterizing the raw wastewater entering the station are consistent with the characteristics of domestic wastewater for large towns, which were defined in 1998 by the ONEE Water Branch and adopted in the study of the typology of Moroccan urban wastewater [6].

3.2 Purification Performance Evaluation Parameters
3.2.1 pH
Figure 4 shows that the monthly pH at the inlet and outlet of the Oujda city wastewater treatment plant, during the period from August 2012 to March 2013, varies between 7.86 and 8.05 with an average of 7.94 in the raw wastewater, and between 8.12 and 8.36 with an average of 8.23 in the treated water.


Fig. 4. Evolution of the pH by each stage of treatment during the period 2012–2013.

It is interesting to note that the anaerobic treatment step is accompanied by a slight drop in pH, and that the pH then increases slightly as the water progresses through the process; this increase is more marked at the exit of the maturation ponds. The average pH of the water at the outlet is 8.23, which complies with the discharge standard of 6.5 to 8.5. Analysis of these results shows that the raw sewage of the city of Oujda is generally close to neutral and acceptable for irrigation [7].

3.2.2 Dissolved Oxygen
The amount of dissolved oxygen in a body of water is an indication of how healthy the water is and of its ability to support a balanced aquatic ecosystem. Figure 5 illustrates the evolution of the dissolved O2 concentration in the various basins during the period from August 2012 to March 2013.



Fig. 5. Evolution of dissolved O2 during the period August 2012 to March 2013

It is clear that dissolved O2 is almost zero at the entrance and in the anaerobic ponds (Fig. 5). It is also important to note that the aerobic treatment step is accompanied by an increase in dissolved O2 as a result of the aeration system. Comparing the dissolved oxygen values of the analysed wastewater with the surface water quality grid [8] allows us to conclude that this wastewater is of very poor quality. In sewerage networks, the complete disappearance of dissolved O2 is accompanied by the appearance of H2S in the air, resulting from the reduction of the sulfur compounds present in the effluents, and correlatively by acid attack on the concrete of the pipes [9].

3.2.3 Electrical Conductivity
The electrical conductivity (EC) value is probably one of the simplest and most important indicators for wastewater quality control. The results obtained are shown in Fig. 6. The variation in the amplitude of the conductivity is low, which appears normal since there is little or no reduction in soluble ionic compounds during the various treatment processes. The average conductivity recorded at the exit of the WWTP is 2610 µS/cm. This result is well within the Moroccan discharge standard for irrigation of 8.7 mS/cm [7]. The results also highlight more or less significant monthly average values of the mineralization of the raw wastewater of the city of Oujda: the maximum value is 2.68 mS/cm and the minimum value is around 2.12 mS/cm. These increases in the electrical conductivity of the wastewater could be explained, on the one hand, by the discharge of wastewater from industrial units connected to the city's sewerage network (slaughterhouses, two industrial zones) and, on the other hand, by the fact that the electrical conductivity of borehole water and drinking water is already high, of the order of 1.82 mS/cm.



Fig. 6. Evolution of conductivity during the period August 2012 to March 2013.

This high salinity observed in Oujda's wastewater can be explained by the inflow of deep salty water from the Lias formations. In addition, the growing volumes withdrawn for drinking water production and for agriculture are accompanied by a drop in the water level of the aquifer, which is then made up by a large inflow of deep salt water. The decrease in precipitation in the region in recent years, precipitation that recharges the water table, also contributes [10].

3.2.4 COD
The COD makes it possible to assess the concentration of organic or mineral matter, dissolved or suspended in water, through the quantity of oxygen necessary for its total chemical oxidation. The results are shown in Fig. 7. From Fig. 7, the recorded average COD value of the raw sewage of Oujda is 698.58 mg/l. This value is very close to that recorded in 1995 by El Halouani, which is of the order of 710 mg/l. These concentrations are higher than the Moroccan standard for direct discharge (500 mg/l) [7].

3.2.5 Evolution of BOD5
BOD5 expresses the quantity of oxygen necessary for the degradation of the biodegradable organic matter in water by the development of microorganisms over 5 days. It is measured at a temperature of 20 °C, protected from light and air. BOD5 is one of the physico-chemical parameters used to estimate biodegradable organic carbon in water. In a polluted environment, carbon is used by bacteria as a source of energy; it should be noted that this degradation can occur in the presence or absence of oxygen [11]. The evolution of BOD5 in the WWTP basins is illustrated in Fig. 8.



Fig. 7. Evolution of the COD value monitoring from August 2012 to March 2013


Fig. 8. Evolution of BOD5 during the period from August 2012 to March 2013

The BOD5 content of the raw wastewater of the city of Oujda varies between 337 mg/l and 443 mg/l, with an average of 357.18 mg/l (Fig. 8). This value is lower than the content recorded in 1995 by El Halouani, but higher than the average BOD5 content of Oujda wastewater recorded between 1977 and 1991 (250 to 330 mg/l); this content has tended to stabilize in recent years. The values at the outlet of the WWTP vary between 42 mg/l and 69 mg/l, with an average of 56.65 mg/l for the treated wastewater.


3.2.6 SM (Suspended Solids)
Suspended solids are the mass concentration of matter contained in a liquid, normally determined by filtration or centrifugation followed by drying under defined conditions, and expressed in mg/l. Figure 9 shows the evolution of SM during the period from August 2012 to March 2013.


Fig. 9. The evolution of SM during the period from August 2012 to March 2013

The average SS value at the outlet of the wastewater treatment plant is of the order of 97.31 mg/l (Fig. 9). This value is much lower than the Moroccan standard for indirect discharge (600 mg/l) and the standard for water intended for irrigation (2000 mg/l), but slightly higher than the Moroccan limit for direct discharge (50 mg/l) [12]. Removing suspended solids from the raw wastewater effluent both protects natural aquatic environments and reduces agricultural costs linked to ploughing.
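The abatement rates quoted in the abstract and the COD/BOD5 biodegradability ratio follow directly from the inlet and outlet means reported above. The short Python sketch below is an illustration, not the authors' processing chain; it only recomputes the indicators from the values given in the text.

```python
# Illustrative recomputation of the performance indicators reported for the
# monitoring period August 2012 - March 2013.

def removal_rate(inlet: float, outlet: float) -> float:
    """Abatement rate in %, i.e. the fraction of the load eliminated by the plant."""
    return 100.0 * (inlet - outlet) / inlet

# Means reported in the text (mg/l)
bod5_in, bod5_out = 357.18, 56.65
ss_in, ss_out = 493.14, 97.31
cod_in = 698.58

print(f"BOD5 removal : {removal_rate(bod5_in, bod5_out):.1f} %")   # ~84 %
print(f"SS removal   : {removal_rate(ss_in, ss_out):.1f} %")       # ~80 %

# Biodegradability indicator: COD/BOD5 < 2 suggests an essentially domestic,
# biodegradable effluent, suited to aerated lagooning.
print(f"COD/BOD5 at inlet: {cod_in / bod5_in:.2f}")                 # ~1.96, consistent with the ~1.95 reported

# Hydraulic load as a fraction of the nominal design flow (40,000 m3/d)
print(f"Hydraulic load: {100 * 31926.37 / 40000:.2f} % of design")  # ~79.8 %
```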

4 Discussion and Interpretation
4.1 Comparison of Results with the Design Goals
During the design of the Oujda treatment plant, the designers defined the discharge objectives (Table 3). Table 3 shows that the values at the outlet of the WWTP are lower than those fixed during the design of the station (Table 4).

4.2 Comparison of Results with Domestic Discharge Standards
The average values of the treated water and the specific limit values for domestic discharge are summarized in the table below. The average concentrations of SM, COD and BOD5 in the treated lagoon water are much lower than the specific limit values for domestic discharge (Table 5).


Table 3. Planned performance of the Oujda WWTP (Operations manual version 04, December 2010, BEFESA-STAIP).

Parameter              Yield
BOD5 average at exit   70%
COD average at exit    75%
SM average at exit     72%

Table 4. Design values and values at the exit of the Oujda WWTP.

Parameters   Design values   Values at the exit of the station (follow-up from August 2012 to March 2013)   Analysis results of 06/05/2013
DCO

Ch takes any position a1 to an. The second phase is the process discovery operation. It aims at discovering the generic process model, which defines all possible behaviours (sub-processes) that an Ad-hoc BP can exhibit (Cf. Figure 2). Here, the selected discovery algorithm is the heuristics miner [19], because it focuses on the frequency of patterns: the main behavioural patterns can be extracted and the output can be converted to BPMN, which is the most widely used notation for representing dynamic behaviours [21]. The heuristics miner therefore remains a suitable algorithm for discovering Ad-hoc processes.
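For readers unfamiliar with the heuristics miner, its core ingredient is a frequency-based dependency measure computed over the directly-follows relation of the log [19]. The Python sketch below is an illustration written for this text (not the ProM implementation used later); the toy log and the chosen threshold are hypothetical.

```python
from collections import Counter

# Toy event log: each trace is the ordered list of activities of one case.
# Activity names follow the claims-handler example used later in the paper.
log = [
    ["RC", "DLC", "AC", "ACR", "CC", "IP", "E"],
    ["RC", "DLC", "AC", "E"],
    ["RC", "DLC", "E"],
]

# 1. Count the directly-follows relation |a > b|
df = Counter()
for trace in log:
    for a, b in zip(trace, trace[1:]):
        df[(a, b)] += 1

# 2. Heuristics-miner dependency measure [19]:
#    dep(a, b) = (|a > b| - |b > a|) / (|a > b| + |b > a| + 1)
def dependency(a: str, b: str) -> float:
    ab, ba = df[(a, b)], df[(b, a)]
    return (ab - ba) / (ab + ba + 1)

threshold = 0.6   # hypothetical value; high thresholds keep only strong dependencies
arcs = {(a, b): round(dependency(a, b), 2)
        for (a, b) in df if dependency(a, b) >= threshold}
print(arcs)       # arcs retained in the dependency graph at this threshold
```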

Fig. 2. Dynamic Ad-hoc business process.

In this phase, we introduce the frequency concept, which aims at determining the representative BP for each frequency degree. The frequency values implicitly change when the produced BP changes. For instance, the values 80% and 70% produce the same process model; this process model defines a specific category A (sub-process).


If the content of the produced model changes, we define a new category B (sub-process). In this example, we assume two process model variations, with model frequency values of 90% and 20% (Cf. Figure 3 and Table 2).

Fig. 3. The generic BP with the fixed part (A) and the full Ad-hoc BP part (B: frequency degree, graphical rules and decision points).

Table 2. Cross-environmental variable values.

Category     Resource   Time       Antecedent   Model frequency (Fq)           Specific goal (Sp)
Category A   Rs1        t1 to t3   A            Fq1 (0.9)                      Sp1
Category B   Rs2        t4 to t5   A            Fq2 (0.2), differs from Fq1    Sp2

To refine the generic model with business conditions and rules, we use the decision miner algorithm [28]. For the representation of business rules, we propose three possibilities: correct pattern, incorrect pattern and indirect flow. These are related to the resource, the time and the antecedent executed activity. This approach is inspired by the Oryx process editor [22]. The Correct and Incorrect pattern icons can be attached to the sequence flows of a process model to define rules, indicating whether a behaviour must occur (correct pattern) or must not occur (incorrect pattern). An indirect flow defines an additional sequence flow that does not exist in the imported process model and expresses indirect following: if an indirect flow connects activity AA to activity BB, then BB has to follow AA, but not necessarily immediately. For example, AA can be followed by CC and then BB, and the rule is still satisfied. This phase's outputs are the full Ad-hoc BP and its business conditions and rules.
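These rule types can be checked mechanically on a trace. The following Python sketch is an illustration of the semantics described above, not the authors' tooling: it evaluates a "correct pattern" rule (BB must immediately follow AA) and an "indirect flow" rule (BB must eventually follow AA) on a single hypothetical trace.

```python
from typing import List

def immediately_follows(trace: List[str], a: str, b: str) -> bool:
    """Correct-pattern semantics: every occurrence of `a` is directly followed by `b`."""
    return all(trace[i + 1] == b
               for i, act in enumerate(trace[:-1]) if act == a) \
           and (not trace or trace[-1] != a)

def eventually_follows(trace: List[str], a: str, b: str) -> bool:
    """Indirect-flow semantics: after each occurrence of `a`, `b` occurs later in the trace."""
    return all(b in trace[i + 1:] for i, act in enumerate(trace) if act == a)

trace = ["AA", "CC", "BB", "DD"]
print(immediately_follows(trace, "AA", "BB"))  # False: CC intervenes
print(eventually_follows(trace, "AA", "BB"))   # True: BB follows AA, though not directly
```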


Third, we proceed to the Ad-hoc refinement phase. This phase's outputs are a cross-environmental variable and a categorized model (Cf. Figure 3). We define the cross-environmental variable values, which are used later to determine the adaptive Ad-hoc sub-process to execute. This variable contains several elements: the available resource, the optimized duration to execute a specific sub-process, the previously executed activity (antecedent from the fixed part or the external environment) and a specific business goal for each Ad-hoc BP part. For instance, category A is described by a resource Rs1, who executes category A of part B during the time interval (t1−t3), with a specific goal entitled Sp1. In this work, we define a category as a group of activities or criteria that are frequently assigned together because of their similarity. Last, we aim to combine each frequency degree with a specific set of business conditions and rules, in order to properly define the different Ad-hoc BP sub-processes and objectives. After defining the cross-environmental variable values, we obtain the full Ad-hoc BP shown in Fig. 3. Figure 3 illustrates two categories A and B: category A contains 5 activities (AA, BB, CC, DD, EE) with a frequency degree of more than 40%, while category B contains 3 activities (CC, DD and EE) with a frequency degree of less than 40%. The challenging task is detailed in the On-line view, where we must adapt the Ad-hoc BP according to specific business objectives and their attached rules and behaviours.

3.2 On-Line
After obtaining the full Ad-hoc BP with the cross-environmental variable, we can determine which sub-process must be adaptively executed during the dynamic selection phase. This is done by respecting certain bounds on the cross-environmental variable values. For this purpose, we use a well-established mechanism for checking business rules, Complex Event Processing (CEP), implemented within the conformance checking technique. CEP [23] can detect whether certain rules or conditions (the cross-environmental variable values) are violated by analysing the logs of BP execution. Generally, the CEP mechanism is used to monitor the generated rules in real time; interestingly, the CEP engine can also be used in an On-line setting for backward compliance checking. In this work, CEP is selected for its ability to analyse high volumes of event series and because it allows customized triggers for detecting violations among events, so that non-matching Ad-hoc BP sub-processes can be ignored. After checking the conditions, we can select a suitable Ad-hoc BP category. For instance, executing category A with the resource Rs1 during t1−t3 gives insight into the selected category during the On-line view. Figure 3 presents which category will be selected according to the conditions defined in Table 2.

3.3 Synthesis
In this section, we have illustrated how to treat an Ad-hoc BP dynamically. This approach aims at adapting Ad-hoc BP sub-processes dynamically to changes, according to the cross-environmental variable values: business conditions and business rules. Throughout this approach, we use the process discovery and conformance checking techniques.


In the Off-line view, event logs are used in combination with the process discovery technique (frequency concept) to construct all Ad-hoc BP behaviours (sub-processes). This helps in obtaining information about the BP representation and the business conditions and rules, i.e. which set of conditions defines a specific category, and with which business rules. In the On-line view, the cross-environmental variable values are checked to reason about how the Ad-hoc BP sub-process should adapt in order to achieve the BP goal. This is done by comparing the obtained model (according to the different frequency degrees) with the event logs using the CEP checking mechanism.

4 Case Study: Business Process of an Insurance Claims' Handler System
In this section, we present the business process of an insurance claims' handler system. This system makes sure that claims are handled efficiently and that payment for valid claims is made. The process thus consists of making decisions on the extent and validity of a claim; it can also check for any potential fraudulent activity. At the beginning, the claim is received, via an alert message, from the smart house [27] security system. The alert message then notifies different destinations, such as SOS and insurance systems, about the raised alert. Finally, these systems process the alert and respond to the notification (Fig. 4).

Fig. 4. Claims handler system.

In this case study, we use a dataset that comprises event logs of the activities of daily living performed by several individuals. The event logs are derived from smart home sensor data, collected in different scenarios, and represent the daily activities of digital claims customers (claimants). These scenarios include fire claims, water leak claims, etc. In this sense, the event logs represent claimants' behaviours.

4.1 Off-Line
According to our proposed approach (Cf. Figure 1), the Off-line view aims at preparing the event logs by filtering out deficiencies such as noisy data. It then allows constructing the generic BP and the full Ad-hoc BP (all claimants' behaviours) based on:


– Applying the process discovery technique, in combination with the frequency concept, on events.
– Refining the Ad-hoc BP by attributing business conditions and rules to it.

Business conditions are established to distinguish the different categories (sub-processes) of the full Ad-hoc BP, to decide in a dynamic manner on the adaptive Ad-hoc sub-process and to support its modelling phase in the next stages.

4.1.1 Preparing Event Logs
To prepare event logs, we extract logs from the database and feed them into the transformation function. Because event log formats differ, a converter is needed to transform XML logs into CSV format. Event logs consist of various attributes for each case; the needed attributes are the case identifier, the event name and the timestamp. After extracting the required attributes and transforming the logs into CSV format, the CSV file is loaded into the ProM tool [26] and converted into XES format for evaluating the proposed approach. In our case, the event logs are already prepared; we download them from the source (http://www.processmining.org/event_logs_and_models_used_in_book) and import them into ProM for manipulation. The claims log has 8 different activities (check if sufficient information is available, AC: Assess claim, DLC: Determine Likelihood of Claim, RC: Register Claim, IP: Initiate Payment, CC: Close Claim, ACR: Advice Claimant on reimbursement and E: End), 132 process instances and 1642 executed events, i.e. over a specific period the log covers 132 claim cases and 1642 executed tasks. We observe that the activities RC, DLC, ACR, CC, IP and E are considered chaotic activities (Cf. Table 3): they can be executed anywhere [18] in the process, for instance RC = {0, 5, 6, …}. According to the Ad-hoc definition, they are arranged into an Ad-hoc group, in order to discover, in the process discovery phase, the relationship bringing these Ad-hoc activities together.

Table 3. Excerpt of the claims handler system log summary.

Case_ID   Activity index (chaotic activities / Ad-hoc group)
1         RC = 0, DLC = 1, AC = 2, ACR = 3, CC = 4, IP = 5, E = 6
2         RC = 5, DLC = 4, AC = 2, ACR = 3, CC = 1, IP = 6, E = 0
3         RC = 6, DLC = 2, AC = 0, ACR = 5, CC = 3, IP = 4, E = 3
…         …


4.1.2 Process Discovery
The heuristics miner is applied to discover a process model from the event logs. The configurable dependency parameter of this algorithm is used to measure the relationship between activities: a high value denotes a strong dependency, while a low value denotes a weak dependency. In this case, we focus on the main behaviours of the given real-life log; therefore, high dependency values are considered and experimented with to review the output process models. The heuristics miner first discovers a heuristic net; the heuristic net is then converted into a Petri net, and the Petri net is finally transformed into BPMN. For our claims handler process, we rely on the following logic (Cf. Figure 5 and Fig. 6):

Fig. 5. Discovered process models according to the frequency degree.

Fig. 6. Generic process model representation.

First, we apply the heuristics miner plug-in. Second, we try to discover all possible process models from these event logs by sliding the frequency threshold. We thereby determine three intervals within which the process model does not change (bold arrows). In this context, we obtain three models: the first model represents more than 80% of users' behaviours, the second model represents 20% of users' behaviours, and the third model represents less than 20% of users' behaviours.
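The interval detection can be pictured as a threshold sweep: the model (here abstracted as its set of retained arcs) only changes at a few threshold values, which delimit the frequency intervals. The snippet below is a schematic illustration with hypothetical dependency values, not the ProM plug-in used above.

```python
# Hypothetical dependency values for a few arcs (for illustration only).
dependencies = {
    ("check", "RC"): 0.95, ("RC", "DLC"): 0.90, ("DLC", "AC"): 0.85,
    ("AC", "IP"): 0.35, ("IP", "CC"): 0.30, ("CC", "E"): 0.25,
}

def model_at(threshold: float) -> frozenset:
    """The 'model' is abstracted as the set of arcs whose dependency >= threshold."""
    return frozenset(arc for arc, dep in dependencies.items() if dep >= threshold)

# Sweep the threshold and record where the resulting model actually changes:
# each recorded point starts a new stability interval.
previous, intervals = None, []
for t in [x / 100 for x in range(0, 101, 5)]:
    model = model_at(t)
    if model != previous:
        intervals.append((t, model))
        previous = model

for start, model in intervals:
    print(f"threshold >= {start:.2f}: {len(model)} arcs retained")
```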


For our claims handler process, we thus have three different degrees of frequency, which relate to two main parts: an Ad-hoc BP (replacing the Ad-hoc group) and a fixed activity (check if sufficient information is available). The Ad-hoc sub-processes are detailed in Fig. 7:

Fig. 7. The full Ad-hoc BP.

• The model with a frequency equal to or greater than 80%: the BP can be completed from the Assess Claim activity.
• The model with a frequency equal to 20%: the BP can be completed from the Determine Likelihood for Claim activity.
• The model with a frequency between 0% and 19%: the BP can be completed after applying the fixed activity (check if sufficient information is available).

Next, we apply the decision miner algorithm to define the decision points of this process model.

4.1.3 Ad-Hoc Refinement
After obtaining the generic process model (fixed part + Ad-hoc BP), we define the category specifications for the Ad-hoc BP part (Cf. Table 4), which can be designated as the full Ad-hoc BP (Cf. Figure 7). These categories are defined relative to the resource condition (who is responsible for executing this category?), the time condition (when the sequence


of activities is executed?) and the antecedent condition (what is the previously executed activity?).

Table 4. Excerpt of the cross-environmental variable values.

Category   Resource   Time                Antecedent                                     General goal     Specific goal                           Frequency
A          Rs1        T1: from t1 to t3   Check if sufficient information is available   Claims handler   Possible End after the AC* activity     Equal to or more than 80%
B          Rs2        T2: from t4 to t5   Idem                                                            Possible End after the DLC* activity    20%
C          Rs3        T3: from t6 to t7   Idem                                                            End after the check activity            From 19% to 0%

*: AC (Assess claim), DLC (Determine likelihood for claim)

In addition to the generic business process goal, we define a cross-environmental variable (a set of conditions) in which a specific business goal is mentioned, i.e., what is the reason for executing this category? In Table 4, we define three categories, A, B and C, with the following conditions:

• Rs1 is nominated as responsible for executing category A at the instance T1, when the Ad-hoc BP can be achieved after the Assess Claim activity.
• Rs2 is nominated as responsible for executing category B at the instance T2, when the Ad-hoc BP can be achieved after the Determine Likelihood for Claim activity.
• Rs3 is nominated as responsible for executing category C at the instance T3, when the Ad-hoc BP can be achieved after the check activity.

After defining the Ad-hoc BP categories and their business conditions, we proceed to define the business rules for each category. To do so, we rely on the correct pattern, incorrect pattern and indirect flow notations (Cf. Figure 6). In this example, we determine 14 rules: 6 rules for category A, 7 rules for category B and 1 rule for category C; 9 are correct rule patterns and 5 are incorrect rule patterns. The process model with the defined graphical rules is shown in Fig. 7. At this stage, we can translate the graphical rules into a rule language. In this work, Esper is selected because it is an open-source Java-based framework commonly used for CEP to analyse event series and detect situations among events, which can be used for compliance checking. Esper rules are expressed in the Event Processing Language (EPL), a SQL-like language with, for example, SELECT, FROM and WHERE clauses


[24]. The EPL syntax is described as follows: the INSERT INTO clause is recast as a means of forwarding events to other streams for further downstream processing; a PATTERN may appear anywhere in the FROM clause of an EPL statement; and the notation "->" indicates a PATTERN of event ordering. Table 5 shows an excerpt of the three possibilities of rule definition and their creation in the rule language. It also shows the rule statements (A_r1, B_r2 and C_r1), which are interpreted from the graphically defined rules by the rule creation function for categories A, B and C. The rule creation function translates a graphical rule into rule statements expressed in EPL; the rule statements are then deployed in the CEP engine for compliance checking in the next step.

Table 5. Excerpt of rule definitions and creation.

Rule A_r1 (category A). Definition: DLC* must immediately follow RC*. Creation:
INSERT INTO A_r1 SELECT m.event AS mEvent, m.processId AS mProcessId, n.event AS nEvent, n.processId AS nProcessId FROM pattern [every m = RuleCheck(event = 'RC') -> (endEvent = RuleCheck(event = 'End', processId = m.processId)) and not n = RuleCheck(event = 'DLC', processId = m.processId)]

Rule B_r2 (category B). Definition: AC* must immediately follow DLC*. Creation:
INSERT INTO B_r2 SELECT m.event AS mEvent, m.processId AS mProcessId, n.event AS nEvent, n.processId AS nProcessId FROM pattern [every m = RuleCheck(event = 'DLC') -> (endEvent = RuleCheck(event = 'End', processId = m.processId)) and not n = RuleCheck(event = 'AC', processId = m.processId)]

Rule C_r1 (category C). Definition: E* must immediately follow the check activity. Creation:
INSERT INTO C_r1 SELECT m.event AS mEvent, m.processId AS mProcessId, n.event AS nEvent, n.processId AS nProcessId FROM pattern [every m = RuleCheck(event = 'check') -> (endEvent = RuleCheck(event = 'End', processId = m.processId)) and not n = RuleCheck(event = 'E', processId = m.processId)]


After defining the three categories of our Ad-hoc BP and attaching their appropriate business conditions and rules, we obtain the cross-environmental variable values and the generic BP representation with the fixed part and the full Ad-hoc BP.

4.2 On-Line
According to our proposed approach, the On-line view aims at dynamically selecting the suitable Ad-hoc BP sub-process. This is done by applying a reasoning step that checks the business conditions and rules and, where necessary, verifies the cross-environmental variable values. Generally, after the reasoning step we record all the related information, such as violations and solutions, in the information system entity; this is beneficial for future Ad-hoc BP improvement. In our case, we use CEP for checking the conditions and business rules, and we carry out the reasoning phase with the purpose of selecting the suitable Ad-hoc BP sub-process at runtime. In this example, we suppose that we reach the execution phase with the specific goal "possible End after the DLC activity". Therefore, the Ad-hoc sub-process highlighted in green must be executed (Cf. Table 6).

Table 6. Our case study cross-environmental variable values.

Time                 Resource   Antecedent   Category   Specific goal                          Frequency
T1 = from t1 to t3   Rs1        check        A          Possible End after the AC activity     More than 80%
T2 = from t4 to t5   Rs2        check        B          Possible End after the DLC activity    20%
T3 = from t6 to t7   Rs3        check        C          End after the check activity           From 19% to 0%

*: AC (Assess claim), IP (Initiate Payment), DLC (Determine likelihood for claim), RC (Register Claim), E (End)

After checking the conditions, we can determine the adaptive Ad-hoc sub-process. For instance, we execute category B with the resource Rs2, in the t4 to t5 time interval and with A as the antecedent. This shows which sub-process is selected according to the cross-environmental variable values (Cf. Table 6). Therefore, we select category B, with a frequency degree of 20% (Cf. Figure 8).


Fig. 8. Adaptive Ad-hoc sub-process according to specific conditions.
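The dynamic selection step illustrated in Fig. 8 can be sketched as a simple lookup of the runtime context against the cross-environmental variable values of Table 6. The Python snippet below is an illustration of this matching logic, not the authors' implementation; the integer time slots are a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Category:
    name: str
    resource: str
    time_window: range        # abstract time slots, e.g. t4..t5 encoded as 4..5
    antecedent: str
    specific_goal: str

# Cross-environmental variable values taken from Table 6 (time slots abstracted as integers).
CATEGORIES = [
    Category("A", "Rs1", range(1, 4), "check", "Possible End after the AC activity"),
    Category("B", "Rs2", range(4, 6), "check", "Possible End after the DLC activity"),
    Category("C", "Rs3", range(6, 8), "check", "End after the check activity"),
]

def select_category(resource: str, t: int, antecedent: str) -> Optional[Category]:
    """Return the Ad-hoc sub-process whose conditions match the runtime context."""
    for cat in CATEGORIES:
        if cat.resource == resource and t in cat.time_window and cat.antecedent == antecedent:
            return cat
    return None  # no matching sub-process: record a violation for later improvement

# Runtime context of the example: resource Rs2 at slot t4, after the check activity.
print(select_category("Rs2", 4, "check"))   # -> category B (frequency degree 20%)
```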

4.3 Synthesis
In this section, we have applied our proposed approach to a concrete example: a claims handler business process. Our approach consists of two views. The Off-line view defines the different behaviours of the Ad-hoc BP, based on the combination of the process discovery technique and the frequency concept (which yields different frequency models for constructing the full Ad-hoc BP). We then define business conditions (mapping categories according to specific business conditions) and rules (attached to each Ad-hoc BP part, i.e. category) relative to the generic discovered model. All these condition values are assigned to a cross-environmental variable. The On-line view illustrates how to dynamically execute the suitable Ad-hoc BP sub-process; it is achieved using the conformance checking technique.

5 Conclusion
To conclude, this paper presents a new approach for treating Ad-hoc BPs dynamically using process mining techniques. We first identify the issues still encountered regarding dynamicity in Ad-hoc BPs: Ad-hoc processes are not predefined and dynamic selection is not addressed, so processes are not adapted according to real-time variables. To this end, we present requirements that must be respected in an Ad-hoc BP definition: the Ad-hoc BP must be generic and dynamic, i.e., adaptive to real-time variable conditions (changes). Besides, we illustrate how process mining techniques are used to define the Ad-hoc BP content and how the CEP tool can be leveraged to verify the cross-environmental variable values and to dynamically execute the suitable Ad-hoc BP sub-process. In this context, our approach encompasses two views. The Off-line view aims at constructing a generic model, using the process discovery technique in combination with the frequency concept. The On-line view uses the conformance checking technique to adapt the suitable sub-process of the modelled Ad-hoc BP, taking the dynamicity concept into consideration. After execution, all information is recorded for future improvement of the Ad-hoc BP. As further research, it could be interesting to investigate the use of different clustering techniques in the Off-line mode, in order to evaluate their impact on the quality criteria [29] and on the robustness of the produced models [30] relative to different process mining challenges [25].

Acknowledgement. This work was supported by the National Center for Scientific and Technical Research (CNRST) in Rabat, Morocco.

References 1. Eversheim, W., Marczinski, G., Cremer, R.: Structured modelling of manufacturing processes as NC-data preparation. CIRP Ann. 40(1), 429–432 (1991)


2. Papavassiliou, G., Mentzas, G.: Knowledge modelling in weakly-structured business processes. J. Knowl. Manage. (2003) 3. Marrella, A., Mecella, M., Sardina, S.: Intelligent process adaptation in the SmartPM system. ACM Trans. Intell. Syst. Technol. (TIST) 8(2), 1–43 (2016) 4. Wodtke, D., Jordt, N., Kruse, M.: SAP SE. End user oriented workflow approach including structured processing of ad hoc workflows with a collaborative process engine. U.S. Patent 7,885,847 (2011) 5. Vasilecas, O., Rusinaite, T., Kalibatiene, D.: Dynamic business processes and their simulation: a survey. In: DB&IS, pp. 155–166 (2016) 6. Van der Aalst, W.M.P.: Process mining’. Process Mining: Data Science in Action, 2nd ed., pp. 3–23. Springer. NY (2016) 7. Pesic, M., Aalst, W.M.P.: A declarative approach for flexible business processes management. Business Process Management Workshops. Lecture Notes in Computer Science, pp. 169–180. Springer, Berlin, Heidelberg (2006) 8. Schonenberg, H., Mans, R., Russell, N., Mulyar, N., van der Aalst, W.: Process flexibility: a survey of contemporary approaches. In: Advances in enterprise engineering I (pp. 16–30). Springer, Berlin, Heidelberg (2008) 9. Dustdar, S., Hoffmann, T., Van der Aalst, W.: Mining of ad-hoc business processes with TeamLog. Data Knowl. Eng. 55(2), 129–158 (2005) 10. Duma, D., Aringhieri, R.: An ad hoc process mining approach to discover patient paths of an Emergency Department. Flex. Serv. Manuf. J. pp. 1–29 (2018) 11. Duma, D., Aringhieri, R.: An ad hoc process mining approach to discover patient paths of an Emer-gency Department. Flex. Serv. Manuf. J. (2020) 12. Kiedrowicz, M.: Dynamic business process in workflow systems. In: MATEC Web of Conferences, EDP Sciences, vol. 125, p. 02014 (2017) 13. Jain, P., Yeh, P.Z., Verma, K., Kass, A., Sheth, A.: Enhancing process-adaptation capabilities with web-based corporate radar technologies. In: Proceedings of the first international workshop on Ontology-supported business intelligence, pp. 1–6 (2008) 14. Vasilecas, O., Kalibatiene, D., Lavbiˇc, D.: Implementing rule-and context-based dynamic business process modelling and simulation. J. Syst. Softw. 122, 1–15 (2016) 15. Zhu, X., Recker, J., Zhu, G., Santoro, F.M.: Exploring location-dependency in process modeling. Bus. Process Manag. J. (2014) 16. Adams, M.: Dynamic workflow. In: Modern Business Process Automation, pp. 123–145. Springer, Berlin, Heidelberg (2010) 17. Cheng, H.J., Kumar, A.: Process mining on noisy logs—Can log sanitization help to improve performance? Decis. Support Syst. 79, 138–149 (2015) 18. Tax, N., Sidorova, N., van der Aalst, W.M.: Discovering more precise process models from event logs by filtering out chaotic activities. J. Intell. Inf. Syst. 52(1), 107–139 (2019) 19. Weijters, A.J.M.M., van Der Aalst, W.M., De Medeiros, A.A.: Process mining with the heuristics miner-algorithm. Technische Universiteit Eindhoven, Tech. Rep. WP, 166, 1–34 (2006) 20. Günther, C.W., Van Der Aalst, W.M.: Fuzzy mining–adaptive process simplification based on multi-perspective metrics. In: International conference on business process management, pp. 328–343. Springer, Berlin, Heidelberg (2007) 21. Cognini, R., Corradini, F., Gnesi, S., Polini, A., Re, B.: Business process flexibility-a systematic literature review with a software systems perspective. Inf. Syst. Front. 20(2), 343–371 (2018) 22. Decker, G., Overdick, H., Weske, M.: Oryx–an open modeling platform for the BPM community. In: International Conference on Business Process Management, pp. 382–385. 
Springer, Berlin, Heidelberg (2008)


23. Horgan, D.S., Holliday, J.R., O’toole, E.: Johnson Controls Technology Co. Building access control system with complex event processing. U.S. Patent 10,565,838 (2020) 24. Intelligence, E.E.S.: Where complex event processing meets open source. Esper & NEsper 2, 2006–2013 (2006) 25. Lamghari, Z., Radgui, M., Saidi, R., Rahmani, M.D.: Passage challenges from data-intensive system to knowledge-intensive system related to process mining field. In: Proceedings of the ArabWIC 6th Annual International Conference Research Track, p. 3. ACM (2019) 26. Van der Aalst, W.M., van Dongen, B.F., Günther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: ProM: the process mining toolkit. BPM (Demos) 489(31), 2 (2009) 27. Geneiatakis, D., Kounelis, I., Neisse, R., Nai-Fovino, I., Steri, G., Baldini, G.: Security and privacy issues for an IoT based smart home. In: 2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics, pp. 1292– 1297. IEEE (2017) 28. http://www.processmining.org/prom/decisionmining?s%5b%5d=decision&s%5b%5d= mining (Decision Miner) 29. Lamghari, Z., Radgui, M., Saidi, R., Rahmani, M.D.: Defining business process improvement metrics based on BPM life cycle and process mining techniques. Int. J. Bus. Process Integr. Manag. 9(2), 107–133 (2019) 30. Collard, M., Callejas, Y., Cavarero, J.L.: Business process management: a conceptual and operational optimisation approach. RITA 13(1), 7–22 (2006)

Modeling the Use of RFID Technology in Smart Processes

Ihsane Abouzid(B) and Rajaa Saidi

SI2M Laboratory, National Institute of Statistics and Applied Economics, Rabat, Morocco
{iabouzid,rsaidi}@insea.ac.ma

Abstract. Business environment variations increase the need for making corresponding changes in a business system. The supply chain management domain requires the integration of new technologies in order to increase productivity. Fast-changing technologies such as the IoT can be used to improve supply chain business processes. In the retail sector especially, streamlining the procurement process is one of the strategies used to increase efficiency, and several efficiency challenges arise from the lack of information. An existing technology such as Radio Frequency Identification, with its potential to automate product authentication, makes it possible to solve, or at least to reduce, most of the possible negative effects caused by the mismanagement of supply chain processes. This article outlines the use of RFID technology for smart management. Additionally, it describes the processes currently implemented manually ("as-is"), modelled in BPMN, and the target processes ("to-be"), modelled with a BPMN extension.

Keywords: Smart business processes · uBPMN · RFID · Procurement process · Business process improvement · Supply chain management

1 Introduction
It is well known that supply chain management (SCM) is an integral part of most businesses and is essential to company success and customer satisfaction. It also includes the crucial components of coordination and collaboration with channel partners, which can be suppliers, intermediaries, third-party service providers and customers [1]. Essentially, supply chain management integrates supply and demand management within and across companies. Business process improvement is now an everyday activity that increasingly uses technology and relies less on humans for data gathering. Ubiquitous systems are therefore called upon to improve business processes (optimizing time, cost and quality), which become more complex and less documented after each round of improvement, making them even harder to improve (costly in time and money) and more error-prone (quality at risk) [2]. Ubicomp systems are less error-prone than traditional systems since they diversify their data collection resources and do not rely primarily on humans. Ubicomp encompasses numerous mechanisms for collecting data, such as Automatic Identification and Data Capture (AIDC) [3] (e.g., RFID).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 758–769, 2021. https://doi.org/10.1007/978-3-030-66840-2_57


RFID tags are a type of tracking system that uses smart barcodes to identify items. RFID tags use radio frequency technology: radio waves transmit data from the tag to a reader, which then transmits the information to an RFID computer program. RFID tags are frequently used for merchandise; a tag may also be called an RFID chip. The application of RFID tags increases the level of data accessibility as well as the efficiency of data processing and flows in organizations, where the flow of information is of crucial importance for the functioning of this technology in the procurement process [5]. Such solutions are rather complex and consist of many different components, which differ in terms of technology and in the scope of their functioning and impact [6]. Supply chain management is thus mainly about managing the different flows (product flow, financial flow and information flow), and the main loss in the procurement process is due to the quality of the data. In addition, the way a business process is presented strongly impacts how the process can be analysed, implemented and then improved. In this context, BPMN covers graphical notations for process specification, process re-engineering and reasoning about processes. In a previous work [8], the ubicomp extension of BPMN was presented, which forms a clear conceptual link with ubiquitous business processes and provides suitable modeling elements to describe ubicomp interactions within the process flow. The proposed extension advances BPMN by offering support for modeling ubiquitous business rules, which cannot be represented using traditional BPMN. Therefore, the objective is to verify the possibility of modeling the main activities performed in the procurement process in the traditional form (without automation) and when using process automation systems, including RFID tag technology, and the possibility of checking their compliance with the uBPMN model [9]. The remainder of this paper is organized as follows: in Sect. 2, we present related work. In Sect. 3, we present the "as-is" and "to-be" phases of the business process. Section 4 describes the application of RFID technology and the modeling with uBPMN of the procurement business process to fulfill the requirements of the supply chain in the retail field, and we walk through modeling an example procurement business process before and after using RFID. We conclude the paper in Sect. 5 with the improvement of performance metrics related to the procurement process and some perspectives.

2 Related Work
Nowadays, organizations use more and more business processes to capture, manage and optimize their activities. In areas such as supply chain management, intelligent transport systems, domotics or remote healthcare, business processes can gain a competitive edge by using the information and functionalities of IoT devices (sensors and actuators). Business processes use IoT information to incorporate real-world data, take informed decisions, optimize their execution and adapt themselves to context changes. In [12], it was concluded that IoT and BPMN allow domain experts to draw business processes while specifying how to connect devices together, so that all kinds of users can use the same language to communicate; domain experts can focus


on important business processes or important logic without worrying about the communication details. Another advantage of BPMN is that it is easy to integrate with other systems, and the most important reason to use BPMN is that its tools are standardized. Current BPMN-based approaches already support modelers in defining both business processes and IoT device behaviour at the same level of abstraction. However, they are not restricted to standard BPMN elements and they generate IoT-device-specific low-level code [13]. The extended BPMN version satisfies every requirement, compared to the current language, which only satisfies two requirements, namely abstraction and real time [15]. BPMN provides support to represent the most common control flow modeling requirements. Other language proposals in the literature provide an abstract representation of business processes, but the key aspect is that BPMN is supported by an executable model to enact process instances on ICT platforms [16]. It was therefore concluded that it is not possible to model the Internet of Things using the existing language [11]; hence, BPMN should be extended. There are eight extended task types, which are discussed below; the sensing task "Sense Wakefulness" and the actuation task "Open Curtains" are used as examples to illustrate the proposed concepts [15] (Table 1). In a previous work [7], it was concluded that the best scenario for improving the process is to automate the entire process. Among the technologies used in supply chain management systems, RFID is one of the most useful for tracking the material flow, and it is known that RFID technology offers the possibility of significantly enhancing tool management and the tool procurement process. In [5], it was shown that the adoption of RFID technology or other related technologies for real-time sensing and communication provides strong capabilities for real-time service. RFID can be applied to solve the tool management problem, and the analysis of the RFID information flow helps to understand the impact of integrating this technology into existing IT infrastructure and enterprise software [10]. The implementation of this application demonstrated the benefit of applying RFID in a spare-parts supply chain, reducing data entry time by more than 50% and improving accuracy (Table 2). Additionally, the author of [1] explains the importance of reshaping industries through innovative technology platforms such as RFID. He thus proposes a material tracking system that may provide momentum for driving changes in the construction industry, and a drastic shift in the mindsets of those engaged in that industry is called for. The main goal of the present work is to improve productivity and provide real-time capabilities by using ubicomp capabilities, for instance Automatic Identification and Data Capture (AIDC) such as RFID, and by modelling with its specifications.

Table 1. Extended task types of BPMN

Authors | BPMN extensions
1. Sperner 2011 | an actuation task and a sensing task
2. Meyer 2012 | an actuation task and a sensing task
3. Graja 2016 | an actuation task and a sensing task
4. Yousfi 2016 | a sensing task, reader task, image task, audio task, and collector task
5. Sungur 2013 | an actuation task (!) and a sensing task (?)
6. Tranquillini 2012 | an actuation task and a sensing task
7. Chiu & Wang 2013 | an actuation task and a sensing task

Table 2. The performance of the manual process versus the RFID-based automated process [5]

Evaluation process | Time | Accuracy
Manual entry process | 2.1 min | 98%
RFID-based automated process | 1 min | 100%
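As a quick illustration, the "more than 50%" reduction in data entry time cited above can be checked directly from the Table 2 figures; the short Python sketch below is ours and is not part of the original study.

```python
# Check of the data entry time reduction implied by Table 2.
manual_time = 2.1   # minutes per manual entry (Table 2)
rfid_time = 1.0     # minutes per RFID-based entry (Table 2)

reduction = (manual_time - rfid_time) / manual_time
print(f"Data entry time reduction: {reduction:.1%}")  # -> 52.4%, i.e. more than 50%
```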

In this regard, we use uBPMN, specified in [8], to address the many cases in the field of ubiquitous computing that exceed the capabilities of BPMN v2.0. This limitation of BPMN v2.0 hinders the dissemination of ubicomp ideas within business process management and blocks any verification, validation, and transformation initiative that may arise within ubiquitous business processes.

3 Process “as-is” and “to-be” Phases
The PDCA model [4] may be used to manage business processes in an organization. The model describes the life cycle of a process and consists of four main steps (Cf. Fig. 1): Plan, Do, Check, and Act. The model presents a closed cycle of process management, in which the processes are constantly modified and improved.

Fig. 1. PDCA model: 1. Plan (analysis, modelling); 2. Do (execution); 3. Check (analysis, simulation); 4. Act (improve, simulation, modelling)

The modeling of business processes according to BPMN for the procurement process in the retail field may be divided into two main phases (Cf. Fig. 2): 1) modeling of the processes currently implemented in the organization (“as-is”) and 2) modeling of the processes for the target model of the organization's functioning (“to-be”). This means that the change from the current processes to the target processes must be defined by determining which main business goals are to be achieved. In this case, the main goal is to improve the quality of data and to manage the information flow.

Fig. 2. Modeling processes from the “as-is” to the “to-be” phase

3.1 ‘As-is’ Phase
Modeling the as-is Process. BPMN takes the activity point of view of the entire business process, which means that from the BPMN model we can see the flow of each process; even manual processes can be presented clearly. The BPMN model also revealed many approval steps that can take time to finish and affect the continuity of the next process.
Analysis of the as-is Process. The as-is process takes a significant amount of time to be completed. In retail activity, the data process has a high processing time compared to the other processes in supply chain management, and the information takes a significant part of that time to be delivered. It will therefore be improved by using RFID-based process improvement targets as an alternative in the to-be process. In this process, the information flow has the main impact on the whole activity because of its timing, and it also plays a critical role in managing the other processes.

3.2 ‘To-be’ Phase
Modeling the to-be Process. The business process improvement targets are set and adjusted to ensure the delivery of correct data throughout supply chain management, up to the procurement process of the supplier. After automation with RFID technology and modeling of the data flow with uBPMN, a strong improvement in the quality of data in the process is expected.


4 Modelling RFID Technology in Smart Processes
Several reasons exist for the choice of BPMN as a suitable modelling language to be extended. Firstly, BPMN is a standardized construct, actively maintained by the Object Management Group (OMG). In addition, the BPMN metamodel explicitly allows extensions and provides supporting elements [14]. Further, BPMN has evolved into a standard for process modelling and has reached a wide dispersion in the process modelling domain; many different modelling tools exist on the market, a large part of which is freely available. Besides, BPMN has an effective XML interchange format. These last points are crucial for a model-driven approach, since implementing the conceptualized extension in an open-source modelling tool requires less effort than a completely new development of a modelling tool. The transformation of the graphical representation of a model into an XML document is the main link between a model and the corresponding smart processes of a logistics information system. Additionally, we can use the use cases already modelled in BPMN in the previous research project for a direct comparison. Overall, these points lead to the choice of BPMN as the modelling method.
In the logistics industry, and especially in the retail domain, the automation of product monitoring and control, inventory, and customer relationship management is a typical issue dealt with by enterprises. The main goal of this work is to model and improve such problems by making use of the IoT, through managing ubiquitous information about the goods transported between three stakeholders: supplier, customer, and 3PL (third-party logistics). This work describes a solution for modeling the complete information flow of the procurement process with uBPMN, which makes use of IoT technologies such as RFID in a secure and efficient way. First, it provides a notation for RFID specifications as a ubiquitous system. Besides, it allows observing the improvements obtained, such as the time taken to circulate the information. The operations of the procurement process directly provide inputs for other processes, such as production and/or purchasing, which are related to the information flow.
4.1 uBPMN
Ubicomp gathers an abundance of technologies, such as sensors and smart readers, that cannot be represented with BPMN v2.0. Therefore, we stand by the idea that an extension of BPMN is better suited for representing ubicomp technologies. uBPMN [8] is a conservative extension (extended by notation) of BPMN that allows the creation of end-to-end ubiquitous business processes as well as the portability of their definitions. Everything true about BPMN is also true about uBPMN [8]. In the previous work of [2, 8], among the extensions of BPMN for ubiquitous systems, we find the following tasks (Cf. Table 3).


Table 3. uBPMN specification

Task type | Task description
Sensor Task | A Sensor Task is a Task that uses some sort of sensor, which could be wired, wireless, or smart, to sense a particular contextual dimension in the business environment.
Reader Task | A Reader Task is a Task that uses some sort of smart reader, which could be of type barcode, RFID, biometrics, etc.
Collector Task | A Collector Task is a Task that collects any piece of context without using sensors or smart readers. The collection is usually accomplished from databases or files, or from an outcome of another process put in a data object (short-term) or a data store (permanent).
Smart Object | A Smart Object is a declaration that a particular kind of data collected by either a Sensor Task or a Reader Task will be used.
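To make this taxonomy concrete, the Python sketch below models some of the uBPMN constructs of Table 3 as simple data classes (the Collector Task is omitted for brevity). The class and field names are our own hypothetical illustration; they are not part of the uBPMN specification or of any BPMN engine API.

```python
# Illustrative-only representation of uBPMN constructs from Table 3.
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ReaderType(Enum):
    BARCODE = "barcode"
    RFID = "rfid"
    BIOMETRICS = "biometrics"


@dataclass
class SmartObject:
    """Declares that data captured by a Sensor Task or Reader Task will be used."""
    name: str
    payload: dict = field(default_factory=dict)


@dataclass
class SensorTask:
    """Task that senses one contextual dimension through a wired, wireless or smart sensor."""
    name: str
    dimension: str                        # e.g. temperature of the transported goods


@dataclass
class ReaderTask:
    """Task that captures context through a smart reader (barcode, RFID, biometrics, ...)."""
    name: str
    reader_type: ReaderType
    output: Optional[SmartObject] = None


# Example: the RFID read performed when goods arrive at the customer in the to-be process
arrival_scan = ReaderTask(
    name="Scan incoming pallet",
    reader_type=ReaderType.RFID,
    output=SmartObject(name="PalletTag", payload={"tag_id": "example-tag-id"}),
)
print(arrival_scan)
```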

4.2 Business Process Model – Phase “as-is”
Figure 3 shows the business process model for delivering goods from the supplier to the customer in the retail field with a 3PL. A 3PL (third-party logistics) provider offers outsourced logistics services, which encompass anything that involves the management of one or more facets of procurement and fulfillment activities. In business, 3PL has a broad meaning that applies to any service contract that involves storing or shipping items. In this phase, the BPMN process describes the activities that are performed manually, without RFID technology, for the purpose of their automation. The structural analysis of the developed processes was also conducted in this phase. The process presents mainly the information flow from the supplier through the 3PL to the customer, and vice versa. When the goods arrive at the customer, the information flow goes through the 3PL, which means that the data between the supplier and the customer cannot pass without the 3PL. Improving the procurement process should start from the quality of the data, especially in the retail domain of food, which has a short shelf life; in this case, the data should be communicated as soon as possible. Because of the criticality of the information, a fast sharing technology and a business process model are needed to improve the procurement business process. Before applying RFID technology, the information returned from the customer had to go through the 3PL, which takes time to be treated in the process before finally being shared with the supplier.


Fig. 3. BPMN process: procurement process - phase “as is”

4.3 Business Process Model – Phase “to-be”
Figure 4 shows the business process model for the procurement process after using RFID and IT systems (phase “to-be”). The models developed in the “to-be” phase extend the functionality of the “as-is” phase by incorporating actions that use RFID technology. RFID automates the information flow in the supply chain from supplier to customer.

Fig. 4. uBPMN procurement process - phase “to be”


Similarly to the business process models of the “as-is” phase, the structural analysis of the business process models of the “to-be” phase was performed. Process simulation can be used to verify the changes resulting from the application of RFID technology in the modeled processes, together with an estimation of the execution time of the discussed processes. As shown below, the estimated average execution time of the availability process performed with the traditional method was long because the information passed each time through the 3PL, whereas in the case of the RFID-supported process the time is much shorter. Data can move fluently and directly from the supplier to the customer, and it is apparent that the technology will increase the efficiency and quality of data, including the high reliability of the whole supporting system (Fig. 5).

Fig. 5. uBPMN process: information flow returned to supplier - phase “to-be”

The availability of new technologies contributes to their wide adoption in many different institutions. The application of RFID technology in the procurement process inside the supply chain of the retail field not only improved the functioning of the entire supply chain, but also ensured a high level of data quality (Cf. Table 4).

Table 4. Process as-is vs. process to-be performance metrics

Performance metric | Process as-is | Process to-be
Time | Information has to pass through the 3PL | Information shared in real time
Cost | Daily production badly impacted by information treatment in the procurement process | Efficient management of the information flow inside and outside the procurement process
Quality | Risk of lost information | Confirmed information

5 Conclusion
In this paper, we performed and presented a first part of the use of uBPMN for modeling a smart process with RFID technology, and we identified the reasons for using this extension of BPMN in the IoT domain.


The analysis of the processes, their definition, and their implemented instances constitute the further direction of this research. To extend the contribution of this paper, we plan to keep up with advances in process improvement, and we aim to look at further supply chain technologies and the processing of different operations. The application of intelligent technology to logistics management optimization, modeling and simulation, and global logistics management will give logistics enterprises a basis for reasonable positioning, precise control and decision-making, maximize the supplier's profit, and provide the best services to the demanders; accordingly, the enterprises' market agility and competitiveness will be reinforced. Moreover, it will support the construction of smart cities with more refined and dynamic means of managing production and living. As future work, we plan to continue in this direction and work on the development of a specific BPMN extension for the use of IoT in supply chain management.

References 1. Kwon, S.-W., Lee, M.-W., Han, J.-G., Cho, M.-Y., Park, J.-W.: Model development of the material tracking system for high-rise building construction project using RFID technology. Construction Technology, Gyeonggi-Do, pp. 411–712 (2004) 2. Yousfi, A., de Freitas, A., Dey, A.K., Saidi, R.: The use of ubiquitous computing for business process improvement. IEEE Trans. Serv. Comput. 9(4), 621–632 (2016). https://doi.org/10. 1109/tsc.2015.2406694 3. Jung, J.Y., Kong, J., Park, J.: Service integration toward ubiquitous business process management. In: Industrial Engineering and Engineering Management, 2008. IEEM 2008. IEEE International Conference on. IEEE, pp. 1500–1504 (2008) 4. Koszela, J.: Business process modeling for processing classified documents using RFID technology. MATEC Web of Conf. 76, 04005 (2016) 5. Cheng, C.Y., Prabhu, V.: Applying RFID for cutting tool supply chain management. Semantics Scholar (2007) 6. RFID system components and costs. RFID Journal, At http://www.rfidjournal.com/article/ view/1336/1/129 7. Humphreys, P.: Designing a management development programme for procurement executives. J. Manag. Dev. 20(7), 604–623 (2001) 8. Yousfi, A., Bauer, C., Saidi, R., Dey, A.K.: uBPMN: A BPMN extension for modeling ubiquitous business processes. Inf. Softw. Technol. 74, 55–68 (2016) 9. Chinosi, M., Trombetta, A.: BPMN: an introduction to the standard. Comput. Stand. Interfaces 34, 124–134 (2011) 10. Rodriguez, A., Fernandez-Medina, E., Piattini, M.: A BPMN extension for the modeling of security requirements in business processes. In: IEICE Transactions on Information and Systems, 90(4), 745–752 (2007) 11. Braun, R., Esswein, W.: Classification of domain-specific BPMN extensions. Springer, pp. 42– 57 (2014) 12. Stroppi, L.J.R., Chiotti, O., Villarreal, P.D.: A BPMN 2.0 extension to define the resource perspective of business process models (2011) 13. Martins, F., Domingos, D.: Modelling IoT behaviour within BPMN business processes. vol 121, pp. 1014–1022 (2017) 14. OMG: Business Process Model and Notation (BPMN), FTF Beta 2 for Version2.0 (2010). http://www.omg.org/spec/BPMN/2.0/Beta2/PDF


15. Meroni, G.: Integrating the internet of things with business process management: a process-aware framework for smart objects. Politecnico di Milano – Dipartimento di Elettronica, Informazione e Bioingegneria, Piazza Leonardo da Vinci 32, 20133 Milano (2015) 16. Cimino, M.G., Palumbo, F., Vaglini, G., Ferro, E., Celandroni, N., La Rosa, D.: Evaluating the impact of smart technologies on harbor's logistics via BPMN modeling and simulation. 9, pp. 269–316 (2009)

Real Time Release Approach: At-Line Prediction of Ascorbic Acid Concentration in Nutraceutical Syrup via Artificial Neural Network Mikhael Anthony Felipe(B)

and Renann Baldovino

Manufacturing Engineering and Management Department, Gokongwei College of Engineering, De La Salle University, 2401 Taft Avenue, 0922 Manila, Philippines [email protected]

Abstract. The demand for greater volumes of pediatric nutraceutical products becomes higher every year, while an expected high quality must be maintained. The conventional methods that most pharmaceutical manufacturing companies still rely on limit their ability to resolve this challenge. In this study, an Artificial Neural Network (ANN) with a Multi-Layer Perceptron (MLP) architecture was used together with the FDA's Process Analytical Technology (PAT) framework and strategy to achieve real time release of an ascorbic acid nutraceutical syrup prior to packaging. Physicochemical properties, namely pH, specific gravity, viscosity, and percentage ascorbic acid concentration measurements, were used for the training of the network. A preprocessing technique involving data smoothing was employed on each nonlinear, main-effect relationship to reduce the noise and achieve better prediction accuracy. Upon training, the generated model achieved 92.55% accuracy using the Bayesian regularization training algorithm. However, when compared to the Levenberg-Marquardt training function, both networks may be evaluated as relatively similar. Despite this difference, ANN is found to be a good tool for predicting physicochemical properties to achieve real time release during production. Keywords: Ascorbic acid · Artificial neural networks · Process analytical technology · Predictive analytics

1 Introduction
1.1 Background
Conventional pharmaceutical manufacturing heavily relies on two processes, batch processing and laboratory testing of samples, to evaluate the quality of finished goods before releasing them to patients or consumers (Pestieau et al. 2014). Through the years, these orthodox methods have continually provided decent-quality pharmaceuticals to the community. However, with the increasing role of pharmaceuticals in the health care industry (Batten and Savage 2006), the demand for higher volumes of pharmaceuticals and even nutraceuticals rises, together with tighter FDA regulations. With the rise of Industry 4.0, or the 4th Industrial Revolution (Ding 2018), the effective use of machine learning (ML) principles and the pharmaceutical body of knowledge throughout a product's life cycle may push the efficiency of the current manufacturing process to higher performance and meet the rising demand, the desired state.
Process analytical technology (PAT) is an initiative by the United States Food and Drug Administration (US FDA) to provide guidance to pharmaceutical companies and achieve this desired state. As defined by the FDA, "PAT is a system for designing, analyzing, and controlling manufacturing through timely measurements (i.e., during processing) of critical quality and performance attributes of raw and in-process materials and processes, with the goal of ensuring final product quality". Through this framework, real time release (RTR) and the reduction of production cycle times by using at-line measurements, predictions, and controls may generate improvements in efficiency (Berntsson et al. 2002).
According to Lee et al. (2018), ML has resurged to prominence as a prevalent tool for performing tasks based on rules learned from data instead of rules explicitly described by humans (Booth et al. 2018). In recent years, the artificial neural network (ANN) has become popular because of its high flexibility and proven prediction abilities. With its capability for predictive analytics, this technology, together with the PAT strategy, may improve current pharmaceutical manufacturing processes.
A pharmaceutical manufacturing facility would like to optimize its current process to achieve real time release of a batch prior to packaging through at-line prediction of the active pharmaceutical ingredient (API) concentration from the physical properties of the specific drug product. In this paper, the physicochemical properties (pH, specific gravity, and viscosity), as well as the ascorbic acid (AA) concentration, of a high-volume, pediatric, vitamin C dietary supplement shall be used in an ANN ML tool to develop a prediction model. The developed model shall be used for predicting the AA concentration. The product's long history of conforming assay concentration and stability supports the choice of this product as the model product for predicting the API concentration, since the risks to patient safety and quality brought by this prediction would be minimal. By achieving RTR of a high-volume and high-demand product, the cycle time for release of the product to the public would certainly decrease.
1.2 Ascorbic Acid (AA)
Vitamin C, or AA, is an essential cofactor for the enzyme prolyl hydroxylase, and a deficiency of this vitamin results in the accumulation of abnormal collagen (Mandl et al. 2009), the most abundant protein, essential in the formation of teeth, bones, and skin. Collagen provides strength and elasticity to the skin and helps promote faster wound healing (Younes 1999). AA is of general importance as an antioxidant because of its high reducing potential; however, under some conditions AA can also act as a pro-oxidant. AA is a 2,3-enediol-L-gulonic acid. Both of the hydrogens of the enediol group can dissociate, which results in the strong acidity of AA at a range of pH 1.0 to 2.5 at 25 °C. Standard laboratory assay testing requires the use of the 1,1-diphenyl-2-picrylhydrazyl (DPPH) standard assay on a UV/Vis spectrophotometer to determine the concentration of the AA antioxidant (Sharma and Bhat 2009). A required amount of 100 mg/10 mg per 5 mL, or 175.0% to 185.0% of the label claim, is needed to comply with company and regulatory specifications.


1.3 Power of Hydrogen (pH)
pH is a measure of the acidity or alkalinity of a fluid. The pH of any fluid is the measure of its hydrogen ion concentration relative to that of a given standard solution. The pH may range from 0 to 14, where 0 is most acidic, 14 is most basic, and 7 is neutral (Covington 2016). Acid/base properties greatly influence the pharmaceutical characteristics and the physicochemical properties of the product (Manallack et al. 2013). Company and regulatory specifications require that the product batch be within pH 2.70 to 3.10.
1.4 Specific Gravity (SG)
SG is the ratio of the weight of a liquid in air at a specified temperature to that of an equal volume of water at the same temperature. The measured SG of the batch reflects the apparent content uniformity of the final dosage product. All batches produced for the target dietary supplement product must comply with the SG specification of 1.20 to 1.35.
1.5 Viscosity
Viscosity measures the resistance of a solution to flow when a stress is applied. The viscosity of a solution is in poise units and typically expressed in centipoise (cps) in pharmaceutical applications (Mastropietro et al. 2013). Viscosity affects the rate at which the product travels through production and how long it takes the fluid to dispense into packaging (Lee et al. 2009). During production, the determined product viscosity must range between 250 and 1500 cps before release to packaging.

2 Artificial Neural Network (ANN)
2.1 Multi-layer Perceptron
In this study, an MLP is employed. The MLP is a type of supervised ML algorithm and a class of feed-forward ANN complemented with a back-propagation feedback algorithm (Panchal et al. 2011). Due to the architecture's ability to model highly complex and nonlinear systems (Goncalves et al. 2013), the MLP has proven to be one of the most commonly applied techniques in predictive analysis (Franceschi et al. 2018). As shown in Fig. 1, a generic MLP uses processing points, known as nodes, in three succeeding layers: input, hidden, and output. Similar to the dendrites of a biological neuron, each node in a layer is linked to a node of a neighboring layer, carrying a corresponding weight or connection strength. Each of these weights undergoes iterative adjustments during network training until the difference between the predicted and the target output is within a desired minimum error, a process known as gradient descent (Ji et al. 2019).

Fig. 1. MLP-ANN architecture

During training, the entire data set is usually divided into three sets: training (70%), validation (15%), and testing (15%). While the training set is used to train the network, the validation set is applied to check the network's error performance and to prevent overfitting. Lastly, the testing set is used to guarantee that the results on the training and validation sets are indeed genuine (Korjus et al. 2016).
In the pharmaceutical industry, the application of the ANN-MLP is increasing because of its capability to model complex relationships between process parameters and output product quality. Velasco-Mejia et al. (2016) utilized a Levenberg-Marquardt back-propagation learning algorithm for an MLP network complemented with a genetic algorithm (GA) on different input process parameters to predict crystal density, thereby creating a model to optimize the complex crystallization of a specific pharmaceutical product. Behzadi et al. (2009) used an MLP to validate the fluidized bed granulation process, wherein parameters such as bed temperature, air pressure, and batch size were taken as input variables for the network's training. In addition, studies have shown that the use of ANN-MLP to achieve advanced process predictability and product properties surpasses conventional statistical methods (Heidari et al. 2016; Manda et al. 2019; Nadeem et al. 2017).

3 Materials and Methods
3.1 Data Acquisition and Preprocessing
The studied product is a nutraceutical syrup with 100 mg/5 mL of AA; it has the appearance of a slightly hazy, yellow-green thickened syrup and is commercially manufactured using a 10,000 L compounding tank. In this study, data from 264 batches of the nutraceutical product, containing the input (pH, SG, and viscosity) and output (assay concentration) physicochemical properties, were used to construct a predictive model. A partial list, 12 out of 264 batches, of the actual raw data used for the development of the predictive model is shown in Table 1.

Table 1. Actual physicochemical properties measured

pH | SG | Viscosity | %AA assay
2.99 | 1.28 | 1383 | 182.2
2.98 | 1.29 | 1213 | 183.5
3.05 | 1.27 | 1339 | 180.8
3.06 | 1.27 | 1093 | 180.7
3.04 | 1.26 | 1335 | 178.1
3.07 | 1.25 | 1270 | 181.1
2.97 | 1.25 | 1216 | 181.9
3.02 | 1.25 | 1204 | 182.1
3.03 | 1.25 | 1075 | 179.7
2.99 | 1.24 | 1233 | 180.0
3.06 | 1.21 | 1230 | 177.7

As per the manufacturing procedure, each of the input properties was determined from samples taken at final volume, using the following instruments: a ThermoFisher© benchtop pH meter at 27 °C for the pH; a Durac© hydrometer at 27 °C for the SG; a Brookfield© digital viscometer at 27 °C for the viscosity; and a Mettler Toledo© UV/Vis spectrophotometer for the AA concentration. To ensure the accuracy and meaningfulness of the analysis, the raw data were smoothed using the moving-average technique. Data smoothing refers to the removal of random variation, noise, and linear trends from data; the raw data were processed using the smoothdata function in MATLAB®. Figure 2 shows the plots of the smoothed data against the raw data for pH, SG, and viscosity versus AA concentration. Based on the smoothed data of the input and output parameters, simple regression analysis was employed through Minitab® 17.1.0. The relations between the AA concentration and the input parameters were tested using linear and quadratic functions. Statistically significant (p < 0.05) relationships between the input parameters and the AA concentration were determined to be linear (viscosity vs. AA) and quadratic (pH, SG vs. AA) regressions. However, despite the significance, the percentage of variation explained by each regression model shows that the relationships are truly nonlinear and cannot simply be fitted using linear or quadratic relations. Table 2 lists all of the obtained binary relationships together with their fitted equations and percentages of variation.

Fig. 2. Percentage ascorbic acid concentration versus pH, specific gravity, and viscosity (raw data and smoothed data)

Table 2. Regression models for assessing %AA concentrations

Relation to AA | Fitted equation | R²
pH | −115.9 + 1.334(AA) − 0.003743(AA)² | 0.61
SG | 17.52 − 0.1896(AA) + 0.000551(AA)² | 0.65
Viscosity | −4873 + 33.82(AA) | 0.22
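To illustrate the smoothing and curve-fitting steps described above, the sketch below reproduces the workflow in Python. The study itself used MATLAB's smoothdata function and Minitab for the regressions, so pandas' rolling mean and numpy.polyfit are only stand-ins; the window length is an assumption, and the arrays are placeholders rather than the actual batch data.

```python
# Stand-in sketch of the moving-average smoothing and quadratic regression steps.
import numpy as np
import pandas as pd

# Placeholder data shaped like the 264 batches (pH vs. %AA assay); the quadratic
# trend is borrowed from the fitted model reported in Table 2, plus synthetic noise.
rng = np.random.default_rng(0)
aa = np.linspace(176.0, 185.0, 264)                       # %AA assay
ph = -115.9 + 1.334 * aa - 0.003743 * aa**2 + rng.normal(0.0, 0.02, aa.size)

# Moving-average smoothing to remove random variation and noise.
ph_smooth = (pd.Series(ph)
             .rolling(window=9, center=True, min_periods=1)
             .mean()
             .to_numpy())

# Quadratic fit of pH against %AA, analogous to the Table 2 regression.
coeffs = np.polyfit(aa, ph_smooth, deg=2)                 # [a2, a1, a0]
pred = np.polyval(coeffs, aa)
r2 = 1.0 - np.sum((ph_smooth - pred) ** 2) / np.sum((ph_smooth - ph_smooth.mean()) ** 2)
print("fitted coefficients:", coeffs)
print("R^2:", round(r2, 3))
```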

3.2 Neural Network Model
This study utilized a 3-layer ANN-MLP with back propagation, using the network architecture shown in Fig. 3. The MLP network's hidden layer is composed of 10 sigmoid neurons for training and validation. This configuration was tested with three different training functions to determine the optimal function for the prediction of the %AA concentration. Of the total 264 batches, 70% went to the training set, 15% were allocated to the validation set, and the remaining 15% to the test set. MATLAB®'s neural fitting tool (nftool) was used to map the relationship of the input data set to a single numeric output.

Fig. 3. AA prediction model architecture
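The network just described was built with MATLAB's nftool and compared under Levenberg-Marquardt, Bayesian regularization, and scaled conjugate gradient training. As a rough illustration only, the Keras sketch below mirrors the 3-10-1 topology and the 70/15/15 split, with placeholder data and the Adam optimizer standing in for the training algorithms actually compared in the paper.

```python
# Illustrative Keras stand-in for the 3-input, 10-sigmoid-neuron, 1-output MLP.
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder arrays shaped like the study's data: 264 batches of [pH, SG, viscosity] -> %AA.
rng = np.random.default_rng(1)
X = rng.normal(size=(264, 3))
y = rng.normal(size=264)

# 70% training, 15% validation, 15% testing, as described above.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=1)

model = keras.Sequential([
    layers.Input(shape=(3,)),
    layers.Dense(10, activation="sigmoid"),   # single hidden layer of 10 sigmoid neurons
    layers.Dense(1),                          # single numeric output (%AA concentration)
])
model.compile(optimizer="adam", loss="mse")   # stand-in for the MATLAB training functions
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, verbose=0)
print("test MSE:", model.evaluate(X_test, y_test, verbose=0))
```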

4 Results and Discussion
The MLP network with the three different training functions yielded the performance values listed in Table 3.

Table 3. Regression coefficients obtained with the three training algorithms

Training algorithm | Regression
LMANN | 0.92123
BRANN | 0.92547
SCGANN | 0.88199

As shown in the comparison of the different training algorithms used, the Bayesian regularization ANN (BRANN) generated the highest regression coefficient and the best-fitting regression. In addition, as expected among regularization techniques, both the Bayesian regularization and the Levenberg-Marquardt ANN (LMANN) exhibited higher function approximation than the scaled conjugate gradient ANN (SCGANN) (Kayri 2016). Despite BRANN's higher correlation compared to LMANN, the small difference between their regression values may be attributed to BRANN's algorithm: BRANN is a network training function that uses Bayesian regularization to update the weight and bias values, and it uses the Gauss-Newton approximation to the Hessian matrix available inside the Levenberg-Marquardt algorithm. In this process, Bayesian regularization minimizes a linear combination of squared errors and weights, and modifies this combination so that at the end of training the resulting network has good generalization qualities (Mahapatra and Sood 2012). In addition, even though BRANN achieved a better generalized model than LMANN, the latter exhibited far faster convergence (see Fig. 4), due to the fact that it approaches 2nd-order training speed without having to compute the Hessian matrix (Mahapatra and Sood 2012). Similarly, a study by Kayri (2016) showed that LMANN converges faster among back-propagation algorithms. As seen in Fig. 5, as the sum of squares during the training of BRANN stabilizes, both mu and the gradient steady out with each other for each epoch.
During preprocessing, all of the input parameters exhibited a noisy and nonlinear relationship with the AA concentration. This situation gives BRANN an advantage over LMANN in approximating the target outputs. The model generated by BRANN (see Fig. 6) achieves a higher accuracy than LMANN (see Fig. 7), although the latter uses fewer epochs than the former.

Fig. 4. Training performance of (a) LMANN and (b) BRANN
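For reference, the "linear combination of squared errors and weights" minimized under Bayesian regularization, as described above, is commonly written as follows (a standard formulation, not taken from this paper):

```latex
F(\mathbf{w}) \;=\; \beta E_D + \alpha E_W,
\qquad
E_D = \sum_{i=1}^{N} \bigl(t_i - y_i(\mathbf{w})\bigr)^2,
\qquad
E_W = \sum_{j} w_j^2
```

where t_i and y_i are the target and predicted %AA values, w are the network weights, and the hyperparameters α and β are re-estimated during training to balance data fit against weight magnitude.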


Fig. 5. Training state record of BRANN

Fig. 6. Regression plot of Bayesian regularization ANN


Fig. 7. Regression plot of Levenberg-Marquardt ANN

5 Conclusion
Although the Bayesian regularization training algorithm exhibited a higher accuracy score than the Levenberg-Marquardt function, the difference in performance index between the two is small. A closer look at their training performance showed that LMANN converges far faster than BRANN. These two findings indicate that BRANN generalizes the model trend more accurately than LMANN, while the latter surpasses the former with a faster algorithm. Despite these differences, a 92.55% accuracy score of the ANN model was achieved in predicting the %AA concentration. This also shows that the use of ANN may provide new approaches and methodologies for achieving real time release of historically stable nutraceutical or pharmaceutical syrups. It appears that the percentage AA concentration could be estimated even better by acquiring more values and cleaner data; high noise and variability may have prevented higher accuracy scores, so further preprocessing methods or more accurate data acquisition are needed.
Acknowledgement. The authors would like to express their gratitude to United Laboratories, Inc., through Amherst Laboratories, Inc., for supplying the training data used in the experiment, and to De La Salle University for the dissemination support.


References Batten, L.M., Savage, R.: Information sharing in supply chain systems. Global Integr. Supply Chain Syst. pp. 67–82 (2006). https://doi.org/10.4018/978-1-59140-611-2.ch005 Behzadi, S.S., Prakasvudhisarn, C., Klocker, J., Wolschann, P., Viernstein, H.: Comparison between two types of artificial neural networks used for validation of pharmaceutical processes. Powder Technol. 195(2), 150–157 (2009). https://doi.org/10.1016/j.powtec.2009.05.0 Berntsson, O., Danielsson, L., Lagerholm, B., Folestad, S.: Quantitative in-line monitoring of powder blending by near infrared reflection spectroscopy. Powder Technol. 123(2–3), 185–193 (2002). https://doi.org/10.1016/s0032-5910(01)00456-9 Booth, A., Halhol, S., Merinopoulou, E., Oguz, M., Pan, S., Cox, A.: Pmu1 - Frequency of reportable adverse events in health-related social media posts. Value in Health 21, S309 (2018). https://doi.org/10.1016/j.jval.2018.09.1837 Clinical biochemistry of domestic animals (1997). https://doi.org/10.1016/b978-0-12-396305-5. x5000-3 Covington, A.K.: Definition of pH scales, standard reference values, measurement of pH and related terminology. IUPAC Standards Online (2016). https://doi.org/10.1515/iupac.55.0404 Development and implementation of a national quality assurance framework. United Nations National Quality Assurance Frameworks Manual for Official Statistics, 37–47 (2019). https:// doi.org/10.18356/58c620ef-en Ding, B.: Pharma industry 4.0: literature review and research opportunities in sustainable pharmaceutical supply chains. Process Saf. Environ. Prot. 119, 115–130 (2018). https://doi.org/10. 1016/j.psep.2018.06.031 Franceschi, F., Cobo, M., Figueredo, M.: Discovering relationships and forecasting PM10 and PM2.5 concentrations in Bogota, Colombia, using artificial neural networks, principal component analysis, and K-means clustering. Atmos. Pollut. Res. 9(5), 912–922 (2018). https://doi. org/10.1016/j.apr.2018.02.006 Goncalves, V., Maria, K., Da Silv, A.B.: Applications of artificial neural networks in chemical problems. Artificial Neural Networks - Architectures and Applications (2013). https://doi.org/ 10.5772/51275 Gradient descent algorithms. An Introduction to Neural Networks (1995). https://doi.org/10.7551/ mitpress/3905.003.0011 Heidari, E., Sobati, M.A., Movahedirad, S.: Accurate prediction of nanofluid viscosity using a multilayer perceptron artificial neural network (MLP-ANN). Chemometr. Intell. Lab. Syst. 155, 73–85 (2016). https://doi.org/10.1016/j.chemolab.2016.03.031 Ji, J., Chen, X., Wang, Q., Yu, L., Li, P.: Learning to learn gradient aggregation by gradient descent. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (2019). https://doi.org/10.24963/ijcai.2019/363 Kayri, M.: Predictive abilities of Bayesian regularization and Levenberg–Marquardt algorithms in artificial neural networks: a comparative empirical study on social data. Math. Comput. Appl. 21(2), 20 (2016). https://doi.org/10.3390/mca21020020 Korjus, K., Hebart, M.N., Vicente, R.: An efficient data partitioning to improve classification performance while keeping parameters interpretable. PLOS ONE 11(8), e0161788 (2016). https://doi.org/10.1371/journal.pone.0161788 Lee, C.H., Moturi, V., Lee, Y.: Thixotropic property in pharmaceutical formulations. J. Controlled Release 136(2), 88–98 (2009). https://doi.org/10.1016/j.jconrel.2009.02.013 Lee, J.H., Shin, J., Realff, M.J.: Machine learning: overview of the recent progresses and implications for the process systems engineering field. 
Comput. Chem. Eng. 114, 111–121 (2018). https://doi.org/10.1016/j.compchemeng.2017.10.008


Mahapatra, S.S., Sood, A.K.: Bayesian regularization-based Levenberg–Marquardt neural model combined with BFOA for improving surface finish of FDM processed part. Int. J. Adv. Manuf. Technol. 60(9–12), 1223–1235 (2011). https://doi.org/10.1007/s00170-011-3675-x Manallack, D.T., Prankerd, R.J., Yuriev, E., Oprea, T.I., Chalmers, D.K.: The significance of acid/base properties in drug discovery. Chem. Soc. Rev. 42(2), 485–496 (2013). https://doi.org/ 10.1039/c2cs35348b Manda, A., Walker, R., Khamanga, S.: An artificial neural network approach to predict the effects of formulation and process variables on prednisone release from a multipartite system. Pharm. 11(3), 109 (2019). https://doi.org/10.3390/pharmaceutics11030109 Mandl, J., Szarka, A., Bánhegyi, G.: Vitamin C: update on physiology and pharmacology. Br. J. Pharm. 157(7), 1097–1110 (2009). https://doi.org/10.1111/j.1476-5381.2009.00282.x Mastropietro, D.J.: Rheology in pharmaceutical formulations-a perspective. J. Developing Drugs 02(02) (2013). https://doi.org/10.4172/2329-6631.1000108 Nadeem, M., Banka, H., Venugopal, R.: Estimation of pellet size and strength of limestone and manganese concentrate using soft computing techniques. Appl. Soft Comput. 59, 500–511 (2017). https://doi.org/10.1016/j.asoc.2017.06.005 Panchal, G., Ganatra, A., Kosta, Y.P., Panchal, D.: Behaviour analysis of multilayer perceptrons with multiple hidden neurons and hidden layers. Int. J. Comput. Theory Eng. 332–337 (2011). https://doi.org/10.7763/ijcte.2011.v3.328 Pestieau, A., Krier, F., Thoorens, G., Dupont, A., Chavez, P., Ziemons, E., Hubert, P., Evrard, B.: Towards a real time release approach for manufacturing tablets using NIR spectroscopy. J. Pharm. Biomed. Analy. 98, 60–67 (2014). https://doi.org/10.1016/j.jpba.2014.05.002 Sharma, O.P., Bhat, T.K.: DPPH antioxidant assay revisited. Food Chem. 113(4), 1202–1205 (2009). https://doi.org/10.1016/j.foodchem.2008.08.008 Specific gravity (2020). https://doi.org/10.32388/fkor1n Velásco-Mejía, A., Vallejo-Becerra, V., Chávez-Ramírez, A., Torres-González, J., Reyes-Vidal, Y., Castañeda-Zaldivar, F.: Modeling and optimization of a pharmaceutical crystallization process by using neural networks and genetic algorithms. Powder Technol. 292, 122–128 (2016). https:// doi.org/10.1016/j.powtec.2016.01.028 Younes, M.: Free radicals and reactive oxygen species. Toxicology, pp. 111–125 (1999). https:// doi.org/10.1016/b978-012473270-4/50064-x

Smart Recognition Systems and Multimedia Processing

Convolutional Neural Network for Identifying Human Emotions with Different Head Poses Wafa Mellouk(B) and Wahida Handouzi Laboratoire D’automatique de Tlemcen LAT, Tlemcen University, Tlemcen, Algeria [email protected]

Abstract. Automatic facial emotion recognition is an intriguing, emerging research area that provides many advantages in different fields. In recent years, artificial intelligence has enjoyed enormous success thanks to powerful deep learning architectures capable of interpreting and classifying data after training on large datasets. Researchers now use this technique to code facial expressions in order to obtain better emotion classification. The objective of our study is to achieve better precision in classifying the seven basic emotions and to overcome several challenges such as different ages, sexes, races, head poses, and gazes. In this work, we propose deep convolutional neural networks (CNN) evaluated on the RaFD database. We first studied emotions with a frontal head pose and then extended the study to different head poses (front, left, and right). An accuracy of 98% with 5% validation loss was obtained on frontal faces, and an accuracy of 96.55% with 11% validation loss on three head poses. Our method achieved results competitive with state-of-the-art methods trained and tested on the same database. Keywords: Facial expression · Head pose · CNN

1 Introduction
Today, we are surrounded by machines, robots, virtual assistants, and human-machine interfaces, which is why researchers have thought of integrating human emotions into these technological advances to achieve natural human-machine interaction. Emotions are psychological signs which present our state in a verbal and non-verbal way, accompanied by physical and physiological changes [1]. At the beginning, psychologists were interested in this field, but with the advancement of artificial intelligence techniques, the emotional state today is captured by several sensors and automatically predicted by the computer, providing important advantages in many sectors such as health [2], marketing [3], and security [4]. Facial expressions are the most interesting non-verbal modality for researchers to study, because facial changes are among the first signs that transmit the human emotional state to us. According to the different studies carried out in this field, facial expressions help us to identify six basic emotions plus neutral [5], as already explained by Ekman and Friesen [6]: anger, surprise, happiness, disgust, fear, and sadness. Automatic facial emotion recognition (FER) generally includes three stages: database preprocessing, feature extraction, and emotion classification, which is what we apply in our study in the rest of this paper. In recent years, automatic FER has had satisfactory results through the use of different deep learning architectures; this is due to their structure, capable of extracting important features and performing the classification automatically [4]. Despite all the studies and the results obtained, this field is still difficult and faces several obstacles, because each person expresses his or her emotional state differently from others, in addition to the differences in age, sex, race, and head pose. In this context, we propose a convolutional neural network (CNN) in order to achieve higher performance for the classification of the basic emotions, evaluated on a large and rich database called the Radboud Faces Database (RaFD) [7]. This article is organized as follows: Section 2 presents recent research related to this field; Section 3 explains the experiment carried out, which includes the description of the data preprocessing steps and the network used; Section 4 presents the results obtained and the comparison with recent state-of-the-art results. We end in Section 5 with a conclusion and future works.

2 Related Work
Automatic human emotion recognition has achieved success and great interest in recent years, due to the use and adoption of different deep learning techniques. In this section we present some recent works in this area. Mavani et al. [8] used the AlexNet architecture to detect one of six basic emotions, evaluated on the CFEE and RaFD databases after a data preprocessing stage which consists of cropping the faces with the Viola-Jones algorithm and then converting them to 256 × 256 grayscale images. They achieved better generalization by training the network on CFEE and testing on RaFD. Fathallah et al. [9] proposed a novel deep learning method based on the Visual Geometry Group (VGG) model, trained in two steps and evaluated on three large databases: CK+, MUG, and RaFD. Sun et al. [10] present a CNN-based deep learning method for learning spatio-temporal features in static images; in addition to the spatial features, the researchers extract the optical flow as a temporal feature by studying the changes in facial expressions between the neutral state and the peak of expression. This experiment was trained and tested on the RaFD, CK+, and MMI databases. Yolcu et al. [11] show the importance of extracting the iconized face, which is made up of essential parts of the face such as the eyebrows, eyes, and mouth; this iconic face is combined with the raw images before training the CNN. This method achieves a high accuracy of 94.44% on the RaFD database. The following year, the same researchers proposed a novel CNN model to detect only the frontal faces among different head poses, after which they study and classify the emotions from facial expressions; the best recognition rates were obtained on RaFD and KDEF with 94.61% and 92.86%, respectively [3]. All of these papers categorize the six basic emotions using different deep learning models trained and tested on frontal face images. For several variations of head poses, Wu and Lin [12] propose a CNN architecture with a Weighted Center Regression Adaptive Feature Mapping (W-CR-AFM) technique, which transforms the distribution of the testing features into a new distribution similar to the distribution of features extracted during the training phase. This architecture was trained and tested on three large databases: CK+, RaFD, and ADFES.

3 Experiment
3.1 Data Preprocessing
Before training our proposed CNN architecture, the database used goes through several stages of preprocessing, which allow us to reduce the size of the images and eliminate unimportant features, thereby facilitating training and reducing the training time. In our experiment, we trained and tested with the RaFD database. The Radboud Faces Database (RaFD) [7] is a rich database containing 8040 images with the expressions of eight emotions: anger, sadness, surprise, fear, disgust, happiness, neutral, and contempt, captured under controlled laboratory conditions; it includes males, females, and even children of different races. The images were captured at different pose angles (0°, 45°, 90°, 135°, 180°) and in three gazing directions: left, front, and right (See Fig. 1). In our study, we are interested in the classification of the basic emotions, except for the contempt emotion, with three head poses (45°, 90°, and 135°).

Fig. 1. Samples of head poses in the order 180°, 135°, 90°, 45° and 0°.

At first, we reduce the size of the images from 681 × 1024 to 200 × 200 pixels; this reduction allows us to obtain better face detection with the Haar cascade algorithm of the OpenCV library [13]. The images obtained then go through a cropping step and a reduction to 48 × 48 pixels. Figure 2 presents all the steps of the preprocessing stage. After the preprocessing steps, all data, processed at 48 × 48 pixels, are categorized into the seven basic emotions presented in Table 1 and separated into 90% for training and 10% for testing; from the training data, we used 10% for validation (see Table 2). A set of samples from the RaFD database with a frontal face and different gazes (front, left, and right) is presented in Fig. 3, showing the seven basic emotions, and two other sets of samples from the same database are presented in Fig. 4, showing emotions with different gazes, head poses, and genders.
In the current research on automatic FER through deep learning, we have found that most approaches are based on CNN architectures, thanks to their structure, capable of extracting the important features of the data and transforming them into an abstract space by applying convolution operations with learned filters, followed by a max-pooling operation to reduce the learnable features. After these steps, fully connected layers are applied for the classification [14]. The objective of our work is to propose a CNN in order to improve the classification recognition rate.
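A minimal Python/OpenCV sketch of the preprocessing pipeline described above is given below; since no code is published in the text, the file paths, the specific Haar cascade file, and the detector parameters are assumptions.

```python
# Sketch of the preprocessing stage: resize, Haar-cascade face detection, crop, 48x48 output.
import cv2

# OpenCV ships a pre-trained frontal-face Haar cascade; the 45/135 degree camera
# views in RaFD are three-quarter views, so a profile cascade might also be needed.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def preprocess(image_path, out_size=(48, 48)):
    img = cv2.imread(image_path)                       # original 681x1024 RaFD image
    if img is None:
        return None
    img = cv2.resize(img, (200, 200))                  # first reduction, as described above
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                    # no face detected
    x, y, w, h = faces[0]
    face = gray[y:y + h, x:x + w]                      # crop the detected face region
    return cv2.resize(face, out_size)                  # final 48x48 network input


# Example usage with a hypothetical file name:
# face48 = preprocess("rafd_sample.jpg")
```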

Fig. 2. Structure of the preprocessing steps (input images of 681 × 1024 pixels; resize and face detection; crop and resize; output images of 48 × 48 pixels)

Table 1. The number of images obtained for each emotion after the preprocessing steps

Emotion | AN | DI | FE | HA | NE | SA | SU
Number of frontal pose images | 200 | 200 | 200 | 200 | 200 | 200 | 200
Number of three different pose images | 595 | 574 | 571 | 571 | 599 | 595 | 549

Table 2. The separation of data into training, testing, and validation sets

Types | Frontal head pose | Three different head poses
Training | 1134 | 3283
Testing | 140 | 406
Validation | 126 | 365
Total | 1400 | 4054

Fig. 3. Samples of the frontal face from the RaFD database, resized to 48 × 48 pixels, showing the seven basic emotions with different gazes: (a) anger, (b) disgust, (c) fear, (d) happy, (e) neutral, (f) sadness, (g) surprise.


Fig. 4. Samples from the RaFD database, resized to 48 × 48 pixels, showing emotions with different gazes, head poses, and genders: (a) anger, (b) fear.

We first started our study only with frontal face images at 90° (See Fig. 3), and we then continued by adding images of different head poses, namely 45° and 135° (See Fig. 4).
3.2 The Proposed CNN Architecture
In this part, we present our proposed CNN architecture used to classify the seven basic emotions: anger (AN), disgust (DI), fear (FE), happy (HA), neutral (NE), sadness (SA), and surprise (SU). Our architecture consists of two convolution layers, each followed by max-pooling; the outputs are fed into fully connected layers, and the output layer contains seven neurons activated with the Softmax function to indicate one of the seven basic emotions. Moreover, to avoid the overfitting problem, we adopt the dropout technique. Figure 5 presents an overview of our proposed network; for more information, see Tables 3 and 4.

Fig. 5. Overview of our proposed CNN architecture.


Table 3. Details of the proposed CNN for front head pose images

Type | Size and number of filters | Output shape
Conv1 | 32 × 3×3 | 46 × 46 × 32
Max-Pool | 2×2 | 23 × 23 × 32
Conv2 | 64 × 3×3 | 21 × 21 × 64
Max-Pool | 2×2 | 10 × 10 × 64
Dropout | 0.5 | 10 × 10 × 64
Fully connected | 128 | 128
Fully connected | 64 | 64
Dropout | 0.2 | 64
Output layer | 7 | 7
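As an illustration only, a minimal Keras sketch of the Table 3 architecture is given below; the paper does not state the framework used, and the hidden-layer activations (other than the softmax output) are our assumption.

```python
# Sketch of the first proposed CNN (Table 3), for 48x48 grayscale frontal faces.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),    # Conv1 -> 46x46x32
    layers.MaxPooling2D((2, 2)),                     # -> 23x23x32
    layers.Conv2D(64, (3, 3), activation="relu"),    # Conv2 -> 21x21x64
    layers.MaxPooling2D((2, 2)),                     # -> 10x10x64
    layers.Dropout(0.5),
    layers.Flatten(),                                # implicit in Table 3, needed before the dense layers
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(7, activation="softmax"),           # one neuron per basic emotion
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```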

Table 4. Details of the second CNN architecture for different head poses

Type | Size and number of filters | Output shape
Conv1 | 32 × 3×3 | 46 × 46 × 32
Max-Pool | 2×2 | 23 × 23 × 32
Dropout | 0.5 | 23 × 23 × 32
Conv2 | 64 × 3×3 | 21 × 21 × 64
Max-Pool | 3×3 | 7 × 7 × 64
Dropout | 0.5 | 7 × 7 × 64
Fully connected | 256 | 256
Dropout | 0.5 | 256
Output layer | 7 | 7

4 Results and Discussion
The objective of our study is to achieve greater recognition accuracy for the basic human emotions with different head poses (front, left, and right). In the following part of the paper, we show the several steps of our study, the results obtained, and the comparison with the latest state-of-the-art results evaluated on the same RaFD database.
4.1 Emotion Recognition with Frontal Pose
At first, we chose to start our experiment by training our proposed CNN architecture, presented in Table 3, only with frontal face images. Figure 6 shows the results obtained with our CNN model, representing the training accuracy versus the validation accuracy and the training loss versus the validation loss on the RaFD database. Our model was trained for 50 epochs and we chose the Adam optimizer.


Fig. 6. The accuracy and loss obtained from our proposed method.

According to the results presented in Fig. 6, our proposed model (see Table 3) achieves a high performance, up to 99% training accuracy and 98% test accuracy, with a 5% validation loss. In Table 5 we present the performance obtained with different recent methods evaluated on the RaFD database; our method achieves a competitive accuracy compared with the recent state-of-the-art methods.

Table 5. Comparison of the performance of our method versus others evaluated on the RaFD database

Authors | Accuracy on the RaFD database
Mavani et al. [8] | 95.71%
Fathallah et al. [9] | 93.33%
Sun et al. [10] | 99.17%
Yolcu et al. [11] | 94.44%
Our method | 98.57%

The confusion matrix obtained with our proposed method is presented in Table 6. Our method performs better in detecting disgust, fear, happy, neutral, and sadness, but we noted that the surprise emotion was confused with fear, and anger with disgust. In this part of our study, we have shown the capacity of our CNN architecture to reach good precision in detecting emotions, which is what pushes us to use this architecture in the next part with a more complicated dataset containing different head poses.
4.2 Emotion Classification with Three Head Poses
In this part, we added images of two different head poses (45° and 135°) to the frontal images, as previously displayed in the data preprocessing section (See Fig. 4). We started training and testing these images with the same previous CNN architecture (See Table 3). Figure 7 shows the results obtained.

Table 6. Confusion matrix on the RaFD database obtained with the front faces

Emotion | AN | DI | FE | HA | NE | SA | SU
AN | 18 | 1 | 0 | 0 | 0 | 0 | 0
DI | 0 | 15 | 0 | 0 | 0 | 0 | 0
FE | 0 | 0 | 23 | 0 | 0 | 0 | 0
HA | 0 | 0 | 0 | 14 | 0 | 0 | 0
NE | 0 | 0 | 0 | 0 | 23 | 0 | 0
SA | 0 | 0 | 0 | 0 | 0 | 23 | 0
SU | 0 | 0 | 1 | 0 | 0 | 0 | 22

Fig. 7. The accuracy and loss obtained from our proposed method, evaluated on different head poses.

In this experiment, we also obtained a high recognition rate, with 98% training accuracy and 96.30% test accuracy with a 16% test loss, but we noted that after 40 epochs our system suffered from the overfitting problem. Overfitting means that the model becomes overly fitted to the dataset and cannot generalize what it has learned to new inputs [15]. To solve this problem, several techniques are available, such as data augmentation, transfer learning, and dropout [16]. To improve our results, we made changes to the CNN architecture: we adopted dropout after the first (convolution-pooling) block, we changed the size of the second max-pooling layer, and we ended with two fully connected layers, the first with 256 neurons and the second with seven neurons. Table 4 provides more details on the second proposed CNN, Fig. 8 presents the results obtained, and a code sketch of this modified architecture is given below. In this experiment, we trained for 200 epochs and chose the Adam optimizer without any additional parameters. The results show that our model achieves a better recognition rate without the overfitting problem, with 96.55% validation accuracy and 11% validation loss, which implies that dropout is a robust technique for limiting the overfitting problem. In Table 7 we present the recent accuracy obtained by researchers on the RaFD database; our method has a 0.28% better performance compared with Wu and Lin [12].
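The sketch below, under the same assumptions as the previous one (the framework and hidden-layer activations are ours, not stated in the paper), shows the modified architecture of Table 4, with the extra dropout after the first convolution-pooling block and the larger 3 × 3 second pooling window.

```python
# Sketch of the second proposed CNN (Table 4), used for the three head poses.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),    # Conv1 -> 46x46x32
    layers.MaxPooling2D((2, 2)),                     # -> 23x23x32
    layers.Dropout(0.5),                             # dropout moved after the first conv-pool block
    layers.Conv2D(64, (3, 3), activation="relu"),    # Conv2 -> 21x21x64
    layers.MaxPooling2D((3, 3)),                     # larger pooling window -> 7x7x64
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```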


Fig. 8. The accuracy and loss obtained with novel CNN architecture evaluated on images with different head poses

Table 7. A comparison of the performance of our method versus others evaluated on the RaFD database

Authors | Accuracy on the RaFD database
Wu and Lin [12] | 96.27%
Our method | 96.55%

Table 8. Confusion matrix on the RaFD database with three head poses

Emotion | AN | DI | FE | HA | NE | SA | SU
AN | 64 | 0 | 0 | 0 | 1 | 0 | 0
DI | 0 | 57 | 0 | 0 | 0 | 0 | 0
FE | 0 | 0 | 48 | 0 | 1 | 0 | 1
HA | 0 | 1 | 0 | 71 | 0 | 0 | 0
NE | 0 | 0 | 0 | 0 | 56 | 2 | 0
SA | 0 | 0 | 1 | 0 | 3 | 43 | 0
SU | 0 | 0 | 4 | 0 | 0 | 0 | 53

The confusion matrix obtained with the second proposed CNN is presented in Table 8. Our architecture achieves a better performance in detecting the disgust emotion, but for the other types of emotions our model still makes prediction errors; this may be because certain emotions share common expressions. In the next part, we tested the performance of our method on novel images from the LFW (Labeled Faces in the Wild) database [17]. This database contains images of famous people presenting spontaneous emotional states without any emotion labels. We took some images and tested the emotion predictions after applying the preprocessing steps used previously. Some of the results obtained are presented in Table 9.

Table 9. Samples of image results used for testing the performance of our model with fair predictions (columns: raw images, images after the preprocessing steps, emotion predictions)

After several tests, we notice that our model is capable of generalizing and predicting the emotional state of images from outside the database. Although the images tested were taken under different circumstances than those of the RaFD database, our CNN model has proven its ability to correctly recognize emotions.
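As an illustration of this test, a possible inference routine is sketched below in Python with OpenCV. The face detection step, the grayscale crop, the target resolution and the ordering of the emotion labels are assumptions made for the sketch (the paper's exact preprocessing is described earlier in the chapter); `model` stands for the trained network.

```python
import cv2
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

def predict_emotion(model, image_path, size=(64, 64)):
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, 1.3, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                      # keep the first detected face
    face = cv2.resize(gray[y:y + h, x:x + w], size) / 255.0
    probs = model.predict(face.reshape(1, *size, 1))[0]
    return EMOTIONS[int(np.argmax(probs))]
```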

5 Conclusion and Future Works

In this paper, we proposed a CNN architecture to better detect emotions. For this study, we chose the RaFD database because it contains several challenges to learn from, such as different head poses, ages, races, and genders. Before training our networks, we went through an important phase, the preprocessing of the images; this step allows our networks to learn only the important features with a short training time. We started our study with the frontal face images and achieved a high recognition rate of 98.57%. After that, we developed our


study further by supplementing the database with images from two other head poses (left and right). With small modifications of the CNN architecture used for the frontal faces, we obtained a better recognition rate of 96.55% for the three head poses. This drives us to deepen our studies in future works by adding the remaining images of the RaFD database and proposing a powerful CNN architecture capable of feature extraction and classification for all head poses. Moreover, researchers today show the importance of merging several modalities to achieve ideal emotion detection [18–20]; this is why, in future works, we will also add other modalities to our study, such as speech and physiological signals, to arrive at a natural detection of human emotions by machines.

Acknowledgement. This work was supported by the Directorate General of Scientific Research and Technological Development DGRSDT.

References

1. Marechal, C., et al.: Survey on AI-based multimodal methods for emotion detection. In: Kołodziej, J., González-Vélez, H. (eds.) High-Performance Modelling and Simulation for Big Data Applications: Selected Results of the COST Action IC1406 cHiPSet, pp. 307–324. Springer International Publishing, Cham (2019)
2. Majid Mehmood, R., Du, R., Lee, H.J.: Optimal feature selection and deep learning ensembles method for emotion recognition from human brain EEG sensors. IEEE Access 5, 14797–14806 (2017). https://doi.org/10.1109/access.2017.2724555
3. Yolcu, G., Oztel, I., Kazan, S., Oz, C., Bunyak, F.: Deep learning-based face analysis system for monitoring customer interest. J. Ambient Intell. Hum. Comput. 11(1), 237–248 (2020). https://doi.org/10.1007/s12652-019-01310-5
4. Reney, D., Tripathi, N.: An efficient method to face and emotion detection. In: Fifth International Conference on Communication Systems and Network Technologies, pp. 493–497 (2015). https://doi.org/10.1109/csnt.2015.155
5. Li, S., Deng, W.: Deep facial expression recognition: a survey. arXiv:1804.08348 [cs] (2018)
6. Alkawaz, M.H., Mohamad, D., Basori, A.H., Saba, T.: Blend shape interpolation and FACS for realistic avatar. 3D Res. 6(1), 6 (2015). https://doi.org/10.1007/s13319-015-0038-7
7. Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D.H.J., Hawk, S.T., van Knippenberg, A.: Presentation and validation of the Radboud faces database. Cogn. Emot. 24(8), 1377–1388 (2010). https://doi.org/10.1080/02699930903485076
8. Mavani, V., Raman, S., Miyapuram, K.P.: Facial expression recognition using visual saliency and deep learning, pp. 2783–2788 (2017). Accessed 14 Mar 2020
9. Fathallah, A., Abdi, L., Douik, A.: Facial expression recognition via deep learning. In: IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), pp. 745–750 (2017). https://doi.org/10.1109/aiccsa.2017.124
10. Sun, N., Li, Q., Huan, R., Liu, J., Han, G.: Deep spatial-temporal feature fusion for facial expression recognition in static images. Pattern Recogn. Lett. 119, 49–61 (2019). https://doi.org/10.1016/j.patrec.2017.10.022
11. Yolcu, G., et al.: Facial expression recognition for monitoring neurological disorders based on convolutional neural network. Multimedia Tools Appl. 78(22), 31581–31603 (2019). https://doi.org/10.1007/s11042-019-07959-6


12. Wu, B.-F., Lin, C.-H.: Adaptive feature mapping for customizing deep learning based facial expression recognition model. IEEE Access 6, 12451–12461 (2018). https://doi.org/10.1109/access.2018.2805861
13. Howse, J.: OpenCV Computer Vision with Python. Packt Publishing Ltd. (2013)
14. Alom, M.Z., et al.: A state-of-the-art survey on deep learning theory and architectures. Electronics 8(3), Art. no. 3 (2019). https://doi.org/10.3390/electronics8030292
15. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929–1958 (2014)
16. Rouast, P.V., Adam, M., Chiong, R.: Deep learning for human affect recognition: insights and new developments. IEEE Trans. Affect. Comput. 1 (2018). https://doi.org/10.1109/taffc.2018.2890471
17. Faces recognition example using eigenfaces and SVMs—scikit-learn 0.23.1 documentation. https://scikit-learn.org/stable/auto_examples/applications/plot_face_recognition.html#sphx-glr-download-auto-examples-applications-plot-face-recognition-py
18. Pantic, M., Rothkrantz, L.J.M., et al.: Toward an affect-sensitive multimodal human-computer interaction. Proc. IEEE 91(9), 1370–1390 (2003). https://doi.org/10.1109/jproc.2003.817122
19. D'mello, S.K., Kory, J., et al.: A review and meta-analysis of multimodal affect detection systems. ACM Comput. Surv. 47(3), 1–36 (2015). https://doi.org/10.1145/2682899
20. Ringeval, F., et al.: Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data. Pattern Recogn. Lett. 66, 22–30 (2015). https://doi.org/10.1016/j.patrec.2014.11.007

Deep Learning-Based 3D Face Recognition Using Derived Features from Point Cloud

Muhammed Enes Atik(B) and Zaide Duran

Department of Geomatics Engineering, Istanbul Technical University, 34469 Maslak, Turkey
{atikm,duranza}@itu.edu.tr

Abstract. With developing technology and urbanization, smart city applications have increased, and this development has brought difficulties such as public security risks. Identifying people is a requirement in smart city challenges as well as in smart environment and smart interaction applications. Face recognition has a huge potential for identifying people. With the development of deep learning methods, it has become possible to perform face recognition on larger databases and in different situations. 2D images are usually used for face recognition applications; however, challenges such as pose change and illumination cause difficulties in 2D facial recognition. Laser scanning technology has enabled the production of 3D point clouds, including the geometric information of faces. When point clouds are combined with deep learning techniques, 3D face recognition has great potential. In this study, 2D images were created for face recognition using feature maps obtained from 3D point clouds. ResNet-18, ResNet-50 and ResNet-101 architectures, which are different versions of the ResNet architecture, were used for classification. The Bosphorus database was used in the study: 3D face recognition was performed with different facial expressions and occlusions based on the data of 105 people. As a result of the study, overall accuracies of 77.36%, 77.03% and 81.54% were obtained with the ResNet-18, ResNet-50 and ResNet-101 architectures, respectively. Keywords: Face recognition · Point cloud · Feature map · Deep learning

1 Introduction

Increasing traffic, public safety risks, effective law enforcement and practices for improving personalized services such as healthcare and the home environment increase the importance of identifying a person in smart cities. Face recognition is a useful biometric for smart city and smart environment challenges [1]. Face recognition (FR) has become one of the most important research areas in computer vision. It is a difficult problem because the appearance and surface of a person's face can change greatly due to changes in pose, lighting, makeup, expression, or severe occlusions. Face recognition has found wide use in the surveillance, security and entertainment industries. Developing an efficient facial recognition system is an important and challenging research area [2]. Face recognition applications have also become closer to human perception, due to the


rapid development in the field of deep learning. As a form of nonverbal communication, facial expressions are ideal for transferring social information between people and for measuring, calculating and interpreting human emotions [3]. Ekman [4], who was the first to examine human facial expressions systematically, divides the prototypical facial expressions into six classes representing anger, disgust, fear, happiness, sadness and surprise, apart from the neutral expression [5]. In this study, a new approach is proposed for 3D face recognition (Fig. 1). For this purpose, three different feature maps that represent the surface of the 3D point cloud were produced: a depth map, a mean curvature map and a normal angle map. The three feature maps were combined to produce an image for each face scan. Thus, it was possible to apply classical 2D deep learning techniques for face recognition. ResNet-18, ResNet-50 and ResNet-101 architectures were used in the study. Six facial expressions (anger, disgust, fear, happiness, sadness and surprise), the neutral facial expression and occluded facial scans of the Bosphorus dataset were used for testing, and the remaining scans were used for training the architectures. As a result of the study, the accuracy, precision, recall and F1 score values of each facial expression are presented in separate tables for each method.

Fig. 1. The pipeline of the proposed approach

2 Related Works Although deep learning techniques have been successfully applied in 2D face recognition, studies for related applications in 3D face recognition are in their early stages. There are two important reasons. First, the existing 3D face datasets have several training constraints (i.e., overfitting and low separation of features) compared to the 2D datasets. Secondly, most of the available deep networks are specially designed for 2D face images instead of 3D faces, because 3D data has an irregular structure [6]. In the literature, feature extraction from 3D data is performed and projected to 2D images to use these deep learning techniques. A new deep learning network for face recognition that is called as FR3DNet has been developed by Gilani and Mian [7]. The depth, azimuth and elevation information were extracted from the 3D data and a 2D view of each face was


obtained. Training data were increased with data augmentation techniques. Thus, face recognition was realized with the developed network on large-scale data produced from large 3D data. Kim et al. [8] created 2.5D depth maps to represent the 3D surface in order to use the VGG-Face algorithm for 2D face recognition. They applied data augmentation techniques to increase the number of facial expressions in the datasets. The developed method has been tested on the Bosphorus [9], BU-3DFE [10], and 3D-TEC [11] databases. Li et al. [3] suggested the fusion of 2D and 3D data for use in a deep learning architecture. An efficient deep fusion convolutional neural network (DF-CNN) was developed for multimodal 2D + 3D facial expression recognition. The input image consists of 6 feature maps: the geometry map, texture map, curvature map, and the three normal maps (components x, y, and z). DF-CNN was tested on three datasets (BU-3DFE Subset I, BU-3DFE Subset II, and Bosphorus Subset). Cai et al. [6] proposed a 3D face recognition approach for real-world applications. For this purpose, a fast technique has been developed to produce a 2.5D range image from raw 3D data. Different versions of the ResNet architecture were used in the study. The developed approach was tested on the FRGC v2.0, Bosphorus, BU-3DFE, and 3D-TEC databases. Danelakis et al. [5] proposed a dynamic face recognition system. The proposed methodology automatically detects specific landmarks on the face and uses them to create a descriptor. This descriptor is a combination of three sub-descriptors that capture the topological and geometric information of 3D face scans. A time-dependent facial recognition system was evaluated on 6 different facial expressions in the BU-4DFE dataset and 8 different facial datasets in the BP4D-Spontaneous dataset. Zheng et al. [12] proposed a fine-tuned ResNet for 3D face recognition. The proposed face recognition system based on 3D face texture consists of geometric invariants, histograms of oriented gradients and fine-tuned residual neural networks. As input to the deep learning architecture, images created from Histogram of Oriented Gradients (HOG) features were used. The developed method has been tested on the FRGC-v2 database. Berretti et al. [13] proposed a face recognition approach using deep learning on depth images. In the study, the depth images of faces were not used directly; instead, hybrid data were produced with the 3DLBP method. Thus, it becomes easier to train the data in a shallower deep learning architecture. The proposed method was applied on 3 different databases: FRGC v2.0, Bosphorus and EUROCOM. Hariri et al. [14] proposed a 3D facial expression recognition method using kernel methods on a Riemannian manifold. In this method, instead of directly using the geometric properties of the points on the 3D mesh model, the covariance matrix of these features is used, and the local property spaces of such points are formed. The Bosphorus and BU-3DFE databases were used in the study. In the study conducted by Azazi et al. [15], the 3D point cloud was transformed into a 2D image. Points were extracted from the two-dimensional image using the SURF algorithm. The features were then evaluated using SVM and probability estimation (PE) methods, and thus facial expression recognition was performed. The Bosphorus and BU-3DFE databases were used in the study.


3 Material and Method

3.1 Dataset

The Bosphorus database [9] was used in the study. It contains 105 subjects and 4,666 pairs of 3D face models and 2D face images with different action units, facial expressions, poses and occlusions. The number of scans per person ranges from 31 to 54. In the study, 2255 scans were used for training. Scans of six different facial expressions, occlusions and neutral facial expressions were used for testing. Samples of facial expressions from the Bosphorus database are shown in Fig. 2 and Fig. 3.

Fig. 2. Examples of emotions from Bosphorus database

Fig. 3. Samples of occluded face models from Bosphorus database

A pre-processing step was applied to the point clouds so that more information could be extracted from them. A 2.5D depth map is generated from the 3D point cloud of the face; thanks to the depth information, the 3D point cloud is projected onto a 2D image. The gridfit function [16] was used to produce the depth maps. Another feature is the normal angle of the points: the normal vectors are computed locally using six neighboring points, and then the angle between each normal vector and the z-axis is calculated. Mean curvature values were used for the curvature map. Claxton's surfature function [17] was used for the mean curvature; the first and second derivatives of the surface are calculated to compute it. The 3 different feature maps obtained were combined into one image, creating a 3-band face image. The size of each image is 224 × 224 × 3. Examples of the produced images are shown in Fig. 4.

Fig. 4. Example of feature maps (depth map, normal angle map and mean curvature map) and image of a 3D scan

3.2 ResNet

ResNet [18] added several stacked residual units to the CNN architecture to facilitate network optimization and increase accuracy with significantly increased depth. ResNet allows hundreds or even thousands of layers to be trained and still performs successfully. Since AlexNet, CNN architectures have become deeper. However, increasing the network depth does not succeed by simply stacking the layers together: adding more layers to a deeper model can lead to higher training errors. One of the most important contributions of ResNet is that, despite increasing the number of layers of the artificial neural network, it can be trained quickly and successfully. The part of the network from the input to the output can be mapped with a non-linear function H(x). In the ResNet architecture, this path is mapped with another non-linear function defined as F(x) := H(x) − x instead of H(x). Also, by making a short-cut connection from the input to the output, the input value x is added arithmetically to the function F(x). Then the function F(x) + x is passed through a ReLU layer (Fig. 5). The aim is to transfer the previous layer data to the next layers effectively [18] (Fig. 6).

Fig. 5. The building blocks of our residual network [19].
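As an illustration of the residual mapping F(x) + x described above, a minimal PyTorch sketch of a basic residual block is given below. It is only a schematic of the idea, not the exact block of the networks used in the study; the channel count and the 3 × 3 kernels are the usual ResNet-18 choices and are assumptions here.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                             # short-cut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))          # this is F(x)
        return self.relu(out + identity)         # F(x) + x passed through ReLU

# y = BasicBlock(64)(torch.randn(1, 64, 56, 56))
```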

Fig. 6. ResNet-18 architecture [20].

The numbers added to the end of “ResNet” represent the number of layers. The structures of ResNet-18, ResNet-50 and ResNet-101 architectures used in the study are shown comparatively in Fig. 7.

Fig. 7. ResNet architectures [21].


4 Experiment

Each created feature space is transformed into images with dimensions of 224 × 224 × 3. First, the nose tip of each 3D scan must be detected. For this, the point of each point cloud with the highest Z value was determined; in all scans, this is the point at the nose tip. The detected nose tip is taken to the center of the image and the 3D point cloud is projected onto the image (Fig. 8). For each band value of a pixel, the average of the depth, normal angle and mean curvature values of the points corresponding to that pixel was computed. This process was repeated for each 3D face scan in the training and test data.
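A possible numpy sketch of this projection step is shown below, assuming that the per-point normal-angle and mean-curvature values have already been computed (the authors use MATLAB's gridfit and surfature functions for the corresponding maps). The grid extent used to map metric coordinates to pixels is an illustrative assumption.

```python
import numpy as np

def face_to_image(points, normal_angle, mean_curv, size=224, extent_mm=120.0):
    """points: (N, 3) XYZ; normal_angle, mean_curv: per-point values of length N."""
    nose = points[np.argmax(points[:, 2])]            # nose tip = point with highest Z
    xy = points[:, :2] - nose[:2]                     # center the cloud on the nose tip
    depth = points[:, 2] - nose[2]

    # Map metric coordinates to pixel indices (the grid extent is an assumption).
    cols = np.clip(((xy[:, 0] / extent_mm + 0.5) * size).astype(int), 0, size - 1)
    rows = np.clip(((0.5 - xy[:, 1] / extent_mm) * size).astype(int), 0, size - 1)

    img = np.zeros((size, size, 3))
    count = np.zeros((size, size), dtype=int)
    for r, c, d, a, k in zip(rows, cols, depth, normal_angle, mean_curv):
        img[r, c] += (d, a, k)                        # accumulate the three feature bands
        count[r, c] += 1
    mask = count > 0
    img[mask] = img[mask] / count[mask][:, None]      # per-pixel average of each band
    return img
```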

Fig. 8. Samples from training data

The facial point clouds in the dataset are divided into two groups, training and test data. The test data include six facial expressions (anger, disgust, fear, happiness, sadness and surprise), the neutral facial expression and occlusions. 2255 scans were selected for training. The distribution of the test data by expression is given in Table 1. In this study, face recognition is treated as a classification problem. Unlike classical face recognition methods, in deep learning the class that is most similar to the model is determined through the learned features instead of matching the gallery and the model. ResNet-18, ResNet-50 and ResNet-101 architectures, which are different versions of the ResNet architecture, were used for classification. All images in the database are manually labeled. The most suitable parameters were selected for training experimentally: a batch size of 32, 15 epochs, and an initial learning rate of 0.01. It was determined that the algorithm overfits and the test accuracy decreases when more than 15 epochs are used. Each algorithm is individually trained and tested. All operations were carried out in the MATLAB environment. Since the training process was

Table 1. Distribution of test samples

Expression  Number of samples
Neutral     76
Anger       71
Disgust     69
Fear        70
Happiness   106
Sadness     66
Surprise    71
Occlusion   381

performed on a CPU, it took longer than it would on a GPU. The properties of the computer used are: Intel Core i7-7700HQ CPU @ 2.8 GHz, 4 cores, 16 GB RAM, 4 GB graphics card memory. Training accuracy is 100% for all architectures.
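The authors trained the networks in MATLAB; as an illustration only, a roughly equivalent fine-tuning setup in PyTorch with the stated hyperparameters (batch size 32, 15 epochs, initial learning rate 0.01, 105 subject classes) could look as follows. The dataset path, the use of ImageNet weights and the SGD momentum value are assumptions of this sketch.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

def train_resnet18(train_dir="bosphorus_maps/train", num_classes=105):
    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    loader = torch.utils.data.DataLoader(
        datasets.ImageFolder(train_dir, tf), batch_size=32, shuffle=True)

    model = models.resnet18(weights="IMAGENET1K_V1")          # ImageNet init is an assumption
    model.fc = nn.Linear(model.fc.in_features, num_classes)   # one output per subject
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(15):            # more than 15 epochs led to overfitting
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```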

5 Results and Discussion

At the end of the training process, the trained algorithms were applied to classify the test data. Not every person in the database has a scan of every facial expression, and some expressions appear more than once for a single person. Therefore, the number of test scans for each expression varies. The accuracy value was calculated for each algorithm by comparing the predicted person with the real person. Since the class distribution is unbalanced, precision and recall values were also calculated for a more accurate evaluation. The F1 score, which is a function of the precision and recall values, prevents an incorrect model selection on imbalanced datasets. Training and test data are manually labeled. According to the results, the proposed data production approach is suitable for 3D face recognition.

Table 2. Results of ResNet-18 architecture.

Expression          Accuracy (%)  Precision  Recall  F1 score
Neutral             97.37
Anger               78.87
Disgust             78.26
Fear                74.29
Happiness           92.45
Sadness             89.39
Surprise            92.96
Occlusion           64.30
Facial expressions  84.37         85.06      84.98   85.01
All                 77.36         75.85      80.67   78.19


Table 3. Results of ResNet-50 architecture.

Expression          Accuracy (%)  Precision  Recall  F1 score
Neutral             98.68
Anger               73.24
Disgust             72.46
Fear                81.43
Happiness           88.68
Sadness             93.94
Surprise            92.96
Occlusion           64.30
Facial expressions  84.10         85.33      84.96   85.14
All                 77.03         75.94      79.52   77.69

Table 4. Results of ResNet-101 architecture.

Expression          Accuracy (%)  Precision  Recall  F1 score
Neutral             98.68
Anger               78.87
Disgust             81.16
Fear                80.00
Happiness           97.17
Sadness             95.45
Surprise            95.77
Occlusion           69.55
Facial expressions  88.74         88.70      87.12   87.90
All                 81.54         79.95      84.17   82.01

According to the results, almost all facial expressions reach higher accuracy with the ResNet-101 architecture. Especially for occlusion scans, accuracy increases by about 4%. All methods showed similar performance for the neutral facial expression, and there was no significant difference between ResNet-18 and ResNet-50 in terms of recognizing facial expressions. Occluded data are a difficult case for face recognition since there is data loss, and recognition accuracy therefore decreased. Almost all algorithms have near-100 percent accuracy for the neutral facial expression. The total accuracy is the accuracy obtained over all test data. The overall accuracies of the ResNet-18 and ResNet-50 architectures are 77.36% and 77.03%,


respectively; these two architectures performed similarly. ResNet-101 was the architecture with the highest overall accuracy, 80.66%. Accuracy values were also calculated for the facial expressions only, excluding the neutral and occlusion cases; the accuracies increased for all architectures, because the occlusion situation creates a lack of data and lowers the accuracy values. The overall accuracies of the ResNet-18, ResNet-50 and ResNet-101 architectures for facial expressions are 84.37%, 84.10% and 88.74%, respectively. Since the data distribution is unbalanced, the accuracy value alone is not sufficient for evaluation; for a better analysis, precision, recall and F1 score values were also calculated. Since the test data are not regularly distributed, the F1 score is more meaningful than accuracy. The F1 score of ResNet-101 is 87.90% for facial expressions and 82.01% for all test data. The ResNet-50 and ResNet-18 algorithms had similar results: 85.14% and 85.01% for facial expressions, and 77.69% and 78.19% for all test data. According to the results, ResNet-101 has more layers and can therefore perform face recognition with higher accuracy. As mentioned before, this is the biggest advantage of the ResNet architecture. The main difference between ResNet-50 and ResNet-18 is the number of layers; according to the results, the extra depth of ResNet-50 does not make a difference in terms of accuracy, whereas ResNet-101 can be trained better on the data. The results of the algorithms are presented in Table 2, Table 3 and Table 4. The proposed approach is compared with similar studies on the Bosphorus database in Table 5. Occlusion cases are not included in the comparative table because they were not examined in the previous studies. The ResNet algorithms used in the study have the highest recognition accuracy for the neutral, sad and surprised facial expressions. Although the accuracy of some facial expressions was higher in previous studies, the approach proposed in this study proved more successful in terms of overall accuracy. Recognition performed with ResNet-101 has the highest overall accuracy, with 88.74%. A significant increase in overall accuracy is observed when compared to previous similar studies.

Table 5. Comparison of recognition ratios of the proposed method and the state-of-the-art ones (values are shown as percentages (%)).

Method                  NE     AN     DI     FE     HA      SA     SU     All
Azazi et al. [15]       81.25  82.50  90.00  86.25  97.50   67.50  83.75  84.10
Hariri et al. [14]      87.50  86.25  85.25  81.00  93.00   79.75  90.50  86.17
Li et al. [22]          -      82.33  82.83  72.33  100.00  89.00  81.83  86.32
Li et al. [3]           -      -      -      -      -       -      -      80.28
Our study (ResNet-18)   97.37  78.87  78.26  74.29  92.45   89.39  92.96  84.37
Our study (ResNet-50)   98.68  73.24  72.46  81.43  88.68   93.94  92.96  85.14
Our study (ResNet-101)  98.68  78.87  81.16  80.00  97.17   95.45  95.77  88.74
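The accuracy and the averaged precision, recall and F1 values reported in Tables 2, 3 and 4 can be reproduced from the true and predicted subject labels; a short scikit-learn sketch is given below. Whether the authors used macro or weighted averaging is not stated, so the macro averaging here is an assumption.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(y_true, y_pred):
    """y_true / y_pred: subject labels of the test scans (placeholders)."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```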

The training process takes a very long time since it is performed on the CPU. For this reason, small image sizes were chosen; larger images cause longer training times.


The depth of the algorithm is also a factor affecting the training process. The training of the ResNet-101 architecture took longer than that of the other two architectures. However, a quantitative evaluation was not made in terms of training time.

6 Conclusion

In this study, a 3D face recognition approach is proposed using 2D images produced from features derived from the 3D point cloud. For this purpose, a depth map, a normal angle map and a mean curvature map were produced, and 3-band images of each face scan were created by combining them. Training took a long time because all the work was done on the CPU; if these experiments were carried out on a graphics processor (GPU), they would be much faster. Also, more features can be explored from point clouds in future studies, so that higher accuracy can be achieved in 3D face recognition. In addition, there are many databases and methods in the literature, and the scope of the study can be extended by using them. 3D face recognition approaches using point clouds directly, without preprocessing steps, can also be explored. Point clouds have great potential for 3D face recognition: thanks to the geometric information, especially in cases such as facial expression differences, 3D face recognition with point clouds is an important field of research.

References 1. Praveen, G.B., Dakala, J.: Face recognition: challenges and issues in smart city/environments. In: 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), pp. 791–793. IEEE (2020) 2. Yaman, M.A., Subasi A., Rattay F.: Comparison of random subspace and voting ensemble machine learning methods for face recognition. Symmetry 10(11), 651 (2018) 3. Li, H., Sun, J., Xu, Z., Chen, L.: Multimodal 2D + 3D facial expression recognition with deep fusion convolutional neural network. IEEE Trans. Multimedia 19(12), 2816–2831 (2017). https://doi.org/10.1109/TMM.2017.2713408 4. Ekman, P., Friesen, W.V.: Manual for the facial action coding system. Consulting Psychologists Press, Palo Alto (1978) 5. Danelakis, A., Theoharis, T., Pratikakis, I., Perakis, P.: An effective methodology for dynamic 3D facial expression retrieval. Pattern Recogn. 52, 174–185 (2016) 6. Cai, Y., Lei, Y., Yang, M., You, Z., Shan, S.: A fast and robust 3D face recognition approach based on deeply learned face representation. Neurocomputing 363, 375–397 (2019) 7. Gilani, S., Mian, A.: Learning from millions of 3D scans for large-scale 3D face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1896–1905 (2018) 8. Kim, D., Hernandez, M., Choi, J., Medioni, G.: Deep 3D face identification. In: 2017 IEEE international joint conference on biometrics (IJCB), pp. 133–142. IEEE (2017) 9. Savran, A., Alyuz, N., Dibeklioglu, H., Çeliktutan, O., Gokberk, B., Sankur, B., Akarun, L.: Bosphorus database for 3D face analysis. In: European Workshop on Biometrics and Identity Management, pp. 47–56. Springer, Heidelberg (2008) 10. Yin, L., Wei, X., Sun, Y., Wang, J., Rosato, M.J.: A 3D facial expression database for facial behavior research. In: 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pp. 211–216. IEEE (2006)


11. Vijayan, V., Bowyer, K.W., Flynn, P.J., Huang, D., Chen, L., Hansen, M., Kakadiaris, I.A.: Twins 3D face recognition challenge. In: 2011 International Joint Conference on Biometrics (IJCB), pp. 1–7. IEEE (2011) 12. Zheng, S., Rahmat, R.W.O., Khalid, F., Nasharuddin, N.A.: 3D texture-based face recognition system using fine-tuned deep residual networks. PeerJ Comput. Sci. 5, e236 (2019) 13. Neto, J.B.C., Marana, A.N., Ferrari, C., Berretti, S., Del Bimbo, A.: Deep learning from 3DLBP Descriptors for depth image based face recognition. In: International Conference on Biometrics (ICB), Crete, Greece, 2019, pp. 1–7 (2019). https://doi.org/10.1109/icb45273. 2019.8987432 14. Hariri, W., Tabia, H., Farah, N., Benouareth, A., Declercq, D.: 3D facial expression recognition using kernel methods on Riemannian manifold. Eng. Appl. Artif. Intell. 64, 25–32 (2017) 15. Azazi, A., Lutfi, S.L., Venkat, I., Fernández-Martínez, F.: Towards a robust affect recognition: Automatic facial expression recognition in 3D faces. Expert Syst. Appl. 42(6), 3056–3066 (2015) 16. D’Errico, J.: Surface Fitting using gridfit (https://www.mathworks.com/matlabcentral/fileex change/8998-surface-fitting-using-gridfit). MATLAB Central File (2020) 17. Claxton, D.: Surface Curvature. (https://www.mathworks.com/matlabcentral/fileexchange/ 11168-surface-curvature). MATLAB Central File Exchange. Accessed 26 Jul 2020 18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016) 19. Li, M., Xu, H., Huang, X., Song, Z., Liu, X., Li, X.: Facial expression recognition with identity and emotion joint learning. IEEE Trans. Affective Comput. 14(8), 1–8 (2018) 20. Ghorakavi, R.S.: TBNet: pulmonary tuberculosis diagnosing system using deep neural networks. arXiv preprint arXiv:1902.08897 (2019) 21. Jay, P.: Understanding and Implementing Architectures of ResNet and ResNeXt for state-ofthe-art image classification: from Microsoft to Facebook [Part 1]. https://medium.com/@14p rakash/understanding-and-implementing-architectures-of-resnet-and-resnext-for-state-ofthe-art-image-cf51669e1624 22. Li, H., Ding, H., Huang, D., Wang, Y., Zhao, X., Morvan, J.M., Chen, L.: An efficient multimodal 2D + 3D feature-based approach to automatic facial expression recognition. Comput. Vis. Image Underst. 140, 83–92 (2015)

Face Sketch Recognition: Gender Classification Using Eyebrow Features and Bayes Classifier

Khalid Ounachad1(B), Mohamed Oualla2, and Abdelalim Sadiq1

1 Department of Informatics, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
{khalid.ounachad,a.sadiq}@uit.ac.ma
2 SEISE: Software Engineering and Information Systems Engineering Team, Faculty of Sciences and Technology, Moulay Ismail University, Errachidia, Morocco
[email protected]

Abstract. Machine learning is a subset of artificial intelligence that focuses on the development of computer programs that can access data and use it to learn for themselves. Bayes' theorem is widely used in machine learning. The main objective of this paper is to classify the gender of human beings based on their face sketch images, using eyebrow features and a Bayes classifier. The paper presents a method for human face sketch gender classification and recognition. It is inspired by our earlier model, which was trained on the same task but with sixteen features and a fuzzy approach. Toward this end, just three features are extracted from the input face sketch image, based on the eyebrow golden ratio and two other measurements. The face detection stage uses the Viola and Jones algorithm. The classification task is evaluated with a Bayes classifier. An experimental evaluation demonstrates the satisfactory performance of our approach on the CUFS database, with 80% of the data used for training and 20% for testing. The proposed machine learning algorithm is competitive with related state-of-the-art approaches. The recognition rate reaches more than 98.96% for the male gender and 97.38% for the female gender. Keywords: Forensic sketches · Facial gender recognition · Gender classification · Machine learning · Bayes classifier · CUFS dataset

1 Introduction

Face Gender Recognition (FGR) is a major area of non-verbal communication in day-to-day life. FGR systems have attracted numerous researchers, since they attempt to overcome the problems and factors weakening these systems, including the problem of image classification, and because of their large-scale applications in face analysis, particularly face recognition [1] and face sketch recognition [2]. Gender-based separation among humans is classified into two classes: male and female [3]. Face Gender Classification (FGC) systems aim to automatically classify gender in a dataset of photos or sketch images (Fig. 1). They are based on two-dimensional images of human subjects. Currently, gender classification and recognition from facial imagery has


grown in importance in the computer vision field: it plays a very important role in many areas such as face recognition [1], forensic crime detection [4], facial emotion recognition for psychologically affected patients [5], night surveillance [6], and so on. In this paper, it is used to quickly identify a criminal person from his sketch for identification purposes.

Fig. 1. On the left, an input sketch image to the facial sketch gender classification system. On the right, the output result, indicating the detected gender.

Humans have a natural ability to extract, analyze, identify, and interpret information encoded in facial features, such as gender. The automatic task of facial gender recognition is challenging and explicitly difficult. Human gender classification and recognition can be done in many ways; this paper is concerned with gender classification based on two-dimensional images of people's face sketches. There is a large number of databases available for human sketch gender classification and recognition research, some of them private and some public. CUFS [7] is the one most commonly used in face sketch recognition scenarios. Machine learning is a subarea of artificial intelligence based on the idea that systems can learn from data and make decisions automatically. Bayes' theorem is widely used in machine learning [8], including its use in a probability framework for fitting a model to a training dataset, referred to as maximum a posteriori (MAP) estimation. Probability can be used to make predictions and also when developing models for classification predictive modeling problems, such as the Bayes optimal classifier and naive Bayes. The classifier relies on supervised learning to be trained for classification: it can be trained by determining the average vector and the covariance matrices of the discriminant functions for each class from the training data. In this paper, we use a Bayes classifier to solve the problem of recognizing and classifying the human gender of face sketches between two different classes: male and female face sketch subjects. The naive Bayes classifier is a classification algorithm based on Bayes' theorem; it is a simple classifier based on the Bayes rules [9, 10]. The Bayes optimal classifier is often used to tune and harmonize the discriminant function parameters of a given well-performing model on a validation database. Generally, there are three useful types of naive Bayes models [11]: the Gaussian, multinomial and Bernoulli models. The Gaussian model assumes that features follow a normal distribution. The multinomial model is used for


discrete counts, and the Bernoulli (binomial) model is useful in the case of binary feature vectors. The objective of this paper is to propose an approach based on an eyebrow ratio, two other distances and a Bayes classifier to classify and recognize the gender of an input face sketch image. The paper presents a novel method for human face sketch classification. The core of the approach includes two basic aspects. The first one extracts an eyebrow ratio and two other distances from the input face sketch: the first distance is between the outside edge of the eye and the outside edge of the corresponding eyebrow, and the second distance is between the pupil center and the center of the corresponding eyebrow. The second aspect applies a Bayes classifier using the Gaussian model. The eyebrow golden ratio reported by scientists is used as a reference for the computed eyebrow ratio. The eyebrow ratio is the proportion between the distance from the outside edge to the inside edge of an eyebrow and the distance between the eyebrows in the face sketch or facial photo. The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 gives background information about the eyebrow ratio and two other facial distances, and about the naive and optimal Bayes classifiers, their definitions, expressions and algorithms. In Sect. 4, the proposed framework architecture of our approach is explained in depth. The dataset used and the experimental results are given in Sect. 5, and the conclusion, discussion and future work are presented in Sect. 6.

2 Related Work

There are several methods in the gender recognition literature. Up to May 2020, there has been a wealth of research on human gender classification and recognition based on facial images. Especially in the last two years (from 2018 to March 2020), many techniques were used for this task; to name a few, we briefly review related methods for facial gender classification and recognition. In [12], the authors use feature fusion and parameter optimization of a dual-input convolutional neural network for face gender recognition; their method is called weighting fusion, and MORPH is their experimental dataset. The authors of [13] use a machine learning based approach to recognize gender from real-life images; they trained various classifiers on many images from the Adience benchmark with multiple train and test data splits. In [14], the authors use motion data from multiple smart devices in their gender recognition approach; the collected motion data are analyzed according to three aspects: time, frequency, and wavelet domains. In [15], the authors adopt a fused CNN of separated GEI for gender recognition in gait analysis. Kamarulzaman and Chi [16] use an intelligent gender recognition system for the classification of gender in the Malaysian demographic, emphasizing deep learning-based gender recognition and the HAAR cascade classifier. In [17], the authors use a multi-task framework for facial attribute classification through end-to-end face parsing and a deep CNN; they address the three challenging problems of race, age, and gender recognition. In [18], the authors use an Average Neural Face Embeddings (ANFE) method that uses facial vectors of people for gender recognition. In [3], we presented a model pre-trained on the face sketch gender classification and recognition task; the model is based on the fuzzy Hamming distance with geometric relationships called face ratios. Here we attempt to tune the same model for the same gender classification task but with just three features, and we attempt to evaluate the approach to obtain a satisfactory machine learning model on the Adience dataset. Bayes classifiers have been used in a lot of machine learning work for data classification; this model can achieve high classification accuracy with low complexity. For example, in [19] the authors apply naive Bayes classification for disease prediction. The author of [20] presents intelligent cooperative web caching approaches based on the J48 decision tree and naive Bayes (NB) supervised machine learning algorithms in structured peer-to-peer systems. In [21], the authors use naive Bayes classifier models for predicting colon cancer. In [22], the authors propose a customer behavior analysis using naive Bayes with a bagging homogeneous feature selection approach, and so on.

2.1 Contribution of This Work

The contribution of this work can be summarized as:
– A comprehensive performance evaluation of offline facial sketch gender classification and recognition using three facial features and Bayes classifiers.
– The development and evaluation of a geometric machine learning based human face sketch gender classification model with images from the CUFS dataset, 80% for training and 20% for testing.
– The proposed approach can accurately classify the gender of a face sketch. It is inspired by our model [3], which was pretrained on the face sketch gender classification and recognition task with sixteen features; here we tune a method with just three features (distances) and naive Bayes supervised machine learning. This method achieves our goal, producing a classification rate of more than 97.38% for females and more than 98.96% for males on the CUFS dataset, which is comparable with state-of-the-art human face sketch gender classification algorithms.


ratios. We attempt to tune the same model for the same gender classification task but with just three features and we attempt to evaluate the approach to obtain a satisfied machine learning model on adience datasets. Bayes Classifiers have been used in a lot of machine learning for data classification, this model is able to make higher classification accuracy with less complexity. For example, such as in [19] authors apply naïve bayes classification for disease prediction. In structured peer-to-peer systems the author of [20] present an intelligent cooperative web caching approaches based on J48 decision tree and Naïve Bayes (NB) supervised machine learning algorithms. In [21] authors use Naïve Bayes classifier models for predicting the colon cancer. In [22] authors propose a customer behavior analysis using Naive Bayes with bagging homogeneous feature selection approach and soon. 2.1 Contribution of This Work The contribution of this work can be summarized as: – Comprehensive performance evaluation of off line based facial sketch gender classification and recognition by using three features in the face and Bayes classifiers. – The development and evaluation of a geometric machine learning based human face sketch gender classification model with images from CUFS dataset, 80% for training and 20% for testing. – The proposed approach can accurately classify the right kind of the gender’s face sketch. It is inspired in our model [3] which was pretrained on face sketch gender classification and recognition task with sixteen features. We tuned a method just with three features or distances and Naïve Bayes supervised machine learning algorithms. This method achieves our goal by producing a classification rate reaches more than 97.38% for female and more than 98.96%for male especially in the CUFS dataset. It can be comparable with the stat of the art human face sketch gender Classifier algorithms.

3 Backround Information 3.1 Naive Bayesian Classification Method The naive Bayesian classification is a simple method to implement and which provides good results, despite the hypothesis strong used. This learning method is supervised. The method is based on Bayes’ theorem which assumes probabilistic independence characteristics of a given group. We denote Xi , i ∈ [0, n – 1], the vector column i of the data matrix Data. We assume that each attribute i (column of the data matrix Data) can be modeled by a random variable denoted Xi for i ∈ [0, n – 1]. The membership of a data in a group is modeled by the random variable Y, whose values are discrete y0 = 0, y1 = 1…. The membership of the data k (kth line of the Data matrix) is known and indicated in a vector called State.


Bayes’ theorem determines on which a data item belongs to a group yj knowing its attributes xi.   P Y = yj |X0 = x0 , X1 = x1 , . . . Xn−1 = xn−1     = P Y = yj × P X0 = x0 , X1 = x1 , . . . Xn−1 = xn−1 |Y = yj (1) Where:   – Y = yj |X 0 = x0 , X1 = x1 , . . . Xn−1 = xn−1 : is the probability of belonging to the group yj knowing that the different random variables Xi respectively take the values xi,  – P Y = yj is the probability of belonging to the  group yj , – P X0 = x0 , X1 = x1 , . . . Xn−1 = xn−1 |Y = yj : is the probability that the different random variables Xi take the values xi respectively knowing that the data belongs to the group yj , – (X0 = x0 , X1 = x1 , . . . Xn−1 = xn−1 ): is the probability that the different random variables Xi take the values xi respectively. The naive Bayesian hypothesis assumes that all attributes are independent and therefore that:    P(Xi = xi |Y = yj ) (2) P X0 = x0 , X1 = x1 , . . . Xn−1 = xn−1 |Y = yj = i

As the denominator (X0 = x0 , X1 = x1 , . . . Xn−1 = xn−1 ) is independent of the group considered and therefore constant, we only consider the numerator:   (3) P Y = yj × (X0 = x0 , X1 = x1 , . . . Xn−1 = xn−1 ) To determine the most likely group to which a datum z to be classified belongs, represented by the tuple z0 , z1 , . . . , z n−1 , we choose the maximum probability:    P Y = yj × P(Xi = zi |Y = yj ) (4) i

Among the j groups. Various probability laws are used to estimate. P(Xi = zi |Y = yj ), we use a Gaussian distribution law.   = z |Y = yj of a datum z represented To calculate the conditional probability P X i i   by the tuple z0 , z1 , . . . , z n−1 , we use a Gaussian distribution of the form: 



1

P Xi = zi |Y = yj = √ e 2 2π σ 2



(zi −μxi ,yj )2 2σ 2 xi ,yj

(5)

where μxi ,yj and μ2 xi ,yj are the mean and variance of the random variable Xi , estimated from the values attribute data from the ith column of the vector Data for the group corresponding to yj . We recall that, for a n dimension vector x: i=n−1 xi (6) μ = i=0 n

814

K. Ounachad et al.

and i=n−1 σ2 =

i=0

(xi − μ)2 n

The values of the vector state are:    ⎡ ⎤ μ , μ , . . . , μ , σ , σ , σ x ,y x ,y x ,y x ,y x ,y x ,y 0 0 0 0 1 0 1 0 n−1 0 n−1 0   ,  ⎣ ⎦  ,y1 , σxn−1 ,y1  μx0 ,y1 , σx0 ,y1 , μx1 ,y1 , σx1 ,y1 , . . . , μxn−1 . . . , μx0 ,yn−1 , σx0 ,yn−1 , μx1 ,yn−1 , σx1 ,yn−1 , . . . , μxn−1 ,yn−1 , σxn−1 ,yn−1

(7)

(8)
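A direct numpy transcription of these formulas (Eqs. (4)–(8)) could look as follows; the small variance floor is an implementation detail added for numerical stability and is not part of the original description.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the prior, mean and variance of each feature for each class (Eqs. (6)-(8))."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (len(Xc) / len(X),          # prior P(Y = yj)
                    Xc.mean(axis=0),           # mu_{xi, yj}
                    Xc.var(axis=0) + 1e-9)     # sigma^2_{xi, yj} (small floor for stability)
    return stats

def predict_gaussian_nb(stats, z):
    """Return the class maximizing P(Y = yj) * prod_i P(Xi = zi | Y = yj) (Eqs. (4)-(5))."""
    best, best_score = None, -np.inf
    for c, (prior, mu, var) in stats.items():
        lik = np.exp(-(z - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        score = prior * np.prod(lik)
        if score > best_score:
            best, best_score = c, score
    return best
```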

3.2 Eyebrow Golden Ratio and Two Other Features

The irrational number (√5 − 1)/2 is known as phi, denoted by Φ or ϕ, and was defined by Euclid of Alexandria [21]: Φ = 1.6180339…, ϕ = 0.6180339…. The golden ratio has a very special place in mathematics. The eyebrow ratio (Reyebrow) is the proportion between the distance from the outside edge to the inside edge of an eyebrow and the inside distance between the eyebrows in the face sketch or facial photo. The shape of an eyebrow is considered perfect, or golden, when its eyebrow ratio equals Phi = 1.6180339… (Fig. 2). We denote by DEye_Eyebrow the distance from the midpoint of one eye to the outside edge of the corresponding eyebrow, and by DPupil_CEyebrow the distance between the pupil center and the eyebrow midpoint (Fig. 3).

Fig. 2. Sonia Couoh's face. She has a face that meets the eyebrow golden ratio according to scientists.

Fig. 3. Our face sketch feature model. It shows the different selected distances used to calculate DEyebrow, DEye_Eyebrow and DPupil_CEyebrow.
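The three features can be sketched from 68 detected facial landmarks as below. The landmark indices follow the common dlib 68-point convention (eyebrows at points 17–26, eyes at points 36–47) and the choice of endpoints is an assumption made for illustration; the authors' exact landmark selection may differ.

```python
import numpy as np

def dist(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def eyebrow_features(lm):
    """lm: array of shape (68, 2) with detected landmark coordinates."""
    # eyebrow length / inside distance between the two eyebrows
    r_eyebrow = dist(lm[17], lm[21]) / dist(lm[21], lm[22])
    eye_center = lm[36:42].mean(axis=0)       # pupil approximated by the eye center
    brow_center = lm[17:22].mean(axis=0)      # eyebrow midpoint
    d_eye_eyebrow = dist(eye_center, lm[17])  # eye midpoint to outer eyebrow edge
    d_pupil_ceyebrow = dist(eye_center, brow_center)
    return np.array([r_eyebrow, d_eye_eyebrow, d_pupil_ceyebrow])
```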

4 Approach

An overview of our proposed research methodology framework, based on a Bayes classifier with three facial features for face sketch gender classification, is shown in Fig. 4. In detail, our system has two modes; in both of them, the input facial sketch image and all


Fig. 4. An overview of our proposed research methodology framework based on a Bayes classifier with three features (an eyebrow ratio and two other distances) for Face Sketch Gender Classification (FSGC)

face sketches of the dataset are converted to gray level, resized and cropped to 200 × 250 pixels. These dimensions are chosen because they are the default in the datasets used and were also used in our related work. Face detection is performed with the Viola and Jones face detection algorithm. In the offline phase, the first step of the system is to normalize and preprocess all sketches: they are transformed into gray-level images and cropped to 200 × 250 pixels. The same steps are used in the online mode. After this stage, to detect the faces in the images, we applied the Viola and Jones algorithm. The result of this second step is used to locate the 68 facial landmark points in each face. These 68 points are the input of a geometric descriptor which extracts an identity for each face via the calculation of the eyebrow ratio (Reyebrow), the DEye_Eyebrow distance and the DPupil_CEyebrow distance. A vector groups these distances in order; this vector represents a real proportionality with any other similar vector, and its distances are stored as detailed in Sect. 3.2. In the online process of the face sketch gender classification system, given a facial sketch image, the three features are extracted based on four distances, as shown in Fig. 3 and Fig. 4 and defined in the previous section. These characteristics compose a vector of real values, which is considered an identifier of the face image from which the values were extracted and calculated. In the offline process of the facial sketch gender classification system, the same distances used in the online process are extracted and calculated for each facial sketch image of the dataset. The eyebrow ratio and distances of each input facial sketch image are then used as feature vectors for the final classification and recognition system based on the Bayes classifier. The Bayes classifier algorithm contains five steps: (1) separating the dataset by gender class; (2) summarizing the dataset; (3) summarizing the data by class; (4) defining the Gaussian probability density function; and (5) calculating the class probabilities.
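Assuming a feature-extraction routine like the one sketched in Sect. 3.2, the classification stage can be illustrated with scikit-learn's GaussianNB on the three-feature vectors, using the 80%/20% train/test split adopted in the experiments; the array names are placeholders.

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def train_and_eval(X, y):
    """X: (n_sketches, 3) feature vectors; y: gender labels ('male'/'female')."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = GaussianNB().fit(X_tr, y_tr)
    return clf, accuracy_score(y_te, clf.predict(X_te))
```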


Table 1. The gender statistics from the CUFS dataset

Database name  Male images  Female images  Total images
CUHK           133          54             188
AR             74           49             123
XM2VTS         179          116            295

Fig. 5. Gender distribution of the CUFS dataset used

5 Experiment and Results

There is a large number of databases available for human face sketch classification and recognition research; some of them are public and some are private. The CUFS database [2, 3, 23] is public and the one most commonly used in face sketch recognition research. The CUHK Face Sketch database (CUFS) is intended for research on face sketch synthesis and face sketch recognition. It includes 188 faces from the Chinese University of Hong Kong (CUHK) student database, 123 faces from the AR database and 295 faces from the XM2VTS database, i.e., 606 faces in total. For each face, there is a sketch drawn by an artist based on a photo taken in a frontal pose, under normal lighting conditions, and with a neutral expression. Table 1 details the CUFS gender statistics, whilst Fig. 5 shows the gender distribution. To demonstrate the effectiveness of the proposed method, we acquired the CUFS dataset and used 80% of the database images for training and 20% for testing. Figure 8 illustrates, step by step, the results obtained as we progress through the framework described previously: in the first step, the face sketches extracted from the dataset; in the second, the cropped face sketches; in the third, the extracted 68 facial landmark points; in the fourth, the calculation of the three features; and in the last step, the feature vector generated from the facial images, which is used to classify the gender of the input facial sketch. The output of the Face Sketch Gender Classification System (FSGCS) is the probe's gender. Table 2 shows the accuracy per gender for our approach based on the Bayes classifier, the eyebrow golden ratio and the two other distances. It reports the facial sketch gender classification and recognition accuracies using a supervised machine learning algorithm based on probability. We calculate two posteriors using the Gaussian naive Bayes classifier, probably the most popular type of naive Bayes:

$$\text{posterior(male)} = \frac{P(\text{male})\, p(D_{Eyebrow} \mid \text{male})\, p(D_{Eye\_Eyebrow} \mid \text{male})\, p(D_{Pupil\_CEyebrow} \mid \text{male})}{\text{marginal probability}}$$

$$\text{posterior(female)} = \frac{P(\text{female})\, p(D_{Eyebrow} \mid \text{female})\, p(D_{Eye\_Eyebrow} \mid \text{female})\, p(D_{Pupil\_CEyebrow} \mid \text{female})}{\text{marginal probability}}$$

Figure 6 shows a comparison of cumulative match scores for our facial image gender classification method using the Bayes classifier. It measures the classification rate for each probe


Table 2. Accuracy (%) per gender (CUFS dataset)

Gender      Female  Male
Accuracy %  97.38%  98.96%

gender. The x-axis represents the gender and the y-axis represents the classification rate. The results clearly demonstrate that the algorithm classifies the male gender better; the lower classification rate is that of the female gender. The tests show that the recognition rate reaches more than 98.96% for males and more than 97.38% for females. Figure 7 presents a comparison between our method and our previous face sketch gender recognition approach [3]. Both approaches recognize the male gender better than the female gender. This accuracy inequality is due to some of the factors analyzed in [1], where the authors present a comprehensive analysis of how and why face recognition accuracy differs between men and women. We recall that in the face sketch gender recognition approach based on the Bayes classifier we use just three facial features (the eyebrow golden ratio and two other distances), whereas the second approach, based on the Fuzzy Hamming Distance (FHD) and averages method [3], uses sixteen features; the resulting accuracies are approximately the same.

Fig. 6. Performance of eyebrow ratio and Bayes Classifier approach vs Gender (CUFS Database)

Fig. 7. Comparison between our machine learning approaches for FSGCS

To use our algorithm perfectly:
– A large, labelled face sketch dataset is needed, with facial images useful for gender classification.
– Frontal and sharp facial images are needed.
– Structured and significant computing power is required for the training and testing stages.
– Large memory, a powerful operating system and an efficient platform are needed.


Fig. 8. The process of our FSGCS. Column 1: extract of the dataset photos/input facial images. Column 2: cropped face sketch images. Column 3: the extracted 68 facial landmark points. Column 4: calculation of the three features. Column 5: the feature vector generated from the facial sketches, which is used to estimate the gender. The output of the FSGCS is the probe gender of the input face sketch image.

6 Conclusion

This paper proposes a new geometric approach for facial gender recognition. The method is based on a Bayes classifier and three special features: the eyebrow golden ratio and two other distances relative to the eyes and eyebrows. We used three features: first, the eyebrow ratio; second, the distance between the eye and the eyebrow; and third, the distance between the pupil and the eyebrow. We tested our method on the CUFS dataset and the results are very satisfactory. Our work is inspired by recent successful methods which showed that relatively simple geometric features can give good performance in machine learning based frameworks. Our approach can be useful in the process of identifying criminals. Future work will include a comparison of our algorithms with other machine learning algorithms, as well as an automatic reduction of the number of features used in our previous face sketch gender recognition approach [3].

References 1. Albiero, V., Krishnapriya, K.S., Vangara, K., Zhang, K., King, M.C., Bowyer, K.W.: Analysis of gender inequality in face recognition accuracy. In: The IEEE Winter Conference on Applications of Computer Vision (WACV) Workshops, pp. 81–89. arxiv (2020) 2. Ounachad, K., Oualla, M., Souhar, A., Sadiq, A.: Face sketch recognition-an overview. In: NISS2020: Proceedings of the 3rd International Conference on Networking, Information Systems & Security, pp.1–8. ACM Digital Library, Marrakesh (2020) 3. Ounachad, K., Oualla, M., Souhar, A., Sadiq, A.: Face sketch recognition: gender classification and recognition. Int. J. Psychosoc. Rehabil. 24(03), 1073–1085 (2020)


4. Jacquet, M., Champod, C.: Automated face recognition in forensic science: review and perspectives. Forensic Sci. Int. 307, 110124 (2020) 5. Simcock, G., McLoughlin, L.T., Regt, T.D., Broadhouse, K.M., Beaudequin, D., Lagopoulos, J., Hermens, D.F.: Associations between facial emotion recognition and mental health in early adolescence. Int. J. Environ. Res. Public Health 17(1), 330 (2020) 6. Kadim, Z., Zulkifley, M.A., Hamzah, N.: Deep-learning based single object tracker for night surveillance. Int. J. Electr. Comput. Eng. (IJECE). 10(4), 3576–3587 (2020) 7. Wang, X., Tang, X.: Face photo-sketch synthesis and recognition. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 31(11), 1955–1967 (2009) 8. Chandana, C.S., Rao, K.D., Sahoo, P.K.: Face recognition through machine learning of periocular region. Int. J. Eng. Res. Technol. (IJERT) 9(03), 362–365 (2020) 9. Alpaydin, E.: Introduction to machine learning. books.google.com (2020) 10. Vinaya, A., Guptaa, A., Bharadwaja, A., Srinivasana, A., Murthya, K.N.B., Natarajana, S.: Unconstrained face recognition using bayesian classification. Procedia Comput. Sci. 143, 519–527 (2018) 11. MKaarthik, K., Madhumitha, J., Narmath, T.A., Barani, S.S.: Face detection and recognition using Naïve Bayes algorithm. Int. J. Disaster Recovery Business Continuity 11(1), 11–18 (2020) 12. Lin, C.J., Lin, C.H., Jeng, S.Y.: Using feature fusion and parameter optimization of dual-input convolutional neural network for face gender recognition. Appl. Sci. (Advances of Computer Vision special issue) 10(9), 1–12 (2020) 13. Balyan, A., Suman, S., Naqvi, N.Z., Ahlawat, K.: Gender recognition from real-life images. In: Solanki, V., Hoang, M., Lu, Z., Pattnaik, P. (eds.) Intelligent Computing in Engineering. Advances in Intelligent Systems and Computing, vol. 1125, pp. 127–134. Springer, Cham (2020) 14. Dong, J., Du, Y., Cai, Z.: Gender recognition using motion data from multiple smart devices. Expert Syst. Appl. 147, 113195 (2020) 15. Deng, J., Bei, S., Shaojing, S., Xiaopeng, T., Zhen, Z.: Gender recognition via fused CNN of separated GEI. In: IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), China, pp. 2032–2037. IEEE Xplore (2020) 16. Chi, Y.S., Kamarulzaman, S.F.: Intelligent gender recognition system for classification of gender in Malaysian demographic. In: Kasruddin Nasir, A., et al. (eds.) In: ECCE 2019. Lecture Notes in Electrical Engineering, vol. 632, pp. 283–295. Springer, Singapore (2020) 17. Khan, K., Attique, M., Khan, R.U., Syed, I., Chung, T.S.: A multi-task framework for facial attributes classification through end-to-end face parsing and deep convolutional neural networks. Sensors (Image and Video Processing and Recognition Based on Artificial Intelligence). 20(2), 328 (2020) 18. Makinist, S., Ay, B., Aydin, G.: Average neural face embeddings for gender recognition. Avrupa Bilim ve Teknoloji Dergisi 2020, 522–527 (2020) 19. Sharad, M., Bhavesh, J.: Application of naïve Bayes classification for disease prediction. Int. J. Manag. IT Eng. 9, 80–87 (2019) 20. Hamidah, I.: Intelligent cooperative web caching policies for media objects based on J48 decision tree and Naïve Bayes supervised machine learning algorithms in structured peer-topeer systems. J. Inf. Commun. Technol. 15, 85–116 (2020) 21. Salmi, N., Rustam, Z.: Naïve Bayes classifier models for predicting the colon cancer. In: IOP Conference Series: Materials Science and Engineering, vol. 546 (2019) 22. 
Subramanian, R.S., Prabha, D.: Customer behavior analysis using Naive Bayes with bagging homogeneous feature selection approach. J. Ambient Intell. Humanized Comput. (2020) 23. Ounachad, K., Oualla, M., Souhar, A., Sadiq, A.: Fuzzy hamming distance and perfect face ratios based face sketch recognition. In: 2018 IEEE 5th International Congress on Information Science and Technology (CiSt), Marrakesh, pp. 317–322. IEEE Xplore (2018)

Fall Detection for Pedestrians in Video-Surveillance

Wassima Aitfares1,2(B)

1 Faculty of Sciences, LIMIARF - Mohammed V University of Rabat, Rabat, Morocco
[email protected]
2 GENIUS Laboratory – SUPMTI of Rabat, Rabat, Morocco

Abstract. The main objective of video surveillance in public places is to ensure public safety. This can be accomplished by continuously analyzing video streams in order to detect any abnormal event. Automatic detection of human falls is also of great interest in video surveillance, especially for elderly people, for whom accidental falls are common and can lead to severe injuries. In this paper, we introduce a new approach to detect pedestrian fall events in video surveillance. The proposed method integrates visual human tracking using an active contour and human velocity estimation to detect motionless pedestrians. We start tracking moving pedestrians using an active contour that evolves and follows the pedestrian shape boundaries in each frame of the video stream. We analyze the velocity of moving pedestrians based on the displacement of their interest points. Once the velocity is determined to be null, we analyze the body boundaries of this person to determine if they are motionless during several frames. If so, a red curve surrounding the pedestrian is displayed in the video, alerting the system operator that an urgent intervention is needed. The presented fall detection method is tested under real traffic scenarios. Experimental results show that our new approach achieves good performance and is efficient in detecting a true pedestrian fall.

Keywords: Video surveillance · Analytic video · Interest points · Active contour · Object detection · Object tracking · Moving object

1 Introduction

In recent years, a significant amount of work has been reported in the video analytics field, proposing solutions able to monitor and analyze human behaviors. Surveillance systems generally attempt, in a dynamic scene, to recognize the regions of interest in a video, especially the moving objects, in order to track these moving objects through the sequence of images. Prakasha et al. have presented in [1] a model for detecting and tracking moving objects in video sequences by identifying a motion feature to detect moving objects. Most tracking approaches aim to identify the object of interest using a background subtraction technique to identify moving objects in the video sequences [2, 3]. An approach toward target representation and localization of non-rigid objects is proposed by Comaniciu et al. in [4], where the histogram-based representations are regularized by a spatial mask with an isotropic kernel.


The target localization is developed based on the basin of attraction of the local maxima, optimized using the mean-shift approach. Additionally, Jepson et al. have proposed in [5] an approach for learning appearance models for motion-based object tracking. This approach uses a two-step algorithm consisting of an expectation step followed by a maximization step, as developed by Dempster et al. in [6]. Most of these approaches track only the centroid or the orientation of the object of interest, but not the whole object. In fact, the goal of surveillance systems is not only to track moving people but to understand their behaviors. These systems aim to ensure the entire surveillance task in real time, as automatically as possible, to analyze and detect whether an incident has taken place. The need for automatic systems that detect and recognize abnormal human behavior is increasing with the growing amount of video data collected periodically by monitoring cameras. The work in this paper presents a new approach for detecting pedestrian falls within a video sequence. Our approach consists in analyzing the velocity of the tracked pedestrian during its motion based on his displacement vector. The analysis of the object trajectory is made in [7] to know whether the person suddenly changes his trajectory, in order to decide whether a suspicious behavior is detected. In this paper, however, we study the velocity of the moving person to know whether the latter is still moving or not, based on the displacement vector of the object's interest points. We use the well-known Scale Invariant Feature Transform (SIFT) descriptor introduced by Lowe in [8] for object recognition between consecutive frames, owing to its high performance and suitability for video vision applications. This paper is organized as follows. The next section presents a background on the active contour method, object tracking and some related works on human fall detection. Section 3 describes the new proposed approach for the detection of pedestrian falls in public spaces. Section 4 presents the experimental results and Sect. 5 concludes the paper.

2 Background

2.1 Principle of the Active Contour Technique

The Active Contour (AC) method has been extensively used in image segmentation, as presented in [9–17]. The main objective of this method is to segment an object of interest by evolving a curve until it reaches the real object boundaries. This process is achieved by minimizing an energy criterion. One approach to implement AC segmentation consists in using the Level-Set method, which considers the original curve as the zero level of a surface [18, 19]. The distortion of the entire surface induces a deformation of the curve shape. This process drives the evolution of the AC and achieves, at the end, the object segmentation. Let I denote a given image defined on the domain Ω and I(x) the intensity of the pixel x, where x ∈ Ω. Let C denote a closed curve represented as the zero level set of a signed distance function Φ, i.e., C = {x | Φ(x) = 0}. The goal of this mechanism is to implicitly evolve the curve C.


At convergence, the regions where Φ > 0 (inside of C) and Φ < 0 (outside of C) represent the region of interest and the background, respectively. In the Level-Set formulation, a smoothed Heaviside function H(Φ(x)) is used to determine the inside and the outside of the curve C. The approximation of the smoothed Heaviside function specifies the interior of C (formula 1 in Table 1). Similarly, the exterior of C is specified as (1 − H(Φ(x))). The energy is calculated in a small band around the curve C, as described in the work of Adalsteinsson and Sethian [20], in order to decrease the computational complexity of the standard Level-Set method.

Table 1. Formulas used in the AC evolution toward the object boundaries

To specify the area around the curve, a smoothed version of the Dirac delta (formula 2 in Table 1), which is the derivative of the Heaviside function, is used. To perform object segmentation by the AC, we initially define an objective that determines what we want to extract from the image, and then an energy criterion should be minimized to achieve this objective. The region-based energy is generally formulated as an integral over the domain of a region descriptor. The descriptor used in our work assumes, as presented by Yezzi et al. in [21], that the foreground and background regions should have maximally separated mean intensities (formula 3 in Table 1). Consider the two regions Ωin and Ωout standing respectively for the interior and the exterior of the curve C.
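Since Table 1 is not reproduced here, the following lines sketch the standard forms of the quantities it refers to (formulas 1–3), as they usually appear in the cited level-set literature [20, 21]; the smoothing parameter ε and the exact notation are assumptions of this sketch rather than the authors' exact table entries:

H_\epsilon(\Phi(x)) = \frac{1}{2}\Big(1 + \frac{2}{\pi}\arctan\frac{\Phi(x)}{\epsilon}\Big), \qquad
\delta_\epsilon(\Phi(x)) = \frac{dH_\epsilon}{d\Phi} = \frac{1}{\pi}\,\frac{\epsilon}{\epsilon^{2}+\Phi(x)^{2}},

E = -\frac{1}{2}\big(\mu_{in}-\mu_{out}\big)^{2}, \qquad
\mu_{in} = \frac{\int_{\Omega} I(x)\,H_\epsilon(\Phi(x))\,dx}{\int_{\Omega} H_\epsilon(\Phi(x))\,dx}, \qquad
\mu_{out} = \frac{\int_{\Omega} I(x)\,\big(1-H_\epsilon(\Phi(x))\big)\,dx}{\int_{\Omega} \big(1-H_\epsilon(\Phi(x))\big)\,dx}.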


The values μin and μout denote the mean intensities in the interior and the exterior of the curve, respectively. The gradient and divergence operators are denoted by ∇ and div, respectively. The energy function in the Level-Set formulation and the evolution equation of C are expressed in formulas 4 and 5 (in Table 1), respectively. In addition to using the criterion introduced by Yezzi et al. in [21], we apply this descriptor based on the local approach proposed by Lankton and Tannenbaum in [12]. This local technique is based on information collected from local interior and exterior regions along the AC. These local regions are selected by using a radius inside and outside the AC for each point along the AC. The segmentation of the object using this local selection is related to the radius value, which is optimized by considering the object size and the proximity of surrounding objects.

2.2 Object Tracking

Object tracking techniques allow assessing over time the parameters of the tracked object. These parameters can be, for example, the position, the shape or the apparent orientation of the target object in the image. An automatic tracking technique must not only track the target object but must also perform an automatic initialization by a detection technique. Tracking methods require an object detection mechanism either in every frame or at the object's first appearance in the video. A review of the major existing tracking methods can be found in the work of Yilmaz et al. [22], where the authors classify tracking methods into three fundamental categories: techniques establishing point correspondence, techniques using primitive geometric models, and techniques using curve evolution. A considerable amount of research has been devoted to visual tracking for a variety of applications [23–27]. Nawaz et al. have proposed in [23] a method to evaluate multi-target video tracking using measures based on the variations of the target size, combining accuracy and cardinality errors, evaluating the duration of the track and quantifying long-term tracking accuracy. Favalli et al. have developed in [24] an object tracking tool suitable for MPEG-2 sequences. Courtney et al. [25] have developed a technique tracking individual objects through the segmented region. Moscheni et al. [26] have developed a spatio-temporal segmentation technique to detect and track moving objects. Segen and Pingali have developed in [27] a real-time human tracking system for video sequences acquired by a stationary camera. As mentioned above, the local SIFT descriptor [8] is used in our approach owing to its robustness against large transformations. This local descriptor is used to evaluate the motion of the object's interest points even if the object undergoes a large displacement between consecutive frames.
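As a hedged illustration only (not the author's MATLAB code), the following Python/OpenCV sketch shows one way the displacement of the object's interest points could be estimated between two consecutive frames with SIFT; the function name, the 0.75 ratio-test threshold and the optional mask restricting detection to the inside of the curve are assumptions:

import cv2
import numpy as np

def estimate_displacement(prev_gray, curr_gray, prev_mask=None):
    # Detect SIFT keypoints/descriptors; prev_mask can restrict detection to
    # the inside of the current contour, mirroring the paper's use of OIP.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(prev_gray, prev_mask)
    kp2, des2 = sift.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return np.zeros(2)

    # Match descriptors and keep reliable matches with Lowe's ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if not good:
        return np.zeros(2)

    # Displacement vector = mean motion of the matched interest points.
    shifts = [np.array(kp2[m.trainIdx].pt) - np.array(kp1[m.queryIdx].pt)
              for m in good]
    return np.mean(shifts, axis=0)

The returned 2-D vector can then be applied to the curve position, as done in step 10 of the algorithm in Sect. 3.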

824

W. Aitfares

2.3 Related Works on Fall Detection Methods

Various fall detection approaches have recently been studied and proposed in the literature from different perspectives. Ramachandran and Karuppiah presented in [28] a survey on recent advances in wearable fall detection systems using machine learning. Pannurat et al. presented in [29] a review of automatic fall monitoring systems, categorizing the techniques into rule-based and machine learning methods. Teddy Ko presented in [30] a survey on human behavior analysis in video surveillance for indoor security applications, combining motion and behavior information with standoff biometrics for detecting anomalies and understanding behaviors. Taramasco et al. presented in [31] a monitoring approach that can detect falls of elderly people based on thermal sensors. Robotics was introduced by Ciabattoni et al. in [32] for fall detection, based on mobile robots and an electronic environment that is sensitive and reactive to the presence of people. Tao et al. presented in [33] a system to detect human fall incidents in indoor environments; the system first detects and tracks moving people and then uses an event-inference module that analyzes video sequences of people's features to detect whether a falling behavior occurs. Faisal et al. proposed in [34] a fall monitoring system based on wearable sensors to detect falls and identify the falling pattern and the activity associated with the incident; the study uses three types of machine learning techniques, namely k-nearest neighbors, support vector machines and random forests. Hakim et al. implemented in [35] a threshold-based fall detection algorithm in which a supervised machine learning algorithm is used to classify activities of daily living; this combination is used to increase the accuracy of the fall detection. Their approach relies on the built-in inertial measurement unit sensors of a smartphone attached to the body of the subject, with the signals wirelessly transmitted to remote machines for processing. Han et al. proposed in [36] an approach for fall detection based on advanced wireless technologies, analyzing the wireless signal propagation model while considering the influence of human activities. Nuñez-Marcos et al. proposed in [37] a vision-based technique using convolutional neural networks to decide whether a fall can be detected in a sequence of frames. Yanfei et al. proposed in [38] a fall detection system based on point clouds; the technique relies on depth data of the living environment, mapping pixels into real-space coordinates through the depth information, an Arduino microprocessor handles the received data in real time, and the acceleration of the human point cloud is used to determine a potential fall activity. Lu et al. developed in [39] a fall detection technique based on a three-dimensional convolutional neural network; the three-dimensional convolution is employed to extract temporal information from the video streams, and a long short-term memory based spatial visual attention scheme is incorporated to detect falls in the video sequence. Leoni et al. proposed in [40] a convolutional neural network composed of three convolutional layers, two MaxPool layers and three fully connected layers as their deep learning model for fall detection in an Internet-of-Things and fog computing environment. Yacchirema et al. presented in [41] a system based on the Internet of Things for detecting falls of elderly people in indoor environments, which takes advantage of low-power wireless sensor networks, smart devices, big data and cloud computing; the sensor readings are processed and analyzed using a decision-tree-based Big Data model running on a smart Internet-of-Things gateway to efficiently detect falls.

3 Detection of Pedestrian Fall

Our proposed human fall detection approach is based on object tracking using the object's interest points and on the estimation of the object centroid velocity, to determine whether the moving pedestrian is motionless or not. Several algorithms for object tracking in video surveillance can be employed. In our approach, we use the object tracking technique proposed in [7], which tracks people and detects a possible suspicious behavior when a moving person suddenly changes his trajectory under some constraints. In this paper, the abnormal human behavior detected at the first stage is considered when the velocity of the tracked pedestrian becomes null and remains null during a certain time in the video stream. In addition, at the second stage, the human tracking is performed using a deformable AC that evolves and follows the pedestrian silhouette. This AC is used as an indication to determine whether the pedestrian is moving or motionless. The following notations are used in our algorithm:

– Velocity: the centroid velocity of the pedestrian.
– MIP (Mobile Interest Point): a Boolean parameter used to indicate whether any interest point of the tracked pedestrian is still moving. This parameter is evaluated when the pedestrian velocity is null. It is initialized to "True" since we track a mobile object at the beginning.
– NAF (Number of Analyzed Frames): a parameter that is re-initialized to 0 every 40 frames.
– Alert: a counter used to notify that more than 100 frames have been analyzed.

The algorithm below describes step by step the proposed approach for pedestrian fall detection within a sequence of N frames in the video stream.

The approach algorithm

1. Initialize a blue rectangular curve encompassing the target pedestrian in the first frame F0.
2. Evolve the curve until it reaches the real boundaries of the pedestrian. //This step is executed only if we track the object of interest using an AC, as shown in Fig. 3 and Fig. 4.
3. Extract the Object's Interest Points (OIP) in the frame F0. //Those inside the curve.
4. Initialize the parameters: NAF = 0, Alert = 0 and MIP = True
5. for i = 1 to N-1 do
6.     NAF = NAF + 1
7.     Extract the interest points in the whole frame Fi.
8.     Match the OIP obtained in the previous frame with those obtained in the whole current frame Fi to identify the new position of the OIP in this frame Fi.
9.     Based on the motion of the OIP, compute the displacement vector of the object between the consecutive frames Fi-1 and Fi.
10.    Adjust the position of the curve in the current frame Fi by applying the displacement vector obtained from the previous step. //This speeds up the convergence of the curve to the pedestrian silhouette in case of tracking with an AC (Fig. 3 and Fig. 4).
11.    Extract the OIP within the frame Fi. //Belonging to the inside of the curve.
12.    if NAF = 40 then //Update the velocity value.
13.        Compute the velocity of the object centroid within the 40 previous frames based on the mean displacement of the OIP.
14.        if velocity = 0 then
15.            if the curve is motionless based on the mean displacement of the OIP then
16.                MIP = False
17.                Alert = Alert + NAF
18.                if Alert > 100 and MIP = False then
19.                    Surround the object with a red curve //Urgent intervention is needed.
20.                    break
21.                end if
22.            else
23.                MIP = True
24.                Surround the object with a white curve //The pedestrian is not walking.
25.                Reinitialize Alert = 0
26.            end if
27.        else
28.            Surround the object with a blue curve //The pedestrian is walking.
29.            Reinitialize NAF = 0
30.        end if
31.    end if
32. end for

The displacement vector between consecutive frames is computed and applied to the curve in each frame to move the AC as close as possible to the pedestrian silhouette. Every 40 frames, we compute the object velocity to determine whether the person is still walking. If the person is stationary, we analyze his silhouette to decide whether he is motionless or has simply stopped for an ordinary reason. An ordinary situation can be, for example, a person sitting down after walking, meeting someone and talking to him for a few minutes, or talking on a cell phone. If the velocity is null while the pedestrian silhouette is still moving, then no emergency alert is raised, since the person might be moving some parts of his body such as his hands, head or legs. But if the velocity remains null with a motionless AC for a certain period of time (more than 100 frames in our approach), the AC changes its color to red in the video, indicating to the operator that this person needs an urgent intervention.


Table 2 summarizes the decisions taken for each case.

Table 2. Decisions taken depending on the velocity value and the AC state

Number of analyzed frames | Centroid velocity | Curve state | Decision | Curve color
Each 40 frames | Velocity ≠ 0 | Any state | Pedestrian is walking | Blue
Each 40 frames | Velocity = 0 | Curve in motion | Pedestrian is not walking but he is moving some parts of his body | White
Greater than 100 frames | Velocity = 0 | Curve is immobile | Pedestrian is motionless for a long time, so an immediate intervention is needed | Red
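To make the decision rules of Table 2 concrete, here is a minimal hedged Python sketch of the per-window state machine (the paper's implementation is in MATLAB; the function name, the MOTION_EPS threshold and the way velocity and curve motion are passed in are assumptions of this sketch):

VELOCITY_WINDOW = 40     # frames between velocity updates (paper's value)
ALERT_THRESHOLD = 100    # frames of total stillness before raising an alert
MOTION_EPS = 1.0         # pixels: below this, centroid/curve is considered still

def decide_curve_color(centroid_velocity, curve_is_moving, state):
    """One 40-frame decision step; 'state' accumulates the still frames."""
    if centroid_velocity > MOTION_EPS:     # pedestrian is walking
        state['alert'] = 0
        return 'blue'
    if curve_is_moving:                    # stopped, but silhouette still moves
        state['alert'] = 0
        return 'white'                     # e.g. sitting down, phoning
    state['alert'] += VELOCITY_WINDOW      # centroid and silhouette both still
    if state['alert'] > ALERT_THRESHOLD:
        return 'red'                       # possible fall: urgent intervention
    return 'white'

# Example: three consecutive still windows eventually trigger the red alert.
state = {'alert': 0}
print([decide_curve_color(0.0, False, state) for _ in range(3)])  # ['white', 'white', 'red']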

The flowchart in Fig. 1 describes the main steps of the proposed approach for detecting pedestrian falls in video surveillance.

Fig. 1. Flowchart of the main steps of the proposed method


4 Experimental Results

In this section, we present the results obtained using real image sequences to assess the performance of our proposed approach, using the well-known CAVIAR data sets (Context Aware Vision using Image-based Active Recognition), which are public real image data sets (http://groups.inf.ed.ac.uk/vision/CAVIAR/) used by several researchers. CAVIAR is funded by the EC's Information Society Technology programme, project IST 2001 37540 (http://homepages.inf.ed.ac.uk/rbf/CAVIAR/). The proposed approach was implemented in the MATLAB simulation environment. The results are presented for some specific frames of the video streams, especially when a change in the curve color occurs. This change occurs once a person is determined to be stationary (white color) or motionless (red color).

4.1 Comparison with Previous Work

We tested our approach using a video stream (Fig. 2) that shows a person walking in a hall; at the end of the video he stops walking and checks his cellphone. Figure 2 shows the tracking result for this moving pedestrian using a rigid rectangular curve instead of a deformable curve (i.e., an AC).

Fig. 2. Detecting null velocity (white curve) followed by a false alert for an urgent intervention (red curve) when tracking with a rigid curve [7]

Using the rigid curve, the tracking is performed based only on temporal information, as described in [7], and not on spatial information. As a result, the curve does not evolve and move toward the desired object boundaries in each frame but only undergoes the displacement of the tracked object. The tracking of the target object is then decided only by analyzing the pedestrian velocity, and not the pedestrian silhouette as well. As shown in this figure, during the motion of our object of interest, the curve color remains blue as long as the object velocity is not null, from the first frame until frame 40. Between frames 40 and 80, the curve color changes to white since the person's velocity becomes null. From frame 120, the pedestrian velocity is updated and determined to be non-null, which is reflected by updating the tracking rectangle back to blue. After that, the curve color changes to white after frame 200, since the pedestrian is steady between frames 160 and 200. The estimated velocity then remains null until frame 320. At this last frame, the rectangle curve color becomes red, since the velocity value is null and the curve has been stationary for more than 100 frames. In fact, the rigid rectangular curve moves only if the pedestrian walks. Consequently, the target object is determined to be motionless at frame 320 and a false alert for a pedestrian fall is generated and sent to the video surveillance operator system. We can clearly see that between frames 200 and 320 the person is not motionless, because he is moving his hand and his head. By examining the result obtained in Fig. 2, we show that information related to the motion of the real object silhouette, using an evolving curve (i.e., an AC) instead of a rigid curve, should be integrated to avoid raising such false alerts. Figure 3 shows the tracking result for the same video sequence as in Fig. 2, using an AC instead of a rigid curve. As shown in this figure, the tracking curve evolves and reaches the pedestrian silhouette in each frame. As mentioned above, we analyze the pedestrian velocity; if a null velocity is detected during several frames, the evolving AC is then analyzed to retrieve the information of the object silhouette. As shown in Fig. 3, the AC state (i.e., its color) accurately matches the state of the tracked person, and at frame 320 the AC color remains white, signifying that even though the person is not walking, some parts of his body are still moving. By comparing the results obtained in Fig. 2 and Fig. 3, we conclude that our proposed approach applied in Fig. 3, which combines both the pedestrian velocity and its silhouette information, gives an accurate result: no motionless pedestrian is detected at frame 320 and, as a result, no false alert for a pedestrian fall is generated. In fact, our approach is based on the combination of temporal and spatial information for human silhouette extraction using the AC. But the key contribution of our method is the analysis of this pedestrian silhouette state, especially when the pedestrian is not walking. This analysis allows us to determine whether the tracked pedestrian is motionless for a long period of time.


Fig. 3. Detecting null velocity not followed by a false alert when using an AC, because the pedestrian silhouette is determined to be not yet steady

4.2 Detecting a True Pedestrian Fall with the Proposed Approach

We tested our approach using another video stream (Fig. 4) that shows a violent fight between two persons, at the end of which the aggressor escapes and leaves the victim motionless, lying on the ground. In this scenario, a true alert should be generated and an immediate intervention is needed. Figure 4 depicts the tracking process using an AC to monitor the silhouette of the object of interest. In this figure, we observe a tracking without any abnormal behavior detected for this pedestrian until he meets someone and starts fighting with him (i.e., frame 200). The object of interest is tracked with a blue AC until frame 320, where its color changes to white, indicating that he is not in a moving state but some parts of his body might still be moving. Since his centroid velocity remains null with an immobile AC for more than 100 frames, the AC changes its color to red, indicating that this person is motionless and an alert should be generated (i.e., frame 440).

Fig. 4. Human tracking using our proposed approach where a true alert (red curve) is generated

It is worth mentioning that the values of some parameters used in our MATLAB simulation environment are the optimized values obtained from our experimentation tests. We mention here the 40 frames for updating the velocity value, the 100 frames for determining whether an alert needs to be generated, and the radius value (14 pixels in our experiments) used for the local information (discussed in Sect. 2.1). These parameters are configurable and adjustable based on the speed of the video streams, the background and the mobility of the surrounding objects.

5 Conclusion

In this paper, a new approach for human fall detection in video surveillance is presented. It is based on the measurement of the velocity of the moving object and the estimation of its silhouette motion. The velocity is computed from the displacement vector of the interest points of the pedestrian and is estimated every 40 frames to determine whether the tracked pedestrian is still walking. If the centroid velocity is null, which means that the pedestrian is not walking, we check whether some of his interest points are still in motion, which implies a motion of his silhouette. If no motion of his silhouette is detected for more than 100 frames, then the person is determined to be immobile. In this case, the fallen person is surrounded by a red curve in the video streams, indicating to the video surveillance operator that a human fall is detected and an immediate intervention is needed. Experimental results show the effectiveness of the proposed approach. Future work includes applying this approach to multiple tracked objects to allow the surveillance system to detect falls for several moving pedestrians. In addition, we aim to generalize the detection to any abnormal behavior, not limited to motionless persons, such as fighting. The video surveillance system must be as smart as a human brain to easily analyze any event and make decisions in real time within the video streams.

References 1. Prakasha, M., Puneeth, B.R., Hedge, C.K.: Detection and tracking of moving objects using image processing. Int. J. Eng. Sci. Invent. Res. Dev. 4(1), 28–39 (2017) 2. Stauffer, C., Grimson, W.: Learning patterns of activity using real time tracking. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 747–767 (2000) 3. Wren, C.R., Azarbayejani, A., Darell, T., Pentland, A.P.: Pfinder: real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 70–785 (1997) 4. Comaniciu, D., Ramesh, V., Meer,P.: Kernel-based object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 25(5), 564–575 (2003) 5. Jepson, A.D., Fleet, D.J., El-Maraghi, T.F.: Robust online appearance models for visual tracking. IEEE Trans. Pattern Anal. Mach. Intell. 25(10), 1296–1311 (2003) 6. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B 39(1), 1–38 (1977) 7. Aitfares, W., Kobbane, A., Kriouile, A.: Suspicious behavior detection of people by monitoring camera. In: The IEEE 5th International Conference on Multimedia Computing and Systems, Marrakech, Morocco, 29 September–1 October (2016) 8. Lowe, D.G.: Object recognition from local scale-invariant features. In: ICCV, pp. 1150–1157 (1999) 9. Aitfares, W., Bouyakhf, E.H., Regragui, F., Herbulot, A., Devy, M.: A robust region-based active contour for object segmentation in heterogeneous case. Pattern Recogn. Image Anal. J. 24(1), 24–35, March 2014 10. Aitfares, W., Bouyakhf, E.H., Herbulot, A., Regragui, F., Devy, M.: Hybrid Region and interest points-based active contour for object tracking. Appl. Math. Sci. J. 7(118), 5879–5899 (2013) 11. Aitfares, W., Herbulot, A., Devy, M., Bouyakhf, E.H., Regragui, F.: A novel region-based active contour approach relying on local and global information. In: Proceedings of IEEE International Conference on Image Processing, Brussels, Belgium, pp. 1049–1052, September 2011 12. Lankton, S., Tannenbaum, A.: Localizing region-based active contours. IEEE Trans. Image Process. 17(11), 2029–2039 (2008) 13. Michailovich, O., Rathi, Y., Tannenbaum, A.: Image segmentation using active contours driven by the Bhattacharyya gradient flow. IEEE Trans. Image Process. 15(11), 2787–2801 (2007) 14. Rathi, Y., Vaswani, N., Tannenbaum, A., Yezzi, A.: Tracking deforming objects using particle filtering for geometric active contours. IEEE Trans. Pattern Anal. Match. Intell. 29(8), 1470– 1475 (2007) 15. Chan, T., Vese, L.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)


16. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic actives contours. Int. J. Comput. Vis. 22(1), 61–79 (1997) 17. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1(4), 321–332 (1988) 18. Osher, S., Tsai, R.: Level Set Methods and their applications in image science. Commun. Math. Sci. 1(4), 1–20 (2003) 19. Osher, S., Fedkiw, R.: Level Set Methods and Dynamic Implicit Surfaces. Cambridge University Press, New York (2003) 20. Adalsteinsson, D., Sethian, J.A.: A fast level set method for propagating interfaces. J. Comp. Phys. 118, 269–277 (1995) 21. Yezzi, J.A., Tsai, A., Willsky, A.: A fully global approach to image segmentation via coupled curve evolution equations. J. Vis. Comm. Image Rep. 13(1), 195–216 (2002) 22. Yilmaz, A., Javed, O., Shah,M.: Object tracking: a survey. ACM Comput. Surv. 38(4), 13 (2006) 23. Nawaz, T., Poiesi, F., Cavallaro, A.: Measures of Effective Video Tracking. IEEE Trans. Image Process. 23(1), 376–388 (2014) 24. Favalli, L., Mecocci, A., Moschetti, F.: Object tracking for retrieval application in MPEG-2. IEEE Trans. Circ. Syst. Video Technol. 10(3), 427–432 (2000) 25. Courtney, J.D.: Automatic video indexing via object motion analysis. Pattern Recogn. 30(4), 607–625 (1997) 26. Moscheni, F., Dufaux, F., Kunt, M.: Object tracking based on temporal and spatial information. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 1914–1917 (1996) 27. Segen, J., Pingali, S.: A camera-based system for tracking people in real time. In: Proceedings of ICPR 1996, pp. 63–67 (1996) 28. Ramachandran, A., Karuppiah, A.: A survey on recent advances in wearable fall detection systems. BioMed Res. Int. 2020, 17 (2020) 29. Pannurat, N., Thiemjarus, S., Nantajeewarawat, E.: Automatic fall monitoring: a review. Sensors 14(7), 12900–12936 (2014) 30. Ko, T.: A survey on behavior analysis in video surveillance for homeland security applications. In: 37th IEEE Applied Imagery Pattern Recognition Workshop, Washington DC, pp. 1–8, October 2008 31. Taramasco, C., Rodenas, T., Martinez, F., Fuentes, P., Munoz, R., Olivares, R., De Albuquerque, V.H.C., Demongeot, J.: A novel monitoring system for fall detection in older people. IEEE Access 6, 43563–43574 (2018) 32. Ciabattoni, L., Foresi, G., Monteriu, A., Pagnotta, D.P., Tomaiuolo, L.: Fall detection system by using ambient intelligence and mobile robots. In: Proceedings of the 2018 Zooming Innovation in Consumer Technologies Conference (ZINC), Novi Sad, Serbia, pp. 130–131, May 2018 33. Tao, J., Turjo, M., Wong, M.F., Wang, M., Tan, Y.P.: Fall incidents detection for intelligent video surveillance. In: 5th International Conference on Information Communications & Signal Processing, Bangkok, Thailand, pp. 1590–1594, December 2005 34. Faisal, H., Fawad, H., Ehatisham-ul-Haq, M., Azam, M.A.: Activity-aware fall detection and recognition based on wearable sensors. IEEE Sens. J. 19(12), 4528–4536 (2019) 35. Hakim, A., Huq, M.S., Shanta, S., Ibrahim, B.S.K.K.: Smartphone based data mining for fall detection: analysis and design. Proc. Comput. Sci. 105, 46–51 (2017) 36. Han, C., Wu, K., Wang, Y., Ni, L.M.: WiFall: device-free fall detection by wireless networks. In: Proceedings of the IEEE INFOCOM 2014—IEEE Conference on Computer Communications, Toronto, Canada, pp. 271–279, April 2014


37. Nuñez-Marcos, A., Azkune, G., Arganda-Carreras, I.: Vision-based fall detection with convolutional neural networks. Wireless Commun. Mobile Comput. 2017, 16 (2017). Article ID 9474806 38. Yanfei, P., Jianjun, P., Jiping, L., Yan, P., Hu, B.: Design and development of the fall detection system based on point cloud. Proc. Comput. Sci. 147, 271–275 (2019) 39. Lu, N., Wu, Y., Feng, L., Song, J.: Deep learning for fall detection: three-dimensional CNN combined with LSTM on video kinematic data. IEEE J. Biomed. Health Inform. 23(1), 314– 323 (2018) 40. Leoni, G., Endo, P.T., Monteiro, K., Rocha, E., Silva, I., Lynn, T.G.: Accelerometer-based human fall detection using convolutional neural networks. Sensors (Basel) 19(7), 1644 (2019) 41. Yacchirema, D., de Puga, J.S., Palau, C., Esteve, M.: Fall detection system for elderly people using IoT and big data. Proc. Comput. Sci. 130, 603–610 (2018)

Hand Pose Estimation Based on Deep Learning

Marwane Bellahcen1, El Arbi Abdellaoui Alaoui2,3(B), and Stéphane Cédric Koumétio Tékouabou4,5

1 Laboratory ADMIR, Higher National School of Computer Science and System Analysis (ENSIAS), University of Mohamed V, Rabat, Morocco
2 EIGSI-Casablanca, 282 Route of the Oasis, Casablanca, Morocco
[email protected]
3 Department of Computer Science, Faculty of Sciences and Technologies, My Ismail University, Errachidia, Morocco
4 Laboratory LAROSERI, Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, B.P. 20, 24000 El Jadida, Morocco
5 Center of Urban Systems (CUS), Mohamed VI Polytechnic University (UM6P), Hay Moulay Rachid, 43150 Ben Guerir, Morocco
[email protected]
http://www.elarbiabdellaoui.com

Abstract. The problem of 3D hand pose estimation has attracted considerable attention in the computer vision community for a long time. It has been studied for decades, as it plays a significant role in human-computer interaction, for instance in virtual/augmented reality applications, computer graphics and robotics. Because of the practical value associated with this topic, it has recently regained strong research interest due to the emergence of commodity depth cameras. Despite the recent progress in this field, robust and accurate hand pose estimation remains a challenging task due to the large pose variations, the high dimension of hand motion, the highly articulated structure, significant self-occlusion, viewpoint changes and data noise. Besides, real-time performance is often desired in many applications. In this work, we present a comparative study of different recently introduced hand pose estimation methods and implement our own deep-learning-based method to address this problem.

Keywords: Hand pose estimation · Deep learning · Convolutional neural network · RGB-D sensors

1 Introduction

Human beings use different types of gestures beyond the voice for human interaction, such as hand gestures, body gestures and facial expressions. The hand is still considered one of the most natural and intuitive modalities for human interaction. In addition, hands are the most natural means of unspoken communication, and a major means of communication for Deaf people. For the field of human-machine interaction, reliable and robust hand tracking is the first step towards a more natural and flexible interaction with our machines, which can be exploited in several applications such as gesture recognition, manipulation of virtual objects and games. However, accurate and reliable hand tracking presents many challenges, such as occlusions between fingers and rapid and irregular hand movement. In addition, 3D hand tracking is interesting from a theoretical point of view because it addresses three main areas of computer vision, which are segmentation, detection and tracking [3]. Hand pose estimation has received great attention on both the academic and the industrial side because it is well suited to interaction with machines and to VR applications, but hand tracking is not a trivial task because of the great variability of the hand between human beings and its large number of degrees of freedom, which has made it a large research field in computer vision since the 1980s. Researchers have faced many challenges in this field, such as resolution problems, noisy images, processing time and self-occlusions. Recently, deep learning techniques have allowed computers to understand the shape of a human hand and estimate its position in images or videos; by choosing the right model and hyper-parameters, several approaches have shown their effectiveness in providing an accurate and real-time estimate of the hand pose. The objective of this article is to build a system that predicts the position of the hand using the Inception network idea.

2 Related Work

Different approaches have been proposed, including generative approaches by Henia et al. [5], Wang and Popovic [13], Xu and Cheng [15], Qian et al. [10] and Sridhar et al. [4], which reproduce the hand pose of the input image: the hand poses observed in an image are compared with those of a 3D model. There are also discriminative approaches, such as Oberweger et al. [7,8], Tompson et al. [12], Tang et al. [11], Guijin Wang et al. [14] and Oberweger et al. [6], which aim at directly estimating the positions of the different joints of the hand from an RGB or depth image. Studies show that discriminative approaches provide better real-time performance than generative models. However, the challenge of discriminative methods, which depend heavily on the quality of the training data, is that they require a large, well-labelled database to model a large number of possible poses, which is costly. Generative approaches, on the other hand, require a large image database to cover all the characteristics of the hand shape and its variations from different viewpoints; comparing the test image with all the models in the database is costly and computationally time-consuming, which limits the use of generative models for real-time applications. For hand pose estimation there are also works using random forests: Tang et al. [11] and Keskin et al. [2]. Hand pose estimation using a Latent Regression Forest (LRF) is the objective of the work carried out by Tang et al. [11]. The pose estimation problem is divided into small problems of estimating the local positions of the joints; this is formulated as a dichotomous search problem in which the image is recursively divided into two regions until each region corresponds to a single hand joint. Afterwards, a Latent Regression Forest is used to determine the position of each of these joints. Convnets have also found great success for hand pose estimation. The first work using a Convnet, Tompson et al. [12], produces 2D hand joint positions for different image planes and then recovers the joint positions in 3D space with inverse kinematics. Oberweger et al. [6,7] proposed different architectures to directly estimate the hand joint positions in 3D space. To improve the performance of the model, Oberweger et al. [8] introduced ResNet and data augmentation to avoid overfitting. Wang et al. [14] used the same architecture as Oberweger [8] for feature extraction; the last Convnet is split into several fully connected branches to regress the hand joint positions in 3D space, called the Region Ensemble Network (REN).

Fig. 1. The different architectures proposed by Oberweger et al. [6]

Oberweger et al. [6] have proposed deep learning models to solve the problem. The architectures presented in their work, which greatly motivated ours, are shown in Fig. 1 and are as follows:

a- The architecture shown in Fig. 1(a) is a shallow network composed of a convolution layer (C), a max-pooling layer (P), and a fully connected layer (FC). The parameters used for each layer are noted in the figure, such as the number of filters and their dimensions for the C layer, the pooling dimension for the P layer, and the number of neurons in the FC layer.
b- The architecture shown in Fig. 1(b) represents a deeper network (with several layers). It uses 3 C layers, 2 pooling layers and 3 FC layers.
c- Figure 1(c) represents an architecture that uses several sizes of the input image and extracts the characteristics offered by each size. This method is more accurate than the deep architecture. It uses the PCA method to decrease the size of the data by exploiting the correlation between the positions of the joints.
d- Figure 1(d) represents an architecture based essentially on architecture (b), but without using PCA.

3 Formulation of the Proposed Approach

Our model estimates the hand pose in 3D space. To do this, we start with pre-processing: the image is segmented using thresholding, then we keep the largest segmented region, since the hand is the closest object to the camera, normalize the image to [−1, 1] and resize it to 128 × 128. The proposed model is composed of a convolution layer containing 8 kernels of size 5 × 5; the output of this layer is the input of three parallel convolution layers, the first with 8 kernels of size 1 × 1, the second with 8 kernels of size 3 × 3 and the third with 8 kernels of size 5 × 5, each followed by a 2 × 2 pooling layer, and their results are concatenated. This is followed by a convolution layer of 24 kernels of size 5 × 5 and then three parallel convolution layers, the first with 24 kernels of size 1 × 1, the second with 24 kernels of size 3 × 3 and the third with 24 kernels of size 5 × 5, whose results are again concatenated. Finally, two fully connected layers of 1024 neurons, each with a dropout of 0.7, precede the output layer. All these layers use the Rectified Linear Unit (ReLU) as activation function, with a learning rate of 0.001, a mini-batch of 128 and 200 epochs (Table 1).

Table 1. Description of the training parameters

Dropout | 0.7
Learning rate | 0.001
Mini-batch | 128
Epochs | 200
Activation function | ReLU
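As a hedged illustration of the layer description above, the following tf.keras sketch builds an Inception-style network with the reported hyper-parameters; it is a reconstruction, not the authors' exact code, and details such as the 'same' padding, the pooling placement in the second parallel block and the 48-dimensional output (16 joints × 3 coordinates) are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, Model

def inception_block(x, filters):
    # Three parallel convolutions (1x1, 3x3, 5x5), each followed by 2x2 pooling,
    # whose outputs are concatenated, as described above.
    branches = []
    for k in (1, 3, 5):
        b = layers.Conv2D(filters, k, padding='same', activation='relu')(x)
        b = layers.MaxPooling2D(2)(b)
        branches.append(b)
    return layers.Concatenate()(branches)

def build_hand_pose_model(num_joints=16):
    inputs = tf.keras.Input(shape=(128, 128, 1))            # normalized depth crop
    x = layers.Conv2D(8, 5, padding='same', activation='relu')(inputs)
    x = inception_block(x, 8)                                # 8-kernel branches
    x = layers.Conv2D(24, 5, padding='same', activation='relu')(x)
    x = inception_block(x, 24)                               # 24-kernel branches
    x = layers.Flatten()(x)
    for _ in range(2):                                       # two FC layers of 1024
        x = layers.Dense(1024, activation='relu')(x)
        x = layers.Dropout(0.7)(x)
    outputs = layers.Dense(num_joints * 3)(x)                # 3D coordinates of 16 joints
    model = Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss='mse')
    return model

A model built this way can be trained with model.fit on mini-batches of 128 normalized depth crops and their flattened 3D joint labels.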

3.1 Dataset

We work with the ICVL database [13], which was captured with an Intel Creative Interactive Gesture Camera. The database is divided into a training part and a testing part, where all the images are labeled with the 3D coordinates of 16 hand joints. The training part is composed of more than 180k images, and the testing part is composed of 1400 images (Fig. 2).


Fig. 2. Architecture of the proposed model

3.2 Experimentation

To implement the proposed approach, we were inspired by the Oberweger model [6] and by the Inception network, which is based on convolutional neural networks, to predict the position of the joints of an isolated hand. The code is implemented in Python and uses several libraries, such as:

– OpenCV (Open Source Computer Vision): an open-source library for the processing and analysis of images and videos in real time, with interfaces for the main programming languages (Python, C, C++, Java, etc.).
– TensorFlow, Abadi et al. [1]: a numerical computation library for Python widely used in deep learning, adapted to run efficiently on CPU or GPU architectures.

The implemented code contains a learning part, used to build our own model, and a testing part, where the built model was evaluated. Finally, we compare it with the state of the art. The process generally followed by 3D hand pose estimation systems goes through the following steps:

a The choice of the input data, which can be either texture images, depth images or both.
b Detection of the hand in the image and its segmentation, based either on thresholding or on other classical segmentation methods.
c A pre-processing step, necessary for depth normalization or filtering.
d The prediction step, the most important one, performed by a learning technique.
e A post-processing step, used by some systems to ensure more accuracy (Figs. 3 and 4).

Fig. 3. The 16 joints estimated by the Deep-Prior approach: each finger contains three joints, plus the center of the hand [6]

Fig. 4. Flowchart of the processing steps
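As an illustration of steps (b) and (c) above, the following hedged OpenCV/NumPy sketch segments the hand as the closest connected region in a depth map, normalizes it to [−1, 1] and resizes it to 128 × 128; the depth-margin value and the "closest connected component" heuristic are simplifying assumptions, not the authors' exact procedure:

import cv2
import numpy as np

def preprocess_depth(depth, depth_margin=150.0, out_size=128):
    """Segment the hand (closest object), crop it, and normalize to [-1, 1]."""
    valid = depth[depth > 0]
    if valid.size == 0:
        raise ValueError("empty depth map")
    near = valid.min()                                 # hand assumed closest to camera
    mask = ((depth > 0) & (depth < near + depth_margin)).astype(np.uint8)

    # Keep the largest connected component of the thresholded region.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    x, y, w, h = stats[largest, :4]
    crop = np.where(labels[y:y+h, x:x+w] == largest,
                    depth[y:y+h, x:x+w], near + depth_margin)

    # Normalize depth to [-1, 1] around the hand and resize to 128x128.
    crop = (crop - near) / depth_margin * 2.0 - 1.0
    return cv2.resize(crop.astype(np.float32), (out_size, out_size))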

4 Results Analysis and Comparison

We have implemented the model presented previously and computed the error to evaluate its performance. The graph in Fig. 5 shows the evolution of this error during training.

Fig. 5. Performance results in terms of error rate according to the number of epochs

To evaluate the performance of our method, we compare the error we obtained with the errors reported by the other methods cited in the state of the art. We see that our algorithm yields an error comparable with the other works. Table 2 compares our results with those of the works we have studied, showing the error reported for each previous work. The error we found is 11.25 mm, which places our method in the middle of the compared approaches. The largest error is 15.5 mm, obtained by Otberdout et al. [9], and the smallest error is 7.31 mm, obtained by Wang et al. [14] (Region Ensemble).


Table 2. Comparison of errors in our work with previous work on hand tracking.

Method | Error
Otberdout et al. [9] | 15.5 mm
Tang et al. (LRF) [11] | 12.6 mm
Oberweger et al. (DeepPrior) [6] | 10.4 mm
Zhou et al. (DeepModel) [16] | 11.3 mm
Oberweger et al. (DeepPrior++) [8] | 8.1 mm
Wang et al. (Region Ensemble) [14] | 7.31 mm
Our model | 11.25 mm

5 Conclusion

We presented the work done on hand pose estimation. We introduced an Inception-network-based model to estimate the hand pose and applied this method to the ICVL database. We then evaluated the performance of our method and computed the resulting error, and we compared the result obtained by our model with the results of the work carried out in the state of the art. We found that our method yields an error comparable with the methods cited in the state of the art.

References 1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Ghemawat, S.: Tensorflow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016) 2. Keskin, C., Kıra¸c, F., Kara, Y.E., Akarun, L.: Hand pose estimation and hand shape classification using multi-layered randomized decision forests. In: European Conference on Computer Vision, pp. 852–863. Springer, Heidelberg, October 2012 3. Kyriazis, N., Argyros, A.: Scalable 3D tracking of multiple interacting objects. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3430–3437 (2014) 4. Mueller, F., Bernard, F., Sotnychenko, O., Mehta, D., Sridhar, S., Casas, D., Theobalt, C.: Ganerated hands for real-time 3D hand tracking from monocular RGB. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 49–59 (2018) 5. Henia, O.B., Hariti, M., Bouakaz, S.: A two-step minimization algorithm for modelbased hand tracking (2010) 6. Oberweger, M., Wohlhart, P., Lepetit, V.: Hands deep in deep learning for hand pose estimation. arXiv preprint arXiv:1502.06807 (2015) 7. Oberweger, M., Wohlhart, P., Lepetit, V.: Training a feedback loop for hand pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3316–3324 (2015) 8. Oberweger, M., Lepetit, V.: Deepprior++: improving fast and accurate 3D hand pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 585–594 (2017)


9. Otberdout, N., Ballihi, L., Aboutajdine, D.: Hand pose estimation based on deep learning depth map for hand gesture recognition. In: 2017 Intelligent Systems and Computer Vision (ISCV), pp. 1–8. IEEE, April 2017 10. Qian, C., Sun, X., Wei, Y., Tang, X., Sun, J.: Realtime and robust hand tracking from depth. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1106–1113 (2014) 11. Tang, D., Chang, H.J., Tejani, A., Kim, T.K.: Latent regression forest: structured estimation of 3D articulated hand posture. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3786–3793 (2014) 12. Tompson, J., Stein, M., Lecun, Y., Perlin, K.: Real-time continuous pose recovery of human hands using convolutional networks. ACM Trans. Graph. (ToG) 33(5), 1–10 (2014) 13. Wang, R.Y., Popovi´c, J.: Real-time hand-tracking with a color glove. ACM Trans. Graph. (TOG) 28(3), 1–8 (2009) 14. Wang, G., Chen, X., Guo, H., Zhang, C.: Region ensemble network: towards good practices for deep 3D hand pose estimation. J. Vis. Commun. Image Represent. 55, 404–414 (2018) 15. Xu, C., Cheng, L.: Efficient hand pose estimation from a single depth image. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3456– 3462 (2013) 16. Zhou, X., Wan, Q., Zhang, W., Xue, X., Wei, Y.: Model-based deep hand pose estimation arXiv preprint arXiv:1606.06854 (2016)

Static and Dynamic Hand Gesture Recognition System Using Contourlet Transform

Roumiassa Ferhat(B), Fatma Zohra Chelali, and Salah eddine Agab

Speech Communication and Signal Processing Laboratory, University of Sciences and Technology Houari Boumediene (USTHB), Box n: 32 El Alia, 16111 Algiers, Algeria
[email protected], chelali [email protected], [email protected]

Abstract. This article describes the benefit of the Contourlet transform and the discrete wavelet transform for a hand gesture recognition system applied to static and dynamic data sets. For this purpose, one-against-all SVM and RBF neural network classifiers are applied, and several tests are performed on the recognition process to improve the global efficiency. Three data sets are used to demonstrate our study. A good recognition rate was obtained for the different architectures, where 93% efficiency was achieved.

Keywords: Hand gesture recognition · Static gesture · Dynamic gesture · CT · DWT · SVM · RBF

1 Introduction

Gestural communication takes a major place in new technologies. It is applied in a wide range of applications such as machine control, video games, security, speech comprehension, and gesture-based human-machine interfaces. In this context, gesture recognition is considered one of the most important fields of research; it focuses on developing recognition systems capable of recognizing gestures from video sequences or static images. Hand gestures are a natural and intuitive human communication channel for interacting with the environment. They are used to designate or manipulate objects, to reinforce speech, or to communicate in a basic way in a noisy environment. Hand gestures can have different meanings depending on the language or culture: sign languages in particular are specific to each language. Hand gesture recognition can be divided into three phases: analysis, recognition and interpretation, as shown in Fig. 1. During the analysis stage, the characteristics of the hand in each image are extracted; in the recognition step, images are classified into a set of known gestures specific to the application. Finally, the interpretation stage finds the correspondence between the recognized gesture and the action to be taken by the system. According to the temporal evolution of the gesture shape, two types of gestures can be distinguished, namely static gestures and dynamic gestures.


Fig. 1. Hand gesture recognition system.

The information carried by a static gesture is expressed by a single posture, whereas in a dynamic gesture it is expressed by two characteristics, the configuration of the hand and its position, which corresponds to a movement of the hand over time. One of the difficult tasks in hand gesture recognition is how to characterize the hand gesture correctly, with features that best describe the movement of the hand. Different methods have been developed in this area, such as the Orientation Histogram [6], the Histogram of Oriented Gradients (HOG) [2,8], the Edge Orientation Histogram (EOH) [11], Dynamic Time Warping (DTW) [8,13], the Histogram of Oriented Optical Flow (HOOF) [2], the Discrete Wavelet Transform (DWT) [15], Local Binary Patterns [19], etc. Manar Maraqa et al. [10] introduced the use of different architectures of neural networks in hand gesture recognition for static and dynamic gestures; the recognition rate obtained by this method is about 95% for static gestures. In [11], a static hand gesture recognition system for American Sign Language using EOH features and a multiclass SVM is proposed; the effectiveness of this system was about 93.75%. Asha Thalange and Shanteanu [18] proposed to use the DWT, with a fourth level of decomposition, as well as a neural classifier for recognition of the ASL sign language; with this approach an average of 97.47% is reached. In [7], the authors developed a combination of LBP and principal component analysis (PCA) to extract the descriptor vectors, which are introduced into an HMM classifier; the experimental results indicate that the proposed system has a recognition rate of 97% in signer-independent mode. In [1], the authors present a novel method for static hand gesture recognition using Spatial Histogram Coding of NCT, an efficient method; the experimental results showed that the proposed method achieved a high recognition rate of 98.05%. The objective of our work is to exploit the benefit of the discrete wavelet transform (DWT) and the Contourlet transform (CT) for hand feature extraction, where a radial basis function neural network and a support vector machine classifier are used to build our hand gesture recognition system. Three data sets are used for testing the performance of our hand gesture recognition. The remainder of this paper is organized as follows. Section 2 provides a description of the methods used for feature extraction (CT and DWT). Section 3 presents our experimental design, data collection, experimental results and the performance of our system. Section 4 concludes our research.

2 Features Extraction

We present in this section the characterization techniques used to extract the descriptor vector of the hand shape. For this purpose, two transforms are used, namely CT and DWT.

2.1 DWT Transform

DWT is a powerful technique for extracting characteristic features of an image because it allows the image to be analyzed at different levels of resolution. Low-pass and high-pass filters are used to decompose the original image. The low-pass filter gives an approximation of the image while the high-pass filter provides its details. The approximation of the image can then be decomposed into further levels of approximation and detail according to the application [14]. The DWT is defined as the inner product of a signal x(t) with the mother wavelet ψ(t):

ψ_{a,b}(t) = ψ((t − b) / a)                                   (1)

W(a, b) = (1/√a) ∫ x(t) ψ((t − b) / a) dt                     (2)

where a and b are the scale and shift parameters, respectively. In the proposed system, 3 levels of DWT are applied as shown in Fig. 2, where the approximated image is used in each subsequent level of decomposition with the Haar wavelet.
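As a minimal sketch of this feature-extraction step (assuming the PyWavelets library and grayscale hand images loaded as NumPy arrays; the function and variable names are ours, not the authors'), a 3-level Haar DWT descriptor can be built as follows:

import numpy as np
import pywt  # PyWavelets

def dwt_descriptor(gray_image, levels=3, wavelet="haar"):
    # Multi-level 2-D DWT; coeffs[0] is the deepest approximation sub-band (cA).
    coeffs = pywt.wavedec2(gray_image, wavelet=wavelet, level=levels)
    # Only the final approximation is kept here; the detail sub-bands
    # (cH, cV, cD) could be appended in the same way.
    return coeffs[0].ravel()

descriptor = dwt_descriptor(np.random.rand(64, 64))  # dummy 64x64 "hand" image
print(descriptor.shape)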

2.2 Contourlet Transform

CT is a directional multiresolution image representation constructed by applying two successive decomposition stages: a multi-scale decomposition followed by a multi-directional decomposition. The first stage transforms the input image into a Laplacian pyramid (LP) with l + 1 scale levels. The second stage applies a directional filter bank (DFB) to each LP scale level, using quincunx filters and critical downsampling [4,5]. Figure 3 shows a hand posture decomposed with the contourlet transform over two levels.
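The contourlet transform itself is not part of the common Python imaging libraries, but its first stage, the Laplacian pyramid, can be sketched with OpenCV as follows (a rough illustration under our own naming, not the authors' implementation); each band-pass level is the difference between an image and the upsampled version of its low-pass, downsampled copy.

import cv2
import numpy as np

def laplacian_pyramid(gray_image, levels=2):
    # Multi-scale stage of the CT: returns band-pass images plus the
    # final low-pass residual. A directional filter bank would then be
    # applied to each band-pass level.
    pyramid, current = [], gray_image.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)                                   # low-pass + downsample
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)                                  # band-pass detail
        current = down
    pyramid.append(current)                                           # coarsest approximation
    return pyramid

bands = laplacian_pyramid(np.random.rand(128, 128))
print([b.shape for b in bands])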

Fig. 2. System structure of DWT, where cA, cH, cV and cD are the Approximation, Horizontal, Vertical and Diagonal image respectively.

Fig. 3. Hand posture decomposed using the contourlet transform for 2 levels.

Figure 4 shows an illustration of the CT decomposition in two levels. Different filters were tested ("9–7", "5–3", pkva, Burt) to achieve the best recognition result.

3 Simulation Results

This section presents the simulation results obtained for our recognition system for static and dynamic hand gestures.

3.1 Dataset Description

For evaluating the performances of our gesture recognition system, we have performed several experiments on static and dynamic gesture databases.

Fig. 4. Illustration of CT decomposition in 2 levels.

Fig. 5. Examples of static base images with white background

Fig. 6. Examples of static base images with black background

Static Gesture Datasets.
– American Sign Language: this database was collected on 14–16 October 1996 by Jochen Triesch for American Sign Language. It contains 10 hand postures (a, b, c, d, g, h, i, l, v, y) made by 24 people in front of three different backgrounds (white, black, complex). Figures 5 and 6 show examples of the static base with white and black backgrounds, respectively.
– Arabic Sign Language: this database was collected by El Halawani for Arabic Sign Language. It contains 30 alphabet postures made by 60 people. Figure 7 shows some of these postures.

Fig. 7. Examples of static base images of Arabic Sign Language.

Fig. 8. Examples from each gesture of dynamic database.

Dynamic Hand Posture Database. This database contains 4 gestures (Clic, No, Rotate, Stop), as shown in Fig. 8. Each gesture is represented by 12 video sequences, each sequence includes 55 frames, and the sequences are made by different persons.

3.2 Experimental Design

To test the performance of the used descriptors, several experiments on the static and dynamic gesture databases have been performed. The main goal of our analysis is to select and explore the suitable feature vector descriptor for hand gesture recognition. Figure 9 shows the diagram of our gesture recognition system. Our analysis describes and compares the recognition rate obtained for each characterization technique (CT and DWT), using the RBF and SVM classifiers on the datasets. The RBF [17] model is a formal neural network organized in two layers, namely a hidden layer and an output layer. The hidden layer is composed of RBF kernels that transform the inputs using a nonlinear activation function. The training is done by calculating the distance between the input and the kernel center using a Gaussian function, as shown in Eq. 3:

R(X) = exp(−‖X − C_i‖ / σ²)                                   (3)

Fig. 9. Block diagram of the proposed method

where ‖·‖ denotes the Euclidean norm on the input space X; in our case X corresponds to the CT and DWT features, and σ_i is the width of the i-th RBF kernel. The j-th output is calculated as a linear combination of the hidden-layer responses, as shown in Eq. 4:

Y_j(P) = Σ_{i=1}^{u} R_i(P) w(j, i)                           (4)

where w(j, i) is the strength of the connections between the hidden and the output layers.
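A minimal NumPy sketch of this forward pass is given below; the centers, width, and weights are hypothetical placeholders meant only to mirror Eqs. 3 and 4, not the authors' trained network.

import numpy as np

def rbf_forward(x, centers, sigma, weights):
    # x: (d,) input descriptor (CT or DWT features)
    # centers: (u, d) RBF kernel centers C_i; sigma: Gaussian width
    # weights: (n_classes, u) output weights w(j, i)
    distances = np.linalg.norm(x - centers, axis=1)   # ||X - C_i||
    hidden = np.exp(-distances / sigma**2)            # Eq. 3: R_i(X)
    return weights @ hidden                           # Eq. 4: Y_j = sum_i R_i(X) w(j, i)

rng = np.random.default_rng(0)
y = rbf_forward(rng.random(16), rng.random((10, 16)), 1.5, rng.random((4, 10)))
print(y.argmax())   # predicted class index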

In addition, we have implemented an SVM classifier to decide the best classification. The SVM is a supervised learning method that separates two classes using an optimal hyperplane [9]. Its decision function can be expressed as:

F(x) = sign( Σ_{i=1}^{N} α_i y_i k(x, x_i) + b_0 )            (5)

where k(x, x_i) is the Gaussian (RBF) kernel:

k(x, x_i) = exp(−‖x − x_i‖² / (2σ²))                          (6)

and σ is a positive, user-defined parameter that controls the kernel radius. It should be pointed out that hand gesture classification is a multi-class problem, and several algorithms exist for solving multi-class classification tasks. In this article, the one-against-rest approach is used: for an N-class problem, it constructs N binary classifiers, where the i-th SVM is trained with all training examples of the i-th class labeled positive and all other examples labeled negative [17].

In order to achieve the best recognition rate, the training parameters of each classifier have been selected empirically after several tests. We optimized the configuration of the SVM by varying the regularization parameter C in the range [1:5:150] and the RBF kernel sigma in the range [1:1:500]; for the RBF network, the width of the Gaussian function was varied in the range [0.01:0.1:5] and the number of neurons in the hidden layer in the range [1:1:100]. For the American sign gesture dataset, all images are converted from RGB to grayscale, and two backgrounds are used, namely black and white. Each gesture is represented by 20 images; we used 50% of the images to train our classifiers and 50% for the tests. For the Arabic sign gesture dataset, all images are converted from BMP to grayscale. Each gesture is represented by 20 images, where 50% are assigned to training and 50% to testing. For the dynamic dataset, 6 sequences were used for the training phase and six for the test phase; 55 frames were extracted from each sequence and converted from RGB to grayscale. After the DWT and CT parameterization, feature vectors are calculated for both the DWT and the CT technique. The resulting vectors are fed into the RBF neural network and the SVM classifier to compute the recognition efficiency of our system. The recognition rate (RR) is used as a metric to evaluate the performance of the architectures CT-SVM, DWT-SVM, CT-RBF and DWT-RBF, and is defined by:

R(i) = N_i / N                                                (7)

where N_i is the number of test images correctly recognized and N is the total number of test samples. A comparative analysis is done for each classifier and database to decide the most suitable characterization.
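As a rough sketch of this classification and evaluation protocol (using scikit-learn; the placeholder arrays stand in for the 50%/50% split described above, and only C = 71 and sigma = 51 are taken from the values reported later for the Arabic dataset), the SVM branch could look as follows:

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# X_train / X_test: CT or DWT descriptor vectors; y_train / y_test: gesture labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((100, 64)), rng.integers(0, 10, 100)
X_test, y_test = rng.random((100, 64)), rng.integers(0, 10, 100)

# One-against-rest multi-class SVM with a Gaussian kernel (Eq. 6),
# with gamma = 1 / (2 * sigma^2).
sigma = 51.0
clf = OneVsRestClassifier(SVC(kernel="rbf", C=71, gamma=1.0 / (2 * sigma**2)))
clf.fit(X_train, y_train)

# Recognition rate of Eq. 7: correctly recognized test samples / total test samples.
recognition_rate = (clf.predict(X_test) == y_test).mean()
print(f"RR = {recognition_rate:.2%}")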

Experimental Results. This study reviews the hand gesture recognition system based on different classifier techniques using CT and DWT features.

Table 1. Recognition rates obtained for Arabic dataset.

Method            CT-SVM   DWT-SVM   CT-RBF   DWT-RBF
1 Decomposition   92%      92%       91%      91%
2 Decomposition   93%      92%       92%      91%
3 Decomposition   90%      92%       91%      91%

Table 2. Recognition rates obtained for American dataset, black background.

Method            CT-SVM   DWT-SVM   CT-RBF   DWT-RBF
1 Decomposition   64%      66%       54%      63%
2 Decomposition   65%      66%       63%      64%
3 Decomposition   66%      65%       67%      67%

Table 3. Recognition rates obtained for American dataset, white background.

Method            CT-SVM   DWT-SVM   CT-RBF   DWT-RBF
1 Decomposition   66%      51%       62%      66%
2 Decomposition   59%      51%       76%      66%
3 Decomposition   64%      57%       79%      72%

The experimentation on the Arabic dataset is performed by following the diagram presented in Fig. 9, where feature extraction is done using the CT and DWT methods. For each class, 10 of the selected images are used as the training dataset and the 10 others as the testing dataset. The recognition rates are presented in Table 1. The best recognition rate for the CT descriptor is about 93%, obtained at the second decomposition level using the SVM classifier with C = 71 and sigma = 51, and about 92% using the RBF with 83 neurons in the hidden layer and spread = 2.3. The RR obtained for the DWT descriptor is about 92% for all decomposition levels using the SVM, and about 91% using the RBF. For performance comparison, we followed the experimental protocol presented in Fig. 9 with the Jochen Triesch dataset. The recognition results are presented in Tables 2 and 3. It can be seen that the performance of our system varies from 54% to 79%. This is due to several factors such as user variability, lighting variation, shadows, complex backgrounds, etc. Table 4 shows the experimental results on the dynamic dataset. The obtained recognition rates decrease significantly for the two descriptors. In all cases, the CT approach provides better results with the SVM classifier, whereas with the RBF classifier the same results are obtained. The two descriptors correctly identified the four gestures; the best performance obtained is 87.5%. Table 5 describes a comparison of our results with other works on the same databases.

Table 4. Recognition rates obtained for dynamic dataset.

Method            CT-SVM   DWT-SVM
1 Decomposition   87.5%    70.73%
2 Decomposition   75%      75%
3 Decomposition   83.33%   83.33%

Table 5. Comparison with previous works.

Reference    Data-set         Method                     RR%
[12]         Dynamic images   Contour-based similarity   85.4
Our method   Dynamic          CT descriptor              87.5
[16]         ArSL             LBP                        90.41
[3]          ArSL             ANFIS network              93.55
Our method   ArSL             CT                         93

The comparative analysis clearly shows that the two descriptors perform well on the dynamic and Arabic hand gesture databases. We noticed that the proposed features achieve comparable, and sometimes better, performance than other systems.

4 Conclusion

This study develops a hand gesture recognition system based on different classifier techniques using CT and DWT features. Different types of classifiers were analyzed on static and dynamic datasets. For this purpose, we have studied and implemented a static and dynamic gesture recognition system. The evaluation of our system shows its efficiency in recognizing both static and dynamic gestures. Satisfactory results were obtained for the Arabic dataset, where an efficiency of 93% was reached, and for the dynamic dataset, with a recognition rate of 87.5%, compared to the American dataset where only 79% was obtained, due to the complex background. Our approach presents encouraging results using CT features with either the RBF classifier or the SVM classifier with an RBF kernel. For future work, we can apply our system to real-time acquisition and recognition.

References 1. Adithya, V., Rajesh, R.: An efficient method for hand posture recognition using spatial histogram coding of NCT coefficients. In: 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), pp. 16–20. IEEE (2018) 2. Agab, S.E., Chelali, F.Z.: HOG and HOOF spatio-temporal descriptors for gesture recognition. In: 2018 International Conference on Signal, Image, Vision and their Applications (SIVA), pp. 1–7. IEEE (2018)

3. Al-Jarrah, O., Halawani, A.: Recognition of gestures in Arabic sign language using neuro-fuzzy systems. Artif. Intell. 133(1–2), 117–138 (2001) 4. Bamberger, R.H., Smith, M.J.: A filter bank for the directional decomposition of images: theory and design. IEEE Trans. Signal Process. 40(4), 882–893 (1992) 5. Burt, P., Adelson, E.: The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983) 6. Freeman, W.T., Roth, M.: Orientation histograms for hand gesture recognition. In: International Workshop on Automatic Face and Gesture Recognition, vol. 12, pp. 296–301 (1995) 7. Ibrahim, N.B., Selim, M.M., Zayed, H.H.: An automatic Arabic sign language recognition system (arslrs). J. King Saud Univ.-Comput. Inf. Sci. 30(4), 470–477 (2018) 8. Jangyodsuk, P., Conly, C., Athitsos, V.: Sign language recognition using dynamic time warping and hand shape distance based on histogram of oriented gradient features. In: Proceedings of the 7th International Conference on PErvasive Technologies Related to Assistive Environments, p. 50. ACM (2014) 9. Liu, Y., Zheng, Y.F.: One-against-all multi-class SVM classification using reliability measures. In: Proceedings 2005 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 849–854. IEEE (2005) 10. Maraqa, M., Al-Zboun, F., Dhyabat, M., Zitar, R.A.: Recognition of Arabic sign language (arsl) using recurrent neural networks (2012) 11. Nagarajan, S., Subashini, T.: Static hand gesture recognition for sign language alphabets using edge oriented histogram and multi class SVM. Int. J. Comput. Appl. 82(4) (2013) 12. Nasri, S., Behrad, A., Razzazi, F.: A novel approach for dynamic hand gesture recognition using contour-based similarity images. Int. J. Comput. Math. 92(4), 662–685 (2015) 13. Plouffe, G., Cretu, A.M.: Static and dynamic hand gesture recognition in depth data using dynamic time warping. IEEE Trans. Instrum. Meas. 65(2), 305–316 (2015) 14. Rioul, O., Duhamel, P.: Fast algorithms for discrete and continuous wavelet transforms. IEEE Trans. Inf. Theory 38(2), 569–586 (1992) 15. Sadeddine, K., Chelali, F.Z., Djeradi, R.: Sign language recognition using PCA, wavelet and neural network. In: 2015 3rd International Conference on Control, Engineering and Information Technology (CEIT), pp. 1–6. IEEE (2015) 16. Sadeddine, K., Djeradi, R., Chelali, F.Z., Djeradi, A.: Recognition of static hand gesture. In: 2018 6th International Conference on Multimedia Computing and Systems (ICMCS), pp. 1–6. IEEE (2018) 17. Scholkopf, B., Sung, K.K., Burges, C.J., Girosi, F., Niyogi, P., Poggio, T., Vapnik, V.: Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Trans. Signal Process. 45(11), 2758–2765 (1997) 18. Thalange, A., Dixit, S.: Sign language alphabets recognition using wavelet transform. In: Conference on Intelligent Computing, Electronics Systems and Information Technology. Kuala Lumpur (Malaysia), pp. 25–26, August 2015 19. Trigueiros, P., Ribeiro, A.F., Reis, L.P.: A comparative study of different image features for hand gesture machine learning. In: ICAART 2013–5th International Conference on Agents and Artificial Inteligence, vol. 2, pp. 51–61. SCITEPRESS (2013)

Video Activity Recognition Based on Objects Detection Using Recurrent Neural Networks

Mounir Boudmagh1, Mohammed Redjimi2(B), and Adlen Kerboua3

1 Computer Science Department, LASE Laboratory, University Badji Mokhtar, 23000 Annaba, Algeria
[email protected]
2 Computer Science Department, LICUS Laboratory, University 20 August 1955, 21000 Skikda, Algeria
[email protected], [email protected]
3 University 20 August 1955, 21000 Skikda, Algeria
[email protected]

Abstract. Recognition of human actions in videos is a challenging task, which has received a significant amount of attention in the research community. We introduce an end-to-end multitask model that jointly learns object-action relationships. We compare it with different training objectives and validate its effectiveness for detecting object-action pairs in videos. First, we fine-tune a ResNet model to detect objects in videos; second, a neural network model is used for sequence learning to obtain the object-action correlation. Finally, we apply our multitask architecture to detect visual relationships between objects to recognize activities in videos of the MSR Daily Activity Dataset. Keywords: Action recognition · Object detection · CNN · LSTM

1 Introduction

The world we live in is inherently structured. It is comprised of components that interact with each other in space and time. Recognizing these interactions in real-world videos is a challenging AI problem with many practical applications [1–8]. We present a novel approach to efficiently constructing activity recognizers by effectively combining two diverse techniques. We apply transfer learning to fine-tune the pre-trained Residual Neural Network model (ResNet-50) [9] to automatically detect specified objects in videos; after that, we use the recurrent neural network "Long Short-Term Memory" (LSTM) [10] for sequence learning to select the main object correlated with the activity and identify the related activities. For example, detecting the object "Drink" as the main object correlated with the activity in the video helps classify the activity as "Drinking". Integrating these methods allows the development of accurate activity recognizers based on object detection without ever explicitly providing training labels for actions in videos. Experiments on a set of videos verify that our approach gives good results for the accuracy of a standard activity recognizer for videos.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 855–866, 2021. https://doi.org/10.1007/978-3-030-66840-2_65

The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 describes our new system. Section 4 experimentally evaluates it on real-world videos, and finally, Sect. 5 presents our conclusion and discusses future work.

2 Related Work

In this section, we briefly review the existing literature that closely relates to the proposed model, covering three categories of approaches: object detection, video analysis, and recurrent neural networks. Several papers propose ways of using deep convolutional networks for detecting objects [11–13, 31–33]. Recent work in the field [14–16] has shown remarkable progress, mainly due to the use of CNNs [16, 17]. R-CNN [16] treats the task as a region-proposal classification problem. Faster R-CNN [15] goes a step further, using a region proposal network (RPN) that shares convolutional features with the proposal classification branch. Chen et al. [34] proposed the Spatial Memory Network (SMN), whose spatial memory module captures instance-level context by assembling object instances back into a pseudo "image" representation that is later used for reasoning about object relations. He et al. [9] proposed ResNet, which reduces optimization difficulties by introducing shortcut connections; these shortcut connections create a highway that directly propagates gradients from deep layers to shallow units and thus significantly reduces training difficulty. With residual blocks training networks effectively, the model depth could be increased, allowing very high capacity models to be trained. The authors of [18] proposed transfer learning from the image domain to video frames; the approach is capable of detecting moving and static objects, but the object proposal generation step that precedes classification is slow. Figure 1 illustrates the milestones of deep learning based object detection techniques after 2012 [30].

Fig. 1. Major milestones in object detection research based on deep convolutional neural networks since 2012.

Studies on interaction recognition have become a popular research topic in recent years. Because objects and poses provide rich action features for interaction recognition, many methods are based on image data [35]. In early studies [19, 37, 38], researchers attempted to use the relationship between the object, the human pose, and the action to integrate object detection, pose estimation, and action analysis methods into one framework; the middle-level semantic feature of the interaction is extracted based on the object detection and pose estimation results. Closely related to object-action detection in videos is the work [22, 24] on segmenting object-action pairs, which uses Conditional Random Fields at the super-pixel level to output a semantic segmentation at the pixel level. In images, object-action pairs have been modeled implicitly in the context of predicting sentences for images [26] and relationships between objects [25]. Other researchers have attempted to use depth information and skeleton data to model human-object interactions [39–41]. Meng et al. [40] expressed the interaction based on the joint points detected in the depth information, the relative position changes between these joint points, and the relative positional changes between the joint points and the objects. Koppula et al. [41] used a Markov random field to model the interaction, where the graph nodes represent the object and the human motion and the edges represent their relationship. By modeling video as a time series for the action recognition task, especially via GRUs [20] or LSTMs [21], several papers demonstrate improvements on visual tasks including video classification [23] and activity recognition [10]. Liu et al. [29] proposed a recurrent multitask neural network to model actions in RGB and skeleton data, and integrated action classification and localization into a unified network to recognize actions. These models generally aggregate CNN features over tens of seconds, which form the input to an RNN. They perform well for global description tasks such as classification [21, 23] but require large annotated datasets.

3 Details of the Approach

This work proposes a model combining a deep hierarchical visual feature extractor (based on the CNN ResNet-50) with a model that can learn to recognize and synthesize temporal dynamics for tasks involving sequential data. Figure 2 describes the core of our approach: each visual input xt (an image, or a set of frames in a video) is passed through a feature transformation VP(·) with parameters P, usually a CNN, to produce a fixed-length vector representation VP(xt). The outputs of VP are then passed into a recurrent sequence-learning module (an LSTM or GRU). An RNN model with parameters W maps an input xt and the previous hidden state ht−1 to an output zt and an updated hidden state ht: h1 = fW(x1; h0) = fW(x1; 0), then h2 = fW(x2; h1), up to hT. We focus on a problem of the type "sequential input, static output": (x1, x2, ..., xt) → y. With videos of arbitrary length T as input, the prediction target is the action label, such as Drinking or Eating.

Fig. 2. Overview of the proposed model architecture

First, we used a pre-trained Residual Neural Network model (ResNet-50) to detect objects in videos. The ResNet-50 model consists of 5 stages, each with a convolution and an identity block. Each convolution block has 3 convolution layers and each identity block also has 3 convolution layers. ResNet-50 has over 23 million trainable parameters. Figure 3 shows the architecture of the ResNet-50 model. We fine-tune the model to detect 12 specific objects in the "MSR Daily Activity" dataset [28]. The 12 detected objects are: Book, Cellphone, Drink, Food, Guitar, Human, Joystick, Paper, Laptop, Sofa, Folded Paper, Vacuum.
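A minimal sketch of this fine-tuning step with tf.keras is shown below; the image size, the frozen backbone, and the plain classification head are our own assumptions (the paper ultimately produces bounding boxes over the 12 object classes, not just class scores), so this only illustrates the transfer-learning setup.

import tensorflow as tf

NUM_OBJECT_CLASSES = 12   # Book, Cellphone, Drink, Food, Guitar, Human, ...

# ImageNet-pretrained ResNet-50 backbone without its classification head.
backbone = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                          input_shape=(224, 224, 3))
backbone.trainable = False        # freeze the backbone for the first fine-tuning stage

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_OBJECT_CLASSES, activation="softmax"),
])

# Adam optimizer and categorical cross-entropy loss, as described in Sect. 4.2.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_frames, train_labels, epochs=10, batch_size=128)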

Fig. 3. Architecture of the ResNet-50 model

This approach provides robust, state-of-the-art object detection for static images, which we adapted to videos. Figure 4 shows some samples of detected objects. We took the maximum probability assigned, with a minimum score threshold equal to or higher than

50% to any detected object Oi in the frame Ft of the video Vj. In this way, we computed a probability for each of the 12 objects occurring in each video. The output of the object detection model is an (11, 5) matrix for each frame that represents the object coordinates in the video. These outputs are the inputs for the RNN model. For that, we used an LSTM model for sequence learning to process the output of the detected objects and to predict the main object correlated with the activity. At each time step t, the LSTM layer takes the previous hidden state h(t − 1), the previous cell state c(t − 1), and the current input x(t), and computes the updated cell state c(t) and the hidden state h(t). Given a sequence input x = [x1, x2, ..., xT] of length T, at time step t we compute the current hidden state h(t) and the updated cell state c(t) of the LSTM layer by iterating the following equations for t = 1, ..., T:

c(t) = f(t) ⊙ c(t − 1) + i(t) ⊙ g(t)                          (1)

h(t) = o(t) ⊙ tanh(c(t))                                      (2)

where ⊙ denotes the element-wise product of vectors and σ denotes the sigmoid function given by:

σ(x) = (1 + e^(−x))^(−1)                                      (3)
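A sketch of this sequence-learning stage with tf.keras is given below; the layer size and the flattening of the (11, 5) per-frame detection matrix are our assumptions, and only the input/output shapes follow the description above (the same structure, with 12 outputs, would serve the object-correlation variant).

import tensorflow as tf

FRAMES_PER_VIDEO = 55    # frames extracted per sequence
DETECTIONS_DIM = 11 * 5  # flattened (11, 5) detection matrix per frame
NUM_CLASSES = 16         # activity labels of the MSR Daily Activity dataset

# Sequential input, static output: a stack of per-frame detection vectors
# is mapped to a single label.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(FRAMES_PER_VIDEO, DETECTIONS_DIM)),  # Eqs. 1-2 over t = 1..T
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])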

Fig. 4. Images with corresponding objects detected

Figure 5 shows an example of our model workflow.

4 Experimental Evaluation

This section presents an experimental evaluation of our approach. First, we describe the dataset, next we explain our experimental methods, and finally we present the

Fig. 5. Example of our final model workflow

results. Note that we coded our approach in Python, using the TensorFlow 2.0 and Keras 2.3.1 libraries, and evaluated it on an Intel Core i7, 2.4 GHz with 8.0 GB of RAM.

4.1 The Dataset

We used the data collected by Liu et al. [3]. This dataset is designed to cover humans' daily activities in the living room. When the performer stands close to the sofa or sits on it, most of the activities involve human-object interaction. The dataset was captured with a Kinect device. There are 16 activity types: drink, eat, read book, call cellphone, write on a paper, use laptop, use vacuum cleaner, cheer up, sit still, toss paper, play game, lay down on sofa, walk, play guitar, stand up, sit down. Each subject performs an activity in two different poses: "sitting on sofa" and "standing". The total number of activity sequences is 320.

4.2 Experimental Method

First, we represent each video by a set of frames. Then we label the images using the tool "labelImg" [27], and finally we divide the dataset into disjoint training and test sets as follows: 70% as a training set and 30% as a test set. Therefore, for every object in the dataset we used 14 videos for training and 6 videos for testing. We fine-tune the ResNet-50 network model [9], which was originally trained on the ImageNet database [29]. We

used the optimizer Adaptive Moment Estimation (Adam) [42], which uses momentum and adaptive learning rates to converge faster, and the categorical cross entropy [36] as the loss function. The categorical cross entropy loss is computed as the following sum:

E = − Σ_{i=1}^{I} y_i · log(ŷ_i)                              (4)

Value

Class number

16

Epochs

10

Mini batch size

128

Initial learn rate

1e-4

Training processor

GPU

Language

CUDA

Validation patience

6 iterations

Optimization algorithm SGDM

The output of the object detection model for each frame is the input of the LSTM model to predict the object that has correlation with the activity in each frame. For that, we trained our model using a data set that we created using the output of the object detection model and we added some random matrix to increase and versatile the data. The data set contains 1000 samples of (11, 5) matrix each one that we divided onto 600 samples for training and 400 sample for testing. The output of this model is a vector of prediction where the highest prediction is the object that has a correlation with activity. The Table 2 shows the object-action correlation for the MSR Daily Activity Data set. For each action there is 20 videos, some actions do not use any objects (Cheer up, sit/stand still and walking), and other action use the same object (Lay down on sofa, Stand up and Sit down). 4.3 Results 4.3.1 Results for Object Detection in Video To evaluate the accuracy of the object detection model, we manually determined the number of correct and incorrect detections of each object in the test data for each object class. Table 3 shows the numbers of correct and incorrect detections as well as the

862

M. Boudmagh et al. Table 2. List of objects used in the actions Action

Object

Video Number

Drink

Drink (Can, Goblet) 20

Eat

Food (Chips)

20

Read

Book

20

Talking on the Phone Cell phone

20

Writ

Paper

20

Use laptop

Laptop

20

Use vacuum cleaner

Vacuum cleaner

20

Cheer up

//

20

Sit/Stand still

//

20

Toss paper

Folded paper

20

Play game

Joystick

20

number of videos actually containing the object. Objects for which there are no correct or incorrect detections are not show in the table. Note that the model confused little objects taken with hand, such as cell phone, joystick and paper, due to non-diversity of the data set in the training phase. Table 3. Result for our integrated object detection model Object

True positive False positive Video with object

Book

6

0

6

Cellphone

2

4

6

Drink

5

1

6

Food

5

1

6

Guitar

6

0

6

Joystick

4

2

6

Laptop

5

1

6

Paper

3

3

6

Sofa

18

0

18

Folded paper 2

4

6

Vacuum

1

6

5

Note that 18 videos are used to test the object Sofa, because three actions use the sofa (Lay down on sofa, Stand up and Sit down).

4.3.2 Activity Recognition
Following Tamou et al. [43], we evaluated our approach on the MSR Daily Activity 3D dataset using the Leave-One-Out Cross-Validation (LOOCV) technique, leaving one actor out in each iteration. Figure 6 illustrates the average accuracy of each action, and the confusion matrix is shown in Fig. 7.
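A small sketch of this leave-one-actor-out protocol is shown below, using scikit-learn's LeaveOneGroupOut; the arrays and the train_model / evaluate_model calls are placeholders standing in for the CNN+LSTM pipeline described above.

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
features = rng.random((320, 32))          # one descriptor per video sequence
labels = rng.integers(0, 16, 320)         # activity labels
actors = np.repeat(np.arange(10), 32)     # subject id of each sequence

accuracies = []
for train_idx, test_idx in LeaveOneGroupOut().split(features, labels, groups=actors):
    # model = train_model(features[train_idx], labels[train_idx])
    # accuracies.append(evaluate_model(model, features[test_idx], labels[test_idx]))
    pass

# average_accuracy = np.mean(accuracies)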

Fig. 6. Recognition accuracies for the 16 action classes of the MSR Daily Activity 3D dataset.

The system shows good results on activities that involve direct interaction with objects (drink, eat, play guitar, ...). We achieved an average accuracy of 74%, but only on actions that interact with objects. From the recognition accuracies (Fig. 6) and the confusion matrix (Fig. 7), we deduce that the system can handle only cases where subjects interact with external objects; actions without object interactions, or actions that share the same object, such as Lay down on sofa, Sit down and Stand up, which all use the object Sofa, are not handled by the system. The analysis of some misclassified examples given in Fig. 8 shows that the major reason for this misclassification is the wrong detection of objects in the videos, which leads the system to predict the wrong activities.

Fig. 7. Confusion matrix of our approach on MSR Daily Activity 3D dataset.

Fig. 8. Some examples of actions misclassified by our approach.

5 Conclusion and Future Work

In this paper, a novel human action recognition algorithm based on convolutional and recurrent neural networks has been presented. First, we introduced a method for the recognition of objects in videos using a CNN. Second, we demonstrated how feeding the detected objects to an LSTM can lead to better activity classification. The integration of these components has produced an activity recognizer that improves accuracy on videos. Future work will focus mainly on three directions. First, more generalization ability is required to allow the system to work in real-scenario applications without additional training. Second, we plan to add text mining to learn effective activity recognition for video description. Finally, our ultimate goal is to construct a system that can enhance existing human action recognition algorithms and improve accuracy on a diverse and realistic video corpus.

References 1. Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: a review. ACM Comput. Surv. 43(3), 1–43 (2011) 2. Ziaeefard, M., Bergevin, R.: Semantic human activity recognition: a literature review. Pattern Recognit. 48(8), 2329–2345 (2015) 3. Van Gemert, J.C., Jain, M., Gati, E., Snoek, C.G.: Action localization proposals from dense trajectories. In: Proceedings of the British Machine Vision Conference 2015: BMVC 2015, Swansea, UK, 7–10 September (2015) 4. Zhu, H., Vial, R., Lu, S.: A spatio-temporal convolutional regression network for video action proposal. In: Proceedings of the CVPR, Venice, Italy, 22–29 October (2017) 5. Papadopoulos, G.T., Axenopoulos, A., Daras, P.: Real-time skeleton-tracking-based human action recognition using kinect data. In: Proceedings of the International Conference on Multimedia Modeling, Dublin, Ireland, 6–10 January (2014) 6. Presti, L.L., Cascia, M.L.: 3D skeleton-based human action classification: a survey. Pattern Recognit. 53, 130–147 (2016) 7. Paul, S.N., Singh, Y.J.: Survey on video analysis of human walking motion. Int. J. Signal Process. Image Process. Pattern Recognit. 7, 99–122 (2014) 8. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013) 9. He, K., Xiangyu, Z., Shaoqing, R., Jian, S.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770– 778 (2016) 10. Donahue, J., Hendricks, L.A., Rohrbach, M., Venugopalan, S., Guadarrama, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. IEEE Trans. Pattern Anal. 39(4), 677–691 (2017) 11. Bell, S., Lawrence, C., Kavita, Z., Ross, B., Girshick, B.: Inside-outsidenet: detecting objects in context with skip pooling and recurrent neural networks. To appear in CVPR 2016, abs/1512.04143 (2015) 12. Spyros, G., Nikos, K.: Object detection via a multi-region and semantic segmentation-aware cnn model. In: The IEEE International Conference on Computer Vision (ICCV), December (2015) 13. Ross, B., Girshick, R.: Fast R-CNN (2015) 14. Kang, K., Ouyang, W., Li, H., Wang, X.: Object detection from video tubelets with convolutional neural networks. In: CVPR (2016) 15. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015) 16. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014) 17. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, Berg, C.: Single shot multibox detector. In ECCV (2016) 18. Subarna, T., Serge, B., Youngbae, H., Truong, N.: Detecting temporally consistent objects in videosthrough object class label propagation WACV (2016) 19. Prest, A., Ferrari, V., Schmid, C.: Explicit modeling of human-object interactions in realistic videos. IEEE Trans. PAMI 35(4), 835–848 (2013) 20. KyungHyun, C., Bart, V.M., Dzmitry, B., Yoshua, B.: On the properties of neural machine translation: encoder-decoder approaches. In: Proceedings of Workshop on Syntax, Semantics and Structure in Statistical Translation (2014) 21. Sepp, H., Jürgen, S.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

22. Xu, C., Corso, J.: Actor-action semantic segmentation with grouping-process models. In: CVPR (2016) 23. Joe, Y.-H., Matthew, H., Sudheendra, V., Oriol, V., Rajat, M., George, T.: Beyond short snippets: deep networks for video classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4694–4702 (2015) 24. Xu, C., Hsieh, S.H., Xiong, C., Corso, J.J.: Can humans fly? Action understanding with multiple classes of actors. In: CVPR (2015) 25. Lu, C., Krishna, R., Bernstein, M., Fei-Fei, L.: Visual relationship detection with language priors. In: ECCV (2016) 26. Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., Yuille, A.: Deep captioning with multimodal recurrent neural networks (M-RNN). In: ICLR (2015) 27. Tzutalin. LabelImg. Git code https://github.com/tzutalin/labelImgLk. Accessed 11 Sept 2020 28. Ni, B., Wang, G., Moulin, P.: RGBD-HUDAACT: a color-depth video database for human daily activity recognition. In: Fossati, A., Gall, J., Grabner, H., Ren, X., Konolige, K. (eds.) Consumer Depth Cameras for Computer Vision, 1st edn. Springer, London (2013) 29. Imagenet large scale visual recognition challenge (ILSVRC) (2015). https://www.image-net. org/challenges/LSVRC/2015 30. Wu, X., Sahoo, D., Hoi, S.C.H.: Recent advances in deep learning for object detection. Neurocomputing (2020) 31. Zhou, X., Wang, D., Krahenb, P.: Objects as points (2019) 32. Zhu, C., He, Y., Savvides, M.: Feature selective anchor-free module for single-shot object detection. In: CVPR (2019) 33. Xizhou, S.L., Zhu, H., Dai, J.: Deformable convnets v2: more deformable, better results. In: CVPR (2019) 34. Chen, X., Gupta, A.: Spatial memory for context reasoning in object detection. In: ICCV (2017) 35. Guo, G., Lai, A.: A survey on still image based human action recognition. Pattern Recognit. 47(10), 3343–3361 (2014) 36. Zhilu, Z., Mert, R.S.: Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels. arXiv (2018) 37. Yao, B., Fei-Fei, L.: Recognizing human-object interactions in still images by modeling the mutual context of objects and human poses. IEEE Trans. Pattern Anal. 34(9), 1691–1703 (2012) 38. Desai, C., Ramanan, D.: Detecting actions, poses, and objects with relational phraselets. In: Proceedings of the European Conference on Computer Vision. Springer, Heidelberg (2012) 39. Meng, M., Drira, H., Boonaert, J.: Distances evolution analysis for online and off-line human object interaction recognition. Image Vis. Comput. 70, 32–45 (2018) 40. Meng, M., Drira, H., Daoudi, M., Boonaert, J.: Human object interaction recognition using rate-invariant shape analysis of inter joint distances trajectories. In: Proceedings of the Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 26 June–1 July (2016) 41. Koppula, H.S., Gupta, R., Saxena, A.: Learning human activities and object affordances from RGB-D videos. Int. J. Robot. Res. 32(8), 951–970 (2013) 42. Diederik, P.K., Jimmy, L.B.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations, pp. 1–13 (2015) 43. Bentamou, A., Ballihi, L., Aboutajdine, D.: Automatic learning of articulated skeletons based on mean of 3D joints for efficient action recognition. Int. J. Pattern Recogn. Artif. Intell. 31(04), 1750008 (2017)

A Comparative Study Between the Most Usable Object Detection Methods Based on Deep Convolutional Neural Networks

Ayyoub Fakhari1(B), Mohamed Lazaar1, and Hicham Omara2

1 ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
[email protected], [email protected]
2 Faculty of Science, Abdelmalek Essaadi University, Tetuan, Morocco
[email protected]

Abstract. Object detection is a computer vision technique that has been revolutionized by the rapid development of convolutional neural network architectures. These networks are powerful tools, able to learn and extract more complex, high-level features. They were introduced to deal with the problems of traditional architectures and to find and characterize a large number of objects in an image. This technique covers two types of detection: simple detection, which aims to identify a single object in an image and is a classification problem, and multiple detection, which aims not only to identify all the objects in the image but also to find their locations. This article provides a short summary of the datasets and deep learning algorithms commonly used in object detection. Keywords: CNN · Object detection · Faster R-CNN · SSD · YOLO · RetinaNet

1 Introduction

The history of neural networks begins with the experiments of the neurophysiologist Warren McCulloch and the mathematician Walter Pitts in 1943, who modeled a simple neural network with electrical circuits. The neuron took inputs and, according to their weighted sum, gave a binary output [1]. The main purpose of these networks was to simulate the human brain in order to solve general learning problems realistically. Moreover, following the proposal of the back-propagation algorithm in the 80s, Yann LeCun proposed in [2] an algorithm simplifying the training of neural networks. In the late 90s, Microsoft deployed convolutional neural networks in OCR and handwriting recognition systems. More recently, Google has deployed a supervised convolutional neural network to detect faces and license plates in street images to protect privacy. The major revolution in the application of CNNs took place in 2012, when Alex Krizhevsky et al. applied deep learning to image classification [3]. By 2016, deep learning had become popular, with very advanced results in speech recognition [4]. In this paper, we focus on object recognition and detection, which have long been a big challenge for computer vision. Many different approaches have been adopted to try to overcome it; some of these approaches include mapping visual aspects of an object,
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 867–876, 2021. https://doi.org/10.1007/978-3-030-66840-2_66

such as edges, contours, and colors, with similar occurrences in an image, or using more specific features to do the same. To acquire a complete understanding of an image, we must not only focus on classifying images but also try to accurately estimate the concepts and locations of the objects contained in them. Object detection [5] is a task that generally consists of different subtasks providing valuable information for the semantic understanding of images and videos, and it is used in many applications, such as image classification [3], human behavior analysis [6], facial recognition [7], and autonomous driving [8]. However, due to the large variations in image characteristics such as viewing angles, occlusions, and lighting conditions, it is difficult to achieve perfect object detection and good localization of objects in the image. Much attention has been drawn by researchers to this field in recent years. With the emergence of deep neural networks [3] and region-based convolutional neural networks (R-CNN) [9], which have deeper architectures with the ability to learn more complex features than shallow networks, significant gains have been achieved in this field. Moreover, they allow us to learn object representations without manually designing the features [10]. Since the R-CNN proposal, many improved models have been proposed, including Fast R-CNN, which jointly optimizes the classification and regression tasks of the selection framework [5], Faster R-CNN and RetinaNet, which use an additional subnetwork to generate region proposals [11, 12], and the YOLO and SSD proposals, which perform object detection via fixed grid regression [13, 14]. Throughout this paper, we present a brief review of object detection using convolutional neural networks. Section 2 summarizes some CNN-based algorithms as well as the most useful databases for object detection, in Sect. 3 we compare several results from different papers, and Sect. 4 presents the conclusion.

2 Object Detection: Algorithms and Datasets

Due to the relationship between object detection and image/video understanding, object detection has attracted the attention of researchers in recent years. Most traditional object detection methods are based on manual feature construction and are trained with shallow architectures. With the rapid development of deep learning, more powerful tools are being developed to solve the problems of traditional architectures, capable of learning high-level and deeper semantic features. Deep learning models based on CNNs behave differently depending on the network architecture, the training objective, the optimization function, the feature extractor, etc. The problem of object detection can therefore be defined as determining the position of objects in an image (object localization) and the category to which each object belongs (object classification).

2.1 Neural Network Algorithms

2.1.1 R-CNN
The seminal R-CNN (Regions with Convolutional Neural Networks) paper [15], published in 2014 by R. Girshick et al., changed the general approach to object detection; many later deep learning algorithms for object detection inherited this idea, which makes it the basic deep learning algorithm for object detection. One of the most

important points of this paper is that a CNN is applied to each candidate box to extract a feature vector; the second is a way to effectively train a deeper CNN: supervised pre-training on a large dataset such as ILSVRC, followed by domain-specific fine-tuning on a small dataset such as PASCAL [16]. This algorithm takes an input image, generates around 2000 region proposals, computes the features of each proposal with the help of a CNN, and then classifies each region with a linear SVM [15].

2.1.2 SPP-Net
SPP-Net is an improvement over R-CNN with a faster speed [17]; it proposed a Spatial Pyramid Pooling (SPP) layer to remove the restriction of a fixed network input size. SPP-Net executes the convolution layers only once (on the entire image, regardless of its size), then uses the SPP layer to extract features; compared to R-CNN, it avoids repeated convolutions over the candidate regions, which reduces the convolution time since the same region is not computed multiple times. A large part of the R-CNN time (90%) is consumed by the convolution layers, a problem solved in SPP-Net, which considerably reduces the computation time. SPP-Net not only achieves better results, with a correct estimate of the different region proposals at their corresponding scales, but also improves the efficiency of detection at test time by sharing the computation cost before the SPP layer between the different proposals. On the Pascal VOC 2007 dataset, the SPP-Net convolution stage is 30 to 170 times faster than R-CNN, and the overall speed is 24 to 64 times faster than that of R-CNN.

2.1.3 Fast R-CNN
The Fast R-CNN algorithm improves on the weaknesses of the basic R-CNN and SPP-Net [18]: it reaches a higher mean average precision (mAP) than R-CNN and SPP-Net, uses a multi-task loss function to obtain a single-stage training process, can update all layers during training, and does not need to store any features on disk. Fast R-CNN can accelerate the training of deep neural networks such as VGG16. Compared to R-CNN, the Fast R-CNN training phase is 9 times faster and the test phase is 213 times faster; compared to SPP-Net, Fast R-CNN training is 3 times faster and testing is 10 times faster, and the accuracy also increases somewhat.

2.1.4 Faster R-CNN
The emergence of SPP-Net and Fast R-CNN significantly reduced the running time of the object detection network. However, the time required by the region proposal method is long, and obtaining region proposals became the bottleneck. Faster R-CNN provides a solution to this problem by replacing traditional practices (such as selective search, SS) with a deep network to compute the proposal boxes (the Region Proposal Network, RPN) [11]. It shares the convolutional layers with the object detection network by sharing the convolutions at test time, so that the marginal cost of computing the proposals is low.

The RPN takes an input image and generates a set of rectangular object proposals. It works by passing a sliding window over the CNN feature map; at each window position it obtains k potential bounding boxes together with scores estimating the quality of each box, the k boxes corresponding to k reference anchors with common scales and aspect ratios. This small network takes as input an n × n spatial window of the input convolutional feature map; each sliding window is mapped to a lower-dimensional feature, which is fed into two sibling fully connected layers, a box-regression layer and a box-classification layer [11].

2.1.5 R-FCN
The Region-based Fully Convolutional Network, developed by Dai et al. in 2016 [19], is a fully convolutional network with almost all computation shared over the entire image; it differs from Faster R-CNN only in the RoI subnetwork. In Faster R-CNN, the computations after the RoI pooling layer cannot be shared, and to reduce the cost of these computations, Dai et al. proposed to use all the CONV layers to build a shared RoI subnetwork, with the RoI crops extracted from the last CONV feature layer before prediction. Since object detection requires representations of the location of objects in the image, Dai et al. [19] built a set of position-sensitive score maps by using a bank of specific CONV layers as the FCN output, to which a position-sensitive RoI pooling layer is added, which differs from the standard pooling layer of Fast R-CNN and Faster R-CNN.

2.1.6 YOLO
You Only Look Once is a single-stage object detection method first presented by Redmon et al. [20] in 2015, in which the pixels of the raw image are converted to bounding box coordinates and class probabilities and can be optimized directly end-to-end. This allows the boxes to be predicted directly in a single feed-forward pass, without reusing any components of the neural network or generating proposals of any nature, thus speeding up the detector. The image is divided into an S × S grid and B bounding boxes are predicted per grid cell; each cell containing the center of an object instance is responsible for detecting that object. Each bounding box predicts four coordinates, an objectness score, and class probabilities. This reformulates object detection as a regression problem. To have a receptive field that covers the entire image, a fully connected layer is included in the design towards the end of the network. YOLO's architecture [20] is extremely fast. Fast YOLO [13], a smaller version of the network, processes 155 frames per second while doubling the mAP of other real-time detectors. YOLO learns very general representations of objects; it surpasses other detection methods, such as R-CNN, when generalizing from natural images to other domains.

2.1.7 SSD
W. Liu et al. [14] proposed SSD (Single Shot MultiBox Detector), which responds to YOLO's difficulties in processing small objects in groups, caused by the spatial constraints imposed on the predictions of the selection grids [20]. SSD is based on

the anchor boxes adopted in MultiBox [21], RPN [11], and a multiscale representation [22], all this to generate a set of anchor boxes with different formats, scales, and resolutions. The SSD architecture is based on the venerable VGG-16 architecture, but at the end of the network it replaces the fully connected layers with several feature layers, which allow features to be extracted at multiple scales and gradually reduce the input size of each subsequent layer [22]. It should be noted that SSD can be adapted with better feature extractors, by adding deconvolution layers with skip connections to introduce additional large-scale context [23].

2.1.8 Mask R-CNN
In 2017, He et al. proposed Mask R-CNN [24] to extend the existing branches of Faster R-CNN for classification and bounding box regression by adding a new branch that predicts pixel-by-pixel segmentation masks. In general, Mask R-CNN is based on the same two-stage pipeline as Faster R-CNN: the first stage is the same (RPN), but in the second stage Mask R-CNN adds a new branch that generates a binary mask for each RoI [25], together with a RoIAlign layer whose main role is to preserve the spatial correspondence at the pixel level. The mask branch adds a very high computational load, and its cooperation with other tools gives additional information for object detection; however, the most interesting aspects are that it is easy to implement and flexible for instance recognition.

2.1.9 RetinaNet
T. Lin et al. proposed in their paper "Focal Loss for Dense Object Detection" [12] the RetinaNet network, which is mainly composed of 3 parts: a backbone sub-network, a feature pyramid network (FPN), and detection sub-networks. The backbone network computes a convolutional feature map over the entire input image, one sub-network performs convolutional classification of objects on the output of the backbone network, and the other performs convolutional bounding box regression. RetinaNet uses feature maps of different resolutions, since the resolution and size of the input images vary from one image to another, which makes training faster and less cumbersome [26].

2.2 Datasets
Datasets play an essential role, not only in measuring and comparing the performance of object detectors but also in providing resources for learning object models from examples. In deep learning, these resources play a critical role, as it has been demonstrated that deep convolutional neural networks are designed to learn from a large quantity of data. Table 1 lists the most popular datasets for object detection.

Table 1. Popular databases for object detection.

Dataset name         Total images   Classes   Image size   Highlights
MNIST [27]           70,000         10        28 × 28      MNIST is one of the most popular deep learning datasets out there. It's a dataset of handwritten digits
Fashion MNIST [28]   70,000         10        28 × 28      Fashion-MNIST consists of 60,000 training images and 10,000 test images. It is an MNIST-like fashion product database
MS-COCO [29]         330,000+       91        640 × 480    MS COCO is a large-scale object detection, segmentation, and captioning dataset
ImageNet [30]        14,197,122     21,841    500 × 400    ImageNet is a large dataset of annotated photographs intended for computer vision research
PASCAL VOC [31]      11,540         20        470 × 380    PASCAL VOC provides normalized image datasets for object class recognition and a common set of tools for accessing the datasets and annotations
Open Images [32]     9,011,219      5,000+    Varied       Open Images V4, a dataset of 9.2 M images with unified annotations for image classification, object detection, and visual relationship detection
SVHN [33]            630,420        10        32 × 32      The Street View House Numbers (SVHN) dataset is a real-world image dataset for developing object detection algorithms
CIFAR-10 [34]        60,000         10        32 × 32      Another dataset for image classification; it consists of 60,000 images of 10 classes

3 Experiments and Results

In this section, we present the results of several different articles from different points of view. Table 2 presents a comparative analysis based on the Pascal VOC 2007 test dataset and parameters such as mAP (mean average precision) and FPS (frames per second), Table 3 presents a comparative analysis based on the Pascal VOC 2012 test dataset with the mAP and the accuracy of each detected object class, and Fig. 1 shows a comparison of some detectors on the COCO dataset, since in recent years many results have been measured exclusively with this database.
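For reference, the mAP values in these tables are built on the intersection-over-union (IoU) test between predicted and ground-truth boxes; the short sketch below is our own helper, not taken from any of the compared papers, and only shows that underlying computation.

def iou(box_a, box_b):
    # Intersection over Union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive for AP/mAP when its IoU with a ground-truth
# box of the same class exceeds a threshold (0.5 for Pascal VOC).
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # ~0.14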

Fig. 1. Comparative analysis based on the COCO dataset.
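Since the comparisons below are expressed in mAP, a brief Python sketch of the two quantities behind that score (IoU-based matching of a detection to a ground-truth box, and average precision over a ranked list of detections) is given here; it is an illustrative simplification, not the official PASCAL VOC or COCO evaluation code, and mAP is then simply the mean of the per-class AP values.

def iou(box_a, box_b):
    """Boxes are (x1, y1, x2, y2); returns intersection-over-union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(matches, num_gt):
    """matches: booleans (true positive or not) for detections sorted by score."""
    tp = fp = 0
    precisions, recalls = [], []
    for is_tp in matches:
        tp += is_tp
        fp += not is_tp
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Area under the precision-recall curve (simple rectangle rule).
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap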

These experiments were carried out in different contexts and were not designed to measure the same performance. Nevertheless, we present them together to give an overview of roughly where each detector stands; these results should therefore never be compared directly.

Table 2. Comparative analysis based on the Pascal VOC 2007 test dataset (mAP and FPS).

Detection frameworks | mAP | FPS
Fast R-CNN [18] | 70.0 | 0.5
Faster R-CNN VGG-16 [11] | 73.2 | 7
Faster R-CNN ResNet [35] | 76.4 | 5
R-FCN [19] | 79.5 | 6
YOLO [20] | 63.4 | 45
SSD300 [14] | 74.3 | 46
SSD500 [14] | 76.8 | 19
YOLOv2 544×544 [13] | 78.6 | 40


Table 3. Comparative analysis based on the Pascal VOC 2012 test dataset.

Method | Data | mAP | Aero | Bike | Bird | Boat | Bottle | Bus | Car | Cat | Chair | Cow | Table | Dog
Fast R-CNN [18] | 07++12 | 68.4 | 82.3 | 78.4 | 70.8 | 52.3 | 38.7 | 77.8 | 71.6 | 89.3 | 44.2 | 73.0 | 55.0 | 87.5
Faster R-CNN [11] | 07++12 | 70.4 | 84.9 | 79.8 | 74.3 | 53.9 | 49.8 | 77.5 | 75.9 | 88.5 | 45.6 | 77.1 | 55.3 | 86.9
R-FCN [19] | 07++12 | 77.6 | 86.9 | 83.4 | 81.5 | 63.8 | 62.4 | 81.6 | 81.1 | 93.1 | 58.0 | 83.8 | 60.8 | 92.7
YOLO [20] | 07++12 | 57.9 | 77.0 | 67.2 | 57.7 | 38.3 | 22.7 | 68.3 | 55.9 | 81.4 | 36.2 | 60.8 | 48.5 | 77.2
SSD300 [14] | 07++12 | 72.4 | 85.6 | 80.1 | 70.5 | 57.6 | 46.2 | 79.4 | 76.1 | 89.2 | 53.0 | 77.0 | 60.8 | 87.0
SSD512 [14] | 07++12 | 74.9 | 87.4 | 82.3 | 75.8 | 59.0 | 52.6 | 81.7 | 81.5 | 90.0 | 55.4 | 79.0 | 59.8 | 88.4
YOLOv2 544 [13] | 07++12 | 73.4 | 86.3 | 82.0 | 74.8 | 59.2 | 51.8 | 79.8 | 76.5 | 90.6 | 52.1 | 78.2 | 58.5 | 89.3

4 Conclusion
It is very difficult to make a fair comparison between the different object detectors, because there is no direct answer to the question of which model is best. For real-world applications we make choices that balance accuracy and speed. In addition to the type of detector, we need to be aware of other choices that impact performance, such as the feature extractor, the number of proposals or predictions, the training dataset, the loss function, etc. Worse still, the technology is evolving so rapidly that any comparison quickly becomes obsolete. Still, by comparing the results of the papers, we can conclude that single-shot detectors such as SSD reach a rather impressive frame rate (FPS) using lower-resolution images at the expense of accuracy. These papers try to prove that they can beat the accuracy of region-based detectors; however, this is less conclusive, since higher-resolution images are often used for such claims, so their scenarios change, and different optimization techniques are applied, which makes it difficult to isolate the merit of each model. The design and implementation of single-shot and region-based detectors are now very similar. With some reservations, we can say that region-based detectors such as Faster R-CNN have a small advantage in terms of accuracy when real-time speed is not required, while single-shot detectors are suited to real-time processing; applications nevertheless need to check whether they meet their accuracy requirement. We can conclude that detectors like YOLO and SSD (which perform detection via fixed-grid regression) give results on the Pascal VOC database that are quite impressive


either in frames per second (FPS) or in terms of mAP for low-resolution images. But we can say that detectors such as RetinaNet, Faster R-CNN, and Mask RCNN have a small advantage in terms of accuracy if real-time speed is not required.

References
1. Pitts, W., McCulloch, W.S.: How we know universals; the perception of auditory and visual forms. Bull. Math. Biophys. 9, 127–147 (1947). https://doi.org/10.1007/BF02478291
2. Cun, Y.L., et al.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, vol. 2, January 1990. http://dl.acm.org/citation.cfm?id=109230.109279. Accessed 30 Aug 2019
3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems, vol. 25, January 2012. https://doi.org/10.1145/3065386
4. Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012). https://doi.org/10.1109/msp.2012.2205597
5. Felzenszwalb, P.F., Girshick, R.B., McAllester, D.A., Ramanan, D.: Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627–1645 (2009). https://doi.org/10.1109/tpami.2009.167
6. Cao, Z., Simon, T., Wei, S.-W., Sheikh, Y.: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, April 2017. arXiv:1611.08050. http://arxiv.org/abs/1611.08050. Accessed 10 Nov 2019
7. Yang, Z., Nevatia, R.: A multi-scale cascade fully convolutional network face detector, pp. 633–638, December 2016. https://doi.org/10.1109/icpr.2016.7899705
8. Chen, C., Seff, A., Kornhauser, A., Xiao, J.: DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving, September 2015. arXiv:1505.00256. http://arxiv.org/abs/1505.00256. Accessed 10 Nov 2019
9. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 886–893, June 2005. https://doi.org/10.1109/cvpr.2005.177
10. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
11. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, June 2015. arXiv:1506.01497. http://arxiv.org/abs/1506.01497. Accessed 04 Sep 2019
12. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal Loss for Dense Object Detection, February 2018. arXiv:1708.02002. http://arxiv.org/abs/1708.02002. Accessed 08 Mar 2020
13. Redmon, J., Farhadi, A.: YOLO9000: Better, Faster, Stronger, December 2016. arXiv:1612.08242. http://arxiv.org/abs/1612.08242. Accessed 11 Nov 2019
14. Liu, W., et al.: SSD: Single Shot MultiBox Detector, vol. 9905, pp. 21–37 (2016). arXiv:1512.02325. https://doi.org/10.1007/978-3-319-46448-0_2
15. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation, November 2013. https://arxiv.org/abs/1311.2524v5. Accessed 02 Sep 2019
16. Madal, W., Ijritcc, I.J.: A Survey on Object Recognition Using Deep Neural Networks. https://www.academia.edu/36782329/A_Survey_on_Object_Recognition_Using_Deep_Neural_Networks. Accessed 02 Sep 2019


17. He, K., Zhang, X., Ren, S., Sun, J.: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, vol. 8691, pp. 346–361 (2014). arXiv:1406.4729. https://doi.org/10.1007/978-3-319-10578-9_23
18. Girshick, R.: Fast R-CNN, April 2015. arXiv:1504.08083. http://arxiv.org/abs/1504.08083. Accessed 04 Sep 2019
19. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: Object Detection via Region-based Fully Convolutional Networks, June 2016. arXiv:1605.06409. http://arxiv.org/abs/1605.06409. Accessed 04 Nov 2019
20. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You Only Look Once: Unified, Real-Time Object Detection, June 2015. arXiv:1506.02640. http://arxiv.org/abs/1506.02640. Accessed 05 Sep 2019
21. Erhan, D., Szegedy, C., Toshev, A., Anguelov, D.: Scalable Object Detection using Deep Neural Networks, December 2013. arXiv:1312.2249. http://arxiv.org/abs/1312.2249. Accessed 05 Sep 2019
22. Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., Berg, A.C.: DSSD: Deconvolutional Single Shot Detector, January 2017. arXiv:1701.06659. http://arxiv.org/abs/1701.06659. Accessed 27 Nov 2019
23. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
24. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN, January 2018. arXiv:1703.06870. http://arxiv.org/abs/1703.06870. Accessed 08 Mar 2020
25. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 2017–2025. Curran Associates, Inc. (2015)
26. Anil, K.: Weights & Biases - Object Detection with RetinaNet. https://www.wandb.com/articles/object-detection-with-retinanet. Accessed 08 Mar 2020
27. LeCun, Y., Cortes, C., Burges, C.: MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/. Accessed 01 Dec 2019
28. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, September 2017. arXiv:1708.07747. http://arxiv.org/abs/1708.07747. Accessed 01 Dec 2019
29. Lin, T.-Y., et al.: Microsoft COCO: Common Objects in Context, May 2014. arXiv:1405.0312. http://arxiv.org/abs/1405.0312. Accessed 27 Aug 2019
30. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Li, F.F.: ImageNet: a Large-Scale Hierarchical Image Database, pp. 248–255, June 2009. https://doi.org/10.1109/cvpr.2009.5206848
31. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010). https://doi.org/10.1007/s11263-009-0275-4
32. Kuznetsova, A., et al.: The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale, November 2018. arXiv:1811.00982. http://arxiv.org/abs/1811.00982. Accessed 01 Dec 2019
33. Goodfellow, I.J., Bulatov, Y., Ibarz, J., Arnoud, S., Shet, V.: Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks, April 2014. arXiv:1312.6082. http://arxiv.org/abs/1312.6082. Accessed 01 Dec 2019
34. Krizhevsky, A.: Convolutional Deep Belief Networks on CIFAR-10, May 2012
35. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2015). https://doi.org/10.1109/cvpr.2016.90

Performance Analyses of AES and 3DES Algorithms for Encryption of Satellite Images

Yasin Ortakci(B) and Mohammed Yaseen Abdullah

Karabuk University, Karabuk, Turkey
[email protected], [email protected]

Abstract. As the size of data increases, securing the data has become as important a concern as storing and transferring it. In particular, the sensitive satellite images obtained from earth observation satellites should be transferred to the ground stations while preventing intruders from accessing them. For this purpose, cryptography algorithms are used to encrypt the satellite images, and these images are transferred to the ground station as ciphered images. In this study, AES and 3DES, which are symmetric block encryption algorithms, were used to encrypt three satellite images of different sizes, and the performance of these algorithms was compared using metrics such as histogram, correlation coefficient, NPCR, UACI, PSNR, and computational time. In the simulations, AES and 3DES showed very close results in terms of the histogram, correlation coefficient, NPCR, UACI, and PSNR metrics for all three satellite images. However, it was observed that AES performed better than 3DES in terms of computational time for both encryption and decryption of satellite images, taking roughly half the time of 3DES.

Keywords: Image encryption · Satellite images · AES · 3DES

1 Introduction
In this era, most data transmissions are performed on public networks such as the Internet. Since public networks are vulnerable to intruders, ensuring the privacy and security of data in these mediums has become an important security issue. It is observed that there has been a serious increase in ransomware attacks since 2013 [1]. In addition, storing data confidentially in both local and cloud storage is another security concern. These issues indicate how significant and necessary data encryption is. Many cryptographic techniques have been developed up to today to encrypt/decrypt different types of sensitive data. Images are one of the most sensitive data sources, since the data stored in a small image can be equal to many pages of text data. Thus, secret images must be encrypted to prevent unauthorized users from accessing their content both during storage and during transmission on the network. Image encryption algorithms are exploited to protect the images used in many areas, such as military applications, medical imaging, telecommunication, video conferencing, e-commerce, and multimedia systems, against hackers [2].


Image encryption is different from text encryption due to some characteristics of images, such as containing large chunks of data, strong correlation between pixels, high redundancy, and high computational expense [3]. The fundamental features expected from an image encryption algorithm are resistance to brute force attacks, high confusion and diffusion, low run-time, and low resource consumption [2]. Today, one of the hottest research topics in image encryption is the encryption of satellite images. Satellite images are vital data sources in the fields of national security, military operations, weather forecasting, monitoring of earth resources, geological research, and training. It is insecure to transmit images taken from earth observation satellites directly to ground stations through communication channels, so various encryption algorithms are exploited to preserve the confidentiality of satellite images. Constraints such as limited computing resources and time should be taken into consideration when images are encrypted by on-board computers in satellites. In this study, a performance comparison of two symmetric block cipher algorithms, AES (Advanced Encryption Standard) and 3DES (Triple Data Encryption Standard), is performed on the encryption of satellite images of different sizes in terms of security and speed. The contribution of this study is to reveal the ciphering power of AES and 3DES for satellite images by considering the histogram, correlation coefficient, NPCR, UACI, PSNR, and computational time criteria.

2 Related Work
Cryptography is an information technology that protects data against unauthorized users by encrypting it with mathematical formulations. Encryption is the transformation of sensitive data into ciphered form by encoding it with a key. Decryption, the reverse of encryption, is the process of decoding the ciphered data back to its original form with a key. Cryptographic algorithms are divided into two categories, symmetric and asymmetric, according to their key usage. Symmetric algorithms use the same key, called the secret key or private key, for both encryption and decryption. AES, DES, 3DES, CAST5, IDEA, RC4, RC5, RC6, and Blowfish are widely used symmetric algorithms [1, 2]. In asymmetric algorithms, different keys are used for encryption and decryption; namely, while the sender uses a public key during encryption, the receiver uses a different private key for decryption. Thereby, asymmetric algorithms are safer than symmetric algorithms, but their time complexity is higher [4]. RSA, ECC, DSA, Merkle’s Puzzles, YAK, and Diffie-Hellman are widely used asymmetric algorithms [2]. Much research has been conducted and many encryption methods have been developed for image encryption to date. AES, one of the most common encryption algorithms, has been used in many studies for image encryption [5–8]. These studies indicate that the AES algorithm provides sufficient cipher security and speed for image encryption. In addition, for encrypting large data (e.g., images, video), it was observed that symmetric algorithms such as AES perform better than asymmetric encryption algorithms in terms of time complexity [9]. Furthermore, it was shown in [8] that the speed of an AES image encryption algorithm is higher than that of chaos-based image encryption algorithms. AES image encryption steps were parallelized and the total time of the encryption and decryption processes was almost halved in [10]. The DES encryption


algorithm has been used for the encryption of bitmap and JPEG images in [11]. On the other hand, 3DES, another common encryption algorithm, was modified to be implemented in image encryption [12, 13]. In [12], the image encryption performance of the 3DES and DES algorithms was compared: while DES performed faster encryption, 3DES provided more secure encryption. Chaos-based ciphering is one of the methods commonly used in image encryption recently [14–16], and one of the first studies on this subject [17] encrypted images using two-dimensional chaotic maps. Chaotic methods are both reliable and fast in image encryption, because the randomness of chaos theory is adequate for image encryption and because they can be applied more easily than traditional methods. However, chaotic methods have some shortcomings due to the low number of cycles; therefore, they are generally combined with other algorithms to eliminate this shortcoming [18]. In [19], a chaos-based image encryption scheme was developed, and the results showed that its performance was better than AES in terms of speed and security. RSA, an asymmetric cryptographic algorithm, was tried for image encryption and showed better performance than other traditional algorithms on security criteria [20–22]. A comparative image encryption study of different algorithms including AES and RSA was conducted in [23], and all algorithms except watermarking succeeded in encrypting the image confidentially. Various image encryption schemes have also been used in the transmission of satellite images. While a performance comparison of chaos-based satellite image encryption algorithms was presented in [24] and [25], the performance of different modes of the AES encryption algorithm was measured in [26] for satellite images. In some studies, different encryption techniques for satellite image encryption were combined and more reliable image encryption schemes were obtained. For instance, in [27] a GEFFE generator was integrated into the AES algorithm for the encryption of satellite images. Similarly, a new image encryption scheme, to be used in the transmission of images from satellites to ground stations, was presented by combining AES and chaotic map methods in [28], and this scheme was designed to work on FPGA devices, which have low computing resources. In this study, the performance analysis of two well-known cipher algorithms, AES and 3DES, is presented.

3 AES (Advanced Encryption Standard) Algorithm
AES is a symmetric block cipher algorithm that was published by the National Institute of Standards and Technology in 2000, by adapting the Rijndael algorithm, to eliminate the vulnerabilities of the DES cipher. The size of the data block is fixed to 128 bits in AES. The key size, on the other hand, may be 128, 192, or 256 bits, for which the number of rounds is 10, 12, or 14, respectively [1]. The 128-bit data block is converted to a 4 × 4 byte array called the State at the beginning of the algorithm. Each round of AES includes the Substitute Bytes, Shift Rows, Mix Columns, and Add Round Key steps, which are transformations of the State. Only the Mix Columns transformation is bypassed in the last round. The pseudocode of the AES encryption process is given below, and decryption consists of the inverse transformations of each step of the encryption [29].


INPUT: STATE, KEY
OUTPUT: CIPHER

STATE = AddRoundKey(STATE, KEY[0, …, 3]);
FOR i = 1 to Round {
    STATE = SubstituteBytes(STATE);
    STATE = ShiftRows(STATE);
    IF (i < Round)
        STATE = MixColumns(STATE);
    STATE = AddRoundKey(STATE, KEY[4*i, …, 4*i+3]);
}
CIPHER = STATE
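As a concrete illustration of the encryption pass above, the following minimal Python sketch encrypts and decrypts the raw bytes of an image file with AES; the PyCryptodome library, the ECB mode, the 128-bit key, and the file name are assumptions for illustration, since the paper does not state its exact implementation.

from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

key = get_random_bytes(16)                 # 128-bit secret key
cipher = AES.new(key, AES.MODE_ECB)        # block size is fixed to 128 bits

plain_bytes = open("satellite.pgm", "rb").read()   # hypothetical input file
cipher_bytes = cipher.encrypt(pad(plain_bytes, AES.block_size))

# Decryption reverses the process with the same key.
decipher = AES.new(key, AES.MODE_ECB)
recovered = unpad(decipher.decrypt(cipher_bytes), AES.block_size)
assert recovered == plain_bytes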

4 3DES (Triple Data Encryption Standard) Algorithm
3DES is a symmetric block cipher algorithm that was developed in 1998 to replace the DES algorithm, since DES uses a short key and is not sufficiently resistant against brute force attacks. The block size of 3DES is 64 bits and the key size is 192 bits, three times the key size of DES [1]. Encryption is a combination of three single-DES executions with different keys and basically includes the following steps [30]:

Step 1. Encrypt the data blocks using single DES with key K1.
Step 2. Decrypt the output of step 1 using single DES with key K2.
Step 3. Encrypt the output of step 2 using single DES with key K3.

Decryption is the reverse of encryption and includes the following steps:

Step 1. Decrypt the cipher using single DES with key K3.
Step 2. Encrypt the output of step 1 using single DES with key K2.
Step 3. Decrypt the output of step 2 using single DES with key K1.
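The encrypt-decrypt-encrypt composition described above can be sketched in Python by chaining three single-DES passes with independent keys; PyCryptodome and ECB mode are assumed here for illustration (in practice its DES3 cipher performs the same composition internally).

from Crypto.Cipher import DES
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

k1, k2, k3 = (get_random_bytes(8) for _ in range(3))   # three 64-bit DES keys

def triple_des_encrypt(data):
    step1 = DES.new(k1, DES.MODE_ECB).encrypt(pad(data, DES.block_size))
    step2 = DES.new(k2, DES.MODE_ECB).decrypt(step1)
    return DES.new(k3, DES.MODE_ECB).encrypt(step2)

def triple_des_decrypt(blocks):
    step1 = DES.new(k3, DES.MODE_ECB).decrypt(blocks)
    step2 = DES.new(k2, DES.MODE_ECB).encrypt(step1)
    return unpad(DES.new(k1, DES.MODE_ECB).decrypt(step2), DES.block_size)

message = b"satellite image bytes go here"   # placeholder payload
assert triple_des_decrypt(triple_des_encrypt(message)) == message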

5 Performance Measurement Metrics
A strong image encryption algorithm is expected to provide features such as low correlation, high confusion and diffusion, high resistance to brute force attacks, high key sensitivity, low computational time, and low resource consumption [1]. In this section, the metrics that measure the performance of an image encryption algorithm are explained in line with these expected features.

5.1 Histogram Analysis
Histogram and correlation analyses are indicators of the confusion and diffusion capability of an encryption algorithm [2]. The histogram depicts the frequency


distribution of the pixels’ intensity values of an image [1]. In the histogram analysis, whereas the pixel intensity values generally have a non-uniform distribution in the plain image, they are expected to have a uniform distribution in the ciphered image. Thus, there will be no histogram similarity between the plain image and the ciphered image, and the leakage of data to hackers via the histogram will be prevented.

5.2 Correlation Analysis
The correlation coefficient is a metric that indicates the relation between the pixels of an image (it signifies how strongly the pixels are related to each other). Besides, the correlation coefficient determines the quality of the encryption process [3]. Since hackers can decode some parts of or all of the ciphered image through correlation data, an image encryption algorithm should convert the plain image to a ciphered image whose pixels are uncorrelated and have high randomness, by hiding all features of the plain image [1]. The correlation coefficient values of images are between −1 and 1. If the correlation coefficient of an image is equal to 1, the positive correlation (similarity) between the pixels is maximal in this image. On the other hand, if it is equal to −1, the negative correlation (contrast) between the pixels of this image is maximal. When an image is encrypted, the correlation coefficient of the ciphered image is expected to be close to 0. Thus, the correlation between the pixels of the plain image is removed in the ciphered image. Three correlation coefficients can be computed for an image: horizontal, vertical, and diagonal. Let us denote the grayscale intensity values of an n × n image by the matrix I:

$$I = \begin{bmatrix} I_{1,1} & I_{1,2} & \cdots & I_{1,n} \\ I_{2,1} & I_{2,2} & \cdots & I_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ I_{n,1} & I_{n,2} & \cdots & I_{n,n} \end{bmatrix}$$

While computing horizontal, vertical, and diagonal correlation coefficients, we split I matrix into two sub-matrixes as X and Y as shown in Fig. 1.a, Fig. 1.b, and Fig. 1.c, respectively.

Fig. 1. Splits of I matrix in correlation coefficient calculation

The dimensions of the X and Y sub-matrices are equal and while calculating the correlation coefficient of the I image, the corresponding elements of X and Y sub-matrices

882

Y. Ortakci and M. Y. Abdullah

in Fig. 1 are used for one-to-one comparison. For instance, while calculating the horizontal correlation coefficient, horizontally adjacent pixels $I_{1,1}$ and $I_{1,2}$ are used; while calculating the vertical correlation coefficient, vertically adjacent pixels $I_{1,1}$ and $I_{2,1}$; and while calculating the diagonal correlation coefficient, diagonally adjacent pixels $I_{1,1}$ and $I_{2,2}$, and so on. The correlation coefficient of an image is mathematically calculated as [25]:

$$cc = \frac{\sum_{i=1}^{N}\left(x_i - E(x)\right)\left(y_i - E(y)\right)}{\sqrt{\sum_{i=1}^{N}\left(x_i - E(x)\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(y_i - E(y)\right)^2}} \tag{1}$$

$$E(x) = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{2}$$

$$E(y) = \frac{1}{N}\sum_{i=1}^{N} y_i \tag{3}$$

where $x_i$ and $y_i$ are the intensity values of two adjacent pixels of the image, $x_i$ from the X matrix and $y_i$ the corresponding intensity value from the Y matrix. N is equal to $n \times (n-1)$ for both the horizontal and vertical correlation coefficient calculations and $(n-1) \times (n-1)$ for the diagonal correlation coefficient calculation.

5.3 NPCR (Number of Pixel Change Rate) and UACI (Unified Average Change Intensity)
When a robust encryption algorithm encrypts both the original image and its modified image, which is obtained by making a small change (generally one bit) at one pixel of the original image, the two encrypted images are expected to be significantly different. To measure this difference, the NPCR and UACI metrics are used [3]. Larger NPCR and UACI values imply image encryption that is more resistant against differential attacks [1]. NPCR is the percentage of differing pixels between the two encrypted images relative to the total number of pixels. Let C1 and C2 denote the encrypted images of the original image and its one-bit changed version, respectively. NPCR can be calculated as [1]:

$$NPCR = \frac{\sum_{i=1}^{H}\sum_{j=1}^{W} D(i,j)}{W \times H} \times 100 \tag{4}$$

$$D(i,j) = \begin{cases} 0, & \text{if } C1(i,j) = C2(i,j) \\ 1, & \text{if } C1(i,j) \neq C2(i,j) \end{cases} \tag{5}$$

The dimensions of C1 and C2 are equal, and W and H are the width and height of the encrypted images, respectively. UACI is the average intensity of the differences between the pixels of the two encrypted images. UACI can be calculated as [4]:

$$UACI = \frac{1}{W \times H}\sum_{i,j}\frac{\left|C1(i,j) - C2(i,j)\right|}{255} \times 100 \tag{6}$$


5.4 PSNR Analysis
PSNR is an indicator of encryption quality and shows the change of pixels between the plain image and the encrypted image. In the PSNR analysis, the original plain image and the ciphered image are treated as signal and noise, respectively. Lower PSNR values indicate higher encryption quality [1]. The PSNR value can be calculated as [4]:

$$PSNR = 10 \times \log_{10}\left(\frac{MAX^2}{MSE}\right) \tag{7}$$

$$MSE = \frac{1}{n \times m}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(P(i,j) - C(i,j)\right)^2 \tag{8}$$

where MAX is the maximum value that can be taken by a pixel (if each pixel holds eight bits, MAX is equal to 255), P is the original plain image, C is the ciphered image, and n, m are the width and height of the images.

5.5 Computational Time Analysis
The encryption and decryption times of images are important criteria for evaluating an image encryption algorithm. They are also important indicators of whether an algorithm can be used for real-time image encryption, especially in on-board satellite computers whose computing resources are limited. These run-times depend on the characteristics of the encryption algorithm used, as well as on the resources of the computer, such as CPU, RAM, and disk.
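As a minimal illustration, the following NumPy sketch computes the metrics defined in Eqs. (1)–(8) for 8-bit grayscale images stored as 2-D uint8 arrays; it is a simplified reference, not the authors' simulation code.

import numpy as np

def correlation(img, direction="horizontal"):
    a = img.astype(np.float64)
    if direction == "horizontal":
        x, y = a[:, :-1], a[:, 1:]
    elif direction == "vertical":
        x, y = a[:-1, :], a[1:, :]
    else:                                   # diagonal
        x, y = a[:-1, :-1], a[1:, 1:]
    x, y = x.ravel(), y.ravel()
    num = np.sum((x - x.mean()) * (y - y.mean()))
    den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
    return num / den                        # Eq. (1)

def npcr_uaci(c1, c2):
    diff = c1 != c2
    npcr = diff.mean() * 100                                              # Eq. (4)
    uaci = (np.abs(c1.astype(int) - c2.astype(int)) / 255).mean() * 100   # Eq. (6)
    return npcr, uaci

def psnr(plain, ciphered):
    mse = np.mean((plain.astype(np.float64) - ciphered.astype(np.float64)) ** 2)  # Eq. (8)
    return 10 * np.log10(255 ** 2 / mse)                                  # Eq. (7)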

6 Experimental Results
Three different grayscale satellite images, whose sizes are 256 × 256 (small image), 512 × 512 (medium image), and 1024 × 1024 (large image) pixels, were used in this study. They were encrypted with both the AES and 3DES algorithms, and the image encryption and decryption performance of these algorithms was measured in the simulations. The satellite images were taken from the online Google Earth application. All simulations were coded in the Python programming language and were tested on a computer whose specifications are given in Table 1.

Table 1. The specifications of the test computer.

Processor | 2.70 GHz Intel® Core™ i7-4800MQ
Main memory | 16.0 GB
Hard disc | 1 TB SSD disc
Operating system | 64-bit Windows 10
Simulation platform | Python


Table 2. Visual forms and sizes of the original, encrypted, and decrypted images (images omitted; sizes in bytes).

Process | Image | Original image | AES | 3DES
Encryption | Small image (256 × 256) | 14,000 | 36,764 | 36,717
Encryption | Medium image (512 × 512) | 49,182 | 145,987 | 145,861
Encryption | Large image (1024 × 1024) | 148,873 | 583,018 | 582,987
Decryption | Small image (256 × 256) | 14,000 | 14,000 | 14,000
Decryption | Medium image (512 × 512) | 49,182 | 49,171 | 49,171
Decryption | Large image (1024 × 1024) | 148,873 | 148,873 | 148,873

Table 2 shows the three plain satellite images with their encrypted and decrypted forms; in addition, their sizes are given in bytes. The size differences among the plain, encrypted, and decrypted images are similar for both the AES and 3DES algorithms. According to the visual assessment, both AES and 3DES succeeded in confidentially encrypting all images by hiding all details and in decrypting them back to their original form. However, visual assessment is not a sufficient criterion to measure the performance of encryption algorithms; some quantitative analysis should be done as well. Therefore,


the histogram, correlation coefficient, NPCR, UACI, PSNR, and computational time criteria are calculated to measure the image encryption performance of both the AES and 3DES algorithms in the simulation.

6.1 Histogram Analysis
Table 3 depicts the histograms of the original, encrypted, and decrypted images of three different sizes for the AES and 3DES algorithms. Both the AES and 3DES algorithms succeed in encrypting the images by distributing the intensity values of the pixels uniformly in the histograms and hiding all the statistical data of the original images. Since the statistical relation between the original image and the encrypted image is eliminated, the encrypted image becomes very resistant to statistical attacks by intruders.

Table 3. Histogram graphics (histograms of the original, AES, and 3DES images for the small, medium, and large images, for both encryption and decryption; plots omitted).

6.2 Correlation Analysis
The horizontal, vertical, and diagonal correlation coefficients of the original and encrypted images are listed in Table 4 and Table 5, respectively. Since the pixels of the


original images have a tight relationship with each other, the horizontal, vertical, and diagonal correlation coefficients of the original images are above 0.93. As seen in Table 5, when these images are encrypted with AES and 3DES, all correlation coefficients are very close to 0. These results show that both methods have very similar performance in terms of correlation analysis, and both have high confusion and diffusion capability.

Table 4. Correlation coefficients of the original images.

Image | Horizontal | Vertical | Diagonal
Small | 0.9665 | 0.9568 | 0.9323
Medium | 0.9704 | 0.9677 | 0.9486
Large | 0.9749 | 0.9565 | 0.9771

Table 5. Correlation coefficients of the encrypted images.

Image | Horizontal (AES) | Horizontal (3DES) | Vertical (AES) | Vertical (3DES) | Diagonal (AES) | Diagonal (3DES)
Small | −0.0053 | −0.0069 | −0.0068 | −0.0107 | 0.0044 | 0.0049
Medium | −0.0052 | −0.0032 | −0.0084 | −0.0063 | 0.0035 | 0.0007
Large | −0.0048 | −0.0059 | −0.0064 | −0.0050 | −0.0005 | −0.0015

6.3 NPCR and UACI Analysis
In Table 6, the NPCR and UACI values of both the AES and 3DES algorithms are given for all images. Larger NPCR and UACI values mean that the image is encrypted more robustly and confidentially. Table 6 shows that the ciphered modified image, which is obtained by making a small change in one pixel of the original image, and the ciphered original image are significantly different from each other. Both the AES and 3DES algorithms generate NPCR and UACI values greater than 99% and 33% in the simulation, respectively. As a result, the AES and 3DES algorithms produce very close NPCR and UACI results, which shows that these two image encryption algorithms are highly resistant to differential attacks.


Table 6. NPCR and UACI values.

Image | NPCR of AES | NPCR of 3DES | UACI of AES | UACI of 3DES
Small | 99.5605 | 99.5574 | 33.5653 | 33.473
Medium | 99.5876 | 99.5754 | 33.5720 | 33.578
Large | 99.5636 | 99.5676 | 33.6263 | 33.594

6.4 PSNR Analysis
Table 7 lists the PSNR values of the AES and 3DES algorithms for the three satellite images. Lower values of PSNR indicate better image encryption. Neither AES nor 3DES outperforms the other in the PSNR analysis either; their PSNR values are very close, as are the correlation coefficient, NPCR, and UACI values. On the other hand, both have sufficient capacity to hide the image details against attacks by intruders.

Table 7. PSNR values.

Image | PSNR value of AES | PSNR value of 3DES
Small | 8.5341 | 8.4759
Medium | 9.0300 | 9.0367
Large | 9.253 | 9.2475

6.5 Computational Time Analysis
The average computational times of the AES and 3DES algorithms for both encryption and decryption are shown in Table 8. Each algorithm was run ten times on each of the three satellite images to calculate the average time. As seen in Table 8, the computational times of AES for all images are lower than those of 3DES for both encryption and decryption. For the small image, AES is roughly three times faster than 3DES; for the medium image, nearly twice as fast. For the large image, the decryption time of AES is about half that of 3DES, and the encryption time of AES is nearly 40% lower than that of 3DES.


Table 8. Computational times.

Image | Encryption time of AES | Encryption time of 3DES | Decryption time of AES | Decryption time of 3DES
Small | 0.003855 | 0.010784 | 0.005934 | 0.018124
Medium | 0.012440 | 0.024792 | 0.022422 | 0.042852
Large | 0.051465 | 0.083043 | 0.078504 | 0.143325

7 Discussion
AES and 3DES produced very similar results for the ciphering of satellite images in terms of the histogram, correlation coefficient, NPCR, UACI, and PSNR criteria. Besides, the size of the images does not change the results. These results show that the diffusion and confusion capabilities and the resistance to brute force attacks of these two algorithms are almost equal for satellite images; thus, their security performance is very close. In contrast, when the two algorithms are compared in terms of computational time, it is observed that AES is faster than 3DES. This result shows that AES is more suitable than 3DES for real-time satellite image encryption in on-board satellite computers.

8 Conclusion
In this study, a detailed performance analysis of the AES and 3DES algorithms, which are among the most widely used cryptographic algorithms, has been made for the encryption of satellite images in terms of security and speed. Three satellite images of different sizes have been used in the analyses. In terms of the security-related parameters, the two algorithms have obtained very close results. For the speed parameter, the AES algorithm has achieved superiority over 3DES in both encryption and decryption operations. As a result, AES runs faster than 3DES, and using AES is recommended in cipher operations where the time parameter is significant. As a future plan, since traditional encryption techniques are thought to be insufficient for encrypting a large number of satellite images, utilizing big data technologies may increase efficiency in terms of time. It will be an interesting research field to adapt big data techniques to cryptography algorithms to encrypt the chunks of big satellite images.

Acknowledgment. We would like to thank the BAP Unit of Karabuk University for their financial support of this research.

References
1. Kumari, M., Gupta, S., Sardana, P.: A survey of image encryption algorithms. 3D Research 8(4), 37 (2017)


2. Mohammad, O.F., et al.: A survey and analysis of the image encryption methods. Int. J. Appl. Eng. Res. 12(23), 13265–13280 (2017)
3. Ahmad, J., Ahmed, F.: Efficiency analysis and security evaluation of image encryption schemes. Computing 23, 25 (2010)
4. Jain, Y., et al.: Image encryption schemes: a complete survey. Int. J. Signal Process. Image Process. Pattern Recogn. 9(7), 157–192 (2016)
5. Kalubandi, V.K.P., et al.: A novel image encryption algorithm using AES and visual cryptography. In: 2016 2nd International Conference on Next Generation Computing Technologies (NGCT). IEEE (2016)
6. Subramanyan, B., Chhabria, V.M., Babu, T.S.: Image encryption based on AES key expansion. In: 2011 Second International Conference on Emerging Applications of Information Technology. IEEE (2011)
7. Zhang, Q., Ding, Q.: Digital image encryption based on advanced encryption standard (AES). In: 2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC). IEEE (2015)
8. Zhang, Y., Li, X., Hou, W.: A fast image encryption scheme based on AES. In: 2017 2nd International Conference on Image, Vision and Computing (ICIVC). IEEE (2017)
9. Karthigaikumar, P., Rasheed, S.: Simulation of image encryption using AES algorithm. In: IJCA Special Issue on “Computational Science-New Dimensions & Perspectives” NCCSE, pp. 166–172 (2011)
10. Raghu, M.E., Ravishankar, K.C.: Encryption and decryption of an image data – a parallel approach. Int. J. Eng. Technol. 7(3.34), 674–677 (2018)
11. Ziedan, I.E., Fouad, M.M., Salem, D.H.: Application of data encryption standard to bitmap and JPEG images. In: Proceedings of the Twentieth National Radio Science Conference (NRSC 2003) (IEEE Cat. No. 03EX665). IEEE (2003)
12. Silva-García, V., et al.: Image encryption based on the modified triple-DES cryptosystem. In: International Mathematical Forum. Citeseer (2012)
13. Silva-García, V., Flores-Carapia, R., Rentería-Márquez, C.: Triple-DES block of 96 bits: an application to colour image encryption. Appl. Math. Sci. 7(21–24), 1143–1155 (2013)
14. Li, C., et al.: Cryptanalysis of a chaotic image encryption algorithm based on information entropy. IEEE Access 6, 75834–75842 (2018)
15. Muhammad, Z.M.Z., Özkaynak, F.: An image encryption algorithm based on chaotic selection of robust cryptographic primitives. IEEE Access 8, 56581–56589 (2020)
16. Özkaynak, F.: Brief review on application of nonlinear dynamics in image encryption. Nonlinear Dyn. 92(2), 305–313 (2018)
17. Fridrich, J.: Image encryption based on chaotic maps. In: 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation. IEEE (1997)
18. Yun-Peng, Z., et al.: Digital image encryption algorithm based on chaos and improved DES. In: 2009 IEEE International Conference on Systems, Man and Cybernetics. IEEE (2009)
19. Asim, M., Jeoti, V.: On image encryption: comparison between AES and a novel chaotic encryption scheme. In: 2007 International Conference on Signal Processing, Communications and Networking. IEEE (2007)
20. Chepuri, S.: An RGB image encryption using RSA algorithm. Int. J. Curr. Trends Eng. Res. (IJCTER) 3(3), 1–7 (2017)
21. El-Deen, A., El-Badawy, E., Gobran, S.: Digital image encryption based on RSA algorithm. J. Electron. Commun. Eng. 9(1), 69–73 (2014)
22. Sunita: Image encryption/decryption using RSA algorithm. Int. J. Comput. Sci. Mobile Appl. 5(5), 1–14 (2017)
23. Ray, A., et al.: Comparative study of AES, RSA, genetic, affine transform with XOR operation, and watermarking for image encryption. In: 2017 International Conference on Recent Innovations in Signal Processing and Embedded Systems (RISE). IEEE (2017)


24. Usama, M., Khan, M.K.: Classical and chaotic encryption techniques for the security of satellite images. In: 2008 International Symposium on Biometrics and Security Technologies. IEEE (2008)
25. Ahmad, M., Farooq, O.: Secure satellite images transmission scheme based on chaos and discrete wavelet transform. In: International Conference on High Performance Architecture and Grid Computing. Springer (2011)
26. Banu, R., Vladimirova, T.: Investigation of fault propagation in encryption of satellite images using the AES algorithm. In: MILCOM 2006 IEEE Military Communications Conference. IEEE (2006)
27. Bensikaddour, E.-H., Bentoutou, Y., Taleb, N.: Satellite image encryption method based on AES-CTR algorithm and GEFFE generator. In: 2017 8th International Conference on Recent Advances in Space Technologies (RAST). IEEE (2017)
28. Bentoutou, Y., et al.: An improved image encryption algorithm for satellite applications. Adv. Space Res. 66(1), 176–192 (2020)
29. Wadday, G., Salim, M.A., Abdullah, H.J.M.A.A.: Study of WiMAX based communication channel effects on the ciphered image using MAES algorithm. Int. J. Appl. Eng. Res. 13(8), 6009–6018 (2018)
30. Rahmad, C., et al.: Noble method for data hiding using steganography discrete wavelet transformation and cryptography triple data encryption standard: DES. Int. J. Adv. Comput. Sci. Appl. 9(11), 261–266 (2018)

A Survey on Deep Learning-Based Approaches to Estimation of 3D Human Pose and Shape from Images for the Smart Environments

Sh. Maleki Arasi1 and E. Seyedkazemi Ardebili2(B)

1 Department of Electrical Engineering, Sahand University of Technology, Tabriz, Iran
[email protected]
2 Department of Computer Engineering, Kocaeli University, 41001 İzmit, Kocaeli, Turkey
[email protected]

Abstract. Nowadays, on the one hand, the growth of smart equipment and smart environments, the increasing attention to intelligence, and the need to know more about the pose and shape of humans in these smart environments, and, on the other hand, the increasing integration of the virtual world with the real world, mean that a proper representation of humans in the virtual world has become very important. Hence, the analysis of humans in images has become essential. However, this task goes beyond estimating a two-dimensional pose for one or several persons, and also beyond estimating a simple three-dimensional skeleton. The estimation of 3D human pose and shape from images has received special attention due to its various real-world applications. After studying and reviewing the papers in this field, it can be concluded that the existing approaches can broadly be grouped into two main families: optimization-based approaches and deep learning-based approaches, where the deep learning-based approaches follow two styles, parametric and non-parametric. Optimization-based approaches provide the most reliable solutions for obtaining these three-dimensional estimates; however, they are slow, sensitive to initialization, and often fail due to weak local minima. Therefore, the focus is on deep learning approaches that regress pose and shape directly from images. Their drawbacks are that they require a lot of training data, are time-consuming, and produce low-resolution 3D predictions. After reviewing both families of approaches, it is found that both are challenged to obtain acceptable results. In this paper, we mention the optimization-based approaches in general, and our main focus is on the deep learning-based approaches.

Keywords: Deep learning · Convolutional Neural Network (ConvNet) · Skinned Multi-Person Linear model (SMPL) · Active Shape Model (ASM) · Neural Body Fitting (NBF)



1 Introduction
Smartening is one of the most attractive topics that technology manufacturers and technology customers have shown interest in over recent years. The concept of smartening can be found in personal vehicles, smart cars, smart homes, smart buildings, and even smart cities. For a more detailed and perhaps more specialized examination of smartening, consider an environment such as a home, a hospital, or an office with a smart management (or decision) center. Studies and efforts to increase the smartening of such environments conclude that one of the basic needs is to increase the ability of the management center to communicate effectively with the environment and with the elements within it.

Table 1. The main approaches for the estimation of 3D human pose and shape.

Given that the most important element of any environment is the human, recognizing human poses and shapes ever more accurately leads to a better understanding of the environment and, as a result, in many cases to increased smart decision-making power. Therefore, with the increasing integration of virtual environments with humans as the most important element in the scene, and to strengthen the relationship between


virtual environments and everyday life, one of the most basic and challenging tasks is the perception and analysis of humans from available two-dimensional images. The estimation of 2D and 3D human pose and shape (mesh) from images is carried out for this purpose of understanding and analysis. Owing to the comprehensiveness of the outputs of these estimates, they can be used immediately for animation, correction, measurement, manipulation, and reloading; however, their most important application is in the smartening of environments and buildings that deal with humans in some way. In general, the proposed approaches for estimating 3D human pose and shape from images are categorized as optimization-based and deep learning-based, as shown in Table 1. Each of these approaches has advantages and disadvantages, and depending on the trade-off between speed and accuracy, one of the methods from these two families is used. All our efforts are aimed at introducing the best methods for reconstructing and estimating human pose and shape from in-the-wild images so that they can be used to build smarter environments. Therefore, in this article, by reviewing the approaches and summarizing the available methods, we try to present them for those who want to choose an approach and method according to their datasets and environment. Our main aim is to present the deep learning-based approaches, so the optimization-based approaches are mentioned only in general.

2 Optimization-Based Approaches
In optimization approaches [12], the best answer (according to a set of criteria) is selected from a set of possible answers for a specific problem; the goal is to minimize or maximize a real-valued function. In general, the term optimization refers to a process that aims to find the best values of one (or more) functions over a defined domain, that is, to find the best answer from a set of candidate answers. Methods that use an iterative optimization scheme to update parameters locally are sensitive to the initial values. The work of Zhou et al. in 2015 [12] is a convex relaxation approach that estimates 3D shape from 2D landmarks. This method uses an augmented shape-space model in which a shape is represented as a linear combination of rotated basis shapes, giving a linear representation of both intrinsic shape deformation and external viewpoint changes. A convex relaxation of the orthogonality constraint converts the entire problem into a spectral-norm-regularized linear inverse problem, which is a convex program, so the relaxation provides an efficient algorithm with a globally optimal solution. Another method in this family is an improved method for 3D shape estimation using the Active Shape Model (ASM) [20], presented by Hoang et al. in 2017 [24]. This work uses the active shape model to estimate 3D poses and shapes as a linear combination of predefined basis shapes fitted to the 2D input landmarks; it improves execution time and output accuracy by categorizing the data into sub-spaces. A sketch of the underlying linear shape-basis fit is given below.
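As a rough illustration of the linear shape-basis idea shared by these methods, the following Python sketch fits basis coefficients to 2D landmarks by ordinary least squares under a known orthographic camera rotation; this simplification (the cited methods estimate the rotation jointly and add further constraints) is an assumption made here for clarity.

import numpy as np

def fit_shape_coefficients(W, bases, R):
    """W: 2 x P observed landmarks, bases: list of K (3 x P) basis shapes,
    R: 2 x 3 projection of the known camera rotation (orthographic)."""
    A = np.stack([(R @ B).ravel() for B in bases], axis=1)   # (2P) x K design matrix
    c, *_ = np.linalg.lstsq(A, W.ravel(), rcond=None)        # basis coefficients
    S = sum(ck * B for ck, B in zip(c, bases))               # reconstructed 3-D shape
    return c, S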


3 Deep Learning-Based Approaches
Over the past decade, deep learning and computer vision have been among the most interesting areas of research in artificial intelligence and machine learning. Therefore, it is natural for researchers in these two fields to pay more attention to the use of deep learning models in computer vision, and the field of computer vision is consequently shifting from statistical methods to deep learning. There are still challenging problems in computer vision; nevertheless, deep learning methods are achieving state-of-the-art results on several specific problems. One of the most important deep learning models for computer vision and its related applications is the convolutional neural network (CNN), whose application has yielded state-of-the-art results. In this section, deep learning-based methods, which follow both parametric and non-parametric styles, and their significant impact on the results are described.

3.1 Parametric Approaches
A parametric model records all of its information about the observed data in its parameters; that is, to predict future data from the current state of the model, only its parameters are needed. For example, linear regression with one variable has two parameters, and if these two parameters are available a new value can be predicted. A non-parametric model, on the other hand, can capture more subtle aspects of the data. Parametric approaches, also considered “traditional”, require a number of hypotheses; they include linear regression, logistic regression, linear discriminant analysis, and so on.

3.1.1 End-to-End Recovery of Human Shape and Pose
This is an end-to-end framework for the full 3D Human Mesh Recovery (HMR) of the human body from a single RGB image [1]. The work parameterizes the 3D mesh by shape and joint angles, and its main objective is to minimize the reprojection error of the key-points. This approach makes it possible to train the model with real images that only include 2D ground truth annotations; as a result, it eliminates the need for costly 3D ground truth. It also uses adversarial training to check whether the produced parameters are realistic: to evaluate the produced meshes, the adversarial network relies on a database of 3D meshes of human bodies with various shapes and poses. These meshes do not need corresponding images, so this data is referred to as unpaired. In this network, the Skinned Multi-Person Linear (SMPL) model [11] is used, which parameterizes the body mesh by 3D joint angles (pose) and a low-dimensional linear shape space (shape). The full 3D mesh of the human body is reconstructed directly with a feed-forward pass over a single RGB image. In this method, convolutional features of the image are sent to the iterative 3D regression module, whose objective is to infer the 3D human body and the camera such that the body's 3D joints project onto the annotated 2D joints. The inferred parameters are also


sent to an adversarial discriminator network [11] whose task is to determine whether the 3D parameters correspond to real meshes from the unpaired data [7]. This encourages the network to output 3D human bodies that lie on the manifold of human bodies and acts as a weak supervision for in-the-wild images without ground truth 3D annotations. Due to the rich representation of the 3D mesh model, this data-driven prior can capture joint angle limits and anthropometric constraints (e.g. height, weight, bone ratios), and it subsumes the geometric priors used by models that only predict 3D joint locations. More concretely, the shape and pose resulting from the SMPL [12] decomposition are treated separately, and a discriminator is trained for shape and for pose independently. The pose is based on a kinematic tree, so the pose discriminators are further decomposed and one is trained for each joint rotation. An overview of this framework is shown in Fig. 1. HMR can be trained with or without any paired 2D-to-3D supervision: during training, all images have 2D ground truth joint annotations, and in some cases 3D annotations are also considered. When ground truth 3D information is available, it is used as an intermediate loss. The overview of the proposed framework:

Fig. 1. An image I is passed through a convolutional encoder. Then it sent to an iterative 3D regression module that infers the latent 3D representation of the human that minimizes the joint reprojection error. The 3D parameters also are sent to the discriminator D, whose goal is to tell us if these parameters come from a real human shape and pose or not [1].
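To make the reprojection objective in Fig. 1 concrete, the following minimal Python sketch evaluates a weak-perspective reprojection error between predicted 3D joints and annotated 2D joints; the weak-perspective camera and the L1 form are assumptions consistent with the description above, not the authors' exact code.

import numpy as np

def reprojection_loss(joints_3d, joints_2d, vis, scale, trans):
    """joints_3d: J x 3, joints_2d: J x 2, vis: J (0/1), scale: float, trans: (2,)."""
    projected = scale * joints_3d[:, :2] + trans          # weak-perspective projection
    residual = np.abs(projected - joints_2d).sum(axis=1)  # L1 error per joint
    return (vis * residual).sum() / max(vis.sum(), 1)     # average over visible joints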

In this approach, a model without any paired 3D supervision is also trained; almost all other methods rely on direct 3D supervision and cannot train without it. Given this challenging learning setting, the results of this method are very competitive.

3.1.2 Learning to Estimate 3D Human Pose and Shape from a Single Color Image
Conv-Net approaches have not worked well for this estimation task due to the lack of training data and their low-resolution 3D predictions. Therefore, iterative optimization approaches have prevailed, despite their high execution time and their common failures due to local minima.


The new solution directly predicts pose and shape from the color image, and it aims to bridge this gap and provide an effective Conv-Net-based approach. The method is a two-step approach whose main component is the combination of the SMPL statistical body shape model with an end-to-end framework. The advantages of this method are that fully detailed 3D estimates require only a small number of parameters, and that direct network prediction becomes possible, accurate, and therefore easier using only 2D key-points and silhouettes. As a result, the limiting assumption of a lack of natural images with 3D ground truth for training is weakened: while parametric model samples are used to teach 2D-to-3D inference, the available 2D image annotations can be used to teach 2D inference. One of the important advantages of using this parametric model is that its structure allows the use of a 3D loss at each vertex of the estimated 3D mesh during training, optimizing directly for the surface. This loss correlates better with the 3D per-vertex error typically used for evaluation and improves training compared to parameter regression. Finally, a differentiable renderer is used to project the generated 3D mesh back to the 2D image, which makes end-to-end refinement possible. A schematic framework of this method is shown in Fig. 2.

Fig. 2. (1) An initial Conv-Net predicts heat-maps, and 2D masks using 2D pose data to train. (2) The two networks estimate the parameters of the SMPL statistical model using examples of parametric models for training. (3) The framework can be adjusted end-to-end without the need for 3D grand truth images [11].

Instead of using two Conv-Nets, a single Conv-Net is trained as Human2D, which follows a stacked hourglass design [2] (using two hourglasses) that trades off well between accuracy and execution time. It has two outputs, one for key-points and the other for the silhouette: the key-point output takes the form of heat-maps, and the silhouette output has two channels (body and background) trained with a pixel-wise binary cross-entropy. The second step of the work requires estimating the pose and 3D shape of the whole body from these key


points and 2D silhouettes. This mapping can also be learned from data, for which two components of the network are trained: 1) Pose-Prior, whose inputs are the 2D key-point locations together with the confidence of the detections (realised by the maximum value of each heat-map) and whose outputs are estimates of the 72 pose coefficients θ; and 2) Shape-Prior, whose input is the silhouette and whose outputs are estimates of the 10 shape coefficients β. This creates a modular pipeline (i.e. the Pose-Prior can be updated without retraining the entire network). The nature of these inputs and outputs allows a large amount of training data to be generated by producing SMPL model samples with different 3D poses and shapes.

3.1.3 Neural Body Fitting: Unifying Deep Learning and Model-Based Human Pose and Shape Estimation
This model-based method estimates the parameters of a statistical body model from a single color image. Traditional model-based approaches usually have an objective function that measures the fit between the model and the image observations; they do not require 3D training data but must be initialized. Feed-forward models such as CNNs, which directly predict key-points, do not require initialization, but images with 3D pose annotations must be available. The CNN architecture here provides a link that takes advantage of both strategies and requires neither initialization nor large amounts of 3D training data. The Neural Body Fitting (NBF) approach [13] integrates a statistical body model [12] within a CNN, leveraging reliable bottom-up semantic body part segmentation and robust top-down body model constraints. This work takes several principled steps towards a full integration of parametric 3D human pose models into deep CNN architectures and uses a region-based 2D representation, namely a 12-body-part segmentation, as an intermediate step prior to the mapping to 3D shape and pose. This segmentation provides full spatial coverage of a person, as opposed to the commonly used sparse set of key-points, while also retaining enough information about the arrangement of parts to allow for effective lifting to 3D. NBF is a hybrid architecture that integrates a human body model into a deep learning architecture (CNN) and uses body part segmentation as an intermediate representation. From a color image or a semantic segmentation, it directly predicts the model parameters; these parameters are fed to the flexible and realistic SMPL body model to produce a 3D mesh, and then, to evaluate the cost function in 2D space, the 3D joints are projected onto the 2D image. NBF therefore accepts both full 3D supervision (in the model parameter or 3D Euclidean space) and weak 2D supervision (if images with only 2D annotations are available).

The goal is to build a simple processing pipeline whose components can be optimized in isolation, in order to reduce the number of hyperparameters and interactions and to train the components of the model sequentially. The architecture has two main steps: 1) segment the body parts from the color image (the input crop (512 × 512) is processed by a RefineNet model (based on ResNet-101) that produces a part segmentation; this segmentation is color-coded, resized to 224 × 224 and passed to the second step as an RGB image); 2) use this segmentation to predict the low-dimensional parameters of the mesh (this step itself consists of two parts: a regression network (ResNet-50) whose 226-dimensional output encodes the SMPL parameters (shape and pose), and a set of non-trainable layers that implement the SMPL model and an image projection). NBF predicts the parameters of the body model from a color-coded part segmentation map I using a CNN-based predictor parameterized by weights w. The SMPL model and a simple 2D projection layer are integrated into the CNN estimator. Depending on the kind of supervision used for training, the output is a 3D mesh, 3D skeleton joint locations or 2D joints. This flexible implementation allows experimenting with 3D losses for only part of the data, moving towards a weakly supervised training scenario that avoids expensive 3D labeled data. With 3D information for only 20% of the training data, the authors reach performance similar to that obtained with full 3D annotations. This encouraging result is an important finding for the design of future datasets and for the development of 3D prediction methods that do not require expensive 3D annotations for training.

3.1.4 3D Human Pose Estimation Using a Cascade of Multiple Neural Networks
This work proposes a method called cascade of multiple neural networks (CMNN) [24], which operates in two steps: 1) create an initial estimated 3D shape using the method of Zhou et al. [28] with a small number of basis shapes; 2) make this initial shape more similar to the real shape by using the CMNN. Compared to existing works, the proposed method shows a significant improvement in both accuracy and processing time. In this method, the 3D-to-2D fitting is performed following the ASM (active shape model) approach. The CMNN is used to estimate 3D shapes as follows: first, the initial estimated 3D shape is created from the input 2D shape using the method of Zhou et al.; then, this shape is refined by the CMNN [25] to make it more similar to the real 3D shape. The method uses Zhou et al.'s approach with a small number of predefined basis shapes to ensure that it can be used in real-time applications. Network input: the (x, y) coordinates of the 2D input shape X and the z coordinates of the current estimate Ŝ(t−1). Network output: an update vector used to generate the z coordinates of Ŝ(t). The structure of the cascade is shown in Fig. 3.

The cascade consists of T stages C = {C1, . . ., CT}. Each stage Ct includes L neural networks. Each neural network predicts an update vector for q ∈ [1, P] joints and consists of one input layer with 3 × P nodes, two hidden layers each containing 20 nodes, and an output layer with q nodes. The update vector at stage t is the combination of the outputs of all the neural networks in that stage.
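
The following is a hedged sketch of one cascade stage in this spirit (our illustration, not the authors' implementation); the number of joints, the joint subsets handled by each of the L networks, and the number of stages are made-up assumptions.

```python
# Hedged sketch of a CMNN-style cascade stage: each stage holds several small
# networks, each refining the z coordinates of a subset of joints.
import torch
import torch.nn as nn

P = 15                                                    # number of joints (assumed)
subsets = [list(range(0, 5)), list(range(5, 10)), list(range(10, 15))]

def make_net(q):
    # input: (x, y) of all joints plus current z estimates -> 3 * P values
    return nn.Sequential(nn.Linear(3 * P, 20), nn.ReLU(),
                         nn.Linear(20, 20), nn.ReLU(),
                         nn.Linear(20, q))

class CascadeStage(nn.Module):
    def __init__(self):
        super().__init__()
        self.nets = nn.ModuleList(make_net(len(s)) for s in subsets)

    def forward(self, xy, z):
        feat = torch.cat([xy.flatten(1), z], dim=1)       # (B, 3P)
        dz = torch.zeros_like(z)
        for net, idx in zip(self.nets, subsets):
            dz[:, idx] = net(feat)                        # update for its joints
        return z + dz                                     # refined z estimate

stages = nn.ModuleList(CascadeStage() for _ in range(3))  # T = 3 stages (assumed)
xy, z = torch.randn(4, P, 2), torch.zeros(4, P)           # initial z from [28]
for stage in stages:
    z = stage(xy, z)
print(z.shape)   # torch.Size([4, 15])
```

Because each stage only refines the depth estimates with very small networks, the cascade stays shallow and fast, which is what makes the approach suitable for real-time use.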

Fig. 3. The cascade consists of T stages. There are L neural networks in each stage. The combination of the outputs of the L neural networks is the update vector for the current estimated shape Ŝ. Each neural network has four layers: an input layer, two hidden layers, and an output layer [25].

At the training step, after learning the dictionary of basis 3D shapes, the 2D shapes of all 3D shapes in the training data are created by projecting the 3D joints onto a 2D plane. Then, the initial estimated 3D shapes Ŝ(0) of these 2D shapes are initialized using the method proposed by Zhou et al. The training target of each neural network is the difference between the z coordinates of the current estimated shapes Ŝ(t−1) and the ground-truth shapes S for the q joints corresponding to that network.

3.1.5 Convolutional Mesh Regression for Single-Image Human Shape Reconstruction
The purpose of this method is to address the problem of estimating pose and shape while reducing the reliance on the parametric model, which is usually SMPL. In this method, poses and shapes are regressed directly from images. The approach [16] takes a more hybrid route towards pose and shape regression: while maintaining the SMPL mesh topology, for an input image it first estimates the positions of the 3D mesh vertices instead of directly predicting the model parameters. To achieve this, a Graph-CNN architecture is proposed that explicitly encodes the mesh structure; the regression target for each vertex is its 3D location, and the network processes the image features attached to the vertices. A typical CNN is used to extract the features attached to the coordinates of the vertices of the template mesh, processing then continues on the graph structure defined for the Graph-CNN, and finally each vertex deforms its 3D position in the mesh towards its target.

This makes it possible to retrieve the full 3D geometry of the human body without the explicit need for a predefined parametric space, and, after the 3D position of each vertex has been estimated, if the prediction is required to match a particular model, the model parameters can be regressed from the mesh geometry. The first part of the work is an image-based CNN that extracts a general feature representation of the input; it follows the ResNet-50 architecture, whose final fully connected layer is discarded, keeping only the 2048-D feature vector after the pooling layer. To regress the 3D coordinates of the mesh vertices with the Graph-CNN, the method starts from a template human mesh with N vertices, as depicted in Fig. 4: the extracted 2048-D feature vector is attached to the 3D coordinates of each vertex of the template mesh.

Fig. 4. Given an input image, a CNN encodes the image into a low-dimensional feature vector. This feature vector is embedded by attaching it to the three-dimensional coordinates of each vertex i in the graph defined by the template human mesh. A series of graph convolutional layers then processes these inputs and outputs the three-dimensional vertex coordinates of the deformed mesh [16].

The Graph-CNN takes as input the 3D coordinates of each vertex along with the attached image features, and its goal is to estimate the 3D coordinates of each vertex of the deformed mesh. Processing is performed by graph convolution layers, using the formulation of Kipf et al. [17]. For these layers, the work makes use of residual connections, as they significantly speed up training and lead to higher-quality output shapes. Also, Batch Normalization [19] is replaced by Group Normalization [32]: Batch Normalization leads to unstable training and poor test performance, whereas with no normalization training is very slow and the network can get stuck in local minima and collapse early during training. Besides the 3D coordinates of each vertex, the Graph CNN also regresses the camera parameters of a weak-perspective camera model. Following Kanazawa et al. [12], the work predicts a scaling factor s and a 2D translation vector t. Since the prediction of the network is already expressed in the camera frame, there is no need to regress an additional global camera rotation. The camera parameters are regressed from the graph embedding and not directly from the image features, which gives a much more reliable estimate that is consistent with the output shape.
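
As a rough illustration of such a residual graph-convolution block (not the authors' code), the sketch below applies Kipf-style neighbour aggregation followed by Group Normalization and a residual connection; the adjacency matrix, the vertex count and the way the 2048-D image feature is attached to the template vertices are assumptions.

```python
# Hedged sketch of a residual graph-convolution block with Group Normalization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.skip = nn.Linear(in_dim, out_dim)
        self.norm = nn.GroupNorm(8, out_dim)         # used instead of BatchNorm

    def forward(self, x, adj):
        # Kipf-style propagation: aggregate neighbours, then transform
        h = self.lin(torch.bmm(adj, x))
        h = self.norm(h.transpose(1, 2)).transpose(1, 2)
        return F.relu(h + self.skip(x))              # residual connection

N, B = 431, 2                                        # vertex count (assumed), batch
adj = torch.softmax(torch.randn(B, N, N), dim=-1)    # placeholder normalized adjacency
img_feat = torch.randn(B, 2048)                      # global image feature (ResNet-50)
template = torch.randn(B, N, 3)                      # template mesh coordinates
x = torch.cat([template, img_feat[:, None, :].expand(-1, N, -1)], dim=-1)

layer = ResGraphConv(3 + 2048, 256)
print(layer(x, adj).shape)    # torch.Size([2, 431, 256])
```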

In general, this hybrid approach is comparable to model-based approaches and is not very sensitive to the type of input: it allows attaching features extracted from RGB pixels, from semantic segmentations or even from dense correspondences. It should be noted that model-based approaches produce precise meshes of the naked body under clothing but fail to estimate un-modelled details such as hair or clothing, whereas non-parametric volumetric approaches can estimate complete shapes but are limited in resolution and produce only partial estimates.

3.1.6 TexturePose: Supervising Human Mesh Estimation with Texture Consistency
As mentioned, due to the lack of natural images with three-dimensional shape ground truth for training, the main challenge is obtaining reliable supervision. This work [4] relies on additional cues that are present in natural images, are often ignored, and require neither extra annotation nor changes to the network architecture. TexturePose is a neural network training approach for model-based human pose estimation with direct supervision from natural images. The method builds on the observation that a person's appearance does not change significantly within a short video or across multi-view images; this seemingly insignificant and often overlooked cue goes a long way for model-based pose estimation. The parametric model is used to compute a texture map for each frame and, assuming the appearance is fixed, each point of the texture map should have the same value in all frames. Because the consistency is enforced in texture space, there is no need to estimate camera motion or to assume that the frames are temporally smooth. This general formulation makes the approach flexible and practical, in particular for monocular video and multi-view images. The parametric model used in this work is also SMPL. The body joints X are a linear combination of the mesh vertices, so using a pre-trained linear regressor W the mesh M can be mapped to the desired joints (X = WM). The overview of this work is shown in Fig. 5.

Fig. 5. Here, for simplicity, the input during training consists of two images i and j of the same person. The basic assumption is that the person's appearance does not change dramatically across the input images (i.e., the frames come from a single video or from synchronized multi-view cameras). The deep network processes each image and estimates the shape of the person. The estimated shape is then projected onto the image and, after computing the visibility of each point on the surface, texture maps Ai and Aj are created [4].
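
A minimal numerical sketch of this texture-consistency idea is given below (our own illustration, not the TexturePose implementation): the two texture maps are compared only on texels visible in both views, and the map resolution, tensor names and toy data are assumptions.

```python
# Hedged sketch of a texture-consistency loss between two frames.
import torch

def texture_consistency_loss(tex_i, tex_j, vis_i, vis_j):
    """tex_*: (B, 3, H, W) texture maps, vis_*: (B, 1, H, W) visibility masks."""
    joint_vis = vis_i * vis_j                       # texels seen in both frames
    diff = (tex_i - tex_j).abs() * joint_vis        # penalize appearance changes
    return diff.sum() / joint_vis.sum().clamp(min=1.0)

B, H, W = 2, 64, 64
tex_i, tex_j = torch.rand(B, 3, H, W), torch.rand(B, 3, H, W)
vis_i = (torch.rand(B, 1, H, W) > 0.5).float()
vis_j = (torch.rand(B, 1, H, W) > 0.5).float()
print(texture_consistency_loss(tex_i, tex_j, vis_i, vis_j))
```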

The meshes produced by SMPL are deformations of the original template T. The corresponding UV map maps the template surface onto an image A, the texture map, each pixel t of which is called a texel. Because of this mapping between texels and the mesh, the surface coordinates remain fixed and independent of changes of the surface geometry in 3D. The goal is to learn a predictor f, realized by a deep network, that maps a single input image I to the pose and shape parameters of the person in the image. The network output therefore specifies the SMPL pose and shape parameters; for the 3D rotations, the network regresses the representation proposed by Zhou et al. [31]. The key observation, that the person's appearance remains constant, translates into a texture-consistency loss that forces the two texture maps to be equal for all points of the surface Vij that are visible in both images. This loss acts as a supervision signal for the network and complements the other weak losses commonly used in training. From the pose and shape parameters, the mesh M and the corresponding 3D joints X are generated, and the mesh can be projected onto the image using the estimated camera parameters. Through an efficient computation (MPI-IS Mesh processing library, https://github.com/MPI-IS/mesh), the visibility of each point on the surface, and consequently of every texel t of the texture map A, can be inferred. To guarantee a valid 3D shape, the method uses an adversarial prior, which factorizes the model parameters into (i) pose parameters θ, (ii) shape parameters β, and (iii) per-part relative rotations, i.e. one 3D rotation for each of the 23 joints of SMPL, and trains a discriminator Dk for each factor of the body model. When multiple views i and j of a subject at the same time instance are available, the main additional constraint to enforce is that the pose of the person is the same across all viewpoints, which can be incorporated by simply forcing all pose parameters to have the same value. This generic formulation makes the approach particularly flexible and applicable to monocular video and multi-view images alike.

3.1.7 Estimating Human Shape Under Clothing from a Single Frontal-View Point Cloud of a Dressed Human
In general, model-based methods are not practical for estimating loose clothing, while non-model, free-deformation methods, because they are not limited to the space of naked body shapes, can capture clothing and other surface details but cannot estimate the actual body shape occluded by the clothing. This approach combines the advantages of both: it presents a personalized statistical body model that describes the clothes as deviations from the parametric model of the naked body. It is the first method to accurately estimate the naked body shape parameters from depth or point-cloud data of a single frontal view of a dressed person, it is designed to cope with casual (loose) clothing, and a new objective function is designed that combines the benefits of model-based shape estimation with free deformations to deal with casual wear. The task is to estimate the naked shape parameters of a human wearing casual clothing from a single-frame point cloud. Depth images of humans are captured by a single Microsoft Kinect v2 sensor.

The point clouds generated from these depth images therefore contain only the part of the clothed human surface that is visible to the depth camera. The overview of this method is displayed in Fig. 6.

Fig. 6. First, the shape and pose parameters of the model are initialized from the 3D joint locations automatically detected by the algorithm [8] integrated in the Kinect. Then, driven by the input front-view point cloud, multiple steps of correspondence search and optimization are applied, and finally the estimated shape parameters and the estimated model are obtained [6].
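
The sketch below illustrates, in a highly simplified and hedged form, the correspond-then-optimize loop summarized in Fig. 6; a stand-in linear shape model replaces SMPL, the clothing and pose terms of the real objective are omitted, and every dimension and name is an assumption rather than the authors' implementation.

```python
# Hedged sketch of an alternating correspondence search / shape optimization loop.
import numpy as np

rng = np.random.default_rng(0)
template = rng.normal(size=(500, 3))          # stand-in body template vertices
shape_basis = rng.normal(size=(10, 500, 3))   # stand-in shape blend directions
point_cloud = template + 0.05 * rng.normal(size=(500, 3))   # observed front view

def model_vertices(beta):
    return template + np.tensordot(beta, shape_basis, axes=1)

beta = np.zeros(10)
for it in range(20):
    verts = model_vertices(beta)
    # correspondence step: nearest point-cloud point for each model vertex
    d = ((verts[:, None, :] - point_cloud[None, :, :]) ** 2).sum(-1)
    nearest = point_cloud[d.argmin(axis=1)]
    # optimization step: least-squares update of beta on this data term
    residual = (nearest - verts).reshape(-1)
    J = shape_basis.reshape(10, -1).T             # d(verts)/d(beta)
    beta += np.linalg.lstsq(J, residual, rcond=None)[0]
print("estimated shape coefficients:", np.round(beta, 3))
```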

In order to personalize the original SMPL model [11] for the task of estimating shape under casual clothes, a set of auxiliary variables Dcp applied to the model template T is used to describe personalized deviations for more accurate shape estimation. Shape and pose parameters are initialized according to the 3D joint locations that are detected automatically, following the video-based reconstruction approach of Alldieck et al. [22] applied to Kinect data. If the human height is known, the shape parameters are constrained to make the model height more realistic; if it is unknown, the height can be computed from the 3D joint locations. Then, because of the large variations in human pose, the focus is mainly on the global orientation of the body. Finally, corresponding pairs of vertices (vi, pi) are found, where vi is a vertex of the body model and pi belongs to the point cloud; specifically, they are obtained by computing the rigid transformation of the torso and the 3D rotation of the limbs, respectively. The key component of this work is its objective function, which is minimized in the last step. A Microsoft Kinect v2 sensor is used to capture the data, and the performance is compared with methods that use different objective functions, with the proposed objective proving the most effective.

3.1.8 Indirect Deep Structured Learning for 3D Human Body Shape and Pose Prediction
This method [21] performs indirect training of deep networks for structured prediction of three-dimensional human shape and pose, and was proposed to reduce the reliance on expensive three-dimensional ground-truth labels. Unlike most modern approaches, training on real-world images does not require hard-to-obtain 3D human shape labels; instead it leverages the power of a decoder trained on artificial data. To achieve this, an encoder-decoder network (autoencoder) is trained using the two-step procedure described below. In the first step, a decoder is trained to predict a body silhouette using SMPL parameters (a statistical body shape model) as input.

In the second step, the entire network is trained on real image and corresponding silhouette pairs while the decoder is held constant. As a result, this method allows indirect learning of body shape and pose parameters from real images without the need for any ground-truth parameter data. In this work the encoder and the decoder are each split into three units serving particular purposes, as described in Fig. 7.

Fig. 7. The main components of the encoder-decoder network of this approach [21].
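
To clarify the two-stage "indirect" training scheme, the following toy sketch (not the authors' code) first trains a small decoder on synthetic parameter-silhouette pairs and then trains the encoder on image-silhouette pairs with the decoder frozen; the network sizes, resolutions and random data are assumptions.

```python
# Hedged sketch of indirect (two-stage) encoder-decoder training with toy data.
import torch
import torch.nn as nn
import torch.nn.functional as F

PARAM_DIM, IMG = 82, 32                               # 72 pose + 10 shape (assumed)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(IMG * IMG * 3, 256),
                        nn.ReLU(), nn.Linear(256, PARAM_DIM))
decoder = nn.Sequential(nn.Linear(PARAM_DIM, 256), nn.ReLU(),
                        nn.Linear(256, IMG * IMG))

# Stage 1: train the decoder on synthetic (parameters -> silhouette) pairs
opt_d = torch.optim.Adam(decoder.parameters(), lr=1e-3)
for _ in range(5):
    params = torch.randn(8, PARAM_DIM)                # synthetic SMPL-like samples
    target_sil = (torch.rand(8, IMG * IMG) > 0.5).float()
    loss = F.binary_cross_entropy_with_logits(decoder(params), target_sil)
    opt_d.zero_grad(); loss.backward(); opt_d.step()

# Stage 2: freeze the decoder, train the encoder on real image/silhouette pairs
for p in decoder.parameters():
    p.requires_grad_(False)
opt_e = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(5):
    image = torch.rand(8, 3, IMG, IMG)
    real_sil = (torch.rand(8, IMG * IMG) > 0.5).float()
    loss = F.binary_cross_entropy_with_logits(decoder(encoder(image)), real_sil)
    opt_e.zero_grad(); loss.backward(); opt_e.step()
print("indirect training finished; the encoder now predicts the parameters")
```

The encoder never sees ground-truth parameters; the silhouette loss back-propagated through the frozen decoder is what supervises it, which is the essence of the "indirect" learning.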

For ease of explanation, the encoder is divided into appearance, compression and transfer units. The appearance unit learns convolutional filters that separate the human silhouette from the background. The compression unit further compresses the output of the appearance unit to a vector of dimensions 64 × 1 × 1. The transfer unit then converts this vector into shape and pose parameters using three fully connected layers. Similarly, the decoder is divided into transfer, expansion and learning units. Its transfer unit converts the 3D shape and pose parameters into a low-dimensional (9 × 9), 8-channel image through three fully connected layers and a deformation layer. The method shows high accuracy on artificial images. Although its accuracy decreases on real-world images, it can still achieve a close fit to the ground truth even though it is never shown a real image paired with corresponding shape and pose parameters. Moreover, by implementing more complex architectures and using additional, higher-quality training data, the proposed method could be made to compete with modern direct learning approaches.

3.2 Non-parametric Approach
A non-parametric model allows more information to be drawn from the current data set when predicting future data. Such models can usually express the properties of the data much better than parametric models, have more degrees of freedom and are more flexible; a Gaussian mixture model, for example, is more flexible, and if more data are observed, future data can be predicted even better. For a parametric model, knowing only the parameters is enough to predict new data, whereas for a non-parametric model the prediction of future data is based not only on the parameters but also on the observed data themselves.
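
The distinction can be made concrete with a toy one-dimensional density-estimation example (unrelated to any specific pose method): the parametric model keeps only a mean and a standard deviation, whereas the non-parametric estimate keeps, and uses, all of the observed samples.

```python
# Illustrative toy example of parametric vs. non-parametric density estimation.
import numpy as np

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 1.0, 200)])

# Parametric: a single Gaussian; prediction depends only on (mu, sigma)
mu, sigma = data.mean(), data.std()
def parametric_density(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Non-parametric: kernel density estimate; prediction depends on all samples
def kde_density(x, bandwidth=0.3):
    diffs = (x[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * diffs ** 2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

xs = np.linspace(-4, 6, 5)
print("parametric:    ", np.round(parametric_density(xs), 3))
print("non-parametric:", np.round(kde_density(xs), 3))
```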

3.2.1 Moulding Humans: Non-parametric 3D Human Shape Estimation from Single Images
While recent progress in convolutional neural networks has enabled impressive results for 3D human pose estimation, estimating the full 3D shape of a person is still an open issue. Model-based approaches can output precise meshes of the naked body under clothing but fail to estimate details and un-modelled elements such as hair or clothing. On the other hand, non-parametric volumetric approaches can potentially estimate complete shapes but, in practice, they are limited by the resolution of the output grid and cannot produce detailed estimates. This method [23] uses a double depth-map representation to encode the 3D shape: to reconstruct the full 3D human shape there are two depth maps, a "visible" depth map that records the surface elements directly visible in the image, and a "hidden" depth map that records the occluded surface of the estimate, as shown in Fig. 8. The method designs an encoder-decoder architecture that takes the single image as input and simultaneously produces an estimate of both depth maps. These depth maps are then combined into a full 3D surface point cloud from which a mesh can easily be reconstructed using Poisson reconstruction. This representation produces high-resolution outputs, potentially of the same resolution as the input image, with a much lower dimensionality than voxel-based volumetric representations (O(N²) compared to O(N³), where N is the size of the bounding box framing the human in the input image). The depth-based representation also allows an adversarial discriminator to be used to improve the accuracy and the "humanness" of the 3D output.

Fig. 8. From a single image, “visible” and “hidden” depth maps are estimated with respect to the camera. The two depth maps can be viewed as the two halves of a virtual “mould” [23].

Two 2D depth maps, zvis and zhid, are generated by ray-tracing from a 3D mesh, obtained either by animating a 3D human model or by reconstructing a real person from multiple views, and from a camera hypothesis, i.e. its location and parameters. To keep the depth values within a reasonable range and estimate them more accurately, a flat background is placed at a distance L behind the subject so that all pixel values of the depth maps lie in the range [−zorig, L].

The method framework is based on the stacked hourglass network proposed by Newell et al. [2]: a 2-stack hourglass architecture is designed that takes as input an RGB image I cropped around the human and outputs the two depth maps zvis and zhid aligned with I (see Fig. 9). Each of these modules has a set of convolutional and pooling layers that process the features down to a low resolution and then up-sample them to the final output resolution. The error of the mould representation decreases and converges to a minimum value that corresponds to surface details that cannot be correctly encoded even with high-resolution depth maps, i.e. when, for particular poses, some rays intersect the human surface more than twice.

Fig. 9. From a single image, “visible” and “hidden” depth maps are estimated. The 3D point clouds obtained from these two depth maps are combined to form a 3D point cloud of the whole body, as if they were the two halves of a mould [23].
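
As a hedged illustration of how the two depth maps yield a full-body point cloud, the sketch below back-projects a "visible" and a "hidden" depth map through an assumed pinhole camera; the intrinsics, image size and depth values are made up and this is not the authors' code.

```python
# Hedged sketch: merging visible and hidden depth maps into one point cloud.
import numpy as np

H, W, f = 64, 64, 60.0                       # image size and focal length (assumed)
cx, cy = W / 2.0, H / 2.0
z_vis = 2.0 + 0.1 * np.random.rand(H, W)     # depth of the surface facing the camera
z_hid = z_vis + 0.3                          # depth of the occluded back surface

def backproject(depth):
    v, u = np.mgrid[0:H, 0:W]
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

full_cloud = np.concatenate([backproject(z_vis), backproject(z_hid)], axis=0)
print(full_cloud.shape)    # (8192, 3): the two "mould halves" merged
```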

Finally, an adversarial training scheme is used, following the generative adversarial network (GAN) framework [5], in which two models are trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample comes from the training data rather than from the generator G. The goal here is for the discriminator to accurately distinguish ground-truth depth maps from generated ones.
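
The sketch below shows standard GAN-style losses applied to generated versus ground-truth depth maps, as a hedged illustration of this adversarial supervision; the discriminator architecture, tensor shapes and data are assumptions, not the paper's implementation.

```python
# Hedged sketch of adversarial supervision on (visible, hidden) depth-map pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

disc = nn.Sequential(nn.Conv2d(2, 16, 4, stride=2, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
                     nn.Flatten(), nn.Linear(32 * 16 * 16, 1))

real_maps = torch.rand(4, 2, 64, 64)          # ground-truth (z_vis, z_hid) pairs
fake_maps = torch.rand(4, 2, 64, 64)          # depth maps produced by the hourglass

# Discriminator: tell ground-truth depth maps from generated ones
d_loss = (F.binary_cross_entropy_with_logits(disc(real_maps), torch.ones(4, 1)) +
          F.binary_cross_entropy_with_logits(disc(fake_maps.detach()), torch.zeros(4, 1)))

# Generator: fool the discriminator so that outputs look more "human"
g_adv_loss = F.binary_cross_entropy_with_logits(disc(fake_maps), torch.ones(4, 1))
print(float(d_loss), float(g_adv_loss))
```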

The 3D shape is reconstructed using Poisson reconstruction [14], and adversarial training with a discriminator is used to increase the "humanness" of the estimate. The method can recover detailed surfaces while keeping the output at a reasonable size, which makes the learning stage more efficient, and the architecture can also efficiently incorporate a discriminator in an adversarial fashion to improve the accuracy and "humanness" of the output.

Table 2. Overview of the optimization-based and deep learning-based approaches for the estimation of 3D human pose and shape.

Title: End-to-end recovery of human shape and pose
Author and date: Kanazawa et al., 2018
Technique: GAN
Datasets: LSP, LSP-extended, MPII, MS COCO, Human3.6M, MPI-INF-3DHP
Properties: 1. Infers 3D mesh parameters directly from image features, which avoids the need for two-stage training and avoids throwing away a lot of image information. 2. Goes beyond skeletons and outputs meshes, which are more complex and more appropriate for many applications. 3. The framework is trained in an end-to-end manner. 4. It remains open whether increasing the amount of 2D data will significantly increase 3D accuracy.

Title: Learning to estimate 3D human pose and shape from a single color image
Author and date: Georgios Pavlakos et al., 2018
Technique: Conventional ConvNet-based approach
Datasets: UP-3D, SURREAL, Human3.6M
Properties: 1. An end-to-end framework. 2. Incorporation of a parametric statistical shape model, SMPL, within the end-to-end framework, enabling: – prediction of the SMPL model parameters from ConvNet-estimated 2D key-points and masks to avoid training on synthetic image examples; – generation of the 3D body mesh at training time and supervision based on 3D shape consistency; – use of a differentiable renderer for 3D mesh projection and refinement of the network with supervision based on consistency with the 2D annotations. 3. Superior performance compared to previous approaches for 3D human pose and shape estimation at significantly faster running time.

Title: Neural body fitting: unifying deep learning and model-based human pose and shape estimation
Author and date: Mohamed Omran et al., 2018
Technique: Neural Body Fitting (NBF), a hybrid architecture
Datasets: UP-3D, Human3.6M
Properties: 1. Directly predicts the parameters of the model. 2. Admits both full 3D supervision (in the model or 3D Euclidean space) and weak 2D supervision (if images with only 2D annotations are available). 3. Requires neither initialization nor large amounts of 3D training data. 4. Builds a simple processing pipeline with parts that can be optimized in isolation, avoiding multiple network heads. 5. Analyzes: (1) how the 3D model can be integrated into a deep neural network, (2) how loss functions can be combined and (3) how training can be set up to work efficiently with scarce 3D data.

Title: 3D human pose estimation using cascade of multiple neural networks
Author and date: Van-Thanh Hoang et al., 2018
Technique: Cascade of multiple neural networks (CMNN)
Datasets: MoCap, Human3.6M
Properties: 1. Creates the initial 3D shape using the ASM-based method with a small number of predefined basis shapes. 2. Makes the estimated shape more accurate by using the CMNN. 3. The proposed method outperforms in both accuracy and processing time. 4. Its speed is fast enough for real-time applications.

Title: Convolutional mesh regression for single-image human shape reconstruction
Author and date: Nikos Kolotouros et al., 2019
Technique: Graph-CNN architecture
Datasets: Human3.6M, UP-3D, LSP
Properties: 1. Reformulates the problem of human pose and shape estimation as regressing the 3D locations of the mesh vertices, to avoid the difficulties of direct model parameter regression. 2. Proposes a Graph CNN for this task which encodes the mesh structure and enables convolutional mesh regression of the 3D vertex locations. 3. Demonstrates the flexibility of the framework by considering different input representations. 4. Current limitations: e.g., low resolution of the output mesh, missing details in the recovered shape.

Title: TexturePose: supervising human mesh estimation with texture consistency
Author and date: Georgios Pavlakos et al., 2019
Technique: TexturePose (an approach to train a CNN)
Datasets: Human3.6M, MPII, LSP
Properties: 1. A novel approach to leverage complementary supervision from natural images through the appearance constancy of each person across different frames. 2. Demonstrates the effectiveness of texture-consistency supervision for monocular video and multi-view capture, consistently outperforming approaches with access to the same or more annotations.

Title: Estimating human shape under clothing from single frontal view point cloud of a dressed human
Author and date: Wang et al., 2019
Technique: Use of point clouds
Datasets: Point cloud data
Properties: 1. Proposes the first method for estimating 3D naked body shape parameters from a single-frame frontal-view point cloud of a dressed human. 2. Designs a novel objective function that combines the advantages of model-based shape estimation and a free-deformation method to deal with casual clothes.

Title: Indirect deep structured learning for 3D human body shape and pose prediction
Author and date: Jun Kai Vince Tan et al., 2017
Technique: Autoencoder (encoder-decoder network)
Datasets: Artificial images; real images (Unite the People)
Properties: 1. A novel encoder-decoder architecture for 3D body shape and pose prediction. 2. Does not require hard-to-obtain 3D human shape and pose labels for training on real-world images, but instead leverages the power of a decoder trained on artificial data.

Title: Moulding Humans: non-parametric 3D human shape estimation from single images
Author and date: Gabeur et al., 2019
Technique: Double depth map
Datasets: 3D HUMANS
Properties: 1. Can recover detailed surfaces while keeping the output at a reasonable size, which makes the learning stage more efficient. 2. The architecture can efficiently incorporate a discriminator in an adversarial fashion to improve the accuracy of the output. 3. The representation allows a higher-resolution output, potentially the same as the image input, with a much lower dimension than voxel-based volumetric representations.

4 Discussion and Conclusion
The purpose of this article is to review the existing approaches and methods for the problem of estimating three-dimensional human pose and shape from images. Existing approaches are categorized into two families: optimization-based and deep learning-based. In this paper, optimization-based methods were presented first, followed by a detailed discussion of parametric and non-parametric deep learning-based approaches for the estimation of three-dimensional human meshes, summarized in Table 2. Optimization-based approaches provide reliable results, but because of their sensitivity to initialization, their frequent failures when the initialization is poor, their run time and the slow convergence of the fitting process, deep learning approaches, which regress poses and shapes directly from images, have gained ground thanks to their high efficiency and accuracy. At first sight, convolutional networks seem an impractical candidate for this problem, due to the need for large amounts of training data and the low resolution of their 3D predictions; however, by providing a direct and efficient prediction approach that outperforms iterative optimization methods, it has been shown that convolutional networks can offer an attractive solution. In general, the goal of all methods is a proper trade-off between the speed and the accuracy of the output results; to achieve the best results in estimating human pose and shape in smart environments, the most suitable method must be studied and selected. Overall, non-parametric methods seem to perform better than parametric methods for these three-dimensional estimates from images, although they also have some drawbacks. Finally, we would like to thank the authors of the works collected in this survey, and especially those whose figures we have reproduced; we have tried to cite all of them in our references.

References
1. Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT (2018)
2. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: ECCV (2016)
3. Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep it SMPL: automatic estimation of 3D human pose and shape from a single image. In: European Conference on Computer Vision (ECCV) (2016)
4. Pavlakos, G., Kolotouros, N., Daniilidis, K.: TexturePose: supervising human mesh estimation with texture consistency. In: IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South) (2019)
5. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS (2014)
6. Wang, J., Lu, Z., Liao, Q.: Estimating human shape under clothing from single frontal view point cloud of a dressed human. In: IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan (2019)

7. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: ICCV (2017)
8. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. Commun. ACM 56, 116–124 (2013)
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
10. Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR (2014)
11. Loper, M., Mahmood, N., Tung, H.F., Harley, A.W., Seto, W., Fragkiadaki, K.: Adversarial inverse graphics networks: learning 2D-to-3D lifting and image-to-image translation from unpaired supervision. In: IEEE International Conference on Computer Vision (ICCV), Venice (2017)
12. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. (TOG) 34, 1–16 (2015)
13. Omran, M., Lassner, C., Pons-Moll, G., Gehler, P., Schiele, B.: Neural body fitting: unifying deep learning and model based human pose and shape estimation. In: International Conference on 3D Vision (3DV), Verona (2018)
14. Kazhdan, M., Hoppe, H.: Screened poisson surface reconstruction. ACM Trans. Graph. 32, 1–13 (2013)
15. MPI-IS: Mesh processing library. https://github.com/MPI-IS/mesh
16. Kolotouros, N., Pavlakos, G., Daniilidis, K.: Convolutional mesh regression for single-image human shape reconstruction. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA (2019)
17. Litany, O., Bronstein, A., Bronstein, M., Makadia, A.: Deformable shape completion with graph convolutional autoencoders. In: CVPR (2018)
18. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: BMVC (2010)
19. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015)
20. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models - their training and application. Comput. Vis. Image Underst. 61, 38–59 (1995)
21. Tan, J., Budvytis, I., Cipolla, R.: Indirect deep structured learning for 3D human body shape and pose prediction. In: British Machine Vision Conference 2017, BMVC (2017). https://doi.org/10.17863/CAM.21421
22. Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Video based reconstruction of 3D people models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
23. Gabeur, V., Franco, J., Martin, X., Schmid, C., Rogez, G.: Moulding humans: non-parametric 3D human shape estimation from single images. In: IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South) (2019)
24. Hoang, V., Jo, K.: An improved method for 3D shape estimation using active shape model. In: 10th International Conference on Human System Interactions (HSI), Ulsan (2017)
25. Hoang, V., Jo, K.: 3-D human pose estimation using cascade of multiple neural networks. In: IEEE Transactions on Industrial Informatics (2019)
26. Ramakrishna, V., Kanade, T., Sheikh, Y.: Reconstructing 3D human pose from 2D image landmarks. In: Computer Vision–ECCV (2012)
27. Sun, X., Shang, J., Liang, S., Wei, Y.: Compositional human pose regression. In: IEEE International Conference on Computer Vision, ICCV (2017)

28. Zhou, X., Zhu, M., Leonardos, S., Daniilidis, K.: Sparse representation for 3D shape estimation: a convex relaxation approach. IEEE Trans. Pattern Anal. Mach. Intell. 39(8), 1648–1661 (2017)
29. Zhou, X., Huang, Q., Sun, X., Xue, X., Wei, Y.: Weakly-supervised transfer for 3D human pose estimation in the wild. In: IEEE International Conference on Computer Vision, ICCV (2017)
30. Zhou, X., Leonardos, S., Hu, X., Daniilidis, K.: 3D shape estimation from 2D landmarks: a convex relaxation approach. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston (2015)
31. Zhou, Y., Barnes, C., Lu, J., Yang, J., Li, H.: On the continuity of rotation representations in neural networks. In: CVPR (2019)
32. Wu, Y., He, K.: Group normalization. In: ECCV (2018)

Smart Devices and Softwares

A 3-DOF Cable-Driven Robotic Ankle Rehabilitation Device Romel S. Saysay(B) , Nicanor R. Roxas Jr., Nilo T. Bugtai, Homer S. Co, and Renann G. Baldovino Manufacturing Engineering and Management Department, Gokongwei College of Engineering, De La Salle University, 2401 Taft Ave., 0922 Manila, Philippines [email protected]

Abstract. With the ankle considered one of the more complex joints in the body, it can rotate about all three anatomical axes of rotation and is crucial for balance and for propulsion during walking. However, disabilities and physical injuries to the ankle joint can severely affect the normal daily tasks of standing and walking. Currently, several robotic devices have been developed to assist in physical rehabilitation of the ankle. However, most designs only facilitate the dorsi-plantarflexion motion of the ankle, which is just one of its three possible axes of rotation. The study prototype uses a cable-driven mechanism that attaches to the foot and is routed to an actuation unit through cable guides. This mechanism allows the motors and other bulky components to be separated from the ankle. A wooden mock ankle with a ball joint was used to simulate the 3-DOF ankle movements. Several experiments were performed for validation, such as checking whether the device can reach the maximum angles of a natural ankle and whether the GUI delivers calibrated movements. Overall, the development of a 3-DOF ankle rehabilitation device was successful. This prototype provides a new design concept to the current state of the art in robotic ankle devices that can facilitate a more naturalistic movement during therapy. Keywords: Ankle rehabilitation · Cable-driven · Degrees of freedom (DOF) · Robotic rehabilitation device

1 Introduction
1.1 Ankle Joint Pathophysiology and Rehabilitation
The ankle is one of the most complex joints in the human skeleton [1, 2]. It is composed of four individual bones together with several muscle, tendon, and ligament groups that can turn rigid or flexible as needed. The ankle is capable of rotation about all three anatomical axes and can withstand loads of up to several times the body weight. The ankle-foot complex plays a vital role in normal daily life functions, especially for balance, walking and running. With the foot constantly under stress, the ankle is responsible for its proper positioning, for stability during standing and for propulsion during locomotion [3].

With the ankle exposed to heavy loads most of the time, it is the most commonly injured joint in the skeletal system. Ankle sprain is the most common ankle injury; it is caused by overstretching, and sometimes even tearing, of the ligaments around the ankle [4]. Almost everyone has sprained an ankle at some point, and recovery time can vary from two to six weeks [5]. If not addressed properly, some severe sprains can lead to ankle instability and make the joint more susceptible to recurrent injuries in the future [6, 7]. On another note, according to the World Health Organization, 15 million people worldwide suffer a stroke each year. Locally, the Philippines and neighbouring Southeast Asian nations report that stroke is the second most prevalent cardiovascular disease causing mortality or permanent disability [8]. It is also reported that half a million Filipinos are affected by stroke, with healthcare costs amounting to $350 million to $1.2 billion each year. Disability resulting from a stroke can greatly affect daily life functions, which is why physical therapy must be implemented properly. Stroke is a leading cause of long-term locomotor impairment [9]. Together with spinal injuries and other types of paralysis, conditions such as 'drop foot', passive ankle stiffness (PAS) and amyotrophy can affect the ankle, making it non-functional, which in turn makes standing and walking virtually impossible [10]. The same applies to other recovering patients, such as athletes and dancers, who have undergone orthopedic ankle surgery due to ligament tears and other foot injuries.

The standard rehabilitation protocol for ankle disabilities first involves passive mobility exercises, gradually progressing to active stretching and strength-training exercises [11]. In passive exercises, the therapist mobilizes the ankle through its full range of motion (ROM) with no physical exertion from the patient; this promotes blood flow, especially when the ankle is completely immovable, and fosters neural recovery. Active exercises, on the other hand, are an intermediate type of therapy involving effort from both patient and therapist; they are performed until the ankle is ready for strengthening in preparation for standing and walking [12]. Rehabilitation must be intensive, which means therapy must be consistent, frequent and properly monitored to be effective [13]. In practice, however, prescribed ankle therapy programs are only loosely followed due to factors such as cost, scheduling, availability of the patient and physical therapists, lack of precise progress monitoring, and a host of other reasons. In light of this, ankle rehabilitation robots have been developed over the years to assist patients and physical therapists.

Robotic ankle rehabilitation devices have several significant advantages over conventional methods and thus promote a more effective form of ankle therapy. Aside from being capable of executing a wide variety of exercises with precision and control, these devices can track patient progress and deliver repetitive exercises of varying intensity, all without exhausting the physical therapist [14]. The main design consideration for these devices is the integration of proper actuation to move the ankle joint within its complete natural range of motion. Accordingly, the ankle joint can be modelled as being close to a ball joint, which means it can move about all three axes of rotation.
These motions are dorsi-plantarflexion, abduction-adduction and inversion-eversion; each pair consists of rotations in opposite directions parallel to the sagittal, frontal and transverse planes respectively.

Current designs of robotic ankle rehabilitation devices are classified into two main types: exoskeleton robots and platform-based robots. Exoskeleton ankle robots, sometimes also called 'active orthoses', make use of a variety of actuators attached to the limb, while platform-based types have a drivable platform that tilts the bottom of the foot. Of the two, exoskeleton types have less complex actuation mechanisms and are more feasible to replicate [15]. However, most exoskeleton designs have focused only on the dorsi-plantarflexion of the foot; therefore, with active orthoses, the ankle is trained over an incomplete range of motion.

1.2 Robotic Ankle Rehabilitation
As shown in Fig. 1, robotic devices used in ankle rehabilitation are classified into two main categories: (1) active orthoses (AO) or 'exoskeletons' and (2) platform-based robots (PR). The terms 'active orthosis' and 'exoskeleton' can generally be regarded as very similar because both are anthropomorphic in nature and are worn or fit close to the body part to be assisted or rehabilitated. Although both belong to one category, the main function of an active orthosis is to restore, reinforce and correct proper joint movement, whereas exoskeletons are more about augmenting strength and function [16]. The second type, platform-based ankle rehabilitation robots, makes use of a moving and tilting platform that moves the bottom of the foot to produce ankle movement.

Fig. 1. Basic categories of robotic ankle rehabilitation devices (Alvarez-Perez et al., 2019)

Of the two types, the exoskeleton or active orthosis type has wider applicability since it can still be used during gait training [17]. Also, platform-based robots require that the limb be vertical to the ground, while exoskeletons and active orthoses can be used while lying down and in other lower-limb orientations. Figure 2 illustrates actual prototype examples of active orthoses (AO) and platform-based robots (PR). Moreover, building a robotic ankle rehabilitation device requires a sufficient understanding of the anatomy of the ankle. As mentioned, the ankle is a complex joint that can move about all three anatomical axes (longitudinal, transverse and sagittal). Every person may have a different level of ankle flexibility, which can be measured by the angles the ankle can reach along these axes of rotation. The study in [18] has summarized the natural ranges of motion of the ankle, as shown in Table 1. These basic rotations can also be combined with one another to produce the rotation of the foot about the ankle; this combination makes the movement of the foot (and ankle) very versatile, as in various sports, swimming, balancing and many other tasks that require agile movements. The full rotation of the foot through the ankle is called pronation and supination.

Fig. 2. Prototype examples of AO and PR type robotic ankle devices

Table 1. Natural range of motion of the ankle rotations according to [18]

Type of motion     Max. allowable motion
Dorsiflexion       20.3°–29.8°
Plantarflexion     37.6°–45.8°
Inversion          14.5°–22.0°
Eversion           10.0°–17.0°
Abduction          15.4°–25.9°
Adduction          22.0°–36.0°

In designing a robotic ankle rehabilitation device, these complete rotations must be achievable to match the manual therapy performed on the ankle. Without robots, a physical therapist normally rotates the patient's foot to provide passive stretching exercises; this provides the neuroplasticity that is essential for the recovery and restoration of ankle function. Therefore, robotic rehabilitation devices need to match what the therapist does so that they can provide assistance while still delivering effective therapy.

2 Methodology
2.1 Prototype Development
The prototype of the robotic ankle orthosis is constructed from a commercial ankle-foot brace modified to allow controlled automated movements. As shown in Fig. 3, the ankle-foot brace is cut at the position of the ankle, and Velcro is attached on the lateral sides to allow 3-axis rotation. The commercial foot brace already has padding that ensures comfort and a good fit along the lower limb and ankle-foot complex. Cable-driven actuation is typically implemented using steel cables with a fixed-position housing that serves as a stop for the motion to be achieved. Steel cables are used because of their durability and their ability to resist elongation over long periods of repeated pulling.

Fig. 3. Commercial ankle foot brace (left) and output after cutting along the ankle part and placing a Velcro for connected movement (right)

The main purpose of a cable-driven assembly is to separate the actuation mechanism from the other end, where the part that needs to be moved is connected. The cables are thus a means of delivering force at both ends while the actuation mechanism remains connected to the parts to be moved only through the cables. Doing this reduces the weight that must be worn on the foot and reduces complexity, because the electric wiring, power supply and other controls are not attached to the main unit. Shimano shifting cables, commonly used in bicycle gear shifts, were used because of their low friction. Two of these cables were each securely attached to a linear actuator (Actuonix L16-P, 100 mm stroke, 100 N) on one end and fastened to the left and right sides at the front of the footplate. The linear actuators that drive the main orthosis are placed separately in a wooden box casing together with the motor drivers, an Arduino Uno main controller, a 12 V DC power supply, the wiring, and the connection to a computer for the main interface controls. Figure 4 shows the component setup of the actual prototype. For the angle measurement between the shank and the footplate, two MPU6050 dual-axis accelerometers are placed on the orthosis, and the angles for plantar-flexion, inversion-eversion and abduction-adduction are calculated from the difference between the angle/tilt measurements of the two accelerometers. The actuation unit is composed of the two linear actuators attached to the steel cables that route to the main orthosis; it is a wooden box that also houses the motor drivers, the Arduino controller and the cables that link them to each other and to the computer. Figure 5 shows the actual setup, with the actuators placed on one side and the electric components on the other. A movable mock ankle is fabricated from wood (Fig. 6a). Its movement is enabled by a ball-and-socket joint, which easily provides rotation about all three axes. This mock ankle is used to wear the device and to check the movement of the cable-driven actuation. Figure 6b shows the mock ankle wearing the prototype, together with the elastic rubber component that provides the antagonistic force to the pair of steel cables at the front; this supplies the necessary tension to the overall mechanism and prevents unwanted slack in the cables.
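
As a hedged illustration of how the joint angles could be derived from the two accelerometer readings described above, the snippet below computes roll and pitch tilt angles for each sensor and takes their difference; the sensor axes, mounting orientation and sample values are assumptions and this is not the prototype's actual firmware.

```python
# Hedged sketch: joint angles as the difference of two static tilt readings.
import math

def tilt_angles(ax, ay, az):
    """Roll and pitch (degrees) from a static accelerometer reading."""
    roll = math.degrees(math.atan2(ay, az))
    pitch = math.degrees(math.atan2(-ax, math.sqrt(ay * ay + az * az)))
    return roll, pitch

shank = tilt_angles(0.02, 0.01, 0.99)     # sensor on the lower leg (example values)
foot = tilt_angles(0.30, 0.15, 0.94)      # sensor on the footplate (example values)

dorsi_plantar = foot[1] - shank[1]        # sagittal-plane difference
inv_ever = foot[0] - shank[0]             # frontal-plane difference
print(f"dorsi/plantarflexion ~ {dorsi_plantar:.1f} deg, inversion/eversion ~ {inv_ever:.1f} deg")
```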

Fig. 4. Setup and components of actual prototype

Fig. 5. Components of the actuation unit (a) front side and (b) rear side

Fig. 6. Mockup ankle made of wood (left) and elastic component placed on prototype (right)

2.2 Testing and Experimentation
A goniometer experiment was conducted to test the calibration and the maximum angles that the prototype can reach about all three axes of motion. With the prototype worn by the mock ankle, the maximum rotation angles about the three axes are measured with a goniometer. The goniometer is placed along the corresponding neutral axes, and the separation between the foot and the lower leg (also called the 'shank') is the basis for the angle measurements. After the measurements are obtained, the values are checked against the maximum functional ankle ranges of motion defined by previous studies. Figure 7 shows the series of maximum angles that the prototype can reach. The plantarflexion and dorsiflexion movements are obtained by having the two linear actuators extend or retract synchronously: when both cables lengthen, plantarflexion results; when both shorten, dorsiflexion is achieved. The maximum angles that the prototype can reach accommodate the maximum angles that a real ankle would reach according to the previous literature, and the inversion-eversion and abduction-adduction movements of the prototype can also reach the maximum angles reported in previous studies. More specifically, these angles are 30° and 45° for dorsiflexion and plantarflexion respectively, 30° and 25° for inversion and eversion respectively, and 36° and 25° for adduction and abduction respectively. These measurements show that the prototype can drive a real ankle through its maximum range of motion, which is important so that the ankle can be properly stretched.

3 Results and Conclusions
3.1 Data Analysis
To make sure that the prototype delivers the maximum angles for a real ankle, the placement of the cables on the main body and the actuator stroke length were adjusted accordingly.

Fig. 7. Results of the experiment with a goniometer measuring the maximum angles that the prototype can reach

After the calibration process, the position of the cables was fixed. Figure 8 shows the degrees vs. actuator length graph of the calibrated prototype. To produce the plantarflexion-dorsiflexion pair of rotations, both actuators move to the same length and at the same speed: if both actuators lengthen, the cables also lengthen, producing a plantarflexion movement, and the opposite applies for dorsiflexion.

Fig. 8. Degrees vs. Actuator length graph of the calibrated prototype in plantarflexion and dorsiflexion rotations

With actuator length values ranging from 0 to 100, the prototype was calibrated so that a length l = 50 corresponds to the neutral position; that is, if both actuators have l = 50, the foot is perpendicular to the lower leg (shank). An actuator length greater than 50 produces plantarflexion, with a linear correlation between length and the angle reached by the prototype, up to the maximum length of 100, where the maximum plantarflexion angle is reached. The same applies to dorsiflexion, which is obtained when the lengths of both actuators are less than 50; when both reach l = 0, the prototype reaches the maximum dorsiflexion angle. To achieve the inversion-eversion and abduction-adduction movements, the actuators must have different lengths so as to produce the 'twisting' motion of the foot. As shown in Fig. 9, in the neutral position, where the foot is perpendicular to the lower leg, both actuators have a length of 50. To reach the maximum inversion rotation, actuators 1 and 2 must have lengths of 20 and 80 respectively, and the opposite holds for eversion, where their lengths must be 80 and 20. These values of 80 and 20 were obtained through the calibration process, and the angles in between are linearly interpolated, making it easy to determine the actuator length corresponding to a given prototype angle.

Fig. 9. Actuator Length vs. Angle in Degrees graph of the prototype during inversion-eversion and abduction-adduction movements.
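
The calibration just described can be summarized in a small hedged sketch that maps desired angles to actuator lengths using only the calibration points reported above (neutral at l = 50, extremes at 0/100 and at the 20/80 split); the interpolation function itself and the way the two motions are combined are our assumptions, not the authors' control software.

```python
# Hedged sketch of an angle-to-actuator-length mapping based on the reported
# calibration points; sign conventions and the combination rule are assumed.
def actuator_lengths(flexion_deg, inv_ever_deg):
    """flexion_deg: + plantarflexion (max 45), - dorsiflexion (max 30).
    inv_ever_deg: + inversion (max 30), - eversion (max 25)."""
    # common term: both actuators move together for dorsi/plantarflexion
    base = 50 + flexion_deg * (50 / 45.0 if flexion_deg >= 0 else 50 / 30.0)
    # differential term: actuators move oppositely for inversion/eversion
    diff = inv_ever_deg * (30 / 30.0 if inv_ever_deg >= 0 else 30 / 25.0)
    l1 = max(0.0, min(100.0, base - diff))
    l2 = max(0.0, min(100.0, base + diff))
    return l1, l2

print(actuator_lengths(0, 0))     # neutral position (both actuators at 50)
print(actuator_lengths(45, 0))    # maximum plantarflexion (both extend fully)
print(actuator_lengths(-30, 0))   # maximum dorsiflexion (both retract fully)
print(actuator_lengths(0, 30))    # maximum inversion (actuators at 20 and 80)
print(actuator_lengths(0, -25))   # maximum eversion (actuators at 80 and 20)
```

A GUI or control script could call such a mapping to command the two linear actuators for any desired combined movement within the calibrated range.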

ankle range of motion. The orthosis fits the shape of the ankle-foot complex and has minimal parts connected for simpler usability. The actuation was achieved through cable drives connected to linear actuators that provide extending and retracting motion for the movement of the foot relative to the lower leg. Because of cable-driven actuation, the device has separated the actuators and electronic components from the unit that shall be worn making it a low profile design that can be easily worn by the user. Through a wooden ankle mock up, the movement of the ankle joint has been simulated and all maximum ranges of motions along all three axes can be accommodated by the device with sufficient force during actuation. This first prototype design for the ankle serves as a reference for the implementation of the TAYO project under the Institute of Biomedical Engineering and Health Technologies (IBEHT). Iterations and improvements of the design of the device can be done towards having a workable design that can be used commercially for people with ankle injuries and disabilities such as in stroke patients. Acknowledgments. This research is part of the TAYO project under the Institute of Biomedical Engineering and Health Technologies (IBEHT) of De La Salle University Manila. Special thanks to the Department of Science and Technology – Philippine Council for Health Research and Development DOST-PCHRD for the funding and implementation of this project.


A Cognitive Radio Spectrum Sensing Implementation Based on Deep Learning and Real Signals

Mohamed Saber1,2(B), Abdellah Chehri3, Abdessamad El Rharras1, Rachid Saadane1, and Mohammed Wahbi1

1 SIRC-LaGeS, Hassania School of Public Works, Casablanca, Morocco
[email protected], [email protected]
2 Computer Science Research Lab, Ibn Tofail University, Kenitra, Morocco
3 Department of Applied Sciences, University of Quebec, Chicoutimi, QC G7H 2B1, Canada
[email protected]

Abstract. In a cognitive radio environment, spectrum sensing is an essential phase for improving spectrum resource management. Based on a deep learning method and real signals, a new spectrum sensing implementation is proposed in this work. The real signals are generated, using an Arduino Uno card and a 433 MHz wireless transmitter, in ASK and FSK modulation types. The reception interface is constructed using an RTL-SDR receiver connected to MATLAB software. The signal classification is carried out by a convolutional neural network (CNN) classifier. Our proposed model's main objective is to identify the spectrum state (free or occupied) by classifying the received signals into licensed (primary) user signals or noise signals. Our proposed model's performance is evaluated by two metrics: the probability of detection (Pd) and the false alarm probability (Pfa). Finally, the proposed sensing method is compared with other techniques used for signal classification, such as energy detection, artificial neural networks, and support vector machines. The experimental results show that the CNN can classify the real signals better than traditional methods and machine learning methods.

Keywords: Cognitive radio network · Spectrum sensing · CNN · RTL-SDR · ASK-FSK signals

1 Introduction

In recent decades, the demand for spectrum resources has grown rapidly with the fast development of wireless communication services and the proliferation of the Internet of Things (IoT). The traditional static allocation of spectrum causes numerous challenges for the recent wireless industry. Since the IoT enables many autonomous devices to share information, spectrum scarcity is one of the main challenges due to the limited natural radio spectrum resource that supports wireless communications [1]. The statistics of spectrum allocation around the world show that the radio spectrum has been almost fully


allocated, and compared to the demands of high data-rate devices, it is evident that the current fixed spectrum allocation leads to inefficient spectrum utilization [2, 3]. Additionally, some frequency bands are not used all the time by their owners, while others are highly utilized. The unused frequencies are termed white spaces or spectrum holes. A spectrum hole is a spectrum band assigned to a licensed user, also known as a primary user (PU), but momentarily unoccupied by this PU. Therefore, due to this static allocation and the growth of user needs, the radio spectrum is inefficiently exploited while a secondary utilization is possible. Given the underutilization of the allocated spectrum, highly efficient spectrum sharing and access are required to exploit the radio spectrum effectively. In this context, Cognitive Radio (CR) networks were introduced in [4, 5] as an emerging paradigm that ensures efficient spectrum allocation to address spectrum scarcity. In CR networks, the unlicensed users, also known as secondary users (SUs), can access the spectrum of authorized users (PUs) if the primary spectrum is idle, or simultaneously share it with the PUs as long as the PUs' services can be fully protected. By doing this, the SUs can gain transmission opportunities without requiring a dedicated spectrum. Therefore, CR is considered a critical enabling technology for dynamic spectrum access (DSA) [6]. To enable DSA, a cognitive user (SU) needs to obtain correct information about the spectrum state (free or occupied), so that the quality of service (QoS) of the PUs can be guaranteed. Two methods can be adopted by SUs to detect spectrum holes. The first is a geolocation database, which can be used when the PU's activity is regular and highly predictable. The second is spectrum sensing, in which the SU senses the spectrum, periodically or continuously, to observe the primary spectrum and detect the spectrum holes. Figure 1 illustrates a typical cognitive cycle for a CR, as introduced by Haykin in [7]. Enabling DSA with CR is a technical issue that involves multidisciplinary efforts from various research communities, such as signal processing, information theory, communications, computer networking, and machine learning. As shown in Fig. 1, the essential phase of the CR cycle is the spectrum sensing operation (radio-scene analysis). Its objective is to sense the spectrum holes to obtain the band state (free/occupied); based on this result, data transmission is adapted to optimize performance by maximizing the throughput and minimizing the interference. Different spectrum sensing techniques have been discussed in the literature [8, 9], such as the cyclostationarity feature detection technique [10], energy detection techniques [11], and the matched filter detection method [10]. Efficient spectrum sensing can be achieved by adopting strategies that can classify signals, with high accuracy, into PU signals or noise signals. Deep learning has recently gained popularity in signal processing for applications involving signals/time-series data such as voice assistants, digital health, radar, and wireless communications. In this context, we propose a deep learning model for the spectrum sensing operation, in which we use real-world signals that represent the PU signals. The performance of our proposed model is compared with our previously proposed classifiers in [12]. The remainder of this paper is organized as follows. In Sect. 2, we describe the proposed system model for spectrum sensing, and we present the generation and acquisition of the used database. Section 3 gives more detailed descriptions of the system model


Fig. 1. Basic cognitive cycle (The figure focuses on three fundamental cognitive tasks).

implementation and presents the results and discussion. Finally, Sect. 4 concludes the paper.

2 System Model

In this paper, we consider a spectrum sensing model that classifies the received signals from a single antenna (RTL-SDR device) into PU signals or noise signals. Our proposed platform can operate in a dynamic spectrum access environment to identify the spectrum holes and attain higher spectrum utilization. With MATLAB and the Python 3 language, we can develop a deep learning model and build real-world smart signal processing systems. In general, there are four steps involved. The first step in building a deep learning model is to access and manage the data. Using MATLAB, we can acquire signals from a variety of hardware devices. We can also generate synthetic data via simulation or use data augmentation techniques. We propose a model that transmits real ASK/FSK signals by using the Arduino Uno card and the 433 MHz transmitter. The transmitted signals are collected by an RTL-SDR device connected to the MATLAB environment. The received signals are processed as features and fed into a CNN classifier developed in the Python language to decide the PU availability. Once the data is collected and ready, we interpret the signal data and label it. We visualize and analyze the received signals with attribute regions. We use domain-specific tools to label all our signals to prepare our data for training. There are two main approaches to performing deep learning on signals. The first approach involves converting signals into time-frequency representations and training custom convolutional neural networks to extract patterns directly from these representations. A time-frequency representation is a view of a signal represented over both time and frequency. More specifically, it describes how spectral components in signals evolve as a function of time [13].


This approach enhances patterns that may not be visible in the original signal. There are various techniques for generating a time-frequency representation from a signal and saving it as an image, including spectrograms, continuous wavelet transforms (scalograms), and constant-Q transforms. The second approach involves feeding signals directly into deep networks, such as Long Short-Term Memory (LSTM) networks. To make the deep network learn the patterns more quickly, a reduction of the signal dimensionality and variability is needed in this approach. We have two options: we can manually identify and extract features from signals, or we can automatically extract features using invariant scattering convolutional networks, which provide low-variance representations without losing critical information. In our work, we adopt the first approach for our signals, and we follow the steps described in Fig. 2 for training and testing the proposed deep network.

Fig. 2. The proposed model.
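As an illustration of the first approach, the short sketch below converts a captured signal into a normalized spectrogram image suitable as CNN input. It is only a minimal example: the sampling rate, window length, and the synthetic test signal are assumptions for demonstration, not the exact settings used in this work.

```python
import numpy as np
from scipy.signal import spectrogram

def to_time_frequency(samples, fs=2.4e6, nperseg=256, noverlap=128):
    # Compute a spectrogram (time-frequency representation) of a captured signal.
    # Magnitudes are converted to dB and min-max normalized so the result can be
    # fed to a CNN like a single-channel image.
    f, t, Sxx = spectrogram(samples, fs=fs, nperseg=nperseg, noverlap=noverlap)
    Sxx_db = 10 * np.log10(Sxx + 1e-12)
    Sxx_norm = (Sxx_db - Sxx_db.min()) / (Sxx_db.max() - Sxx_db.min() + 1e-12)
    return f, t, Sxx_norm

# Example with a synthetic noisy tone standing in for a captured ASK burst.
fs = 2.4e6
n = 4096
tone = np.sin(2 * np.pi * 100e3 * np.arange(n) / fs)
noisy = tone + 0.5 * np.random.randn(n)
_, _, img = to_time_frequency(noisy, fs=fs)
print(img.shape)  # (frequency bins, time frames)
```

Other time-frequency representations mentioned above (scalograms, constant-Q transforms) could be substituted for the spectrogram in the same pipeline.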


2.1 Database Generation

Deep learning is a family of machine learning methods that, based on artificial neural networks, enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Generally, learning can be supervised, semi-supervised, or unsupervised [14, 15]. In our context, we have supervised learning. Supervised learning methods are generally composed of two main phases: training/learning and classification. To make our work more independent and realistic, we create our own signal data. For this reason, we have constructed a real signal emitter by using an Arduino Uno card and a 433 MHz wireless device that transmits signals in ASK/FSK modulation. The Arduino Uno is a microcontroller board [16] used to construct and program electronics projects. It is flexible, inexpensive, and offers a variety of digital and analog inputs. The Arduino Uno, programmed in simplified C++, can send and receive information to most devices, in our case the 433 MHz ASK/FSK module, and even over the internet to control specific electronic devices. The 433 MHz ASK/FSK module operates at 434 MHz to transmit radio signals. It is commonly used in wireless data transfer applications such as mobile robots and remote control. It receives programmed serial data from a specific platform and transmits it wirelessly over the radio frequency via its antenna connected to pin 4 of the Arduino Uno.

2.2 Database Acquisition

Spectrum sensing is based on a well-known technique called signal detection, which allows us to determine whether a given frequency band is being used. In our work, we consider an SU spectrum sensor with one antenna (represented by the RTL-SDR in Fig. 2). Mathematically, signal detection can be reduced to a simple identification problem, formalized as a hypothesis test [17, 18]. There are two hypotheses: H0, the PU is inactive; and H1, the PU is active. The received signal at the antenna is given by:

y(k) = n(k)            under H0
y(k) = s(k) + n(k)     under H1        (1)

where y(k) is the sample of the detected signal in the licensed channel to be analyzed at each instant k, and n(k) is the received noise plus possible interference (not necessarily white Gaussian noise) of variance σ². Under hypothesis H1, s(k) represents the signal that the network wants to detect, which is the signal transmitted through the wireless channel. Generally, the receiver reconstructs the sent message from the captured signal by inverting the processing operations done at transmission. Those processes can be performed using a single RTL-SDR device and MATLAB software. RTL-SDR, or Software Defined Radio, is a radio communication system wherein the traditional hardware components (e.g., mixers, filters, amplifiers, modulators/demodulators, detectors, etc.) are instead implemented by means of software on an embedded system [19–27]. The RTL-SDR is capable of receiving any signal in its frequency range, which varies depending on the type of device used. In this work, the used dongle has a frequency capability of approximately 25 MHz–1750 MHz. The RTL-SDR software provides the same functions as a traditional receiver: selection of the desired frequency and bandwidth, demodulation of the received signal, and interference suppression (noise blanker).


Figure 3 shows the block diagram of the different processing stages of an RTL-SDR dongle.

Fig. 3. Synoptic diagram of SDR card.
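To make the detection problem in (1) concrete, the following minimal simulation generates received samples under H0 and H1 and applies a simple energy-detector decision, in the spirit of the ED baseline compared against later. The waveform, SNR, and threshold are illustrative assumptions and do not reproduce the captured ASK/FSK database.

```python
import numpy as np

rng = np.random.default_rng(0)

def received_samples(pu_active, n=1024, snr_db=5.0, noise_var=1.0):
    # Generate y(k) according to (1): noise only under H0, signal plus noise under H1.
    noise = rng.normal(0.0, np.sqrt(noise_var), n)
    if not pu_active:
        return noise
    amp = np.sqrt(noise_var * 10 ** (snr_db / 10.0))
    signal = amp * np.sign(np.sin(2 * np.pi * 0.05 * np.arange(n)))  # crude ASK-like waveform
    return signal + noise

def energy_detector(y, threshold):
    # Declare H1 when the average energy exceeds the threshold.
    return np.mean(y ** 2) > threshold

threshold = 1.3  # illustrative value; in practice chosen from a target Pfa
trials = 1000
pd = np.mean([energy_detector(received_samples(True), threshold) for _ in range(trials)])
pfa = np.mean([energy_detector(received_samples(False), threshold) for _ in range(trials)])
print(f"Pd  ≈ {pd:.3f}")
print(f"Pfa ≈ {pfa:.3f}")
```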

3 Implementation and Results

3.1 Description of the Used Data

In the proposed sensing model, the used database is collected in the SIRC/LAGES laboratory of the HASSANIA School of Public Works, for different distances between the transmitter and the receiver. It consists of 2000 ASK/FSK signals. Table 1 shows how we have organized the used database in the learning phase and test phase.

Table 1. The used database in the learning and testing phases

Signal           Learning phase   Test phase
Primary signal   600              500
Noise signal     400              500

This data is generated for several distances between the sender and the receiver because SUs which are close to the PU (e.g., SU1 and SU2 in Fig. 4) can probably detect the PU presence more reliably than SUs which are far away from the PU (e.g., SU3 in Fig. 4). Moreover, SUs close to each other (SU1 and SU2 in Fig. 4) are likely to report the same sensing results.
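For illustration, the split of Table 1 can be expressed as follows; the arrays below are random placeholders standing in for the 2000 captured records, and the 1024-sample signal length is an assumption.

```python
import numpy as np

# Organize the captured signals according to Table 1:
# 600 PU + 400 noise signals for learning, 500 + 500 for testing.
rng = np.random.default_rng(0)
pu_signals = rng.normal(size=(1100, 1024))    # placeholder arrays, one row per signal
noise_signals = rng.normal(size=(900, 1024))

x_train = np.vstack([pu_signals[:600], noise_signals[:400]])
y_train = np.concatenate([np.ones(600), np.zeros(400)])  # 1 = PU present (H1), 0 = noise (H0)

x_test = np.vstack([pu_signals[600:1100], noise_signals[400:900]])
y_test = np.concatenate([np.ones(500), np.zeros(500)])

print(x_train.shape, x_test.shape)  # (1000, 1024) (1000, 1024)
```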


Fig. 4. The effect of distance on spectrum sensing results

3.2 Performance Evaluation

As previously mentioned, H0 and H1 are the sensed states for the absence and presence of a signal, respectively. Therefore, as presented in Fig. 5, we can define four possible scenarios in detecting a signal. The first is declaring H0 when H0 is true (H0/H0), the second is declaring H1 when H1 is true (H1/H1), the third is declaring H0 when H1 is true (H0/H1), and the fourth is declaring H1 when H0 is true (H1/H0).

Fig. 5. Hypothesis test and possible scenarios with their corresponding probabilities.

The first and the second scenarios are known as the correct detection of a PU signal and of a spectrum hole, respectively, whereas scenarios 3 and 4 are known as a missed detection and a false alarm, respectively. The performance of our proposed spectrum sensing model is characterized by the probability of detection Pd and the false alarm probability Pfa.

Pd: the probability of detecting the PU as being present when the PU is truly present (the band is occupied, H1). It is desirable to keep the detection rate as high as possible for spectrum sensing, since a low Pd value (failed detection) causes interference with the PU. We calculated Pd by (2):

Pd = P(decision H1 | H1) = (Nc / N) × 100        (2)

Pfa: the probability of detecting the PU as being present (the band is occupied, H1) when the PU is actually absent (the band is not occupied, H0). It is desirable to keep the false alarm rate as low as possible for spectrum sensing, since a high Pfa value (false alarm) reduces the efficiency of spectrum use. We calculated Pfa by (3):

Pfa = P(decision H1 | H0) = (Ne / N) × 100        (3)

where Nc is the number of times the PU presence is declared under hypothesis H1, Ne is the number of times a PU signal is detected under hypothesis H0, and N is the number of all captured signals.

3.3 Results

The data acquisition part is implemented in MATLAB 9.4.0 (R2018a) on a 64-bit computer with a Core i5 processor, 2.4 GHz clock speed, and 8 GB RAM. The used energy detector is described in our previous work [12], whereas the ANN, SVM, and DNN detectors are implemented in Python 3. For the SVM detector, the used parameters are: C = 1.0, break_ties = False, cache_size = 200, class_weight = None, coef0 = 0.0, decision_function_shape = 'ovr', degree = 3, gamma = 'scale', kernel = 'rbf', max_iter = −1, probability = False, random_state = None, shrinking = True, tol = 0.001, verbose = False. The used parameters for the ANN detector are: activation = 'relu', alpha = 1e−05, batch_size = 'auto', beta_1 = 0.9, beta_2 = 0.999, early_stopping = False, epsilon = 1e−08, hidden_layer_sizes = (6,2), learning_rate = 'constant', learning_rate_init = 0.001, max_fun = 15000, max_iter = 200, momentum = 0.9, n_iter_no_change = 10, nesterovs_momentum = True, power_t = 0.5, random_state = 1, shuffle = True, solver = 'lbfgs', tol = 0.0001, validation_fraction = 0.1, verbose = False, warm_start = False. For the deep learning-based detector, we have used a fully connected network structure with three layers. Due to the high performance it can achieve, we have used the rectified linear unit (ReLU) activation function on the first two layers, and the sigmoid function in the output layer to ensure the network output is between 0 and 1, where 0 stands for H0 and 1 for H1. The first hidden layer has 1024 nodes, uses the ReLU activation function, and expects rows of data with 1024 variables (the input_dim = 1024 argument). The second hidden layer has 8 nodes and uses the ReLU activation function. The output layer has one node and uses the sigmoid activation function. In a radio channel, the received signal at CR devices is processed as a feature vector and fed into a classifier to decide the channel availability. The classifier categorizes each processed signal into one of two classes, namely H0, the "channel available" class, and H1, the "channel unavailable" class. The obtained values of Pd and Pfa for the proposed detection techniques are compared in Table 2.

Table 2. Performance evaluation of the proposed sensing techniques.

Detection technique   Pd      Pfa      Classifier accuracy (%)
ED                    0.961   0.0120   -
ANN                   0.986   0.0011   96.7
SVM                   0.977   0.0011   95.7
CNN                   0.994   0.0023   100

3.4 Discussion

The obtained results manifest the capability of the proposed classifiers in detecting the PU presence. As we can see, the artificial intelligence-based techniques significantly outperform the traditional method (ED) on spectrum sensing. Therefore, we can obtain the spectrum state with high efficiency with machine learning and deep learning techniques. Compared to all other classifiers, the supervised CNN classifier performs well in terms of the detection probability. Its higher detection efficiency compensates for the computational complexity of this classifier. Moreover, the proposed deep learning classifier shows the highest sensing accuracy. This sensing accuracy is crucial for the protection of primary users and the detection of spectrum holes. The accuracy history of our proposed deep learning classifier is plotted in Fig. 6. From the classification accuracy plot, we can see that the model achieves high accuracy in a low number of epochs. On the other hand, Fig. 7 shows the classification loss curve for the training and the testing. The obtained results show that the proposed classifier improves radio signal classification accuracy and efficiency compared with other methods, including energy detection, ANN, and SVM.
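For reference, a minimal Keras sketch of the fully connected detector described in Sect. 3.3 is given below, together with the computation of Pd and Pfa from its decisions. The layer sizes follow the text; the optimizer, loss, number of epochs, and the placeholder data are assumptions, and the metrics here are computed per class rather than with the exact normalization of (2) and (3).

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Fully connected detector with the layer sizes described above:
# 1024-unit ReLU layer, 8-unit ReLU layer, 1-unit sigmoid output (0 -> H0, 1 -> H1).
model = keras.Sequential([
    layers.Input(shape=(1024,)),
    layers.Dense(1024, activation="relu"),
    layers.Dense(8, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Placeholder data standing in for the captured ASK/FSK feature vectors.
x_train = np.random.randn(1000, 1024).astype("float32")
y_train = np.random.randint(0, 2, size=(1000,))
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)

# Detection and false-alarm rates computed from the detector's decisions.
x_test = np.random.randn(1000, 1024).astype("float32")
y_test = np.random.randint(0, 2, size=(1000,))
decisions = (model.predict(x_test, verbose=0).ravel() > 0.5).astype(int)
pd = np.mean(decisions[y_test == 1] == 1) * 100   # detection when the PU is present (H1)
pfa = np.mean(decisions[y_test == 0] == 1) * 100  # false alarm when the PU is absent (H0)
print(f"Pd = {pd:.1f}%, Pfa = {pfa:.1f}%")
```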

Fig. 6. Model accuracy in the training and testing phases.


Fig. 7. Model loss in the training and testing phases

In the future, we will evaluate cognitive radio for Vehicular Ad hoc Networks (CRVs or CR-VANETs). CR-VANETs are a new trend in the automotive market. Recent and future vehicles will offer functionalities for the transmission of intra-vehicular commands and dynamic access to wireless services while the car is in transit [28, 29].

4 Conclusions

In this paper, we used a deep learning model for spectrum sensing in cognitive radio networks. The proposed model allows us to identify the spectrum state by classifying the captured signals into PU signals or noise signals. A simplified spectrum sensing implementation has been proposed based on real signals generated by an Arduino Uno card and a 433 MHz wireless transmitter. The transmitted signals are then captured in MATLAB software by an RTL-SDR dongle and classified using the deep learning model. The system performance is evaluated by two metrics, the probability of detection and the probability of false alarm, and the obtained results are compared with three other techniques: energy detection (ED), artificial neural networks (ANN), and support vector machine (SVM). The experimental results showed that the deep learning model using a convolutional neural network (CNN) achieves better classification accuracy and faster training speed than the other methods.

References

1. Identification and quantification of key socio-economic data to support strategic planning for the introduction of 5G in Europe-SMART, 2014/0008. Technical report, European Union (2016)


2. FCC Spectrum Policy Task Force: Report of the spectrum efficiency working group (2002). http://www.fcc.gov/sptf/reports.html
3. Kolodzy, P., et al.: Next generation communications: Kickoff meeting. In: Proceedings of DARPA (2001)
4. Mitola, J., Maguire, G.Q.: Cognitive radio: making software radios more personal. IEEE Pers. Commun. 6(4), 13–18 (1999)
5. Mitola III, J.: Cognitive radio for flexible mobile multimedia communications. Mobile Netw. Appl. 6(5), 435–441 (2001)
6. Akyildiz, I.F., Won-Yeol, L., Vuran, M.C., et al.: A survey on spectrum management in cognitive radio networks. IEEE Commun. Mag. 46(4), 40–48 (2008)
7. Haykin, S.: Cognitive radio: brain-empowered wireless communications. IEEE J. Sel. Areas Commun. 23(2), 201–220 (2005)
8. Yucek, T., Arslan, H.: A survey of spectrum sensing algorithms for cognitive radio applications. IEEE Commun. Surv. Tut. 11(1), 116–130 (2009)
9. Elrharras, A., Saadane, R., Wahbi, M., et al.: Signal detection and automatic modulation classification-based spectrum sensing using PCA-ANN with real word signals. Appl. Math. Sci. 8(160), 7959–7977 (2014)
10. Cabric, D., Mishra, S.M., Brodersen, R.W.: Implementation issues in spectrum sensing for cognitive radios. In: Signals, Systems and Computers (2004)
11. Digham, F.F., Alouini, M.-S., Simon, M.K.: On the energy detection of unknown signals over fading channels. In: Communications, ICC 2003 (2003)
12. Saber, M., El Rharras, A., Saadane, R., et al.: Artificial neural networks, support vector machine and energy detection for spectrum sensing based on real signals. Int. J. Commun. Netw. Inf. Secur. 11(1), 52–60 (2019)
13. Sejdić, E., Djurović, I., Jiang, J.: Time–frequency feature representation using energy concentration: an overview of recent advances. Digital Signal Processing 19(1), 153–183 (2009)
14. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
15. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Networks 61, 85–117 (2015)
16. Banzi, M.: Getting Started with Arduino. O'Reilly Media, Inc. (2009)
17. Van Trees, H.L.: Detection, Estimation and Modulation Theory, vol. 1. Wiley (1968)
18. Poor, V.: An Introduction to Signal Detection and Estimation. Springer, Berlin (1994)
19. Dillinger, M., Madani, K., Alonistioti, N.: Software Defined Radio: Architectures, Systems and Functions. John Wiley & Sons (2005)
20. Chehri, A., Mouftah, H.: An empirical link-quality analysis for wireless sensor networks. In: 2012 International Conference on Computing, Networking and Communications (ICNC), Maui, HI, pp. 164–169 (2019). https://doi.org/10.1109/iccnc.2012.6167403
21. Farjow, W., Chehri, A., Hussein, M., Fernando, X.: Support vector machines for indoor sensor localization. In: 2011 IEEE Wireless Communications and Networking Conference, Cancun, Quintana Roo, pp. 779–783 (2011). https://doi.org/10.1109/WCNC.2011.5779231
22. Chehri, A., Mouftah, H., Farjow, W.: Indoor cooperative positioning based on fingerprinting and support vector machines. In: Sénac, P., Ott, M., Seneviratne, A. (eds.) Mobile and Ubiquitous Systems: Computing, Networking, and Services. MobiQuitous 2010. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 73. Springer, Berlin, Heidelberg (2012)
23. Chehri, H., Chehri, A., Saadane, R.: Traffic signs detection and recognition system in snowy environment using deep learning. In: Fifth International Conference on Smart City Applications, Safranbolu, Turkey (2020)


24. Saber, M., El Rharras, A., Saadane, R., Chehri, A., Hakem, N., Kharraz, H.A.: Spectrum sensing for smart embedded devices in cognitive networks using machine learning algorithms. In: 24th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Verona, Italy, to appear 16–18 September (2020)
25. Chehri, A.: Non-cooperative spectrum allocation based on game theory in IoT-oriented narrowband PLC networks. In: IEEE 91st Vehicular Technology Conference, Antwerp, Belgium (2020)
26. Saber, M., Saadane, R., Aroussi, H., Chehri, A.: An optimized spectrum sensing implementation based on SVM, KNN and TREE algorithms. In: IEEE 15th International Conference on Signal Image Technology & Internet Based Systems, Sorrento (NA), Italy (2019)
27. Slalmi, A., Saadane, R., Chehri, A.: Energy efficiency proposal for IoT call admission control in 5G network. In: IEEE 15th International Conference on Signal Image Technology & Internet Based Systems, Sorrento (NA), Italy (2019)
28. de Carvalho, F.B., Lopes, W.T., Alencar, M.S., José Filho, V.S.: Cognitive vehicular networks: an overview. Procedia Computer Science 65, 107–114 (2015)
29. Chehri, A., Mouftah, H.T.: Autonomous vehicles in the sustainable cities, the beginning of a green adventure. Sustain. Cities Soc. 51, 101751 (2019)

Appliance-Level Monitoring with Micro-Moment Smart Plugs

Abdullah Alsalemi1(B), Yassine Himeur1, Faycal Bensaali1, and Abbes Amira2

1 Electrical Engineering, Qatar University, Doha, Qatar
{a.alsalemi,yassine.himeur,f.bensaali}@qu.edu.qa
http://em3.qu.edu.qa
2 Institute of Artificial Intelligence, De Montfort University, Leicester, UK
[email protected]

Abstract. The human population is striving against energy-related issues that not only affect society and the development of the world, but also cause global warming. A variety of broad approaches have been developed by both industry and the research community. However, there is an ever-increasing need for comprehensive, end-to-end solutions aimed at transforming human behavior rather than device metrics and benchmarks. In this paper, a micro-moment-based smart plug system is proposed as part of a larger multi-appliance energy efficiency program. The smart plug, which includes two sub-units, the power consumption unit and the environmental monitoring unit, collects the energy consumption of appliances along with contextual information, such as temperature, humidity, luminosity, and room occupancy. The plug also allows home automation capability. With the accompanying mobile application, end-users can visualize energy consumption data along with ambient environmental information. Current implementation results show that the proposed system delivers cost-effective deployment while maintaining adequate computation and wireless performance.

Keywords: Smart plug · Domestic energy usage · Energy efficiency · Recommender systems · Micro-moments · Internet of things

1 Introduction

Undoubtedly, energy saving and energy security have become major contemporary issues. We are facing an energy shortage that not only affects the world's economy, environment, and growth, but also results in global warming [8]. A set of recent developments is about to change this picture and propose effective energy policies. These policies explicitly or indirectly create drivers that, from the point of view of business and end-users, are perceived to be any action taken


by the energy-efficient systems that are created. In Qatar, the goal of the TARSHEED initiative is to raise awareness of energy-saving activities and of unnecessary energy use in the country as a whole [15]. This has been pursued through a variety of advertising promotions, standards, and competitions [1]. Smart metering is instrumental in collecting and analyzing energy market data [22]. The smart meter is an important tool for managing the energy consumption curve efficiently. This calls for a connection between the quantity of usage and the quantity of production, allowing for the substitution of flat-rate prices with better strategies [16]. There is, therefore, a need for the development and standardization of metering schemes. One of the main elements of smart metering is tracking consumption at the device level. This enables the processing of data linked to each device without the need for aggregation algorithms. This is why smart plugs can play a significant role in the monitoring and operation of domestic appliances: end-users mount them for each device at a fairly low cost, thus obtaining the benefit of precise measurement of power usage and home automation. In this sense, we introduce micro-moments, which are time-based slots during which the end-user uses an appliance or occupies a space (e.g., room, corridor, hall, etc.) [4]. This concept enables the extraction of the different moments when the end-user engages in unhealthy energy consumption patterns, making it easier to classify them and produce recommendations to enhance the energy efficiency of the household. In this article, we present a micro-moment smart plug as part of the broader (EM)3 project, which seeks to use artificial intelligence to achieve high domestic energy performance. The (EM)3 smart plug enables real-time power measurements to be collected in addition to environmental information, such as temperature, humidity, and occupancy. The remainder of this paper is structured as follows. Section 2 outlines the latest research on smart plugs. Section 3 provides a description of the broader (EM)3 framework and its elements. The smart plug is described in Sect. 4, and data visualization scenarios are described in Sect. 5. System implementation results are expounded upon in Sect. 6. The paper is concluded in Sect. 7.

2 Related Work

In this section, an overview of recent developments in smart plugs is provided. First, Ahmed et al. proposed a smart plug prototype that measures power consumption in home energy management systems [2]. It is assisted by a Zigbee microcontroller. Their findings demonstrate that the suggested plug absorbs less power and achieves better precision in comparison with an oscilloscope. In fact, the system provides the option to connect/disconnect the attached device from the power supply. In addition, Shajahan and Anand have suggested an Arduino microcontroller-based smart plug that uses the ENC28J60 for communication [21]. It is powered by a split-core current transformer for non-invasive current measurement and an Android-based user interface.


To alleviate the problem of peak shortages, Ganu et al. have suggested a cost-effective smart plug that controls the connection of loads to the grid (by switching them on or off) during on- and off-peak times [9]. This is done with the help of real-time analysis and data collection [7]. It allows off-peak and unpredictable scheduling by addressing user choice settings in addition to grid load conditions. In another study relevant to in-device detection, Petrovic et al. suggested a smart plug for electrical load detection based on an active sensing method [17]. The input signal produced by the smart plug is modified before the output is calculated in such a way that more distinct real-time data is created. An artificial neural network is used to evaluate the approach with respect to classification performance metrics: accuracy and speed. Environmental control is another important aspect. That is why Gomes et al. have suggested a smart plug that is aware of environmental factors and of the device resource context [10]. The technique adopts a multi-agent architecture that helps an agent to react to any changes that arise and to communicate with other agents. Building on the literature studied, the main contributions of this paper are summarized as follows:
– A modern application of the concept of micro-moments in the control of energy use.
– The design and implementation of a micro-moment based smart plug for collecting contextual details, such as temperature, humidity, brightness and room occupancy.
– The smart plug enables edge computing features, such as the identification of several devices attached to the smart plug.

3 The (EM)3 Energy Efficiency Framework

The Consumer Engagement Towards Energy Saving Behavior by means of Exploiting Micro Moments and Mobile Recommendation Systems (EM)3 platform has been developed to promote customer behavioral improvement through increasing energy use understanding [4]. The (EM)3 system consists of the following components [3]:
1. Data Collection: gathers data dependent on micro-moments for power usage and environmental monitoring [6].
2. Classification: detects and analyzes abnormal energy patterns [13].
3. Suggestions and Automation: provides personalized guidance to end-users to endorse energy saving management activities coupled with recommended actions for environmental change [19].
4. Visualization: uploads results, observations and feedback in an accessible and engaging manner via a mobile application.
Sensing devices play an essential role in capturing data in a given datastore [5]. They are used for wirelessly uploading collected data to a testing center at


Qatar University (QU). The study lab consists of a variety of testing cubicles, some of which are equipped with sensing instruments connected to the (EM)3 storage server housed in the QU building. Each cubicle contains data aggregated from monitors, screens, and table lamps. The No-SQL CouchDB cloud platform is used to store customer micro-moments and usage levels, user expectations and resources, energy management guidelines, and ranking ratings. The dataset used in this study, the Qatar University Dataset (QUD), is compiled in a micro-moment laboratory at QU [5]. The set-up collects environmental data (indoor temperature and humidity, room luminosity, and motion) and the power consumption of a number of appliances (i.e., light bulb, computer). Data is collected using wireless sensing modules that communicate in real time with the backend. QUD contains data points separated by a couple of seconds each. The dataset was collected prior to this work.
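As an illustration of how a micro-moment record could be pushed to the CouchDB backend, the sketch below posts one appliance reading with its ambient context over CouchDB's HTTP API. The server address, database name, credentials, and document fields are assumptions; the project's actual schema is not described in the text.

```python
import datetime
import requests

COUCHDB_URL = "http://localhost:5984"  # assumed address of the CouchDB backend
DB_NAME = "em3_micromoments"           # hypothetical database name (assumed to already exist)
AUTH = ("admin", "password")           # placeholder credentials

def push_reading(appliance, power_w, temperature_c, humidity_pct, lux, occupied):
    # Store one appliance-level reading with its ambient context as a CouchDB document.
    doc = {
        "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
        "appliance": appliance,
        "power_w": power_w,
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
        "luminosity_lux": lux,
        "occupied": occupied,
    }
    resp = requests.post(f"{COUCHDB_URL}/{DB_NAME}", json=doc, auth=AUTH, timeout=5)
    resp.raise_for_status()
    return resp.json()  # contains the generated document id and revision

if __name__ == "__main__":
    print(push_reading("desk lamp", 11.2, 23.5, 41.0, 310.0, True))
```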

4 Proposed System Design

The (EM)3 smart plug helps to combine both energy tracking at the device level and the environmental details of the current household space. Data on the energy usage of equipment is considered the key piece of knowledge, whereas environmental factors such as temperature, humidity, luminosity, and occupancy provide the context in which the usage information is evaluated. The system is composed of two sub-units: the power consumption unit and the environmental monitoring unit.

4.1 Power Consumption Unit

Figure 1 provides a description of the (EM)3 smart plug. The power wire from the grid travels into the smart plug and goes to the extension socket where the appliance(s) can be operated. The smart plug modules measure the current drawn by the equipment attached to the extension cord and combine it with the nominal mains voltage of the country in which it is operated. The HLW8012 intrusive power sensor was used to measure current values up to 20 A with 5% tolerance.

4.2 Environmental Monitoring Unit

The environmental unit is used to measure the temperature, humidity, luminosity, and occupancy of a given household room. The data is transmitted in real time to the backend for further processing. The HC-SR501 motion sensor is used to assess space occupancy, and the DHT-22 temperature and humidity sensor operates between −40 and 80 °C and 0 and 100% for ambient temperature and relative humidity, respectively. The TSL2591 light sensor can sense light in the range of 0.1–40,000 lx. The sensors are described in Table 1. In addition, real-time synchronization between the chosen micro-controller and the server side is accomplished, where a delay of only a millisecond is required.


Fig. 1. Overview of the micro-moment smart plug.

Real-time synchronization is checked by matching the time stamp on the sensor-side microcontroller with the time stamp on the server side. Data obtained from the various data collection programs is stored on the server side. Data fusion algorithms are used to combine the data and provide a full overview of the monitored environment [11]. This also provides scalability if further sensors are to be installed. In addition, machine learning classifiers are trained on the aggregated data, where they can be used to learn what energy use looks like [12]. The performance of such classifiers can be checked by measuring precision, recall, sensitivity, F1 score, and accuracy, along with a confusion matrix [14]. The recommendation framework [20], based on these algorithms, communicates advice and recommendations persuading energy customers from their smartphones [18]. Once excessive energy consumption has been detected, suggestions are sent to users to advise them on how to reduce it. The simulation of energy consumption data, the use of existing datasets, and the setting up of the actual environment are all used to check the validity of the recommendations provided. In addition, since the smart plug can be connected to a socket extension, multiple appliances can be connected and identified. This can reduce the accuracy of determining which device consumes how much electricity. For this purpose, we are focusing on the implementation of a range of appliance recognition algorithms to recognize the appliances used with high precision [13]. It is noteworthy to mention that the internal power consumption of the plug is designed to


minimally affect the overall consumption, and multi-appliance identification is under development.

Table 1. Smart plug sensors

Name                       Description                                                                                                  Components used
Energy monitoring          Measures appliances' power consumption in Watts                                                              ACS712 invasive Hall effect current sensor
Occupancy                  Detects whether the room is occupied. Selected for accuracy                                                  AM312 motion sensor
Temperature and humidity   Measures indoor ambient temperature and relative humidity. Selected for cost-efficiency and adequate accuracy   DHT22
Luminosity                 Measures the room's luminosity in lux. Selected for accuracy and wide range                                  Adafruit TSL2591

5 Consumer Data Visualization

Fig. 2. The (EM)3 mobile application: (a) the electric power consumption, (b) air-related consumption, and (c) appliance control screens.

After data collection and preliminary data analysis, data visualization is provided through a mobile application where users can actually see their consumption. In addition, meaningful information and data are provided to moderate users’ behavior towards energy efficiency. In this way, the smart plug data is easily displayed and the signals behind it are shown.


The (EM)3 mobile application seeks to encourage energy conservation by visualizing smart plug data in real time. Figure 2 shows the main screens of the first version of the (EM)3 mobile app. Line plots are commonly used to present the data, along with a small summary that offers supplementary detail on the chosen device (as seen in Fig. 2a and 2b). Additional energy efficiency indicators are shown in Fig. 2a, together with ambient environmental information (i.e., indoor and outdoor temperature and humidity, indoor illumination, and room occupancy). A data visualization analysis has been undertaken to evaluate the right visualization for energy end-users and to improve the field of data visualization. The best visualizations for energy consumption data are suggested1.

6 Results

This section outlines the current implementation and performance of the proposed micro-moment smart plug. In terms of configuration, the smart plug comprises a printed circuit board (PCB), a 3D-printed plastic casing, a connector, and an outlet extension. The results are described for both of the system's sub-units.

6.1 Power Consumption Unit

As the proposed micro-moment smart plug allows several devices to be connected at the same time, it is of vital importance to incorporate an appliance identification program that can recognize each unit using its power consumption signature. In this regard, we present a simple but effective method that first detects appliance events using a cepstrum-based detector defined in [13]. Then, a combination of two time-domain feature extraction algorithms, the root mean square (RMS) and the mean absolute deviation (MAD), is used to identify each connected device from the observed events. On this basis, various machine learning algorithms are deployed to identify five types of appliances using specific parameter settings, namely support vector machine (SVM), K-nearest neighbors (KNN), decision tree (DT), deep neural networks (DNN), and decision bagging tree (DBT). A 10-fold cross-validation protocol has been used to test the appliance recognition method. Table 2 shows the accuracy and F1 score results obtained using the proposed summation-based fusion technique compared with the use of the RMS and MAD descriptors separately. It is clear that this fusion strategy can improve the identification accuracy by 2.46% and 2.38% compared to MAD and RMS, respectively. Similarly, the F1 score was improved by 3.34% and 2.59%, respectively, relative to MAD and RMS. It is worth mentioning that the smart plug's device recognition capabilities are in the early stages of development. In addition, the impact of the fusion strategy on the performance of the proposed appliance recognition has been assessed. The findings indicate strong

http://em3.qu.edu.qa/index.php/data-visualization-app.


Fig. 3. Power consumption unit PCB.

Table 2. Performance of the proposed descriptor fusion used to recognize electrical appliances.

ML algo   Classifier parameters             RMS Acc   RMS F1   MAD Acc   MAD F1   Fusion Acc   Fusion F1
SVM       Linear kernel                     89.05     88.74    88.22     88.74    91.76        91.59
SVM       Quadratic kernel                  90.22     89.57    89.22     88.57    92.83        91.57
SVM       Gaussian kernel                   91.63     90.6     91.27     89.71    92.7         92.23
KNN       K=1 / Euclidean distance          92.24     91.19    93.24     90.19    93.95        93.66
KNN       K=10 / Weighted Euclidean dist    93.65     92.33    93.11     91.96    94.92        94.85
KNN       K=10 / Cosine dist                90.75     89.43    89.75     89.43    92.87        92.7
DT        Fine, 100 splits                  93.94     93.69    93.59     93.22    95.63        95.51
DT        Medium, 20 splits                 90.79     90.17    90.44     90.32    93.44        93.11
DT        Coarse, 4 splits                  87.92     86.83    87.77     87.4     90.57        90.41
DNN       50 hidden layers                  93.25     92.68    92.22     91.55    95.11        94.78
DBT       30 learners, 42k splits           96.41     95.93    96.33     95.18    98.49        98.32

performance rates obtained by combining the TD descriptors with the DBT classifier (30 learners, 42k splits) and considering three fusion strategies. It is obvious that the summation-based approach produces the best accuracy and F1 ratings. Specifically, 98.79% accuracy and 98.52% F1 score were achieved by the summation-based fusion, while 96.98% accuracy with 96.49% F1 score and 97.43% accuracy with 97.11% F1 score were achieved by the combination and multiplication approaches, respectively. In terms of hardware, the PCB is the heart of the power plug. The board, shown in Fig. 3, features a self-powered design, eliminating the need for a separate power source to run it. In addition, a relay is installed to allow remote control of the device, and intrusive energy monitoring is used due to the direct contact with the device. The PCB is designed to support two types of micro-controllers: both the Arduino MKR1010 and the ESP32 can be used. The (EM)3 smart plug is projected to greatly accelerate the implementation of domestic energy use control systems worldwide. It can be manufactured at a cost of between 20 USD and 40 USD. Performance-wise, the micro-controllers used can perform fairly sophisticated computation in addition to real-time wireless connectivity.
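A minimal sketch of the kind of time-domain feature extraction and bagged-tree classification described in Sect. 6.1 is shown below. The synthetic windows, the simple concatenation of the RMS and MAD features, and the bagging settings are assumptions for illustration and do not reproduce the authors' summation-based fusion rule or dataset.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def rms(window):
    # Root mean square of a power-consumption window.
    return np.sqrt(np.mean(window ** 2))

def mad(window):
    # Mean absolute deviation of a power-consumption window.
    return np.mean(np.abs(window - np.mean(window)))

def extract_features(windows):
    # One RMS and one MAD value per window; here they are simply concatenated.
    return np.array([[rms(w), mad(w)] for w in windows])

# Synthetic stand-in data: 500 windows of 256 power samples, 5 appliance classes.
rng = np.random.default_rng(1)
labels = rng.integers(0, 5, size=500)
windows = np.array([rng.normal(loc=10 * (c + 1), scale=1 + c, size=256) for c in labels])

features = extract_features(windows)
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30, random_state=0)
scores = cross_val_score(clf, features, labels, cv=10)  # 10-fold cross-validation
print(f"Mean accuracy: {scores.mean():.3f}")
```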

6.2 Environmental Monitoring Unit

Similarly to the power consumption unit, the environmental monitoring unit is housed on a PCB that contains the micro-controller and the sensors needed to monitor temperature, humidity, luminosity, and presence. The micro-controller is an ESP32, which supports wireless transmission to the CouchDB backend after preliminary post-processing. In effect, on-board processing reduces the noise in the data as well as the packet size. Performance, communication latency, and cost have been computed and compared in Table 3. The smart plug micro-controller has been programmed with a platform-agnostic Arduino program. The PCB is shown in Fig. 4.

Table 3. Smart plug performance per micro-controller

Used board name     Processing speed (s)   Communications latency (s)   Cost (USD)
ESP-WROOM-32        0.16                   3.19                         10
Arduino MKR 1010    1.05                   2.25                         33.90

Fig. 4. Environmental sensing unit PCB.


The concept of a smart plug that processes data before pushing it to the cloud is linked to the trend of edge computing, where some processing takes place where the data is produced, saving communication resources and increasing efficiency. Compared to the literature reviewed, the proposed approach provides the advantage of micro-moment extraction, which allows for a more accurate analysis of daily consumption. It also enables several devices to be attached at the same time. However, the current implementation does have a number of drawbacks. The thickness of the device is a downside, as it is considered cumbersome relative to many existing solutions. In addition, in terms of cyber security, the current implementation lacks the various cyber-attack defense mechanisms that will be considered in future publications. Eventually, a more computationally capable board such as the ESP32-S2 may be used to run more complex on-chip classification algorithms.

7 Conclusions

In this paper, a micro-moment smart plug is proposed as part of the (EM)3 framework. The smart plug, which includes two sub-units, the power consumption unit and the environmental monitoring unit, collects the energy consumption of appliances along with contextual information such as temperature, humidity, luminosity, and room occupancy. The plug also allows home automation capabilities. With the (EM)3 mobile app, end-users can see visualized power consumption data along with ambient environmental information. In addition, the foundations of the appliance recognition method were successfully verified by combining two time-domain descriptors, resulting in high accuracy and F1 score benchmarks.

Acknowledgments. This paper is made possible by National Priorities Research Program (NPRP) grant No. 10-0130-170288 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

References

1. Qatar General Electricity & Water Corporation (2020). https://www.km.qa/Tarsheed/Pages/TarsheedIntro.aspx
2. Ahmed, M.S., Mohamed, A., Homod, R.Z., Shareef, H., Sabry, A.H., Bin Khalid, K.: Smart plug prototype for monitoring electrical appliances in Home Energy Management System. In: IEEE Student Conference on Research and Development (SCOReD), pp. 32–36 (2015). https://doi.org/10.1109/SCORED.2015.7449348
3. Alsalemi, A., Himeur, Y., Bensaali, F., Amira, A., Sardianos, C., Varlamis, I., Dimitrakopoulos, G.: Achieving domestic energy efficiency using micro-moments and intelligent recommendations. IEEE Access, p. 1 (2020). https://doi.org/10.1109/ACCESS.2020.2966640


4. Alsalemi, A., Sardianos, C., Bensaali, F., Varlamis, I., Amira, A., Dimitrakopoulos, G.: The role of micro-moments: a survey of habitual behavior change and recommender systems for energy saving. IEEE Syst. J., pp. 1–12 (2019). https://doi.org/10.1109/JSYST.2019.2899832
5. Alsalemi, A., Ramadan, M., Bensaali, F., Amira, A., Sardianos, C., Varlamis, I., Dimitrakopoulos, G.: Boosting domestic energy efficiency through accurate consumption data collection. Leicester, UK (2019)
6. Alsalemi, A., Ramadan, M., Bensaali, F., Amira, A., Sardianos, C., Varlamis, I., Dimitrakopoulos, G.: Endorsing domestic energy saving behavior using micro-moment classification. Appl. Energy 250, 1302–1311 (2019). https://doi.org/10.1016/j.apenergy.2019.05.089
7. Arjunan, P., Khadilkar, H.D., Ganu, T., Charbiwala, Z.M., Singh, A., Singh, P.: Multi-user energy consumption monitoring and anomaly detection with partial context information. In: Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, pp. 35–44. BuildSys 2015, Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2821650.2821662
8. Dileep, G.: A survey on smart grid technologies and applications. Renew. Energy 146, 2589–2625 (2020)
9. Ganu, T., Seetharam, D.P., Arya, V., Kunnath, R., Hazra, J., Husain, S.A., De Silva, L.C., Kalyanaraman, S.: nPlug: a smart plug for alleviating peak loads. In: Proceedings of the 3rd International Conference on Future Energy Systems: Where Energy, Computing and Communication Meet, pp. 1–10. e-Energy 2012, Association for Computing Machinery, Madrid, Spain (2012). https://doi.org/10.1145/2208828.2208858
10. Gomes, L., Sousa, F., Vale, Z.: An intelligent smart plug with shared knowledge capabilities. Sensors 18(11), 3961 (2018). https://doi.org/10.3390/s18113961
11. Himeur, Y., Alsalemi, A., Al-Kababji, A., Bensaali, F., Amira, A.: Data fusion strategies for energy efficiency in buildings: overview, challenges and novel orientations. Inf. Fusion 64, 99–120 (2020). https://doi.org/10.1016/j.inffus.2020.07.003
12. Himeur, Y., Alsalemi, A., Bensaali, F., Amira, A.: Building power consumption datasets: survey, taxonomy and future directions. Energy and Buildings, p. 110404 (2020)
13. Himeur, Y., Alsalemi, A., Bensaali, F., Amira, A.: Robust event-based non-intrusive appliance recognition using multi-scale wavelet packet tree and ensemble bagging tree. Appl. Energy 267, 114877 (2020). https://doi.org/10.1016/j.apenergy.2020.114877
14. Himeur, Y., Alsalemi, A., Bensaali, F., Amira, A., Sardianos, C., Varlamis, I., Dimitrakopoulos, G.: On the applicability of 2D local binary patterns for identifying electrical appliances in non-intrusive load monitoring. In: Proceedings of SAI Intelligent Systems Conference, pp. 188–205. Springer (2020)
15. Kaabi, F.: Conservation plan for Tarsheed. Qatar General Electricity and Water Corporation, Conservation & Energy Efficiency Department, Doha, Qatar (2012)
16. Osaretin, C.: Smart meter and energy management in an integrated power system (2016). https://doi.org/10.13140/RG.2.1.3664.0242


17. Petrović, T., Morikawa, H.: Active sensing approach to electrical load classification by smart plug. In: IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), pp. 1–5 (2017). https://doi.org/10.1109/ISGT.2017.8086053
18. Sardianos, C., Varlamis, I., Chronis, C., Dimitrakopoulos, G., Himeur, Y., Alsalemi, A., Bensaali, F., Amira, A.: Data analytics, automations, and micro-moment based recommendations for energy efficiency. In: IEEE Sixth International Conference on Big Data Computing Service and Applications (BigDataService), pp. 96–103. IEEE (2020)
19. Sardianos, C., Varlamis, I., Dimitrakopoulos, G., Anagnostopoulos, D., Alsalemi, A., Bensaali, F., Amira, A.: "I want to ... change": micro-moment based recommendations can change users' energy habits. Heraklion, Crete, Greece, pp. 30–39 (2019)
20. Sardianos, C., Varlamis, I., Dimitrakopoulos, G., Anagnostopoulos, D., Alsalemi, A., Bensaali, F., Himeur, Y., Amira, A.: Rehab-C: recommendations for energy habits change. Future Gener. Comput. Syst. 112, 394–407 (2020). https://doi.org/10.1016/j.future.2020.05.041
21. Shajahan, A.H., Anand, A.: Data acquisition and control using Arduino-Android platform: smart plug. In: International Conference on Energy Efficient Technologies for Sustainability, pp. 241–244 (2013). https://doi.org/10.1109/ICEETS.2013.6533389
22. Wouters, C.: Towards a regulatory framework for microgrids: the Singapore experience. Sustain. Cities Soc. 15, 22–32 (2015). https://doi.org/10.1016/j.scs.2014.10.007

Backhaul Networks and TV White Spaces (TVWS) with Implementation Challenges in 5G: A Review

Teena Sharma1, Abdellah Chehri1(B), Paul Fortier2, and Rachid Saadane3

1 University of Quebec in Chicoutimi, Saguenay, QC G7H 2B, Canada
{teena.sharma1,achehri}@uqac.ca
2 Department of ECE, Laval University, Quebec City, QC G1V 0A6, Canada
[email protected]
3 SIRC/LaGeS-EHTP, EHTP Km, 7 Route El Jadida, Oasis, Morocco
[email protected]

Abstract. Mobile backhauling provides the interface between the radio controller and the base stations, and is mostly realized over a physical medium such as optical fiber or microwave radio links. With the huge growth in mobile traffic driven by the increase in mobile subscribers and the deployment of 4G and 5G cellular technologies, better capacity and coverage solutions are needed to enhance spectral efficiency. For 4G cellular networks, mobile backhaul must deal with capacity, availability, deployment cost and long-distance reach. Backhaul for 5G networks incurs additional challenges, including ultra-low latency requirements of 1 ms or less and the ultra-dense nature of the network. Therefore, for 5G, latency, QoS, packet efficiency, noise suppression and mitigation techniques, efficient modulation schemes and packet-network timing synchronization are some of the aspects that must be addressed when designing efficient wired or wireless backhaul. Current backhaul systems typically use cost-effective packet-switched technologies (e.g., Wi-Fi and WiMAX), especially Ethernet/Internet technologies, together with high-speed optical fiber links. In this survey, a comprehensive study of the state of the art in 5G backhaul technologies, based on research articles and standards documents, is presented. The main features, research findings, requirements and challenges of recent and emerging 5G backhaul technologies are also discussed. TV white space (TVWS) backhaul is suggested as a cost-effective way to foster rural technology growth with regard to 5G infrastructure.

Keywords: Backhaul · 5G · Smart cities · 4G · TVWS · WiMAX · Millimeter wave (mmWave) · Free space optics (FSO) · Microwave

1 Introduction

The mobile industry has evolved significantly thanks to the adoption of advanced technologies and a multitude of services and applications. Mobile backhaul plays an important role in this evolution.


Mobile backhaul is defined as the part of the network that interconnects the base stations (BSs) and their air interface to the base station controllers (BSCs), which are in turn connected to the mobile core network in cellular systems [1]. It mainly relies on physical media such as copper, microwave radio links and optical fiber, depending on the application [2]. The combination of existing and new technologies (LTE, LTE-Advanced, Wi-Fi, High-Speed Packet Access [HSPA]), distinct cell sizes with dense cell deployment (macro, micro, femto, pico) and various physical locations (outdoor, indoor), all supported by different solution providers (i.e., a multivendor environment), will provide spatial reuse of spectrum and increased spectral efficiency [3-9]. Moreover, 5G will enable additional services such as remote sensing and real-time monitoring of a diverse range of smart devices, supporting machine-to-machine (M2M) traffic (e.g., moving robots, connected offices and homes, and sensors) [10]. Considering these benefits of 5G, support for smart devices for humans and the IoT, extreme broadband delivery, ultra-low latency and highly robust backhaul networks are the major requirements for 5G networks. To implement 5G technology, a number of challenges need to be addressed, including new spectrum allocation, network densification, inter-cell interference suppression and massive multiple-input multiple-output (MIMO). A hybrid optical/millimeter-wave backhaul solution has been presented in which a novel software-defined radio approach is used to enhance backhaul network capacity in a fair and dynamic way and obtain better QoE [11]. Another solution, combining millimeter-wave radio and optical lasers in a free-space-optics environment, offers extended reach, affordable cost and high capacity [12]. Optimal deployment and smart management are further challenges in running 5G wireless backhaul networks; cost-efficiency optimization methods based on Bellman-Ford and shortest-path algorithms are proposed in [13]. In mobile wireless backhaul networks, self-interference (SI) cancellation technology makes full-duplex communication effective, together with a QoS-aware full-duplex concurrent scheduling algorithm [14]. 5G cellular systems with the IoT as a connectivity tool, featuring ubiquitous, reliable, scalable and cost-efficient operation, have also been presented in the literature. 5G networks will incorporate intelligent network capabilities and highly efficient differentiated services such as the Internet of Things (IoT), which can extend wireless connectivity from medical equipment to household appliances and personal belongings. For future 5G mobile networks using shared and dynamic spectrum access technologies, and to avoid spectrum scarcity, the TV White Space (TVWS) spectrum is also considered an ideal candidate to enable the deployment of smart-grid networks via the cognitive-radio paradigm. Using unlicensed TVWS bands as a backhaul solution for carrying Internet traffic is also popular in areas where no pre-existing wired infrastructure is available; innovative technologies such as cognitive radio are used on the basis of a number of spectrum policy proposals [15]. This survey paper is organized as follows.


Section 2 presents the requirements and key challenges of 5G backhaul implementation and reviews traditional and emerging backhaul technologies (leased T1/E1 copper, optical fiber, free-space optics, microwave and satellite) with their merits and demerits. Section 3 discusses TVWS backhaul networks for rural areas, and the conclusion is drawn in Sect. 4.

2 5G Backhaul Requirements and Challenges

The advancement of mobile network standards over the decades and their countless applications are shown in Fig. 1.

Fig. 1. Mobile technology evolution.

Based on previous studies, the major challenges for 5G networks fall into the following main categories:

1. Network capacity: 5G networks will interconnect multiple smart devices while supporting different services, mainly M2M and IoT, alongside other mobile connections. The resulting high capacity requirement from the transport network to the core is a major challenge for the 5G cellular network [16].
2. Ultra-dense network: Because of the RAN frequency bands used in 5G, the cell-site reach will be much shorter than that of today's macro or micro cells. Since it is not feasible to increase the capacity of a single cell site by a factor of 1000, dense small-cell deployment is the only practical and efficient way to support a 1000-fold capacity increase in 5G networks. However, the dense nature of small cells limits frequency-reuse capability, which requires better utilization of the wireless backhaul spectrum and imposes unprecedented cell-site synchronization requirements [30]. The 5G network will need roughly three times stricter synchronization accuracy than LTE-A [17].
3. Availability: For wireless backhaul, especially in the millimeter-wave and microwave bands, adverse weather conditions and multipath propagation affect the backhaul links, which reduces link availability. Adaptive modulation schemes are used to lower the line rates and mitigate these problems; maintaining availability is therefore crucial for new services and for many machine-to-machine applications in 5G networks [12, 19].

4. Ultra-low latency: Achieving ultra-low latency (less than 1 ms) together with dense small-cell deployment is another big challenge in establishing 5G connectivity; cost efficiency and network reliability under heavy demand also play an important role when designing such networks (a rough latency-budget check is sketched after this list).
5. Network energy consumption: Energy efficiency is a global parameter that covers reducing the carbon footprint, lowering energy bills and extending terminal battery life. 5G cellular devices are expected to provide increased capacity without an increase in energy consumption, yet a rise of about 50% in energy consumption results from the growth in small-cell density, the blossoming of HetNets and the emergence of UDNs [17]. Energy consumption is therefore also an important aspect of solving backhaul-related bottlenecks.
6. Deployment cost: Dense small-cell deployment is the key for 5G networks to support roughly 1000 times higher network capacity, and a cost-efficient backhaul solution for these dense cells is a big challenge. An application-based traffic-engineering model needs to be developed so that service vendors can fulfil customer demands [17].
7. Coverage: Long-distance reach has been a big issue for backhaul networks in terms of cost and additional equipment; for example, the total deployment cost of fiber backhaul increases with the fiber distance [18]. Reach indicates how far a cell site can obtain backhaul support from the core network with the required QoS. Typically, cell sites are interconnected in a hierarchical mesh and all traffic is transported back to an aggregation point (sometimes called a super cell), where it is aggregated and forwarded to the core network. Dense small-cell deployment in 5G networks leads to massive backhaul traffic at the super cell, which creates congestion and can even collapse the backhaul network [19]. Coverage is therefore a big challenge for the 5G backhaul network.
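To make the synchronization and latency constraints in items 2 and 4 concrete, a rough one-way latency budget can be checked per link. The sketch below is illustrative only: the per-hop switching and processing figures are assumptions, not values taken from this survey.

```python
# Rough 5G backhaul latency budget check (illustrative numbers, not from the paper).

def one_way_latency_us(distance_km, hops, per_hop_switching_us=20.0, processing_us=100.0):
    """Propagation (~5 us/km in fibre or air) + switching per hop + end processing."""
    propagation_us = 5.0 * distance_km
    return propagation_us + hops * per_hop_switching_us + processing_us

if __name__ == "__main__":
    budget_us = 1000.0  # 1 ms target for ultra-low-latency services
    for dist, hops in [(2, 1), (10, 3), (50, 6)]:
        lat = one_way_latency_us(dist, hops)
        verdict = "within" if lat <= budget_us else "exceeds"
        print(f"{dist:>3} km, {hops} hops -> {lat:6.1f} us ({verdict} the 1 ms budget)")
```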

3 Mobile Backhaul Types and Key Challenges

A number of backhaul solutions are available to mobile operators, and the choice among them should be based on several parameters, such as which option is most economical for a particular deployment scenario. Furthermore, the placement of small cells depends on latency, target QoS, traffic-load intensity and cost, so different backhaul technologies are adopted for different conditions (e.g., good LOS connectivity or fiber connectivity is not available at every desirable location). Several wireless backhauling solutions exist for these small cells, such as the TV spectrum, known as TV white spaces (TVWS), between 600 and 800 MHz; sub-6 GHz bands (licensed and unlicensed); the microwave spectrum between 6 GHz and 60 GHz; and free-space optics (FSO) within the laser spectrum. Among the existing wireless backhaul solutions, the millimeter-wave spectrum (30 GHz to 300 GHz) has the potential to meet the requirements of fifth-generation (5G) networks.


It can provide a wider channel bandwidth to deliver faster, higher-quality video and multimedia content and services, with highly directive narrow beams and, in turn, low interference [4]. The backhaul solutions suitable for 5G networks are discussed in detail in this section.

3.1 Wired Backhaul Solution

Optical fiber is one of the most popular backhaul media because of its excellent bit-error-rate (BER) performance along with huge bandwidth, increased capacity and high data rates. It also allows the longest reach before any signal needs to be regenerated. However, deploying new fiber connections is time consuming, and it is not always possible to lay fiber across highways, mountains, rivers or under buildings. The initial deployment cost, including splicing, cable cost and trenching, is another factor that adversely affects its use, as are the costs of optical transport and aggregation.

3.2 Wireless Backhaul Solutions

Wireless backhaul is used worldwide because of its viability and cost-effectiveness. As with optical fiber, its deployment depends on a number of factors such as traffic intensity, propagation conditions, cost, site locations and interference conditions. Microwave and millimeter-wave links as wireless backhaul media give the operator end-to-end control of the network. These two options provide near-optimal solutions for 5G and are discussed in detail below.

3.2.1 Millimeter Wave

Millimeter-wave (mmWave) links are mainly used for small-cell backhaul because of the enormous spectrum available (the EHF band, in the range 30-300 GHz). The small wavelengths in this spectrum make it easy to integrate multiple antennas into compact configurations and small cells, enabling massive MIMO for both LOS and non-LOS applications. mmWave backhaul can typically support high data rates in the 1-2 Gbps range, but the reach is shorter than that of microwave RF because of the high propagation loss at millimeter waves [16, 21]. Rain fading, absorption and multipath propagation are the main causes of propagation loss in this scheme. Millimeter waves also have narrow beams, which leads to alignment problems that can be mitigated by mounting the mmWave RF equipment on a rigid structure [22].

3.2.2 Free Space Optics (FSO)

FSO uses the invisible light spectrum of LEDs and lasers for data transmission, with a much higher bandwidth in the range of 300 GHz to 1 THz, and it supports transmission rates of up to 10 Gbps. Power consumption is low thanks to the low-power devices used in FSO, but the drawbacks include scattering, interference from ambient light, physical obstructions and fading due to fog, together with a small coverage area. Nevertheless, FSO can be one of the possible solutions for 5G backhaul owing to its high throughput, flexibility and scalability with low latency [20, 21].
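The shorter reach of mmWave links (and, conversely, the appeal of the UHF/TVWS bands discussed next) follows largely from free-space path loss, which grows with carrier frequency. The sketch below applies the textbook FSPL formula at a few representative frequencies; the formula is standard, but the chosen frequencies and distance are merely illustrative.

```python
import math

def fspl_db(distance_km, freq_ghz):
    """Free-space path loss in dB: 20*log10(d_km) + 20*log10(f_MHz) + 32.44."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_ghz * 1000.0) + 32.44

if __name__ == "__main__":
    d = 1.0  # a 1 km backhaul hop
    for f in [0.7, 3.5, 28.0, 60.0, 80.0]:  # TVWS/UHF, sub-6 GHz and mmWave carriers (GHz)
        print(f"{f:>5.1f} GHz over {d} km: FSPL = {fspl_db(d, f):6.1f} dB")
```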


3.2.3 Satellite and TV White Spaces (TVWS)

The shared TVWS spectrum can be considered a clear opportunity for future 5G mobile networks that use shared and dynamic spectrum-access technologies. In addition, TVWS spectrum bands can help eliminate interference between nearby broadcasting stations, and they are therefore recommended for future wireless technologies such as 5G, in which collisions and interference are a major issue because of the high traffic volume. Owing to these attractive characteristics, the TVWS spectrum is also considered an ideal candidate for deploying smart-grid networks via the cognitive-radio paradigm. Backhaul is a key application of TVWS, and Fig. 2 illustrates TVWS as a backhaul link in 5G networks.

Fig. 2. TVWS as a backhaul technology.

White-space access technology faces a variety of challenges, such as driving investment, ensuring security, protecting the incumbent and enabling appropriate standards, policies and rules. There are nevertheless many ways in which white-space systems could enable 5G through important aspects such as backhaul and/or M2M spectrum. Using unlicensed TV white-space bands as a backhaul solution for carrying Internet traffic is also popular in areas where no pre-existing wired infrastructure is available; innovative technologies such as cognitive radio are used on the basis of a number of spectrum policy proposals [16]. Different satellite technologies (e.g., DVB-RCS and Inmarsat BGAN) are used to provide backhauling for terrestrial mobile radio networks such as GSM, WiMAX, TETRA and Wi-Fi [22, 23]. The maximum data rate supported by these satellite schemes is generally less than 1 Gbps, with high latency [24]. The TV spectrum also presents new opportunities for wireless-access applications and technologies such as femtocell networks. When TV white spaces are reused in femtocell networks, the major challenge is achieving capacity and energy efficiency.


A TVWS reuse and power-allocation scheme has therefore been developed for femtocell networks, with the objective of maximizing the energy efficiency achieved by the femtocell network while keeping the interference to the primary receiver and the macro receiver at an acceptable level [25]. Simulation results reveal that the femtocell can achieve considerable capacity and energy-efficiency improvements by using the TV channels. Spectrum-sensing techniques are mainly classified into three types, namely energy detection, matched filtering and signal-feature detection, and they mainly rely on detecting the primary transmitter. Used together, these methods can achieve good results in terms of signal classification, sensitivity, computation time and cost [26]. Similar two-stage spectrum-sensing methods are proposed in (Du et al. 2016). Moreover, the use of white spaces in the digital terrestrial television (DTT) bands necessitates reliable and fast signal classification and identification methods. Such a two-stage identification method has been presented for signals in white spaces, combining energy detection and feature detection [27]. The Discrete Wavelet Packet Transform (DWPT) is used to divide the band of interest into sub-bands in which the signal power is calculated. Furthermore, a signal-transmission model based on Motion JPEG XR has been presented to evaluate and explore indoor applications such as multimedia distribution over white spaces.
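As a rough illustration of the energy-detection stage mentioned above (a generic sketch, not the two-stage DWPT-based method of [26, 27]), a band can be declared occupied when its measured power exceeds the noise floor by a chosen margin; the signal model and threshold below are assumptions.

```python
import numpy as np

def energy_detect(samples, noise_power, threshold_db=3.0):
    """Declare a channel occupied if its average power exceeds the noise floor by threshold_db."""
    power = np.mean(np.abs(samples) ** 2)
    return power > noise_power * 10 ** (threshold_db / 10.0), power

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, noise_power = 4096, 1.0
    noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * np.sqrt(noise_power / 2)
    tone = 1.5 * np.exp(2j * np.pi * 0.1 * np.arange(n))   # hypothetical primary-user signal
    for label, x in [("idle channel", noise), ("occupied channel", noise + tone)]:
        busy, p = energy_detect(x, noise_power)
        print(f"{label}: measured power = {p:.2f}, occupied = {busy}")
```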

Fig. 3. TVWS information acquisition functioning model in 5G scenario [28]


Optimizing spectrum usage while protecting primary users is an important consideration for TV white spaces. Dynamic spectrum access is one of the main techniques employed: it allocates spectrum dynamically instead of using fixed spectrum assignments. Novel strategies have been presented for geo-location database operators to determine adaptive maximum permitted power levels for secondary devices [28]; these levels are set according to the permissible interference into the digital terrestrial television primary system. TVWS is the first spectrum opened for real applications and for better spectrum utilization, and it relies on cognitive radio (CR) access systems because of their artificial-intelligence-based decision making for efficient spectrum sensing and detection. In the 5G scenario, the TVWS information-acquisition system plays an important role in exploiting the desirable attributes of the geo-location database, spectrum sensing and spectrum prediction (Fig. 3). Furthermore, TVWS information-acquisition schemes are driven by spectrum prediction into the intelligent-network domain by extracting pattern-recognition attributes from the existing TVWS data stored in device memory. By using a prediction algorithm, interference with the primary user (PU) can be reduced by choosing spectrum channels proactively, which increases spectrum-utilization efficiency (a small channel-selection sketch is given after Table 1). In this technique, an Internet backbone is required to acquire information about the free channels to be used. Spectrum sensing is not location-specific and can therefore also be deployed in a TVWS network.

3.2.3.1 TVWS Backhaul Networks for Rural Areas

With the spectrum-scarcity crunch, harvesting TVWS spectrum is generating a lot of excitement. Industry Canada, the FCC's Canadian counterpart concerning spectrum policy, published its TV White Space decision after its rulemaking process. TVWS could provide a suitable solution for extending broadband networks to remote rural areas and northern communities; traditional links such as fiber or multi-hop microwave terrestrial backhaul are not economically viable for telecommunication providers in these regions [28, 29]. Furthermore, the use of white spaces allows efficient spectrum reuse in various frequency bands. In particular, TVWS is expected to serve mobile and/or long-range communication systems thanks to its superior propagation and penetration characteristics [30]. A common problem for campuses, municipalities, fixed or mobile broadband providers and large venues is carrying traffic from access points or base stations back to a central point. The problem is generally one of cost, either to purchase backhaul from an incumbent or to deploy a network and incur the construction costs of laying cable. Here, TVWS offers higher power, longer range and better propagation characteristics than Wi-Fi as a backhaul solution. It would be a natural backhaul solution for many WISPs (Wireless Internet Service Providers), given that these networks are common in rural areas with high availability of TVWS channels [29, 31]. A comparative analysis of various backhaul technologies for rural deployment is presented in Table 1.


Table 1. Comparative analysis of wireless backhaul technologies regarding rural area deployment

Technology  | Throughput (Mbps) | Range (km)  | Rural concern
WiLD        | 3-4               | 100-280     | Cost, stability
WiMAX       | 0.3-49            | 0.3-49      | Complexity, cost
Satellite   | 5-25              | Not related | Costly equipment
TVWS (UHF)  | 10-30             | 10-30       | Spectrum sensing
Wi-Fi       | 600               | 0.05-0.45   | Range
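Returning to the geo-location database and spectrum-prediction discussion of Sect. 3.2.3, the channel-selection step can be sketched as follows; the database entries, occupancy predictions and thresholds are hypothetical, not real regulatory data.

```python
# Hypothetical geo-location database reply: TV channel -> max permitted EIRP (dBm),
# plus a predicted probability that the channel will be occupied by a primary user.
db_reply = {21: 36.0, 25: 30.0, 32: 36.0, 40: 20.0}
predicted_occupancy = {21: 0.60, 25: 0.05, 32: 0.20, 40: 0.10}

def pick_channel(db, prediction, min_power_dbm=25.0, max_occupancy=0.3):
    """Keep channels with enough permitted power and low predicted PU activity,
    then prefer the one with the lowest predicted occupancy."""
    candidates = [ch for ch, p in db.items()
                  if p >= min_power_dbm and prediction.get(ch, 1.0) <= max_occupancy]
    return min(candidates, key=lambda ch: prediction[ch]) if candidates else None

if __name__ == "__main__":
    print("Selected TVWS channel:", pick_channel(db_reply, predicted_occupancy))
```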

3.2.4 Microwave

Microwave bands (6, 11, 18, 23 and 28 GHz) are used as backhaul frequency bands with a maximum reach of about 30 miles and a data-handling capacity of up to 500 Mbps, which can be pushed towards 10 Gbps for medium- and long-haul links by adopting ultra-high spectral-efficiency schemes (e.g., MIMO, LOS), advanced modulation (4096-QAM and higher) and much wider channels (e.g., 112 MHz channel spacing in the traditional 4-42 GHz microwave bands) [31, 32]. Worldwide, about 50% of mobile backhaul traffic is carried over microwave RF technology. Microwave radio deployment requires a one-time capital cost, plus power, space, maintenance and rental expenses; its advantages are short deployment time and cost efficiency. However, its performance is adversely affected by weather conditions and the propagation environment: the data rate is sometimes lowered to meet availability requirements, and although using lower frequency bands can extend the reach, this results in data congestion [33-40]. Based on the literature, we summarize the available features (e.g., cost, latency, reach and throughput) of all the wireless backhaul technologies in Table 2. As can be seen from Table 2, FSO provides the highest throughput with the lowest latency, which matches the basic requirements of 5G backhaul. This is the main motivation of this study, in which FSO is considered for 5G backhaul networks with the addition of an ambient-light cancellation technique at the receiver. Table 2 presents a comparative performance analysis of the wireless backhaul options discussed so far.

Table 2. Performance comparison of wireless backhaul technologies

Parameters/Technology       | Throughput (Uplink) | Throughput (Downlink) | Coverage                      | Deployment cost | Latency
FSO                         | 10 Gbps             | 10 Gbps               | 1-3 km                        | Low             | Low
Microwave PtP               | 1 Gbps              | 1 Gbps                | 2-4 km                        | Medium          | ≤1 ms/hop
Microwave PtmP              | 1 Gbps              | 1 Gbps                | 2-4 km                        | Medium          | ≤1 ms/hop
TVWS                        | 18 Mbps/ch          | 18 Mbps/ch            | 1-5 km                        | Medium          | 10 ms
Satellite                   | 15 Mbps             | 50 Mbps               | All-pervading                 | High            | 300 ms
mmWave 60 GHz               | 1 Gbps              | 1 Gbps                | 1 km                          | Medium          | 200 µs
mmWave 70-80 GHz            | 10 Gbps             | 10 Gbps               | 3 km                          | Medium          | 65-350 µs
Sub-6 GHz (2.4, 3.5, 5 GHz) | 150-450 Mbps        | 150-450 Mbps          | 250 m                         | Medium          | 2-20 ms
Sub-6 GHz (800 MHz-6 GHz)   | 170 Mbps            | 170 Mbps              | 1.5-2.5 km urban, 10 km rural | Medium          | 5 ms single hop
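One practical way to use Table 2 is to encode its rows as data and filter them against deployment requirements. The sketch below does this for a subset of the table; the numeric simplifications (e.g., reading "≤1 ms/hop" as 1 ms and "Low" latency as 0.1 ms) are our own parsing choices, not figures from the table itself.

```python
# Subset of Table 2 with simplified numeric values (Mbps, km, ms).
OPTIONS = [
    {"name": "FSO",              "dl_mbps": 10000, "range_km": 3.0,  "latency_ms": 0.1},
    {"name": "Microwave PtP",    "dl_mbps": 1000,  "range_km": 4.0,  "latency_ms": 1.0},
    {"name": "TVWS",             "dl_mbps": 18,    "range_km": 5.0,  "latency_ms": 10.0},
    {"name": "Satellite",        "dl_mbps": 50,    "range_km": 1e4,  "latency_ms": 300.0},
    {"name": "mmWave 70-80 GHz", "dl_mbps": 10000, "range_km": 3.0,  "latency_ms": 0.35},
]

def shortlist(options, min_dl_mbps, min_range_km, max_latency_ms):
    """Return the technologies that meet the throughput, reach and latency targets."""
    return [o["name"] for o in options
            if o["dl_mbps"] >= min_dl_mbps
            and o["range_km"] >= min_range_km
            and o["latency_ms"] <= max_latency_ms]

if __name__ == "__main__":
    # Dense urban small cell: very high throughput, short hop, tight latency.
    print("Urban 5G small cell:", shortlist(OPTIONS, 1000, 1.0, 1.0))
    # Rural connectivity: modest throughput, long reach, relaxed latency.
    print("Rural link:", shortlist(OPTIONS, 10, 5.0, 50.0))
```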

4 Conclusion

With the evolution and rapid growth of 5G and 6G technologies, very low latency and high spectral efficiency are expected while efficiently managing large volumes of traffic without interference. This study has presented a state-of-the-art survey of wired and wireless backhaul networks and the role of TVWS in 5G, together with the potential trends, research opportunities and associated challenges. The paper has mainly emphasized wireless backhaul as a network solution for the upcoming 5G research area, and a comparative performance analysis of the existing backhaul technologies has been presented on the basis of various performance parameters. Among the existing backhaul options, TVWS shows superior propagation characteristics thanks to its spectrum bands in the UHF and VHF ranges; its long communication distance and good penetration through obstacles make TVWS an attractive option for rural connectivity. We have therefore presented a brief overview of TVWS in rural areas, focusing on TVWS applicability scenarios, prospects, development and challenges with respect to 5G. Finally, we conclude that white-space technology has significant development and economic benefits; in addition, TVWS will help keep the cost of license-exempt devices for broadband networks affordable and will therefore greatly expand their utility.

References

1. Ivanek, F.: Mobile backhaul from the guest editor's desk. IEEE Microwave Mag. 10(5), 10-20 (2009)
2. Anthony, M.: Synchronization in next-generation mobile backhaul networks. IEEE Commun. Mag. 48(10), 110-116 (2010)
3. Ritter, M.: Mobile Backhaul Evolution White Paper. ADVA Optical Networking, October 2009


4. Chia, S., Gasparroni, M., Brick, P.: The next challenge for cellular networks backhaul. IEEE Microwave Mag. 10(5), 54–66 (2009) 5. Limaye, P., El-Sayed, M.: Domains of application for backhaul technologies in 3G wireless networks. In: Networks 2006. 12th International Telecommunications Network Strategy and Planning Symposium, pp. 1–6. IEEE (2006) 6. Pekka, P.: A brief overview of 5G research activities. In: 1st International Conference on 5G for Ubiquitous Connectivity, pp. 17–22. IEEE (2014) 7. Ahamed, M.M., Faruque, S.: 5G backhaul: requirements, challenges, and emerging technologies. Proc. Broadband Commun. Netw. Recent Adv. Lessons Pract. 43 (2018) 8. Sharma, T., Chehri, A., Fortier, P.: Review of optical and wireless backhaul networks and emerging trends of next generation 5G and 6G technologies. Trans. Emerging Telcommun. Technol. e4155 (2020) 9. Bojic, D., Sasaki, E., Cvijetic, N., Wang, T., Kuno, J., Lessmann, J., Schmid, S., Ishii, H., Nakamura, S.: Advanced wireless and optical technologies for small-cell mobile backhaul with dynamic software-defined management. IEEE Commun. Mag. 51(9), 86–93 (2013) 10. Palattella, M.R., Dohler, M., Grieco, A., Rizzo, G., Torsner, J., Engel, T., Ladid, L.: Internet of things in the 5G era: enablers, architecture, and business models. IEEE J. Sel. Areas Commun. 34(3), 510–527 (2016) 11. Ahamed, M.M., Faruque, S., Gaire, S.K.: Laser radio: backhaul solution for 5G networks. In: Laser Communication and Propagation through the Atmosphere and Oceans, International Society for Optics and Photonics (2016) 12. Ge, X., Tu, S., Mao, G., Lau, V.K., Pan, L.: Cost efficiency optimization of 5G wireless backhaul networks. IEEE Trans. Mobile Comput. 18(12), 2796–2810 (2018) 13. Ding, W., Niu, Y., Wu, H., Yong, L., Zhong, Z.: QoS-aware full-duplex concurrent scheduling for millimeter wave wireless backhaul networks. IEEE Access 6, 25313–25322 (2018) 14. Palattella, M.R., Dohler, M., Grieco, A., Rizzo, G., Torsner, J., Engel, T., Ladid, L.: Internet of things in the 5G era: enablers, architecture, and business models. IEEE J. Sel. Areas Commun. 34(3), 510–527 (2016) 15. Gupta, A., Jha, R.K.: A survey of 5G network: architecture and emerging technologies. IEEE Access 3, 1206–1232 (2015) 16. Gerami, C., Narayan, M., Greenstein, L.: Backhauling in TV white spaces. In: 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, pp. 1–6. IEEE (2010) 17. Berioli, M., Chaves, J.M., Courville, N., Boutry, P., Fondere, J.L., Skinnemoen, H., Weinlich, M.: WISECOM: a rapidly deployable satellite backhauling system for emergency situations. Int. J. Satell. Commun. Netw. 29(5), 419–440 (2011) 18. Tombaz, S., Monti, P., Farish, F., Fiorani, M., Wosinsa, L., Zander, J.: Is backhaul becoming a bottleneck for green wireless access networks? In: 2014 IEEE international conference on communications (ICC), pp. 4029–4035. IEEE (2014) 19. Pham, A.T., Trinh, P.V., MAI, V.V., Dang, N.T., Truong, C.T.: Hybrid free-space optics/millimeter-wave architecture for 5G cellular backhaul networks. In: 2015 OptoElectronics and Communications Conference (OECC), pp. 1–3. IEEE (2015) 20. Suman, M., Kumar, P.: Free space optics/millimeter-wave based vertical and horizontal terrestrial backhaul network for 5G. Opt. Commun. 459, 125010 (2020) 21. Huq, K.M., Jonathan Rodriguez, J.: Backhauling 5G small cells with massive-mimo-enabled mmWave communication. In: Backhauling/ Fronthauling for Future Wireless Systems, Wiley, pp. 29–53 (2016) 22. 
Shimomura, T., Teppei, O.: Analysis of TV white space availability in Japan. IEICE Trans. Commun. 97(2), 350–358 (2014) 23. Kumar, A., Karandikar, A., Naik, G., Khaturia, M., Saha, S., Arora, M., Singh, J.: Towards enabling broadband for a billion plus population with TV white spaces. IEEE Commun. Mag. 54(7), 28–34 (2016)


24. Anabi, H.K., Nordin, R., Abdulghafoor, O.B., Sali, A., Mohamedou, A., Almqdshi, A., Abdullah, N.F.: From sensing to predictions and database technique: a review of TV white space information acquisition in cognitive radio networks. Wireless Pers. Commun. 96(4), 6473–6502 (2017) 25. Ghosh, C., Roy, S., Cavalcanti, D.: Coexistence challenges for heterogeneous cognitive wireless networks in TV white spaces. IEEE Wireless Commun. 18(4), 22–31 (2011) 26. Höyhtyä, M., et al.: Spectrum occupancy measurements: a survey and use of interference maps. IEEE Commun. Surv. Tuts. 18(4), 2386–2414 (2016) 27. Gao, B., Park, J.-M., Yang, Y., Roy, S.: A taxonomy of coexistence mechanisms for heterogeneous cognitive radio networks operating in TV white spaces. IEEE Wireless Commun. 19(4), 41–48 (2012) 28. Siddique, U., Tabassum, H., Hossain, E.: Downlink spectrum allocation for in-band and out band wireless backhauling of full-duplex small cells. IEEE Trans. Commun. 65(8), 3538–3554 (2017) 29. Ghosh, C., Roy, S., Cavalcanti, D.: Coexistence challenges for heterogeneous cognitive wireless networks in TV white spaces. IEEE Wireless Commun. 18(4), 22–31 (2011) 30. Liao, Y., Wang, T., Song, L., Han, Z.: Listen-and-talk: protocol design and analysis for fullduplex cognitive radio networks. IEEE Trans. Veh. Technol. 66(1), 656–667 (2017) 31. Siddique, U., Tabassum, H., Hossain, E.: Downlink spectrum allocation for in-band and out band wireless backhauling of full-duplex small cells. IEEE Trans. Commun. 65(8), 3538–3554 (2017) 32. Slalmi, A., Kharraz, H., Saadane, R., Hasna, C., Chehri, A., Jeon, G.: Energy Efficiency Proposal for IoT call admission control in 5G network. In: 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Sorrento, Italy, pp. 396–403 (2019). https://doi.org/10.1109/sitis.2019.00070 33. Chehri, A., Mouftah, H.T.: New MMSE downlink channel estimation for sub-6 GHz non-lineof-sight backhaul. In: 2018 IEEE Globecom Workshops (GC Wkshps), Abu Dhabi, United Arab Emirates, pp. 1–7 (2018). https://doi.org/10.1109/GLOCOMW.2018.8644436 34. Chehri, A., Mouftah, H.T.: Phy-MAC MIMO precoder design for sub-6 GHz backhaul small cell. In: 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, pp. 1–5 (2020). https://doi.org/10.1109/VTC2020-Spring48590.2020.9128733 35. Slalmi, A., Saadane, R., Chehri, A., Kharraz, H.: How will 5G transform industrial IoT: latency and reliability analysis. In: Human Centered Intelligent Systems. Smart Innovation, Systems and Technologies, vol. 189. Springer, Singapore (2020) 36. Chehri, A., Jeon, G.: Optimal matching between energy saving and traffic load for mobile multimedia communication. Concurrency Comput. Pract. Exper. e5035 (2018) 37. Chehri, A., Mouftah, H.T.: Exploiting multiuser diversity for OFDMA next generation wireless networks. In: 2013 IEEE Symposium on Computers and Communications (ISCC), Split, pp. 000665–000669 (2013) 38. Chehri, A., Mouftah, H.: An empirical link-quality analysis for wireless sensor networks. In: Proceeding of International Conference Computer Network Communication (ICNC), pp. 164–169, January/February 2012 39. Chehri, A., Fortier, P., Tardif, P.-M.: On the TOA estimation for UWB ranging in complex confined area. In: Proceeding of International Symposium Signals Systems and Electronics, pp. 533–536 (2007) 40. Slalmi, A., Chaibi, H., Saadane, R., Chehri, A., Jeon, G.: 5G NB-IoT: efficient network call admission control in cellular networks. Concurrency Comput. Pract. 
Exper. e6047 (2020). https://doi.org/10.1002/cpe.6047

Comparative Study via Three MPPT Techniques Methods for PV Systems

Mohamed Chouiekh1(B), Amine Lilane1, Karim Benkirane2, Mohamed Abid1, and Dennoun Saifaoui1

1 Laboratory of Renewable Energies and Dynamics of Systems, Faculty of Science Ain Chock, Hassan II University Casablanca, Casablanca, Morocco
[email protected]
2 Laboratory of Renewable Energies and Dynamics of Systems, Royal Navy School, Hassan II University Casablanca, Casablanca, Morocco

Abstract. This paper presents three maximum power point tracking (MPPT) techniques, namely incremental conductance, perturb and observe (P&O), and a fuzzy logic controller, for photovoltaic systems under varying environmental conditions. The objective of this study is to compare the performance of these methods; the simulation results are obtained under MATLAB/Simulink. All three methods are applied to a system comprising a modified CUK DC/DC converter ("MCUK"), a photovoltaic panel and a resistive load.

Keywords: Photovoltaic generator · MPPT · DC/DC converter

1 Introduction

Renewable energies are emerging as a potential solution for reducing pollution. Photovoltaic (PV) solar energy has been growing rapidly in recent years because it is an inexhaustible source of energy that does not pollute the environment and is silent and non-disturbing for residents [1]. Photovoltaic solar energy comes from the direct conversion of part of the sun's radiation into electrical energy, but maximizing the power transfer from the photovoltaic generator (PVG) to the load is a problem. This is due to the non-linear nature of the electrical I-V (current-voltage) characteristics of photovoltaic cells [2]; these characteristics depend on the level of illumination, the cell temperature and the load. To increase the output power of a photovoltaic system, it is essential to force the photovoltaic panel to operate at its maximum power point (MPP) [3]. To extract the maximum power available at each moment at the terminals of the PVG, the conventional technique is to insert an adaptation stage between the PVG and the load; this stage consists of a static converter controlled by pulse-width modulation (PWM) [4]. In this work, we present three robust techniques for tracking the MPP of the PV panel system. This paper is organized as follows: after this brief introduction, Sect. 2 presents the modeling of the photovoltaic system and Sect. 3 describes the proposed MPPT methods; simulation results demonstrating their performance are then provided, and the paper ends with a conclusion.


2 Modeling of the Photovoltaic System

The configuration of the PV system, shown in Fig. 1, consists of a PV panel, an MCUK converter placed between the PV panel and the load, and the MPPT algorithm.

Fig. 1. Photovoltaic system

2.1 Modeling of the PV Module

Solar cells are generally connected in series and in parallel and then encapsulated under glass to form a photovoltaic module. A PV generator is made up of interconnected modules forming a unit that produces high continuous power compatible with conventional electrical equipment. PV modules are usually connected in series and in parallel to increase the voltage and current at the generator output. The interconnected modules are mounted on metal supports and inclined at the desired angle depending on the location; this assembly is often called a module field [5]. The I-V characteristic of the PV generator is based on that of an elementary cell, modeled by the well-known equivalent circuit in Fig. 2. This circuit contains a current source and a diode in parallel, as well as a series resistance Rs and a parallel resistance Rsh that account for the dissipative phenomena in the cell [6].


Fig. 2. Equivalent circuit of a PV cell

The relationship between V and I of a PV module is given by the following equations:

I = I_{ph} - I_D - I_{sh}   (1)

I = N_P I_{ph} - N_P I_0 \left[ \exp\!\left( \frac{q (V + I R_s)}{a k T N_s} \right) - 1 \right] - \frac{N_P}{N_s} \, \frac{V + I R_s}{R_{sh}}   (2)

where I and V are the output current and voltage of the photovoltaic cell, k is the Boltzmann constant (J/K), a is the ideality factor, T is the cell temperature (K), I_{ph} is the photo-generated current, I_0 is the reverse saturation current, q is the electron charge, N_P and N_s are respectively the numbers of cells in parallel and in series, and R_s and R_{sh} are respectively the PVG series and parallel resistances (Ω).

I_0 = I_{0r} \left( \frac{T}{T_r} \right)^{3} \exp\!\left[ \frac{q E_C}{k a} \left( \frac{1}{T_r} - \frac{1}{T} \right) \right]   (3)

I_{ph} = \left\{ I_{scr} + K_i (T - 298) \right\} \frac{G_n}{100}   (4)

where T_r is the reference temperature, I_{0r} the saturation current at T_r, E_C the band-gap energy of the cell semiconductor (eV), and G_n the solar irradiation (W/m²). The parameters of the photovoltaic panel used in this study are given in Table 1.
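To make Eqs. (1)-(4) concrete, the implicit relation (2) can be solved numerically for the current at each module voltage. The sketch below uses Newton's method with the BP3220T values of Table 1; the series resistance, shunt resistance and ideality factor are assumed values, since the paper does not list them.

```python
import math

# Table 1 datasheet values plus assumed parameters (RS, RSH and the ideality factor A are
# not given in the paper; the values below are illustrative, not the authors' model).
Q, K = 1.602e-19, 1.381e-23
NS, NP, A = 60, 1, 1.3
RS, RSH = 0.35, 300.0
T = 298.0                               # 25 degC
ISC, VOC = 8.2, 36.6                    # Table 1
VT = A * K * T * NS / Q                 # thermal voltage of the 60-cell series string
IPH = ISC                               # photo-current ~ Isc at nominal irradiance (Eq. 4)
I0 = ISC / (math.exp(VOC / VT) - 1.0)   # saturation current chosen so that I(VOC) ~ 0

def module_current(v, iters=60):
    """Solve the implicit single-diode relation of Eq. (2) for I with Newton's method."""
    i = IPH
    for _ in range(iters):
        e = math.exp((v + i * RS) / VT)
        f = NP * IPH - NP * I0 * (e - 1.0) - NP * (v + i * RS) / (NS * RSH) - i
        df = -NP * I0 * RS / VT * e - NP * RS / (NS * RSH) - 1.0
        i -= f / df
    return i

if __name__ == "__main__":
    for v in (0.0, 10.0, 20.0, 28.9, 33.0, 36.6):
        i = max(module_current(v), 0.0)
        print(f"V = {v:5.1f} V -> I = {i:5.2f} A, P = {v * i:6.1f} W")
```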


Table 1. Electrical characteristics of the BP3220T.

Parameter (at 25 °C)             | Value
Maximum power (Pmax)             | 219.64 W
Voltage at Pmax (Vmpp)           | 28.9 V
Current at Pmax (Impp)           | 7.6 A
Open circuit voltage (Voc)       | 36.6 V
Short circuit current (Isc)      | 8.2 A
Cells in series per module (ns)  | 60
Temperature coefficient of Isc   | 0.124 A/°C

2.2 DC-DC Modified CUK Converter

The modified CUK (MCUK) converter is an improvement of the CUK DC-DC converter with reversed output polarity, and it can be used for both step-up and step-down applications; the MCUK is essentially a CUK converter with an additional diode and capacitor [7]. These additional components were introduced to solve the stabilization problem encountered with CUK converters. The circuit diagram of the modified CUK converter is shown in Fig. 3.

Fig. 3. Circuit diagram of the modified CUK converter

3 Maximum Power Point Tracking (MPPT)

MPPT algorithms are necessary in photovoltaic applications because the maximum power of a solar panel varies with irradiation and temperature. A large number of algorithms are capable of tracking the maximum power point (MPP); in this study, we use three MPPT methods.


3.1 Perturb and Observe

The perturb and observe (P&O) method is a widely used MPPT approach because it is simple and requires only measurements of the photovoltaic-array voltage V and current I. The method operates by periodically perturbing the voltage of the PV array and comparing the power delivered before and after the perturbation [8]; it can find the maximum power point even during variations of irradiance and temperature. The flowchart of the P&O algorithm is shown in Fig. 4.

Fig. 4. Flow chart of the classical P&O algorithm.
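A minimal software rendering of the P&O rule in Fig. 4 (only the duty-cycle update logic; the `measure` function is a placeholder for the real panel/MCUK interface, and the toy model in the example is illustrative, not the BP3220T):

```python
def perturb_and_observe(measure, d=0.5, step=0.01, iterations=50):
    """P&O MPPT: perturb the duty cycle and keep the direction that increases PV power.

    `measure(d)` must return (voltage, current) of the panel for duty cycle `d`;
    here it stands in for the real converter/panel interface.
    """
    v, i = measure(d)
    p_prev, d_prev = v * i, d
    d += step
    for _ in range(iterations):
        v, i = measure(d)
        p = v * i
        # If power increased, keep moving in the same direction; otherwise reverse.
        direction = 1.0 if (p - p_prev) * (d - d_prev) > 0 else -1.0
        d_prev, p_prev = d, p
        d = min(max(d + direction * step, 0.0), 1.0)
    return d

if __name__ == "__main__":
    # Toy stand-in for the panel + MCUK converter: power peaks at d = 0.62.
    toy = lambda d: (30.0, 8.0 * (1.0 - (d - 0.62) ** 2 / 0.25))
    print("Duty cycle after tracking:", round(perturb_and_observe(toy), 3))
```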

3.2 Incremental Conductance (IC)

The incremental conductance method is based on the derivative of the PVG conductance (dG = dI/dV) to determine the relative position of the operating point with respect to the MPP, so that an adequate control action can be applied to follow the MPP [9]. The flowchart of this method is shown in Fig. 5.


Fig. 5. Flow chart of the incremental conductance algorithm.
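The incremental-conductance test of Fig. 5 can be expressed in a few lines of code. This is a sketch operating on a panel voltage reference with a fixed step; the toy panel model used in the demonstration is illustrative, not the model of Sect. 2.

```python
def ic_update(v_ref, v, i, v_prev, i_prev, step=0.2, eps=1e-3):
    """One incremental-conductance update: at the MPP dI/dV = -I/V.
    Returns the new panel voltage reference."""
    dv, di = v - v_prev, i - i_prev
    if abs(dv) < 1e-9:
        if abs(di) < eps:
            return v_ref                                # no change: already at the MPP
        return v_ref + (step if di > 0 else -step)
    g = di / dv + i / v                                 # > 0 left of the MPP, < 0 right of it
    if abs(g) < eps:
        return v_ref
    return v_ref + (step if g > 0 else -step)

if __name__ == "__main__":
    # Toy panel: constant current up to ~29 V then a linear roll-off (illustrative only).
    panel = lambda v: max(8.2 * (1.0 - max(v - 29.0, 0.0) / 7.6), 0.0)
    v_ref, v_prev, i_prev = 20.0, 19.8, panel(19.8)
    for _ in range(100):
        v, i = v_ref, panel(v_ref)
        v_ref = ic_update(v_ref, v, i, v_prev, i_prev)
        v_prev, i_prev = v, i
    print("Operating voltage after IC tracking: about", round(v_ref, 1), "V")
```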

4 Fuzzy Logic MPPT Controller

Fuzzy logic control has the advantage of being a robust control that is relatively simple to design and does not require exact knowledge of the model to be regulated [10]. A fuzzy controller is implemented in three stages: fuzzification, inference and defuzzification (Fig. 6).

Fig. 6. Block diagram of the fuzzy logic controller.


Fuzzification converts the input variables into fuzzy (linguistic) variables. A preliminary step is to define the maximum allowed range of variation for the input variables. In our case, we have two input variables, the error E(k) and the error variation CE(k) at time k, defined as follows:

E(k) = \frac{P(k) - P(k-1)}{V(k) - V(k-1)}   (5)

CE(k) = E(k) - E(k-1)   (6)

where P(k) and V(k) are the power and the voltage of the PV panel, respectively. Inference is the step that defines the logical relationship between inputs and output: membership rules are defined for the output as they were for the inputs [11], and from these rules an inference table can be drawn up (Table 2).

Table 2. Fuzzy rules.

E \ CE | NB | NS | Z  | PS | PB
NB     | Z  | Z  | NB | NB | NB
NS     | Z  | Z  | NS | NS | NS
Z      | NS | Z  | Z  | Z  | PS
PS     | PS | PS | PS | Z  | Z
PB     | PB | PB | PB | Z  | Z

where NB, NS, Z, PS and PB denote Negative Big, Negative Small, Zero, Positive Small and Positive Big, respectively. Finally, the inverse operation of fuzzification must be performed: computing, from the fuzzy result, a numerical value understandable by the external environment is the purpose of defuzzification [12].
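A compact sketch of the three stages for this controller, using the rule base of Table 2; the triangular membership functions, the normalized universe of discourse and the singleton (weighted-average) defuzzification are our own assumptions rather than details given in the paper.

```python
import numpy as np

LABELS = ["NB", "NS", "Z", "PS", "PB"]
CENTERS = {"NB": -1.0, "NS": -0.5, "Z": 0.0, "PS": 0.5, "PB": 1.0}   # assumed universe [-1, 1]
RULES = {  # rows indexed by E, columns ordered as LABELS for CE (Table 2)
    "NB": ["Z", "Z", "NB", "NB", "NB"],
    "NS": ["Z", "Z", "NS", "NS", "NS"],
    "Z":  ["NS", "Z", "Z", "Z", "PS"],
    "PS": ["PS", "PS", "PS", "Z", "Z"],
    "PB": ["PB", "PB", "PB", "Z", "Z"],
}

def memberships(x):
    """Triangular membership degrees of x (clipped to [-1, 1]) in the five linguistic sets."""
    x = float(np.clip(x, -1.0, 1.0))
    return {lab: max(0.0, 1.0 - abs(x - c) / 0.5) for lab, c in CENTERS.items()}

def fuzzy_duty_step(e, ce, gain=0.05):
    """Fuzzification, min-inference over the rule table, weighted-average defuzzification."""
    mu_e, mu_ce = memberships(e), memberships(ce)
    num = den = 0.0
    for le in LABELS:
        for j, lc in enumerate(LABELS):
            w = min(mu_e[le], mu_ce[lc])          # firing strength of rule (E=le, CE=lc)
            num += w * CENTERS[RULES[le][j]]
            den += w
    return gain * (num / den if den else 0.0)     # duty-cycle increment

if __name__ == "__main__":
    for e, ce in [(0.8, 0.1), (-0.6, -0.1), (0.0, 0.0)]:
        print(f"E={e:+.1f}, CE={ce:+.1f} -> dD = {fuzzy_duty_step(e, ce):+.4f}")
```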

5 Simulation Results

The entire PV system model was built in the MATLAB/Simulink 9.6 environment, as shown in Fig. 7. The simulation results were obtained for a BP3220T photovoltaic panel supplying a resistive load through an MCUK converter. The PV panel parameters used in this simulation are given in Table 1, and the temperature is maintained at 25 °C. The simulation results of the algorithms studied above are given in Figs. 8 and 9; these figures show the output power and voltage for a temperature/irradiance pair of (25 °C, 1000 W/m²). They show that the photovoltaic system converges to the optimum values.


Fig. 7. Simulink model of the global PV system.

Fig. 8. Photovoltaic power at 1000 W/m² and 25 °C.

The variations in solar radiation are given in Fig. 10; Figs. 11 and 12 illustrate the power and voltage at the terminals of the PV panel as the irradiance changes at a constant temperature of 25 °C.


Fig. 9. Photovoltaic voltage at 1000 W/m² and 25 °C.

Fig. 10. Irradiance profile.

The P&O algorithm is a classic and simple algorithm. In general, it is highly dependent on the initial conditions and exhibits oscillations around the optimal value; its major disadvantage is its poor behaviour following a sudden change in illumination. The INC algorithm can be seen as an improvement of the P&O algorithm: it behaves better during rapid changes in meteorological conditions, although it is more complex than the previous one. The fuzzy logic algorithm is robust and efficient, and it operates at the optimal point without oscillations.


Fig. 11. Photovoltaic power for variable radiation.

Fig. 12. Photovoltaic voltage for variable radiation.

In addition, it is characterized by good behavior in the transient state. However, the implementation of this type of algorithm is more complex than that of the traditional algorithms, and its efficiency depends strongly on the inference table. Comparing the performance of the three proposed methods (P&O, IC, FLC), as shown in Figs. 8 and 9, we find that the proposed IC is much better at achieving rapid convergence in the presence of model uncertainties and external perturbations.

6 Conclusion

In this work we have described the main elements of the PV system and recalled the principle of three MPPT algorithms (P&O, IC, FLC).


We then presented a simulation of the three algorithms. The simulation results show that the INC algorithm gives better results than P&O. Among the three algorithms, fuzzy logic shows good behavior and better performance than the other methods. In future work, we plan to extend this study by inserting an inverter that converts the direct current from the MCUK into alternating current, usable for supplying AC loads or for injection into the electrical grid.

References 1. Villalva, G.M.G., Gazoli, J.R., Filho, E.R.: Comprehensive approach to modeling and simulation of photovoltaic arrays. IEEE Trans. Power Electron. 24(5), 1198–1208 (2009) 2. Rekioua, D., Matagne, E.: Optimization of photovoltaic power systems: modelization, simulation and control. Green EnergyTechnol 2012, 102 (2012) 3. Sahebrao, P., Prasad, N.R.C.: Design and simulation of MPPT algorithm for solar energy system using Simulink model. Int. J. Res. Eng. Appl. Sci. (IJREAS), 02(01), 37–40 (2014). ISSN: 2249-9210 4. Elgendy, M.A., Zahawi, B., Atkinson, D.J.: Assessment of perturb and observe MPPT algorithm implementation techniques for PV pumping applications. IEEE Trans. Sustain. Energy, 3(1), 21–33, January 2012 5. Kassmi, K., Hamdaoui, M., Olivié, F.: Conception et modélisation d’un systèmephotovoltaïque adapté par une commande MPPT analogique. Revue des Energies Renouvelables 10(4), 451–462 (2007) 6. Rahman, S.A., Varma, R.K., Vanderheide, T.: Generalized model of a photovoltaic panel. lET Renew. Power Gener. J. Mag. 8(3), 217–219 (2014) 7. Ahmed, A.H., Al-Khatat, M.K., Abdullah, A.J.: Synthesis for Cuk convertor circuit controller. Al-Rafidain Eng. J. 14(4). Mosul-Iraq (2006) 8. Jacobs, I.S., Bean, C.P., Surya Kumari, J., Dr. Babu, C.S., Babu, A.K.: Design and analysis of P&O and IP&O MPPT techniques for photovoltaic system. Int. J. Mod. Eng. Res. (IJMER). 2(4), 2174–2180, July-August 2012 9. Yan, Z., Fei, L., Jinjun, Y., Shanxu, D.: Study on realizing MPPT by improved incremental conductance method with variable step-size. In: Proceeding of IEEE ICIEA, pp. 547–550, June 2008 10. Diaz, N., Hernandez, J., Duarte, O.: Fuzzy MPP method improved by a short circuit current estimator, Applied to a grid connected PV system. In: IEEE 12th Work Shop on Control and Modeling for Power Electronics. Colombia, pp. 1–6 (2010) 11. Rahmani, R., Seyedmahmoudian, M., Mekhilef, S., Yusof, R.: Implementation of Fuzzy Logic Maximum power Point Tracking Controller For Photovoltaic System. Am. J. Appl. Sci. 10(3), 209–218 (2013) 12. Larbes, C., Cheikh, S.M.A., Obeidi, T., Zerguerras, A.: Genetic algorithms optimized fuzzy logic control for the maximum power point tracking in photovoltaic system. Renew. Energy 34, 2093–2100 (2009)

Design and Realization of an IoT Prototype for Location Remote Monitoring via a Web Application

S. M. H. Irid1,3, M. Hadjila1,3, H. E. Adardour2,3(B), and I. Y. Nouali1,3

1 Department of Telecommunications, Faculty of Technology, University Abou Bakr Belkaid-Tlemcen, Tlemcen, Algeria
{sidimohammedhadj.irid,mourad.hadjila}@univ-tlemcen.dz, [email protected]
2 Department of Electronics, Faculty of Technology, University Hassiba Benbouali-Chlef, Chlef, Algeria
[email protected]
3 STIC Laboratory, Faculty of Technology, University Abou Bakr Belkaid-Tlemcen, Tlemcen, Algeria

Abstract. Currently, a remarkable new technology in the field of electronics and computer networks offers great potential for the development of new services and applications connecting the physical world to the virtual world. This technology is the Internet of Things (IoT). It makes it possible to draw a digital map of the real world by giving an electronic identity to the objects and places of the physical environment. In this paper, the concept of the Internet of Things has been applied and its spectrum of use extended by building an IoT-based prototype that gives the user the possibility of monitoring a remote location through a Web application. Our system consists of three nodes responsible for capturing physical quantities such as temperature/humidity, movement and gas/smoke. Monitoring is carried out via a Web application where the user can consult these quantities in real time on a graphical interface, or be alerted by a notification on a smartphone or computer, even if the application is running in the background, when an event is triggered such as a parameter exceeding a threshold or an intrusion being detected in the monitored location.

Keywords: Internet of Things · Connected objects · Arduino · Web development · Web application · Monitoring

1 Introduction

Today, millions of people use the Internet for various purposes: from searching for information on the Web to online gambling, from sending and receiving e-mails to social applications and many other activities. While millions of devices offer us these possibilities, a big step forward is being made in the use of the Internet as a global platform through which everyday objects can coordinate and communicate with each other.


From this point of view, the Internet of Things was born [1-4]. The development of the Internet of Things (IoT) is transforming many objects in our environment into so-called "connected" objects [5]. A connected object can be defined as an object to which an Internet connection and/or an algorithm adds value in terms of functionality, information or interaction; it communicates (sending and/or receiving information) with its environment and with a server or the "cloud" via an Internet connection [6]. The IoT has gained a significant share of attention, in academia and industry alike, especially in recent years. The reason for this interest lies in the potential capabilities that the IoT promises to offer. On a personal level, it paints a picture of a future world where all the things in our environment are connected to the Internet and communicate transparently with one another to work intelligently. The ultimate goal is to enable the objects around us to sense our environment effectively, to communicate cheaply and to create a better environment for us: one where everyday objects act according to what we need without explicit instructions. The Internet of Things will connect everything and everyone in an integrated global network. To supply Big Data to its planetary nervous system, billions of sensors are already being attached to natural resources, production lines, the electricity grid, logistics networks and recycling flows, and are being installed in homes, offices, vehicles and even humans. Prosumers will be able to connect to the network and use Big Data, analytics and algorithms to improve efficiency, increase productivity enormously and reduce the marginal cost of producing and sharing a wide range of goods and services, just as they do today for information goods. The Internet of Things is being introduced in all industrial and commercial sectors. Companies install sensors throughout their business chain to monitor and control the flow of goods and services. UPS, for example, uses Big Data to keep up to date on the state of its vehicles in the United States: the logistics giant has fitted sensors to monitor each of their parts for signs of malfunction or potential wear, so that it can replace them before an expensive failure occurs on the road [7]. The IoT is also used to create smart cities [8]. Sensors measure the vibration and condition of materials in buildings, bridges, roads and other infrastructure to assess structural wear in the built environment and decide when to carry out the necessary repairs. Others track noise pollution from neighborhood to neighborhood, or monitor congestion on the streets and pedestrian density on sidewalks to optimize car and pedestrian traffic. The application of the Internet of Things to the natural environment is advancing rapidly to better manage the Earth's ecosystems. In forests, sensors warn firefighters about the risk of fire. Scientists deploy them in cities and villages to measure pollution and warn residents when toxic levels are reached, so that they can reduce their exposure to danger by staying at home. Researchers implant sensors in wildlife and place them along migratory routes to identify environmental and behavioral changes that may be detrimental to the animals' well-being, which allows preventive measures to restore the dynamics of wildlife ecosystems.


The Internet of Things is also transforming the way we produce and deliver food. Farmers use sensors to monitor weather conditions, changes in soil moisture, pollen diffusion and other factors that affect yields, and they install automatic reaction mechanisms to maintain good growing conditions [9, 10]. Even inside the human body, doctors attach or implant sensors that monitor functions such as heart rate, pulse, temperature and skin color, so as to be alerted to vital changes that may require their active attention [11-13]. This paper traces the fundamental aspects of the Internet of Things by creating an IoT-based remote-monitoring prototype that connects different types of devices, via a Web application, to a cloud that stores the data sent by the control boards. With real-time access from the Web application to the cloud, it is possible to monitor a location effectively at any distance. The remainder of this paper is organized as follows. Section 2 presents the description of the IoT system and its three main layers. Section 3 describes the Web application and its interfaces, followed by a flowchart. Section 4 concludes the paper with a discussion of future research.

2 IoT System Description

An Internet of Things model consists of three main layers, and our proposed prototype follows the same decomposition. The first layer is designed around three nodes (end devices), each equipped with a microcontroller, a WiFi module and a sensor to collect a different type of information. The nodes are composed as follows: node 1 includes an Arduino Uno [14], an ESP-01 module and a DHT11 sensor [15]; node 2 contains a WeMos D1 Mini [16] and an HC-SR501 sensor; and node 3 contains a WeMos D1 Mini and an MQ-2 sensor. In the second layer, a wireless access point (WiFi) connects the nodes to the Internet so that the collected information can be sent to an IoT cloud platform. In the third layer, IoT cloud platforms receive the information sent by the nodes. In addition, we developed a Web application that performs the following functions: i) visualizing the data stored in the cloud, and ii) displaying notifications for specific events. Node 1 is responsible for sensing temperature and humidity, while nodes 2 and 3 are respectively responsible for detecting movement and gas leaks [17]. The communicating nodes constitute the main hardware part of our system. There are two types of nodes: a node whose function is to acquire information about the physical state (temperature and humidity), and detector nodes that detect movement, a gas leak or smoke. All the nodes post their information to the ThingSpeak cloud [18]; in addition, the detector nodes use Firebase Cloud Messaging [19] to deliver notifications, while the Web application displays this information and triggers a notification when movement or gas/smoke is detected.


The proposed IoT prototype is shown in Fig. 1:

Fig. 1. Global synoptic diagram of the proposed IoT system.

2.1

Communicating Nodes

Communicating nodes constitute the main hardware part of our prototype. We can distinguish two types of nodes: a node whose function is to acquire information on the physical state (temperature, humidity), and the detector nodes: motion detector and gas/smoke detector. All nodes post their information to the ThingSpeak Cloud; in addition, the detector nodes use Firebase Cloud Messaging to deliver the notifications. The web application takes care of displaying this information and triggering a notification upon detection of motion or gas/smoke. In what follows, we detail the composition of each node, give its electrical diagram as well as the flowchart that describes its operating principle.

Node 1 – Connection of the ESP-01 and the DHT11 Sensor: The ESP-01 has 8 pins, of which 5 are used for the connection with the Arduino Uno:

– Vcc: the 3.3 V power supply of the module.
– GND: the ground.
– TX: connected to pin 2 of the Arduino Uno.
– RX: connected to pin 3 of the Arduino Uno.
– CH PD: must be tied to Vcc (3.3 V) to enable the operating mode of the module.


For the DHT11 sensor, the connection is as follows:

– Vcc: the 5 V supply of the sensor.
– GND: the ground.
– S: connected to the A0 pin of the Arduino Uno.

Figure 2 shows the electrical wiring diagram, designed with the Fritzing printed-circuit design software. After validating the electrical diagram, we switch to the Fritzing interface used for the design of the PCB. The final printed circuit board used for the ESP-01 shield and the DHT11 sensor is shown in Fig. 3.

Fig. 2. Electrical diagram of the DHT11 sensor connection and the ESP-01 with the Uno.

Temperature and Humidity Acquisition Algorithm: The temperature and humidity are acquired via the DHT11 sensor, whose operating algorithm is given by the flowchart in Fig. 4.

Node 2 – Connecting the HC-SR501 Sensor: The connection of the HC-SR501 sensor is very simple: all that is needed is to connect the 5 V power supply and to read the OUT pin of the sensor digitally from the WeMos D1 Mini, as detailed below:


Fig. 3. ESP-01 shield PCB and DHT11 sensor.

Fig. 4. Temperature and humidity acquisition algorithm.

– Vcc: the 5 V supply of the sensor.
– GND: the ground.
– OUT: connected to the D0 pin of the WeMos D1 Mini.

The electrical connection diagram is shown in Fig. 5.


Fig. 5. Electrical diagram of the D1 Mini connection with the HC-SR501 sensor.

Motion Detection Algorithm: The main role of this algorithm is to inform the user in real time when a movement occurs. This is ensured by the HC-SR501 sensor. The proposed flowchart for this algorithm is shown in Fig. 6 and Fig. 7.

Node 3 – Connecting the MQ-2 Sensor: Connecting the MQ-2 sensor consists of connecting the Vcc and GND pins of the sensor to the 5 V power supply and to the ground of the WeMos D1 Mini. The analog output (A0) of the sensor should not be connected directly to the A0 pin of the WeMos D1 Mini, as the latter does not tolerate more than 3.2 V as the input voltage on its A0 pin. Therefore, we use a voltage divider that reduces the 5 V input voltage to an output voltage of about 3.2 V. The electrical circuit of the connection is shown in Fig. 8.
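The divider ratio follows from the standard voltage-divider relation. The paper does not list the resistor values, so the figures below are an illustrative assumption (R1 = 2.2 kΩ, R2 = 3.9 kΩ, standard E12 values) chosen to stay just under the 3.2 V limit:

\[
V_{out} = V_{in}\,\frac{R_2}{R_1 + R_2} = 5\,\mathrm{V} \times \frac{3.9\,\mathrm{k\Omega}}{2.2\,\mathrm{k\Omega} + 3.9\,\mathrm{k\Omega}} \approx 3.20\,\mathrm{V}
\]

Any resistor pair with the same ratio (and values high enough not to load the sensor output) would serve equally well.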


Fig. 6. Motion detection algorithm - Part 1.

Fig. 7. Motion detection algorithm - Part 2.


Fig. 8. Electrical diagram of the D1 Mini connection with the MQ2 sensor.

Gas/Smoke Detection Algorithm: This algorithm detects combustible gas or smoke using the MQ2 sensor, the purpose of which is to warn the user in the event of danger. The operation of this algorithm is described by the flowchart in Fig. 9.

3

Web Application

The Web application makes it possible to display the information collected by the first node (temperature, humidity) and to alert the user with a notification when the second node detects a movement or the third node detects gas/smoke. It is based on responsive Web design, which lets the user benefit from the application regardless of the device used, such as mobile phones, tablets and desktop monitors. Several programming languages are used for the development of our Web application; they are presented below:

– HTML5: for the creation and structuring of our application.
– CSS3: for formatting.
– JavaScript: to create the notification system and import the data from the ThingSpeak Cloud into our application using AJAX.
– PHP: to manage the authentication mechanism.
– SQL: used for the management of the database.


Fig. 9. Gas/smoke detection algorithm.

Cloud computing is an infrastructure in which computing power and storage are managed by remote servers to which users connect via a secure Internet connection. The desktop or mobile computer, the mobile phone, the tablet and other connected objects become access points for running applications or viewing data hosted on the servers. The Cloud is also characterized by its flexibility, which allows vendors to automatically tailor storage capacity and computing power to users' needs. For the general public, the Cloud materializes in particular through storage and data-sharing services such as Box, Dropbox, Microsoft OneDrive or Apple iCloud, on which users can store personal content (photos, videos, music, documents, etc.) and access it anywhere in the world from any connected device. In this paper, we use the ThingSpeak Cloud: it is an IoT platform that allows collecting and storing the data of connected objects through the HTTP protocol, via the Internet or via a local network. With ThingSpeak, one can create instant live data visualizations, act on the data, and send alerts using Web services such as Twitter or Twilio. It gives engineers and scientists the ability to prototype and build IoT systems without installing servers or developing Web software. For the implementation of the notifications, we use Firebase Cloud Messaging, which makes it possible to notify a client application that new data is available for synchronization and to send notification messages to stimulate user re-engagement and retention.
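As a concrete illustration of this data path, the minimal Python sketch below pushes one temperature/humidity sample to a ThingSpeak channel and reads back the latest entries over plain HTTP. The channel ID and API keys are placeholders, not values from the paper, and the paper's own nodes do this from the Arduino/WeMos firmware rather than from Python:

```python
import requests

# Placeholder credentials: replace with a real ThingSpeak channel ID and keys.
CHANNEL_ID = 123456
WRITE_API_KEY = "XXXXXXXXXXXXXXXX"
READ_API_KEY = "YYYYYYYYYYYYYYYY"

def push_reading(temperature, humidity):
    """Send one sample to the channel (field1 = temperature, field2 = humidity)."""
    resp = requests.get(
        "https://api.thingspeak.com/update",
        params={"api_key": WRITE_API_KEY, "field1": temperature, "field2": humidity},
        timeout=10,
    )
    # ThingSpeak answers with the new entry id, or 0 if the update was rejected.
    return int(resp.text) > 0

def last_readings(n=10):
    """Fetch the last n channel entries as JSON (what the curves in Figs. 12-13 plot)."""
    resp = requests.get(
        f"https://api.thingspeak.com/channels/{CHANNEL_ID}/feeds.json",
        params={"api_key": READ_API_KEY, "results": n},
        timeout=10,
    )
    return resp.json()["feeds"]

if __name__ == "__main__":
    if push_reading(23.5, 48):
        print(last_readings(3))
```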


3.1


Authentication Interface

Authentication is done by prompting the user to enter a username and password: we query our database to check whether the username/password pair is correct; if so, the user accesses the home page of our application, otherwise the authentication fails. This interface is illustrated in Fig. 10:

Fig. 10. Authentication interface of computer version of the Web app.

Fig. 11. Control interface of computer version web application.

3.2

Control Interface

Once the authentication is successful, the user will be redirected to the control page (see Fig. 11) on which he can: consult the values of the temperature and the


humidity, check the last date/time at which a movement or gas/smoke was detected, and also click on the "visualize" button to display the last ten values of the temperature or humidity as a curve (see Figs. 12 and 13), or on the "reset" button to reset the motion and gas/smoke detection functions.

Fig. 12. Representative curve of the last ten values of the temperature.

The operating principle is explained by the flowchart depicted in Fig. 14, associated with the code used for the development of the application. The getTemp, getHum, getMotion and getGasSmoke functions retrieve the temperature, humidity, motion and gas/smoke states, respectively. These methods send

Fig. 13. Representative curve of the last ten values of humidity.


Fig. 14. Control flow chart


requests through the HTTP protocol, via the GET method, to the ThingSpeak Cloud. At the end of each request, a response is received from the server in JSON format. In the case of temperature or humidity, the value is obtained directly, but in the case of movement, a test must be made on the response: if it is equal to one, it means that a movement has been detected. The same applies to the case of gas/smoke. If the user has already authorized the notifications, we proceed to retrieve the token and then execute the sendToken function; otherwise, we ask the user for permission to receive notifications and then retrieve the token. The sendToken function allows the user to subscribe to the "IoT" topic in order to receive the notification messages sent by the communicating nodes (node 2 and node 3). This is achieved by sending the token through an HTTP POST request to Firebase Cloud Messaging.
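The paper implements this flow in JavaScript; as a hedged sketch of the same logic, the Python snippet below polls the motion field on ThingSpeak and, when it reads 1, pushes a message to a topic through Firebase Cloud Messaging's legacy HTTP API. The channel ID, field number, server key and topic name are illustrative assumptions, not values taken from the paper:

```python
import requests

# Illustrative placeholders only.
CHANNEL_ID = 123456
READ_API_KEY = "YYYYYYYYYYYYYYYY"
FCM_SERVER_KEY = "ZZZZZZZZZZZZZZZZ"   # legacy FCM HTTP API server key

def last_field_value(field):
    """Read the most recent value of one channel field from ThingSpeak."""
    url = f"https://api.thingspeak.com/channels/{CHANNEL_ID}/fields/{field}/last.json"
    data = requests.get(url, params={"api_key": READ_API_KEY}, timeout=10).json()
    return data.get(f"field{field}")

def notify_topic(title, body, topic="iot"):
    """Send a notification to every client subscribed to the topic (legacy FCM API)."""
    resp = requests.post(
        "https://fcm.googleapis.com/fcm/send",
        headers={"Authorization": f"key={FCM_SERVER_KEY}"},
        json={"to": f"/topics/{topic}",
              "notification": {"title": title, "body": body}},
        timeout=10,
    )
    return resp.status_code == 200

if __name__ == "__main__":
    # Field 3 is assumed to hold the motion flag (1 = movement detected).
    if last_field_value(3) == "1":
        notify_topic("Alert", "Movement detected at the monitored location")
```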

4

Conclusion

In this paper, we realized a remote monitoring system based on Internet of Things technology, in which we built three nodes, each controlling a well-defined parameter (unauthorized presence, combustible gas or smoke, temperature and humidity). In addition, we developed a Web application providing a user-monitoring interface with real-time visualizations and notifications. The interconnection between the two is achieved via IoT Cloud platforms. As perspectives, we plan to: firstly, secure the data sent by the communicating nodes by integrating cryptographic protocols such as SSL or TLS; secondly, increase the efficiency of the system by switching to a paid tier of the IoT Cloud platforms or by using a dedicated server; and finally, for a real deployment, install several nodes for each parameter and add battery power to keep the nodes constantly online.

References 1. Atzori, L., Iera, A., Morabito, G.: The internet of things: a survey. Comput. Netw. 54(15), 2787–2805 (2010) 2. Alam, N., Vats, P., Lashyap, N.: Internet of Things: a literature review. In: 2017 Recent Developments in Control, Automation & Power Engineering (RDCAPE), pp. 192–197. IEEE (2017) 3. Atzori, L., Iera, A., Morabito, G.: Understanding the Internet of Things: definition, potentials, and societal role of a fast evolving paradigm. Ad Hoc Netw. 56, 122–140 (2017) 4. CISCO, Internet of things (IoT) (2012). http://www.cisco.com/web/solutions/trendsliot/overview.html 5. Adardour, H.E., Hadjila, M., Irid, S.M.H., et al.: Outdoor Alzheimer's patients tracking using an IoT system and a kalman filter estimator. Wireless Pers. Commun. 1–17 (2020) 6. Dhainaut, J.-F., et al.: Utilisation des objets connectés en recherche clinique (2017) 7. Mayer-Schönberger, V., Cukier, K.: Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt, Boston, MA, p. 59 (2013)


8. Tanwar, S., Tyagi, S., Kumar, S.: The role of internet of things and smart grid for the development of a smart city. In: Intelligent Communication and Computational Technologies. Springer, Singapore, pp. 23–33 (2018) 9. Raut, R., Varma, H., Mulla, C., et al.: Soil Monitoring, fertigation, and irrigation system using iot for agricultural application. In : Intelligent Communication and Computational Technologies. Springer, Singapore, pp. 67–73 (2018) 10. Rukhmode, S., Vyavhare, G., Banot, S., et al.: IOT based agriculture monitoring system using Wemos. In: International Conference On Emanations in Modern Engineering Science and Management (ICEMESM-2017), pp. 14–19 (2017) 11. Qi, J., Yang, P., Min, G., et al.: Advanced internet of things for personalised healthcare systems: a survey. Pervasive Mobile Comput. 41, 132–149 (2017) 12. Yuehong, Y.I.N., Zeng, Y., Chen, X., et al.: The internet of things in healthcare: an overview. J. Ind. Inf. Integr. 1, 3–13 (2016) 13. Dhainaut, J.-F., Huot, L., Pomar, V.B., et al.: Utilisation des objets connect´es en recherche clinique. Th´erapie (2017) 14. Massimo, B.: Getting started with Arduino. Make: Books (2011) 15. Kale, V.S., Kulkarni, R.D.: Real time remote temperature & humidity monitoring using arduino and Xbee S2. Int. J. Innov. Res. Electr. Electron. Instrum. Control Eng. 4(6) (2016) 16. Kodali, R.K., Sahu, A.: An IoT based weather information prototype using WeMos. In: 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), pp. 612–616. IEEE (2016) 17. Irid, S.M.H., Hadjila, M., Adardour, H.E.: Design and implementation of an IoT prototype for the detection of carbon monoxide. In: 2019 6th International Conference on Image and Signal Processing and their Applications (ISPA), pp. 1–5. IEEE (2019) 18. Maureira, M.A.G., Oldenhof, D., Teernstra, L.: ThingSpeak–an API and Web Service for the Internet of Things. Retrieved7/11/15World Wide Web (2011). http:// www.Mediatechnology.leiden.edu/images/uploads/docs/wt2014 thingspeak.pdf 19. Kumar, K.N., Akhi, K., Gunti, S.K., et al.: Implementing smart home using firebase. Int. J. Res. Eng. Appl. Sci. 6(10), 193–198 (2016)

Design of a New CP Microstrip Patch Antennas for WPT to a UAV at 5.8 GHz Salah Ihlou1(B) , Hafid Tizyi2 , Abdelmajid Bakkali1 , Ahmed El Abbassi1 , and Jaouad Foshi1 1 Team Electronics, Instrumentation and Measurement, Faculty of Sciences and Techniques,

University of Moulay Ismail, Meknes, Morocco [email protected] 2 STRS Laboratory, INPT, Rabat, Morocco

Abstract. WPT (wireless power transmission) has become an attractive energy option that will expand the range of, and provide electricity without a wired connection to, a micro air vehicle (MAV). In this paper we suggest a new CPMA (circularly polarized microstrip antenna) configuration for a 5.8 GHz rectenna system. The lightweight antenna is based on a single microstrip patch to achieve CP (circular polarization). This antenna is fed by a microstrip line with a power port matched to 50 Ω. A prototype has been designed to examine the performance of the proposed structure. The CST (Computer Simulation Technology) simulator has been used for the study and modeling. The results show that the proposed antenna presents a good axial ratio (AR) of 0.98 dB, a bandwidth of 160 MHz at −10 dB, and a reflection coefficient reaching a level of −38.25 dB ( 0; For all js.

2.3 The Second Stage: Patient Classification Using SVM

SVM is one of the first statistical learning approaches, put forward by Vapnik et al. in 1995 [17]. It is a major achievement of machine learning research in recent years. This approach not only has a solid theoretical basis but is also suited to solving highly nonlinear classification and regression problems, compared with conventional intelligent approaches such as statistical mechanisms and artificial neural networks. Meanwhile, this approach has been widely used in image recognition, automatic text classification and more [18]. Since this method has shown exceptional performance compared with existing methods, the approach and its technology have become new research hotspots after neural network research, and will advance the research and application of machine learning theory and technology. The SVM classification algorithm produces a binary decision, assigning an M-dimensional feature vector to one of two classes. Being a supervised approach, an SVM needs to be trained using a proper dataset, which must be sufficiently large and representative of the two classes with respect to the selected features. A training phase is then needed to determine a subset of the training vectors, named support vectors, that will be used for solving the classification problem. One significant advantage of SVMs lies in the fact that the number of support vectors is normally much smaller than the cardinality of the training dataset. Hence, although the training of an SVM can be a resource-intensive task, the actual classification algorithm


Fig. 4. Algorithm of hybrid model BWM-SVM

can be very light. Equation (2) represents the standard formulation for a two-class classification problem [19]:

y(x) = V^{T}\varphi(x) + b \qquad (2)

The algorithm used in the proposed work proceeds in two stages:

• Weight calculation: take the criteria as input and generate the optimal weights.
• SVM classification: training data, pre-processing, feature extraction and the SVM classifier.


All the above-cited stages and their interconnection in the proposed system are depicted in Fig. 4. Equation (2) is a linear model where x is the M-dimensional input vector, M is the size of the feature space, V = (v_1, v_2, ..., v_M) is the vector of coefficients of the linear model, ϕ is a general feature-space transformation function and b is the bias of the model.

2.3.1 Numerical Illustration and Discussion

This section presents the numerical example of the present study. The first step is to determine the preference values of each criterion compared to the best and the worst criterion. Table 1 and Table 2 show the predefined preferences of all criteria with respect to the best and the worst criterion, respectively. The next step is to solve the optimization problem to find the optimal values of the criteria weights. Table 3 depicts the optimal weights obtained for the study.

Table 1. Matrix of importance of the criteria compared to the best criterion.

Best to others | Fever | Cough | Fatigue | Shortness of breath | Age
Fever | 1 | 5 | 3 | 7 | 9

Table 2. Matrix of importance of the criteria compared to the worst criterion.

Others to the worst | Age
Fever | 9
Cough | 5
Fatigue | 3
Shortness of breath | 3
Age | 1

Table 3. Matrix of the optimal weights.

Criteria | Fever | Cough | Fatigue | Shortness of breath | Age
Optimal weight | 0.53 | 0.20 | 0.12 | 0.08 | 0.05
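A minimal sketch of how such BWM weights can be computed is shown below, using the linear BWM formulation solved as a small linear program with SciPy. This is an illustration under the assumption that the linear BWM variant is used; the resulting weights should come out close to those in Table 3, but the exact values depend on which BWM variant the authors actually applied:

```python
import numpy as np
from scipy.optimize import linprog

# Best-to-others and others-to-worst preference vectors from Tables 1 and 2.
criteria = ["Fever", "Cough", "Fatigue", "Shortness of breath", "Age"]
a_best = np.array([1, 5, 3, 7, 9], dtype=float)   # best criterion: Fever (index 0)
a_worst = np.array([9, 5, 3, 3, 1], dtype=float)  # worst criterion: Age (index 4)
best, worst, n = 0, 4, len(criteria)

# Decision variables: [w_1, ..., w_n, xi]; minimise the consistency term xi.
c = np.zeros(n + 1)
c[-1] = 1.0

A_ub, b_ub = [], []
for j in range(n):
    if j != best:                      # |w_best - a_best[j] * w_j| <= xi
        for sign in (+1.0, -1.0):
            row = np.zeros(n + 1)
            row[best] += sign
            row[j] -= sign * a_best[j]
            row[-1] = -1.0
            A_ub.append(row)
            b_ub.append(0.0)
    if j != worst:                     # |w_j - a_worst[j] * w_worst| <= xi
        for sign in (+1.0, -1.0):
            row = np.zeros(n + 1)
            row[j] += sign
            row[worst] -= sign * a_worst[j]
            row[-1] = -1.0
            A_ub.append(row)
            b_ub.append(0.0)

A_eq = [np.r_[np.ones(n), 0.0]]        # the weights sum to 1
b_eq = [1.0]
bounds = [(0, None)] * (n + 1)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
weights = dict(zip(criteria, np.round(res.x[:n], 2)))
print(weights, "consistency xi =", round(res.x[-1], 3))
```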

After calculating the importance value of each criterion, we classify the patients using the SVM. Figure 5 depicts the classification of the group of patients considered for this study.


Fig. 5. Patient classification using the hybrid-model

We observe that the proposed hybrid model allows us to make a diagnosis by classifying the patients into two classes: the positive class and the negative (healthy) class. Among the advantages of this model are the accuracy of the diagnosis, its speed, and the reduced risk obtained by avoiding contact between the patients and the hospital staff.
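To make the second stage concrete, here is a minimal sketch of the classification step with scikit-learn. The patient data are synthetic stand-ins (the paper's dataset is not published), and applying the Table 3 weights as feature scaling is one plausible way to combine the two stages, not necessarily the authors' exact procedure:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the patient dataset (not the authors' data):
# columns = fever, cough, fatigue, shortness of breath, age (all scaled to [0, 1]).
X = rng.random((200, 5))
y = (X @ np.array([0.53, 0.20, 0.12, 0.08, 0.05]) > 0.45).astype(int)  # toy labels

# Stage 1 output: BWM weights used to emphasise the most important symptoms.
bwm_weights = np.array([0.53, 0.20, 0.12, 0.08, 0.05])
Xw = X * bwm_weights

# Stage 2: train a binary SVM (positive = suspected COVID-19, negative = healthy).
X_tr, X_te, y_tr, y_te = train_test_split(Xw, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```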

3 Conclusion

In this paper, we investigated an efficient patient classification using a hybrid BWM-SVM model. For making an effective classification, we simultaneously considered the fever, cough, fatigue, shortness of breath, and the age of the patient. In our novel strategy, patients belong to the ill class or the healthy class depending on their criteria values. Moreover, to make this classification more realistic, we calculated the optimal weights of the criteria using BWM. From the illustrative example and the classification results, we conclude that the hybrid BWM-SVM model has successfully classified the patients. We recommend the use of medical sensors in banks and industrial companies to ease symptom collection and allow rapid intervention of the hospital staff. Future work proposes to evaluate a system for COVID-19 by combining deep learning and feature extraction with different approaches of multicriteria analysis. In addition, the performance may be enhanced and evaluated using more datasets and more feature extraction approaches and techniques.


References 1. Sun, D., Li, H., Lu, X.-X., Xiao, H., Ren, J., Zhang, F.-R., Liu, Z.-S.: Clinical features of severe pediatric patients with coronavirus disease 2019 in Wuhan: a single centers observational study. World J. Pediatr. 3, 1–9 (2020) 2. Guan, W., Ni, Z., Hu, Y., Liang, W., Ou, C., He, J., Liu, L., Shan, H., Lei, C., Hui, D.S.C., Du, B., Li, L., Zeng, G., Yuen, K., Chen, R., Tang, C., Wang, T., Chen, P., Xiang, J., Li, S., Wang, J., Liang, Z., Peng, Y., Wei, L., Liu, Y., Hu, Y., Peng, P., Wang, J., Liu, J., Chen, Z., Li, G., Zheng, Z., Qiu, S., Luo, J., Ye, C., Zhu, S., Zhong, N.: Clinical characteristics of coronavirus disease 2019 in China. New Engl. J. Med. (2019) 3. Fehr, A., Channappanavar, R., Perlman, S.: Middle east respiratory syndrome: emergence of a pathogenic human coronavirus. Ann. Rev. medicine 68, 08 (2016) 4. Buheji, M., Buhiji, A.: Designing intelligent system for stratification of COVID-19 asymptomatic patients 246–257, March 2020 5. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., Zhang, L., Fan, G., Xu, J., Gu, X., Cheng, Z., Yu, T., Xia, J., Wei, Y., Wu, W., Xie, X., Yin, W., Li, H., Liu, M., Cao, B.: Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497–506 (2020) 6. AlMomani, A.A., Bollt, E.: Informative ranking of stand out collections of symptoms: a new data-driven approach to identify the strong warning signs of COVID 19, March 2020 7. Buheji, M.: Re-inventing Our Lives A Handbook for Socio-Economic “Problem-Solving”, November 2018 8. Peng, Z., Yang, X.-L., Wang, X.-G., Ben, H., Lei, Z., Wei, Z., Si, H.-R., Yan, Z., Bei, L., Huang, C.-L., Chen, H.-D., Jing, C., Yun, L., Hua, G., Jiang, R.-D., Liu, M.-Q., Ying, C., Shen, X.-R., Xi, W., Zheng, X.-S., Kai, Z., Chen, Q.-J., Fei, D., Liu, L.-L., Bing, Y., Zhan, F.X., Wang, Y.Y., Xiao, G.-F., Shi, Z.L.: Re-inventing our lives a handbook for socio-economic “problem-solving”. Nature 579, 03 (2020) 9. Lu, H., Ai, J., Shen, Y., Li, Y., Li, T., Zhou, X., Zhang, H., Zhang, Q., Ling, Y., Wang, S., Qu, H., Gao, Y., Li, Y., Yu, K., Zhu, D., Zhu, H., Tian, R., Zeng, M., Li, Q., Zhang, W.: A descriptive study of the impact of diseases control and prevention on the epidemics dynamics and clinical features of SARS-CoV-2 outbreak in Shanghai, lessons learned for metropolis epidemics prevention, January 2020 10. Hande, A., Cem, E.: Wireless sensor networks for healthcare: a survey. Comput. Netw. 54(15) (2010). http://www.springer.com/lncs. Accessed 21 Nov 2016 11. Pathak, Y., Shukla, P.K., Tiwari, A., Stalin, S., Singh, S., Shukla, P.K.: Deep transfer learning based classification model for COVID19 disease. IRBM (2020) 12. Sarker, L., Islam, M., Hannan, T., Ahmed, Z.: COVID-DenseNet: a deep learning architecture to detect COVID-19 from chest radiology images. Preprint (2020) 13. Asnaoui, K.E., Chawki, Y.: Using X-ray images and deep learning for automated detection of coronavirus disease. J. Biomol. Struct. Dynam. (2020) 14. Achki, S., Gharnati, F., Ouahman, A.: Enhancing energy consumption in wireless communication systems using weighted sum approach. Indian J. Sci. Technol. 10, 01 (2017) 15. Samira, A., Fatima, G., Abdellah, A.O.:Assessment of energy efficiency of base station using SMART approach in wireless communication systems: special issue on data and security engineering. In: Innovations in Smart Cities Applications, edn. 2, pp. 850–857, Springer, Berlin (2019) 16. 
Layla, A., Said, R., Hanane, A.: An improved multipath routing protocol using an efficient multicriteria sorting method: special issue on data and security engineering, pp. 837–849, January 2019


17. Veropoulos, K., Campbell, C., Cristianini, N.: Controlling the sensitivity of support vector machines. In: Proceedings of International Joint Conference Artificial Intelligence, June 1999 18. Srivastava, D., Bhambhu, L.: Data classification using support vector machine. J. Theor. Appl. Inf. Technol. 12, 1–7 (2010) 19. Robert, N., Gary, M., Ken, Y.. Advanced algorithms for data mining. J. Theor. Appl. Inf. Technol. (2018)

Development of a Simulator to Model the Spread of Coronavirus Infection in a Closed Space Mohamed Almechkor(B) , Lotfi El Aachak, Fatiha Elouaai, and Mohammed Bouhorma Computer Science, Systems and Telecommunication Laboratory (LIST), Faculty of Sciences and Technologies, University Abdelmalek Essaadi, Tangier, Morocco [email protected], [email protected], [email protected], [email protected]

Abstract. The outbreak of the novel coronavirus COVID-19 and its level of infectiousness, as well as its status as a global pandemic, have led to concerns about the most effective ways to reduce the transmission rate; however, predicting the effectiveness of intervention strategies in a pandemic is difficult, hence the importance of simulation and modeling for understanding the spread of the disease. In this work we designed and developed a computer simulator to enable the study of the evolution of infectious diseases; the simulator imitates random movements and positions of a given number of citizens in a closed space. It takes into account several parameters: the contamination ratio of the disease, the contaminated ratio of the population, and the mask-wearing ratio. We also elaborated some experiments to study the impact of the contamination ratio on disease spreading: for a sample of 100 citizens in a closed place, we used a combination of simulations over the aforementioned 3 parameters. The results show that there is no significant impact of the contamination ratio of the disease on the number of contaminations per generation, although it has an impact on the propagation speed. Keywords: Infectious · Simulation · Modeling · Disease · COVID-19

1 Introduction As part of the fight against the COVID-19 pandemic, the world has adopted several measures and imposed many restrictions to slow down the spread. By putting societies and economies on hold, we have curtailed the ability of the virus to spread through our communities. These defensive measures helped to limit some of the short-term impacts of the virus. Among these actions, social distancing standards prove to be essential; limiting social and community contacts and increasing home isolation are embedded within the COVID-19 pandemic preparedness plans of most countries and appear in current WHO recommendations, on condition of planning for a phased transition away from such restrictions in a manner that will enable the sustainable suppression of transmission at a low level [1]. 1 COVID: Corona Virus Disease. 2 WHO: World Health Organization.



Social distancing interventions are important as they represent the only type of intervention measure guaranteed to be available against a novel infectious disease in the early phases of a pandemic. The goal of these interventions is to reduce the overall illness attack rate and the consequential excess mortality attributed to the pandemic, and to delay and reduce the peak attack rate, reducing pressure on health services and allowing time to distribute and administer antiviral drugs and, possibly, suitable vaccines. Since there are no approved vaccines available today [2], we must find ways to live with this pandemic. Before we can reopen public spaces, we need to identify the most effective policies in terms of safety, since the results and effectiveness of these measures are not available and cannot be accurately estimated until they are implemented. This causes hesitation in decision-making because of the high uncertainty about outcomes, hence the need for virtual simulations that allow policymakers to model the current environment and analyze the interactions between humans. The present work has two aims. First, to provide a predictive and descriptive simulator that can be applied to any closed area/region and by any user, allowing them to generate simulations to understand the evolution and the spread of the disease over different scenarios and to evaluate different behaviors of an infectious disease. Second, to run a sample of tests based on this simulator to find out the impact of the contamination ratio parameter. Section 2 provides a review of some related works and comparisons. In Sect. 3, we cover the methodology used. In Sect. 4 we describe the simulator process, the parameters used, and the rationale for choosing them. Section 5 includes a sample of generated scenarios (produced by the simulator) based on different input parameters to discover the relation between several disease characteristics such as contamination ratio, propagation speed and the number of contaminations occurring per generation. Finally, we conclude by summarizing the most important remarks and outlining future research.

2 Related Works Recently, several models have been developed in the literature for modeling the spreading of infectious diseases; according to our objective and needs, we focus especially on agent-based modeling (ABM). Many studies have investigated implementations of this model, such as the Washington Post simulator in [3], where each individual is represented by a 2D circle with a constant speed, living in a rectangle which represents a small town of 200 people, moving randomly and starting with 1 infectious individual; it detects contaminations occurring over time and outputs a chart representing how the sick–healthy–recovered curves change over time. Another implementation of this model (ABM) by Petrônio Silva's team in [4] is a solution that simulates the dynamics of the COVID-19 epidemic and the epidemiological and economic effects of social distancing interventions, emulating a closed society living in a shared environment and consisting of agents that represent people, houses, businesses, the government and the healthcare system, each one with specific attributes and behaviors. It is very interactive, but on the condition of knowing Python. 3 ABM: Agent-Based Model.


Furthermore, researchers in [5] present two models of the COVID-19 pandemic predicting the impact of universal face mask-wearing on the spread of SARS-CoV-2: one employing a stochastic dynamic network-based compartmental SEIR (susceptible-exposed-infectious-recovered) approach, and the other employing individual ABM (agent-based modeling), available in [6]; the simulation results argue for urgent implementation of universal masking. In this work, we developed an agent-based simulator that allows users to set, according to their main goal, parameters that represent the states and the behaviors (such as wearing a mask, population movement speed, contaminated ratio, number of citizens, contamination ratio) in order to understand the circulation of the infectious disease through the population and to compare the results of each scenario treated. It guarantees more flexibility, ensures more interactivity for general users, and offers the ability to build new models as needed for specific users and scientists.

3 Methodology In this part, we cover the technologies used in the development of the simulator. 3.1 Angular Framework Angular is a platform and framework for building single-page client applications using HTML and TypeScript. Angular is written in TypeScript. It implements core and optional functionality as a set of TypeScript libraries that you import into your apps [7]. 3.2 Perlin Noise Algorithm We used the Perlin algorithm in the simulator to generate continuous random values for managing the entities' movement. Perlin noise is an extremely powerful algorithm that is often used in procedural content generation. It is especially useful for games and other visual media. The algorithm takes as input a certain number of floating-point parameters (depending on the dimension) and returns a value in a certain range (for Perlin noise, that range is generally said to be between −1.0 and +1.0) [8]. 3.3 Quad-Tree Algorithm A quad-tree is a hierarchical data structure used for the compact representation of 2D images. The quadtree and its variants have been extensively used in applications such as computer vision, computer graphics, robotics, geometric modeling, and geographic information systems. Conceptually, a quadtree is generated by dividing an image into quadrants and repeatedly subdividing the quadrants into sub-quadrants until all the pixels in each quadrant have the same value [9] (in our case, each quadrant must contain fewer than 4 people) (Fig. 1).


Fig. 1. Quad-tree application in Morrocovid simulator

3.4 Agent-Based Model In models of disease spread, the modeling of social networks and spatial movements is vital for accurately describing transmission, and these can be incorporated into ABMs. ABMs are stochastic models, enabling the variability of human behavior to be incorporated into the model to help understand the variability in the likely effectiveness of proposed interventions [10]. They are usually used to model the interactions of individuals within a population, allowing a decision-maker to determine how small changes in behavior and interaction may influence population-level outputs. The proposed agent-based model for the spread of infectious disease detects each agent's state changes (susceptible or infectious), which depend on its interactions with its neighbors and on the probability of infection (transmission risk, mask-wearing ratio, population density).

4 Proposed Solution 4.1 Simulator Parameters This solution runs simulations to determine the possible scenarios (contaminations occurring) based on the analysis of several characteristics taken as inputs, namely:

– Number of citizens: the number of persons in the environment to be simulated.
– The ratio of contaminated citizens: the approximate expectation of the percentage of individuals suspected to be infected in the given environment (at least 1 person, which means 0.01 if the total number of citizens is 100).
– The ratio of contamination: used to describe the probability of a healthy person becoming ill when encountering an ill member (estimated from contact tracing).
• It is important to obtain a realistic estimate of this ratio to create a useful and realistic simulation model for decision support.


• This ratio (also called transmission risk) differs according to several other factors such as social behaviors, the population density in the given place, and the contact duration [11].
– The ratio of the mask-wearing population: the percentage of people wearing masks out of the total number of citizens.

4.2 Simulator Process When running a given simulation, the population is represented by circles of different colors on a screen of 1300 px × 1300 px: reds are infected individuals, greens are healthy, and blues are the ones wearing a mask (Fig. 2).

Fig. 2. Screenshot of the view projected by the simulator during execution.

The simulation is designed to imitate random movements and positions of the given number of citizens; the initial number of contaminated individuals is deduced from the contaminated ratio (we need at least one infected person to run a simulation). Assuming that the disease spreads probabilistically, the simulation takes into account the contamination ratio: whenever a healthy person comes into contact with an infected one, the model sets the status of this person based on the transmission probability, using a random function that generates this status. Besides, as the World Health Organization (WHO) has mentioned in [12], the COVID-19 virus spreads primarily through droplets of saliva or discharge from the nose when an infected person coughs or sneezes, so the model considers that if an infected person (a red one) coughs, sneezes, or talks while wearing a medical mask, he is not able to infect others, because the mask can help protect those around him from infection [13]. Furthermore, whenever a mask-wearing person (a blue one) comes into contact with anybody, he can still be infected by infectious ones [13].
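The per-contact rule just described can be summarized in a few lines of Python. This is a hedged sketch of the logic only; the field names and the function are illustrative and are not taken from the simulator's Angular/TypeScript code:

```python
import random

def update_on_contact(person_a, person_b, contamination_ratio):
    """Apply the probabilistic transmission rule when two agents meet."""
    for src, dst in ((person_a, person_b), (person_b, person_a)):
        # An infected agent wearing a mask is assumed unable to infect others;
        # a mask does not protect its wearer from being infected.
        if src["infected"] and not src["mask"] and not dst["infected"]:
            if random.random() < contamination_ratio:
                dst["infected"] = True

alice = {"infected": True, "mask": False}
bob = {"infected": False, "mask": True}
update_on_contact(alice, bob, contamination_ratio=0.4)
print(bob["infected"])   # True with probability 0.4
```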


The user can pause at any time and download the result of the simulation as a CSV file, which lists the contaminations that occurred during this time and also 'who acquired the infection from whom'. The tool also offers a multi-simulation option for running a sample of many simulations by varying the parameters over specified intervals, and downloading the results in a CSV file that contains all the contaminations for each combination of inputs (Fig. 3).

Fig. 3. Running multiple simulations with different inputs

You can check and test the simulator by cloning the GitHub repository linked in [14].

5 Experimentations and Results

5.1 Preliminaries

• Note 1: The sample below is not specific to the case of COVID-19; it describes, in general, the spreading process of an epidemic disease.
• Note 2: In the analysis of the results below, we use the notion of the frame (see the definition hereafter) instead of the notion of time, so that the results are standardized and independent of the specifications of the machine used.

i. A frame is a single image displayed by the simulator and represents one part of a larger sequence of images that make up the simulation view.
ii. The propagation speed calculated in the sample below is the number of contaminations that happened in a given number of frames.
iii. The propagation speed (or spread rate) in a given cell n (in the graphs below) is the number of contaminations counted from the appearance of generation n until the appearance of generation n + 1, divided by (max frame − min frame) of generation n:

Propagation speed = Number of contaminations / Frame range  (1)

4 CSV: comma-separated values.


iv. The min-frame of a generation is the first frame corresponding to the first contamination that occurred in this generation.
v. The max-frame of a generation is the latest frame corresponding to the last contamination that occurred in this generation.

Let us first see how the disease progresses from each viewpoint. Considering that each infectious individual infects 2 other persons on average, and that COVID-19 grows exponentially [15], the spread of the disease will be explosive, as you may see in the diagram below. The generation perspective represents the transmission rate and its effect on the evolution of the disease through the population; the diagram on the right, which takes the time factor into consideration, shows the behavior of the disease in time: a generation n can be active (infecting elements that will be classified in generation n + 1) even while it is still growing (generation n − 1 is still infecting elements that will be classified in generation n). Different generations may be synchronously active (e.g., members of G1 and G2 infecting other elements simultaneously) (Fig. 4 and Table 1).

Fig. 4. Example of generations 1 and 2 being synchronously active

5.2 Sample In this work we used a simplified model of 100 citizens and ran a sample of simulations over multiple combinations of 3 parameters:

• The ratio of contaminated citizens: from 0.1 to 0.95, with a step of 0.1 (i.e. 0.1, 0.2, 0.3, …, 0.9)
• The ratio of contamination: from 0.1 to 1, with a step of 0.1
• The ratio of mask-wearing: from 0.1 to 1, with a step of 0.1

Each simulation runs for 30,000 frames. The objective is to analyze the impact of the contamination ratio on the disease process, from the generation perspective. In each part we analyze the effect of the contamination-ratio parameter, so we take all the nested data (with different values of all the parameters) and group it by the contamination-ratio factor, as sketched in the driver below.
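A hedged sketch of such a parameter sweep is shown below. The run_simulation function is a placeholder standing in for the actual simulator (which is an Angular application), and the CSV layout is illustrative rather than the tool's exact export format:

```python
import csv
import itertools
import numpy as np

def run_simulation(contaminated, contamination, mask, population=100,
                   frames=30_000, seed=0):
    """Placeholder for the simulator: returns fake (frame, generation) events."""
    rng = np.random.default_rng(seed)
    frames_hit = np.sort(rng.integers(0, frames, size=10))
    generations = rng.integers(0, 5, size=10)
    return list(zip(frames_hit.tolist(), generations.tolist()))

grid = itertools.product(np.arange(0.1, 1.0, 0.1),   # contaminated-citizens ratio
                         np.arange(0.1, 1.1, 0.1),   # contamination ratio
                         np.arange(0.1, 1.1, 0.1))   # mask-wearing ratio

with open("sweep.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["contaminated", "contamination", "mask", "frame", "generation"])
    for contaminated, contamination, mask in grid:
        for frame, generation in run_simulation(contaminated, contamination, mask):
            writer.writerow([round(contaminated, 2), round(contamination, 2),
                             round(mask, 2), frame, generation])
```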


Table 1. Spreading of the disease: the generations perspective (G0 → G1 → G2) versus the time/frames perspective (t0 … t4). [Diagram: the left panel shows the spread of contaminations from the generations perspective; the right panel shows the spread of contaminations from the time (frames) perspective.]

5.3 Results and Analyses: The Impact of the Contamination Ratio The results of the sample show that the number of contaminations in each generation does not depend on the contamination ratio (Fig. 5).

• As may be noticed, whatever the contamination-ratio value (the transmission probability of the disease), the number of contaminated individuals in a given generation Gi remains almost the same (the vertical values in a column are almost identical); looking at the values horizontally, we notice that each time we move to the next generation, the number of contaminations is divided by 2.

Let us now see how the propagation speed behaves (Fig. 6):

• The greater the contamination ratio, the greater the propagation speed.


Fig. 5. Number of contaminations in each generation according to contamination-ratio

Fig. 6. Propagation speed after the appearance of each generation according to contamination-ratio


• So, the contamination ratio affects the propagation speed of the disease and not the number of contaminations within each generation.

Let us consider 2 events:
– A: an infectious person X has contact with someone (say Y)
– B: the person Y is susceptible (a healthy person likely to catch the virus from infectious ones)

And assume that:
– s: the number of susceptible persons (all healthy people likely to catch the virus from infectious ones)
– p: the total population size (in a given test; 100 in our sample)
– Rc: the contamination ratio
– P(A): the probability of event A; it is constant during the simulation and depends on the movement speed and the density of the population
– P(B) = s/p: the probability of event B

The probability that an infectious element infects someone is:

P = Rc · P(A) · P(B) = Rc · P(A) · s/p  (2)

• The diagram of the propagation speed shows that the speed of propagation increases with the contamination ratio, which implies that s (the number of susceptible people) decreases quickly; therefore, the probability that an infectious person infects someone also decreases quickly, since the number of members that can still be infected is shrinking. (We can picture the behavior in the case of a high propagation speed as an infectious person A running to contaminate a susceptible one, if any exists; but on the way, this susceptible person gets contaminated by another infectious person B, as if there were a competition between infectious members to infect others.)

6 Conclusion The built simulator has succeeded in giving us insight into an otherwise novel situation with a high level of uncertainty. The simulation proved to be a useful tool for understanding new phenomena and modeling the problem, and the insights generated by analyzing the data produced by the simulation help us predict the behavior of the disease so that we can make the appropriate decisions in a given situation. In the next paper, we intend to improve the simulator by introducing space as a variable for more flexibility, which will allow us to model different spaces and large-scale populations; we also aim to add artificial intelligence features to develop a more intelligent model that accurately evaluates the effectiveness of specific behaviors and measures.


Acknowledgments. This project is subsidized by the MENFPESRS and the CNRST as part of the program to support scientific and technological research related to “COVID-19” (2020). Also, we acknowledge financial support for this research from CNRST.

References 1. World Health Organization: covid-19 strategy updates 14 April 2020. https://www.who. int/docs/default-source/coronaviruse/covid-strategy-update-14april2020.pdf?sfvrsn=29da3b a0_19 2. The guardian. https://www.theguardian.com/world/ng-interactive/2020/aug/25/covid-vac cine-tracker-when-will-we-have-a-coronavirus-vaccine 3. Stevens, H.: The Washington post. https://www.washingtonpost.com/graphics/2020/world/ corona-simulator/ 4. Petrônio, C.L.S.: COVID-ABS: an agent-based model of COVID-19 epidemic to simulate health and economic effects of social distancing interventions. https://www.sciencedirect. com/science/article/pii/S0960077920304859?via%3Dihub 5. https://arxiv.org/pdf/2004.13553.pdf 6. Masksim Simulator demonstration. http://dek.ai/masksim/ 7. https://angular.io/guide/architecture#:~:text=Angular%20is%20a%20platform%20a nd,you%20import%20into%20your%20apps 8. https://rtouti.github.io/graphics/perlin-noise-algorithm 9. Chien, C.H., Kanade, T.: Distributed Quadtree Processing. https://link.springer.com/chapter/ 10.1007%2F3-540-52208-5_29 10. Christine, S.M.: Currie: how simulation modelling can help reduce the impact of COVID19. https://www.tandfonline.com/doi/full/10.1080/17477778.2020.1751570?scroll=top&nee dAccess=true 11. Kenneth McIntosh, M.D.: Coronavirus disease 2019 (COVID-19): epidemiology, virology, and prevention. https://www.uptodate.com/contents/coronavirus-disease-2019-covid-19-epi demiology-virology-and-prevention#H3174740477 12. World Health Organization. https://www.who.int/health-topics/coronavirus#tab=tab_1 13. World Health Organization. https://www.who.int/publications/i/item/advice-on-the-use-ofmasks-in-the-community-during-home-care-and-in-healthcare-settings-in-the-context-ofthe-novel-coronavirus-(2019-ncov)-outbreak 14. https://github.com/leaderiop/simulator.git 15. By David Robson 13th August 2020, BBC. https://www.bbc.com/future/article/20200812exponential-growth-bias-the-numerical-error-behind-covid-19

Patient Classification Using the Hybrid AHP-CNN Approach Layla Aziz(B)

and Samira Achki

Computer Science Department, FP-Sidi Bennour, Chouaïb Doukkali University, El Jadida, Morocco [email protected]

Abstract. COVID-19 is a horrible disease, which has upset our lives everywhere. The main complexity of this disease lies in its rapid evolution through people's contact and in the gaps in our understanding. Moreover, it presents critical cases in which the immune system has not shown any symptoms. Hence, the design of an effective classifier is necessary. This paper aims to hybridize the multi-criteria Analytic Hierarchy Process (AHP) tool and the convolutional neural network (CNN) process to classify a category of patients. Our novel method is divided into two main phases: the first one focuses on generating the priorities of the essential criteria using the AHP model, while the second phase aims to classify the patients using the neural network classifier. In the present study, we considered three important criteria: fever, patient localization, and the age of the patient. From the obtained results, the proposed model has proved its efficiency even when we consider different cases.

Keywords: Covid19 · Healthcare · AHP · Machine learning · CNN

1 Introduction

By the end of 2019, humanity was faced with a horrible health crisis that started at a wildlife market in Wuhan and then spread quickly across the whole world. The novel coronavirus (COVID-19) pandemic [1], instigated by a novel coronavirus [2], has upset the world significantly, affecting not only the healthcare system but also economics, transportation, education, etc. People infected with COVID-19 normally experience a respiratory infection and can recover with appropriate treatment methods [3]. What makes COVID-19 much more dangerous and more easily spread than other coronavirus families is that the COVID-19 coronavirus has become extremely efficient in human-to-human transmission [4]. As of the writing of this paper, the COVID-19 virus has spread quickly to 215 countries, infecting more than 3 million people and causing 207 thousand deaths. Since then, several governments, academic institutions and industries have mobilized. Alongside the global effort to develop an efficacious vaccine and medical treatment for the COVID-19 coronavirus, computer science and technology researchers are making initial efforts in


the fight against COVID-19. Meanwhile, technology leaders, including Alibaba [5], Huawei [6], and others, have accelerated their companies' health initiatives. These tech companies are working closely with caregivers, academics, and institutions around the world to make the best use of technology while the virus continues to spread to many other countries. Motivated by the enormous success of artificial intelligence (AI) and big data in several areas, many new technologies, solutions, and approaches based on AI and big data are being used to fight the COVID-19 coronavirus illness [7]. AI is used for identifying, monitoring, and prediction: the better we are able to follow the virus, the better we will be able to fight it. By analyzing different sources such as newsletters and social media, AI can learn to detect an epidemic. Also, the AI company Infervision has launched an AI-based solution that helps front-line caregivers effectively detect and monitor the disease, as the medical imaging services of health facilities are subject to a heavy workload increased by the virus. Robots sterilize [8] and deliver food and supplies, among other things, and are therefore deployed to perform many tasks: cleaning, sterilization, and delivery of food and medicine to reduce contact between humans. This paper is organized as follows: Sect. 2 presents a recent review of the studies conducted in the context of data classification using the neural network process. Section 3 describes the proposed methodology and includes brief details about the combined methods. Section 4 presents the evaluation of the proposed work. Section 5 concludes the paper.

2

Related Works

Several studies have proposed the classification of data using neural networks. In this section, we present some recent works based on the CNN process. A global review of the different works proposed for COVID-19 is given in [9], where the different methodologies are explained. Deep learning has been used as a powerful tool for achieving healthcare goals. In [10], the authors proposed the use of deep learning to distinguish patients infected with influenza from those infected with COVID-19. Similarly, in the work proposed in [11], the authors exploited a deep CNN to identify patients infected with the coronavirus; this study focuses on analyzing chest radiography. Likewise, the authors in [12] proposed classifying patients infected with COVID-19 by using DenseNet201 deep transfer learning (DTL); in this study, a pre-trained network is exploited to categorize patients as infected or not infected. CNNs were first proposed for image classification. The different symptoms of COVID-19 are discussed in several studies, such as the review presented in [13]. Several works have studied the feature extraction of images using many techniques, such as color information [14] or metaheuristic tools. However, the work proposed in [15] argued that CNNs can be applied to classify generic data. This is realized by converting each instance into the most suitable format; to achieve this goal, the authors proposed converting a generic dataset into a matrix that represents an image


format. The work was compared to 22 benchmarks and achieved significant accuracy. In this global pandemic situation, the integration of technology, artificial intelligence, and data science has become a promising way to assist societies in overcoming this horrible situation. From the previously reviewed works, it is clear that the CNN process has assisted healthcare systems. To the best of our knowledge, our study based on the proposed hybrid model for the classification of patients has not previously been presented. The combination of the AHP model, for generating the weights of the criteria, with the CNN model represents a strong tool to classify the patients infected with COVID-19.

3

Proposed Work

3.1

Deep Learning

Deep learning is a form of machine learning that has the capacity to learn from the available data and achieve several tasks. In machine learning, we distinguish three major categories, as shown in Fig. 1 [16]: supervised learning, unsupervised learning and reinforcement learning. The first category focuses basically on classification and regression tasks; its learning is based on labeled data, while the unsupervised kind does not require any supervision: this model operates by itself to discover the information. Table 1 presents a comparative study between these two techniques [17].

3.2

Artificial Neural Network

Artificial Neural Networks (ANNs) are mainly based on a set of inputs and a unique output. This output is expressed by a mathematical function, known as the activation function, as shown in Fig. 2. The output is a linear or nonlinear function, which determines the activation of the neurons using a threshold [18]:

Y = f\left(\sum_{i=1}^{D} w_i s_i + w_0\right) \qquad (1)
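As a small illustration of Eq. (1), the snippet below computes one neuron's output with a sigmoid activation; the particular activation choice and the numeric weights are arbitrary examples, not taken from the paper:

```python
import numpy as np

def neuron(s, w, w0):
    """Compute Y = f(sum_i w_i * s_i + w0), with f chosen here as the sigmoid."""
    z = np.dot(w, s) + w0
    return 1.0 / (1.0 + np.exp(-z))

s = np.array([0.8, 0.2, 0.5])    # inputs s_1..s_D
w = np.array([0.4, -0.6, 0.9])   # weights w_1..w_D
print(neuron(s, w, w0=0.1))      # a single output Y in (0, 1)
```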

Neural networks [19] take inspiration from the learning procedure occurring in human brains. They form a multi-layer system, where each layer contains one or a set of neurons. Figure 3 depicts this architecture, where the neurons are represented by circles. This architectural system is basically composed of three layers: the first one represents the inputs of the system, the second one plays the role of an intermediary between the first and the last layer, and the third layer depends totally or partially on the previous layers and is responsible for the classification task. The CNN represents a mathematical tool that is mainly


Fig. 1. The machine learning categories

composed of three kinds of layers: convolution, pooling, and fully connected layers. The first and the second layers are responsible for feature extraction, while the third layer is a fully connected layer that maps the output of the previous layers into a classification; it is the key layer of the CNN, which is purely based on mathematical operations such as the convolution operation. CNNs contain an artificial network of functions, called parameters, which permits the computer to learn and fine-tune itself by analyzing new data. Each parameter, known as a neuron, represents a function that receives one or several inputs and produces an output. The produced outputs are used as inputs of the next layer of neurons, and the process continues similarly to produce a final output, which represents the model's result. The task of classifying data is to choose the class association y of an unidentified data item x based on a dataset D = {(x1, y1), ..., (xn, yn)} of data items xi with known class associations yi. Figure 4 shows the neural network process. The CNN is typically composed of three layers: the convolution layer, the pooling layer, and the fully connected layer.

– Convolution layer: It represents the first layer of the CNN architecture and is responsible for extracting features from an image. It can be mathematically defined as a function with two main parameters: the image matrix and a filter.
– Pooling layer: It plays the role of an intermediate layer, whose purpose is to reduce the number of the image's parameters when they are too large. Spatial pooling consists of reducing the dimensionality of the feature maps while taking into account the importance of the information. We distinguish between different kinds of spatial pooling:
  – Max pooling
  – Average pooling
  – Sum pooling


Fig. 2. The ANN architecture

– Fully connected layer: Also called the FC layer, its first task is to convert the input matrix into a vector and then to create the model using the combination of features. Its major task is the classification of the outputs by the use of an activation function such as the sigmoid or softmax function.

The accuracy measure is exploited to validate the classification using the CNN. It is calculated by dividing the number of correct classifications by the total number of items, as expressed by the following equation:

A_c = \frac{T_{pos} + T_{neg}}{T_{pos} + T_{neg} + F_P + F_N} \qquad (2)

where T_{pos} denotes the patients that are correctly classified into the positive category [20], T_{neg} the patients that are correctly classified into the negative category, FP the patients that are incorrectly classified into the positive category, and FN the patients that are incorrectly classified into the negative category. It is required to calculate the F-measure (F_m) in some cases, where the evaluated dataset has imbalanced classes. This measure is calculated as follows [21]:

F_m = \frac{2 \cdot P_r \cdot R_c}{P_r + R_c}    (3)


Fig. 3. The ANN layers

where P_r represents the precision of the classifier; a low value means that the classifier produces a significant number of false positives. The precision measure is mathematically formulated as:

P_r = \frac{T_{pos}}{T_{pos} + FP}    (4)

Another measure can be calculated to assess the completeness of the classifier. It is known as the recall (R_c), and its mathematical expression is as follows:

R_c = \frac{T_{pos}}{T_{pos} + FN}    (5)

On the other hand, it is possible to evaluate the classifier on the patients that are categorized as not infected by COVID-19, through the specificity (true negative rate):

Sp = \frac{T_{neg}}{T_{neg} + FP}    (6)
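As a minimal illustration of Eqs. (2)–(6), and not part of the original study, the following Python function computes these measures directly from the four confusion-matrix counts:

```python
# Minimal sketch: evaluation measures of Eqs. (2)-(6) for a binary classifier,
# computed from true/false positive and negative counts (hypothetical values).
def evaluation_measures(t_pos, t_neg, f_pos, f_neg):
    accuracy = (t_pos + t_neg) / (t_pos + t_neg + f_pos + f_neg)  # Eq. (2)
    precision = t_pos / (t_pos + f_pos)                           # Eq. (4)
    recall = t_pos / (t_pos + f_neg)                              # Eq. (5)
    f_measure = 2 * precision * recall / (precision + recall)     # Eq. (3)
    specificity = t_neg / (t_neg + f_pos)                         # Eq. (6)
    return {"accuracy": accuracy, "f_measure": f_measure,
            "precision": precision, "recall": recall,
            "specificity": specificity}

print(evaluation_measures(t_pos=40, t_neg=45, f_pos=5, f_neg=10))
```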


Table 1. Comparison of supervised learning versus unsupervised learning.

Parameters            Supervised learning                            Unsupervised learning
Process               Necessitates a given number of                 Necessitates a number of inputs
                      inputs and outputs
Input data            Uses labeled data                              Uses unlabeled data
Algorithms            Support vector machine (SVM),                  Clustering algorithms, K-means,
                      neural networks, classification trees          hierarchical clustering
Complexity            Simple                                         Computationally complex
Use of data           Based on a training data technique             Requires only the inputs
Accuracy of results   High accuracy                                  Low accuracy
Real-time learning    Offline process                                Real-time process
Number of classes     Known                                          Unknown
Drawbacks             Big data represents a critical challenge       Data sorting does not offer precise
                                                                     information and outputs are unknown

3.3 The Multi-criteria AHP Method

The AHP tool is one of the multi-criteria methods [22,23] that are exploited for ranking the alternatives of decision problems. Moreover, it is known as a strong tool for calculating the weights of criteria. For this purpose, we have proposed the use of this tool for the calculation of the importance value of the criteria weights. AHP is mainly composed of the following steps:

Step 1: Decompose the decision problem into its essential criteria.
Step 2: Attribute the different values of the importance of each criterion using the Saaty scale, as shown in Table 2.
Step 3: Determine the relative importance of the factors by calculating the eigenvector corresponding to the maximal eigenvalue.
Step 4: Verify the consistency of the study by comparing the predefined judgments against two factors, the Consistency Index (CI) and the Consistency Ratio (CR) [24]:

CI = \frac{\mu_{max} - n}{n - 1}    (7)

where \mu_{max} is the eigenvalue associated with the pair-wise comparison matrix and n is the number of elements considered for the study. CR is calculated by the following equation:

CR = \frac{CI}{RCI}    (8)
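As an illustrative sketch only (not the authors' implementation), the following Python code derives AHP weights from a pairwise comparison matrix and checks the consistency via Eqs. (7)–(8); the example matrix is the one used later in Table 4 and the RCI values follow Table 3.

```python
# Minimal AHP sketch: criteria weights from the principal eigenvector of a
# pairwise comparison matrix, plus the consistency check of Eqs. (7)-(8).
import numpy as np

A = np.array([[1.0, 5.0, 7.0],
              [1/5, 1.0, 3.0],
              [1/7, 1/3, 1.0]])          # pairwise comparisons (Table 4)

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)              # maximal eigenvalue mu_max
mu_max = eigvals.real[k]
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()                 # normalized criteria weights

n = A.shape[0]
CI = (mu_max - n) / (n - 1)              # Eq. (7)
RCI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
       6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}   # Table 3
CR = CI / RCI[n]                         # Eq. (8), acceptable when CR < 0.1

print(weights.round(2), round(mu_max, 2), round(CI, 3), round(CR, 3))
```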


Fig. 4. The neural network process

Table 2. Criteria importance meaning.

Relative importance   Meaning
1                     Equal
3                     Weak
5                     Strong
7                     Demonstrated over the others
9                     Absolute

RCI [25] is the random value of CI according to the number of criteria (n), as shown in Table 3.

Table 3. RCI values.

Criteria number   RCI value
1                 0
2                 0
3                 0.58
4                 0.90
5                 1.12
6                 1.24
7                 1.32
8                 1.41
9                 1.45
10                1.49

The generated weights have been exploited to show the importance of each criterion and serve as inputs for the CNN. Figure 5 depicts the proposed model. The novel model is based on two major phases: the first one consists of generating the weights, and the second one exploits these weights in the CNN to classify the patients.


Fig. 5. The proposed hybrid-model

4 Our Illustrative Example and Discussion

COVID-19 has been characterized by various criteria such as fever, patient localization and the age of the studied patient. To give a realistic classification, we have exploited the process of the multi-criteria AHP tool to determine the weight of each criterion. Our decision problem consists of classifying a set of patients into a positive class and a negative class. Table 4 illustrates the decision matrix that corresponds to our decision problem, where the importance of the different criteria is determined. To validate these values, we have calculated the consistency, which gives consistent matrices because each consistency ratio is less than 0.1.

Table 4. Matrix of criteria importance.

Criteria   C1     C2     C3
C1         1      5      7
C2         0.20   1      3
C3         0.14   0.33   1

The normalized matrix is given in Table 5:

Table 5. Normalized matrix.

Criteria   C1      C2      C3
C1         0.74    0.78    0.63
C2         0.149   0.157   0.27
C3         0.10    0.05    0.09


The next step concerns the generation of the different weights of the criteria considered for this study. After calculating these weights, we found that C1 = 0.72, C2 = 0.19 and C3 = 0.08. To prove the validity of this step, we have calculated the CR value. The weighted matrix is given in Table 6:

Table 6. Weighted matrix.

Criteria   C1     C2     C3
C1         0.72   0.96   0.57
C2         0.14   0.19   0.24
C3         0.10   0.06   0.08

We obtain \lambda_{max} = 3.05, CI = 0.027 and CR = 0.047. After calculating the different weights by the use of the AHP tool, we use these generated weights to compute the inputs of the neural network and convert the data to an image format. To do so, we have calculated the correlation matrix M, the correlation vector L and the reordering matrix O as follows:

M_{j,k} = \frac{\sum_{c=1}^{N} (x_{c,j} - f_j)(x_{c,k} - f_k)}{\sqrt{\sum_{c=1}^{N} (x_{c,j} - f_j)^2} \sqrt{\sum_{c=1}^{N} (x_{c,k} - f_k)^2}}    (9)

where f_j is the mean of the feature j and f_k represents the mean of the feature k.

L_{1,j} = \frac{\sum_{c=1}^{N} (x_{c,j} - f_j)(y_c - \bar{y})}{\sqrt{\sum_{c=1}^{N} (x_{c,j} - f_j)^2} \sqrt{\sum_{c=1}^{N} (y_c - \bar{y})^2}}    (10)

where \bar{y} represents the mean of the label. The next step concerns the classification of the group of patients using the CNN process. Figure 6 shows the results of the classification. We remark that the proposed hybrid AHP-CNN model allows us to provide a diagnosis by classifying the patients into two classes: the positive class and the negative class. The main objective of this work is to classify the set of patients into two classes: COVID-19 positive or COVID-19 negative. A patient is classified into the positive class according to the values of each criterion. This result allows us to make an initial diagnosis. The advantages of this model are the rapidity of the diagnosis, the reduction of the risk by avoiding contact between the patients and the hospital staff, and the possibility of diagnosing the maximum number of patients in a short period and from a distance.
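A minimal numpy sketch of Eqs. (9)–(10) is given below; it is purely illustrative (the data matrix and labels are hypothetical, not the study's patient data).

```python
# Minimal sketch of Eqs. (9)-(10): feature correlation matrix M and
# feature-label correlation vector L for N patients and p criteria.
import numpy as np

def correlation_matrix_and_vector(X, y):
    Xc = X - X.mean(axis=0)               # x_{c,j} - f_j
    yc = y - y.mean()                     # y_c - y_bar
    sx = np.sqrt((Xc ** 2).sum(axis=0))   # per-feature norms
    sy = np.sqrt((yc ** 2).sum())
    M = (Xc.T @ Xc) / np.outer(sx, sx)    # Eq. (9)
    L = (Xc.T @ yc) / (sx * sy)           # Eq. (10)
    return M, L

# Hypothetical example: 6 patients described by 3 criteria, binary labels
X = np.array([[38.5, 1, 65], [36.7, 0, 24], [39.0, 1, 70],
              [36.9, 0, 31], [38.1, 1, 58], [37.0, 0, 45]], dtype=float)
y = np.array([1, 0, 1, 0, 1, 0], dtype=float)
M, L = correlation_matrix_and_vector(X, y)
print(M.round(2), L.round(2))
```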


To study the sensitivity of the proposed model, we have calculated the accuracy for three cases. Table 7 summarizes the different cases, and we present the accuracy results for each of them below.

Table 7. The weights of the cases.

Cases    C1     C2     C3
Case 1   0.72   0.19   0.08
Case 2   0.19   0.72   0.08
Case 3   0.08   0.19   0.72

Table 8 shows the accuracy values obtained for each case:

Table 8. The accuracy of the cases.

Cases    Accuracy
Case 1   88%
Case 2   87.8%
Case 3   87.75%

Fig. 6. Patient classification (Covid19(+) vs. Covid19(-))

5 Conclusion

COVID-19 has upset human life because of its rapid evolution, which imposes a real interest in overcoming this terrible situation. The CNN has shown its efficiency on several classification problems, notably in healthcare systems. Motivated by the results obtained by CNNs in the context of classification, we were interested in using them to classify the patients infected with COVID-19. Several studies have been directed at solving this critical problem; however, the situation requires more effort from the research community. In this work, we have proposed the use of a hybrid AHP-CNN tool for patient classification. This novel strategy is composed of two phases: the first phase consists of finding the adequate weights of the essential criteria, and the second phase focuses on classifying a group of patients using the CNN process. The results of this work proved the efficiency of the combined model.

References 1. Indranil, C., Prasenjit, M.: COVID-19 outbreak: migration, effects on society, global environment and prevention. Sci. Total Environ. 728, 138–882 (2020) 2. Naji, H.: The emerging of the novel coronavirus 2019-nCoV. Eur. J. Med. Health Sci. 2 (2020) 3. Jason, P., Li, W., Lowell, L., Moritoki, E., Chae-Man, L., Jigeeshu, V., Babu, R.,Yaseen, A., Jensen, N.,Charles, G., Masaji, N., Younsuck, K., Bin, D.: Intensive care management of coronavirus disease 2019 (COVID-19): challenges and recommendations. Lancet Respir. Med. (2020) 4. Ralph, R., Lew, J., Zeng, T., Francis, M., Xue, B., Roux, M., Ostadgavahi, A., Rubino, S., Dawe, N., Al-Ahdal, M., Kelvin, D., Richardson, C., Kindrachuk, J., Falzarano, D., Kelvin, A.: 2019-nCoV (Wuhan virus), a novel Coronavirus: humanto-human transmission, travel-related cases, and vaccine readiness. J. Infect. Dev. Countr. 14, 3–17 (2020) 5. Kersting, K.: Machine learning and artificial intelligence: two fellow travelers on the quest for intelligent behavior in machines. Front. Big Data 1, 6 (2018) 6. Nguyen, C., Ding, M., Pathirana, P., Seneviratne, A.: Blockchain and AI-based solutions to combat Coronavirus (COVID-19)-like epidemics: a survey (2020). https://doi.org/10.36227/techrxiv.12121962.v1 7. Peng, M., Jie, Y., Shi, Q., Ying, L., Zhu, H., Zhu, G., Ding, X., He, Z., Qin, J., Wang, J., Yan, H., Bi, X., Shen, B., Wang, D., Luo, L., Zhao, H., Zhang, C., Lin, Z., Hong, L., Li, J.: Artificial intelligence application in COVID-19 diagnosis and prediction. SSRN Electron. J. (2020). https://doi.org/10.2139/ssrn.3541119 8. Tavakoli, M., Carriere, J., Torabi, A.: Robotics for COVID-19: how can robots help health care in the fight against Coronavirus (2020). https://doi.org/10.13140/RG. 2.2.21723.52004 9. Ng, M.Y., Lee, E.Y., Yang, J., Yang, F., Li, X., Wang, H., Lui, M.M.S., Lo, C.S.Y., Leung, B., Khong, P.L., Hui, C.K.M., Yuen, K., Kuo, M.D.: Imaging profile of the COVID-19 infection: radiologic findings and literature review. Radiol.: Cardiothoracic Imaging, 2, e200034 (2020) 10. Xu, X., Jiang, X., Ma, C., Du, P., Li, X., Lv, S., Yu, L., Ni, Q., Chen, Y., Su, J., Lang, G., Li, Y., Zhao, H., Xu, K., Ruan, L., Wu, W.: Deep learning system to screen coronavirus disease 2019 pneumonia (2020)


11. Wang, L., Wong, A.: COVID-net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images (2020) 12. Jaiswal, A., Gianchandani, N., Singh, D., Kumar, V., Kaur, M.: Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning. J. Biomol. Struct. Dyn. (2020). https://doi.org/10.1080/07391102.2020.1788642 13. Singhal, T.: A review of coronavirus disease-2019 (COVID-19). Indian J. Pediatr. 87, 281–286 (2019) 14. Singh, D., Kumar, V., Vaishali, K., Manjit, K.: Classification of COVID-19 patients from chest CT images using multi-objective differential evolution–based convolutional neural networks. Eur. J. Clin. Microbiol. Infect. Dis. 1379–1389 (2020) 15. Han, H., Li, Y., Zhu, X.: Convolutional neural network learning for generic data classification. Inf. Sci. 477, 448–465 (2019) 16. Gupta, A., Singh, D., Kaur, M.: An efficient image encryption using non-dominated sorting genetic algorithm-III based 4-D chaotic maps. J. Ambient Intell. Hum. Comput. 11, 1309–1324 (2020) 17. Manjit, K., Hermant, K., Gianey, D., Munish, S.: Multi-objective differential evolution based random forest for e-health applications. Mod. Phys. Lett. B 335, 1950022 (2019) 18. Simon, H.: Introduction. In: Neural Networks and Learning Machines (2009) 19. Baughman, D.R., Liu, Y.A.: 3 - Classification: Fault Diagnosis and Feature Categorization. Neural Networks in Bioprocessing and Chemical Engineering. Academic Press, Boston (1995) 20. Canran, L., Matt, W., Graeme, N.: Measuring and comparing theaccuracy of species distribution models with presence-absence data. Ecography 34, 232–243 (2011) 21. Sokolova, M., Japkowicz, N., Szpakowicz, S.: Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In: AI 2006: Advances in Artificial Intelligence (2006) 22. Aziz, L., Raghay, S., Aznaoui, H.: An improved multipath routing protocol using an efficient multicriteria sorting method: special issue on data and security engineering. Int. J. Internet Technol. Secur. Trans. 10(6) (2020) 23. Achki, S., Gharnati, F., Ouahman, A.: Enhancing energy consumption in wireless communication systems using weighted sum approach. Indian J. Sci. Technol. 10 (2017) 24. Aziz, L., Aznaoui, H.: Efficient routing approach using a collaborative strategy. J. Sens. 2020, 1–17 (2020) 25. Aziz, L., Raghay, S., Aznaoui, H.: An improved multipath routing protocol using an efficient multicriteria sorting method. In: Ben Ahmed, M., Boudhi, A.-A., Younes, A. (eds.) SCA: The Proceedings of the Third International Conference on Smart City Applications (2018)

Towards Automatic Diagnosis of the COVID-19 Based on Machine Learning

El Arbi Abdellaoui Alaoui1,2(B), Stephane Cedric Koumetio Tekouabou3,5, Ismail Ougamane1, and Imane Chabbar4

1 EIGSI-Casablanca, 282 Route of the Oasis, Casablanca, Morocco
[email protected], [email protected]
2 Department Computer Science, Faculty of Sciences and Technologies, My Ismail University, Errachidia, Morocco
3 Laboratory LAROSERI, Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, B.P. 20, 24000 El Jadida, Morocco
[email protected]
4 Specialty hospital, CHU IBN SINA, Mohammed V University, Quartier-Souissi, B.P. 6220, Rabat, Morocco
[email protected]
5 Center of Urban Systems (CUS), Mohamed VI Politechnic University (UM6P), Hay Moulay Rachid, 43150 Ben Guerir, Morocco

Abstract. Since the beginning of the global health crisis attributed to the new coronavirus (COVID-19), several announcements of new diagnostic tests have been made. It has suddenly become complicated to provide a comprehensive overview detailing the specificity of each of them. In the absence of therapeutic drugs or vaccines specific for COVID-19, these tests are essential to detect patients at an early stage and to immediately isolate infected patients from the healthy population. Among the most commonly used tests are chest CT and laboratory tests (such as PCR) in the diagnosis of coronavirus 2019 (COVID-19). Analysis of imaging and laboratory test data from more than 1,000 patients shows that chest CT surpasses biological tests in the diagnosis of the epidemic associated with the new coronavirus, COVID-19. Thoracic computed tomography (CT) scans provide the best diagnosis for COVID-19 pneumonia, conclude these researchers from Huazhong University of Science (Wuhan, China) and Leiden University Medical Center (Netherlands). The researchers concluded that CT scans should be used as the primary screening tool for COVID-19. However, these techniques have the disadvantage of being slow and expensive, causing many patients to avoid screening if it is not free. In this paper, we propose an automatic decision model based on artificial intelligence to assist the physician during the screening for this pandemic in order to lighten the workload in hospitals. For this purpose, the main objective of our study is to build and test predictive diagnostic models based on machine learning that can analyze patient data in depth in order to separate coronavirus types. Thus, we can save time during the diagnostic process of this Covid-19 pandemic. The results obtained are very promising. c The Author(s), under exclusive license to Springer Nature Switzerland AG 2021  M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 1244–1255, 2021. https://doi.org/10.1007/978-3-030-66840-2_95

Keywords: Coronavirus · COVID-19 · Machine learning · CT chest sensors

1 Introduction

Corona-viruses are a large family of viruses that can cause a variety of illnesses in humans, ranging from the common cold to Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). A new corona-virus (COVID-19) hat had not previously been identified in humans was identified in 2019 in Wuhan, China [12,17]. Corona-virus 2019 (also known as Covid-19 or SARS-CoV-2) is an infectious disease caused by the corona-virus SARS-CoV-2 belonging to the very large family of viruses [19]. These viruses are in constant mutation and evolution. It is during one of these mutations that it became capable of infecting humans. Unlike its predecessors, this virus appears to be particularly contagious. In fact, it has been found in many biological fluids and excretions (secretions from the mouth and nose, blood, stools, urine), which suggests that there is a risk of multiple transmission, especially since not all infected patients, especially the youngest, necessarily show symptoms. In 80% of cases, Covid-19 causes few problems and the patient recovers quickly, without the need for hospitalization. But in people who are already weakened - by chronic illness, immunosuppression, old age, etc., Covid-19 can become complicated and require hospitalization or even resuscitation. It is especially for them that everyone must take responsibility and respect the instructions given in case of suspicion of infection or containment measures if the infection is proven. Meanwhile, researchers around the world are working to find an effective treatment for the most fragile and, in the longer term, to find a vaccine. Indeed, it is because the Covid-19 is new and therefore we are not immune to it, that it can spread so rapidly around the world! In the absence of curative treatment or specific therapeutic vaccines for COVID-19, it is essential to detect the disease at an early stage and to be able to immediately isolate an infected patient. The latest guidelines recommend confirming the diagnosis of COVID-19 by RT-PCR (polymerase chain reaction from an RNA sample or gene sequencing) biological analysis of respiratory or blood samples prior to hospitalization. However, taking into account the possible hazards during sample collection and transport, as well as the performance of the kits, the total sensitivity of RT-PCR analysis for throat swab samples is estimated to be between 30% and 60%. In a study of more than 1000 patients, published in the journal Radiology, chest CT surpassed laboratory tests in the diagnosis of coronavirus 2019 (COVID-19). The researchers concluded that CT should be used as the primary screening tool for COVID-19 [1]. Indeed, from January 6 to February 6, 2020, 1014 patients from Wuhan, China, who underwent both chest CT and RT-PCR testing were included. With RT-PCR as the reference standard, the performance of chest radiography in the diagnosis of COVID-19 was evaluated. In addition, for patients with multiple RT-PCR tests, the dynamic conversion of RT-PCR


results (negative to positive, positive to negative, respectively) was analyzed compared to serial chest scans for those with a time interval of 4 days or more [1]. In the face of the rapid spread of the coronavirus all means are good, including artificial intelligence (AI), which had already predicted the spread of the coronavirus at the end of December. Artificial intelligence (AI) can help us solve the urgent problems raised by the COVID-19 pandemic. It is not the technology itself that will make the difference, but rather the knowledge and creativity of the humans who use it. Indeed, the COVID-19 crisis is likely to highlight some of the main shortcomings of AI. Machine learning, the current form of AI, works by identifying patterns in historical learning data. When used properly, AI has the potential to outpace humans not only in its speed, but also in detecting patterns in learning data neglected by humans [8,9,14,16]. Thus, the automation of diagnosis using new technologies and machine learning will make the diagnosis process faster and more reliable to prevent the loss of life of people infected by this virus. Indeed, artificial intelligence is one of the major topics of upheaval affecting our time. Rarely has a technological evolution generated so many opportunities for problem solving, so many changes in uses, so many fears. However, this is by no means a technological breakthrough. Artificial intelligence is part of the continuity of computer science, whose computing power is constantly growing, increased by the availability of large masses of data that the Internet world knows how to aggregate [2,13]. The objective of this paper is to design a simplified computer decision model to assist the physician during the pandemic screening process in order to ease the workload in hospitals. Indeed, the main objective of our study is to build and test models based on machine learning that are capable of in-depth analysis of patient data in order to predictively separate coronavirus types. Thus, we can save time during the diagnostic process of this Covid-19 pandemic and other types of this virus. The rest of this paper is organized as following: We describe the proposed method to predict the type of coronavirus in Sect. 2 before analyzing the results obtained in the Sect. 3. Finally, we conclude our study in Sect. 4.

2 Methodology

In general, human learning is an adaptive process whereby the individual provides adequate responses to certain situations. In Psychology or Cognitive Sciences, the term “learning” refers to the process of increasing the efficiency of mental or behavioural activity as a result of experience. Machine learning [11] is a sub-field of Artificial Intelligence (AI) whose objective is to study the means by which a machine can learn. Learning, in this context, means being able to adapt its behaviour in the presence of unknown situations (not foreseen by the designers of the machine) and being able to extract laws from databases of examples. Learning is therefore done through tools that


allow the system to acquire, extend and improve the knowledge available to it. It consists of using computers to optimize an information processing model according to certain performance criteria, based on observations. Machine learning techniques are thus used, for example, for pattern recognition (writing, speech, vision), data mining (knowledge extraction), and the implementation of decision support tools. The different methods of machine learning are classified into three groups: supervised learning [20], unsupervised learning and semi-supervised learning [18] (see Figs. 1 and 2).

Fig. 1. Machine learning groups

Fig. 2. Automatic system for diagnosis of the COVID-19 (data collection, cleaning, feature selection and extraction, ensemble modeling with explainability, and application; the classification targets include Covid-19, SARS, ARDS, Pneumocystis, Streptococcus, E. coli, Legionella, Chlamydophila, Normal and No Finding)

2.1 Random Forest Classifier

The random forest algorithm [3] is a variant of bagging in which a set of random trees close to the CART method is aggregated [4]. Usable for both regression and classification, this algorithm has shown very good performance in practice, especially for complex problems (nonlinear relations, interactions, high dimension, etc.).

Algorithm 1. Random Forest (RF)
Input:
– y the observation to predict;
– d_n the observations;
– M the number of trees;
– b ∈ N the number of candidate variables to cut a node.
Output: \hat{f}_{RF}(x) = \arg\max_k \#\{m : \hat{g}(x, L_m^*) = k\}
1: for m = 1 to M do
2:   Draw a bootstrap sample L_m^* from d_n
3:   Construct a CART tree on this bootstrap sample; each cutoff is selected by minimizing the CART cost function over a set of b variables randomly selected among the p. We denote by \hat{g}(\cdot, L_m^*) the built tree.
4: end for
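As a hedged illustration (not the authors' exact configuration), a random forest of this kind can be trained with scikit-learn, which the paper uses for its experiments; the data below are synthetic placeholders.

```python
# Minimal sketch: a random forest classifier evaluated with 5-fold
# cross-validation on synthetic placeholder data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(rf, X, y, cv=5, scoring="accuracy").mean())
```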

2.2 ExtraTrees Classifier

ExtraTrees [7] is another ensemble method specifically designed to use decision trees. Its randomness is related to the way splits are computed: in contrast to random forests, instead of looking for the most discriminating thresholds, thresholds are randomly drawn for each attribute and the best of these randomly generated thresholds is chosen as the splitting rule.

Bagging Classifier. The bagging algorithm improves prediction efficiency when a disturbance in the learning base leads to significant changes in the constructed model. The general principle of this algorithm is to aggregate a collection of weak classifiers to obtain a better classifier. For classification, the aggregation is generally done by majority vote:

\hat{f}_{Bagg}(x, L) = \arg\max_{y \in Y} \sum_{t=1}^{T} I(\hat{g}_t(x) = y)

2.3 AdaBoost Classifier

AdaBoost is a classification algorithm that aims to use basic classifiers to build a powerful set. In the end, the model generates an aggregated classification.


Algorithm 2. Bagging Classifier (BC)
Input:
– Data set L = {(X_1, Y_1), ..., (X_n, Y_n)};
– Base learning algorithm Z;
– Number of base learners T.
Output: \hat{f}_{Bagg}(x, L) = \arg\max_{y \in Y} \sum_{t=1}^{T} I(\hat{g}_t(x) = y)
1: for t = 1 to T do
2:   Draw a bootstrap sample L_t^* from L and train \hat{g}_t = Z(L_t^*)
3: end for

For each basic model, the algorithm assigns a weight based on the individual performance of the model. The idea is that the classifiers will have to focus on observations that are difficult to classify correctly. Any machine learning algorithm can be used as a basic classifier if it accepts weights on the training set [10,15]. Initially, all learning examples have the same weight L_1(x) = 1/m. At each iteration t, we choose the classifier \hat{g}_t that minimizes the classification error on the training data weighted by L_t. Then we calculate \alpha_t, the weight of \hat{g}_t in the final mix, update the weights L_t to boost the elements that were misclassified, and move on to the next iteration. The detailed algorithm is given below.

Algorithm 3. AdaBoost Classifier (AB)
Input:
– Data set L = {(X_1, Y_1), ..., (X_n, Y_n)};
– Base learning algorithm Z;
– Number of base learners T;
– Initial weight distribution L_1(x) = 1/m.
Output: \hat{f}_{AdaB}(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t \hat{g}_t(x)\right)
1: for t = 1 to T do
2:   \hat{g}_t = Z(L, L_t)
3:   \epsilon_t = P_{x \sim L_t}(\hat{g}_t(x) \neq f(x))
4:   if \epsilon_t < 0.5 then
5:     \alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)
6:     L_{t+1}(x) = \frac{L_t(x) \exp(-\alpha_t \hat{g}_t(x) f(x))}{Z_t}, i.e. \frac{L_t(x)}{Z_t}\exp(-\alpha_t) if \hat{g}_t(x) = f(x), and \frac{L_t(x)}{Z_t}\exp(\alpha_t) otherwise
7:   end if
8: end for
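A hedged scikit-learn sketch of this boosting scheme is shown below (on synthetic placeholder data, not the study's configuration); the default base learner of AdaBoostClassifier is a depth-1 decision tree, i.e. a weighted weak classifier as required by Algorithm 3.

```python
# Minimal sketch: AdaBoost with the default decision-stump base learner,
# evaluated with cross-validation on synthetic placeholder data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
print(cross_val_score(ada, X, y, cv=5, scoring="accuracy").mean())
```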

2.4 XGBoost Classifier

XGBoost [5] is an optimized implementation of the gradient tree boosting algorithm. XGBoost has performed remarkably well in machine learning competitions because it effectively handles a wide variety of data types, relationships and distributions, and offers a large number of hyper-parameters that can be tuned for improvement. This flexibility makes XGBoost a solid choice for regression, classification (multiclass and binary) and ranking problems. For the model evaluation, the algorithm can handle different types of loss functions, and an additional regularization term is added to the model to reduce the risk of overfitting. The prediction is obtained as the sum of the scores of the trees; for a model with m decision trees, it can be expressed as:

\hat{y}_i = \sum_{k=1}^{m} f_k(x_i), \quad f_k \in W    (1)

where f_k is a function in the functional space W, m is the number of trees, and W is the space of all decision trees. The objective function at the t-th iteration can be presented as:

\Theta(t) = \Phi(t) + \Omega(t) = \sum_{i=1}^{n} \Phi(y_i, \hat{y}_i) + \sum_{k=1}^{t} \Omega(f_k)    (2)

where n is the number of observations. The prediction \hat{y}_i^{(t)} at iteration t can be written as:

\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)    (3)

The regularization term \Omega(f_k) for a decision tree is defined by Chen and Guestrin [5] as follows:

\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2    (4)

where \lambda is a parameter to scale the penalty, \gamma is the complexity of each leaf, T is the number of leaves in the tree, and \omega is the vector of scores on the leaves. Then, the first-order and second-order Taylor expansions of the loss function are used in XGBoost.
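As a hedged sketch (not the authors' configuration), the xgboost Python package exposes this model with the regularization parameters of Eq. (4) (gamma and lambda); the data below are synthetic placeholders.

```python
# Minimal sketch: an XGBoost classifier with explicit regularization settings,
# trained and scored on synthetic placeholder data.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3,
                      reg_lambda=1.0, gamma=0.0, eval_metric="logloss")
model.fit(X_tr, y_tr)
print(accuracy_score(y_te, model.predict(X_te)))
```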

3 Experiments and Results

3.1 Data Set

The covid-chestxray-dataset data frames describe the health status of individual patients with the novel coronavirus (COVID-19). The data frame does not contain information from clinical staff, but it does contain the actual ages and other information about the patients. The principal source for the patient data is the covid-chestxray-dataset on GitHub [6] (Fig. 3).


Fig. 3. Operating principle of boosting algorithms

Table 1. Description of the data set attributes.

N   Attribute                Description                                                               Type
1   Patientid                ID of the patient                                                         Numerical
2   Offset                   Number of days since the start of symptoms or hospitalization;            Numerical
                             if a report indicates "after a few days", then 5 days is assumed
3   Sex                      Man or woman                                                              Categorical
4   Age                      Age of the patient in years                                               Numerical
5   Finding                  Type of pneumonia                                                         Categorical
6   Survival                 If the patient survived or not                                            Categorical
8   Intubated                If the patient was intubated or not                                       Categorical
9   intubation present       If the patient is intubated at present or not                             Categorical
10  went icu                 If the patient went to the intensive care unit or not                     Categorical
11  needed supplemental O2   If the patient needed supplemental oxygen or not                          Categorical
12  Extubated                If the patient was extubated or not                                       Categorical
13  Temperature              Body temperature of the patient                                           Numerical
14  pO2 saturation           Partial pressure of oxygen of the patient                                 Numerical
15  leukocyte count          The percentage of leukocytes in blood                                     Numerical
16  lymphocyte count         The percentage of lymphocytes in blood                                    Numerical
17  neutrophil count         The percentage of neutrophils in blood                                    Numerical
18  View                     Posteroanterior (PA), Anteroposterior (AP), AP Supine (APS), or            Categorical
                             Lateral (L) for X-rays; Axial or Coronal for CT scans
19  Modality                 CT, X-ray, or something else                                              Categorical
21  Date                     Date on which the image was acquired                                      Date
22  Location                 The hospital where the patient is hospitalized                            Categorical
22  Folder                   The folder of the placement of the X-ray of the patient                   Categorical
23  Filename                 Filename of the X-ray of the patient                                      Categorical
24  Doi                      Digital Object Identifier of the data set                                 Categorical
25  Url                      URL of the origin data set                                                Categorical
26  License                  The type of licence of the origin data set                                Categorical
27  clinical notes           Clinical notes about the patient                                          Text
28  other notes              Other clinical notes about the patient                                    Text

3.2 Data Processing

In the original data set, there are 21 columns before the processing. After cleaning the data set (dropping the columns that do not contribute to the target variable 'finding', Table 1), five columns remain: offset, sex, age, modality and finding. The distribution of each column is shown in Fig. 4; we can conclude that the random variable 'age' follows a normal distribution and that the majority of patients are male.

Fig. 4. Distribution of variables

To summarize the data processing: 'finding' is the target column/variable, and columns like 'survival' and 'temperature' do not contribute to the target variable 'finding', so we can remove them from the data. The columns 'age', 'sex' and 'offset' have a small number of missing values, so we impute them using different techniques, as sketched below.
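A minimal, hedged version of this preprocessing step is shown below; the file name metadata.csv and the imputation strategies are assumptions for illustration, not the authors' exact choices.

```python
# Minimal sketch: keep the five retained columns of the covid-chestxray-dataset
# metadata and impute the few missing values (strategies are illustrative).
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("metadata.csv")                       # assumed metadata file
df = df[["offset", "sex", "age", "modality", "finding"]]

# numerical columns: median imputation; categorical columns: most frequent value
df[["offset", "age"]] = SimpleImputer(strategy="median").fit_transform(df[["offset", "age"]])
df[["sex", "modality"]] = SimpleImputer(strategy="most_frequent").fit_transform(df[["sex", "modality"]])

X = pd.get_dummies(df[["offset", "sex", "age", "modality"]])  # encode categoricals
y = df["finding"]                                             # target variable
```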

3.3 Experimental Protocol

We define the workflow of the project, which includes all the steps required to build the machine learning project (see Fig. 5). We can divide the workflow of our project into four stages:

– Gathering data and data pre-processing
– Researching the model that will be best for the type of data
– Training, cross-validating and testing the model
– Evaluation


Fig. 5. Global ensemble-based system for Covid prediction (preprocessing, feature selection and extraction, training of the ensemble model with cross-validation and optimization, evaluation, and prediction of the type of Covid with explainability)

All the experiments were run in Python using models from the scikit-learn library, which are very efficient for predictive classification problems. The computer used is an HP machine with 16 GB of RAM and a dual NVIDIA graphics card. In order to choose an optimal approach, we compared the performance of the different models using three main measures (TP: true positives, TN: true negatives, FN: false negatives, FP: false positives):

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (5)

Recall = \frac{TP}{TP + FN}    (6)

Precision = \frac{TP}{TP + FP}    (7)
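The sketch below (illustrative only, on synthetic placeholder data rather than the study's data set) shows how such a comparison can be run with scikit-learn's cross-validation using the three measures above.

```python
# Minimal sketch: cross-validated comparison of several ensemble classifiers
# on accuracy, precision and recall (synthetic placeholder data).
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
models = {"RF": RandomForestClassifier(random_state=0),
          "BC": BaggingClassifier(random_state=0),
          "ET": ExtraTreesClassifier(random_state=0)}
scoring = ["accuracy", "precision_macro", "recall_macro"]

for name, model in models.items():
    scores = cross_validate(model, X, y, cv=5, scoring=scoring)
    print(name, {m: round(scores["test_" + m].mean(), 4) for m in scoring})
```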

3.4 Results Analysis

To show the reliability of our model, we compare the different algorithms on their validation and test performance. We calculated the accuracy, precision and recall of each algorithm for these two scores; the values of the performance metrics are grouped in Table 2, and the performance associated with each metric is illustrated graphically. From Table 2, we can see that, for accuracy, the bagging methods BC and ET give the best scores both for validation (0.9722 and 0.9709, respectively) and for testing (0.9682 for both BC and ET). They are followed by the individual algorithms DT and KNC, which give respectively 0.9708 and 0.9396 for validation, and 0.9682 and 0.9318 for testing. Finally come the boosting-based algorithms, in particular XGB, which gives 0.9620 for validation and 0.9682 for testing. From the values summarized in Table 2, we can also see that overall

Table 2. Performance results based on accuracy, precision and recall.

Algorithm                       Accuracy                 Precision                Recall
                                Validation   Test        Validation   Test        Validation   Test
Random Forest (RF)              0.9680       0.9682      0.8501       0.8979      0.8492       0.7968
Bagging Classifier (BC)         0.9722       0.9682      0.8535       0.8979      0.8482       0.7968
Extra Trees (ET)                0.9709       0.9681      0.8493       0.8979      0.8463       0.7968
XGBoost (XGB)                   0.9620       0.9682      0.8376       0.8979      0.8336       0.7968
Decision Tree (DT)              0.9708       0.9682      0.8483       0.8979      0.8183       0.7968
K-Neighbors Classifier (KNC)    0.9396       0.9318      0.7791       0.7020      0.7430       0.7215

the bagging methods always give the best scores, and therefore the lowest errors, both in validation and in test. For precision, the values obtained for BC, ET and RF are respectively 0.8535, 0.8493 and 0.8501 in validation, and 0.8979 for each of them in test. For recall, the values obtained for BC, ET and RF are respectively 0.8482, 0.8463 and 0.8492 in validation, and 0.7968 for each of them in test. These performances are followed by those of the boosting methods, in particular XGB, with 0.8336 in validation and 0.7968 in test for recall. Finally, the individual algorithms (DT and KNC) give the lowest scores, corresponding to the highest errors: for precision, DT and KNC obtain respectively 0.8483 and 0.7791 in validation and 0.8979 and 0.7020 in test, while for recall they obtain respectively 0.8183 and 0.7430 in validation and 0.7968 and 0.7215 in test.

4 Conclusion

The current Covid-19 pandemic requires a rapid and effective diagnostic strategy for patient management. Recall that the probability of being detected as positive is related to the viral load and depends on the duration of symptoms and the severity of the disease. Thoracic CT scans are quick and relatively easy to perform. Recent research has revealed that the sensitivity of chest CT for COVID-19 infection was 98%, compared to a sensitivity of 71% for RT-PCR; the researchers concluded that chest CT should be used as the primary screening tool for COVID-19. Artificial intelligence can help fight the coronavirus if applied creatively. Indeed, the main objective of our study was to build and test machine learning based models that are able to analyze patient data in order to predictively separate coronavirus types. In this way, we can save time during the diagnostic process of the Covid-19 pandemic and other types of this virus. The results of this paper showed that our machine learning based model can accurately detect COVID-19 and other classes of coronaviruses with an accuracy of 97%.


References 1. Ai, T., Yang, Z., Hou, H., Zhan, C., Chen, C., Lv, W., Tao, Q., Sun, Z., Xia, L.: Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID19) in China: a report of 1014 cases. Radiology 296(2), 200642 (2020) 2. Anastassopoulou, C., Russo, L., Tsakris, A., Siettos, C.: Data-based analysis, modelling and forecasting of the COVID-19 outbreak. PLoS One 15(3), e0230405 (2020) 3. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001) 4. Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC Press, Boca Raton (1984) 5. Chen, T., Guestrin, C.: XGboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016) 6. Cohen, J.P., Morrison, P., Dao, L.: COVID-19 image data collection. arXiv preprint arXiv:2003.11597 (2020) 7. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006) 8. Karada˘ g, K., Tenekeci, M.E., Ta¸saltın, R., Bilgili, A.: Detection of pepper fusarium disease using machine learning algorithms based on spectral reflectance. Sustain. Comput.: Inf. Syst. 28, 100299 (2019) 9. Li, L., Qin, L., Xu, Z., Yin, Y., Wang, X., Kong, B., Bai, J., Lu, Y., Fang, Z., Song, Q., et al.: Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiology 296(2), 200905 (2020) 10. Li, X., Wang, L., Sung, E.: Adaboost with SVM-based component classifiers. Eng. Appl. Artif. Intell. 21(5), 785–795 (2008) 11. Mitchell, T.M.: Does machine learning really work? AI Mag. 18(3), 11–11 (1997) 12. Muhammad, L., Islam, M.M., Sharif, U.S., Ayon, S.I.: Predictive data miningmodels for novel coronavirus (COVID-19) infected patients recovery. SN Comput. Sci. 1(4), 206 (2020) 13. Narin, A., Kaya, C., Pamuk, Z.: Automatic detection of coronavirus disease (COVID-19) using x-ray images and deep convolutional neural networks. arXiv preprint arXiv:2003.10849 (2020) 14. Nguyen, T.T.: Artificial intelligence in the battle against coronavirus (COVID-19): a survey and future research directions (2020) 15. Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S., et al.: Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat. 26(5), 1651– 1686 (1998) 16. Shi, F., Wang, J., Shi, J., Wu, Z., Wang, Q., Tang, Z., He, K., Shi, Y., Shen, D.: Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19. arXiv preprint arXiv:2004.02731 (2020) 17. Shinde, G.R., Kalamkar, A.B., Mahalle, P.N., Dey, N., Chaki, J., Hassanien, A.E.: Forecasting models for coronavirus disease (COVID-19): a survey of the state-ofthe-art. SN Comput. Sci. 1(4), 1–15 (2020) 18. Springenberg, J.T.: Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390 (2015) 19. Yang, Y., Peng, F., Wang, R., Guan, K., Jiang, T., Xu, G., Sun, J., Chang, C.: The deadly coronaviruses: the 2003 SARS pandemic and the 2020 novel coronavirus epidemic in China. J. Autoimmun. 109, 102434 (2020) 20. Zhou, Z.H.: A brief introduction to weakly supervised learning. Natl. Sci. Rev. 5(1), 44–53 (2018)

Internet of Things for Smart Healthcare: A Review on a Potential IOT Based System and Technologies to Control COVID-19 Pandemic

M. Ennafiri(B) and T. Mazri

Network Telecoms and Electrical Engineering Department, National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco
[email protected], [email protected]

Abstract. Healthcare is an important part of life. Sadly, the spread of Covid-19 has strained the majority of health systems, and the demand for resources, from hospital kits to doctors and nurses, has become extremely high. However, the significant advancement of the computing sector has led to the emergence of the Internet of Things (IoT), which has now become one of the most powerful information and communication technologies due to its capability to connect objects such as medical kits, monitoring cameras, home appliances and so on. Capitalizing on the efficiency of data retrieval from smart objects in the health sector, it is clear that a solution is necessary and required to improve the health sector in the era of the Covid-19 pandemic while continuing to provide high-quality care to patients. In this paper, a real-time Covid-19 monitoring system is introduced in the form of an IoT based bracelet that measures body temperature and blood oxygen level, which are essential factors for determining the patient's condition and whether he needs a quick intervention in the ICU. The bracelet also has a GPS tracker to determine the patient's commitment to quarantine and social distancing. Based on the study conducted with more than 50 medical staff members, the IoT based bracelet was identified as a promising tool that can help control the spread of the Covid-19 virus by providing modern access to medical healthcare services anywhere and anytime, which is useful for the patient and the hospital management staff. Keywords: Covid-19 · IoT · Smart bracelet

1 Introduction

The Internet of Things (IoT) has been globally recognized as one of the most promising solutions to enhance and boost healthcare systems to a new level. It can be defined as a huge network in which physical objects/devices are interconnected and can be controlled or monitored remotely. These objects are connected to the Internet and can interact with each other without human intervention. Therefore, they are considered intelligent objects.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Ben Ahmed et al. (Eds.): SCA 2020, LNNS 183, pp. 1256–1269, 2021. https://doi.org/10.1007/978-3-030-66840-2_96


There are different types of IoT devices (portable devices, smart thermostats, IP cameras, robots, health monitoring devices, etc.), and the majority of them have sensors which can automatically detect events and transmit these data to servers. A large amount of data is collected from the different devices, then transmitted using network protocols to servers (the cloud) for analysis purposes, and finally the results are shared with other devices in order to improve the user experience. Several studies have presented novel designs for smart healthcare solutions using IoT based systems. An in-depth study is presented in [2], focusing on some of the available solutions, well-known applications and remaining problems; each subject is considered separately, rather than as part of an overall system. In [3], the exploration, storage and analysis of data are considered, with little mention of their integration into a system. The types of sensors are compared in [4], with a certain emphasis on communications. Finally, in [1], the detection and management of big data are considered, with little consideration for the network that will take charge of the communications. This article therefore provides a survey of IoT based systems in the health sector and showcases one of the most relevant solutions to the pandemic we are currently facing (COVID-19), a solution that can help overcrowded hospitals reduce the strain on resources while controlling the spread of this virus. This paper is structured as follows. First, we present in Sect. 2 the general three-layer architecture of IoT. Then, in Sect. 3, we highlight the important role of smart healthcare based IoT systems, focusing on some of the IoT devices that are used in the healthcare field. Afterwards, we present in Sect. 4 an overview of the new global virus (Covid-19). Finally, we propose a digital solution in the shape of a bracelet that helps authorities prevent further spread of COVID-19, while also tracking those that are unfortunately infected.

2 The Three-Layer Architecture of IoT

If the Internet connects people, the Internet of Things (IoT) connects all the objects. These interconnected objects (controlled by people) have their data regularly collected, analyzed and used to initiate actions, providing a wealth of intelligence for planning, management and decision making. The Internet of Things (IoT), also called the Internet of Everything or the Industrial Internet, is a new concept in the technology and communication world which provides the capability of transferring data for anything (human, animal or object) via a network connection. Put another way, it can be described as an internet of four kinds of interactions: (1) people to people, (2) machines or things to people, (3) people to machines or things, and (4) things or machines to things or machines, all interacting through the Internet. The main goal of IoT is to enable things to be connected anytime, anyplace, with anything and anyone, ideally using a mixture of different hardware, software and communication technologies that are used to transfer, store and process data. There is no standard architecture for the Internet of Things yet, but many researchers commonly present this architecture as three basic layers: the perception/physical layer, the network layer and the application layer, as shown in Fig. 1.


Fig. 1. IoT three-layer architecture

• Perception layer: It includes the hardware used in the IoT ecosystem (sensors and actuators). In this layer we can find several technologies such as Wireless sensor network (WSN), Radio-frequency identification (RFID) and Near-field communication (NFC). The major requirements for this layer are the support of the heterogeneity of IoT devices and the energy efficiency, thus sensors should be operational for gathering and transmitting data in real-time [5]. • Network layer: The second layer is responsible to ensure the communication of IoT devices with each other, in the case of Wireless sensor networks (WSNs) for example, or in some other cases, the communication directly with the cloud via a gateway. Data are collected by the perception layer and transmitted for analysis and decision-making purpose by several communication protocols, such as Bluetooth Low Energy (BLE), IEEE 802.15.4 standard and ZigBee in the case of low-power and low-bandwidth needs, also WIFI, 4G and 5G are used in this layer. • Application layer: the third layer is the software part of the IoT architecture. It is responsible of presenting data after being collected and analyzed for a specific IoT application domain (e.g., healthcare, transportation, smart grid etc.). It is considered as the front end of the IoT architecture [9]. Several protocols are deployed in this layer, such as Constrained application protocol (CoAP), Message Queuing Telemetry Transport (MQTT), Hypertext Transfer Protocol (HTTP) etc. [6].
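As a purely illustrative sketch (the endpoint URL and JSON fields are assumptions, not part of any cited system), the snippet below shows an application-layer exchange of the kind described above, in which a gateway forwards a sensor reading to a cloud server over HTTP, assuming the requests package:

```python
# Minimal sketch: a gateway pushes one sensor reading to a cloud endpoint over
# HTTP (one of the application-layer protocols listed above). URL is assumed.
import requests

reading = {"device_id": "node-42", "temperature_c": 36.8, "spo2_pct": 97}
resp = requests.post("https://iot-server.example.org/api/v1/readings",
                     json=reading, timeout=5)
print(resp.status_code)
```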

3 Internet of Things for Digital Healthcare

Healthcare plays an important role in our societies and makes a huge contribution to economic progress. While the ultimate goal of many countries is to improve the health


and well-being of people, there are different disciplines and solutions that contribute to it. Healthcare systems are primarily focused on treating patient’s conditions after a confirmed diagnosis and with the increase of storage capacity, advanced algorithms, smart objects and willingness to integrate IoT based solutions in healthcare, the effect of these solutions on the health system has increased considerably. In a global context, Internet of Things allows seamless interactions and communications among different types of objects such as monitoring devices, medical sensor, home appliances… And because of that IoT has become more productive in 3 several areas that can be classified into 3 categories of IoT scenarios: Hospitals: IoT medical systems implemented into the medical structure [7, 8, 9], Home healthcare: IoT medical systems realized for home health monitoring [10, 11] and finally doctor’s Offices: smart system that support doctors in their activities [12]. A lot of studies have shown that remote healthcare using IoT based systems is important because of the benefits it could provide in different contexts. For Example, remote health monitoring can keep non-critical patients under systematic review by observing and checking their health at home rather than the hospital, which could help the overcrowded hospital to reduce the strain on resources like beds and medical kits. It could also be used to provide a better access to healthcare for those living in rural areas, or to enable elderly people to enjoy modern medical healthcare services anywhere, any time. Many IoT healthcare systems have been developed using different technologies like radio frequency identification (RFID), wireless sensor network (WSN), smart mobile technologies and wearable devices. • RFID (Radio Frequency Identification) it’s the heart of IoT connected systems. These microchips replace the printed labels allowing precise location of the objects. The association of the cloud and connected objects has made these identification labels one of the most used technologies to develop IoT health care based systems [13]. • WSNs (Wireless Sensor Networks) consist of spatially distributed autonomous devices to cooperatively monitor real-world physical or environmental conditions, such as sound, pressure, vibration, motion, temperature and location. The major components of a normal WSN sensor node are a transceiver, microcontroller, memory, power source and one or more sensors to detect the physical phenomena. The structure of the sensor node is generally divided into four major parts: sensing unit, processing unit, communication unit and power unit [7, 14]. • Mobile health (m-health) consist of using mobile devices in collecting health data in real-time from patients, storing it to network servers connected to Internet. The m-health data help doctors to diagnosed, monitor, treat patients and predict health anomalies using wearable medical devices and body sensor [7, 15]. • Wearable devices are mostly known for healthcare observation and tracking. These wearable devices use one of the most essential elements in data collection which are the sensor. During recent years with the improvement of semiconductor technology, sensors have made investigation of a full range of parameters closer to reality [16, 17].


Each of these technologies is able to collect data about patients, doctors, nurses and so on. These devices can also send alarms in case of an emergency, guide patients during therapy, and manage information about medical services. The question, then, is how IoT can help us get through the unprecedented measures that have been put in place in response to the global COVID-19 pandemic.

4 Covid-19 Pandemic

The novel coronavirus (COVID-19) first emerged in China's Hubei province in December 2019; since then the virus has spread rapidly around the world, affecting more than 183 countries, infecting over a million people and killing more than 80,000. In March 2020, the World Health Organization (WHO) declared the outbreak of the coronavirus a pandemic, i.e. a "global spread of a new disease", which was the first step towards a global health emergency [18]. This is not the first time an international health crisis has occurred due to the spread of a novel coronavirus or other zoonotic (animal-originated) viruses, such as the influenza strains that created the swine, bird and seasonal flu epidemics in recent history. Seasonal flu alone is estimated to result in three to five million cases of severe illness and 290,000 to 650,000 respiratory deaths annually. Figure 2 presents different information and data collected on three known species of human coronaviruses.

Fig. 2. Coronavirus outbreak

• SARS (Severe acute respiratory syndrome) was first reported in November 2002 in the Guangdong province of southern China. The viral respiratory illness spread to 29 countries across multiple continents before it was contained in July the following year. Between its emergence and May 2014, when the last case was reported, 8,098


people were infected and 774 of them died. Various studies and the WHO suggest that the coronavirus that caused SARS originated from bats, and it was transmitted to humans through an intermediate animal - civet cats. The R0 (pronounced R-naught), is a mathematical term to measure how contagious and reproductive an infectious disease is as it displays the average number of people that will be infected from a contagious person. The R0 of SARS is estimated to range between 2 and 4, averaging at 3, meaning it is highly contagious. • MERS (Middle East Respiratory Syndrome) is a still active viral respiratory disease first identified in Saudi Arabia in 2012. Approximately 80 percent of human cases were reported by the kingdom, but it has been reported in 27 countries. However, human cases of MERS infections have been predominantly caused by human-tohuman transmissions. MERS might show no symptoms, mild respiratory symptoms or severe acute respiratory disease and death. Fever, shortness of breath and cough are common symptoms. If it gets severe, it might cause respiratory failure that requires. R0 of MERS is lower than one, identifying it as it is a mildly contagious disease. • COVID-19: On 31 December 2019, a pneumonia outbreak was reported in Wuhan China, the outbreak was traced to a novel strain of coronavirus. As of April 7, 2020, the number of global COVID-19 cases was more than 1,290,000 with over 76,000 deaths. According to the WHO, approximately one out of every six infected people becomes seriously ill and develops difficulty in breathing. The WHO puts the R0 of COVID-19 at 2 to 2.5 [20]. In Morocco, The Ministry of Health has confirmed the spread of Covid-19 on 2nd March 2020 (Fig. 3), when they detected the first case from an Italian who arrived to Morocco on February 27th. Since then the number of confirmed cases has gradually increased which made the country to implement social distancing measures and closure of land, air and sea borders. As of 26 May 2020, there have been 7577 confirmed cases, of which 4881 have recovered and 202 have died [19].

Fig. 3. Morocco daily new cases (From Feb 15 to May 25 – 2020)


The majority of people who are infected by coronavirus show common symptoms like fever, dry cough and tiredness. Others also have runny nose, sore throat, nasal congestion, or diarrhea (Fig. 4). However, high body temperature and dry cough are the very common symptoms [19].

Fig. 4. Covid-19 symptoms

As the number of infected people keeps increasing, and since there is still no specific treatment for the virus, the only suitable way to prevent the spread is the early detection of the symptoms, which can be extremely hard for countries that do not have enough medical resources to perform thousands of diagnostic tests per day. Numerous studies have shown that Covid-19 patients can fall into three levels of risk (Fig. 5):

• A high level of risk, where the patient should stay under medical observation in the hospital.
• A medium level of risk, where the patients are responsible for protecting themselves at home by social distancing from others; they are required to sign a self-certification stating that they will not visit public places unless for medical purposes.
• A low level of risk, which concerns the people who are of no danger to others; they can move freely, yet they have to respect the basic precautions, unless they start showing the symptoms of infection.

The world is now struggling to control the spread of the virus, which has caused a record number of morbidities and mortalities; that is why there is an urgent need for digital monitoring solutions to prevent citizens from infection and to save those who are already infected. For this reason, a group of countries has worked to develop mobile applications that aim to limit the spread of the virus. The following table lists applications for some Arab countries (Table 1):


Fig. 5. Covid-19 three levels of risk

Table 1. Covid-19 tracking applications

Country      | Application | Technologies used | Usage
Tunisia      | E7mi        | Bluetooth & GPS   | Voluntary
Qatar        | EHTERAZ     | Bluetooth & GPS   | Mandatory
Bahrain      | BeAware     | Bluetooth & GPS   | Mandatory for people in quarantine and foreigners
Saudi Arabia | Tatamman    | Bluetooth & GPS   | Mandatory for people in quarantine and foreigners
Morocco      | Wiqaytna    | Bluetooth & GPS   | Voluntary

These applications have been fiercely criticized by researchers over the protection of personal information; critics argued that countries could exploit the circumstances in order to collect information from users, while the responsible authorities defended the applications and emphasized their safety.

5 Discussion
Morocco is a developing country in the health sector, with approximately one hospital bed per 1000 citizens and one doctor for every 2000 citizens, which leads us to consider more practical solutions to reduce


the pressure on hospitals, especially in the current situation of this global coronavirus pandemic. In the previous section, we provided an overview of the Covid-19 virus and the types of infected patients. In this section, we focus on patients with a medium or low level of risk (asymptomatic patients) whose situation has improved, but who require periodic monitoring once every four hours for fear of the onset of symptoms requiring artificial respiration and the intensive care unit. For this, we propose a bracelet that includes a body temperature sensor and a pulse oximetry sensor for continuous recording of the patient's condition, and a GPS for tracking the infected patient's location. The collected data will be sent via a wireless connection to the patient's smartphone and the server system. A software application will be developed using decision support technologies that predicts the emergence of disorders, which could help reduce the stress on doctors, reduce the number of patients kept inside hospitals, and ease the strain on resources like beds and medical kits.
5.1 System Design
The system architecture shown in Fig. 6 describes the conceptual model that defines the general structure, behaviour and components that will work together to implement the overall system.

Fig. 6. System architecture

The main device of the system is the smart bracelet, which contains a body temperature sensor that reads body temperature, an oximetry sensor for measuring the blood oxygen level and, finally, a GPS to locate the patient and to check whether he is complying with quarantine in the same authorized place. After consulting with doctors, it is not necessary to measure the blood oxygen level permanently; instead, the patient can be alerted to place the pulse oximetry sensor on his fingertip every 6 h for more accurate information, which is why the bracelet will contain an alarm that can only be stopped once the necessary information has been received from the sensor.
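To make this measurement-and-alarm cycle concrete, the following is a minimal firmware sketch for the kind of ESP8266/NodeMCU setup used in the simulation of Sect. 5.3, assuming the alert thresholds given later in the paper (temperature above 37 °C, oxygen saturation below the 94–95% range mentioned in Sects. 5.2 and 5.3; the sketch uses 94%). The sensor-reading helpers, pin assignments and Wi-Fi/ThingSpeak credentials are illustrative placeholders, not the authors' implementation.

```cpp
// Illustrative sketch only: thresholds and overall flow follow the paper,
// but sensor helpers, pins and credentials are placeholders.
#include <ESP8266WiFi.h>

const char* WIFI_SSID = "your-ssid";          // placeholder
const char* WIFI_PASS = "your-password";      // placeholder
const char* TS_HOST   = "api.thingspeak.com"; // ThingSpeak REST endpoint
const char* TS_KEY    = "YOUR_WRITE_API_KEY"; // placeholder channel key

const float TEMP_ALERT_C = 37.0;  // body temperature threshold (Sect. 5.2)
const float SPO2_ALERT   = 94.0;  // blood oxygen threshold (Sect. 5.2)
const unsigned long SPO2_PERIOD_MS = 6UL * 3600UL * 1000UL; // 6-hour cycle
const int BUZZER_PIN = D5;        // alarm output (placeholder pin)
const int RED_LED    = D6;        // abnormal-state LED
const int GREEN_LED  = D7;        // normal-state LED

unsigned long lastSpo2Check = 0;

// Placeholder sensor helpers: a real build would wrap the thermistor and
// pulse-oximetry driver libraries here.
float readTemperatureC() { return 36.6; }
float readSpO2()         { return 97.0; }

void sendToServer(float tempC, float spo2, bool alert) {
  // Push one sample to ThingSpeak over its plain HTTP update API.
  WiFiClient client;
  if (!client.connect(TS_HOST, 80)) return;
  client.print(String("GET /update?api_key=") + TS_KEY +
               "&field1=" + String(tempC, 1) +
               "&field2=" + String(spo2, 1) +
               "&field3=" + String(alert ? 1 : 0) +
               " HTTP/1.1\r\nHost: " + TS_HOST + "\r\nConnection: close\r\n\r\n");
}

void setup() {
  pinMode(BUZZER_PIN, OUTPUT);
  pinMode(RED_LED, OUTPUT);
  pinMode(GREEN_LED, OUTPUT);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) delay(500);
}

void loop() {
  float tempC = readTemperatureC();   // temperature is read continuously
  float spo2  = 100.0;                // treated as normal between SpO2 checks

  // Every 6 hours, sound the alarm until a fresh SpO2 reading is obtained.
  if (millis() - lastSpo2Check >= SPO2_PERIOD_MS) {
    digitalWrite(BUZZER_PIN, HIGH);
    spo2 = readSpO2();                // patient places a finger on the sensor
    digitalWrite(BUZZER_PIN, LOW);
    lastSpo2Check = millis();
  }

  bool alert = (tempC > TEMP_ALERT_C) || (spo2 < SPO2_ALERT);
  digitalWrite(RED_LED,   alert ? HIGH : LOW);   // abnormal situation
  digitalWrite(GREEN_LED, alert ? LOW  : HIGH);  // normal situation
  sendToServer(tempC, spo2, alert);

  delay(60000);  // one sample per minute (arbitrary demo rate)
}
```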


This electronic unit will be placed on the patient's wrist and will send signals and data wirelessly to the server through his mobile phone. Furthermore, the server will store the data in the database and visualize patients' status in real time through the doctor's platform. In case of a low blood oxygen level or a high temperature, an alert will be sent to the medical staff for quick intervention.
5.2 Recommended Components for Implementation
The essential element of our system is the smart bracelet, which can be developed using the following components:
• Pulse Oximetry Sensor: a small, clip-like sensor that measures and monitors the blood oxygen level using small beams of light that pass through the blood. This sensor serves as an indicator of respiratory malfunction and can help in Covid-19 diagnostics and monitoring [23].
• Body Temperature Sensor: the body temperature sensor is a useful diagnostic tool that can detect whether the patient's situation is abnormal. Many IoT applications use thermistor-type sensors for the measurement of body temperature. In [21] and [22], common negative-temperature-coefficient (NTC) temperature sensors were used; NTC thermistors are a common type of temperature sensor for measuring surface temperature. The sensor consists of a two-wire connection that uses the resistance properties of ceramic/metal composites to measure temperature. Common uses for this type of sensor include skin probes and adult and pediatric rectal probes.
• Arduino UNO: Arduino is an open-source electronics platform that enables the creation of interactive objects. Arduino boards are able to read inputs from different sensors and redirect the output to the designated output pins. In the smart bracelet, the Arduino UNO reads data from the body temperature and pulse oximetry sensors and transmits the output data wirelessly to the server. If the temperature read from the sensor by the Arduino is greater than 37 °C or the blood oxygen level is lower than 94%, then the Arduino forwards an alert to the server.
• LEDs: two LEDs, red and green, are used to indicate normal and abnormal situations. When the medical situation is normal the green LED blinks, and when the situation is abnormal the red LED blinks. In our solution, the LEDs are attached to the bracelet to help inform the patient if the situation becomes abnormal.
• Wires: wires play an important role as connectors; they are used to connect all the above components together. In the health monitoring system, a wireless network is used to forward the measurements through a gateway towards the cloud [24].
• GPS Tracking Sensor: a GPS tracking sensor is a unit that receives information from GPS satellites and obtains the geographical position (latitude and longitude coordinates) of the bracelet in NMEA format, read by the Arduino.
5.3 Methodology
In order to give more weight to this research, we have simulated a smart bracelet that monitors medium- and low-risk Covid-19 patients and follows their health status using IoT technologies. Since the percentage of active cases of the virus in Morocco in


this category exceeds 90%, the Covid-19 bracelet will be an effective means of helping medical staff to manage and control the spread of the virus. For this simulation, we connected a body temperature sensor and an oximetry sensor to a NodeMCU (a development board and open-source firmware based on the ESP8266-12E Wi-Fi module). We then used the Arduino IDE for development and real-time data analysis. The simulation follows the process below:
• Real-time data storage: the real-time data will be recorded and stored in the ThingSpeak cloud (an IoT analytics platform service that allows aggregation, visualization and analysis of live data streams in the cloud).
• Data visualization (by doctors) using a web platform: the medical staff will be able to monitor the status of patients on the ThingSpeak platform, which allows the different collected data to be viewed in the form of figures or graphs.
• Patient geolocation: the GPS tracking sensor will receive information from GPS satellites and obtain the geographical position (latitude and longitude coordinates) of the bracelet (an illustrative parsing sketch is given below). The objective is to locate patients and to allow other applications to interact with our database and obtain the positions of Covid-19 patients, which will help to detect an increase in cases in an area and to remind people there to wear their masks and adopt preventive gestures.
• Monitoring the patient's condition on a smartphone: at the smartphone level, the user will have access to all the data, such as body temperature, oxygen saturation and the geolocation of the patient.
• Alerts: if the patient's situation worsens, the medical staff will receive a notification on a smartphone alerting them to transfer the patient to the resuscitation unit. For this situation we have defined two conditions: either the temperature is above 37 degrees Celsius or the oxygen saturation is lower than 95%.
The Covid-19 smart bracelet (Fig. 7) can detect body temperature, measure the blood oxygen level and track the patient's location using modern IoT technologies, then send the collected data to be viewed on other health management applications.
5.4 Discussion and Results
After introducing the solution to some of the medical staff (46 specialist doctors, 1 general practitioner and 8 nurses), we carried out a questionnaire containing 13 different questions in order to evaluate the solution and obtain suggestions for improvement. Most of the participants work in Moroccan hospitals in different cities such as Tangier, Rabat, Errachidia, Chefchaouen and Marrakech, while the others work in the UK and France. The majority of participants recognized the interest of such a solution and its advantages, which will help improve the healthcare sector by providing modern access to medical healthcare services anywhere and anytime. The bracelet can monitor non-critical patients under systematic review by observing and checking their health at home rather than in the hospital, which could help overcrowded hospitals reduce the strain on resources like beds and medical kits and also reduce the workload of nursing staff.
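As a complement to the geolocation and data-storage steps listed in Sect. 5.3 above, the short sketch below shows one way the simulation could parse NMEA sentences with the widely used TinyGPS++ library and forward coordinates to a ThingSpeak channel. Pins, credentials and field numbers are illustrative placeholders and not the authors' code.

```cpp
// Illustrative geolocation sketch for the Sect. 5.3 simulation flow.
// TinyGPS++ parses NMEA sentences from the GPS module; the ThingSpeak
// key, pins and field numbers are placeholders.
#include <ESP8266WiFi.h>
#include <SoftwareSerial.h>
#include <TinyGPS++.h>

SoftwareSerial gpsSerial(D1, D2);   // RX, TX pins for the GPS module (placeholder)
TinyGPSPlus gps;

const char* TS_HOST = "api.thingspeak.com";
const char* TS_KEY  = "YOUR_WRITE_API_KEY";  // placeholder

void sendPosition(double lat, double lng) {
  // Forward the latest fix to ThingSpeak over its HTTP update API.
  WiFiClient client;
  if (!client.connect(TS_HOST, 80)) return;
  client.print(String("GET /update?api_key=") + TS_KEY +
               "&field4=" + String(lat, 6) +     // latitude
               "&field5=" + String(lng, 6) +     // longitude
               " HTTP/1.1\r\nHost: " + TS_HOST + "\r\nConnection: close\r\n\r\n");
}

void setup() {
  gpsSerial.begin(9600);                        // typical NMEA baud rate
  WiFi.begin("your-ssid", "your-password");     // placeholder credentials
  while (WiFi.status() != WL_CONNECTED) delay(500);
}

void loop() {
  // Feed every incoming NMEA character to the parser.
  while (gpsSerial.available()) gps.encode(gpsSerial.read());

  if (gps.location.isUpdated() && gps.location.isValid()) {
    sendPosition(gps.location.lat(), gps.location.lng());
    delay(20000);  // respect ThingSpeak's minimum update interval (~15 s)
  }
}
```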


Fig. 7. Covid-19 smart bracelet

They also welcomed the choice of the target population, namely medium- and low-risk patients, considering that the percentage of active cases of the virus in Morocco in this category is the highest (according to the Moroccan Ministry of Health). 51% of the participants said that the symptoms observed are sufficient to determine the patient's condition, while the rest asked to add other elements to analyze, such as heart rate and respiratory rate, in order to better identify the seriousness of the patient's condition. On the other hand, some participants pointed out a negative aspect of this system, which restricts the personal lives of patients by using GPS to track their movements, although this is arguably better than confining them to the places required by the state. They also called for reliability and availability of the system, to eliminate the risk of not receiving updated data in the event of a technical problem. Finally, 91% of the participants confirmed their desire to use the smart bracelet, which they considered an innovative solution that could help them save time and improve the quality of care for non-Covid-19 patients.


6 Conclusion and Perspectives
In this paper, we have presented a project proposal in the form of an innovative real-time monitoring system using a smart bracelet for Covid-19 patients whose condition has improved to the point that they only need periodic monitoring by doctors while adhering to quarantine procedures until full recovery. The smart bracelet can detect a high body temperature, measure the blood oxygen level and track the patient's location using some of the best known IoT technologies, then send the collected data to be displayed on health management applications. The proposed system is simple, energy efficient and easy to understand. It works as a link between the infected patient and the doctor, preserving the integrity of medical procedures while keeping medical resources and kits available for those who are in critical condition. As this crisis unfolds across the world, the presented architecture is a very promising solution to help control the spread of the virus, which motivated us as student-researchers and future computer scientists to develop more smart solutions in order to serve the medical field and help preserve human life.

References
1. Yin, Y., Zeng, Y., Chen, X., Fan, Y.: The Internet of Things in healthcare: an overview. J. Ind. Inf. Integr. 1, 3–13 (2016)
2. Islam, S.M.R., Kwak, D., Kabir, H., Hossain, M., Kwak, K.-S.: The Internet of Things for health care: a comprehensive survey. IEEE Access 3, 678–708 (2015)
3. Dimitrov, D.V.: Medical Internet of Things and big data in healthcare. Healthc. Inform. Res. 22(3), 156–163 (2016)
4. Poon, C.C.Y., Lo, B.P.L., Yuce, M.R., Alomainy, A., Hao, Y.: Body sensor networks: in the era of big data and beyond. IEEE Rev. Biomed. Eng. 8, 4–16 (2015)
5. Li, S., Da Xu, L., Zhao, S.: 5G Internet of Things: a survey. J. Ind. Inf. Integr. 10, 1–9 (2018). https://doi.org/10.1016/j.jii.2018.01.005
6. Romdhani, I.: Architecting the Internet of Things. Archit. Internet Things (2011). https://doi.org/10.1007/978-3-642-19157-2
7. Distefano, S., Bruneo, D., Longo, F., Merlino, G., Puliafito, A.: Hospitalized patient monitoring and early treatment using IoT and Cloud. Bionanoscience 7(2), 382–385 (2017)
8. Dhariwal, K., Mehta, A.: Architecture and plan of smart hospital based on Internet of Things (IoT). Int. Res. J. Eng. Technol. 4(4), 1976–1980 (2017)
9. Natarajan, K., Prasath, B., Kokila, P.: Smart health care system using Internet of Things. J. Netw. Commun. Emerg. Technol. 6(3), 37–42 (2016)
10. Avila, K., Sanmartin, P., Jabba, D., Jimeno, M.: Applications based on service-oriented architecture (SOA) in the field of home healthcare. Sensors 17(8), 1703 (2017)
11. Pang, Z., Zheng, L., Tian, J., Kao-Walter, S., Dubrova, E., Chen, Q.: Design of a terminal solution for integration of in-home health care devices and services towards the Internet-of-Things. Enterp. Inf. Syst. 9(1), 86–116 (2015)
12. e Sá, J.O., Sá, J.C., Sá, C.C., Monteiro, M., Pereira, J.L.: Baby steps in E-health: Internet of Things in a doctor's office. In: Advances in Intelligent Systems and Computing, vol. 569, pp. 909–916 (2017)
13. Nanni, U., et al.: RFID as a new ICT tool to monitor specimen life cycle and quality control in a biobank. Int. J. Biol. Markers 26(2), 129–135 (2011)


14. Gope, P., Hwang, T.: BSN-Care: a secure IoT-based modern healthcare system using body sensor network. IEEE Sens. J. 16(5), 1368–1376 (2016)
15. Poh, M.-Z., Poh, Y.C.: Validation of a standalone smartphone application for measuring heart rate using imaging photoplethysmography. Telemed. e-Health 23(8), 678–683 (2017)
16. Germanese, D., Magrini, M., Righi, M., Acunto, M.D.: Self-monitoring the breath for the prevention of cardio-metabolic risk, pp. 96–101 (2017)
17. Gottesman, O., et al.: The electronic medical records and genomics (eMERGE) network: past, present, and future. Genet. Med. 15(10), 761–771 (2013)
18. https://covid19.who.int/. Accessed 26 May 2020
19. https://www.covidmaroc.ma/. Accessed 26 May 2020
20. https://www.aljazeera.com/news. Accessed 26 May 2020
21. Aqueveque, P., Gutiérrez, C., Rodríguez, F.S., Pino, E.J., Morales, A., Wiechmann, E.P.: Monitoring physiological variables of mining workers at high altitude. IEEE Trans. Ind. Appl. 53(3), 2628–2634 (2017)
22. Narczyk, P., Siwiec, K., Pleskacz, W.A.: Precision human body temperature measurement based on thermistor sensor. In: Proceedings of the IEEE 19th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), pp. 1–5, April 2016
23. Ženko, J., Kos, M., Kramberger, I.: Pulse rate variability and blood oxidation content identification using miniature wearable wrist device. In: Proceedings of the International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 1–4, May 2016
24. Navya, K., Murthy, M.B.R.: A ZigBee based patient health monitoring system. Int. J. Eng. Res. Appl. 3(5), 483–548 (2013)

Covid-19: Performance of e-commerce in Morocco
Asmaa Abyre1(B), Zineb Jibraili2, and Hajar Anouar1
1 Sultane Moulay Sliman University, Research structure: LERSEG, Beni Mellal, Morocco
[email protected], [email protected]
2 ENCGJ, Chouaib Doukkali University, Research structure: LERSEM, EL Jadida, Morocco
[email protected]

Abstract. The Coronavirus feeds easily on the effects of globalization. Thus, beyond the social dimension of this pandemic, the problem of Covid-19 poses another acute psychosocial concern. The spread of this virus has led us to several questions related to socio-economic phenomena, from which stem the two major problems addressed in this article:
– Has the period of confinement changed the behavior of Moroccan consumers?
– What is the effect of the social anxiety and the modification of the economic situation caused by Covid-19 on the change in consumer behavior in Morocco?
In order to deal with this subject, we adopted a quantitative study by administering an online questionnaire that was shared and disseminated on social networks. The results of our study showed, firstly, that there has been a remarkable change in the habits of Moroccan citizens due to the period of confinement, marked mainly by the orientation towards e-commerce, which performed well during this period of health crisis. Secondly, we were able to conclude that the increase in the level of anxiety had a positive impact on the change in the behavior of consumers, who turned much more towards e-commerce.
Keywords: Covid-19 · Social anxiety · E-commerce · Economic change · Consumer behavior

1 Introduction
Since January 2020, humanity has officially been in the grip of an unprecedented pandemic. The Coronavirus that broke out in Wuhan, China, has now spread to the rest of the world. From Asia to Oceania via the Old Continent, Covid-19 is gaining ground and increasingly reaching Africa. Among African states, Morocco seems to be by far the most affected, with its 4423 reported cases. Thus, the meteoric spread of this pandemic is in many ways reminiscent of the famous "global village" of Marshall et al. (2007). The Coronavirus feeds easily on the effects of globalization. Migration and the interdependence of societies are its driving force. Thus, beyond the social dimension


of this pandemic, the problem of Covid-19 poses another acute psychosocial concern. Indeed, the constant appearance of new cases on micro and macro scales, containment measures, curfews and, above all, the number of victims are contributing to a wave of anxiety among individuals. This paralyzing stress is all the more accentuated by the unknowns linked to the virus, the overwhelming of health authorities and, especially, the wave of information and misinformation about the Coronavirus. However, the greatest consequence of this social anxiety linked to Covid-19 is to be found in the economic field. Under stress, and because of the identified risks, the consumer comes to modify his commercial habits. Therefore, what is the impact of the social anxiety and of the modification of the economic situation emanating from the Coronavirus pandemic on the behavioral dynamics of the Moroccan consumer? Having formulated the research problem, our research pursues three main objectives:
– studying the change in consumer behavior due to the national lockdown period;
– showing the effect of social anxiety on consumer attitude;
– understanding the impact of economic change on consumer behavior.

2 Impact of Social Anxiety on Consumer Attitude
The Coronavirus has plunged the citizens of the world into a veritable social psychosis. As Sheena et al. (Date) point out, growing social anxiety is, first and foremost, linked to intolerance due to the Coronavirus. Indeed, those authors, who studied the effects of the spread of avian influenza on the psychological health of individuals, concluded that people who cannot manage their emotions are the most affected by an increase in their anxiety level. Nathalie et al. (date) had already pointed out the connections between a lack of information, the development of a dose of uncertainty and the appearance of social psychosis. Thus, different studies show that Covid-19 causes an unprecedented level of stress and anxiety in individuals. It should also be pointed out that all the measures necessary for the eradication of Covid-19 (in Morocco and other countries of the world) promote the development of psycho-sociological pathologies in individuals.
It should be noted that previous works focusing on the links between the psycho-sociological effects of viral pandemics and macroeconomic problems are very abundant. Several of these studies postulate that the physical and mental health of populations is dependent on their economic well-being and growth. These include the work of Pritchett and Summers (1996), Bloom and Sachs (1998), Bhargava et al. (2001), Cuddington et al. (1994), Cuddington and Hancock (1994), Robalino et al. (2002) and others. In addition, studies by the WHO Commission on Macroeconomics and Health (2001), Haacker (2004) and McKibbin and Sidorenko (2006) are also available. Recently, the scholarly literature has been enriched by a study by McKibbin and Fernando (2020) on the macroeconomic impacts of Covid-19, postulating seven scenarios. Scenarios 1 to 3 assume that the effects (mainly epidemiological) of the Coronavirus are isolated in China. Furthermore, these scenarios assume that the economic cost borne by China will be


transposed to the rest of the world due to the interdependence of economic systems on a global scale. The domino effect is thus justified by three basic elements: trade, capital flows, and changes in risk premia in global financial markets. Scenarios 4 to 6 support the hypothesis that the epidemiological effects of Covid-19 would affect the rest of the world, which in many ways is the case today. Finally, scenario 7 postulates that the pandemic could recur every year for an indefinite period. The authors conclude by drawing attention to the fact that global decision-makers should invest in public health. This would enable the ailing health systems of the South to withstand the rapid rise of Covid-19.
With regard to the concrete effects of social anxiety on changes in consumer habits, it should be noted that they depend on the degree of exposure but also on the perception of risk. In fact, the consumer's reaction is above all subject to the social representations of the disease, to "mass psychosis". Such an analogy can be made with regard to the appearance of Covid-19 in Moroccan society. In fact, the first cases did not really instill much social anxiety in consumers. Most of them developed what Taylor calls the tendency towards 'unrealistic optimism'. This presupposes that individuals will primarily consider that the effects (epidemiological, psychological and economic) of the virus will not be unfavorable to them and that, even if the situation becomes alarming, they will be spared the harms of the virus. In spite of the risk incurred, but unfortunately unknown (or at least ignored), the majority of individuals do not fundamentally change their behavior as consumers; the explanation is that their exposure to stress and other psychological pathologies from Covid-19 is not great. Conversely, the more individuals are exposed to anxiety-provoking situations (death, growth in Covid-19 cases, alarming information, confinement, state of emergency, etc.), the more they will change their consumer habits.

3 Effect of Economic Change on Consumer Behavior
According to the model adopted, it appears that the effect of social anxiety on consumer behavior depends on the degree of exposure but also on the consumer's personality. To this must be added an important factor, in this case the economic effects of the pandemic.


In reality, the relationship between the economic crisis, social anxiety and changes in consumer behavior is twofold and interdependent. Indeed, an economic crisis can strongly impact the mental health of individuals. As the WHO study (Date) points out, anxiety and, by ricochet, depression depend on the degree of poverty and social inequity. Such is the picture of desolation offered by an economic crisis resulting from the cessation of all activities due to Covid-19. In addition, anxiety and the perception of risk may lead the consumer to change his commercial behavior. In order to survive, he can develop coping strategies after identifying the risks of the pandemic.
The first indicator of the change in consumer behavior is related to the quantity of purchases. Stress and anxiety linked to the unknowns of Covid-19 led several consumers to increase their food rations. Indeed, faced with the uncertainty surrounding the eradication and end of the pandemic, the general tendency was to raid the shelves in order to stock up on groceries. This social psychosis mainly revealed the consumer's almost selfish survival instinct through his perception of risk.
The second element indicative of a change in behavior is the quality of the products purchased. Indeed, the majority of consumers are abandoning imported products in favor of local ones. A neologism that appeared in the United States, locavorism is a current that promotes a return to local consumption. It is based on the idea that individuals should favor the consumption of local products produced in an area of about 100 to 250 km from their home. Thus, whether one is for or against locavorism, the spread of Covid-19 is conducive to a strong return to this eating practice. In reality, the closing of national borders and the halt of food exports are considerably disrupting consumer habits. Locavorism becomes the happy fallback in the absence of the usual products promoted by the phenomenon of globalization.
A third element attesting to the change in consumer behavior is related to the way of making purchases. Paralyzed by the fear of being contaminated, of joining the count of "cases" or, even worse, the "deaths", most consumers avoid going to the store. The option they find more reassuring is to use home delivery services.
Finally, the loss of certain jobs in key sectors (businesses, tourism) has as a corollary the decline in the purchasing power of some households. In fact, the people most exposed to the economic harms and mental stress of the pandemic are consumers who do not have very high purchasing power. Thus, it is incumbent on policy makers to find ways to reduce the socio-economic imbalance. It should be pointed out that such an undertaking is far from easy. Notwithstanding the increasing development of public aid, the consequences are likely to be disastrous: unemployment, job losses, loss of purchasing power, depression, insanity, mental disorders, etc.
Overall, it is therefore clear that growing anxiety is having a real effect on changing consumer behavior in Morocco. The anxiety-producing and economic impacts on consumer attitudes are mainly linked to the following factors.

4 Materials
The appearance and spread of Covid-19 have caused distress in various countries around the world, generating mental and financial suffering. In this perspective, we decided


to explore the effect of social anxiety on the consumer. As a first step, in order to contextualize the model, we interviewed six Moroccan experts. As a second step, we adopted a quantitative study; in this sense, we randomly surveyed Moroccan citizens from different areas of the kingdom. Data was collected by sharing the survey on social media, and in the end we received 705 responses. Part of the questionnaire was reserved for studying the change in consumer behavior between the periods before and during confinement due to the spread of Covid-19; for this we used the SPSS software. To answer the second problem of this article, related to the identification of the relations between social anxiety and consumer behavior and between the economic situation and consumer behavior, we used PLS. In this sense, we proceeded to an exploratory and then a confirmatory study.

5 Results
5.1 Impact of Anxiety and the Economic Change on Moroccan Consumer Behavior

Fig. 1. The variation in consumption behaviour of Moroccan citizens: Before and during confinement


The first question we asked the respondents related to their purchasing priorities. We offered six choices that had to be ranked in order of priority before and during the confinement period (Fig. 1). As we can see in the graph, the item that was in last position before Covid-19, "basic prevention of the epidemic", ended up in second position and therefore became a product of first necessity. However, entertainment tools lost points between the two periods, and food and drinks kept their position as the leading products consumed during both periods (Fig. 2).

Fig. 2. Places of purchases of Moroccan citizens: before and during confinement

Our second question was about places of purchase before and during confinement. We can easily see that e-commerce gained points, becoming the second most used purchasing channel after supermarkets. On the other hand, the souks lost customers during the period of confinement. This can be explained a priori by the confinement of individuals, who found themselves obliged to change their consumption behavior (Fig. 3). The third question related to the activities carried out by individuals during confinement: we noticed that cooking, watching TV and sleeping were the most practiced during this period. Having answered the first question of this study, which clearly showed that the spread of Covid-19 and the national confinement of citizens have impacted the behavior of individuals on different levels, we now turn to the analysis of the model in question.


Fig. 3. Spending time behaviour of individuals during the confinement period

This is done with the aim of verifying whether the modification of the economic situation and the increase in the level of stress can indeed contribute to the change in consumer behaviour.
5.2 Model Specification
We formed the equation model to test the hypotheses. The following aspects have been taken into account in the specification of the model. The model includes all of the potential links that seemed reasonable given our previous research on this subject. It tests the impact of social anxiety and of the modification of the economic situation of citizens

Fig. 4. The equation model


during the propagation of Covid-19 on the change in consumer behavior. Items derived from the literature and the exploratory study measured these three variables (Fig. 4).

Fig. 5. The research model

The model was presented on SmartPLS as well (Fig. 5).
5.3 Statistical Analysis
We focus on testing the relationships between the variables in our model in order to validate or refute each of the hypotheses. In this perspective, we go through two stages: the first relates to the explanation and adjustment of the measurement model through three tests: reliability, convergent validity and discriminant validity. The second consists first in evaluating the overall validity of the model and then in testing the hypotheses of the structural model. We use the Smart PLS 2.0 software for this purpose.
The measurement model, also called the outer model, represents the assumed linear relationships between the latent variables and the manifest variables. To obtain it, we follow three steps:
– Reliability of items
– Convergent validity
– Discriminant validity
Reliability of Items
The reliability of the items is verified by the "loadings" (saturations), which consist in examining the correlation of the measurement indicators with their theoretical constructs. Traditionally, reliability is evaluated using Cronbach's alpha; the threshold


accepted by researchers is 0.70 (Chin 1998); the standard formula is recalled after Table 1 for reference. The results after iterations are presented in the table below (Table 1).

Table 1. The results after iterations

Variables           | Indicators | Loading | Cronbach's alpha
Anxiety             | ANX4       | 0.851   | 0.714
                    | ANX8       | 0.875   |
Economy             | ECON2      | 0.876   | 0.752
                    | ECON3      | 0.711   |
                    | ECON4      | 0.857   |
Consumer's behavior | COBE5      | 0.922   | 0.701
                    | COBE7      | 0.965   |
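For reference, Cronbach's alpha for a construct measured by k items is conventionally computed as follows; this is the standard definition, recalled here for context rather than taken from the paper.

```latex
% Cronbach's alpha for a construct with k items, where \sigma_i^2 is the
% variance of item i and \sigma_T^2 the variance of the summed total score.
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_T^2}\right)
```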

This study allowed us to retain the most relevant indicators: for anxiety, of the nine items identified in the literature, two actually measure it; regarding the economy, out of 5 items the PLS analysis made it possible to retain three; and finally, for consumer behavior, two indicators out of the nine found in the literature were validated.
Convergent Validity
We now move on to the verification of convergent validity, which is calculated based on the average variance shared between a variable and its items (Hulland 1999). Researchers using PLS have relied on internal consistency, developed by Fornell and Larcker (1981), and adopted the guidelines proposed by Nunnally (1994), who considers the threshold of 0.7 as a benchmark for "modest" composite reliability (Table 2); the standard formula is recalled below.

Table 2. Values of convergent validity

Variables           | Composite reliability
Anxiety             | 0.728
Economy             | 0.759
Consumer's behavior | 0.813

The values of convergent validity are acceptable. Indeed, the constructs relating to consumer’s behavior have a strong internal coherence with a composite reliability that exceeds 0.9. The other constructs have good internal consistency with values greater than 0.8.
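As a reminder, the composite reliability used in PLS (Fornell and Larcker 1981) is typically computed from the standardized loadings of a construct's items; the expression below is the standard formula, given here only for context and not extracted from the paper.

```latex
% Composite reliability of a construct with standardized loadings \lambda_i;
% the denominator adds the measurement error variances (1 - \lambda_i^2).
\rho_c \;=\; \frac{\bigl(\sum_i \lambda_i\bigr)^2}
                  {\bigl(\sum_i \lambda_i\bigr)^2 + \sum_i \bigl(1 - \lambda_i^2\bigr)}
```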


Discriminant Validity
Discriminant validity is the traditional methodological complement of convergent validity. It consists in proving that each item is linked more strongly to its own variable (AVE) than to the other constructs in the model. It is achieved when "the squared correlation between 2 latent variables is lower than the AVE index of each latent variable. Chin (1998) recommends that the AVE should have a value greater than or equal to 0.5" (Mourre 2013) (Table 3). The standard definition of the AVE is recalled after the table.

Table 3. Discriminant validity

                    | AVE      | Anxiety  | Economy  | Consumer's behavior
Anxiety             | 0.778145 | 0.810023 |          |
Economy             | 0.891263 | 0.759271 | 0.989941 |
Consumer's behavior | 0.891654 | 0.710342 | 0.905925 | 0.96512
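For context, the AVE referred to above is conventionally defined from the standardized loadings of a construct's items, and the Fornell–Larcker discriminant-validity check then compares the square root of each construct's AVE with its correlations with the other constructs; this is the standard formulation, not one extracted from the paper.

```latex
% Average variance extracted (AVE) for construct j with k standardized
% loadings \lambda_{ji}, and the Fornell-Larcker discriminant-validity check:
% \sqrt{AVE_j} should exceed the correlation r_{jl} with every other construct.
\mathrm{AVE}_j \;=\; \frac{1}{k}\sum_{i=1}^{k} \lambda_{ji}^2,
\qquad
\sqrt{\mathrm{AVE}_j} \;>\; \max_{l \neq j} \lvert r_{jl} \rvert
```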

As this table indicates, all the cross-loadings of the variables are less than the square root of the AVE, which implies the validation of the discriminant validity of our constructs. By using PLS regression, the analysis of the results obtained allowed us to stabilize the measurement model. In the following, we assess the overall validity of the model and then test the hypotheses of the structural model.
The Goodness of Fit (GoF) index is a statistical test that determines the level to which the sample data correspond to a normal distribution of a population. For our model, the GoF, which is calculated on the basis of the average explained variance (R²) of the different constructs, the redundancy index and the communality, presents a satisfactory level: GoF = √(0.30 × 0.56) = 0.38. The threshold recommended in the literature being 0.30, this reflects, on the one hand, a good quality of the links between the measurement variables and the latent variables and, on the other hand, a good quality of the structural relationships.
After having evaluated the predictive quality of the model through the GoF index, we proceed in this second step to testing the hypotheses. This consists first of all in examining the level of significance of the standardized coefficients (path coefficients) of the relationships between the latent variables; a bootstrap-type simulation is carried out for this purpose. Thus, in order to test the research hypotheses, we calculated the correlation coefficients between the variables (path coefficients) with the Smart-PLS software. The results are shown schematically in the table below (Table 4). The strong association between social anxiety and the change in consumer behavior was confirmed by this study.

Table 4. Testing the hypotheses

Hypothesis | β (correlation coefficient) | t-Student (bootstrap) | Decision
H1: The rise in the level of social anxiety during Covid-19 strongly influences the change in consumer behavior | 0.651 | 6.731 | V
H2: The change in the economic situation of citizens during Covid-19 strongly influences the change in consumer behavior | 0.791 | 7.582 | V

In addition, the change in the economic situation of citizens during Covid-19 also strongly influences the change in consumer behavior. This was confirmed by the statistical analyses carried out with SmartPLS 2, with a correlation level of 0.79 and a t-Student of 7.58.

6 Discussion
The current research aims to better understand the impact of social anxiety and of the change in the economic situation of Moroccan consumers on their consumption behavior during the spread of Covid-19. The results of this research yield numerous lessons, which are as follows:
– An increase in the average consumption of basic epidemic-prevention products, cleansers, food and drinks.
– The use of e-commerce platforms as a place to buy in order to avoid the risk of contamination. In fact, in this period of pandemic, e-commerce has become a reality in Morocco. The fear of moving around thus gives way to incessant orders on online platforms. Our sample is reluctant to go shopping in supermarkets and has come to favor the sedentary economy. This seems to explain why average online shopping during Covid-19 recorded a strong increase (4.10).
– The use of television, cooking, surfing on social networks and sleep to pass the time during confinement.
On the other hand, the change in the economic situation pushes Moroccans to avoid spending their income on goods and services that are not strictly necessary, such as entertainment tools and food supplements, which are considered in this crisis as luxury products.

References
Alvarez, J., Hunt, M.: Risk and resilience in canine search and rescue handlers after 9/11. J. Trauma. Stress 18, 497–505 (2005)


Bai, Y., Lin, C.-C., Lin, C.-Y., Chen, J.-Y., Chua, C.-M., Chou, P.: Survey of stress reactions among health care workers involved with the SARS outbreak. Psychiatry Serv. 55, 1055–1057 (2004)
Bults, M., et al.: Perceived risk, anxiety, and behavioral responses of the general public during the early phase of the Influenza A (H1N1) pandemic in the Netherlands: results of three consecutive online surveys. BMC Publ. Health 11(1), 2 (2011)
Bleich, A., Gelkopf, M., Solomon, Z.: Exposure to terrorism, stress-related mental health symptoms, and coping behaviors among a nationally representative sample in Israel. J. Am. Med. Assoc. 290, 612–620 (2003)
Cava, M.A., Fay, K.E., Beanlands, H.J., McCay, E.A., Wignall, R.: The experience of quarantine for individuals affected by SARS in Toronto. Publ. Health Nurs. 22, 398–406 (2005)
Chin, W.W.: The partial least squares approach to structural equation modelling. In: Marcoulides, G.A. (ed.) Modern Methods for Business Research, pp. 295–336. Erlbaum, Mahwah (1998)
Fischhoff, B., Gonzalez, R.M., Small, D.A., Lerner, J.S.: Judged terror risk and proximity to the World Trade Center. J. Risk Uncertainty 26, 137–151 (2003)
Forbes, D.: What is COVID-19 doing to our mental health? First published on March 19, 2020 in Health & Wellbeing (2020)
Fornell, C., Larcker, D.F.: Evaluating structural equation models with unobservable variables and measurement errors. J. Market. Res. 18, 39–50 (1981)
Glassner, B.: The Culture of Fear: Why Americans Are Afraid of the Wrong Things (1999)
Hulland, J.: Use of partial least squares (PLS) in strategic management research: a review of four recent studies. Strateg. Manage. J. 20, 195–204 (1999)
Maderthaner, R., Guttmann, G., Swaton, E., Otway, H.J.: Effect of distance upon risk perception. J. Appl. Psychol. 63, 380–382 (1978)
Marshall, R.D., Bryant, R.A., Amsel, L., Suh, E.J., Cook, J.M., Neria, Y.: The psychology of ongoing threat: relative risk appraisal, the September 11 attacks, and terrorism-related fears. Am. Psychol. 62, 304–316 (2007)
Manly, C.M.: Create the Life of Your Dreams by Making Fear Your Friend, 1st edn. Familius, New York City (2019)
Mourre, M.L.: La modélisation par équations structurelles basée sur la méthode PLS: une approche intéressante pour la recherche en marketing. Paris, France (2013)
Reuters. https://www.reuters.com/article/us-health-coronavirus-global-economy-idUSKBN21702Y. Accessed 20 Mar 2020
Scheier, M.F., Carver, C.S., Bridges, M.W.: Distinguishing optimism from neuroticism. J. Personal. Soc. Psychol. 67, 1063–1078 (1994)
Schlenger, W.E., et al.: Psychological reactions to terrorist attacks: findings from the national study of Americans' reactions to September 11. J. Am. Med. Assoc. 288, 581–588 (2002)
Sjöberg, L.: Factors in risk perception. Risk Anal. 20, 1–1 (2000)
Xie, X.-F., et al.: The 'Typhoon Eye Effect': determinants of distress during the SARS epidemic. J. Risk Res. 14(9), 1091–1107 (2011)

Survey of Global Efforts to Fight Covid-19: Standardization, Territorial Intelligence, AI and Countries' Experiences
Boudanga Zineb1,2(B), Mezzour Ghita1,2, and Benhadou Siham1,2
1 National and High School of Electricity and Mechanic (ENSEM), HASSAN II University, Casablanca, Morocco
[email protected], [email protected], [email protected]
2 Research Foundation for Development and Innovation in Science and Engineering, Casablanca, Morocco

Abstract. The development of transportation and communication means, the opening up of the world due to the industrial, economic and social revolutions, and the emergence of advanced urbanization have resulted in an acceleration of globalization, worldwide supply chain dependencies and a greater openness of the world's ecosystems. At present, the world is experiencing an unparalleled health crisis due to the SARS 2 or Covid-19 pandemic, which has given rise to socio-economic crises across the world. In the absence of a vaccine, countries are being forced to revolutionize their response and preparedness policies for health emergencies and to adapt to the new global dynamic. Our paper, based on feedback from countries, proposed artificial intelligence solutions, the capitalization of standards-based knowledge in the face of Covid-19 impacts and the concept of territorial intelligence, contributes to this global effort by proposing sustainable, smart and generic solutions against the current pandemic.
Keywords: Covid-19 · Artificial intelligence · Territorial intelligence · Standardization · Preparedness and response policies

1 Introduction
The world is currently in the midst of the most ferocious health crisis in its history. This pandemic, caused by a virus belonging to the family of coronaviruses, named SARS 2 by the scientific committee in reference to Severe Acute Respiratory Syndrome and more widely known to the public as Covid-19, began in December 2019 in China in the province of Wuhan and has so far caused more than 894,342 deaths worldwide and 27,374,682 cases of infection [1]. Globalization, rapid urban development and the evolution of transportation, telecommunications and information technologies have reinforced the spread of the virus throughout the world in comparison to previous pandemics, thus giving way to a new, double economic and social crisis [2]. Against the rapid and relentless


changes that the world is witnessing, all the actors of the socio-economic, scientific, industrial and governmental ecosystems are mobilized. To date, more than 13,600 indexed papers have been published by different research communities on the facets of the pandemic across the world [3]. In addition, numerous inventions, technologies and policies in line with the evolution of the digital transformation have emerged, involving the smart cooperation of human, intellectual and technological capital worldwide, with the aim of unveiling the multi-dimensional nature of this new virus [4].
Through this paper we explore the global efforts being made against the pandemic along four different dimensions. The first section presents the role that standardization plays in meeting the emerging challenges of the virus and capitalizes on the knowledge gained from these efforts by proposing a continuous and cross-cutting improvement plan to address pandemic risks worldwide. The second section introduces the concept of territorial intelligence and its importance in countering the impacts of the pandemic on small and large scales. The third section presents a state of the art on the role of artificial intelligence and contributes to the field by proposing a generic architecture for fighting aspects of the virus based on key AI concepts. Finally, the fourth section presents the feedback from countries that have declared victory against the virus and, through a critical look, compares these countries with others that have been less effective. The last section synthesizes all these points and opens up different perspectives for improvement.

2 Standards, Policies and Referentials
Leveraging the worldwide expertise of numerous communities during Covid-19 has been achieved by proposing various dedicated standards, referentials and road maps aimed at providing a high-level and broad-based description of the measures, barriers and best practices to be followed in order to address the current challenges that threaten several sectors and communities. These specifications cover medical domains, through certification organizations such as AFNOR [5], ASTM [6], IEC and ISO [7] and health organizations [8]; industrial [9] and economic working groups [10]; digital healthcare practitioners and digital solution ecosystems; and, last but not least, social science experts and research networks. Standard-setting builds international consensus and smart cooperation for an enhanced response, and it provides a well-developed auditing system for continuous monitoring of compliance with guidelines in the field of public health preparedness and emergency management [11]. Standardization efforts to date have addressed a series of challenges arising from the evolution of the pandemic and its short- and long-term impacts. In this part we explore four essential challenges in the management of the Covid-19 pandemic: the management of the triple sanitary, social and economic crisis, and the elaboration, incorporation and observance of prevention and response plans.
In response to the health crisis, standard-setting initiatives have focused on the use of a number of existing standards for the design, verification and validation of PPE, medical devices and mental health management systems for both patients and healthcare professionals, as well as for all stakeholders involved in managing the pandemic [12]. Other standards have focused on the development of the digital healthcare roadmap and the strengthening of existing standards in the field through the conclusions drawn


from the current health crisis [13]. A final category has focused on the development of a real-time monitoring system of epidemiological indicators and on the development of new indicators taking into account several features that combine the characteristics and clinical aspects of Covid-19 with its evolving social, economic and health conditions, such as the open-source system proposed by Oxford University [14].
Against the social crisis and the new constraints imposed by the pandemic on the territorial and global management of communities, the efforts made have focused on the establishment and implementation of a new global dynamic. This new dynamic has addressed a number of public services offered by administrations that could generate potential risk factors accelerating the spread of the pandemic, such as education and public transport services. Thus, the new dynamic driven by digitalization has consisted in strengthening the implementation of standards for the use of digital tools in these fields through, for example, the proposal of guidelines for the integration of e-learning [15] and the strengthening of the normative framework for digital public administration, cryptocurrency and blockchain for the dematerialization of transactions and finance, which constitutes a major source of threat [16].
The third category of challenges, which constitutes a turning point for the management of the pandemic in the post-lockdown period, is the response and prevention plans. The generalization of prevention and response plans and their efficiency require the establishment of a standard framework for compliance, implementation and auditing of protection and prevention measures. Among the tools that have been used by a large community in the context of the current pandemic are tracing and tracking applications. These applications require a considerable level of confidentiality, interoperability and connectivity, and their deployment has required considerable efforts around the world, resulting in a number of communication protocols dedicated and specific to the current context, which have been proposed within several projects in Japan [17], the USA and Germany. A number of countries, within the framework of international cooperation against Covid-19, have also publicized their prevention and response plans, which has resulted in a substantial international knowledge base [18].
Against the economic crisis, countries have tried to strengthen the resilience of their value chains by using existing standards or by trying to develop benchmarks that meet current constraints. The efforts made have particularly tried to address a twin crisis by proposing a new economic approach that integrates health constraints with respect to the public health of the human resources that drive the economic wheel. Thus, some countries and standardization organizations such as ISO or the ILO have been working on the proposal of a new standard for the organization of working environments to ensure high productivity while providing compliant OHS performance [19]. Other communities have focused on the definition of a structured framework for teleworking. In some countries new laws have been put in place and new procedures have been introduced, revolutionizing particular areas, such as the example of Morocco with a new standardization of port procedures through its Port Net project [20].
By analysing the impact of these efforts on the evolution of the pandemic and the criticality of its impacts, we were able to deduce six relevant areas of development that could be integrated into the management of the risk of future pandemics, and thus put in place a roadmap and resilient health crisis management systems.


2.1 Preparedness and Quick Response
Standards, policies, accreditations and regulations allow innovation to benefit from all relevant expertise in numerous fields to accelerate the process of implementation and industrialization of technological solutions and medical equipment to counter Covid-19 impacts. This process can be enhanced by compare-and-comply solutions enabled by natural language processing technologies that take gathered expertise as input, produce compliance results as output, and share them with both innovators and institutions through cloud platforms.
2.2 Quality Assurance and Infrastructure
Standardization can assure high quality levels and compliance with well-developed technical requirements to help design sustainable and mature hospital infrastructures with quality diagnostics, proven testing processes, efficient metrology systems and medical devices, enhancing the maturity of the Public Health Emergency Preparedness system.
2.3 Transparent Communication and Global Consensus
Standards act as an interactive bridge of mutual exchange between different collaborators, building a trustful relationship between the institutions in charge of crisis management and the different social communities, and supporting the creation of collaborations and committees for pandemic management.
2.4 Experience Feedback and Knowledge Management
Capitalization on past experiences of pandemic management on smaller scales and on the lessons learned from epidemics that the world experienced before, such as H1N1, Ebola, MERS or SARS, as well as from the countries that have currently succeeded in their preparedness and response plans, such as South Korea, Singapore and Taiwan.
2.5 Continuity and Resilience
Resilience is defined as the ability to recover from a crisis or unpredictable event that occurs during a structure's normal activity and operation. Several standards for business continuity management [21], risk management [22] and emergency management emphasize this concept as a key to achieving dynamic proactivity for healthcare, economic, industrial and social systems. The current situation will give birth to new standards for the present global pandemic and for preventing potential outbreaks in the future, which can help reinforce the resilience of organizational systems.
2.6 3R Resistance-Relaunch-Recovery Management
At the present time of preparation for deconfinement, under the new requirements of community life encouraging respect of the 3Rs, the role of standards is essential to define the guidelines for each of these three dimensions according to the new constraints, risks and parameters of each region [23].


3 Territorial Intelligence
3.1 Pillars and Concepts
Research carried out on territorial intelligence has identified it according to two strands. The first strand focuses on economic intelligence and thus defines it as a set of governance strategies aimed at the creation of a dynamic network between socio-economic and institutional agents in order to ensure business intelligence and favorable conditions for the development of economic competitiveness. The second strand defines it according to a cooperative approach whereby territorial intelligence is a cooperation of different scientific, industrial, governmental and economic ecosystems for the creation of an efficient local capital oriented towards sustainable local development [24]. The latter concerns the development of an intelligent territorial information system. The current pandemic context has led several countries to rethink their industrial and territorial policies and to promote the cooperative logistics promulgated by territorial intelligence as drivers for improving their ecosystems' response and preparedness to the short- and long-term impacts of the health crisis [25]. This cooperation can be apprehended according to three pillars, collective intelligence, economic intelligence and geospatial intelligence, and four levers of improvement, including intelligent infrastructures, the blue economy and, finally, the reinforcement of the digital transformation. Table 1 presents the territorial intelligence pillars and their potential against Covid-19.
3.2 Opportunities for Growth
Infrastructure Rehabilitation
Confronted with the health crisis, the reinforcement of health, transport and telecommunication infrastructures has become essential. To ensure that health measures are respected, particularly in rural areas, several conditions are required, including building ventilation, water availability, intelligent waste management and the availability of medical assistance. The development of renewable and especially photovoltaic technologies, geospatial intelligence, additive manufacturing and digital systems engineering for medical devices makes it possible to achieve an intelligent and autonomous rural and urban health infrastructure. In Africa, many areas are isolated from the world during disasters due to the vulnerability of their infrastructure. International associations such as the United Nations, in collaboration with local institutions, are continuously mobilizing to address this situation. Today, due to Covid-19, the responsibility is passed on exclusively to local organizations [26]. This transition can be achieved efficiently with territorial intelligence strategies that enable the establishment of advanced regionalization plans. The last point concerns the further strengthening of telecommunications infrastructure. As a result of the pandemic, several activities such as education have been forced to go digital, which requires the establishment of an intelligent IT infrastructure that basically includes electrifying all areas, including rural ones, the availability of connected objects (mainly smartphones and computers), accessibility to the network, and resilient, secure and mature information systems [27].


Table 1. Territorial intelligence pillars: geospatial, economic and collaborative intelligence

Geospatial intelligence
• Key features against Covid-19 challenges: based on multidimensional analysis of imagery and on the collection, processing and dissemination of geographical data, it aims to accurately capture, visualize and portray reference physical assets and human activities on earth in the time of pandemic in order to support decision making.
• Driven technologies: interactive mapping and geo-statistics for collected geospatial information; business information models (BIM) for city 3D modeling; machine learning exploitation through geospatial intelligence for forecasting.
• Field of application under the auspices of Covid-19: local response matrix for the efficient management of commodity supply through Geographic Information Systems; tracking of social problems and detection of the most vulnerable areas; open geospatial data platforms for disaster response; access to critical social and sanitary data and information for multinational joint operations.

Economic intelligence
• Key features against Covid-19 challenges: constitutes actions and plans governed by context analysis and smart cooperation between decision makers, technology watchers and project managers, with the aim of ensuring economic competitiveness and resilience to context changes and uncertainties through territorial cooperation.
• Driven technologies: business intelligence technologies and market-provided software; knowledge management for the exploitation and optimization of internal resources; unemployment forecasting and management of new job opportunities.
• Field of application under the auspices of Covid-19: innovation and technological reinforcement through public and private investments; contextual consumption surveys and analysis; policies preparing for the new normal and business recovery based on feedback capitalization and extended knowledge management; establishment of economic and social monitoring committees at national level.

Collaborative intelligence
• Key features against Covid-19 challenges: consists of creating territorial cooperation of stakeholder clusters in order to develop shared and available expert systems for decision support and collaborative well-being.
• Driven technologies: forecasting methods based on artificial intelligence; collaborative platforms for risk assessment; cloud services; expert systems for decision-making support; multi-agent systems for the development of autonomous systems.
• Field of application under the auspices of Covid-19: large-scale forecasting of key sanitary, social and economic indicators related to pandemic impacts; smart supply chain establishment based on General Collaborative Intelligence; development of resilient city management systems to counter pandemic impacts; establishment of collaborative crowdsourcing platforms that gather experts and local innovators.

Blue Economy
The health crisis and the preventive measures adopted throughout the world have given rise to a double social and economic crisis due to the disruption of several value chains of the world's economic leaders, but above all of the emerging economies [28]. Some countries have responded to this disruption by locally developing certain raw materials through innovation and re-engineering, by restructuring and adapting certain production chains, and by refocusing on local resources. These three aspects are typical principles of the blue economy. The blue economy consists of the creation of added value by relocating both the economy and production through the exploitation of local resources and the mobilization of all local ecosystems and country agents. As an enabler of territorial intelligence and a consolidator of the circular economy, the blue economy could be an efficient weapon against the current crisis, especially for the most affected sectors such as tourism and the commodities market.

Reinforcing Digitalization
Digitization solutions have been a driving force during the current pandemic through their artificial intelligence [29], big data and additive manufacturing pillars, which contribute to improving the response of different systems, including healthcare, social and industrial systems. Territorial intelligence, as defined in the first part, is based on the establishment of intelligent territorial information systems. The latter aim at the efficient communication of data concerning the territory and its components and at their


association with the knowledge developed by the country’s technological and strategic clusters for the proposal of strategic plans, policies and preventive measures and elements of response to the impacts of Covid-19.

4 Applications that Integrate AI to Fight Covid-19

AI can certainly be considered a powerful tool for generating information to develop more accurate and effective strategies at all levels of the fight against the epidemic: detection of outbreaks, outbreak estimation, treatment research and medical diagnosis, prediction of future outbreaks, and management of the transition period back to a relatively normal situation. Given the importance of AI in the fight against Covid-19, several authors have addressed this aspect by proposing various solutions. The main applications against Covid-19 using AI are the following.

The authors of [30] propose a deconfinement strategy based on the following elements:
• A progressive and decentralized deconfinement carried out in steps, locality by locality, according to the relative demographic contagion factor (RDCF).
• The algorithm makes it possible to set up an epidemic monitoring cell.
• The strategy requires that inter-city traffic be kept extremely limited and regulated.
• HPAHs and the elderly, as well as people at high risk, must be kept confined with reduced contact with the population.
• The wearing of masks must become mandatory.

The "AlloCovid" service [31], based on artificial intelligence and designed by Inserm (the French National Institute for Health and Medical Research), remotely diagnoses possible Covid-19 infections; detected cases are added to the database of infected people and directed to the most appropriate care service. Foch Hospital [32] uses AI-based medical imaging software to detect lung damage caused by Covid-19; the tool, developed by the German company Siemens Healthineers, was successfully tested on 150 patients. Another AI-based solution [33] relies on epidemiological modelling to estimate the number of medical staff who could fall ill in the coming days. Several actors have used AI algorithms, which have repeatedly proven their potential for drug discovery [34], with two logics at work: observing existing molecules to determine their potential against a new pathology (repositioning), and inventing new molecules from scratch (design). The solution in [35] is based on artificial intelligence (AI) to understand the evolution of the coronavirus and determine the actions to be taken to limit its impact. More specifically, it makes it possible to identify the clinical severity of the cases infected during the pandemic and helps doctors determine which patients really need to be treated in hospital and which can be confined at home.


The authors of [36] proposed a mobile application called BreakTheChain that integrates AI for virus tracking and prediction of its behavior. The BreakTheChain application performs three main functions (an illustrative sketch is given at the end of this section):
• Identify: all the essential data of the individual is identified and updated.
• Alert: once a patient tests positive for COVID-19, the application sends alert messages to the employer or institution; it also alerts neighboring hospitals.
• Predict: evaluate the database and identify the possible locations where the virus may spread in the coming weeks.

AI is also used to detect abnormal respiratory patterns [37] (an important indicator of infection) through deep learning (DL), aiming to classify six significant respiratory patterns related to COVID-19. The COVIDX-Net solution [38], based on deep learning, is intended to help radiologists automatically diagnose COVID-19 in X-ray images. The authors of [39] use machine learning to predict potential inhibitory antibodies to the coronavirus. Deep learning is also used to detect whether or not a person is wearing a mask [40]. The use of AI and surveillance technologies to track the spread of the coronavirus or to improve control and detection capabilities appears to be an effective response, but any excessive or unethical use can lead to serious violations of the right to privacy and non-discrimination.
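Purely as an illustration of the Identify/Alert/Predict structure described above, and not as the authors' actual implementation, the following Python sketch shows how such a three-function tracker could be organised; every class, method and field name here is hypothetical.

```python
# Hypothetical, minimal sketch of the three functions attributed to a
# BreakTheChain-style application. No name below comes from the original
# work; the code only illustrates the functional decomposition.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Individual:
    person_id: str
    locations_visited: List[str] = field(default_factory=list)  # e.g. district codes
    test_positive: bool = False


class ChainTracker:
    def __init__(self) -> None:
        self.registry: Dict[str, Individual] = {}

    def identify(self, person: Individual) -> None:
        """Register or update the essential data of an individual."""
        self.registry[person.person_id] = person

    def alert(self, person_id: str) -> List[str]:
        """Once a person tests positive, return the parties to notify
        (employer/institution and hospitals near the visited locations)."""
        person = self.registry[person_id]
        person.test_positive = True
        return [f"employer-of:{person_id}"] + [
            f"hospital-near:{loc}" for loc in person.locations_visited
        ]

    def predict(self) -> List[str]:
        """Naive stand-in for the prediction step: rank locations by how
        many positive individuals recently visited them."""
        counts: Dict[str, int] = {}
        for p in self.registry.values():
            if p.test_positive:
                for loc in p.locations_visited:
                    counts[loc] = counts.get(loc, 0) + 1
        return sorted(counts, key=counts.get, reverse=True)
```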

5 Coronavirus Fighting Using AI – Architecture

The proposed architecture (see Fig. 1) consists of three main layers and two cross-functional layers. The three main layers are the data source, AI application and user domain layers; the two cross-functional layers are the security and network layers. The main purpose of this architecture is to ensure distributed and intelligent management of the pandemic, by means of a distributed and intelligent structure based on artificial intelligence technologies and a meta-structure driven by secure and efficient networking and interoperability mechanisms. The first layer is an acquisition layer designed to collect a large amount of data and information on the real environment as well as on the evolution of the pandemic and its short- and long-term social and economic impacts, through a set of data collection and acquisition devices such as smart sensors, IoT objects and the clinical databases built for the scientific and medical exploration of Covid-19 around the world. The data collected through the hardware and software components of this layer undergo a first pre-processing step aimed at ensuring their fast and secure transport, hence the usefulness of integrating the two cross-cutting layers into the architecture. The second layer of the architecture consists of three modules. The first, the data management module, collects the incoming data and information flows and processes them over several stages: cleaning, merging, analysis and processing.


Fig. 1. Coronavirus fighting using AI – architecture. (The figure depicts the data source layer: sensors, mobile phone operators, social media, crowdsourcing, hospital/clinical data, databases of infected people, norms and standards; the data management layer: data cleaning, fusion, analysis, processing and storing; the service and AI application layers: outbreak detection, outbreak estimation, diagnosis, treatment research, future outbreak prediction and deconfinement strategy; the user layer: users, stakeholders and peer systems; and the cross-cutting security and network layers.)

The resulting flow from this module is routed to an intermediate storage device that communicates it to the other two modules, the service and application modules. This layer provides a number of functions and constitutes the basic rules engine for pandemic prediction, diagnosis and analysis applications. The security of the information and data flows between these modules, and the connectivity between them, are ensured by the cross-functional layers. The last layer of the architecture is a front-end platform for interfacing with the various users of the architecture. Three categories of users constitute this layer. The first category comprises the stakeholders involved in the management of the pandemic, including government institutions, industrial or external parties, and the wider scientific community, which can exploit the feedback resulting from the analysis outputs of the previous layers. The second category comprises the main users of the architecture, including citizens, health care professionals and patients suffering from Covid-19. The third category consists of peer systems that can communicate with the architecture to exploit the results obtained in specific applications. Communication with these users, access management and the security of transactions are ensured by the security and network layers.
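To make the staged processing of the data management module more tangible, the following Python sketch chains hypothetical cleaning, fusion and analysis stages in the spirit of the description above; the record fields, threshold and function names are all assumptions and do not come from the proposed architecture.

```python
# Minimal sketch of the staged data management module (cleaning, fusion,
# analysis); storage and the service/application modules are not modelled.
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]


def clean(records: List[Record]) -> List[Record]:
    # Data cleaning: drop records with missing case counts.
    return [r for r in records if r.get("cases") is not None]


def fuse(records: List[Record]) -> List[Record]:
    # Data fusion: merge records reported for the same region and day.
    merged: Dict[tuple, Record] = {}
    for r in records:
        key = (r["region"], r["day"])
        merged.setdefault(key, {"region": r["region"], "day": r["day"], "cases": 0})
        merged[key]["cases"] += int(r["cases"])
    return list(merged.values())


def analyse(records: List[Record]) -> List[Record]:
    # Data analysis: attach a trivial indicator flagging high-case regions
    # (the threshold of 100 is purely illustrative).
    return [{**r, "hotspot": r["cases"] > 100} for r in records]


def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    # The resulting flow would then be routed to intermediate storage and
    # on to the service and application modules.
    for stage in stages:
        records = stage(records)
    return records


if __name__ == "__main__":
    raw = [
        {"region": "A", "day": "2020-05-01", "cases": 40},
        {"region": "A", "day": "2020-05-01", "cases": 80},
        {"region": "B", "day": "2020-05-01", "cases": None},
    ]
    print(run_pipeline(raw, [clean, fuse, analyse]))
```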


6 Countries' Experience with Covid-19

6.1 New Zealand
New Zealand appears to be a model for combating the spread of the coronavirus, with a strategy of prudence and rapid action. New Zealand's objective was not merely to slow down the coronavirus epidemic, but to eliminate it. Indeed, severe measures were taken to restrict entry into the territory even though the country had only 6 confirmed cases. Covid-19 detection tests were carried out on a large scale in the population (nearly 8,000 tests per day), more than in some European countries, which made it possible to detect infected persons very quickly and isolate them. In addition, strict containment measures were put in place, and on the whole the population remained disciplined and confined. An economic plan to mitigate the future recession was announced. The results of the measures taken by the government are (to date) very encouraging, as the number of active cases in New Zealand is decreasing every day. Despite these results, deconfinement will be gradual in order to continue searching for the last cases.

6.2 South Korea
South Korea adopted two concepts to win the battle against Covid-19: massive testing and tracing. Containment was not applied: masks are worn by everyone and temperature checks are widespread. South Korea was able to draw logistical and economic lessons from SARS in 2003 and MERS in 2015, which enabled it to anticipate, in particular, the provision of equipment. Indeed, the country was able to limit the spread of Covid-19 thanks to measures taken very early: South Korea started producing test kits a week before the first case appeared, in order to be as prepared as possible. Moreover, from the very beginning of the outbreak, South Korea took the initiative of quarantining returnees from abroad, a measure which initially helped slow down the spread. Subsequently, contact tracing applications were introduced, making it possible to trace the people who had been in contact with declared positive cases, in addition to the movements of the latter.

6.3 Morocco
Against Covid-19, the Moroccan government has had to face different challenges. In parallel with the challenge of the rapid spread of the COVID-19 infection, it has taken on the responsibility of raising people's awareness of the critical situation while avoiding panic, and of convincing people of the need for containment despite its economic consequences on families [41].


Among the measures taken in this regard are limiting the circulation of people, social distancing, stopping almost all professional activities, requiring the population to use protective masks and gloves, closing borders, and suspending travel to the cities most affected by the pandemic. To implement these instructions, Morocco has put in place several actions [42]:
• A strong intervention: elements of the security forces were quickly deployed in the cities to ensure compliance with the containment instructions.
• The Economic Supervisory Committee was created with the aim of identifying the measures necessary to maintain economic stability.
• Creation of the Special Fund for Coronavirus Pandemic Management. The purpose of the Fund is mainly to finance the hospital and medical equipment needed for people infected with the virus, as well as the deployment of financial measures for people in need.
• Involvement of the Central Bank to facilitate access to credit for both businesses and households.
• A measure for the benefit of employees who have lost their jobs. All employees who lost their jobs due to the crisis received a monthly allowance and a delay in the repayment of their bank loans. This measure is supported by the Covid-19 Special Fund.
• Support for the informal sector, which was directly impacted, given the sudden cessation of its activities, causing a disruption in the financial inputs of the families that depended on it.
• …

Despite all these instructions and efforts, Morocco could not succeed in its fight against Covid-19. This is due to the underestimation of the seriousness of the disease by Moroccan citizens; in addition, the weak infrastructure of the health and education sectors, and even of scientific research, contributed to Morocco's failure against Covid-19.

6.4 United States
In contrast, we take the example of the United States to draw lessons on what not to do. The United States has been badly affected and wasted precious time in managing the crisis: 84,133 deaths have officially been recorded, for 1,390,746 infected people (statistics as of 14/05/2020). This is more than in China, where the epidemic broke out. Indeed, the United States did not take things seriously and American hospitals were very poorly prepared. Moreover, it took far too long to mobilize the federal government and businesses and to organize aid. In conclusion, the lesson we can learn from the experience of the United States is that to win the battle against Covid-19, it is necessary to be quick and reactive (the United States waited three months and 3,800 deaths before recognizing the seriousness of the Covid-19 epidemic).


6.5 Discussion
In view of the experiences of the different countries against Covid-19, there are several lessons to be drawn and adopted in the Moroccan context for the management of any pandemic:

• A rapid risk assessment, based on scientific criteria.
• Rapid and decisive government action.
• Raising awareness and engaging citizens without panicking them.
• Strengthening and revamping the health and scientific research sectors.
• Working on innovation and the integration of the latest technologies and techniques (IoT, AI…).
• Implementing interventions at various levels.

7 Conclusion and Future Works

On the basis of the feedback presented throughout the paper regarding four axes of the fight against Covid-19, namely artificial intelligence, standardization, territorial intelligence, and the capitalization of countries' response and preparedness plans, we have been able to draw several conclusions on the pandemic and its current situation. These conclusions were concretized within the paper through a number of contributions along these axes: the proposal of a generic AI-based architecture for the fight against the pandemic, and an action plan, based on standardization and territorial intelligence, to reduce the risks of the outbreak and to improve the resilience of health and socio-economic systems. The aim of these proposals is to develop proactive, smart and resilient local and global response systems based on knowledge capitalization and advanced communication and information technologies, thus limiting the impact of the current crisis and the risks of future outbreaks. Our future work will focus on the concrete implementation of these contributions.

References

1. COVID-19 Map. https://coronavirus.jhu.edu/map.html
2. Nicola, M., Alsafi, Z., Sohrabi, C., Kerwan, A., Al-Jabir, A., Iosifidis, C., Agha, M., Agha, R.: The socio-economic implications of the coronavirus and COVID-19 pandemic: a review. Int. J. Surg. 78, 185–193 (2020). https://doi.org/10.1016/j.ijsu.2020.04.018
3. Chahrour, M., Assi, S., Bejjani, M., Nasrallah, A.A., Salhab, H., Fares, M.Y., Khachfe, H.H.: A bibliometric analysis of COVID-19 research activity: a call for increased output. Cureus 12, e7357 (2020). https://doi.org/10.7759/cureus.7357
4. Elavarasan, R.M., Pugazhendhi, R.: Restructured society and environment: a review on potential technological strategies to control the COVID-19 pandemic. Sci. Total Environ. 725, 138858 (2020). https://doi.org/10.1016/j.scitotenv.2020.138858
5. AFNOR Spec – Masques barrières - AFNOR Groupe. https://masques-barrieres.afnor.org/
6. ASTM Standards & COVID-19. https://www.astm.org/COVID-19/index.html
7. ISO 17510:2015(en): Medical devices — Sleep apnoea breathing therapy — masks and application accessories. https://www.iso.org/obp/ui#iso:std:iso:17510:ed-1:v1:en


8. CDC: Coronavirus Disease 2019 (COVID-19). https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/share-facts.html
9. Safety and Health Topics | COVID-19 - Standards | Occupational Safety and Health Administration. https://www.osha.gov/SLTC/covid-19/standards.html
10. CDC: Coronavirus Disease 2019 (COVID-19) - interim guidance for businesses and employers. https://www.cdc.gov/coronavirus/2019-ncov/community/guidance-business-response.html
11. Katz, R., Banaski, J.: Essentials of Public Health Preparedness and Emergency Management. Jones & Bartlett Learning, Burlington (2018)
12. Protective Masks: Download our Reference Document for Free! (2020). https://www.afnor.org/en/news/protective-masks-download-our-reference-document-for-free/
13. WHO | Classification of digital health interventions v1.0. http://www.who.int/reproductivehealth/publications/mhealth/classification-digital-health-interventions/en/
14. Oxford Covid-19 Government Response Tracker | UNESCO Inclusive Policy Lab. https://en.unesco.org/inclusivepolicylab/learning/oxford-covid-19-government-response-tracker
15. UNESCO: Distance learning solutions. https://en.unesco.org/covid19/educationresponse/solutions
16. ISO/TC 307 - Blockchain and distributed ledger technologies. https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/committee/62/66/6266604.html
17. Bay, J., Kek, J., Tan, A., Hau, C.S.: BlueTrace: a privacy-preserving protocol for community-driven contact tracing across borders (2020)
18. Country Policy Responses (COVID-19 and the World of Work). https://www.ilo.org/global/topics/coronavirus/regional-country/country-responses/lang--en/index.htm
19. Sasaki, N., Kuroda, R., Tsuno, K., Kawakami, N.: Workplace responses to COVID-19 associated with mental health and work performance of employees in Japan. J. Occup. Health 62, e12134 (2020). https://doi.org/10.1002/1348-9585.12134
20. Hafsi, N.: Portnet in Morocco: creating a strategic alliance between port and foreign trade communities for a competitive economic operator (English). IFC Smart Lessons Brief. World Bank Group, Washington, D.C. (2017). http://documents.worldbank.org/curated/en/811641488802069586/Portnet-in-Morocco-creating-a-strategic-alliance-between-port-and-foreign-trade-communities-for-a-competitive-economic-operator
21. ISO 22301:2019(en): Security and resilience — business continuity management systems — requirements. https://www.iso.org/obp/ui#iso:std:iso:22301:ed-2:v1:en
22. ISO 31000:2018(en): Risk management — Guidelines. https://www.iso.org/obp/ui#iso:std:iso:31000:ed-2:v1:en
23. Beyond Coronavirus: The path to the next normal | McKinsey. https://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/beyond-coronavirus-the-path-to-the-next-normal
24. Pelissier, M., Pybourdin, I.: L'intelligence territoriale. Les Cahiers du numérique 5, 93–109 (2009)
25. Hu, D., et al.: More Effective Strategies are Required to Strengthen Public Awareness of COVID-19: Evidence from Google Trends. Social Science Research Network, Rochester (2020)
26. Makoni, M.: Africa prepares for coronavirus. Lancet 395, 483 (2020). https://doi.org/10.1016/S0140-6736(20)30355-X
27. Torres, I., Sacoto, F.: Localising an asset-based COVID-19 response in Ecuador. Lancet 395, 1339 (2020). https://doi.org/10.1016/S0140-6736(20)30851-5
28. Williams, A.E.: Transforming Economies and Generating Sustainable "Green" Economic Growth After the COVID-19 Pandemic through General Collective Intelligence. SocArXiv (2020)


29. Peng, M., et al.: Artificial Intelligence Application in COVID-19 Diagnosis and Prediction. Social Science Research Network, Rochester (2020)
30. Gamoura, S.C.: Proposition de stratégie de déconfinement basée sur l'intelligence artificielle et la prédiction: analytiques et prédiction en temps réel de la pandémie Covid-19 (2020). https://doi.org/10.13140/rg.2.2.17094.22083
31. AlloCovid: Numéro national 0 806 800 540. https://www.allocovid.com/. Accessed 10 May 2020
32. Covid-19: L'hôpital Foch utilise l'intelligence artificielle pour détecter les lésions pulmonaires. usine-digitale.fr. https://www.usine-digitale.fr/article/covid-19-l-hopital-foch-utilise-l-intelligence-artificielle-pour-detecter-les-lesions-pulmonaires.N961481. Accessed 10 May 2020
33. L'intelligence artificielle au service de la lutte contre le Covid-19. https://www.defense.gouv.fr/dga/actualite/l-intelligence-artificielle-au-service-de-la-lutte-contre-le-covid-19. Accessed 10 May 2020
34. Nouvelle, L.: [Covid-19] Comment l'Europe compte sur l'IA pour accélérer la découverte de médicaments - Technos et Innovations (April 2020). https://www.usinenouvelle.com/editorial/covid-19-comment-l-europe-compte-sur-l-ia-pour-accelerer-la-decouverte-de-medicaments.N956061. Accessed 10 May 2020
35. Rahioui, A.: Anasse BARI, lauréat d'Al Akhawayn, conçoit un outil d'intelligence artificielle pour appréhender l'évolution du COVID-19. Industrie du Maroc Magazine (15 April 2020). https://industries.ma/anasse-bari-laureat-dal-akhawayn-concoit-un-outil-dintelligence-artificielle-pour-apprehender-levolution-du-covid-19/. Accessed 11 May 2020
36. Vanathi, J., SriPradha, G.: BreakTheChain: a proposed AI powered mobile application framework to handle COVID-19 pandemic, no 2231, p. 8 (2020)
37. Wang, Y., Hu, M., Li, Q., Zhang, X.-P., Zhai, G., Yao, N.: Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner. arXiv:2002.05534 (February 2020). http://arxiv.org/abs/2002.05534. Accessed 13 May 2020
38. Hemdan, E.E.-D., Shouman, M.A., Karar, M.E.: COVIDX-Net: a framework of deep learning classifiers to diagnose COVID-19 in X-ray images, p. 14 (2020)
39. Magar, R., Yadav, P., Farimani, A.B.: Potential neutralizing antibodies discovered for novel corona virus using machine learning. Immunology (March 2020). https://doi.org/10.1101/2020.03.14.992156
40. Wang, Z., et al.: Masked face recognition dataset and application, p. 3 (2020)
41. Ait Addi, R., Benksim, A., Amine, M., Cherkaoui, M.: COVID-19 outbreak and perspective in Morocco. Electron. J. Gen. Med. 17(4), em204 (2020). https://doi.org/10.29333/ejgm/7857
42. Bouhia, P.H.: Le Maroc Face au Covid-19: Agilité, Cohésion et Innovation, p. 14 (2020)

3D City Modelling and Augmented Reality

3D City Modelling Toward Conservation and Management. The Digital Documentation of Museu do Ipiranga – USP, São Paulo, Brazil

M. Balzani1, L. Rossato1, F. Raco1(B), and B. Mugayar Kühl2

1 Department of Architecture, DIAPReM/TekneHub, University of Ferrara, Ferrara, Italy
{bzm,rsslcu,rcafbn}@unife.it
2 FAU-USP Faculdade de Arquitetura e Urbanismo dell'Universidade de São Paulo, São Paulo, Brazil
[email protected]

Abstract. The present paper illustrates the survey and documentation activities for 3D city modelling and visualisation carried out since 2016 on complex monumental buildings of the city of São Paulo, Brazil, by the DIAPReM research centre and the TekneHub Laboratory of the University of Ferrara, in collaboration with the FAU-USP Faculdade de Arquitetura e Urbanismo of the Universidade de São Paulo and funded by the Fundação de Apoio à Universidade de São Paulo (FUSP), for the definition of interdisciplinary collaboration protocols and the development of integrated digital databases of Brazilian cultural heritage. Building on a wider joint international research collaboration dating back more than five years, the project aims to define interdisciplinary protocols for the digital documentation of built heritage in order to support the knowledge, restoration, maintenance, management and enhancement of the Museu do Ipiranga – USP, involving academic and research competencies as well as professional and technical skills. The definition of the first integrated digital database of the Museu do Ipiranga took into account the documentation needs of a complex architecture undergoing restoration, the project for new accessibility and the extension of the Museum itself, and a wider digitisation project for urban planning as well as new Smart Cultural Heritage accessibility.

Keywords: 3D city modeling · 3D integrated survey · Point cloud processing and analysis · Smart cultural heritage · Museu do Ipiranga – USP · Brazil

1 Introduction

The recent world pandemic emergency has further highlighted the fragility of the world's tangible and intangible cultural heritage, increasing the need to implement knowledge sharing as well as awareness and understanding of the importance of the protection and enhancement of built heritage (ICCROM 2013).


The intervention process is marked by discontinuity and by a lack of information relating to the current state of built heritage, by the duplication of data, and by the poor accessibility (Parrinello 2019) and usability (Ramos 2015) of information, characteristics which, despite improved digitization processes, offer an interesting field for the application of ICT and Key Enabling Technologies (KETs) in order to support knowledge accessibility and semantic modelling from a Smart City management viewpoint. On the one hand, the uniqueness and variety of cultural heritage hinders the definition of a single method for knowledge and documentation; at the same time, it is becoming increasingly urgent to have real-time sharing of an integrated information system that allows the specific features of an individual context to be overcome and the complex knowledge, management and promotion of the world's cultural heritage to be made effective (Jokilehto 1998). Consequently, the paper aims to illustrate the impact of Smart City and Smart Object approaches on decision-making processes and urban planning (Angelidoua 2017) with reference to a prime example of Brazilian cultural heritage, the result of an ongoing international cooperation between the University of Ferrara and the University of São Paulo (USP) which began over five years ago. From this perspective, the issue of "Smart City" and Smart Cultural Heritage refers to the social impact and inclusiveness of the intervention on built heritage as well as to a means of enhancing the application of integrated ICT, IoT and KETs technologies (EU 2018). The optimization of digitisation processes for cultural heritage preservation, management (ICOMOS 2017) and enhancement aims to define, from the data acquisition phase onward, the accuracy and level of detail of the data needed to develop the digital model, or digital twin, in accordance with the needs of the end-users involved. From the perspective of ICT and integrated Key Enabling Technologies, it therefore helps to develop the accessibility of information, system automation and cost-effective solutions. On the other hand, the specific cognitive skills required (Brusaporci 2018), in terms of perception and understanding of the digital model, from the real to the virtual, still hinder the dissemination and wider adoption of Smart Objects and contents (Gaiani 2017). The development of smart contents in relation to cultural heritage may connect the diachronic and synchronic levels of meaning of the cultural heritage itself. The aim of the research project is to make the complexity of information and meanings related to built heritage understandable to a variety of experienced and inexperienced users. Consequently, the technology transfer activities within the project deal with the transfer of the skills and competences required to understand, use and implement digital models, and to make project innovation effective for a more interactive and responsive involvement of citizens and an interoperable Smart City ecosystem (EU 2018).

1.1 Museu do Ipiranga - USP
The building was begun in 1885 as a memorial to the independence of Brazil and, as of its inauguration in 1895, it was designed to function as a museum, initially housing the natural history collection. Conceived as a monument symbolizing Independence and the history of Brazil and São Paulo, the Museu Paulista, commonly known as the "Museu do Ipiranga", stands near the banks of the Ipiranga River on the site where, it is said, Emperor Pedro I proclaimed Brazil's independence in 1822.


The museum, which currently houses a collection of over 125,000 exhibits of Brazilian history dating from the 16th to the mid-20th century, was designed by Eng. Tommaso Gaudenzio Bezzi and represents, as stated in the preliminary report and recommendations for the Museu Paulista, a conservation project dated 2013, a unique testimony to the irreplaceable values of Brazil's history, architecture and engineering (Emerich 2016). The Museum is certainly the repository of the national memory linked to São Paulo, built through the epic expeditions of the various local explorers and the pioneering vocation of São Paulo since the colonial era, as testified by the objects, documents, iconography and specimens in the collection (Fonseca Brefe 2005) (Fig. 1).

Fig. 1. Museu do Ipiranga – USP main façade. View of the central projecting pronaos from the monumental gardens overlooking the main building

Even though the construction of the building was interrupted several times and underwent variations, the Museu do Ipiranga – USP was intended to bear witness to the memory of Brazilian independence. Moreover, although increasing urbanisation is still taking place in the area of the Museu do Ipiranga – USP, the Parque da Independência, the Monumento à Independência do Brasil and the Casa do Grito, and the history of the place is still largely unknown, every year crowds of tourists and locals attend the Independence Anniversary celebrations.


Consequently, the international research group started developing the 3D city modelling described below in order to support conscious conservation and management intervention processes as well as future IoT and KETs technology applications across industries and society. Currently, the building is protected under Brazilian law through the Instituto do Patrimônio Histórico e Artístico Nacional (IPHAN) and has been closed to the public since 2013 for total restoration, one of the aims of which is to preserve the important functions performed by this significant example of architecture. Based on a series of studies on the behaviour of the building, on the state of conservation of the roofs and facades, and on a structural analysis, a competition was held in 2017 to explore ideas for the restoration of the museum and the extension of the building. The renovation project comprises 3,345 m2 under the Esplanada in order to accommodate the new access to the building, in compliance with safety and accessibility standards. The main aim of the new project is to emphasise inclusive public use and engagement (Fig. 2).

Fig. 2. View of the perspective axis in the direction of the Monumento à Independência from the roof of the Museu do Ipiranga

Consequently, all the administrative areas have been moved, in order to allow the full enjoyment of the building’s architecture, in a way that is fully integrated with the urban complex. In fact, in addition to the dissemination and interpretation of its permanent collections, focusing on material culture, the Museu Paulista institute conducts advanced


research for the presentation and interpretation of new collections of Brazilian history, culture and society. At the same time, it carries out intense advanced training in museology conservation techniques and promotes many of the museology initiatives in the State of São Paulo.

2 Related Works

First of all, the project involving the survey and implementation of the first integrated digital database of the Museu do Ipiranga took into consideration the needs of the conservation and restoration project as well as the requirement to document the state of deterioration and maintenance of the building elements. Subsequently, the completed 3D database allowed the H+F Arquitetos studio, winner of the public competition launched by the University of São Paulo for the restoration and modernization of the building, to verify the design hypotheses relating to the new access to the building. The project by the architects Pablo Herenu and Eduardo Ferroni (selected from the 13 proposals received by invitation) was also supported by the collaboration with the DIAPReM centre, which took the form of a period of face-to-face training of some of the firm's architects at the University of Ferrara. The technology transfer training activity focused on the real-time management of the database in order to allow the representation of the project in the 3D point cloud model, for the purpose of verifying the feasibility of the interventions from a morphological and structural point of view. The mentioned training activity is part of a wider field of research and experimentation coordinated by the DIAPReM centre and the TekneHub Laboratory, developed in a public-private partnership with over ten large companies and SMEs of the Emilia-Romagna High Technology Network (Balzani 2020), and aimed at assessing the impact of enabling technologies on the value chain of interventions on built heritage. In fact, it is interesting to note that even in complex projects in international contexts that are advanced in terms of the spread of survey and digital modelling technologies, operators still work mainly through traditional procedures and the use of 2D and analogue representation tools. The requirement of the public client, first, and of the designers for 2D representation drawings with a high level of detail, together with the skills of the team in terms of management of 3D point cloud databases, led to the definition of data processing and extraction protocols. As a result, the expected level of accuracy and the detailed digital survey performed were oriented towards the definition of the database structure and of 2D representation processes, in order to support the analysis of the geometric characteristics and the aims of the intervention project, such as the understanding of:

– the morphologies of vertical elevations and construction elements, for the understanding of the static and structural safety of the building;
– the development, consistency and material characterisation of the monumental façades, to allow further investigation of the conservation status of the surfaces and the integration of data from diagnostic investigations (Fig. 3).


Fig. 3. Integrated digital database and three-dimensional point cloud model hierarchy and structure

Nevertheless, the survey and documentation project took into account the complex relationship between the Museu do Ipiranga, the park and the Monument to the Independence, which together form the most important and complex urban system bearing witness to this phase of Brazilian history. Moreover, the Monument to the Independence was the subject of an integrated 3D survey by the DIAPReM/TekneHub research centre of the University of Ferrara in 2018, and the survey of the park system and of the emerging architecture housed therein is close to completion. The survey campaign planned for spring 2020 was interrupted and postponed due to the current world pandemic.

3 Methodology

The digital documentation of tangible and intangible cultural heritage, aimed at its accessibility (UCLG 2018) as well as its dissemination and more effective management (Kioussi 2012) in terms of networks and services (UN 2020), in support of the strengthening of the identity of territories, raises challenges and opportunities that concern both the horizontal and the vertical direction of the technological development of the value chain.


The survey of the Museu do Ipiranga in São Paulo was developed, starting in 2017, as part of the collaboration between the Faculdade de Arquitetura e Urbanismo of the Universidade de São Paulo; the CPC (Centro de Preservação Cultural) of the Universidade de São Paulo (USP); the "Fundação de Apoio à Universidade de São Paulo", São Paulo, Brazil; the Consorzio Futuro in Ricerca (CFR); and the DIAPReM Research Centre (Development of automatic integrated procedures for the restoration of monuments) of the University of Ferrara, which has over twenty years' experience in the definition of optimised protocols of integrated 3D survey for built heritage (Balzani et al. 2019), from the urban scale to the architectural scale. Over time, the evolution of laser scanner technologies has allowed operators to use tools that optimise the 3D point cloud acquisition phases in relation to the amount of information recorded, reducing the time required (Young 2019). Nevertheless, the subsequent data processing phases remain dependent on the action and experience of the operator, or can at most be considered semi-automatic; this concerns, for example (see the illustrative sketch further below):

– noise reduction, which depends both on the instrumental characteristics and on the way the survey project is executed;
– the segmentation (Grilli 2017) of the point cloud in order to clean up disturbing elements and manage them both in assisted drawing and in 3D modelling programs, BIM for example.

The experience gained over years of applied research on numerous case studies has highlighted the close correlation between the survey project and its execution method, on the one hand, and the optimisation of the post-processing phases of noise reduction and segmentation at equal accuracy, on the other. In fact, the integration of surveying technologies, such as 3D laser scanning with time-of-flight technology (Leica P40 laser scanner) and topographic survey (Leica Total Station TCR) for the creation of a first- and second-level polygonal system, aims to:

– reduce the number and overlapping of scans typical of a morphological approach to 3D surveying which, in current practice, tends to eliminate support target networks;
– guarantee in the post-processing phase the usability of the model through a reduction of the quantity of data, but not of the accuracy necessary to ensure that the operators of the built heritage intervention value chain have information models consistent with conservation needs on the one hand, or with enhancement for a wider public on the other.

3.1 Digital Documentation

The integrated digital survey campaign involved the entire complex of more than fifty-five rooms on the five main levels (basement, ground floor, first floor, attic and roof) of the Museu do Ipiranga, for a total area of over 12,500 m2, divided into about 2,500 m2 per floor. The vertical development of the building, varying from 15 m to 25 m, was also taken into account, corresponding to a development of the external fronts of about 123 m (Fig. 4).
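Purely as an illustration of the two semi-automatic post-processing steps mentioned above (noise reduction and segmentation), the following sketch uses the open-source Open3D library (assumed to be version 0.10 or later) on a synthetic stand-in for a registered scan; it is not the workflow adopted in the project, which relied on Leica Cyclone and compatible software.

```python
# Illustrative sketch: statistical noise reduction and RANSAC-based plane
# segmentation of a point cloud, assuming Open3D >= 0.10. The synthetic
# cloud stands in for a registered TLS scan.
import numpy as np
import open3d as o3d

rng = np.random.default_rng(0)

# Synthetic stand-in for a scan: a noisy planar facade patch plus a
# sprinkling of gross outliers ("noise" in the sense used above).
facade = np.column_stack([
    rng.uniform(0, 10, 5000),      # x along the facade (m)
    rng.normal(0, 0.01, 5000),     # y: small measurement noise (m)
    rng.uniform(0, 15, 5000),      # z: facade height (m)
])
outliers = rng.uniform(-2, 12, (200, 3))
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(np.vstack([facade, outliers]))

# Step 1: noise reduction. Points whose mean neighbour distance deviates
# strongly from the global average are discarded.
clean, kept_idx = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Step 2: segmentation. RANSAC extracts the dominant plane (the facade),
# separating it from the remaining disturbing elements.
plane_model, inlier_idx = clean.segment_plane(
    distance_threshold=0.02, ransac_n=3, num_iterations=1000)
facade_cloud = clean.select_by_index(inlier_idx)
rest_cloud = clean.select_by_index(inlier_idx, invert=True)

print(f"kept {len(clean.points)} of {len(pcd.points)} points after denoising")
print(f"plane model a, b, c, d = {np.round(plane_model, 3)}")
```

In practice the thresholds would be tied to the accuracy of the instrument and of the target network discussed above, rather than chosen ad hoc as here.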


Fig. 4. Digital reconstruction of the Digital Elevation Models, DEM, of the mezzanine plane for the study of the relationship between the spatial and functional composition of the building and the consistency of the vertical elevation structures. The representation highlights the central system of the building jutting out at the pronaos on the main front and the monumental staircase in the atrium. Floor plan projection of the ceiling 3D survey.

Concerning the data processing phase (Weinmann 2016), the management of the large amount of data deriving from the morphological characteristics of the building determined the choice of the hierarchy of the model breakdown structure (Georgopoulos 2017). For the purposes of the survey project, and given the integrated time-of-flight technologies adopted, the survey campaign took place over 21 working days for a total of 208 h of instrument activity, corresponding to 160 h for each operator involved in the survey activities; 1800 stations were set up and 375 targets were employed, for a total of 240,260,487,654 acquired coordinates. The overall database is accessible and searchable both through the Leica Cyclone 7.0 software and through open-source software compatible with the .imp format (CloudCompare). The subsequent registration phase took into account, on the one hand, the accessibility of the data in the whole point cloud model in order to allow its analytical interrogation and, on the other hand, the identification and management of the most representative section planes, in order to understand the morphological relationships, for the extraction of the Digital Elevation Model (DEM) and subsequent 2D processing through assisted drawing software (CAD). The reference systems related to the identified cut planes and the reference planes were documented in a report, together with the rototranslation matrix and the mean square deviations of the registration, in order to ensure, at any time, the accessibility and verification of the choices and actions implemented, thereby allowing any subsequent integration and implementation of the model, including through new survey campaigns (Fig. 5).
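As a rough illustration of what documenting and re-applying a registration means in practice, the sketch below (NumPy only, with entirely hypothetical values) applies an archived 4x4 rototranslation matrix to scan coordinates and recomputes the mean square deviation on the targets; the project itself handled registration and its verification within Leica Cyclone.

```python
# Illustrative sketch: re-applying a documented rototranslation and
# checking registration residuals on targets. All values are hypothetical.
import numpy as np

# Documented rototranslation: rotation R and translation t stored as a
# single 4x4 homogeneous matrix so it can be archived and re-applied.
theta = np.radians(12.0)                      # hypothetical rotation about Z
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([152.30, -48.75, 3.12])          # hypothetical translation (m)
T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t


def apply_rototranslation(points: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Transform an (N, 3) array of scan coordinates into the common
    reference system defined by the homogeneous matrix T."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homogeneous @ T.T)[:, :3]


# Hypothetical target coordinates: measured in the scan frame and in the
# topographic (reference) frame established with the total station.
targets_scan = np.array([[1.0, 2.0, 0.5], [4.2, -1.1, 2.3], [0.3, 5.6, 1.8]])
targets_ref = apply_rototranslation(targets_scan, T) \
    + np.random.default_rng(1).normal(0, 0.002, (3, 3))

# Mean square deviation of the registration on the targets, as archived
# in the report alongside the matrix itself.
residuals = apply_rototranslation(targets_scan, T) - targets_ref
rms = np.sqrt(np.mean(np.sum(residuals**2, axis=1)))
print(f"registration RMS on targets: {rms * 1000:.1f} mm")
```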


Fig. 5. Cross-section and façade with specification of the cut planes for the identification of the planimetric DEM and analysis of the elevation profile in the relationship between the museum and the gardens

3.2 Data Processing and Analysis

Subsequently, the acquisition, data processing and analysis of the point cloud model were guided by the aim of documenting the geometrical and morphological characteristics of the building's current state while at the same time enabling multiscale and interdisciplinary investigations for a restoration project on the one hand and functional adaptation on the other (Kioussi 2011). Particular attention was paid to the slopes and to the elevation profiles of the sections describing the relationship between the ground level, the podium and the terrain near the areas in front of the main building, now used as a park and once designed to house the extension of the Museum. As previously mentioned, with a view to completing the urban survey (2020) of the entire area housing the Museu do Ipiranga (2016), the Parque da Independência (postponed due to the world pandemic) and the Monumento à Independência do Brasil (2019), the choice of the cross-section planes also took into account the geometric and morphological characteristics of the site in relation to the original project, its variants and the construction choices adopted. The orographic and planialtimetric characteristics of this area of the city of São Paulo, sparsely urbanised at the time, were well suited to the idea of a man-made landscape, the park, within which the main architectural features of the Museu do Ipiranga, the Parque da Independência and the Monumento à Independência do Brasil would enter into a direct visual relationship, together with secondary architecture such as the Casa do Grito (House of the Scream), through the construction of precise perspective cones (Figs. 6, 7 and Table 1).
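As a loose illustration of the extraction of a planimetric section at a documented cut plane and of the gridding of a simple DEM from a registered cloud, the sketch below operates on a random synthetic cloud with NumPy; every coordinate, tolerance and cell size is hypothetical, and the project itself performed these steps with Leica Cyclone and CAD tools.

```python
# Illustrative sketch: slicing a registered cloud at a horizontal cut plane
# and gridding a crude DEM. Thresholds, sizes and the cloud are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical registered cloud: x, y in metres, z = elevation.
cloud = np.column_stack([
    rng.uniform(0, 120, 50_000),
    rng.uniform(0, 60, 50_000),
    rng.uniform(0, 25, 50_000),
])


def horizontal_slice(points: np.ndarray, z_cut: float, tol: float = 0.02) -> np.ndarray:
    """Return the points lying within +/- tol of a horizontal cut plane,
    e.g. for tracing a floor plan section in CAD."""
    mask = np.abs(points[:, 2] - z_cut) <= tol
    return points[mask]


def simple_dem(points: np.ndarray, cell: float = 0.5) -> np.ndarray:
    """Grid the cloud into a DEM by keeping the minimum elevation per cell
    (a crude stand-in for proper ground filtering)."""
    ix = (points[:, 0] // cell).astype(int)
    iy = (points[:, 1] // cell).astype(int)
    dem = np.full((ix.max() + 1, iy.max() + 1), np.nan)
    for x, y, z in zip(ix, iy, points[:, 2]):
        if np.isnan(dem[x, y]) or z < dem[x, y]:
            dem[x, y] = z
    return dem


section = horizontal_slice(cloud, z_cut=4.5)
dem = simple_dem(cloud)
print(section.shape, dem.shape)
```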


Fig. 6. Digital Elevation Model, DEM, of the roofing system for the analysis of the wooden truss elements of the primary structural system.


Fig. 7. Digital Elevation Model, DEM, Monumento à Independência do Brasil; longitudinal section; axonometric cutaway; planimetry of the imperial crypt.


Table 1. Survey data and technical equipment: 3D integrated survey of Museu do Ipiranga – USP

General info
• Mission start: 10th August 2017
• Mission end: 31st August 2017
• Days of work: 21 working days
• Hours of work: 208 h
• Technicians: 2

Survey data
• Equipment: 2 Leica P40 + Leica TCR 1203
• Time of data capturing: 160 h/person
• Number of scan stations: 1800
• Number of targets: 375
• Number of points (coordinates): 240,260,487,654

The level of detail of the 3D survey allows experts to develop both conventional and non-conventional representations for the variety of end users involved.

3.3 Medium and Long-Term Fallout

The digitisation process, through the digital survey and documentation of the complex urban area of the Museu do Ipiranga – USP, the Parque da Independência and the Monumento à Independência, is the first challenge of the international cooperation activity toward the digitisation of Brazilian cultural heritage, in order to support more effective decision-making processes and awareness in the use of resources, with reference to the identity of the place. Moreover, the level of accuracy pursued in the data acquisition and processing phases took into account the subsequent need to develop digital contents, model segmentations and interoperable, detailed representations that allow even an inexperienced audience to access cultural heritage, engage new users, and develop creative and accessible content for education and the enhancement of cultural heritage. As a result, the digital model is suitable for specialist uses such as the integration of data related to: sensors and remote sensing, in order to monitor the state of conservation and maintenance of buildings, public spaces and so on; integrated Key Enabling Technologies, thermal imaging, GIS and satellite maps, for the management of security in public spaces; and VR and AR content for the enhancement of tangible and intangible cultural heritage, which supports intercultural dialogue, encourages mutual respect and provides innovative solutions for accessing cultural contents (Fig. 8). The research is certainly part of a broader challenge for the development of cross-sector digital platforms (EIP SCC 2020) for built heritage, in order to support data-based decision-making processes as well as the production of knowledge through participatory action research and enhanced knowledge accessibility (Gaiani 2017) (Fig. 9).


Fig. 8. (Left) Museu do Ipiranga main façade: central perspective view from Monumento à Independência do Brasil and the relationship with the Parque da Independência. (Right) Monumento à Independência do Brasil: central perspective view from Museu do Ipiranga and the relationship with the Parque da Independência.

Fig. 9. Museu do Ipiranga, Parque da Independência, Monumento à Independência do Brasil: 3D survey campaigns

4 Conclusion

The research activities carried out since 2017 have made it possible to consolidate integrated survey protocols and the digital documentation of complex architectures subject to interventions, restoration and, in part, adaptation, which require multidisciplinary contributions (Ioannides 2018). At the same time, the experience gained with the Museu do Ipiranga, in the relationship between the urban system of the park and the Monument to the Independence of Brazil, is contributing to the optimisation of the protocols for the acquisition and management of integrated digital information systems, from the scale of architectural survey to urban survey, toward the development of a digital platform for the digitisation of cultural heritage as well as an open information system. The activity carried out concerned the strategy of integration of digital terrestrial survey methods aimed at the development of databases and 3D models that are reliable and implementable over time (cit.). In particular, the use of topographic survey together with the Terrestrial Laser Scanner (TLS) survey instrumentation pursues multiple purposes: firstly, the verification and control of geometric and morphological characteristics, from


the architectural scale to the detail scale, with a view to the accessibility and usability of the database created and subsequent implementations, also conducted with different tools and survey techniques. Subsequently, the normalisation of the information models generated is pursued, also with a view to subsequent semantic modelling through the use of Building Information Modelling tools, taking into account both the cognitive needs and methods of data representation and visualisation during the project phase and subsequent needs for knowledge, management, use and enhancement by multidisciplinary teams (Parrinello 2019) (Figs. 10 and 11).

Fig. 10. 3D point cloud processing and analysis. Variety of uses of Digital Elevation Models, DEM: from state-of-the-art documentation and analysis for the restoration project to 3D point cloud comparison with the photographic survey of the Monumento à Independência do Brasil, in order to develop non-conventional representations for both experienced and inexperienced end users.

In addition to the huge amount of data from the integrated 3D survey, one of the main results of the project was the consolidation of technology transfer methods allowing teachers and researchers of FAU USP in São Paulo to manage and implement digital models and complex databases of existing built heritage (Balzani et al. 2017). The research staff of the DIAPReM centre and of the TekneHub laboratory have in fact been engaged, for over five years, in training activities involving professors, researchers, PhD students and students both at the University of São Paulo and at the Department of Architecture of the University of Ferrara. The result of this continuous exchange is the sharing of an integrated multidisciplinary and multiscale approach that allows the results of the research activity conducted on Brazilian cultural heritage to be implemented, explored and enhanced in a continuous way, not exclusively tied to international mission activities (Balzani 2020).


Fig. 11. 3D point cloud processing and analysis. Variety of uses of Digital Elevation Models, DEM: geometric and morphological analysis


Overall, a team of over fifteen scholars, with both academic and professional backgrounds, was involved in the project of historical, integrated digital documentation for the survey of the current state and the structural analysis, allowing the team in charge of the restoration project to verify, in real time, the hypotheses and spatial design configurations in relation to the actual geometry and morphology of the built architecture.

5 Credits

Museu do Ipiranga
International cooperation: DIAPReM center, University of Ferrara, UNIFE, TekneHub Technopole of Ferrara; Consorzio Futuro in Ricerca, Ferrara, Italy; Universidade de São Paulo, Faculdade de Arquitetura e Urbanismo della Universidade de São Paulo; CPC (Centro de Preservação Cultural) dell'Universidade de São Paulo (USP); Museu Paulista dell'Universidade di São Paulo; "Fundação de Apoio à Universidade de São Paulo", São Paulo, Brasil.
UNIFE: DIAPReM/TekneHub
Scientific coordinator: Marcello Balzani
Project coordinator: Luca Rossato
3D survey coordinator: Guido Galvani
3D survey technician: Daniele Felice Sasso
Diagnostic survey: Federica Maietti
USP: FAUUSP-CPC USP
Scientific coordinator: Beatriz Mugayar Kühl (FAU USP)
Responsible for the Cooperation Agreement: Mônica Junqueira de Camargo (CPC USP; FAU USP)
Research group: Renata Cima Campiotto (PhD student; FAUUSP)

Monumento à Independência do Brasil
International cooperation: DIAPReM center, University of Ferrara, UNIFE, TekneHub Technopole of Ferrara; Consorzio Futuro in Ricerca, Ferrara, Italy; Mackenzie University; Department of Historical Heritage (DPH) – SP
Scientific coordinator: Marcello Balzani
Project coordinator: Luca Rossato
3D survey coordinator: Guido Galvani
Survey support team: Maria Fernanda Torgal Fonseca, Henrique Shoiti (Mackenzie University)
Survey support team coordinator: Guilherme Antonio Michelin (Mackenzie University)
Elaboration of 3D data: Guido Galvani
Leader for support logistics: Valter Caldana (Mackenzie University)
Logistics support coordinator: Guilherme Antonio Michelin (Mackenzie University)


References

Angelidoua, M., Karachalioua, E., Angelidoua, T., Stylianidisa, E.: Cultural heritage in smart city environments. In: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLII-2/W5, 26th International CIPA Symposium 2017, 28 August–01 September 2017, Ottawa, Canada, pp. 27–32 (2017)
Balzani, M., Maietti, F., Rossato, L.: 3D data processing toward maintenance and conservation. The integrated digital documentation of Casa de Vidro. In: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLII-2/W9, Proceedings, 8th International Workshop 3D-ARCH 3D Virtual Reconstruction and Visualization of Complex Architectures, Copernicus, Göttingen, Germany, pp. 65–72 (2019)
Balzani, M., Maietti, F., Mugayar Kühl, B.: Point cloud analysis for conservation and enhancement of modernist architecture. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. XLII-2/W3, 71–77 (2017)
Balzani, M., Raco, F.: Integrated digital models for the representation and diagnosis in existing buildings: the clust-ER BUILD project for the value chain innovation. In: Bolognesi, C.M., Santagati, C. (eds.) Impact of Industry 4.0 on Architecture and Cultural Heritage, Business Science Reference, Advances in Civil and Industrial Engineering, pp. 181–201 (2020)
Brusaporci, S., Maiezza, P., Tata, A.: For a cultural-based smart city. In: Rappresentazione Materiale/Immateriale - Drawing as (in) Tangible, 40° Convegno internazionale dei Docenti delle discipline della Rappresentazione, 2018, Gangemi, pp. 73–80 (2018)
Emerich, D.: Museu Paulista: 120 Anos de História. Editora Brasileira, São Paulo, Brazil, p. 200 (2016). ISBN 9788563186362
European Commission: Innovation in cultural heritage research (2018). https://op.europa.eu/it/publication-detail/-/publication/1dd62bd1-2216-11e8-ac73-01aa75ed71a1
Fonseca Brefe, A.C.: O Museu Paulista. Editora UNESP, São Paulo, Brazil, p. 336 (2005). ISBN 8571395888
Gaiani, M.: A framework for smart cultural heritage. In: New Activities for Cultural Heritage. Springer (2017). https://doi.org/10.1007/978-3-319-67026-3_24
Georgopoulos, A.: Data acquisition for the geometric documentation of cultural heritage. In: Mixed Reality and Gamification for Cultural Heritage, pp. 29–74. Springer, Cham (2017)
Grilli, E., Menna, F., Remondino, F.: A review of point clouds segmentation and classification algorithms. In: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLII-2/W3, 3D Virtual Reconstruction and Visualization of Complex Architectures, 1–3 March 2017, Nafplio, Greece, pp. 339–344 (2017)
ICCROM: Management guidelines for world cultural heritage sites (2013). www.iccrom.org/it/publication/managing-cultural-world-heritage
ICOMOS: Smart heritage policy (2017). http://openarchive.icomos.org/1834/
Ioannides, M., Fink, E., Brumana, R., Patias, P., Doulamis, A., Martins, J., Wallace, M.: Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation and Protection. 7th International Conference, EuroMed 2018, Nicosia, Cyprus, October 29–November 3, 2018, Proceedings, Part I. Springer, Berlin (2018)
Jokilehto, J.: International trends in historic preservation: from ancient monuments to living cultures. APT Bull. 29(3–4), 17–19 (1998)
Kioussi, A., Labropoulos, K., Karoglou, M., Moropoulou, A., Zarnic, R.: Recommendations and strategies for the establishment of a guideline for monument documentation harmonized with existing European standards and codes. J. Geoinform. FCE CTU 6(2011), 178–184 (2011)
Kioussi, A., Karoglou, M., Bakolas, A., Moropoulou, A.: Integrated documentation protocols enabling decision making in cultural heritage protection. In: Ioannides, M., Fritsch, D., Leissner, J., Davies, R., Remondino, F., Caffo, R. (eds.) Progress in Cultural Heritage Preservation. EuroMed 2012. Lecture Notes in Computer Science, vol. 7616, pp. 211–220. Springer, Heidelberg (2012)
Parrinello, S., Dell'Amico, A.: Experience of documentation for the accessibility of widespread cultural heritage. Heritage 2, 1032–1044 (2019). https://doi.org/10.3390/heritage2010067
Parrinello, S., Picchio, F., De Marco, R.A., Dell'Amico, A.: Documenting the cultural heritage routes. The creation of informative models of historical Russian churches on Upper Kama region. In: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLII-2/W15, 27th CIPA International Symposium "Documenting the past for a better future", 1–5 September 2019, Ávila, Spain, pp. 887–894 (2019)
Ramos, M.M., Remondino, F.: Data fusion in cultural heritage: a review. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 5(W7), 359–363 (2015)
Weinmann, M.: Preliminaries of 3D point cloud processing. In: Reconstruction and Analysis of 3D Scenes, pp. 17–38. Springer, Cham (2016)
Young, H.J., Seonghyuk, H.: Three-dimensional digital documentation of cultural heritage site based on the convergence of terrestrial laser scanning and unmanned aerial vehicle photogrammetry. ISPRS Int. J. Geo-Inf. 8, 53 (2019). https://doi.org/10.3390/ijgi8020053
UCLG: Cultural heritage and sustainable cities: key themes and examples in European cities (2018). http://www.agenda21culture.net/sites/default/files/report_7_-_cultural_heritage_sustainable_development_-_eng.pdf
UNESCO: Managing cultural world heritage (2013). https://www.iccrom.org/sites/default/files/2018-07/managing_cultural_world_heritage_en.pdf
United Nations: The SDG Partnership Guidebook. A practical guide to building high impact multi-stakeholder partnerships for the Sustainable Development Goals (2020). https://sustainabledevelopment.un.org/PartnershipAccelerator

3D Documentation of Göreme Saklı Church Sümeyye Ertürk(B)

and Leyla Kaderli

Faculty of Architecture, Department of Architecture, Erciyes University, Kayseri, Turkey [email protected], [email protected]

Abstract. Rock-carved churches occupy an important place in the architectural texture of the Cappadocia region: as cultural assets they reflect the social, economic and religious structure of the period in which they were built, and they should be protected together with their historical and aesthetic values. There are many churches and monasteries in the Cappadocia Region, which emerged as an important religious center in the 4th century. The religious sites in Göreme Valley, a settlement center for communities practicing monastic life, are dated between the 10th and 11th centuries. Saklı Church, built with the rock-carving technique and the subject of this study, is located in the lower valley, 500 m west of the Göreme Open Air Museum. This study aims to examine the current status of Saklı Church, its historical development, its architectural documentation together with the monasteries located in its vicinity, and its structural deterioration and causes, and to develop conservation suggestions in order to transfer this cultural heritage safely into the future. The churches, chapels, refectories and other spaces around Saklı Church were recorded with aerial photographs taken with the aid of a drone in order to compare the monastery architecture. The site plan of the area was created in three dimensions in a computer environment using the drone photographs. In addition, detailed drawings of the church were produced from point cloud data acquired with a 3D laser scanner. Keywords: Cappadocia · Göreme · Saklı church · 3D documentation

1 Cappadocia

1.1 Monasteries in Cappadocia

The geographical borders of the Cappadocia Region, one of the largest states of the Asian continent, have changed many times over the centuries. It extends to Lake Tuz in the west, the Kızılırmak and the Northern Anatolia Mountains in the north, the Euphrates River in the east and the Taurus Mountains in the south. Today, it includes the east of Ankara, the north of Adana, the south of Yozgat and Sivas, and the cities of Kırşehir, Nevşehir, Aksaray, Niğde, Kayseri and Malatya [1]. Christians, who in the first years of Christianity worshiped by hiding in the deserts of Egypt, worshiped secretly in caves in the Cappadocia Region. The people of Cappadocia built houses, temples, shelters for animals, storage areas and tomb structures with the rock-carving technique in the existing topography.


With the spread of Christianity in the region, they built churches, monasteries and lavras serving religious functions and places of seclusion [2]. In the 300s A.D., the Cappadocia Region became one of the important pilgrimage centers of Christianity under the influence of monasticism. The clergy withdrew from society and avoided excess in activities such as eating, drinking and dressing. In order to be closer to God, remote and high rocky areas were chosen. In the topographic structure of the region, the presence of easily workable yet durable tuffs made it an easy place to live in; since the valley slopes are also suitable for hiding, it has been a region inhabited by various cultures, states and societies. The Göreme Valley in Cappadocia is one of the areas where early Christian individuals and communities in Anatolia settled. The reason for this is the church fathers¹ such as St. Basil, Gregorius of Caesarea and Gregorios of Nazianzus, who were active in the region [3].

Göreme Valley. It is a settlement center that includes many religious sites carved into the rock, located 2 km east of Göreme Town and 13 km from Nevşehir. Göreme Valley and its surroundings have the same topographic structure as the Cappadocia Region in general. The main material of all structures of the monastery settlements is soft ignimbrite-type tuff, which is of volcanic origin and can be worked easily. The fairy chimneys, which shape the surface forms, greatly influenced the monastic architecture developing in the valley. The land structure of Göreme Valley² was formed as a result of volcanic activity and erosion. In addition to the deep and narrow valleys, flood deposits and the "fairy chimneys", a special form of erosion [4], have greatly affected its architecture. The settlement in the Cappadocia Region was formed by carving into the valley slopes or into the monolithic rock mass in harmony with the topographic structure [5]. There are many rock-carved monasteries, churches and chapels of different sizes in the Göreme Valley. The monasteries, which are carved into the rock, contain various sections such as refectories, rooms for monks, cellars, kitchens and chapels. No written sources such as endowments, inventories or letters about the rock-carved churches and monasteries are known. Therefore, they are dated with the help of the architectural elements of the churches in the monasteries and the stylistic and iconographic features of the fresco decorations [4]. Researchers presenting different opinions about the churches in Göreme Valley date them to different periods. Since there is no inscription dating back to the 10th century, dating the wall paintings of the churches built up to that period is also controversial [8].

¹ Basileios of Caesarea, his brother Gregorius and his close friend Gregorius Nazianzus were the most important Christian theologians of Cappadocia at that time [6].
² Göreme Valley was declared a pilot area for conservation and restoration applications in 1972 by ICCROM and UNESCO. With the rescue efforts that started in 1973, protection and repair practices were carried out at different scales in the Saint Barbara Church, Çarıklı Church, Elmalı Church, Dark Church, Karşı Church, Kılıçlar Church, Saklı Church, Tatlarin Church and Tokalı Church in the Cappadocia Region [7].


The structures in the Cappadocia Region were dated by comparison with the wall paintings and manuscripts of the period, for which chapel numbers and names were introduced [9] (Fig. 1).

Fig. 1. Göreme Valley and Saklı Church [10]

Göreme Valley Monasteries. The largest monastic formations in the Cappadocia Region are located in Göreme Valley. No structure that could be described as a house carved into the rock was found in the valley; it was therefore concluded that the valley is an area devoted to monastic life. These monastic spaces and structures are located at two main points³ of Göreme Valley [12]. The upper valley, now called the Göreme Open Air Museum, is located approximately 2 km east of Göreme Town, to the west of the Göreme–Ortahisar road. It contains many religious sites carved into the rock, dated to the 11th and 13th centuries, and is considered one of the largest monastic formations in the Cappadocia Region [11]. Complexes with a church inside, which serve the various purposes of collective life, are called monasteries; churches, most of which have refectories in their immediate vicinity, are named as complexes. The Lower Valley, whose formations are generally dated to the 10th century, is located 500 m west of the Göreme Open Air Museum. Ten churches in the lower valley, including Saklı Church and Tokalı Church, are considered in the first group [12].

³ These points are the Lower Valley, where structures dating from the 9th century to the middle of the 10th century are located, and the Upper Valley, where the structures dated to the 11th and 13th centuries are located [12].

2 Göreme (Lower) Valley Monasteries

2.1 Location and History of Saklı Church

Saklı Church and the surrounding monasteries are located in the Göreme (Lower) Valley. A large number of refectories were carved within the rocky texture to the north of Saklı Church. The refectories numbered 2E, 2F, 2G and 2H around Saklı Church were carved into the slope of the rock mass. The entrances of the rock-carved spaces, which are placed in harmony with the natural structure of the topography, face the east and west sides of the rock mass, and spaces facing different directions are placed on different levels. No precise rules appear to have governed the location of the monasteries in the area where Saklı Church is located: the church was treated as the sacred center, other spaces were created around it, and the architectural plan schemes were shaped according to needs and to the topographic features of the site [13]. Saklı Church (Göreme 2a, Church of John the Baptist, Hagios Ioannes Church) is located to the northeast of El Nazar Church, on the slope of the rock mass between Göreme and Ortahisar, overlooking Uçhisar (Fig. 2).

Fig. 2. Saklı Church and monastery buildings settlement view

The entrance of Saklı Church was closed years ago as a result of a landslide caused by flood waters, and the church was so named ("hidden") after being discovered by accident in 1957 [14] (Fig. 3). Access is provided by climbing 250–300 m along the path between the rocks to the south of the Göreme–Maçan road, or by car using a dirt road from Ortahisar (see Fig. 4).


Fig. 3. Saklı Church and monastery buildings settlement plan

Fig. 4. Saklı Church and monastery buildings situation plan

2.2 Plan Features of Saklı Church

When the architectural plan types of the churches in Göreme are examined, five types are found: single-nave churches, two-nave churches, three-nave basilicas, free-cross planned churches, and closed Greek-cross planned churches. The single-nave plan is used in the small churches and chapels of the region: since the main space consists of a simple rectangle, this plan type was widely used in rock-carved churches and is ideal for small communities scattered in the countryside [15].


Saklı Church, one of the single-nave churches in the Type 1b group, has a rectangular plan [4]. This plan type, examples of which are seen in Mesopotamia, appeared in Christian architecture in Syria in the 5th and 6th centuries. In this plan type, seen at Saklı Church, St. Basil's Chapel, Yılanlı Church, Yeni Tokalı Church and Kılıçlar (Kuşluk) Church in Göreme, the main space takes the form of a transverse rectangle in the north–south direction and the apse is located on the long side of the rectangle [15]. The single-nave, rock-carved Saklı Church, consisting of a narthex and a naos, was built on a transverse rectangular plan with three apses [14]. The naos and the narthex are each divided into two by an arcade (two columns and three arches). Excluding the apses, the church is approximately 5 m long and 7 m wide. The long side of the naos is arranged as the apse wall; the middle apse is wider than the lateral apses. The walls of the church, which has a flat ceiling on the west and a vault on the east, were roughly carved. There are cross and geometric decorations on the flat ceiling (Fig. 5).

Fig. 5. Saklı Church settlement plan

3D Documentation. The values of cultural assets should be documented in order to transfer, sustain and protect them for the future. The plans, sections and views of a building are drawn up through survey, restitution and restoration projects in accordance with documentation principles [16]. Today, by using advanced technologies instead of traditional methods in the documentation process, data are obtained in a shorter time and with more reliable results [17]. In the past, documentation studies of Saklı Church were carried out with simpler techniques.


However, it is very difficult to document the rock-carved spaces of the region in realistic dimensions and texture, to record heights where access is not possible, and to capture the depths and plan directions of the spaces [21]. For spaces planned to the extent permitted by the topography, traditional methods and measurement techniques (heights, elevations, depths, lengths and diagonal measurements taken with a tape measure) only allow a general sketch to be produced, far from measuring every point of the space (see Figs. 11–12). With new and developing technologies, documentation tools, methods and techniques that are now used in many different fields and disciplines have become available. Nowadays, more detailed data can be obtained through drone and LiDAR imagery, photogrammetric shots, and scans made with laser scanning devices that can record every point of the interior of the spaces together with their near surroundings.⁴ In addition, ground scans made with geo-radar or electrodes provide the opportunity to reach invisible data underground, and the data obtained are recorded as computer-based documents. Thermal cameras, which allow the deterioration of the building to be detected, and material and strength tests carried out in the laboratory support and detail this documentation (Figs. 6 and 7).

Fig. 6. Control point determination

Fig. 7. Drone shooting

In this context, the documentation studies of Saklı Church were supported by the Scientific Research Projects Unit of Erciyes University under the project coded FYL2020-9762. The fieldwork was carried out by Altınoran Technical Workshop, with aerial drone shooting lasting 20 min and laser scanning, which took 4 h in and around the structure and could reach every point. Drone shooting and satellite-aided GNSS-GPS measurements were made with 5 control points⁵ placed in the valley (see Fig. 13). The valley in which the other churches, chapels, refectories and other spaces around Saklı Church are located was recorded with 466 aerial photographs taken with a drone at 2349 m (Figs. 8 and 9). Using the drone photographs of the scanned 7-hectare area, a 3D DEM map and site plan were created from the point cloud data prepared with the Agisoft Metashape program⁶ (see Fig. 10) (Table 1).

⁴ The architectural documentation process allows the current state of the building to be determined using precise measurement techniques. The 3D laser scanning used allows more precise measurements and easier recording than traditional methods by converting the rays reflected from the structure into point cloud data [18].
⁵ Each point is aligned in the coordinate system and temporarily marked on the field for scanning and measurement with GPS [22].

Fig. 8. Camera locations and error estimates

Table 1. Control points X – Easting, Y – Northing, Z – Altitude.

⁶ It enables the production of 3D models with techniques that greatly eliminate measurement difficulties by facilitating field study and data collection [19].
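For readers unfamiliar with DEM generation, the underlying idea of turning a dense, drone-derived point cloud into a raster elevation model can be illustrated with a minimal, self-contained Python sketch. This is not the Agisoft Metashape workflow used in the project; the synthetic point data, the cell size and the choice of the mean as the cell statistic are illustrative assumptions only.

import numpy as np

def grid_dem(points, cell_size=0.5):
    """Bin an (N, 3) array of x, y, z points into a regular grid and
    return the mean elevation per cell (NaN where no points fall)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Grid origin and integer cell indices for every point
    x0, y0 = x.min(), y.min()
    col = ((x - x0) // cell_size).astype(int)
    row = ((y - y0) // cell_size).astype(int)
    n_rows, n_cols = row.max() + 1, col.max() + 1

    # Accumulate elevation sums and point counts per cell, then divide
    flat = row * n_cols + col
    sums = np.bincount(flat, weights=z, minlength=n_rows * n_cols)
    counts = np.bincount(flat, minlength=n_rows * n_cols)
    dem = np.full(n_rows * n_cols, np.nan)
    np.divide(sums, counts, out=dem, where=counts > 0)
    return dem.reshape(n_rows, n_cols)

# Synthetic example (replace with an exported point cloud in practice)
pts = np.random.rand(10_000, 3) * [70.0, 70.0, 15.0]
dem = grid_dem(pts, cell_size=1.0)
print(dem.shape)

In a real workflow the gridded elevations would then be georeferenced with the control points and exported as a raster, which is what the photogrammetric software does internally.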


Fig. 9. Reconstructed digital elevation model

Fig. 10. Lower valley DEM map


Fig. 11. GPS measurement

Fig. 12. Laser scanning

Fig. 13. Saklı Church ceiling view

Thus, with the use of the laser scanner (Faro Focus S150), better information was obtained about the geometric and volumetric dimensions of the rock church. By processing the data obtained from the laser scans and the drone photographs on the computer, a detailed traditional survey and current-state drawing of Saklı Church were produced and a reality-based 3D model was created. Because the laser scanner can also capture HDR photographs, the 3D model created from the point cloud data, without any contact with the structure, records the murals at real scale and in color. It is a 3D model consisting of a series of point clouds that preserves the actual values (with a maximum error of 3 mm)⁷ (see Figs. 14 and 15). All sections that allow scaled drawings of the wall paintings in the church were taken. Detailed survey drawings of the building were prepared on the computer using the scaled perspective photographs obtained from the data, with all measurement and documentation utilities (Figs. 16, 17, 18, 19, 20, 21, 22, 23 and 24).
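The quoted maximum error of about 3 mm concerns the agreement between the registered point clouds. A simplified way to quantify such agreement, independent of the survey software actually used, is a nearest-neighbour (cloud-to-cloud) distance check; the sketch below uses synthetic data and SciPy's k-d tree and is an illustration rather than the project's validation procedure.

import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_deviation(reference, test):
    """Nearest-neighbour distance from every test point to the reference cloud."""
    tree = cKDTree(reference)
    distances, _ = tree.query(test, k=1)
    return distances

# Synthetic example: a reference cloud and a copy perturbed by ~1 mm noise
ref = np.random.rand(50_000, 3)                       # coordinates in metres
test = ref + np.random.normal(scale=0.001, size=ref.shape)

d = cloud_to_cloud_deviation(ref, test)
print(f"mean deviation: {d.mean() * 1000:.2f} mm, "
      f"95th percentile: {np.percentile(d, 95) * 1000:.2f} mm")

Note that nearest-neighbour distances underestimate true point displacements when the clouds have different sampling densities, which is why dedicated software typically reports a more elaborate registration error.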

Fig. 14. Orthophotos of Saklı Church southwest view

Fig. 15. Orthophotos of Saklı Church southeast view

⁷ The recorded models provide virtual visibility in the digital database, and the perception of the architectural structure. At the same time, scaled models are produced by the 3D printing method and wall paintings can be displayed on the model with the augmented reality system [20].


Fig. 16. Saklı Church plan [9]

Fig. 17. Saklı Church plan and section [14]

Fig. 18. Saklı Church plan


Fig. 19. Saklı Church A-A section

Fig. 20. Saklı Church A-A perspective

Fig. 21. Saklı Church B-B section

Fig. 22. Saklı Church B-B perspective

Fig. 23. Saklı Church view

Fig. 24. Saklı Church west view

3 Conclusions

The monastic structures within the natural protected area face many internal and external problems and the danger of deterioration. These structures, arranged within the rock formations to the extent permitted by the topography, are affected by climatic and natural conditions. Natural and human-induced damage is increasing day by day and is causing irreversible harm to the building. The necessary documentation studies and protection measures should be carried out to ensure the continuity of natural and cultural assets. For this purpose, documentation studies were carried out within the scope of this research with the help of various current technologies. Most of the time, it is not possible to express the physical values of cultural assets with data obtained by traditional methods alone. In this context, the combination of traditional methods and advanced technologies increases the reliability, precision and ease of archiving of documentation processes. These methods ease working conditions in the field and reduce both the number of people needed on site and the duration of the fieldwork. However, the subsequent processing and combination of the data require qualified personnel and still take some time with today's technological possibilities. Work that offers 3D modeling opportunities yields architectural data with a very low margin of error for transferring and presenting the cultural heritage to the future. At the same time, the perception of the building facilitates the expression of its location, importance and architectural value. The digital data collected from Saklı Church and the monastery buildings through the work carried out allow the structure to be analyzed in its original period (restitution) for protection measures, as well as providing current documentation. In the analytical survey study, materials, deteriorations and periodic changes were conveyed on the prepared survey drawings, and the proposed interventions were specified. As a result of the analysis, the damage in the structure was determined. At the same time, research for improvement, restitution proposals, the determination of the causes and types of deterioration, and the development of restoration interventions are facilitated. These documentation methods allow us to fully perceive the topography of the building's surroundings and the shape and location of the rock-carved church. Moreover, with this study, the data of the rock-carved church at risk can be recorded through 3D space reconstructions, and the easily accessible computer-based data can be consulted at any time for control purposes. With this study, the current status of Saklı Church has been documented with various architectural representation methods, and the opportunity to transfer our cultural heritage to the future has been provided by developing preservation proposals.

References
1. Hild, F., Restle, M.: Kappadokien. Tabula Imperii Byzantini I–II, Wien (1981)
2. Rodley, L.: Cave Monasteries of Byzantine Cappadocia. Cambridge University Press, London (1985)
3. Mitchell, S.: Anatolia, Land, Men, and Gods in Asia Minor, vol. I–II. Oxford University Press, Oxford (1993)
4. Ötüken, Y.: Göreme. Kültür Bakanlığı Yayınları, Ankara (1987)
5. AnaBritannica: AnaBritannica Genel Kültür Ansiklopedisi, 4., 9., 12. Cilt. Hürriyet Yayınları, İstanbul (1986)
6. Vasiliev, A.A.: Bizans İmparatorluğu Tarihi (trans. Tevabil Alkaç). Alfa Basım Yayım Dağıtım San. ve Tic. Ltd. Şti., İstanbul (2017)
7. Coşkuner, B.: Göreme Kılıçlar Kilisesi Duvar Resimlerinin İkonografisi. Hacettepe Üniversitesi Sosyal Bilimler Enstitüsü, Sanat Tarihi Anabilim Dalı, Yüksek Lisans Tezi, İstanbul (2002)
8. Epstein, A.W.: Rock-cut chapels in Göreme Valley, Cappadocia: the Yılanlı group and the column churches. Cahiers Archéologiques 24, 115–136 (1975)
9. Restle, M.: Byzantine Wall Painting in Asia Minor, I–III. Recklinghausen, Vienna (1967)
10. Giovannini, L.: Arts of Cappadocia. Nagel Publishers, Geneva (1971)
11. Jerphanion, G.: Une Nouvelle Province de l'art Byzantin, Les Eglises Rupestres de Cappadoce, I–V, Paris (1925–1942)
12. Teteriatnikov, N.: Monastic settlements in Cappadocia: the case of the Göreme valley. In: Mullett, M., Kirby, A. (eds.) Work and Worship at the Theotokos Evergetis 1050–1200, pp. 21–47. Belfast Byzantine Texts and Translations (1997)
13. Koch, G.: Erken Hristiyan Sanatı (trans. Ayşe Aydın). Arkeoloji ve Sanat Yayınları, İstanbul (2007)
14. İpşiroğlu, M.Ş., Eyuboğlu, S.: Saklı Kilise. Yenilik Basımevi, İstanbul (1958)
15. Akyürek, E.: M.S. IV.–XI. Yüzyıllar: Kapadokya'daki Bizans. In: Sözen, M., et al. (eds.) Kapadokya, pp. 226–395. Ayhan Şahenk Vakfı, İstanbul (2000)
16. Rafanelli, F.: The complex of St. Daniel in Göreme, Cappadocia. In: Proceedings of the 19th International Conference on Cultural Heritage and New Technologies, CHNT 19, 2014, Vienna, vol. 15 (2015)
17. Özkul, D.: İleri Belgeleme Tekniklerinin Mimari Belgeleme Sürecinde Kullanımı, pp. 62–85. Mimari Korumada Güncel Konular (2010)
18. Haddan, N., Ishakat, F.: 3D laser scanner and reflectorless total station: a comparative study of the slots of El-Khazneh at Petra in Jordan. In: XXI International CIPA Symposium, Athens (2007)
19. Salonia, P., Bellucci, V., Scolastico, S., Marcolongo, A., Messina, T.L.: 3D survey technologies for reconstruction, analysis and diagnosis in the conservation process of cultural heritage. In: XXI International CIPA Symposium, Athens (2007)
20. Verdiani, G.: Bringing impossible places to the public: three ideas for Rupestrian churches in Göreme, Kapadokya utilizing a digital survey, 3D printing, and augmented reality, pp. 131–143 (2015)
21. Carpiceci, M., Inglese, C.: Laser scanning and automated photogrammetry for knowledge and representation of the Rupestrian architecture in Cappadocia: Sahinefendi and the open air museum of Göreme. In: Proceedings of the 42nd Annual Conference on Computer Applications and Quantitative Methods in Archaeology, Paris, pp. 87–95 (2015)
22. Andaloro, M., Bixio, R., Crescenzi, C.: The complex of S. Eustachius in Göreme, Cappadocia: reading the relationship between the landscape and a very articulated underground settlement. In: International Conference on Cultural Heritage and New Technologies, Vienna (2013)

Appropriateness of Using CityGML Standard Version 2.0 for Developing 3D City Model in Oman Khalid Al Kalbani(B) and Alias Bin Abdul Rahman 3D GIS Research Lab, Faculty of Built Environment and Surveying, Universiti Teknologi, Malaysia (UTM), 81310 Skudai, Johor Bahru, Malaysia [email protected], [email protected]

Abstract. This paper investigates the appropriateness of using CityGML version 2.0 for developing a 3D city model in Oman. Some countries have started using the CityGML standard (version 1.0 and version 2.0) to establish their 3D city models. Although the CityGML standard is an important initiative for exchanging 3D city objects, it still does not fully support 3D city model requirements. To investigate problems that might face the implementation of the upcoming 3D city model in Oman using CityGML standard version 2.0, the state of current CityGML standard implementation has been evaluated in selected countries (Germany, the Netherlands, Turkey, Singapore), which are considered significant in CityGML standard practice. The degree of CityGML standard implementation has been investigated using a questionnaire designed on a 5-point Likert scale. Moreover, in order to investigate the data structure issues and challenges based on the CityGML standard, the study has carried out some experiments. The study used geospatial tools and databases such as FME, PostgreSQL-PostGIS and 3DCityDB to generate a small-scale 3D city model. The survey results reveal that the degree of CityGML implementation in the countries surveyed is focused on building model applications more than on other CityGML models. Furthermore, the results of the experiments on the CityGML data structure show that there are issues and challenges that need to be addressed. We expect this paper to prompt a better vision for upcoming CityGML standards and 3D city models. Keywords: 3D city model · CityGML

1 Introduction

Currently, the 3D city model and its related geospatial activities are considered a good geospatial platform for standardizing, storing, representing and sharing 3D geospatial data at the national level [1–5]. Moreover, current 2D GIS applications still have limitations in addressing complex spatial data structures and solving their issues [6, 7]. Hence, 3D geospatial data and 3D city model solutions have been suggested to facilitate the management of complex urban infrastructure such as multi-floor buildings and underground utilities [3, 8].


Several 3D city model initiatives employing CityGML version 1.0 and version 2.0 have emerged in Germany, Canada, the Netherlands, Switzerland, Austria, Finland and the United States of America for surface infrastructure management [6, 9–13]. In these countries, most 3D city models designed on the basis of CityGML standard version 2.0 focus on specific models such as buildings, roads, parking lots, water and terrain at LoD1 and LoD2 [3, 14, 15]. The CityGML standard is an important initiative for the 3D city model, but it still does not fully support the city's infrastructure requirements [8]. Despite improvements in the CityGML standard from version 1.0 to version 3.0, including the creation of new thematic modules for representing bridges and tunnels in version 2.0, and the development of new construction objects, the revision of the CityGML LoD concept and increased interoperability with other standards in the upcoming version 3.0 [11, 16–19], there are still issues that need to be solved before CityGML can be used effectively for a 3D city model at the national level [8]. In fact, developing a spatial data structure for the 3D city model is a complicated task [6, 12, 15]. Therefore, to explore the issues that can be an obstacle to integrating the CityGML standard and the 3D city model, the present study has investigated the current implementation of CityGML schema(s) and LoD(s) in selected countries (Germany, the Netherlands, Turkey, Singapore) and the related spatial data structure issues. Besides, to achieve the study outcomes, the study has applied a statistical approach by designing an e-questionnaire based on a five-point Likert scale, which can be analysed with the Statistical Package for the Social Sciences (SPSS). The study has also carried out some experiments to investigate the spatial data structure issues based on the CityGML standard by using FME, PostgreSQL-PostGIS and 3DCityDB. In fact, the present paper links the statistical method and the experiments to obtain a better understanding of the current state-of-the-practice of CityGML version 2.0 in terms of its schema(s) and LoD(s). Hence, the degree of current CityGML implementation reflects practical challenges that need to be highlighted and solved, and the results of the experiments explain the nature of the spatial data structure challenges that may face a 3D city model using CityGML version 2.0 at national coverage. This paper is organised in seven sections: Sect. 2 reviews the related works; Sect. 3 discusses the need for a 3D city model in Oman; Sect. 4 reviews the CityGML standard; the methodology and the statistical approach are explained in Sect. 5; Sect. 6 presents the discussion and outcomes of the study; finally, Sect. 7 concludes the paper.

2 Related Works

Many researchers have investigated the problems of applying CityGML from a technical and programming perspective. Studies in this field have also dealt with practical applications to develop 3D city models for some countries based on their requirements and specifications [15]. Stoter et al. [3, 10, 15, 20] proposed a generic approach for 3D SDI in the Netherlands and discussed the challenges facing 3D modelling based on CityGML at national coverage, while Kolbe et al. [11, 19] investigated how to establish unified 3D city models and the virtual 3D city model of Berlin, both of which helped to enhance CityGML features and make it more usable. Moreover, Soon and Khoo [5] investigated the implementation of CityGML in modelling Singapore's 3D national mapping and discussed the capability of implementing CityGML for the government's geospatial core business.


Chaturvedi et al. [21] proposed an approach to extend CityGML for managing different versions of a 3D city model in a unified dataset based on interoperable solutions. In this context, Al Kalbani and Abdul Rahman [1, 8] investigated the issues and challenges of implementing a 3D Spatial Data Infrastructure (3D SDI) in Oman for surface and subsurface spatial objects based on the CityGML standard. Also, Agugiaro [22] discussed the process of creating a virtual model of the city of Vienna based on CityGML and solving data structure integration issues. Preka and Doulamis [23] explored the issues of creating 3D building models in a relational database at LoD2 in the municipality of Kaisariani, Athens. These studies have provided solutions that fill some of the gaps identified in the application of the 3D city model based on CityGML. This paper contributes by showing the current state-of-the-practice of CityGML version 2.0 in terms of the schema(s) and LoD(s) in a statistical way. Besides, it presents some of the challenges that may face 3D city model implementation at the national level when using CityGML for the spatial data structure.

3 The Need for a 3D City Model in Oman

The geospatial workflow in Oman's geospatial community is limited to 2D and 2.5D geospatial data [1, 24–26]. To date, there is no clear vision for integrating the Oman spatial data infrastructure (SDI) with 3D geospatial initiatives [1]. On the other hand, Oman is one of the developed countries with a complex urban infrastructure [8]. Hence, decision-makers in Omani municipalities require a 3D city model to manage complex problems. Furthermore, the 3D city model is considered a significant investment for future applications such as 3D cadastral applications, e-government and 3D smart cities [8].

4 CityGML Standard

CityGML is an open XML-based file format for exchanging, storing and representing 3D city objects [27]. The CityGML initiative was developed by the Special Interest Group 3D (SIG 3D) and is now maintained by the Open Geospatial Consortium (OGC) [27]. CityGML (version 2.0) includes 13 thematic models for storing spatial objects and five levels of detail (LoD) [6, 20, 28]. The standard mainly focuses on the spatial perspective and represents the most common spatial objects found in cities, such as buildings, bridges and roads [27, 29]. The CityGML standard also focuses on the building model schema more than on the other schemas [8]. Additionally, the CityGML concept allows spatial objects to be decomposed into sub-objects [6, 9, 14, 20, 30–32]. Furthermore, the CityGML file format is built on a hierarchical structure for both geometric and semantic information [27]. Different geospatial solution providers have integrated their GIS products with CityGML extensions for reading, writing and viewing the CityGML format [8]. What is more, CityGML is supported by solutions such as PostgreSQL-PostGIS,


Oracle, the 3D City Database (3DCityDB) and GeoRocket, which provide database structures suitable for the CityGML standard [8, 33]. There are now various spatial applications of the CityGML standard, such as solar potential estimation, flood risk assessment and noise monitoring [6, 20, 23, 34].
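Because CityGML 2.0 files are XML documents organized by module namespaces, their content can be inspected with ordinary XML tooling. The following minimal Python sketch (the file name is hypothetical) counts Building features and how many of them carry LoD1 or LoD2 solid geometry, which is the kind of schema and LoD profiling discussed in this paper.

from lxml import etree

# CityGML 2.0 namespaces for the core and building modules
NS = {
    "core": "http://www.opengis.net/citygml/2.0",
    "bldg": "http://www.opengis.net/citygml/building/2.0",
}

def summarize_buildings(path):
    """Count Building features and how many carry LoD1/LoD2 solid geometry."""
    tree = etree.parse(path)
    buildings = tree.findall(".//bldg:Building", NS)
    lod1 = sum(1 for b in buildings if b.find("bldg:lod1Solid", NS) is not None)
    lod2 = sum(1 for b in buildings if b.find("bldg:lod2Solid", NS) is not None)
    return {"buildings": len(buildings), "lod1Solid": lod1, "lod2Solid": lod2}

# Hypothetical file name for a small CityGML 2.0 dataset
print(summarize_buildings("alseeb_lod1_lod2.gml"))

In practice the same kind of inventory is usually run against a 3DCityDB instance rather than raw files, but the namespace-per-module organization shown here is what makes such tooling straightforward.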

5 Methodology

The study designed an e-questionnaire to determine the state of the current CityGML standard and the degree of its implementation in terms of CityGML schema(s) and LoD(s). The e-questionnaire was designed using a matrix table question. The statistical analysis covered 9 CityGML schemas (relief, building, transportation, bridge, waterbody, city furniture, vegetation, tunnel, landuse) and the related 45 LoD(s) at LoD (0–4). In order to record the responses for each LoD at the level of each CityGML schema, the study used a Likert scale with five responses (highly implemented, moderately implemented, slightly implemented, poorly implemented, not implemented) (see Fig. 1). The questionnaire was sent to 4 specialists in the 4 selected countries, namely Germany (Prof. Dr. Thomas H. Kolbe), the Netherlands (Prof. Dr. Jantien Stoter), Turkey (Prof. Dr. Ismail Buyuksalih) and Singapore, which have a valuable background and a rich history of practicing CityGML or implementing projects using the CityGML standard. In the case of Singapore, as no response was received from the competent authority, the paper analysed the study of Soon and Khoo (2017) [5] on CityGML modelling for Singapore 3D national mapping and converted it into numerical data that were included in the statistical analysis. The SPSS package was used to analyse the data and to calculate the mean (M) and the standard deviation (SD). Moreover, the study used intervals (mean ranges) to arrange the means (M) in descending order (from high to low values). In order to further investigate the 3D spatial data structure issues and challenges based on the CityGML standard, the study carried out some experiments in a pilot area in Oman (Al Seeb). The study collected the data (2D and 2.5D geospatial data) from the related geospatial agencies in Oman. Geospatial tools and databases such as FME, PostgreSQL-PostGIS and 3DCityDB were used to generate a small-scale 3D city model of 3D surface spatial objects and to explore the spatial data structure challenges based on the CityGML standard (see Fig. 2). The experiments investigated many issues and challenges related to the 3D city model data structure, including the complexity of the 3D spatial data structure, 3D spatial data structure standards, metadata, and others. Since CityGML version 2.0 does not fully support subsurface objects [8], the study generated its own 3D objects, such as pipeline networks and geological strata, to study the integration challenges between surface and subsurface spatial objects in the 3D city model data structure.
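The statistical treatment described above (mean, standard deviation and classification of the mean into the Likert intervals of Table 1) can be reproduced outside SPSS. A minimal Python/pandas sketch is given below, using a small set of hypothetical responses rather than the actual questionnaire data.

import pandas as pd

# Hypothetical 1-5 Likert scores from four experts for a few schema/LoD items
responses = pd.DataFrame({
    "schema": ["Building", "Building", "Relief", "Relief"] * 2,
    "lod":    ["LoD1", "LoD2", "LoD0", "LoD1"] * 2,
    "score":  [5, 5, 3, 1, 4, 4, 2, 3],
})

def classify(mean):
    """Map a mean score onto the interval labels used in Table 1."""
    if mean >= 4.21:
        return "Highly implemented"
    if mean >= 3.41:
        return "Moderately implemented"
    if mean >= 2.61:
        return "Slightly implemented"
    if mean >= 1.81:
        return "Poorly implemented"
    return "Not implemented"

# Mean (M) and standard deviation (SD) per schema/LoD, ranked by the mean
summary = (responses.groupby(["schema", "lod"])["score"]
           .agg(M="mean", SD="std")
           .reset_index())
summary["class"] = summary["M"].apply(classify)
print(summary.sort_values("M", ascending=False))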


Fig. 1. Recording the responses for each LoD(s) at the level of each CityGML schema(s) in e-questionnaire: building schema as an example

Fig. 2. Construction of a small-scale city model based on CityGML standard version 2.0

6 Discussion and Results

The implementation of CityGML version 2.0 in the countries surveyed provides a significant indication and a better understanding of the current state-of-the-practice of CityGML version 2.0. The e-questionnaire was designed to show the overall mean of each of the 5 LoD(s) in each of the 9 CityGML schema(s), based on the experts' points of view in the 4 selected countries (Table 1).


Table 1. The degree of CityGML implementation in terms of the schema(s) and LoD(s), ranked from the highest LoD(s) mean to the lowest

Likert scale: 5 = Highly implemented (4.21–5); 4 = Moderately implemented (3.41–4.20); 3 = Slightly implemented (2.61–3.40); 2 = Poorly implemented (1.81–2.60); 1 = Not implemented (1–1.80). Mean: (M); Standard Deviation: (SD); Ranking: (R).

(R)    Schema(s)              LoD(s)              Total   (M)    (SD)
1      Building               LoD1                4       4.75   0.500
2      Building               LoD2                4       4.50   1.000
3      Building               LoD0                4       4.25   1.500
4      Relief                 LoD0                4       2.75   0.500
5      Transportation         LoD0                4       2.75   0.500
6      Bridge                 LoD1                4       2.75   0.957
7      WaterBody              LoD1                4       2.75   0.957
8      Vegetation             LoD1                4       2.75   0.957
9      Building               LoD3                4       2.50   0.577
10     Transportation         LoD1                4       2.50   0.577
11     CityFurniture          LoD1                4       2.50   1.291
12     Tunnel                 LoD1                4       2.50   1.000
13     Bridge                 LoD0                4       2.25   0.500
14     Tunnel                 LoD2                4       2.25   1.500
15     Landuse                LoD0                4       2.25   0.500
16     Relief                 LoD1                4       2.00   2.000
17     Relief                 LoD2                4       2.00   2.000
18     Bridge                 LoD2                4       2.00   1.414
19     Tunnel                 LoD0                4       2.00   0.000
20     WaterBody              LoD0                4       2.00   0.816
21     Landuse                LoD1                4       2.00   0.816
22-45  The rest of schema(s)  The rest of LoD(s)  4

Interval          Percentage (%)
4.21-5            6.7%
2.61-3.40         11.1%
1.81-2.60         28.9%
1-1.80            53.3%


Although each of the countries surveyed varied in its implementation of CityGML in some schema(s) and LoD(s), Table 1 shows that the overall implementation of the CityGML schema(s) and LoD(s) in these four countries is mostly concentrated on the building schema between LoD0 and LoD2, which is classified as highly implemented and represents 6.7% of all LoD(s) and related schema(s). The results also show that the rest of the CityGML LoD(s) and related schemas are classified as slightly implemented (11.1%), poorly implemented (28.9%) or not implemented (53.3%), especially at LoD3 and LoD4. On the one hand, the reason that the CityGML models at LoD1 and LoD2 are more widely implemented is that they can easily be extruded using the objects' basic data (footprints) and a height value obtained from the attribute table or from photogrammetric and LiDAR processing. On the other hand, the challenges in creating 3D models at LoD3 and LoD4 are due to the need for rich data and complex processing, especially for national coverage. The statistical analysis in the countries surveyed also indicated that the implementation of the CityGML relief model and CityFurniture model faces challenges in real practice at national coverage; these two models therefore need more investigation in terms of definition, data structure, and integration with other objects in the city model. The statistical analysis further raises questions about the feasibility of defining constantly growing real-world phenomena, such as vegetation, at CityGML LoD3 and LoD4, where the results indicate that these models are difficult to implement because of their ineffectiveness in the 3D city model. The experiments are still ongoing, and the initial results have demonstrated that the CityGML standard and its data structure still face some issues in real practice. Most of these challenges are related to the complexity of the 3D spatial data structure, 3D spatial data structure standards, metadata for the 3D spatial data structure, 3D spatial data structure exchange, the quality of 3D spatial data structure design, geometric representation, indexing the 3D spatial data structure, arranging the 3D spatial data structure by class(es), schema(s) and LoD(s), compatibility with database features, the homogeneity of the 3D spatial data structure, RDBMS support, handling 3D geospatial queries, the spatial reference system (SRS), support for 3D spatial data structure operations, the capability of using the 3D spatial data structure in advanced database applications, use of the 3D spatial data structure at large scale, and 3D spatial data structure archiving. Besides, topology issues can contribute to the difficulty of handling 3D queries and executing advanced geospatial analyses. The final remarks of the experiments and the statistical analysis show that CityGML standard version 2.0 can be employed to create an Omani 3D city model for some applications at LoD0, LoD1 and LoD2. Examples of these applications are supporting smart city activities, 3D flood risk assessment (see Fig. 3), 3D registration of multi-level property rights (see Fig. 4), managing contingency plans (see Fig. 5), and integration between the surface and subsurface 3D city objects (see Fig. 6). Nevertheless, there are still some issues and challenges in the CityGML standard that need to be tackled, as mentioned above. The Omani SDI also needs to develop its own solutions to use the CityGML environment.


Fig. 3. Flood risk assessment in 3D (Al Seeb)

Fig. 4. Registration of multi-level property rights in 3D (Al Seeb)


Fig. 5. Managing contingency plans (Al Seeb)

Fig. 6. Integration between the surface and subsurface 3D city object (Al Seeb)


7 Conclusion

This paper has investigated several issues related to establishing a 3D city model for Oman based on CityGML version 2.0. The previous work shows that the current standard (version 2.0) of CityGML is still in progress and that there are issues and challenges that need to be addressed. This paper also indicates that 3D city models are inevitably needed in a country like Oman for the next generation of geospatial, city planning and other applications. This particular piece of research will greatly enhance the level of GIS awareness, especially regarding the 3D city model for Oman. Furthermore, it will help to establish a framework for the 3D SDI that addresses the 3D geospatial data issues, challenges and needs of the Oman geospatial community. In the future, based on these outcomes, we would like to investigate the appropriateness of using CityJSON solutions for developing the upcoming 3D city model and 3D SDI in Oman.

Acknowledgments. The authors would like to thank Prof. Dr. Jantien Stoter (TU Delft), Prof. Dr. Thomas H. Kolbe (Technical University of Munich) and Prof. Dr. Ismail Buyuksalih (Bimtas) for their generous participation in the CityGML questionnaire. We would also like to thank Eng. Najd AL-Hanahnah (Database Management and GIS Officer at Blumont-Jordan) for his support.

References
1. Al Kalbani, K., Abdul Rahman, A., Al Awadhi, T., Alshannaq, F.: Development of a framework for implementing 3D spatial data infrastructure in Oman – issues and challenges. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. XLII-4/W9, 243–246 (2018). https://doi.org/10.5194/isprs-archives-xlii-4-w9-243-2018
2. Abdul-Rahman, A., Alizadehashrafi, B., Coors, V.: Developing a framework for Malaysian 3D SDI. In: 5th International 3D, vol. XXXVIII, pp. 3–8 (2010)
3. Stoter, J., Van Den Brink, L., Vosselman, G., et al.: A generic approach for 3D SDI in the Netherlands. Lecture Notes Computer Science, pp. 1–22 (2010)
4. Salim, M.J.: 3D spatial information intended for SDI: a literature review, problem and evaluation. J. Geogr. Inf. Syst. 9, 535–545 (2017). https://doi.org/10.4236/jgis.2017.95033
5. Soon, K.H., Khoo, V.H.S.: CityGML modelling for Singapore 3D national mapping. ISPRS Int. Arch. Photogram. Remote Sens. Spat. Inf. Sci. XLII-4/W7, 37–42 (2017). https://doi.org/10.5194/isprs-archives-xlii-4-w7-37-2017
6. Biljecki, F.: Level of details in 3D city models. Published Ph.D. thesis. Delft University of Technology (2017)
7. Stoter, J.E.: 3D Cadastre. 5th L Adm Domain Model Work (2013)
8. Al Kalbani, K., Abdul Rahman, A.: Integration between surface and subsurface spatial objects for developing Oman 3D SDI based on the CityGML standard. Int. Arch. Photogram. Remote Sens. Spat. Inf. Sci. XLII-4/W16, 79–84 (2019). https://doi.org/10.5194/isprs-archives-xlii-4-w16-79-2019
9. Stoter, J., Ploeger, H., Roes, R., et al.: First 3D cadastral registration of multi-level ownerships rights in the Netherlands. In: 5th International Workshop on 3D Cadastres, pp. 491–504 (2016)
10. Stoter, J., Roensdorf, C., Home, R., et al.: 3D modelling with national coverage: bridging the gap between research and practice. In: Breunig, M., Al-Doori, M., Butwilowski, E., et al. (eds.) 3D Geoinformation Science: The Selected Papers of the 3D GeoInfo 2014, pp. 207–225. Springer, Cham (2015)


11. Döllner, J., Kolbe, T., Liecke, F., et al.: The virtual 3D city model of Berlin - managing, integrating and communicating complex urban information. In: The 25th International Symposium on Urban Data Management UDMS, pp. 1–12 (2006)
12. Kolbe, T.H.: Representing and exchanging 3D city models with CityGML. Geo-Inf. Sci. 3D, 15–31 (2008). https://doi.org/10.1007/978-3-540-87395-2_2
13. Citygmlwiki: Open Data Initiatives (2020). http://www.citygmlwiki.org/index.php?title=Open_Data_Initiatives. Accessed 3 Sept 2020
14. Arroyo Ohori, K., Biljecki, F., Kumar, K., et al.: Modeling cities and landscapes in 3D with CityGML. In: Building Information Modeling, pp. 199–215. Springer, Cham (2018)
15. Stoter, J., Vosselman, G., Dahmen, C., et al.: CityGML implementation specifications for a countrywide 3D data set. Photogram. Eng. Remote Sens. 80, 1069–1077 (2014). https://doi.org/10.14358/pers.80.11.1069
16. van den Brink, L., Stoter, J., Zlatanova, S.: Establishing a national standard for 3D topographic data compliant to CityGML. Int. J. Geogr. Inf. Sci. 27, 92–113 (2013). https://doi.org/10.1080/13658816.2012.667105
17. Zlatanova, S., Stoter, J., Isikdag, U.: Standards for exchange and storage of 3D information: challenges and opportunities for emergency response. Int. Cartogr. Assoc. 2, 17–28 (2012)
18. Kutzner, T., Kolbe, T.H.: CityGML 3.0: sneak preview. PFGK18-Photogrammetrie-Fernerkundung-Geoinformatik-Kartographie, 37. Jahrestagung München, pp. 835–839 (2018)
19. Kolbe, T.H., Gröger, G.: Towards unified 3D city models. In: Proceedings of the Joint ISPRS Commission IV Workshop on Challenges in Geospatial Analysis, Integration and Visualization II, pp. 1–8 (2003)
20. Biljecki, F., Stoter, J., Ledoux, H., et al.: Applications of 3D city models: state of the art review. ISPRS Int. J. Geo-Inf. 4, 2842–2889 (2015). https://doi.org/10.3390/ijgi4042842
21. Chaturvedi, K., Smyth, C.S., Gesquière, G., et al.: Managing versions and history within semantic 3D city models for the next generation of CityGML. In: Abdul-Rahman, A. (ed.) Advances in 3D Geoinformation, pp. 191–206. Springer, Cham (2017)
22. Agugiaro, G.: First steps towards an integrated CityGML-based 3D model of Vienna. ISPRS Ann. Photogram. Remote Sens. Spat. Inf. Sci. 3, 139–146 (2016). https://doi.org/10.5194/isprs-annals-III-4-139-2016
23. Preka, D., Doulamis, A.: 3D building modeling in LoD2 using the CityGML standard. ISPRS Int. Arch. Photogram. Remote Sens. Spat. Inf. Sci. XLII-2/W2, 11–16 (2016). https://doi.org/10.5194/isprs-archives-xlii-2-w2-11-2016
24. NCSI: Oman National Spatial Data Infrastructure Strategy V5.0. National Center for Statistics and Information, Oman (2017)
25. NCSI: Oman National Spatial Data Infrastructure, 1st edn. National Center for Statistics and Information, Oman (2017)
26. Das, A., Chandel, K., Narain, A.: Value of geospatial technology in boosting Oman's economy. In: Oman Geospatial Forum 2017. Oman National Survey Authority, Muscat, pp. 1–74 (2017)
27. Open Geospatial Consortium: OGC City Geography Markup Language (CityGML) Encoding Standard (2012)
28. Stouffs, R., Tauscher, H., Biljecki, F.: Achieving complete and near-lossless conversion from IFC to CityGML. ISPRS Int. J. Geo-Inf. 7, 355 (2018). https://doi.org/10.3390/ijgi7090355
29. Ledoux, H., Arroyo Ohori, K., Kumar, K., et al.: CityJSON: a compact and easy-to-use encoding of the CityGML data model. Open Geospat. Data Softw. Stand. 4, 1–19 (2019). https://doi.org/10.1186/s40965-019-0064-0
30. Biljecki, F., Ledoux, H., Stoter, J.: Improving the consistency of multi-LoD CityGML datasets by removing redundancy, pp. 1–17 (2015)
31. Kensek, K.M.: Building information modeling. Build Inf. Model. 1–285 (2014). https://doi.org/10.4324/9781315797076


32. Biljecki, F., Ledoux, H., Stoter, J.: Generating 3D city models without elevation data. Comput. Environ. Urban Syst. 64, 1–18 (2017). https://doi.org/10.1016/j.compenvurbsys.2017.01.001
33. Abdul Rahman, A., Rashidan, H., Musliman, I.A., et al.: 3D geospatial database schema for Istanbul 3D city model. ISPRS Int. Arch. Photogram. Remote Sens. Spat. Inf. Sci. XLII-4/W16, 11–16 (2019). https://doi.org/10.5194/isprs-archives-xlii-4-w16-11-2019
34. Yao, Z., Nagel, C., Kunde, F., Hudra, G., Willkomm, P., Donaubauer, A., Adolphi, T., Kolbe, T.H.: 3DCityDB - a 3D geodatabase solution for the management, analysis, and visualization of semantic 3D city models based on CityGML. Open Geospat. Data Softw. Stand. 3(1), 1–26 (2018). https://doi.org/10.1186/s40965-018-0046-7

Investigating the Effects of Population Growth and Urban Fabric on the Simulation of a 3D City Model Rani El Meouche1(B) , Mojtaba Eslahi1(B) , and Anne Ruas2 1 Institut de Recherche en Constructibilité (IRC), ESTP Paris, 94230 Cachan, France

{relmeouche,meslahi}@estp-paris.eu 2 LISIS/IFSTTAR, Université de Marne-la-Vallée, 77420 Champs-sur-Marne, France

[email protected]

Abstract. Urbanization induces almost irreversible changes and leads to the phenomenon of urban sprawl and, therefore, an increase in artificial land that affects biodiversity, ecosystems and climate. SLEUTH, a Cellular Automaton (CA) simulation model, is widely used by researchers and urban planners to simulate urban sprawl as a dynamic system. However, SLEUTH, like many other urban simulation models, only considers historical data; the growth of the urban population and the urban fabric is not explicitly contemplated and does not have much effect on the simulation results. In this research, we have added these two parameters to SLEUTH and simulated a 3D city model in order to see the impacts of population growth and types of construction on urban sprawl, and to improve the simulation results. The 3D representation of the urban forecast is based on the results of the 2D urban growth simulation while respecting some restrictions on urbanization, such as the direction of the footprints and the distances to urban entities and geographic features. The aim of this research is not only to produce a 3D city model, but also to use the potential of the third dimension to provide different urban future scenarios that can inform prospective strategies for sustainable urban development. The findings allow a set of different simulations to be proposed that correspond to different land priorities and constraints, providing disparate images of the city of tomorrow for application in urban policy as well as in smart city modelling and management. Keywords: Urban sprawl · CA (cellular automata) model · SLEUTH · GIS · 3D city modelling

1 Introduction

1.1 Urbanization and Urban Growth Modelling

The process of increasing the size of cities is called urbanization. Urbanization occurs mainly due to population growth, rural exodus and lifestyle changes, and often causes irreversible changes. In recent decades, much research has been done on the simulation


of urban growth as a dynamic system. Among the various methods of urban simulation, CA (cellular automata) models are the more commonly used for urban applications due to their simple integration with GIS (Geographic Information System) and RS (Remote Sensing) data. A cellular automaton is a bottom-up, discrete, dynamic spatial model. In CA models, the transition rules represent the process of changing the state of each cell according to its current state and the states of the cells in its neighborhood [1]. CA models can realistically simulate urban sprawl mechanisms, urban planning theories and urbanization effects [2–4].

Here, we used the SLEUTH urban growth model. SLEUTH is a pattern-based model that uses cellular automata and terrain mapping, and it is widely used to simulate urban growth [5, 6]. The acronym SLEUTH is derived from its input maps: Slope, Land use, Exclusion, Urban, Transportation and Hillshade. The SLEUTH model has evolved over the years and various versions have been proposed to improve the results and reduce the complexity of the calibration and forecasting phases. Goldstein (2004) introduced a Genetic Algorithm (GA) as a new calibration mechanism in place of the traditional brute-force search, which reduces calibration time [7]. Dietzel and Clarke (2007) developed the Optimal SLEUTH Metric (OSM), which processes thirteen parameters to find the best coefficients in calibration mode [8]. In 2010, Guan and Clarke developed a parallel version of SLEUTH (pSLEUTH), which uses an open-source general-purpose parallel raster processing programming library (pRPL) [9]. This improves the computational performance of the model by reducing the computation time of the calibration process with multiple processors. Jantz et al. (2010) used the SLEUTH-3r model to execute a forecasting scenario composed of sub-periods with different spatial dynamics; their simulations can show the contribution of each urban growth pattern according to the scenario hypothesis [10].

In the SLEUTH calibration process, several historical maps, such as urban and transportation maps, are needed; the rhythm of growth is derived from these historical data and the model produces forecast urban maps that follow the same tendency as today. In addition, its results are limited to raster data indicating where urbanization is expected to occur, which is difficult for decision makers to interpret. In this research, we have added more parameters and have explicitly considered the population and building classes to improve the model. Furthermore, we have provided a 3D representation of the model while respecting some restrictions on urbanization, such as the orientation of the buildings towards the roads and the distances from urban entities and geographic features.

1.2 3D City Modelling

In recent years, governments, municipalities and companies have shown great interest in building virtual 3D models of cities. A 3D city model can be used in different aspects of urban planning and management in a smart city. A 3D urban model can support risk management, such as emergency response and evacuation planning [11], as well as forecasting seismic damage [12] and flooding [13]. In addition, it is used for smart city mobility, operational cost savings, increased resilience and improved sustainability.
3D city model simulations can be used to estimate solar radiation, energy demand and energy efficiency, cast shadows of urban features, and noise emission in a restricted environment, and to perform lighting simulations [14].


To select the most appropriate 3D modelling technique, factors such as data availability, accuracy, efficiency, speed, human capital and costs must be considered. There are many different techniques for producing a 3D city model, such as 3D extrusion of urban footprints, and 3D reconstruction and data integration, which combine photogrammetry or laser scanning with GIS data. In this research, 3D buildings are created by giving a third dimension to the 2D building footprints. The third dimension represents the height of a building, obtained from the urban fabric scenarios according to the type of building and the population density.

In this research, we aim to provide different scenarios of future urban growth resulting from today's urban planning that conform to different priorities and land constraints, so that they can inform prospective development strategies for sustainable cities, which are the foundation of smart city development. In the next section, the study area is presented. The process and methodology are defined in Sect. 3. In Sect. 4, the urban growth simulation results are shown and discussed, and a 3D visualization of the urban growth model is provided. The paper is concluded in Sect. 5.

2 Study Area

In this paper, we present the application of the model in a small study area called Rieucros (43°05′07″N, 1°46′04″E), a rural community located in the department of Ariège, south of Toulouse, France. We selected this area because of its small scale, which makes it less complicated to model and to study the effects and contributions of various factors, and because a significant increase in both the urbanized area and the population growth rate has been observed over the last two decades. The study area covers 400 hectares with a population of 686 people (INSEE, the French national institute of statistics and economic studies, 2016). We have used a geospatial database and GIS to create the input maps of SLEUTH. The input maps are raster data of 100 × 100 pixels, where a cell corresponds to 20 m × 20 m (~400 m²) on the ground. For this research, we have created the urban, land use, excluded and transportation maps from the BD TOPO and BD ORTHO databases of IGN (the French national institute of geographic information and forestry) for the year 2017. The slope and hillshade maps are derived from the RGE ALTI Digital Elevation Model (DEM) with a spatial resolution of 5 m, provided by IGN. We have calculated the compound annual population growth rate (extracted from INSEE) and the average population for the coming years, and have defined a building classification as supplementary parameters to add to SLEUTH in order to achieve more reliable results.

3 Model Procedure and Methodology

In this research, the data of 2017 are used to simulate the urban growth forecast for 2050. In the SLEUTH model, a growth cycle is the basic unit of execution; a simulation is therefore composed of 33 growth cycles (from 2017 to 2050). In the proposed method, an urban growth scenario considering environmental constraints, e.g. waterbodies, vegetation areas and forest, is defined. As discussed before, the


rhythm of growth in SLEUTH is based on historical data; the impacts of population growth and building types, and consequently the resulting density, are not explicitly considered in this model. Furthermore, the SLEUTH results are limited to raster data representing the growth areas. To overcome these limitations, two additional data layers, i.e. building types and population, are added to the model, and different sprawl scenarios are defined as urban fabric scenarios. Then, 3D representations of the urban fabric scenarios are produced considering some urban constraints. Figure 1 illustrates the model procedure.

Fig. 1. Method procedure

3.1 SLEUTH Urban Growth Simulation

Structure of the SLEUTH Model. SLEUTH, as a cellular automaton model, is based on a probabilistic and self-adaptive process with only two possible cell states, urban or non-urban. The model consists of three processes: calibration, forecasting and self-modification. Four spatial growth rules determine and update the growth: spontaneous growth, new spreading centre growth, edge growth and road-influenced growth. These rules are controlled by five coefficients, i.e. the dispersion, breed, spread, slope and road gravity coefficients, which are specified in the calibration process. In the forecasting process, the model is initialized with the best-fit growth coefficient values derived during calibration and generates an urban growth map for each growth cycle. A growth cycle begins by giving a unique value to each of the coefficients, and then each growth rule is applied and evaluated. A self-modification process checks whether the growth rate exceeds or falls below a specific critical threshold.


Attraction-Based Scenario. Using the composition of the input layers, we enable the definition of an attraction-based scenario for the model, which can improve its overall performance by allowing the inclusion of growth attractors as well as the protection of desired zones. The basic excluded maps have pixel values ranging from zero to 100. In our model, we have provided a combined excluded/attractive map in which the pixel value 50 is defined as neutral. This makes it possible to define zones that are less or more desirable for urbanization by giving them values above or below 50, respectively. Therefore, the value 100 indicates a 100% protected area and the value zero indicates 100% attractiveness.

Urban Fabric Scenario. We have defined the urban fabric scenarios by integrating the type of buildings and the population into the previous results. We have classified the existing buildings into six different residential categories, considering their height and configuration. This classification is based on the classification of building types in a city provided by the Department of Planning and Environment of the Government of Australia. Using demographic information extracted from INSEE, the expected population growth is estimated as another parameter for model integration. Next, fictitious urban fabric scenarios are assumed to better understand how a simulated area can be used and for how many residents. Based on the results, a modified urban fabric scenario has been defined to enrich the simulations. In our approach, by defining the attraction-based scenario, we have tried to answer the question of where urban expansion is likely to occur given the environmental constraints. The classification of buildings and the integration of building types into the model aim to answer the question of what kind of buildings may be built in the grown area. By integrating demographic data, we can answer how many inhabitants can be accommodated in a simulated urban growth area, or how many buildings are needed to respond to the estimated increase in population.

3.2 3D Representation of the Prospective Urban Growth Simulation

In this part, the purpose is to generate a 3D visualization of the simulated urban growth by creating 3D building representations from the pixels. To do so, the pixels are first transformed from raster data into building footprints. Based on the building classification, the current buildings are classified and an average surface for each type of building is calculated. Using topographic objects such as buildings, rivers, excluded areas and the current buildings, a set of constraints is defined. The building footprints are then created based on these constraints as well as on the orientation of the buildings towards the road. The heights of the building footprints are calculated based on the probable heights of the adjacent neighbours, considering the probabilities estimated in the urban fabric scenarios.

4 3D Urban Growth Simulation

4.1 Attraction-Based Environmental Protection Scenario

An attraction-based scenario is defined by altering the excluded map. As mentioned in Sect. 3, each pixel of the excluded map has a value between 0 and 100. The value 100


indicates 100% protection, the value 50 is neutral and pixels with the value zero are free zones to build on. Therefore, areas with values over 50 are resistant to construction and those under 50 are attractive. An exclusion/attraction map is generated based on the data of 2017. Four concentric zones with different attraction rates create an attraction force towards the centre. In addition, the areas within a distance of seven pixels (~140 m, obtained experimentally) around water surfaces are considered attraction areas for dwellings. The excluded areas considered in this scenario include remarkable buildings, cemeteries, airfields and sports grounds, railway stations, activity areas (administrative, culture and leisure, education, water management, industrial or commercial, health, sports and transport), water surfaces, national parks and closed forest areas (woodland, closed coniferous forest, closed deciduous forest, mixed closed forest and tree areas). Figure 2 illustrates the exclusion/attraction map. The concentric zones have radii of 0.2 km, 0.4 km, 0.6 km and 0.8 km from the centre, and their corresponding pixels receive the values 10, 20, 30 and 40 respectively; the remaining area receives the value 50. The historical maps of transportation and urban area for the years 2000, 2008, 2012 and 2017 are used to calibrate the model.
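As an illustration of how such an exclusion/attraction raster could be assembled, the following minimal numpy/scipy sketch sets a neutral value of 50, writes the four concentric attraction rings around the centre and adds the ~140 m attraction buffer around water; the water and protected-area masks are hypothetical placeholders, not the actual IGN layers.

```python
import numpy as np
from scipy.ndimage import binary_dilation

# Minimal sketch: 100 x 100 exclusion/attraction raster with 20 m cells.
size, cell = 100, 20.0
excl = np.full((size, size), 50, dtype=np.uint8)      # 50 = neutral everywhere

# Concentric attraction zones around the town centre (values 10-40).
cy, cx = size // 2, size // 2
yy, xx = np.mgrid[0:size, 0:size]
dist_m = np.hypot(yy - cy, xx - cx) * cell
for radius_m, value in [(800, 40), (600, 30), (400, 20), (200, 10)]:
    excl[dist_m <= radius_m] = value                  # inner rings overwrite outer ones

# Attraction buffer of ~7 cells (~140 m) around water surfaces.
water_mask = np.zeros((size, size), dtype=bool)       # placeholder layer
near_water = binary_dilation(water_mask, iterations=7) & ~water_mask
excl[near_water] = np.minimum(excl[near_water], 40)   # make these cells attractive

# Fully protected areas (remarkable buildings, water, closed forests, ...).
protected_mask = np.zeros((size, size), dtype=bool)   # placeholder layer
excl[protected_mask] = 100
```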

Fig. 2. Exclusion/attraction input map

To obtain the best-fit coefficients for forecasting urban growth, the model is first run in calibration mode. To test and evaluate the model, we first run it with the input maps of 2000 to simulate the growth for 2017. The results show more than 53% of fits, which is acceptable given the scale of the study area and the size of the pixels. Afterwards, the model is used in forecast mode to produce urban growth maps for 2050. The simulation results for 2050 are illustrated in Fig. 3.

4.2 Classifying the Type of Building

The building classification is done in two parts, considering the land use and the urban tissue. First, according to the observations of the study area, two building classes are defined: single dwellings and low-rise housing. To do that, the information


Fig. 3. Urban growth simulated results for 2050

of undifferentiated buildings derived from the BD TOPO database of IGN is used. The number and height of the buildings are extracted and an average height for each type is calculated. Then, based on the average height of the observed buildings and with respect to the heights of neighbouring buildings and the urban fabric scenarios, an appropriate height is given to the newly simulated areas to represent the new buildings (see Sect. 4.10).

4.3 Estimating the Population Growth

The SLEUTH simulation results need to be evaluated against the population density. To integrate the population into our model, the compound annual population growth rate must first be estimated. The average population growth rate estimated for the years between 1999 and 2011 is 1.70% per year. Using this growth rate, we forecast the population growth; projected to 2050, it corresponds to an increase of about 77% (a short projection sketch is given after the scenario list below). We also estimate the average number of residents in each type of building. These data are used to create the urban fabric scenarios.

4.4 Urban Fabric Scenario

Having the building classes, the simulated urban growth and the estimated mean population, we rate the suitable growth cycles to achieve the desired urban fabrics. The purpose of the proposed model is to compare the determinants of urbanization and its measurement in five different cases of urban dispersion, as follows:

1. Sprawl urban: the first scenario considers that all new urban patches are filled with single dwellings.
2. Medium dense urban: this scenario assumes 80% of the newly simulated urban areas are single dwellings and 20% low-rise housing.
3. Medium/high dense urban: this scenario considers 50% single dwellings and 50% low-rise housing.
4. High dense urban: in this scenario, 30% of the buildings are single dwellings and 70% low-rise housing.
5. Too high dense urban: the last scenario indicates 100% low-rise housing.
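The population projection of Sect. 4.3 is simple compound-growth arithmetic; the sketch below reproduces it under the assumption that the 2016 INSEE population of 686 is compounded at 1.70% per year up to 2050 (the exact base year used by the authors is not stated).

```python
# Compound annual growth projection for the study area (illustrative sketch).
p0, rate = 686, 0.017            # assumed 2016 population and 1.70% annual growth
years = 2050 - 2016
p_2050 = p0 * (1 + rate) ** years
increase = (p_2050 - p0) / p0    # fractional increase, ~0.77 (i.e. ~77%)
print(f"Projected 2050 population: {p_2050:.0f} (+{increase:.0%})")
```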


These scenarios are fictive and do not correspond to reality, but they help to better understand how this land could be used and how many inhabitants could live in these new areas. Table 1 illustrates the increased population per scenario.

Table 1. Population growth in the primary urban fabric scenarios for 2050 (increased population per urban fabric scenario).

  Sprawl urban (100% single dwelling):                          1 938  (283%)
  Medium dense urban (80% single dwelling, 20% low-rise):       2 132  (311%)
  Medium/high dense urban (50% single dwelling, 50% low-rise):  2 423  (353%)
  High dense urban (30% single dwelling, 70% low-rise):         2 616  (381%)
  Too high dense urban (100% low-rise housing):                 2 907  (424%)

Table 1 shows that all scenarios exceed the 77% population growth estimated for 2050. Based on these results, the urban fabric scenarios are modified to obtain more realistic ones; we therefore try to find out which growth cycle matches the desired urban fabric better. Three scenarios, consisting of low dense, medium dense and medium/high dense urban fabrics, are obtained by taking their 23rd, 18th and 13th growth cycles respectively. The medium/high dense urban fabric can accommodate nearly the desired population growth, while the two other urban fabric scenarios are more sprawled (see Table 2).

Table 2. Population growth in the final urban fabric scenarios for 2050 (increased population per urban fabric scenario).

  Low dense urban (100% single dwelling), 23rd growth cycle:                       1 200  (175%)
  Medium dense urban (80% single dwelling, 20% low-rise), 18th growth cycle:       1 012  (148%)
  Medium/high dense urban (50% single dwelling, 50% low-rise), 13th growth cycle:    758  (110%)

4.5 Modifying the SLEUTH Output to Create the Building Footprints

Up to now, we have integrated the exclusion/attraction area, the type of buildings and the estimated population growth into our model. We have simulated the 2D urban growth maps by defining the urban fabric scenarios, and we know which type of building may be built in which area of the obtained maps. In order to create the 3D representation of the


buildings, it is necessary to generate the building footprints. Therefore, we need to consider further parameters such as the distances from urban entities, geographical features (e.g. water bodies, roads and forest) and current buildings. The SLEUTH outputs are non-geo-referenced raster data containing three types of pixels, representing the current urban area, the new urban area and null pixels. Therefore, we first have to geo-reference this raster data with respect to our vector database and then convert it to vector data.

4.6 Positioning and Division of the Building Footprints

The generated polygons are rotated along the closest road section (see Fig. 4). In this step, the roads are divided into small sections and their directing coefficient is calculated. The orientation angle of each road section is then calculated with respect to the horizontal axis, and the squares close to this road section are oriented according to this angle. The transformation is made according to the following equation, where the angle θ is measured in the counter-clockwise direction:

X = Xc + x·cos θ − y·sin θ
Y = Yc + x·sin θ + y·cos θ      (1)

where (x, y) are the coordinates of the corners expressed in the local coordinate system and (X, Y) are their counterparts in the global coordinate system. Afterwards, we change the signs of the cosine and sine terms for the coordinates of the four corners.
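A minimal numpy sketch of the corner transformation in Eq. (1) is given below; the function name and the sample square are illustrative only.

```python
import numpy as np

def rotate_corners(corners_local, xc, yc, theta):
    """Rotate square corners from the local frame into the global frame (Eq. 1).

    corners_local: (N, 2) array of (x, y) corner coordinates in the local system.
    (xc, yc): position of the local origin in the global system.
    theta: orientation angle of the nearest road section, counter-clockwise, in radians.
    """
    x, y = corners_local[:, 0], corners_local[:, 1]
    X = xc + x * np.cos(theta) - y * np.sin(theta)
    Y = yc + x * np.sin(theta) + y * np.cos(theta)
    return np.column_stack([X, Y])

# Example: a 20 m x 20 m cell footprint aligned to a road section at 30 degrees.
square = np.array([[0, 0], [20, 0], [20, 20], [0, 20]], dtype=float)
print(rotate_corners(square, xc=1000.0, yc=2000.0, theta=np.radians(30)))
```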

Fig. 4. Orientation of a polygon, R1 and R2 are the local and overall references respectively

We have divided each polygon into four smaller squares to reduce the risk of losing surface area when the constraints are applied. In this case, if the constraints drive the model to delete a polygon, the algorithm deletes only the small square affected by the restriction instead of the whole polygon (see Fig. 5). All these steps are handled by our algorithm and are done automatically.


Fig. 5. Subdividing a polygon into smaller squares

4.7 Configuration of the Building Footprints

The adjustment and positioning of the new buildings respect the existing buildings. The polygons may overlap after being oriented and divided. Furthermore, the adjustment and positioning of the new buildings follow the layout of the existing buildings, and the distances between a polygon and the different land occupation entities are defined based on the existing buildings. Resolving overlaps and respecting the distances to urban units may lead to the elimination of some parts of the polygons. Two types of constraints are taken into account as land occupation entities:

• Continuous constraints: they have a linear distribution in space and include vegetation, water bodies, roads and railways.
• Discrete constraints: they consist of points or small areas such as remarkable buildings, cemeteries, airfields, sports grounds, activity areas, industrial or commercial areas and existing buildings.

In order to consider the discrete constraints in the model, we measure the distances of the current buildings from each other and from the other discrete constraints and calculate the average distance. Then, by defining a buffer equal to this average distance, the obtained distances to the nearest discrete constraints are applied to each polygon. These buffers might intersect each other or other entities, and such intersections must be removed. When only a part of a polygon intersects a buffer, the previous subdivision of the polygons helps the model not to lose the polygon completely. Furthermore, we have defined a threshold for the intersection of a square with a buffer equal to 30% of the square area: if a buffer overlaps a square by more than 30%, that square is removed.
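The buffer logic for the discrete constraints can be sketched with a generic geometry library. The snippet below uses Shapely, which is not mentioned by the authors, so it is only an illustration of the 30% overlap rule and of how a continuous-constraint buffer would be clipped; all geometries and distances are invented.

```python
from shapely.geometry import Point, Polygon

# Hypothetical example: one 10 m x 10 m sub-square and one discrete constraint.
square = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
constraint = Point(12, 5)          # e.g. the centroid of an existing building
avg_distance = 6.0                 # average spacing measured from current buildings

# Buffer the discrete constraint by the measured average distance.
buffer_zone = constraint.buffer(avg_distance)

# Remove the square only if the buffer covers more than 30% of its area.
overlap_ratio = square.intersection(buffer_zone).area / square.area
keep_square = overlap_ratio <= 0.30
print(f"overlap = {overlap_ratio:.0%}, keep = {keep_square}")

# A continuous constraint (road, river, ...) would instead be applied by clipping:
# square.difference(continuous_buffer) removes the intersecting part of the polygon.
```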


The process for the continuous constraints is based on a double geo-processing buffer:

• First, we measure the distance from the nearest existing building to each continuous constraint and then build a buffer of ten times this distance. We assume that all the buildings close to the sections of the continuous constraints lie within this distance.
• Next, the average distance of these buildings from the continuous constraint is calculated. This average is considered the minimum distance of new buildings to these constraints. When applying the continuous constraints to a polygon, the algorithm builds a second buffer with a distance equal to this average distance and removes the intersection of this buffer with the polygon.

4.8 Assembling the Polygonal Components

In order to generate the building footprints, the divided squares must be assembled according to the type of building; the building footprints thus consist of the surfaces remaining from the small squares. Considering the building types and the size of the polygons, a maximum surface (Smax) is defined for each type of building footprint. The Smax values for single dwellings and low-rise housing are calculated as 156 m² and 256 m² respectively. We have defined IDs for each polygon to help with the assembly process, in which the algorithm checks that the total area does not exceed Smax. If the area of the assembled squares is less than Smax, the whole polygon represents a building. Polygons whose surfaces exceed Smax are subdivided into smaller polygons, corresponding to the decomposition of the pixel into sub-pixels. In this process, we consider the small squares that belong to the same pixel. According to Smax, we can gather them as needed, either on the left or on the right of the set of small squares of the sub-pixel, i.e. LU (Left/Up) with LD (Left/Down) and RU (Right/Up) with RD (Right/Down). The algorithm also requires that the width of a building is located on the roadside so that it has access to the nearest road (see Fig. 6).

Fig. 6. Assembling sub-pixels respecting the roads

4.9 Building Footprint Generation and Positioning of the Building Representations

In the assembly process, we gathered the remaining components of each polygon and created new polygons. Different possible types of building footprints are provided


based on the urban fabric scenarios. In this step, we apply to each polygon the appropriate erosion that respects the defined area of the building classes, the distance between the new buildings and the Smax, and thus forms the building footprints (see Fig. 7).

Fig. 7. Erosions performed on polygons according to the type of building. The half-pixel case is not intended for low-rise housing because, given the average area of existing buildings of this type, it cannot provide the required surface.

4.10 The 3D Visualization of the City

Calculating a suitable height for these footprints according to the urban fabric scenario (urban sprawl considering the population growth and the types of buildings) leads to the 3D representation of the prospective urban model. To determine the possible heights of the new buildings and assign an appropriate height to each footprint, the height probabilities of each polygon are calculated according to the types of the neighbouring buildings. For the scenarios in which mixed height values are necessary, we use an algorithm that combines a random component with a statistical interpolation. The urban fabric of the study area is composed of two types of building: single dwellings and low-rise housing. As mentioned in Sect. 4.2, the average height of each type is calculated. In this step, the algorithm sorts the buildings in ascending order of their surfaces (SB1 < SB2) and calculates the percentages of combination in


the different scenarios for each building type B1 and B2, denoted Prs1 and Prs2 respectively (see Fig. 8). P1 and P2 indicate the average height probabilities of each building type; these parameters are calculated from the heights of the nearest current buildings.

Fig. 8. Algorithm of calculating the probability of the height for each building according to the building types and urban fabric scenario

Two radii, r1 and r2, are calculated based on the distance to the nearest neighbour of each existing building by applying a quintile classification. The distances between the new building and the current buildings are calculated (DIS), then the inverse distances (IDIS) and the sum of the inverse distances (SIDIS) are computed. The new buildings are classified into three clusters according to the distance to their neighbours:

• New buildings that have at least one neighbour among the current buildings within the small circle (r1)
• New buildings that have at least one neighbour among the current buildings within the ring between the small circle (r1) and the large circle (r2)
• New buildings that have no neighbours among the current buildings within the large circle (r2)

The impact rate is calculated for each type of building, which affects the type of the new building (a building with height Hi). Then, the total probability of each type associated with each building is deduced, which leads to a new Pi signifying the probability of a building with height Hi. Next, the initial percentage (Pri) of each type of building for the variable percentage (Pr) is calculated. The final step is the 3D representation of the urban growth model. To do that, a Digital Elevation Model (DEM) is created using the BD TOPO (IGN) altitudes and, using the calculated heights, an extrusion of the various layers including the new buildings is applied. Figure 9 illustrates the results displayed in ArcScene.
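The inverse-distance weighting used above to derive height probabilities can be illustrated as follows; the variable names (DIS, IDIS, SIDIS) follow the text, while the neighbour heights and distances are invented sample values.

```python
import numpy as np

# Sketch of the inverse-distance weighting used to derive height probabilities
# for a new building from its nearest existing neighbours (invented sample data).
neighbour_heights = np.array([3.5, 3.5, 7.0, 3.5])   # heights Hi of current buildings (m)
DIS = np.array([12.0, 20.0, 35.0, 50.0])             # distances to the new footprint (m)

IDIS = 1.0 / DIS                  # inverse distances
SIDIS = IDIS.sum()                # sum of the inverse distances
weights = IDIS / SIDIS            # influence of each neighbour on the new building

# Probability of each candidate height value for the new building.
for h in np.unique(neighbour_heights):
    p = weights[neighbour_heights == h].sum()
    print(f"P(height = {h} m) = {p:.2f}")
```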


Fig. 9. (a) 2D simulated urban map for 2050, (b) ortho-photo 2017, (c) 3D representation of the observed city in 2017, (d) 3D representation of the city for 2050

5 Conclusion

The urban sprawl phenomenon is a major challenge for authorities and urban planners, and urban simulation techniques aim to address the various problems of urban growth modelling. Most urban growth models are based on historical data and simulate growth with the same tendency as today; furthermore, they mostly consider exclusion areas but rarely integrate attraction zones into the simulation. In this research, we have generated an attraction-based scenario that integrates both exclusion and attraction areas in the model. We have included the impact of population growth and building types in the urban growth simulation model. This leads to urban fabric scenarios that aim to compare the determinants of urbanization and its measurement in different urban sprawl scenarios. These scenarios can also help to understand the difference between scattered and dense growth. We used the potential of 3D modelling to provide different future urban scenarios that can inform prospective strategies for sustainable urban development. We have developed a model that can simulate the footprints of buildings in new urban areas and calculate the heights of buildings according to the height probabilities in the area where each new building is located. We have created a primary 3D model of the simulated area by considering some urban constraints, such as the distance to urban units, and by respecting the shape and height of existing buildings. In the created 3D urban growth model, the buildings are illustrated as block models with flat roofs (similar to LoD1 of CityGML). The


provided 3D model aims to help better understand the simulation results and to facilitate the interpretation of the SLEUTH simulation. The 3D urban growth models can be used in various applications such as pollution estimation, energy demand estimation, solar radiation estimation and traffic management.

References

1. Schiff, J.L.: Cellular Automata: A Discrete View of the World, p. 40. Wiley (2011). ISBN 9781118030639
2. Clarke, K.C., Gaydos, L.J.: Loose-coupling of a cellular automaton model and GIS: long-term growth prediction for the San Francisco and Washington/Baltimore. Int. J. Geogr. Inf. Sci. 12, 699–714 (1998)
3. Antoni, J.P., Vuidel, G., Omrani, H., Klein, O.: Geographic cellular automata for realistic urban form simulations: how far should the constraint be contained? (2019). https://doi.org/10.1007/978-3-030-12381-9_7
4. Eslahi, M., El Meouche, R., Ruas, A.: Using building types and demographic data to improve our understanding and use of urban sprawl simulation. In: Proceedings of the International Cartographic Association, vol. 2, p. 28 (2019). https://doi.org/10.5194/ica-proc-2-28-2019
5. Clarke, K.C.: A decade of cellular urban modeling with SLEUTH: unresolved issues and problems. In: Brail, R.-K. (ed.) Planning Support Systems for Cities and Region, pp. 47–60. Lincoln Institute of Land Policy, Cambridge (2008)
6. Project Gigalopolis (2018). http://www.ncgia.ucsb.edu
7. Goldstein, N.C.: Brains vs. brawn – comparative strategies for the calibration of a cellular automata–based urban growth model. In: Atkinson, P., Foody, G., Darby, S., Wu, F. (eds.) GeoDynamics. CRC Press, Boca Raton (2004)
8. Dietzel, C., Clarke, K.C.: Toward optimal calibration of the SLEUTH land use change model. Trans. GIS 11, 29–45 (2007). https://doi.org/10.1111/j.1467-9671.2007.01031.x
9. Guan, Q., Clarke, K.C.: A general-purpose parallel raster processing programming library test application using a geographic cellular automata model. Int. J. Geogr. Inf. Sci. 24(5), 695–722 (2010)
10. Jantz, C.A., Goetz, S.J., Donato, D., Claggett, P.: Designing and implementing a regional urban modeling system using the SLEUTH cellular urban model. Comput. Environ. Urban Syst. 34(1), 1–16 (2010). https://doi.org/10.1016/j.compenvurbsys.2009.08.003
11. Tashakkori, H., Rajabifard, A., Kalantari, M.: A new 3D indoor/outdoor spatial model for indoor emergency response facilitation. Build. Environ. 89, 170–182 (2015)
12. Christodoulou, S., Vamvatsikos, D., Georgiou, C.: A BIM-based framework for forecasting and visualizing seismic damage, cost and time to repair. In: Proceedings of the European Conference on Product and Process Modelling, Cork, Ireland, 14–16 September (2011)
13. Varduhn, V., Mundani, R.P., Rank, E.: Multi-resolution models: recent progress in coupling 3D geometry to environmental numerical simulation. In: 3D Geoinformation Science, pp. 55–69. Springer, Cham (2015)
14. Biljecki, F., Stoter, J., Ledoux, H., Zlatanova, S., Çöltekin, A.: Applications of 3D city models: state of the art review. ISPRS Int. J. Geo-Inf. 4, 2842–2889 (2015). https://doi.org/10.3390/ijgi4042842

Segmentation-Based 3D Point Cloud Classification on a Large-Scale and Indoor Semantic Segmentation Dataset

Ali Saglam(B) and Nurdan Akhan Baykan

Computer Engineering, Konya Technical University, Konya, Turkey
{asaglam,nbaykan}@ktun.edu.tr

Abstract. This paper presents a segmentation-based classification technique for 3D point clouds. The technique is supervised and needs ground-truth data for the training process. In this work, the Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset has been used for the classification of points with a segmentation pre-processing step. The dataset consists of a huge number of points and has semantic ground-truth segments (structures and objects). The main problem in this study is to classify raw points according to the predefined objects and structures. For this purpose, each semantic segment in the training part is first segmented separately by a novel, successful segmentation algorithm. The features extracted from the sub-segments resulting from this segmentation are used to train a classifier, and a trained model is obtained. Finally, the raw data reserved for testing are segmented using the same segmentation parameters as used for training, and the resulting segments are classified using the trained model. The method is tested with two classifiers, Support Vector Machine (SVM) and Random Forest (RF), and with different segmentation parameters. The quantitative results show that RF gives a very useful classification output for such complicated data.

Keywords: Point cloud segmentation · Classification · Feature extraction · Support Vector Machine · Random Forest

1 Introduction

Gathering meaningful information from 3D point clouds is a very important topic in the remote sensing, robotics, and computer vision fields [1]. Point cloud classification is a key task for many applications such as object detection [2, 3], urban mapping [4, 5], vegetation mapping [6], defining building structures [7], city modeling [8] and the interpretation of complex 3D scenes [9]. For computer vision systems, 3D point cloud data have some advantages and disadvantages compared to 2D range data [10]. The most important advantage is the fact that 3D point clouds give much information about the spatial geometric features of the surfaces. For this reason, the spatial features (x, y, and z coordinate values) of the points


are preferred over radiometric (spectral) features for segmentation and classification processes. Many point cloud segmentation algorithms use only the spatial features of the points, because spectral features can be misleading due to ambient light and are less distinctive for structures and objects. The main disadvantage of 3D point clouds is the difficulty of processing them: they are usually very large and scattered, and their raw form is not rasterized into a regular grid structure, unlike 2D or 2.5D range data. Since there is no spatial adjacency order in a raw point cloud, additional data organization methods are used in point cloud processing applications, such as the k-d tree organization [11] to search for a point and its nearest neighbors, and the octree organization [12] to rasterize the data into a regular structure by sampling it into equal-sized 3D cubic volumes (voxels) [13].

In the computer vision field, segmentation is a very useful intermediate stage before classification [14]. Segmentation groups the data elements according to their similarity to reduce the unnecessary processing time spent on similar data elements and to supply more distinctive features for the next stages. In the literature, many point cloud segmentation methods in several categories have been proposed. The voxel-based region growing method proposed recently in the study [15], named "Boundary Constrained Voxel Segmentation (BCVS)", gives accurate results in low processing time compared to some well-known methods in the literature. For this reason, the octree voxelization as a pre-processing step and the BCVS as a mid-level process have been used in this work. In this paper, the points grouped into segments are classified with respect to the segment features preferred in this study.

In the experiments, a large-scale indoor semantic segmentation dataset [16], named "the Stanford Large-Scale 3D Indoor Spaces (S3DIS)", is used. Each point in the dataset has spatial (values on the x, y, and z coordinates) and spectral (RGB color) features. The dataset also has semantic ground-truth segments labeled with 12 classes, which consist of real-life structures and objects, plus clutter. We divided the dataset into two parts: the training part, which composes the larger percentage of the dataset, and the test part. In the scope of this work, the semantic ground-truth segments are trained and tested using both the Support Vector Machine (SVM) [17] and Random Forest (RF) [18] classifiers separately to evaluate the success of the classifiers and the features. The SVM and RF classifiers are widely used for the classification of 3D point clouds in the literature [1, 3, 19, 20]. The evaluation results are given in this paper as the rate of correctly detected semantic segments to the number of all semantic segments in the test part for each classifier.

The main problem of this work is the classification of raw points. A segmentation-based classification method to solve this problem is presented in several stages. At first, each semantic ground-truth segment in the training part is segmented into sub-segments using the BCVS method, following the octree voxelization. Several segmentation parameters, which tune the segmentation level between over- and under-segmentation, are tested. Secondly, the features of the sub-segments are used for training, and the trained model is created for the selected segmentation parameters.
Third, the raw test data are segmented using the same segmentation method and parameters as in the training phase. Finally, the segments of the test


part are classified using the model. As a result of the segment classification, each point in a segment is labeled with the class label assigned to the segment it belongs to. To evaluate the correctness of the classification results quantitatively, the precision, recall, F1 score, and overall accuracy measurements [14, 15, 21] are used. The overall accuracy indicates the success of the classification in terms of correctly detected points over the complete test data, while precision, recall and F1 score indicate the success for each class separately. The overall F1 score is calculated using the mean precision and recall values and gives a success measure for the complete test data. Unlike the overall accuracy, the overall F1 score evaluates the precision and recall of every class equally, i.e. classes with few points and classes with many points have an equal impact.
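The point-wise evaluation described here can be sketched as follows (a minimal numpy implementation, assuming hypothetical integer label arrays for the test points; it is not the authors' evaluation code).

```python
import numpy as np

def evaluate(y_true, y_pred, classes):
    """Per-class precision/recall/F1, overall accuracy (OA) and overall F1 (OF1).

    y_true, y_pred: 1-D integer label arrays over the test points (hypothetical inputs).
    """
    oa = float(np.mean(y_true == y_pred))                 # overall accuracy
    precisions, recalls = [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))        # true positives for class c
        detected = np.sum(y_pred == c)                    # points detected as class c
        reference = np.sum(y_true == c)                   # points of class c in the reference
        p = tp / detected if detected else 0.0
        r = tp / reference if reference else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        print(f"class {c}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
    mp, mr = np.mean(precisions), np.mean(recalls)
    of1 = 2 * mp * mr / (mp + mr) if (mp + mr) else 0.0   # harmonic mean of the averages
    return oa, of1
```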

2 Methodology

2.1 Segmentation of Raw Data

The segmentation of raw 3D point cloud data is a challenging problem because of its unorganized and noisy structure. On the other hand, 3D data has local geometric surface features that facilitate the segmentation process. Because radiometric features such as color, lightness and intensity are affected by the ambient light and material properties, most segmentation methods in the literature use local geometric features such as local normal vectors, tangent vectors and the angular differences between them [13, 15, 22]. In this work, the BCVS method has been used in the segmentation stage because of its success in both accuracy and processing time [15]. The method uses octree voxelization and graph structures to obtain local neighboring and contiguity connections. After the voxelization, the connections between adjacent voxels are weighted and added to an (initially empty) list to obtain the weighted connection list. To extract the local surface gradients between adjacent voxels, the method uses the surface inclinations and the spatial barycenters of the local point groups contained in the voxels.

To obtain the surface inclination of a point group, Principal Component Analysis (PCA) is a commonly used method. PCA extracts the eigenvalues λ1 ≥ λ2 ≥ λ3 and their corresponding eigenvectors e1, e2 and e3 from the covariance matrix based on the distribution of the points along the spatial x, y, and z axes. The normal vector n is the eigenvector e3 that corresponds to the smallest eigenvalue λ3 and represents the surface inclination. The angle θ between two normal vectors is one of the curvature angles the BCVS method uses. The other angles the method uses, α1 and α2, are the angles between the adjacent local surfaces and the tangent vector between the two barycenters. In the study [15] that proposes the BCVS method, the weight value of a connection is calculated according to Eq. 1:

min(θ, (α1 + α2)/2)      (1)

After the weighted connections are added to the connection list, the connections in the list are sorted in ascending order of weight. Initially, each voxel is treated as a separate segment.
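Before turning to the merging step, the weight of Eq. (1) can be illustrated with a short numpy sketch: the normals come from a PCA of each voxel's points and α1, α2 are measured against the barycenter-to-barycenter tangent, following the description above. This is only one reading of the published formula, not the authors' implementation, and the exact angle conventions in [15] may differ.

```python
import numpy as np

def surface_normal(points):
    """Normal of a local point group: eigenvector of the covariance matrix
    corresponding to the smallest eigenvalue (PCA)."""
    centered = points - points.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    return eigvecs[:, 0]                      # eigh sorts eigenvalues in ascending order

def angle(u, v):
    """Unsigned angle between two vectors (normals treated as unoriented)."""
    c = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def connection_weight(points_a, points_b):
    """Weight of the connection between two adjacent voxels, Eq. (1)."""
    n_a, n_b = surface_normal(points_a), surface_normal(points_b)
    tangent = points_b.mean(axis=0) - points_a.mean(axis=0)   # barycenter-to-barycenter
    theta = angle(n_a, n_b)
    # Angle between a local surface and the tangent vector: a vector lying in the
    # surface is perpendicular to its normal, hence the pi/2 offset.
    alpha1 = abs(np.pi / 2 - angle(n_a, tangent))
    alpha2 = abs(np.pi / 2 - angle(n_b, tangent))
    return min(theta, (alpha1 + alpha2) / 2)
```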


Starting from the least weighted connection, the segments at the ends of the connection under consideration are evaluated for merging. In the merge evaluation, the boundary voxels between the two segments are paired one-to-one by respecting the weight values of the connections between them and giving priority to the smallest connection. If every connection weight between the paired voxels is less than or equal to the angular segmentation parameter, the segments are merged. Otherwise, the segments at the ends of the next connection in the sorted connection list are taken into consideration for evaluation [15].

2.2 Feature Extraction from Segments

In this work, basic segment features are used, such as the number of points in the segment, the average height of the points (mean z value), the difference between the minimum and maximum height, the eigenvalues, the scaled eigenvalues, the normal vector values, the standard deviations of the points along the eigenvectors and the mean color values. The features are listed in Table 1.

Table 1. The segment features used in this study.

  PointNum               Number of points in the segment
  MeanZ                  Mean z value (average height)
  Height                 Difference between the maximum and minimum z values
  Eigenvalues            λ1, λ2 and λ3
  Scaled eigenvalues     λ1/(λ1+λ2+λ3), λ2/(λ1+λ2+λ3) and λ3/(λ1+λ2+λ3)
  Normal vector values   nx, ny and nz
  Std of e1, e2 and e3   Standard deviations along e1, e2 and e3 (respectively σ1, σ2 and σ3)
  Mean color             Mean color values (if the data used has color features)
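The feature vector of Table 1 can be assembled directly from such an eigen-decomposition; the following numpy sketch (hypothetical input arrays, not the authors' code) computes the geometric and color features for one segment.

```python
import numpy as np

def segment_features(xyz, lab=None):
    """Feature vector of one segment as listed in Table 1 (illustrative sketch).

    xyz: (N, 3) point coordinates of the segment; lab: optional (N, 3) L*a*b* colors.
    """
    centered = xyz - xyz.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    l3, l2, l1 = eigvals                                  # ascending order -> l1 >= l2 >= l3
    normal = eigvecs[:, 0]                                # eigenvector of the smallest eigenvalue
    stds = np.std(centered @ eigvecs[:, ::-1], axis=0)    # spreads along e1, e2, e3

    features = [
        xyz.shape[0],                                     # PointNum
        xyz[:, 2].mean(),                                 # MeanZ
        xyz[:, 2].max() - xyz[:, 2].min(),                # Height
        l1, l2, l3,                                       # eigenvalues
        *(np.array([l1, l2, l3]) / (l1 + l2 + l3)),       # scaled eigenvalues
        *normal,                                          # nx, ny, nz
        *stds,                                            # sigma1, sigma2, sigma3
    ]
    if lab is not None:
        features.extend(lab.mean(axis=0))                 # mean L*, a*, b*
    return np.asarray(features)
```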

2.3 Classification

In this work, the SVM and RF classifiers, which are supervised classification methods, were tested in the classification stages of our experiments. The SVM classifier finds a hyperplane in a high- or infinite-dimensional feature space, which can be used for classification, regression, or other tasks such as outlier detection. The final separation is achieved by maximizing the distance from the hyperplane to the nearest training data point of any class and minimizing the generalization error of the classifier [23, 24]. In this work, the Gaussian radial basis function (RBF) is selected as the SVM kernel, by which the original input space can be transformed non-linearly into a higher-dimensional feature space, since it has proved effective in many classification problems. The RF


method gathers a number of randomly formed decision trees to create a random forest. Each decision tree is trained with a differently selected subset of the data. In the test, the trained RF classifies new data according to the majority vote of all decision trees [1, 25].

3 Experimental Results

3.1 Dataset

The S3DIS dataset consists of 6 indoor places named Area 1 to Area 6 [16, 26]. The points in the dataset have spatial values (x, y and z) (XYZ) and color values (red, green and blue) (RGB). The numbers of points in the places are listed in Table 2.

Table 2. The number of points in the places.

  Area 1        Area 2        Area 3        Area 4        Area 5        Area 6
  44,026,810    47,294,447    18,662,173    43,682,507    78,649,818    41,353,055

41,353,055

The dataset also has semantic ground-truth segments labeled with 12 main classes which are ceiling, floor, wall, beam, column, window, door, table, chair, sofa, bookcase, and board. The other objects, which are not in the 12 main classes, are labeled as clutter. The numbers of semantic ground-truths in the classes for each place are given in Table 3. Table 3. The numbers of semantic ground-truth segments in the classes for each place. Class name

Area 1

Area 2

Area 3

Area 4

Area 5

Area 6

Total

Ceiling

56

82

38

74

77

64

391

Floor

45

51

24

51

69

50

290

Wall

235

284

160

281

344

248

1 552

Beam

62

12

14

4

4

69

165

Column

58

20

13

39

75

55

260

Window

30

9

9

41

53

32

174

Door

87

94

38

108

128

94

549

Chair

156

546

68

160

259

180

1 369

Table

70

47

31

80

155

78

461

Bookcase

91

49

42

99

218

91

590

Sofa

7

7

10

15

12

10

61

Board

28

18

13

11

43

30

143

Total

925

1 219

460

963

1 437

1 001

6 005

In this work, the areas numbered from 1 to 5 are merged to be used as the training part, while Area 6 is reserved as the test part.
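A minimal sketch of this split, assuming hypothetical per-area feature and label arrays, could look as follows.

```python
import numpy as np

# Hypothetical containers: points_by_area maps each area name to a (N_i, F) feature
# array and an (N_i,) label vector; Areas 1-5 form the training part, Area 6 the test part.
points_by_area = {f"Area {i}": (np.empty((0, 18)), np.empty(0)) for i in range(1, 7)}

train_areas = [f"Area {i}" for i in range(1, 6)]
X_train = np.vstack([points_by_area[a][0] for a in train_areas])
y_train = np.concatenate([points_by_area[a][1] for a in train_areas])
X_test, y_test = points_by_area["Area 6"]
```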


Fig. 1. The original RGB colored version of Area 6 (a), its randomly colored semantic ground-truth segments (b) and the labeled points with the main classes and clutter (c).


The original RGB colored version of Area 6 is visualized in Fig. 1 (a), its colored semantic ground-truth segments in Fig. 1 (b) and the points labeled with the main classes and clutter in Fig. 1 (c). In Fig. 1, the covering points, mostly comprising the ceiling class, are not illustrated in order to show the objects inside the rooms. The points in the dataset carry color information as RGB (red, green and blue); however, the RGB values are converted to the L*a*b* color space for the classification stages of our experiments because L*a*b* is closer to the human perception system than RGB [27, 28].

3.2 Parameter Settings

In our experiments, the segmentation processes and the visualizations of the output points are implemented and carried out using the C++ programming language and the PCL library [29]. The feature extraction from segments and the classification processes for both RF and SVM are carried out using the Python programming language and the scikit-learn library [30]. In the segmentation stages, the voxel size was selected as 0.05 m by examining the point distribution empirically. The angular parameter has been tested for 5°, 10°, 15°, 25°, 35°, and 45° in both the training and test parts. The segmentation in the testing stage is realized with the same segmentation parameters as used for training. Segments with fewer than 50 points among the semantic ground-truth segments were not trained, and the points of such segments in the segmentation output of the raw test data were labeled as clutter. The classifier parameters were determined by examining the literature and testing several values to give the best scores: the SVM classifier was used with C = 7 and the RBF kernel, the number of trees in the RF classifier was set to 200, and the other parameters were left at their default or automatic values in the library used. In the SVM classification, the feature values of the training data are scaled to a common range (0–1 in our work) using the minimum and maximum values of each feature; in the same way, the features of the test data are scaled to the same range using the minimum and maximum values obtained from the training data.
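The classifier configuration described in this subsection maps directly onto scikit-learn; the sketch below mirrors the stated settings (RBF SVM with C = 7, min–max scaling fitted on the training features, and a 200-tree random forest), with X_train, y_train and X_test as placeholders for the extracted segment features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Placeholder feature matrices / labels (replace with the extracted segment features).
X_train = np.random.rand(100, 18); y_train = np.random.randint(0, 12, 100)
X_test = np.random.rand(20, 18)

# SVM: RBF kernel, C = 7, features scaled to [0, 1] with the training minima/maxima.
scaler = MinMaxScaler().fit(X_train)
svm = SVC(C=7, kernel="rbf").fit(scaler.transform(X_train), y_train)
svm_pred = svm.predict(scaler.transform(X_test))

# Random forest: 200 trees, remaining parameters left at their defaults.
rf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
rf_pred = rf.predict(X_test)
```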

3.3 Testing the Performances of the Classifiers and the Features on the Semantic Ground-Truth Segments

In the experiments, we tested the classification method on the semantic ground-truth segments using their defined features; no segmentation process is carried out in this experiment. Table 4 shows the percentages of accurately classified semantic segments of the test data (Area 6), per class and overall, for both the SVM and RF classifiers. Looking at Table 4, the RF classifier is better than the SVM in most classes; however, the SVM classifier is slightly better than RF for the floor, chair, and board classes. The classes (especially the objects) in the data are very complicated. For this reason, accuracy values above 90% are a good result, and both SVM and RF give overall accuracy values above 90%. The importance rates of the features in the RF classification of the semantic labels are also shown on a 0–1 scale in Fig. 2.

Table 4. The percentages of correctly classified semantic segments.

  Class name   SVM (%)   RF (%)
  Ceiling        95.24   100.00
  Floor         100.00    97.96
  Wall           96.36    99.60
  Beam           92.65    98.53
  Column         96.30   100.00
  Window         61.29    70.97
  Door           97.85    98.92
  Table          81.82    85.71
  Chair          98.88    96.65
  Sofa           11.11    44.44
  Bookcase       61.11    87.78
  Board          96.55    93.10
  Overall        90.60    95.15

In Fig. 2, the features are denoted PointNum (the number of points), MeanZ (the mean z value), Height (the height length), Eval1 (λ1), Eval2 (λ2), Eval3 (λ3), sEval1 (scaled λ1), sEval2 (scaled λ2), sEval3 (scaled λ3), Normal1 (nx), Normal2 (ny), Normal3 (nz), Std1 (σ1), Std2 (σ2), Std3 (σ3), L (L*), a (a*), and b (b*).

Fig. 2. The importance rates of the features in the RF classification of the semantic labels.


3.4 Segmentation-Based Classification Results of the Raw Test Data

The classification of raw data is the main problem of this work. According to the method presented in this study, each semantic ground-truth segment in the training data is first segmented into sub-segments. The defined features of these sub-segments are used to train the classifier and a trained model is created. Finally, the raw data are segmented with the same segmentation parameters used for the semantic segments in the training data and classified using the trained model. In the experiment, the segmentation is carried out with different parameters to examine the effect of over- and under-segmentation. As the classifiers, SVM and RF were tested separately. For the quantitative evaluation, the precision, recall, F1 score (F1), overall accuracy (OA) and overall F1 score (OF1) were used [1, 31]. OA is the rate of correctly classified points to all points in the main classes. For each class, the rate of correctly classified points to all points of the class in the reference data (recall) and to all points detected as the class (precision) are calculated. F1 is the harmonic mean of the precision and recall values of a class, while OF1 is the harmonic mean of the average precision and recall values of all classes. Table 5 shows the OA and OF1 values for both the SVM and RF classifiers for the different segmentation parameters.

Table 5. OA and OF1 values of the segmentation-based classification of the raw test data.

  Segmentation       OA                  OF1
  parameter (°)      SVM      RF         SVM      RF
   5                 0.7580   0.8299     0.4813   0.6982
  10                 0.6868   0.8243     0.4411   0.6660
  15                 0.7000   0.7327     0.4426   0.6249
  25                 0.5540   0.6747     0.4074   0.5671
  35                 0.3936   0.4808     0.3150   0.4287
  45                 0.2049   0.1771     0.2188   0.2219

Looking at the results in Table 5, over-segmentation gives more accurate results because it preserves the borders between the objects. Moreover, RF performs better on this dataset than SVM in terms of both OA and OF1. In Fig. 3 (a), the segmentation output of Area 6 is visualized with random colors. The classification results of the SVM and RF classifiers with the segmentation parameter of 5°, which gives the best results, are also visualized in Fig. 3 (b) and (c) respectively. In Table 6, the precision, recall and F1 values are shown for each class. Looking at the results, the RF classifier is considerably better than SVM for each class in the segmentation-based classification of the raw test data. In Fig. 4, the importance rates of the features in the RF classification of the raw test data are also shown.


Fig. 3. The randomly colored segmentation output (a), the SVM classification result (b) and the RF classification result (c).


Table 6. The precision, recall and F1 values of the classifiers for each class (segmentation parameter = 5°).

  Class name     SVM                             RF
                 Precision   Recall   F1         Precision   Recall   F1
  Ceiling        0.9073      0.9588   0.9323     0.9380      0.9617   0.9497
  Floor          0.9876      0.9454   0.9660     0.9910      0.9614   0.9760
  Wall           0.7122      0.8784   0.7866     0.7730      0.9225   0.8411
  Beam           0.0000      0.0000   0.0000     0.9176      0.4621   0.6146
  Column         0.0000      0.0000   0.0000     0.7709      0.3187   0.4510
  Window         0.5711      0.5116   0.5397     0.7961      0.5726   0.6661
  Door           0.7079      0.6204   0.6613     0.8355      0.6897   0.7556
  Table          0.8802      0.6587   0.7535     0.8660      0.8017   0.8326
  Chair          0.4839      0.7535   0.5893     0.7160      0.7544   0.7347
  Sofa           0.0000      0.0000   0.0000     0.8316      0.1630   0.2726
  Bookcase       0.4068      0.5715   0.4753     0.5986      0.7339   0.6594
  Board          0.0000      0.0000   0.0000     0.6101      0.0635   0.1151
  Average        0.4714      0.4915   0.4753     0.8037      0.6171   0.6557

Fig. 4. The importance rates of the features in the RF classification of the raw test data.

As seen in Fig. 4, the MeanZ feature is highly important because the segments are composed of single surfaces, i.e. their shapes are not as distinctive as those of the semantic segments.


4 Conclusion

In this paper, a segmentation-based classification method is presented and tested on a large-scale indoor point cloud dataset. According to the method, the semantic ground-truth segments in the training data are segmented into sub-segments, and the features of the sub-segments listed in this paper are used for training. In the testing stage, the raw test data are segmented with the same parameters of the segmentation method as used in the segmentation for training. As the segmentation method, a novel voxel-based region growing segmentation method available in the literature is used. For the classification, the SVM and RF classifiers are used.

In the experiments, first the semantic ground-truth segments are classified with both SVM and RF without sub-segmentation. The rates of correctly classified semantic segments are 90.60% with SVM and 95.15% with RF. The segmentation-based classification of the raw test data, the main problem of this work, is carried out with different segmentation parameters. For the evaluation on raw data, the overall accuracy, precision, recall, F1 score and overall F1 score measurements are used. Looking at the results, the segmentation parameter that produces an over-segmentation gives the best result for both classifiers, and RF gives better results than SVM for every evaluation measurement. Future work includes using contextual segment features in the classification and improving the classification results.


Big Data and Parallel Computing

Feature Learning of Patent Networks Using Tensor Decomposition

Mohamed Maskittou, Anass El Haddadi, and Hayat Routaib

Data Sciences and Competitive Intelligence (DSCI) Team, National School of Applied Sciences, Abdelmalek Essaidi University, Al-Hoceima, Morocco
[email protected], [email protected], [email protected]

Abstract. In the age of big data, graph clustering algorithms and visual analytics can help decision-makers obtain a precise description of the relationships between data in technological areas and companies. Considering the patent as a source of science and technology information, this paper presents a new road map for the patent landscape. It begins with searching for and recognizing related text in similar patents using a deep learning approach, builds a model that exhibits the relationships between patent texts in the form of a graph, and constructs large patent networks. To complete the last step of a patent network project, and owing to the complexity of the data structures involved, basic tensor concepts and their properties are applied to perform automatic unsupervised learning that is robust to noisy patent data. Through this process, the visualization of graph knowledge becomes much more powerful and helpful, and could prove very useful in improving the decision-making process within an organization.

Keywords: Patent analysis · Deep learning · Patent network · Big data analytic · Graph clustering · Node embedding · Tensor decomposition

1 Introduction

Since the beginning of the global public health crisis due to COVID-19, people's lives have changed and become more digital, with new models of data generation and data consumption being adopted in areas such as health, finance, technology, and the economy. About 2.5 quintillion bytes of data are created each day, and the World Economic Forum stated in 2012 that "data is a new class of economic asset, like currency or gold". Clearly, big data is a massive volume of both structured and unstructured data that requires new architectures and techniques to manage it and to extract value and hidden knowledge from it. To overcome this complexity, researchers have proposed innovative methods for data storage and management, analysis, and visualization; one of the most useful visualization techniques is the graph.



Graph technology can be applied in different fields to generate semantic networks such as linked open data, social networks, and patent networks; since graphs and data are closely linked, big data naturally generates big graphs [1]. Like other data structures, graphs can be analyzed: the graph clustering task aims to cluster data in the form of graphs (nodes and edges) using nature-inspired algorithms or mathematical models, and these models can be used skillfully to develop solutions for real-world, large-scale problems, especially when we want to understand how analyzing sets of technology documents affects economic growth and helps find solutions in a crisis. The patent document has attracted the attention of researchers for its creativity, novelty, and practicality. According to the European Patent Office statistics, 80% of all technical information in the world can be found in patent documents. The relevant departments of the World Intellectual Property Organization (WIPO) have also reported that 90% to 95% of the world's inventions are published in the form of patent documents, of which about 70% have never been published in other, non-patent documents. By exploiting patent documents, scientific research institutions can save 40% of R&D expenditures and 60% of the time [2]. Patent analysis and patent networks have long been employed as useful analytical tools for technological opportunity analyses, particularly for supporting the creation of new ideas by using graph technology and bibliometric analysis to build the patent network. The semantic patent network helps organizations identify technological trends, shows the important competitors, and reveals trend shifts for ubiquitous technologies in the marketplace [3]; by using graphs and patent analysis techniques to build graphs of patent information, we can accelerate and strengthen the decision-making process. The rest of the paper is structured as follows. In Sect. 2, we introduce work related to patent analysis and visualization techniques. In Sect. 3, we explain a new methodology for clustering patent networks. Finally, we conclude the paper with a discussion comparing the proposed approaches for clustering graphs based on patent data.

2 Related Works

Patent Analysis Techniques. Measuring similarity among bibliometric entities (journals, patents, authors) is a central task in bibliometrics [4]. By measuring similarity between the contents of patents (IPC, authors, description, abstract), a basic set of similar patents can be generated, and several studies have taken up this challenge: Salton's cosine [5] and Jaccard's index [6] are two suitable methods for measuring similarity. Klavans and Boyack [4] compared six approaches for measuring similarity in science maps, including raw frequency, cosine, Jaccard's index, Pearson's coefficient, and the average relatedness factor. Boyack performed another comparative analysis of the accuracy of several text-based similarity measures, e.g., TF-IDF, latent semantic analysis, and topic models, from a technical perspective [7]; this comparison provides insights for further topic analysis in bibliometrics [4].
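To make these similarity measures concrete, here is a small, generic sketch of Salton's cosine and Jaccard's index applied to toy patent term profiles; the terms and counts are invented for illustration only.

```python
# Illustrative only: cosine similarity on term-frequency vectors and the
# Jaccard index on keyword sets, two of the similarity measures cited above.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def jaccard(s, t):
    s, t = set(s), set(t)
    return len(s & t) / len(s | t) if s | t else 0.0

# Toy patent term profiles (hypothetical):
p1 = {"battery": 3, "anode": 1, "lithium": 2}
p2 = {"battery": 1, "cathode": 2, "lithium": 1}
terms = sorted(set(p1) | set(p2))
v1 = [p1.get(t, 0) for t in terms]
v2 = [p2.get(t, 0) for t in terms]
print(round(cosine(v1, v2), 3), round(jaccard(p1, p2), 3))
```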



The authors of [8] use the United States Patent and Trademark Office database and employ a hidden Markov model, an unsupervised machine learning technique based on a doubly stochastic process, to estimate the probability of a technology being at a certain stage of its life cycle, which shows that deep learning technology is also used in these areas [2,9]. NLP-based text mining for patent analysis is likewise a popular tool for exposing the similarities between patent data [10]. Li et al. use a BP neural network to classify patents downloaded from the State Intellectual Property Office network, where the category system reaches the main-group level of the IPC [11]. Some papers discuss how to apply latent Dirichlet allocation in a trend analysis methodology that exploits patent information [12], and other work uses LDA to visualize the development paths among patents through sensitivity analyses based on semantic patent similarities and citations [13].

Patent Visualization Techniques. One of the advantages of patent analysis is that the outcomes of these different techniques can be used to build word graphs. Multiple approaches are based on forming a semantic network of keywords from patent documents by clustering the patent documents with k-means [3]; in a related way, the citation network can be plotted from kernel k-means clustering using an exponential kernel [14]. Because the semantic network depends on the number of groups, which is set temporarily by the k-means clustering algorithm, there are many possible semantic networks [15]. As always, the number of clusters has to be determined, and mining the optimal number of clusters in a data set is a fundamental issue in partitioning clustering such as k-means, which requires the user to specify the number of clusters k to be generated [16]; therefore, studies have targeted a new bag of automatic and nature-inspired algorithms [17,18], while other work focuses on graph drawing by force-directed placement [19]. In the era of big data, dimension reduction techniques are suggested for the node embedding task. To put it simply, tensors, which are a multidimensional extension of matrices, are a principled and mathematically sound way of modeling such multi-aspect data [20]; in other words, graph data can be modeled as tensors and factorization applied to learn graph embeddings. Some studies work on multi-view graphs [21], and, faced with the huge volume and complexity of data, mathematical models of basic tensor decomposition are used to develop algorithms for learning graphs and to solve graph embedding problems [22,24].

3 Methodology

Building a patent network is hard work, especially when examining large patent data, and the success of this process can be greatly enhanced by the methodology proposed in Fig. 1, in which deep learning and tensor-based machine learning are applied to two different types of data (textual data and graphs) to display graph clusters based on patent data. At present, practical problems require dealing with high-dimensional data in many fields, such as social networks, recommendation systems, aggregation, and fraud news detection, as well as technology developments such as image recognition [25], speech recognition [26], and text or document classification.



Fig. 1. Methodology of patent visualization analysis workflow

More specifically, text and document classification are typical Natural Language Processing (NLP) tasks which deal with the automated assignment of one or multiple pre-defined labels to a given text or document, respectively [27]. In recent years, deep learning algorithms have outperformed other machine learning algorithms, giving significant results and high precision.

3.1 Search Patent and Find Related Terms in Similar Patent

To identify patent topics, the data science methodology of John Rollins is combined with the problem-solving concept of patent analysis; as a result, the dataflow in Fig. 2 can be properly followed. The U.S. Patent and Trademark Office (USPTO) provides a searchable database of patent applications from 2001 to the present. After the data acquisition from the USPTO is finished, the preprocessing operations begin: first, useless information such as stop words, punctuation, numbers, special characters, and other irrelevant content is removed from the documents; second, word segmentation is performed to divide the sentences in each document into words. As mentioned in the related works, deep learning enables data scientists to recognize complex patterns in patent data, giving rise to the technology insight platforms that are the basis of competitive intelligence.
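As an illustration of these preprocessing steps, the following minimal sketch lower-cases the text, strips punctuation and numbers, segments words, and removes stop words; the stop-word list is a tiny invented subset, not a real lexicon.

```python
# A minimal preprocessing sketch for patent text: lower-casing, removal of
# punctuation/numbers, word segmentation (tokenization), and stop-word removal.
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "with"}

def preprocess(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)        # drop punctuation and numbers
    tokens = text.split()                        # word segmentation
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("A method and apparatus for charging a lithium-ion battery in 2019."))
```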



Fig. 2. Patent analysis methodology (DATAFLOW)

In the AI modeling step, a digital mapping table of the vocabulary in the text is set up, word statistics are computed, and the text space vector is used for storage, so that each text is expressed as a feature vector [2], in addition to building a context-based model for the data pipelines using statistics. To give an overview of RBMs (Fig. 3), with a focus on understanding how they work, we start by providing a patent feature vector as input to an RBM. The textual data are processed by the input layer; RBMs learn patterns and extract important features in the data by reconstructing the input. After training is complete, the network can reconstruct the input based on what it learned [28]. The reconstructed patent vector here is only a representation of what happens. What is important to take from this, though, is that the RBM can automatically extract meaningful features from a given input during the training process, and a trained RBM can reveal which features are the most important ones when detecting patterns.
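As a generic illustration of this feature-extraction role (not the authors' exact model), the sketch below trains scikit-learn's BernoulliRBM on binarized patent feature vectors and reads off the hidden-unit activations as a compact representation; the data are random stand-ins.

```python
# A generic sketch: extracting hidden-layer features from binarized patent
# feature vectors with scikit-learn's BernoulliRBM (contrastive divergence).
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.RandomState(0)
patent_vectors = rng.rand(200, 64)               # 200 patents, 64 raw features
visible = (patent_vectors > 0.5).astype(float)   # binarize for the visible layer

rbm = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(visible)                                 # forward/backward reconstruction training
hidden = rbm.transform(visible)                  # hidden-unit activation probabilities
print(hidden.shape)                              # (200, 16) compact representation
```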

Fig. 3. The RBM architecture: Forward pass and Backward pass



It can also represent each patent with some hidden values. Looking at the RBM's training process, three major steps are repeated. The first step is the forward pass. In the forward pass, the input feature patent vector is converted to binary values, and then the input vector is fed into the network, where its values are multiplied by the weights and an overall bias in each hidden unit. The result then goes to an activation function, such as the sigmoid function, which represents the probability of turning each individual hidden unit on, or in other words, the probability of node activation. A sample is then drawn from this probability distribution to determine which neurons activate, meaning the network makes stochastic decisions about whether or not to transmit that hidden data. The intuition behind the sampling is that there are some random hidden variables, and by sampling from the hidden layer, we can reproduce sample variants encountered during training. The forward pass thus translates the inputs into a set of binary values that are represented in the hidden layer. Step 2 is the backward pass. In the backward pass, the activated neurons in the hidden layer send the results back to the visible layer, where the input will be reconstructed. During this step, the data passed backward is combined with the same weights and overall bias that were used in the forward pass. Once the information reaches the visible layer, it has the shape of the probability distribution of the input values given the hidden values, and by sampling that distribution, the input is reconstructed; the backward pass is therefore about making guesses about the probability distribution of the original input [29]. Step 3 consists of assessing the quality of the reconstruction by comparing it to the original data. The RBM then calculates the error and adjusts the weights and bias to minimize it; that is, in each epoch, we compute the error as the sum of the squared differences between step 1 and the next step. These three steps are repeated until the error is deemed sufficiently low [29]. We next cover the basic concept behind autoencoders. As dimensionality increases, the time needed to train and fit the raw data into a neural network that can detect the patterns increases exponentially. We therefore extract the most important features of a patent and represent each patent with those lower-dimensional features. An autoencoder works well for this type of problem: it is an unsupervised learning algorithm that finds patterns in a dataset, and, generally speaking, autoencoders excel in tasks that involve feature learning or extraction, data compression, learning generative models of data, and dimensionality reduction [29]. A deep belief network is a kind of deep learning network formed by stacking several RBMs. The loss function of the sparse encoder adds a KL-divergence sparsity constraint weighted by a sparsity penalty factor, which can effectively control the sparsity:

$$J_{spr}(W,b) = J(W,b) + \beta \sum_{j=1}^{S_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \qquad (1)$$

$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1-\rho) \log \frac{1-\rho}{1-\hat{\rho}_j} \qquad (2)$$
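A small numerical illustration of Eqs. (1) and (2) is given below; the target sparsity, the average hidden activations, and the reconstruction loss are invented values.

```python
# Illustrative only: the KL sparsity penalty of Eqs. (1)-(2) added to an
# assumed reconstruction loss J(W, b); all numbers are made up.
import numpy as np

def kl(rho, rho_hat):
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

rho = 0.05                                   # target activation level
rho_hat = np.array([0.04, 0.20, 0.06])       # mean activation of each hidden unit
beta = 3.0                                   # sparsity penalty factor
reconstruction_loss = 0.42                   # J(W, b), assumed given
sparse_loss = reconstruction_loss + beta * np.sum(kl(rho, rho_hat))
print(round(sparse_loss, 4))
```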



To explain the backpropagation algorithm, suppose a fixed training set $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$ of $m$ training examples. For a single training example $(x, y)$, the loss function with respect to that single example is:

$$J(W,b;x,y) = \frac{1}{2}\left\| h_{W,b}(x) - y \right\|^2 \qquad (3)$$

The error of a neuron node in the output layer is:

$$\delta_i^{(3)} = \frac{\partial}{\partial z_i^{(3)}} J(W,b;x,y) = -(y_i - a_i^{(3)})\, f'(z_i^{(3)}) \qquad (4)$$

and the partial derivative is

$$\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b;x,y) = a_j^{(l)}\, \delta_i^{(l+1)} \qquad (5)$$

Here $x$ denotes the input features for a training example, $\theta$ is the parameter vector (it is useful to think of it as the result of taking the parameters $W, b$ and "unrolling" them into a long column vector), $f(\cdot)$ is the activation function, and $z_i^{(l)}$ is the total weighted sum of inputs to unit $i$ in layer $l$. Softmax regression is used as the activation function of the classification layer: it generates the probabilities for the output, and the features learned by the feature selection module are used as the input of the Softmax classifier:

$$\begin{bmatrix} P(y^{(i)} = 1) \\ P(y^{(i)} = 2) \\ \vdots \\ P(y^{(i)} = k) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}} \begin{bmatrix} e^{\theta_1^T x^{(i)}} \\ e^{\theta_2^T x^{(i)}} \\ \vdots \\ e^{\theta_k^T x^{(i)}} \end{bmatrix}$$
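A minimal NumPy sketch of this Softmax layer is given below; the parameters θ and the input x are made up.

```python
# Illustrative only: class probabilities from the scores theta_j^T x,
# normalized by the sum of exponentials, as in the Softmax equation above.
import numpy as np

def softmax(theta, x):
    scores = theta @ x                       # theta: (k, d), x: (d,)
    scores -= scores.max()                   # numerical stability
    e = np.exp(scores)
    return e / e.sum()

theta = np.array([[0.2, -0.1], [0.0, 0.3], [-0.4, 0.5]])   # k = 3 classes, d = 2
x = np.array([1.0, 2.0])
print(softmax(theta, x))                     # probabilities summing to 1
```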

3.2 Plot Patent Network or Graphs Information

Based on the results of the previous stage, the graph is visualized with GraphX, a component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API. In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks [41].
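GraphX itself exposes a Scala API; from Python, a comparable property graph can be built with the GraphFrames package, as in the hedged sketch below. The package installation, Spark configuration, and the toy patent vertices and similarity edges are all assumptions, not part of the authors' pipeline.

```python
# Hedged sketch: GraphX is Scala-only, so this uses the GraphFrames package
# (a DataFrame-based counterpart usable from PySpark); data is invented.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("patent-graph").getOrCreate()
vertices = spark.createDataFrame(
    [("p1", "US1234"), ("p2", "US5678"), ("p3", "US9012")], ["id", "patent_no"])
edges = spark.createDataFrame(
    [("p1", "p2", 0.83), ("p2", "p3", 0.67)], ["src", "dst", "similarity"])

g = GraphFrame(vertices, edges)      # directed multigraph of patents
g.inDegrees.show()                   # simple graph-parallel query
```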

3.3 Clustering Graphs Data

Graphs, or patent networks, are a general language for describing and modeling complex relationships between patents.



We carried on this work by dealing with the high-dimensional data needed to represent this complex, interconnected system while, at the same time, extracting small clusters, i.e., core patent sets that contain valuable information. Early on, researchers tended to use classical machine learning algorithms on graphs for node classification, link prediction, community detection, and network similarity. Graphs are far more complex than text or visual data: the nodes have no fixed ordering or reference point due to the isomorphism problem, and graphs are often dynamic and have multimodal features. Nowadays, studies are more concerned with the similarities between nodes, using node embedding algorithms to learn low-dimensional vector representations for the nodes of a graph. To run the node embedding learning process, an embedding space is needed, generated by optimizing the parameters of a node similarity function. There are three well-known families of unsupervised feature learning: matrix decomposition-based approaches, multi-hop similarity-based approaches, and random walk-based approaches [31]. Matrix decomposition-based approaches decompose various matrix representations of graphs by eigendecomposition or Singular Value Decomposition (SVD) [22]. The similarity function is simply the edge weight between u and v in the original network; the intuition is that dot products between node embeddings approximate edge existence [31]:

$$\gamma = \sum_{(u,v) \in (U,V)} \left\| z_u^T z_v - A_{u,v} \right\|^2 \qquad (6)$$

where $\gamma$ is the loss, the sum runs over all node pairs $(u,v)$, $z_u^T z_v$ is the embedding similarity, and $A_{u,v}$ is the (weighted) adjacency matrix entry of the graph.
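As a toy illustration of this matrix-decomposition idea, the sketch below factorizes a small adjacency matrix with a truncated SVD (in the spirit of HOPE-style methods, with separate source and target roles) so that dot products of embeddings approximate the entries of A; minimizing Eq. (6) with a single embedding per node would instead typically be done with SGD.

```python
# Toy numpy illustration of matrix-decomposition node embedding: a rank-d
# factorization of the adjacency matrix via SVD, so embedding dot products
# approximate edge weights. The graph is invented.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # tiny example graph

d = 2                                        # embedding dimension
U, s, Vt = np.linalg.svd(A)
Z_src = U[:, :d] * np.sqrt(s[:d])            # source-role embeddings (one row per node)
Z_dst = Vt.T[:, :d] * np.sqrt(s[:d])         # target-role embeddings
print(np.round(Z_src @ Z_dst.T, 2))          # best rank-d approximation of A
```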

The goal is to find the embedding matrix using stochastic gradient descent (SGD) or matrix decomposition solvers; this approach has drawbacks, such as considering all node pairs and learning only one vector per node [31]. The basic idea behind multi-hop similarity is to train the embedding to predict k-hop neighbors, using the similarity function:

$$\gamma = \sum_{(u,v) \in (U,V)} \left\| z_u^T z_v - A^k_{u,v} \right\|^2 \qquad (7)$$

In this case, the overlap between node neighborhoods is measured on a log-transformed (probabilistic) adjacency matrix.



Embeddings are trained for multiple different hop lengths and their outputs are concatenated [32]. Another option is an overlap function such as the Jaccard similarity [33]:

$$\gamma = \sum_{(u,v) \in (U,V)} \left\| z_u^T z_v - S_{u,v} \right\|^2 \qquad (8)$$

Here, $\gamma$ is again the loss summed over all node pairs $(u,v) \in (U,V)$, $z_u^T z_v$ is the embedding similarity, and $S_{u,v}$ denotes the neighborhood overlap between u and v (e.g., the Jaccard overlap). This method also has some drawbacks: it generally needs to iterate over all pairs of nodes, which makes the process expensive and time-consuming, in addition to dealing with a massive parameter space. Concisely, the previous methods produce features that are not easily interpretable. To meet this challenge and generate an interpretable feature space for the nodes, tensor decomposition-based node embedding algorithms can handle the nodes with high accuracy. As reviewed in [34], tensors are multidimensional arrays. The proposed node embedding method based on tensor decomposition considers third-order tensors and the CP decomposition shown in Fig. 4.

Fig. 4. CP decomposition of a third-order tensor. [30]

We briefly review the CP decomposition, which factorizes a tensor into a sum of rank-one tensors. Given a third-order tensor $X \in \mathbb{R}^{I \times J \times K}$, where I, J, and K denote the indices of the tensor elements in its three modes, CP decomposition factorizes the tensor in the following way:

$$X \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r = [\![A, B, C]\!] \qquad (9)$$



Here, $\circ$ denotes the outer product of the vectors, R is the tensor rank (a positive integer), and $a_r$, $b_r$, and $c_r$ are vectors with $a_r \in \mathbb{R}^{I}$, $b_r \in \mathbb{R}^{J}$, and $c_r \in \mathbb{R}^{K}$ for $r = 1, 2, \ldots, R$. After stacking those vectors, we get the factor matrices $A = [a_1, a_2, \ldots, a_R]$, $B = [b_1, b_2, \ldots, b_R]$, and $C = [c_1, c_2, \ldots, c_R]$, where $A \in \mathbb{R}^{I \times R}$, $B \in \mathbb{R}^{J \times R}$, and $C \in \mathbb{R}^{K \times R}$. Figure 4 is a visualization of the CP decomposition of a third-order tensor. The matricized forms of the tensor X are given by

$$X_{(1)} \approx A(C \otimes B)^T, \quad X_{(2)} \approx B(C \otimes A)^T, \quad X_{(3)} \approx C(B \otimes A)^T \qquad (10)$$

where $\otimes$ represents the Khatri-Rao product of two matrices.

ALS Solution of CP Decomposition: CP decomposition can be solved by Alternating Least Squares [34]. The cost function of CP decomposition can be formulated as

$$\min_{A,B,C} \left\| X - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \right\|_F^2 \qquad (11)$$

where $\|\cdot\|_F^2$ is the squared tensor Frobenius norm, i.e., the sum of squares of all elements of the tensor. By initializing B and C with random values, ALS updates A by the following rule:

$$A \leftarrow \underset{A}{\arg\min} \left\| X_{(1)} - A(C \otimes B)^T \right\|_F^2 \qquad (12)$$

Then, fixing A and C, it updates B by

$$B \leftarrow \underset{B}{\arg\min} \left\| X_{(2)} - B(C \otimes A)^T \right\|_F^2 \qquad (13)$$

Then, fixing A and B, it updates C by

$$C \leftarrow \underset{C}{\arg\min} \left\| X_{(3)} - C(B \otimes A)^T \right\|_F^2 \qquad (14)$$

The last three equations are repeated until convergence of the ALS solution of the CP decomposition.
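A minimal NumPy sketch of this CP-ALS procedure (Eqs. 11 to 14) is given below; it uses one common unfolding/Khatri-Rao convention and a toy rank-2 tensor, and is not meant as production code.

```python
# Minimal CP-ALS sketch: mode unfoldings, Khatri-Rao product, and alternating
# least-squares updates of the factor matrices A, B, C.
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(U, V):
    # column-wise Kronecker product with R columns
    R = U.shape[1]
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, R)

def cp_als(X, R, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    for _ in range(n_iter):
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Toy check: decompose and reconstruct a random rank-2 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(X, R=2)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # typically a small relative error
```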

Fig. 5. CP decomposition-based representation learning of source nodes, target nodes, and transition steps [22]

The algorithms of tensor decomposition based node embedding use CP decomposition to extract factor matrices containing the representations of the source and/or target properties of the nodes, and the transition steps.



After we find the source factor matrix A, target factor matrix B, and transition factor matrix C, we can compute the projection of the source embedding of node i on the transition embedding j, where $1 \le i \le n$ and $1 \le j \le K$, and get the source-transition embedding matrix $ST \in \mathbb{R}^{n \times K}$. Similarly, we can get a target-transition embedding matrix $TT \in \mathbb{R}^{n \times K}$ that reflects the projection of the target embeddings on the transition step embeddings. Finally, we get the node embedding matrix $Z \in \mathbb{R}^{n \times 2K}$ by concatenating ST and TT. The first K columns of Z represent the source role of a node with varying transition steps, and the last K columns of Z represent the target role of a node with varying transition steps [22].

$$ST = A\, C^T \qquad (15)$$

$$TT = B\, C^T \qquad (16)$$

$$Z = [ST, TT] \qquad (17)$$

For the TDNE-per-slice case:

$$ST^{(k)} = A^{(k)} \left(C^{(k)}\right)^T \qquad (18)$$

$$TT^{(k)} = B^{(k)} \left(C^{(k)}\right)^T \qquad (19)$$

$$Z = \left[ST^{(1)}, ST^{(2)}, \ldots, ST^{(K)}, TT^{(1)}, TT^{(2)}, \ldots, TT^{(K)}\right] \qquad (20)$$

Algorithm TDNE-PerSlice: Tensor Decomposition-based Node Embedding per slice
Input: 1-step transition probability matrix A; maximum transition step K; CP decomposition rank R
Output: Node embedding matrix Z
1: n = count_rows(A)
2: for k in 1 to K do
3:   X^(k) = tensor(n, n, 1)
4:   X^(k) = A^k
5:   [A^(k), B^(k), C^(k)] ← CP_ALS(X^(k), R)
6:   ST^(k) = A^(k) × (C^(k))^T
7:   TT^(k) = B^(k) × (C^(k))^T
8: end for
9: Z = [ST^(1), ST^(2), ..., ST^(K), TT^(1), TT^(2), ..., TT^(K)]
10: return Z



Algorithm TDNE: Tensor Decomposition-based Node Embedding
Input: 1-step transition probability matrix A; maximum transition step K; CP decomposition rank R
Output: Node embedding matrix Z; source-transition embedding matrix ST
1: n = count_rows(A)
2: for k in 1 to K do
3:   X(:, :, k) = A^k
4: end for
5: [A, B, C] ← CP_ALS(X, R)
6: ST = A × C^T
7: TT = B × C^T
8: Z = [ST, TT]
9: return Z
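A sketch of the TDNE construction is given below: it builds the K-slice tensor of transition matrices A, A², ..., A^K and concatenates the ST and TT projections. It assumes a CP solver named cp_als(X, R) returning the factor matrices, e.g., the NumPy CP-ALS sketch shown earlier (or any equivalent solver).

```python
# Sketch of TDNE: build the tensor of k-step transition matrices and derive Z.
# Assumes a CP solver `cp_als(X, R)` returning factor matrices (A, B, C),
# for example the NumPy CP-ALS sketch given above.
import numpy as np

def tdne(P, K, R):
    n = P.shape[0]
    X = np.zeros((n, n, K))
    Pk = np.eye(n)
    for k in range(K):
        Pk = Pk @ P                  # k-step transition probabilities P^(k+1)
        X[:, :, k] = Pk
    A, B, C = cp_als(X, R)           # source, target, transition-step factors
    ST = A @ C.T                     # n x K source-transition embeddings
    TT = B @ C.T                     # n x K target-transition embeddings
    return np.hstack([ST, TT])       # Z in R^{n x 2K}

# Example: row-normalized adjacency matrix as the 1-step transition matrix.
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 1, 0]], dtype=float)
P = adj / adj.sum(axis=1, keepdims=True)
# Z = tdne(P, K=3, R=2)              # requires cp_als to be defined
```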

3.4 Visualization and Evaluation

This stage evaluates the model of tensor decomposition-based node embedding and carries out a comparative study with other node embedding models, namely Laplacian Eigenmaps (LAP) [35], LLE [36], HOPE [37], GraRep [38], Node2vec [39], and DeepWalk [40], using different datasets (a brain network, a social network, and an air traffic network) [22]. The goal is to reach high accuracy and precision in network reconstruction, and high precision in link prediction (Figs. 6, 7 and 8).

Baseline algorithm | Parameters
GraRep             | Maximum transition step, and log shifted factor
DeepWalk           | Walks per node, walk length, context size
HOPE               | The decay parameter β for the Katz Index
Node2Vec           | Walks per node, walk length, context size, return parameter, in-out parameter


Fig. 6. Precision in link prediction [22]

Fig. 7. Precision in network reconstruction [22]

Fig. 8. Precision in network reconstruction [22]



4 Conclusion

In this paper, a new methodology to build patent networks is presented. It uses a patent document classifier based on deep learning for automatic classification, which enhances the performance of the patent analysis stage, and, in a later stage of the methodology, tensor decomposition of higher-order transition probability matrices to achieve feature learning on graphs, resulting in higher accuracy and precision compared to the baseline algorithms. Consequently, patent networks allow analysts to visualize interconnected patents and leverage their visual abilities to decipher important knowledge, which could prove very useful in improving the decision-making process within an organization.

References 1. Kheddouci, H.: Big Data et Graphes: Défis et pistes de recherche. franch Laboratoire d’InfoRmatique en Image et Systèmes d’information LIRIS UMR 5205 CNRS/INSA de Lyon/Université Claude Bernard Lyon 1/Université Lumière Lyon 2/Ecole Centrale de Lyon 2. Xia, B., Zur Elektrodynamik, Li, B., Lv, X.: 2nd International Conference on Artificial Intelligence and Industrial Engineering (AIIE 2016), Advances in Intelligent Systems Research. China Research on Patent Document Classification Based on Deep Learning, vol. 133, pp. 308–311 (2016) 3. Kim, Y.G., Suh, J.H., Park, S.C.: Visualization of patent analysis for emerging technology (South Korea). Expert Syst. Appl. 34, 1804–1812 (2008) 4. Zhang, Y., Shang, L., Huang, L., Porter, A.L., Zhang, G., Lu, J., Zhu, D.: A hybrid similarity measure method for patent portfolio analysis. J. Infometr. 10, 1108–1130 (2016) 5. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988). J. Am. Soc. Inf. Sci. Technol. 57 6. Klavans, R., Boyack, K.W.: Identifying a better measure of relatedness for mapping science, USA, vol. 57, no. 10, pp. 251–361 (2006) 7. Boyack, K.W., Newman, D., Duhon, R.J., Klavans, R., Patek, M., Biberstine, J.R., Börner, K.: Clustering more than two million biomedical publications: comparing the accuracies of nine text-based similarity approaches. PLoS One 6(3), 53–64 (2016). J. Am. Soc. Inf. Sci. Technol. 57 8. Lee, C., Kim, J., Kwon, O., Woo, H.-G.: Stochastic technology life cycle analysis using multiple patent indicators. J. Am. Soc. Inf. Sci. Technol. 106, 53–64 (2016) 9. Li, S., Hu, J., Cui, Y., Hu, J.: DeepPatent: patent classification with convolutional neural networks and word embedding. J. Am. Soc. Inf. Sci. Technol. 117, 721–744 (2018) 10. Abbas, A., Zhang, L., Khan, S.U.: A literature review on the state-of-the-art in patent analysis. Word Patent Inf. 37, 3–13 (2018) 11. Li, S., Wang, J., Qu, J.: Automated, categorization of patent based on back propagation network. Comput. Eng. Des. 31(25), 5075–5078 (2010) 12. Oh, S., Choi, S., Yoon, J., Choi, H.: Innovation topic analysis of technology: the case of augmented reality patents. J. Am. Soc. Inf. Sci. Technol. USA (2018) 13. Kim, G., Park, S., Jang, D.: Technology analysis from patent data using latent Dirichlet allocation. Soft Comput. Big Data Process. 57(10), 71–80 (2014)



14. Kim, D., Lee, B., Lee, H.J., Lee, S.P., Moon, Y., Jeong, M.K.: Graph Kernel approach for detecting core patents and patent groups. In: AIP Conference Proceedings, vol. 1827 (2014) 15. Suh, J.H., Park, S.C.: A new visualization method for patent map: application to ubiquitous computing technology. In: Advanced Data Mining and Applications, pp. 566–573 (2006) 16. Shanie, T., Suprijadi, J., Zulhanif: Determining The Optimal Number Of Clusters: 3 Must Know Methods, Cluster Validation Essentials (2020) 17. Kennedy, J., Eberhart, R.C.: Swarm intelligence. Scholarpedia 2, 1462 (2020) 18. Boulouard, Z., El Haddadi, A., Dousset, B.: “Forced” Force Directed Placement: a New Algorithm for Large Graph Visualization (2018) 19. Thomas, M., Fruchterman, J., Edward Reingold, M.: Graph Drawing by Forcedirected Placement, Department of Computer Science, University of Illinois at Urbana-Champaign, USA 20. Papalexakis, E.E., Faloutsos, C.: Unsupervised tensor mining for big data practitioners. Big Data 4(3), 179–191 (2016) 21. Wu, J., Xie, X., Nie, L., Lin, Z., Zha, H.: Unified graph and low-rank tensor learning for multi-view clustering. Association for the Advancement of Artificial Intelligence (www.aaai.org) (2020) 22. Hamdi, S.M., Angryk, R.: Interpretable feature learning of graphs using tensor decomposition. In: 2019 IEEE International Conference on Data Mining (ICDM) (2020) 23. Bailly, R., Rabusseau, G.: Graph Learning as a Tensor Factorization Problem (2017) 24. Malik, O.A., Ubaru, S., Horesh, L., Kilmer, M.E., Avron, H.: Tensor Graph Convolutional Networks for Prediction on Dynamic Graphs (2020) 25. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1090–1098 (2012) 26. Mikolov, T., Deoras, A., Povey, D., Burget, L., Cernocky, J.: Strategies for training large scale neural network language models. In: Automatic Speech Recognition and Understanding, pp. 196–201 (2011) 27. Basili, R., Moschitti, A., Pazienza, M.T.: NLP-driven IR: evaluating performances over a text classification task, University of Rome Tor Vergata Department of Computer Science, Systems and Production 00133 Roma (Italy) (2020) 28. Nayak, M.: Dimensionality Reduction and Feature Extraction with RBM. www.medium.com/tag/deep-learning/latest (2020) 29. Aklson, A.: Restricted Boltzmann Machines (RBMs) (2020). www.coursera.org/ lecture/building-deep-learning-models-with-tensorflow 30. Navamani, T.M.: Efficient deep learning approaches for health informatics (chap. 7). In: Deep Learning and Parallel Computing Environment for Bioengineering Systems, pp. 123–137 (2019) 31. Ahmed, A., Shervashidze, N., Narayanamurthy, S., Josifovski, V., Smola, A.J.: Representation learning on graphs: methods and applications. IEEE Data Eng. Bull. Graph Syst. (2018) 32. Cao, S., Xu, Q., Lu, W.: GraRep: learning graph representations with global structural information. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 891–900, October 2015 33. Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: The International World Wide Web Conference Committee (IW3C2), pp. 13–17 (2016)



34. Rabanser, S., Shchur, O., Günnemann, S.: arXiv:1711.10781v1. [stat.ML] (2017) 35. Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in Neural Information Processing Systems, NIPS 2001, Vancouver, British Columbia, Canada, 3–8 December 2001, pp. 585–591 (2001) 36. Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000) 37. Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016, pp. 1105–1114 (2016) 38. Cao, S., Lu, W., Xu, Q.: GraRep: learning graph representations with global structural information. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, Melbourne, VIC, Australia, 19–23 October 2015, pp. 891–900 (2015) 39. Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016, pp. 855–864 (2016) 40. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, New York, NY, USA, 24–27 August 2014, pp. 701–710 (2014) 41. https://spark.apache.org/docs/latest/graphx-programming-guide.html

Lambda-IVR: An Indexing Framework for Video Big Data Retrieval Using Distributed In-memory Computation in the Cloud

Muhammad Numan Khan1, Aftab Alam1, Tariq Habib Afridi1, Shah Khalid2, and Young-Koo Lee1

1 Department of Computer Engineering, Kyung Hee University, Yongin, South Korea {numan,aftab,afridi,yklee}@khu.ac.kr
2 School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China [email protected]

Abstract. Feature indexing for video retrieval faces three significant challenges. First, there are different types of features of varying nature, such as deep Convolutional Neural Network (CNN) features, handcrafted features, text recognized from the videos, and audio features. Secondly, feature matching for those varying types of features requires different similarity measures. Thirdly, in the Big Data era, the number of features to be indexed is enormous. To address these issues, in this paper we present a lambda-style, distributed, in-memory, scale-out, inverted-index-based feature indexing framework for video retrieval, which operates as SaaS in the cloud. First, the video features are acquired and decoupled, and the visual features are encoded using an adaptation of an existing feature encoder with improvements. Secondly, the encoded visual features and the textual features are aggregated. Finally, the aggregated features are indexed and made readily available for retrieval. Our framework supports incremental updates without the need to re-index the data and can serve an enormous number of concurrent queries. Experimental results show that our framework performs reasonably well in terms of accuracy, precision, and efficiency.

Keywords: Deep features · Indexing · Video retrieval · Inverted index · Big data · Apache Spark · Cloud computing

1 Introduction

Indexing plays a very important role in video retrieval. Accurate video retrieval depends on the features extracted from the videos. Multiple types of features can be used for retrieval, such as visual features extracted from the video frames, text recognized from the video, and audio features whenever the video content contains an audio stream. Visual features are categorized into two types: global features and local features.



Several strategies have been proposed and developed to provide a global representation of video content for efficient video retrieval. Among them, CNNs have shown outstanding performance. However, video data is huge in nature, and the number of extracted features is enormous when dealing with Big Data. This has led to the development of efficient and scalable technologies for large-scale video data retrieval. Compared to local descriptors, the features computed by CNNs are compact, but they still cannot support large-scale video retrieval efficiently because the computation and storage of the features increase linearly. In recent years, text-based search engines have shown excellent performance, and they are a good choice for large-scale video retrieval if the features extracted by CNNs are tailored to work with text-based search engines. However, CNN features are in general dense and high-dimensional and are not suitable for direct use in text-based search engines; moreover, text-based search engines often rely on the sparsity of the data. Furthermore, in the case of multi-type features, feature indexing, matching, and retrieval is an issue. The global features computed by CNNs suffer from three core challenges. First, the high dimensionality of the global features makes them computationally expensive in terms of indexing and retrieval. Secondly, to make the global features work with inverted-index-based search engines, they need to be transformed into a representation suitable for an inverted index, which introduces quantization error and a loss of precision, hurting accuracy. Thirdly, feature similarity cannot be computed using the similarity measures meant for deep features, such as cosine similarity. Furthermore, in the case of multiple varying feature types, efficient storage and retrieval also pose challenges due to the varying similarity measures for each type. Several studies have been conducted in this area. Pouyanfar et al. [20] reviewed several indexing strategies for multimedia and presented a comprehensive survey. Liu et al. [17] proposed a feature indexing framework targeting large-scale image retrieval: each image is encoded by a pre-trained CNN model, and the feature vectors are then mapped to multiple codewords and indexed into corresponding lists. This approach has two issues: first, hashing suffers from search time that increases linearly with the volume of data, and secondly, the approach is not scalable and would need to be redesigned to work in a distributed or cloud environment. Giuseppe Amato et al. [4] proposed the idea of Surrogate Text Representation for CNN features, which is a permutation-based indexing technique; a set of reference objects belonging to a metric space is used to construct the permutations. However, it requires the distances between pivots and objects to be calculated, which is expensive in the case of CNN deep features. To address the above challenges, we propose and implement a lambda-style distributed large-scale video indexing and retrieval framework. It is proposed as an extension of our previous work [15,16,19]. Our contribution in this work is a distributed in-memory computation based feature indexing framework for video Big Data retrieval, which works as SaaS in the cloud.



We adopted the feature encoder implementation from [5], fine-tuned it for video retrieval, and further improved the encoder by making it independent of the fixed quantization factor. Furthermore, we tailored it to work with multiple types of features. We implemented our framework on top of Hadoop [25], Spark [27], and Apache Solr [22]. The rest of the paper is structured as follows. Section 2 discusses related work. Section 3 presents an overview of the background technologies. Section 4 explains our proposed system architecture, Sect. 5 presents an example scenario, Sect. 6 presents a discussion and evaluation, and finally Sect. 7 concludes the paper.

2 Related Work

Zhang et al. [28] utilize image queries for large-scale video retrieval by designing a model for information extraction from video frames using a CNN and a Bag of Visual Words (BoVW). Gani et al. [11] categorize indexing strategies into two approaches, Artificial Intelligence (AI)-based indexing and non-AI-based indexing, as summarized in Adamu et al. [1]. In the non-AI approach, each index item is treated separately as an individual entity; this approach does not rely on the pattern or semantics of the data. Non-AI-based indexing can further be classified into several subcategories, mainly tree-based strategies (B-tree [2], R-tree [8], X-tree [26]), hashing [12], GiST [14], GIN [21], and the inverted index [13]. In AI approaches, data semantics are analyzed by examining the relationships between data items, and the learned patterns are then used to organize the data items in an index. This makes AI approaches more accurate and effective than non-AI approaches; however, they are computationally very expensive and complicated. Latent Semantic Indexing (LSI) [3,7], Hidden Markov Model (HMM)-based indexing [18], and the Affinity Hybrid (AH) tree [6] are some of the well-known AI approaches. Some research has also been done to combine the benefits of semantic indexing approaches and traditional Database Management System (DBMS) indexing approaches [9,10]. In the context of deep CNN feature indexing and retrieval, Liu et al. [17] proposed an indexing framework for indexing and retrieval of large-scale deep CNN features. They constructed a visual dictionary with limited samples using partitioned k-means (PKM) [24]. Each image is encoded by a pre-trained CNN model, and the feature vectors are then mapped to multiple codewords and indexed into corresponding lists in their designed inverted table using the hashing method. This approach has two main issues: first, hashing suffers from search time that increases linearly with the volume of data, which becomes a considerable challenge in a big-data context, and secondly, the approach is not scalable and needs to be redesigned to work in a distributed or cloud environment. Giuseppe Amato et al. [4] proposed the idea of Surrogate Text Representation for CNN features, which is a permutation-based indexing technique. They used a set of reference objects belonging to a metric space for the construction of the permutations. However, this method is expensive in the case of CNN deep features.



It requires the distances between the pivots and the objects to be calculated.

3 Background

Traditional data processing platforms and techniques are less efficient when dealing with video Big Data. Big Data in general is computationally expensive both in terms of storage and processing due to its characteristics, commonly referred to as the 7Vs (volume, variety, velocity, variability, veracity, validity, and value). To address these challenges, big data processing technologies were developed. Apache Hadoop is one of the most widely used and well-known big-data processing technologies. Hadoop is open source and has the capability to rapidly process huge amounts of data across a cluster of commodity machines. The philosophy of Hadoop is to bring the processing to the data, not the data to the processing, which gives Hadoop immense power. Hadoop employs its own file system for data storage, called the Hadoop Distributed File System (HDFS), which provides cost-effective distributed storage and supports structured and unstructured data in huge volumes. Apache Spark is another open-source distributed processing framework, created at the UC Berkeley AMPLab. Spark works similarly to Hadoop, but it employs in-memory processing of the data for performance gains. The basic computation unit of Spark is the RDD (Resilient Distributed Dataset), which is what gives Spark its lightning-fast processing power. An RDD is a read-only collection of data elements kept in memory for processing across the cluster; it is resilient because the data can be recovered in case of node failures. Video data is complex and very large, as one video can contain tens of thousands of frames. Efficiently indexing the huge number of deep features for video Big Data is time-consuming and expensive, which makes indexing of the video features a challenging job. Among inverted-index-based systems, Apache Solr is very popular; it is an open-source enterprise search engine based on the Lucene project and employs an inverted index for the indexing and retrieval of documents. Solr can be operated in cloud mode, in which case it works as a distributed index system. This is a viable choice for video big data indexing if the video features are encoded to make them indexable in Solr.
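As a tiny illustration of the RDD model described above (not part of Lambda-IVR itself), the following PySpark snippet distributes a collection, transforms it in memory, and only materializes the final count; the data and the toy transformation are invented.

```python
# Illustrative PySpark RDD usage: distributed, in-memory transformations
# that are only evaluated when an action (count) is triggered.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

frames = sc.parallelize(range(1_000_000))          # e.g., frame identifiers
features = frames.map(lambda f: (f, f % 101))      # stand-in for feature extraction
hot = features.filter(lambda kv: kv[1] == 0).count()
print(hot)
spark.stop()
```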

4 System Architecture

Our framework is implemented on top of Hadoop and Zookeeper and is composed of three main modules: the Indexing Service, the Query Handler, and the Load Balancer, as shown in Fig. 1. The Solr cluster is a distributed cluster running Solr in Cloud mode, managed by the Zookeeper cluster. The Indexing Service further consists of four components: Acquisition, Encoder, Aggregator, and Indexer, and is implemented on top of Spark. The Acquisition component acquires the video features from the Cloud Analytics Service Pool, which consists of different video analytics services running in the cloud.



Fig. 1. System architecture of the indexing framework

These services acquire video data from various connected sources (IP cameras, smartphones, drones, etc.), process it by performing intelligent video analytics operations, and then send the computed CNN features in batches to the Indexing Service along with metadata. The Indexing Service exposes a REST endpoint to which the Client Service Pool can send data for indexing. Once the Indexing Service receives the data, it is decoded, and the feature vectors are extracted from the JSON. The feature vectors are then encoded using the Encoder component, aggregated by the Aggregator component, where they are repackaged along with the metadata, and finally indexed in the Solr Cloud using the Indexer component. This process is outlined in Algorithm 1. In Algorithm 1, the incoming batches of features from the Cloud Analytics Service Pool are first decoded, and the feature vectors and the coupled metadata in the batch records are extracted. The feature vectors and metadata are then transformed into a Spark RDD, and the feature RDD is mapped for parallel processing. First, the metadata is extracted from the feature record, because the metadata is also indexed. Then, the feature vector itself is encoded and finally packaged with the metadata into the resultant processed RDD. The processed RDD is then mapped partition-wise; the reason for partition-wise mapping is to reduce the network traffic to the Indexer service by indexing multiple records in batches. The indexing records are sent to the Indexer service to be indexed, and an acknowledgment is awaited as confirmation of success. In case of failure, the indexing record is logged and resent after some delay. The feature encoding algorithm proposed in [5] works by quantizing the features and uses a fixed constant value to quantize the real values of the feature vector into integers. This suffers from two issues. First, the value of the quantization factor (Q) needs to be determined by trial and error and then hardcoded. Secondly, a floor function is used for quantization, because of which the values of features near zero are yielded empty. This results in empty encodings, and hence the videos corresponding to those features are missed. On the other hand, if the value of the quantization factor is too large, then the size of the resultant encoding becomes large as well.



Algorithm 1: Batch Feature Acquisition and Indexing

Input: Feature records in mini batches b_f
Output: Indexed features in Solr Cloud
1: // Decode the mini batch record
2: dec_f ← Decode(b_f)
3: records ← Extract(dec_f)
4: // Transform the records into an RDD
5: RDD_f ← transform(records)
6: // Map operation
7: RDD_p ← map RDD_f
8:   m_d ← Extract(elm)
9:   f_v ← Extract(elm)
10:  tf_v ← Encode(f_v)
11:  s_ir ← IndexRecord(m_d, tf_v)
12: end map
13: // Perform partition-wise operation to reduce network load
14: RDD_p.ForEachPartition
15:   SendToIndexService(fr_sir)
16:   // Wait for acknowledgment from the Indexer Service
17:   Await(acknowledgment)
18: end foreach
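A hedged PySpark sketch of this pipeline is given below; the Solr URL and collection name, the document schema, and the encode_feature helper are illustrative assumptions, and pysolr is used here merely as one convenient Python client for Solr rather than the framework's actual indexer implementation.

```python
# Hedged sketch of Algorithm 1: decode a mini batch, encode each feature
# vector, and index partition-wise into Solr to reduce network round trips.
import json
import pysolr
from pyspark.sql import SparkSession

SOLR_URL = "http://solr-host:8983/solr/video_features"   # assumed endpoint

def index_partition(records):
    solr = pysolr.Solr(SOLR_URL, timeout=30)              # one client per partition
    docs = [{"id": str(md["video_id"]) + "_" + str(md["frame"]),   # assumed schema
             "metadata": json.dumps(md),
             "encoding": enc} for md, enc in records]
    solr.add(docs)                                        # batched index request

def run(batch_json, encode_feature):
    spark = SparkSession.builder.appName("lambda-ivr-indexer").getOrCreate()
    records = json.loads(batch_json)                      # decode the mini batch
    rdd = spark.sparkContext.parallelize(records)
    processed = rdd.map(lambda r: (r["metadata"], encode_feature(r["feature"])))
    processed.foreachPartition(index_partition)           # partition-wise indexing
```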

Table 1 illustrates this phenomenon of the empty yield, based on experiments performed on 117,977 features of the UCF-101 [23] dataset extracted using the VGG19 prediction layer.

Table 1. Fixed quantization factor effect on empty yield Value of quantization factor

10

20

30

40

50

Number of features yielded empty 18338 2737 502 117 40 Percent empty yield

15.54 2.32 0.43 0.1 0.03

We found that these issues can be fixed by replacing the fixed constant value of the quantization factor with the product of e2 and log of the feature dimensions. The Feature Encoder takes as input a feature vector fv ∈ R1000 , and generates the encoding by performing quantization, pruning, and finally, Hex coding, as shown in Algorithm 2. Once the encoded representations of the feature vectors are computed, they are incrementally indexed in the Solr Cloud. In Algorithm 2, every dimension of the input feature vector is first traversed and quantized by taking the product of the dimension and the log of the dimensions of the feature vector. Here, the fixed quantization factor is eliminated and log(D) operation is performed which greatly eliminates the null yield. For further improvement, e2 is used to give the quantization operation an additional boost. This ensures that no feature vector is yielded empty. All the processed dimensions

Lambda-IVR: An Indexing Framework for Video Big Data Retrieval

1397

are assigned to a new feature vector. Then, the new feature vector is pruned to retain only the non-zero dimensions. This eliminates the gaps between elements in the encoding representation of the feature vector. A series of sub encoding is constructed from every dimension which is the repeated concatenation of the hex codes. Finally, the values of all the dimensions are concatenated together to yield the resultant encoding of the feature vector.

Algorithm 2: Feature Transformation and Encoding

Input:  Feature vector f_v ∈ R^D, where D is the dimension of f_v
Output: The encoded feature vector F_v

  // Initialization
  tf_v <- Init(empty)
  F_v  <- Init(empty)
  // Quantization
  foreach dimension d ∈ f_v do
      pf_d <- e^2 × log(D) × d
      qf_d <- Floor(pf_d)
      af_v <- Absolute(qf_d)
      Push(tf_v, af_v)
  end foreach
  ts_vp <- Sort(tf_v)
  // Pruning to eliminate zero values
  tf_vp <- Prune(ts_vp)
  // Encoding operation
  foreach value v ∈ tf_vp do
      h_d <- ToHex(v)
      e_c <- ToSeries(h_d, v)
      Concat(F_v, e_c)
  end foreach
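As a rough illustration only, the steps of Algorithm 2 could be written in Python as below. Where the pseudocode is ambiguous, the sketch assumes that ToSeries() repeats the hex code of each surviving quantized value v a total of v times, so that term frequencies in the inverted index reflect the magnitude of the dimension; this interpretation is an assumption, not the authors' definitive implementation.

import math

def encode_feature(fv):
    # Quantization with the adaptive factor e^2 * log(D) instead of a fixed constant.
    D = len(fv)
    q = math.exp(2) * math.log(D)
    quantized = [abs(math.floor(q * d)) for d in fv]
    quantized.sort()
    pruned = [v for v in quantized if v > 0]       # prune zero-valued dimensions
    encoding = []
    for v in pruned:
        hex_code = format(v, "x")                  # ToHex(v)
        encoding.append(hex_code * v)              # assumed ToSeries(): repeat v times
    return "".join(encoding)

# Example: encode a toy 1000-dimensional, softmax-like prediction vector.
fv = [0.5, 0.2, 0.1] + [0.0002] * 997
print(encode_feature(fv)[:40])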

The Query Handler Service works as middleware between our core framework and the outside world. User requests, as well as requests from other services, are received by the Query Handler service. It is designed to handle a large number of concurrent requests in combination with the Load Balancer. Upon receiving a request, it first formulates a query for that request and then executes the query in the Solr Cloud. Finally, the results of the query are ranked and returned. The Load Balancer distributes the indexing and query operations evenly across the Solr Cloud cluster. Since the nodes in the cluster have different specifications, we utilized the Weighted Round Robin algorithm to distribute load according to node capacity. The weights are updated periodically according to the workload on each node. This enables the load balancer to adjust to the workload automatically, without manual tuning.
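The paper does not detail the load balancer's implementation; the sketch below shows one common way a weighted round-robin selector could look, with made-up node addresses and weights. The periodic, workload-based weight adjustment would plug into update_weights().

import itertools

class WeightedRoundRobin:
    """Selects nodes in proportion to their capacity weights."""

    def __init__(self, node_weights):
        self.update_weights(node_weights)

    def update_weights(self, node_weights):
        # expand every node according to its weight and cycle over the expansion
        expanded = [n for n, w in node_weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def next_node(self):
        return next(self._cycle)

lb = WeightedRoundRobin({"solr-1:8983": 3, "solr-2:8983": 2, "solr-3:8983": 1})
print([lb.next_node() for _ in range(6)])   # solr-1 receives half of the requests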

5 Example Scenario

This section describes an example scenario of how the proposed framework works; the overall system flow is shown in Fig. 2. Global features are computed from the input collection(s) of videos using the prediction layer of VGG19. Each computed feature contains 1000 dimensions. The next step is feature quantization, where every dimension of the input feature vector is mapped to a scalar quantity. This step is necessary to make the input features compatible with the inverted-index-based indexing mechanism. The quantized feature vector is then sorted and pruned, eliminating any dimensions that yield 0. After pruning, hex coding is applied to the resultant feature. Then the features are sent along with metadata to the indexer service for indexing. As soon as the features are indexed, they become available for querying.

Fig. 2. Example scenario (input videos → feature vector → quantization → sort → pruning → store hex-coded encodings → index / query)

6 Evaluation and Discussion

6.1 Experimental Setup

Our experimental setup consists of ten nodes running Hadoop, as shown in Fig. 3. The server node is a Core-i7 with 128 GB RAM. The Video Analytics Services run on two Core-i7 GPU nodes with 64 GB RAM and an Nvidia GeForce RTX 2070 GPU. Four nodes (Core-i5, 32 GB RAM) run Apache Solr in Cloud mode, and Zookeeper runs on three Core-i5 nodes with 32 GB RAM. Apache Spark runs on all nodes.

Fig. 3. Lambda-IVR cluster (Lambda-IVR server with web UI and cluster server; Cloud Analytics Service Pool running the video analytics, streaming and batch services; indexing services; Apache Solr cluster in Cloud mode with collections and documents; Zookeeper cluster; HDFS NameNode and DataNodes)

6.2 Evaluation

We conducted extensive experiments and found that our proposed framework performs reasonably well in terms of efficiency and accuracy. Figure 4 shows the index storage size comparison between the original CNN features and the encoded representations. As the number of features increases, the size of the CNN features without encoding grows linearly, whereas the size of the encoded features remains sufficiently small. Due to the small size of the encoded representations, not only is the indexing time reduced but also the query latency, thus speeding up the overall system performance. Cosine similarity is widely used as the similarity measure for CNN features, as shown in Eq. 1. When dealing with a very large number of features, this similarity measure becomes a computationally expensive operation. The inverted index of Solr uses the TF-IDF similarity measure, which significantly improves performance as well as accuracy.


d(f, q) = \frac{f \cdot q}{|f| \times |q|} = \frac{\sum_{i=1}^{n} f_i \times q_i}{\sqrt{\sum_{i=1}^{n} f_i^2} \times \sqrt{\sum_{i=1}^{n} q_i^2}} \qquad (1)
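For reference, Eq. 1 is the standard cosine similarity; scoring a query against N stored features this way costs O(N · d), which is what makes the brute-force approach expensive. A direct NumPy rendering:

import numpy as np

def cosine_similarity(f, q):
    # Eq. 1: dot product normalized by the vector magnitudes
    return float(np.dot(f, q) / (np.linalg.norm(f) * np.linalg.norm(q)))

f = np.random.rand(1000)   # e.g. a VGG19 prediction-layer feature
q = np.random.rand(1000)
print(cosine_similarity(f, q))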

Fig. 4. Feature index size comparison (index size in MB vs. number of features, original CNN features vs. encoded representations)

Fig. 5. Query performance with and without load balancing (response time in seconds vs. number of parallel queries)

The query performance results are illustrated in Fig. 5. The query time remains very small as the number of concurrent queries increases, which shows that the proposed framework performs very well against concurrent queries. Furthermore, load balancing significantly reduces the retrieval time by dividing the load across the cluster. Figure 6 shows the feature indexing time. Determining an optimal batch size is important in the case of Solr because after each batch a commit operation needs to be performed, which is expensive in terms of computation and I/O cost. If the batch size is too small, more commit operations will be needed and the feature indexing time increases.


Fig. 6. Feature batch size vs time (indexing time in minutes vs. feature batch size)

On the other hand, increasing the batch size too much does not further improve the indexing time. From the experiments, it is evident that a batch size between 500 and 1000 is optimal.
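The indexer's exact implementation is not shown in the paper; the following sketch merely illustrates the batch/commit trade-off using Solr's standard JSON update endpoint, with an assumed host and collection name ("features"). Each batch triggers one commit, so smaller batches mean more commits.

import json
import requests

SOLR_UPDATE_URL = "http://solr-node:8983/solr/features/update"   # assumed endpoint

def index_in_batches(docs, batch_size=1000):
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        resp = requests.post(
            SOLR_UPDATE_URL,
            params={"commit": "true"},                 # one commit per batch
            headers={"Content-Type": "application/json"},
            data=json.dumps(batch),
        )
        resp.raise_for_status()                        # in practice: log and retry on failure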

Fig. 7. Feature conversion and encoding time (time in ms vs. number of features)

Figure 7 shows the feature conversion time in batches of 100 features. Feature encoding is computationally inexpensive, taking 170 ms on average to encode a batch of 100 features. Furthermore, the feature transformation operations are carried out using Apache Spark, whose distributed in-memory computation significantly speeds up the encoding of the feature vectors.


Fig. 8. Query time vs number of instances in retrieval (query time in ms vs. number of videos returned)

Figure 8 shows the retrieval time against the user query. For performance testing, queries were executed that fetch different numbers of videos in the result per query. Although the retrieval time increases with the number of videos returned in the result, it remains sufficiently small for a retrieval operation. Furthermore, the retrieval latency can be reduced by using incremental retrieval of videos per query.
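Incremental retrieval can be realized with Solr's standard start/rows paging parameters; the host and collection name below are assumptions.

import requests

SOLR_SELECT_URL = "http://solr-node:8983/solr/features/select"   # assumed endpoint

def fetch_page(encoded_query, page=0, rows=20):
    # return one ranked page of results instead of all matching videos at once
    params = {"q": encoded_query, "start": page * rows, "rows": rows, "wt": "json"}
    return requests.get(SOLR_SELECT_URL, params=params).json()["response"]["docs"]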

7 Conclusion

In this paper, we presented a lambda-style distributed deep-feature indexing framework for video big data retrieval that works as SaaS in the cloud. Features are acquired from different video analytics services. To enable CNN deep features to be indexed and retrieved using an inverted index, an existing feature encoder algorithm is fine-tuned and optimized for CNN deep-feature encoding. To further accelerate the retrieval process, load balancing is employed to divide the load across the cluster. Our contribution is harnessing the power of big data technologies, an in-memory computation framework, and an inverted-index-based system for deep features to facilitate the indexing and retrieval of large-scale video big data. Experimental results show that our proposed system performs reasonably well in terms of precision, latency, and scalability in large-scale video data retrieval. Acknowledgment. This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2016-0-00406, SIAT CCTV Cloud Platform).


References

1. Adamu, F.B., Habbal, A., Hassan, S., Cottrell, R.L., White, B., Abdullah, I.: A survey on big data indexing strategies. Tech. rep., SLAC National Accelerator Laboratory (2015)
2. Alvarez, V., Richter, S., Chen, X., Dittrich, J.: A comparison of adaptive radix trees and hash tables. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 1227–1238 (2015). https://doi.org/10.1109/ICDE.2015.7113370
3. Amato, F., Santo, A.D., Gargiulo, F., Moscato, V., Persia, F., Picariello, A., Poccia, S.R.: Semtree: an index for supporting semantic retrieval of documents. In: 2015 31st IEEE International Conference on Data Engineering Workshops, pp. 62–67 (2015). https://doi.org/10.1109/ICDEW.2015.7129546
4. Amato, G., Bolettieri, P., Carrara, F., Falchi, F., Gennaro, C.: Large-scale image retrieval with elasticsearch. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, pp. 925–928. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3209978.3210089
5. Amato, G., Debole, F., Falchi, F., Gennaro, C., Rabitti, F.: Large scale indexing and searching deep convolutional neural network features. In: International Conference on Big Data Analytics and Knowledge Discovery, pp. 213–224. Springer (2016)
6. Chatterjee, K., Chen, S.C.: Hah-tree: towards a multidimensional index structure supporting different video modelling approaches in a video database management system. Int. J. Inf. Decis. Sci. 2(2), 188–207 (2010)
7. Chen, X., Zhang, C., Chen, S.C., Chen, M.: A latent semantic indexing based method for solving multiple instance learning problem in region-based image retrieval. In: Seventh IEEE International Symposium on Multimedia (ISM 2005), p. 8 (2005). https://doi.org/10.1109/ISM.2005.10
8. Eldawy, A., Mokbel, M.F.: Spatialhadoop: a mapreduce framework for spatial data. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 1352–1363 (2015). https://doi.org/10.1109/ICDE.2015.7113382
9. Fleites, F.C., Chen, S.: Efficient content-based multimedia retrieval using novel indexing structure in PostgreSQL. In: 2013 IEEE International Symposium on Multimedia, pp. 500–501 (2013). https://doi.org/10.1109/ISM.2013.96
10. Fleites, F.C., Chen, S.C., Chatterjee, K.: A semantic index structure for multimedia retrieval. Int. J. Semant. Comput. 6(02), 155–178 (2012)
11. Gani, A., Siddiqa, A., Shamshirband, S., Hanum, F.: A survey on indexing techniques for big data: taxonomy and performance evaluation. Knowl. Inf. Syst. 46(2), 241–284 (2016). https://doi.org/10.1007/s10115-015-0830-y
12. Giangreco, I., Kabary, I.A., Schuldt, H.: Adam - a database and information retrieval system for big multimedia collections. In: 2014 IEEE International Congress on Big Data, pp. 406–413 (2014). https://doi.org/10.1109/BigData.Congress.2014.66
13. Gollub, T., Völske, M., Hagen, M., Stein, B.: Dynamic taxonomy composition via keyqueries. In: IEEE/ACM Joint Conference on Digital Libraries, pp. 39–48 (2014). https://doi.org/10.1109/JCDL.2014.6970148
14. Hellerstein, J.M., Naughton, J.F., Pfeffer, A.: Generalized search trees for database systems (September 1995)


15. Khan, M.N., Alam, A., Lee, Y.: Falkon: large-scale content-based video retrieval utilizing deep-features and distributed in-memory computing. In: 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 36–43 (2020). https://doi.org/10.1109/BigComp48618.2020.0-102
16. Khan, M.N., Alam, A., Uddin, M.A., Lee, Y.K.: SurVRet: distributed surveillance video retrieval on large-scale video data using deep learning, pp. 892–894. The Korean Institute of Information Scientists and Engineers (June 2019). http://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE08763364
17. Liu, R., Wei, S., Zhao, Y., Yang, Y.: Indexing of the CNN features for the large scale image search. Multimed. Tools Appl. 77(24), 32107–32131 (2018). https://doi.org/10.1007/s11042-018-6210-3
18. Matsui, A., Nishimura, S., Katsura, S.: A classification method of motion database using hidden Markov model. In: 2014 IEEE 23rd International Symposium on Industrial Electronics (ISIE), pp. 2232–2237 (2014). https://doi.org/10.1109/ISIE.2014.6864965
19. Khan, M.N., Alam, A., Islam, M.A., Khan, J., Lee, Y.K.: DISIVR: distributed deep feature indexer for video big data retrieval on Spark, pp. 118–120 (2020). http://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE09874358
20. Pouyanfar, S., Yang, Y., Chen, S.C., Shyu, M.L., Iyengar, S.S.: Multimedia big data analytics: a survey. ACM Comput. Surv. 51(1), 1–34 (2018). https://doi.org/10.1145/3150226
21. Seres, A.: Three database management systems (DBMS) compared. Open Source Sci. J. 2(4), 65–82 (2010)
22. Shahi, D.: Apache Solr. Springer (2016)
23. Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)
24. Wei, S., Wu, X., Xu, D.: Partitioned k-means clustering for fast construction of unbiased visual vocabulary. In: The Era of Interactive Media, pp. 483–493. Springer, New York (2013)
25. White, T.: Hadoop: The Definitive Guide. O'Reilly Media, Inc. (2012)
26. Xu, H., Yao, N., Hu, W., Pan, H., Gao, X.: The design and implementation of image information retrieval. In: 2012 International Conference on Computer Science and Service System, pp. 1547–1550 (2012). https://doi.org/10.1109/CSSS.2012.387
27. Zaharia, M., Xin, R., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M., Gonzalez, J., Shenker, S., Stoica, I.: Apache spark: a unified engine for big data processing. Commun. ACM 59, 56–65 (2016). https://doi.org/10.1145/2934664
28. Zhang, C., Lin, Y., Zhu, L., Liu, A., Zhang, Z., Huang, F.: CNN-VWII: an efficient approach for large-scale video retrieval by image queries. Pattern Recogn. Lett. 123, 82–88 (2019). https://doi.org/10.1016/j.patrec.2019.03.015. http://www.sciencedirect.com/science/article/pii/S0167865518308341

Parallel Computing for Multi-core Systems: Current Issues, Challenges and Perspectives

Soumia Chokri(B), Sohaib Baroud, Safa Belhaous, and Mohammed Mestari

Laboratory SSDIA, ENSET Mohemmadia, University Hassan II, Casablanca, Morocco
[email protected]

Abstract. Computing machines (supercomputers) have constantly evolved to provide the greatest possible computing power for scientific applications. The trend for a decade has clearly been in favor of massively parallel architectures. To increase computing power, increasing the frequency of processors is no longer possible; energy consumption is indeed becoming a critical issue, and multi-core architectures are a serious avenue to prevent the explosion of this consumption. Parallelism is therefore an interesting solution for computation-intensive simulations with large storage requirements, which will have to run on multi-core architectures. This article aims to highlight the possibilities of enhancing the parallelism of simulation applications, in particular by improving both the partitioning and load balancing quality, which are fundamental problems of parallel computing; other relevant aspects are also discussed in order to make this review as complete as possible. Keywords: Parallel computing · Multi-core architectures · Parallelism challenges · Graph partitioning model · Data-intensive simulation

1 Introduction

Computing machines have constantly evolved to provide the greatest possible computing power for scientific applications (Fig. 1). However, the computing power requirements of scientific simulations are constantly increasing, and many applications and questions have so far remained unanswered due to insufficient computing resources. Over the past decades, parallelism has become an important topic of interest for a large scientific community; it has emerged in response to hugely increased requirements in computing power and is an interesting solution for computation-intensive simulations with large storage requirements, which will have to run on multi-core architectures. For example, in the field of molecular dynamics [1], especially for applications based on density functional theory like BigDFT [2], L. Genovese et al. explained that the need for computation and memory increases as the cube of the system size. For a 1,500-atom system, 3,000 GB of memory and one day of calculation are required on 1,500 CPU cores in a modern computing cluster to optimize the atomic positions.


Fig. 1. Power evolution of supercomputers (source: www.top500.org).

Parallelization is therefore necessary to reduce the execution time, but also for reasons of memory capacity. To achieve such computing capacities, multi-core architectures are forced to use a large number of processors (several thousand), also called CPUs, which operate jointly. This parallel use of CPUs is a specificity introduced by these multi-core architectures. Indeed, before the emergence of this parallelization movement, computers contained only one CPU, whose operating frequency was the only characteristic by which the performance of the machine could be evaluated. The rule was then relatively simple: the higher the frequency, the faster the CPU. It was thus possible for a program limited by the speed of the CPU to execute more quickly without any modification. However, the increase in frequency within a CPU is limited by multiple physical obstacles, in particular in terms of miniaturization and heat dissipation. Parallelism therefore appeared to be the solution to circumvent these difficulties in order to increase the performance of computers. The parallelization of an application on several computing nodes has long been studied to better exploit the available computing power, and hundreds of studies have been carried out over the years to develop the concept of parallelism and improve its efficiency and performance.


The graph model approach is widely used to establish how to distribute tasks and data for efficient parallel computation. Its aim is to divide computations uniformly over p processors by distributing the vertices, which represent the set of tasks, into p partitions of the same size, while minimizing the inter-processor communication represented by the edges. However, the difficulty of treating this problem lies in the fact that it is a multidisciplinary research field with many problems that are NP-hard, in particular that of balanced load distribution, which is a fundamental problem of parallelism. Inter-processor communication and resource allocation are also research axes among many others. The purpose of this work is to bring together the future challenges that must be taken up by researchers in order to improve the efficiency of parallelism. The rest of this article is organized as follows: we first introduce graph partitioning, the technique universally used to model the parallelization process in multi-core systems, along with some existing frameworks for graph partitioning; then we discuss the different problems that hamper the performance of parallel execution of scientific simulations. Ultimately, we conclude and discuss some perspectives to overcome these challenges.

2 Related Works

2.1 Graph Partitioning Models

Computer scientists regularly use graph abstractions to model simulation applications. Dividing a graph into smaller parts is one of the important algorithmic operations for parallelization. Consider a non-oriented graph G = (V, E), where V is the set of vertices and E is the set of edges connecting pairs of vertices. Vertices and edges can be weighted; |v| denotes the weight of a vertex v ∈ V and |e| the weight of an edge e ∈ E. The graph partitioning problem is to divide G into k disjoint parts (Fig. 2). From a mathematical point of view, either the vertices or the edges can be partitioned; in most applications, however, only the partitioning of graph vertices is of interest. Let P_k = {V_1, V_2, ..., V_k} be a set of k subsets of V. We say that P_k is a partition of G if:

– no subset of V that is an element of P_k is empty: V_i ≠ ∅ for all i;
– the subsets are pairwise disjoint: V_i ∩ V_j = ∅ for all i ≠ j;
– the union of all the elements of P_k is V: V_1 ∪ V_2 ∪ ... ∪ V_k = V.

The elements V_i of P_k are called the parts of the partition. The parts must be balanced, that is, of the same size, |V_1| ≈ |V_2| ≈ ... ≈ |V_k|, while minimizing a cost (cut) function that represents the communication time between the processors:

min Σ |e_{i,j}|  over edges e_{i,j} = (v_i, v_j) with v_i ∈ V_k, v_j ∈ V_p, k ≠ p,

where |e_{i,j}| is the weight of the edge e_{i,j}.

2.2 Graph Partitioning Based Algorithms for Parallel Computing

Graph partitioning is extensively used to model the data dependencies within a computation, and it is also used for solving optimization problems that arise in


Fig. 2. Example of a graph partitioned in K = 4 parts.

many real-world applications [30,31]. There are many examples of graph partitioning applications: data mining, design of VLSI electronic integrated circuits, load distribution for parallel machines, fluid dynamics, matrix computing, air traffic, etc. Graph partitioning (GP) is an NP-complete problem [3,4]. Different heuristics are therefore used to compute a partition within a reasonable time. The GP problem is well studied, and various GP-based algorithms for data distribution and load balancing have been developed: spectral methods [17], combinatorial approaches [18], and the multilevel framework [6]. The multilevel algorithm has emerged as a very efficient method for computing a k-way balanced partition of a graph [5,6]. The multilevel approach makes it possible to speed up the classical partitioning methods while maintaining good quality. It is broken down into three steps, as shown in Fig. 3.

– The contraction step: First, the size of the graph is reduced by merging vertices. This is repeated for several iterations until a sufficiently small graph is obtained. A series of graphs of decreasing size is thus created: (G_0, G_1, ..., G_k).
– The initial partitioning: Once the graph has been sufficiently contracted, a partitioning heuristic is applied to compute the partition P_k of the graph G_k. Any partitioning strategy can be applied here.
– The expansion step: The sequence of graphs constructed during the contraction phase is then "reassembled". The partition P_{i+1} of the graph G_{i+1} is extended onto the graph G_i, and this new partition P_i is refined using a heuristic that locally improves the cut.

Fig. 3. Phases of a multi-level partitioning.

These algorithms face various significant challenges that will be addressed later. In the literature, there are several techniques and software packages that give good results. Metis [19] is one of the best known and most used partitioners. It has two partitioning methods, recursive and k-way, but the k-way method is based on recursive bisections for its initial partitioning. ParMetis [20] is a parallel version of Metis, allowing a partition to be created in parallel on at least two processes. Scotch [21] is

a free licensed graph and mesh partitioner. It is very configurable and modular. It offers a wide variety of partitioning methods, but only bisection methods are available for initial partitioning. PT-Scotch is the parallel multi-threaded, multi-process version of Scotch. Zoltan [22] is a library for managing the distribution of data for parallel applications; in particular, it offers partitioning functionality.
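As a small, self-contained illustration (not one of the partitioners above), the Kernighan–Lin heuristic [18] available in NetworkX can bisect a toy task graph into two balanced parts; the resulting edge cut gives a proxy for the inter-processor communication volume.

import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# A random graph standing in for a task/data dependency graph.
G = nx.erdos_renyi_graph(n=40, p=0.15, seed=7)

part_a, part_b = kernighan_lin_bisection(G, seed=7)    # two balanced parts

# Edges crossing the two parts approximate the communication between processors.
cut = sum(1 for u, v in G.edges if (u in part_a) != (v in part_a))
print(len(part_a), len(part_b), cut)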

3 Main Issues and Challenges of Parallelism

Parallelization consists of decomposing a problem into sub-problems which will be solved simultaneously on a parallel architecture. Each of the sub-problems is treated by one or more processors of the parallel machine. The processors of this machine do not have fast and direct access to the data of the other processors, because that would constitute a bottleneck. Since the memory is distributed, the distribution of the calculations implies a distribution of the data. A. Meade et al. claim in [8] that the parallelization process consists of four phases, as shown in Fig. 4:

– Decomposition: divides the problem into several sub-problems; it consists in partitioning the computations and the associated data into tasks which are as independent as possible from one another. Two partitioning techniques can be differentiated. The first, task decomposition, partitions the computations first and then works with the data. The other, data decomposition, first studies the data needed for the problem, then looks for the most suitable decomposition of these data, and then identifies the computations that will be applied to them. These are two complementary techniques that are applied together during the parallelization of a simulation.
– Communication: In this phase, a communication model is necessary for the information exchange between the tasks.


– Load balancing: this step combines the partitioning and communication requirements; its aim is to divide the computations evenly into N parts of the same size while minimizing the communication overhead.
– Mapping: during this phase, the tasks of the parallel application are assigned to the processors of the parallel architecture on which they will be executed.

Fig. 4. Process of the parallelization of a simulation

In the literature, the parallelization process has been described as challenging and error-prone [7–9,32]. It is therefore advantageous to take into consideration several issues which may hinder achieving high degrees of parallelism [10,11]: data and task decomposition, the number of processors, inter-processor communication, load balancing, etc.

3.1 Data-Task Decomposition

Parallelising simulation applications to run on parallel architectures has been described as an extremely challenging task. In particular, data decomposition is one key issue among many others [12]. The data decomposition process entails a communication cost that has a negative impact on the performance of the simulation. Consequently, it is of crucial importance to define a data decomposition strategy that maximizes computation while minimizing communication across the available processors. In [13,14] the authors indicate that finding a suitable decomposition of a complex problem is a balance of competing forces; during the decomposition, the size of the tasks, i.e., the granularity, must be taken into account. A fine-grained decomposition creates a large number of small tasks, which increases the communication and synchronization overhead.


This might lead to poor performance, while a coarse-grained decomposition may not generate enough parallelism and may leave the load unbalanced. Defining a very large number of fine-grained tasks in the early stages of developing the parallel algorithm limits the efficiency of its execution on a parallel architecture. The most critical point of such an execution is the cost of communication. In fact, with certain types of parallel programming models, calculations are stopped when messages are received or sent. The performance of the algorithm can therefore be considerably increased if the time spent communicating is reduced or overlapped with computation. This improvement can be obtained by reducing the number of messages sent. It can also be achieved by using fewer messages while preserving the same amount of data in transit, because the cost of a call includes a fixed component and is not simply proportional to the amount of data sent. For a judicious decomposition, the partitioning of the initial problem must include more tasks than there are processors available on the target machine in order to be as efficient as possible during parallel execution; otherwise, processors would end up with no task to execute. But care must be taken not to create tasks that are too small under the pretext of keeping all the processors busy, because they would involve a large number of communications and parallelization would therefore be inefficient. There is therefore a compromise to be found between the size of the tasks, their number, the quantity of communication generated, and the number of processors to use.

3.2 Communication Cost

Another parallelization issue to consider is the communication cost between tasks: a large number of messages exchanged between tasks may decrease overall performance. The cost of communication can become particularly marked when the architecture on which the program is executed is of the multi-processor type, that is, when the communications and exchanges between processes/processors are carried out through a network. During the parallel execution of a simulation, it frequently happens that the execution time is largely dominated by the time required to carry out the communications between processors; the cost of a communication can be several orders of magnitude higher than the cost of executing a normal instruction. In this case, it may be sufficient to estimate the complexity of the number of communications required by the simulation. Several works have already been carried out in this context to investigate the impact of communication on the efficiency of executing a simulation [12,15,16]. The researchers claim that the most critical point of parallel simulation execution is the communication cost. The communication cost evolves depending on several factors: the number of partitions, the structure of the graph, the amount of information that any particular partition needs to send and receive, etc.


Fig. 5. Evolution of normalized communication volume with increasing number of partitions

As shown in Fig. 5, the communication volume grows rapidly as the number of partitions increases, which leads to a significant degradation of efficiency, as shown in Fig. 6. Performance can therefore be dramatically increased if the time spent communicating is reduced or overlapped with computation. This improvement can be obtained either by reducing the number of messages sent or by an optimal choice of the number of processors.

Fig. 6. Evolution of efficiency with increasing number of partitions

R. Muresano et al. indicate in [29] that performing intensive simulations on multi-core systems while balancing computational speed and efficiency is a complicated issue facing parallel computing. Therefore, communications on multi-core systems must be carefully managed in order to enhance performance.
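To make the trade-off concrete, the toy model below (our own illustration, not taken from the paper or the cited works) assumes that per-partition computation shrinks as work/p while communication grows linearly with the number of partitions p; efficiency then degrades in the same spirit as Figs. 5 and 6.

def efficiency(work, comm_per_partition, p):
    # assumed model: T_parallel = work/p + comm_per_partition * p
    t_parallel = work / p + comm_per_partition * p
    speedup = work / t_parallel
    return speedup / p

for p in (2, 4, 8, 16, 32, 64):
    print(p, round(efficiency(work=1000.0, comm_per_partition=0.5, p=p), 3))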


In [16], Tintó Prims et al. explain that the message-passing paradigm developed with the introduction of parallel distributed-memory machines. Message passing involves establishing a communication channel between two execution streams to send and receive data. The MPI (Message Passing Interface) library is, in this context, the most widely used. Running an application implemented with this library can be very problematic when using a large number of compute nodes made up of multi-core computers. The problem is caused by the very large number of tasks relying on global communications that a loosely coupled MPI application generates. Document passing is a special, higher-level case of message passing where the data is typed.
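A minimal mpi4py example of the point-to-point message passing described above (run with, e.g., mpirun -np 2 python script.py):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    payload = {"halo": [0.1, 0.2, 0.3]}      # e.g. boundary data needed by a neighbouring task
    comm.send(payload, dest=1, tag=11)       # establish a channel and send
elif rank == 1:
    payload = comm.recv(source=0, tag=11)    # matching receive
    print("rank 1 received", payload)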

3.3 Load Balancing

Load balancing is a key issue that conditions the performance of parallel numerical simulations. The goal is to distribute the workload evenly among a given number of processors in order to minimize the overall execution time. Parallel execution of scientific simulations on a multi-core system often imposes a distribution of the workload among the available processors in order to ensure efficient parallelization. Since memory is distributed, the distribution of computations implies a data partitioning procedure. In this context, the distribution of data is done according to two objectives:

– the computational load assigned to the different processors must be balanced to minimize the computing time;
– inter-processor communications should be minimized.

Load balancing can be done in two different ways [23]: a static balancing strategy assigns the workload to processors at the start of the simulation; in applications with changing or unpredictable workloads, a dynamic strategy that redistributes the calculations at run-time is necessary. Load balancing is one of the major challenges facing multiprocessor and multi-core systems, and several research works address it [24–27]; in particular, the graph partitioning algorithms [5,6,17,18] described above have been proposed to distribute the workload. Despite the success and relevance of the existing partitioning methods for load balancing, research challenges remain and new avenues for improvement should be proposed for the efficient execution of simulations on multi-core architectures. In particular, the choice of the right number of partitions (processors) and the efficient use of the resources of a parallel architecture are key challenges of load balancing.

3.4 Resource-Aware Load Balancing

– The choice of the number of partitions. The partitioning methods for load balancing presented above calculate the distribution of a simulation's computations and the associated data over a fixed number of partitions, which can have a severe impact on the performance


and scalability of the computation. The number of parts must be taken into account during the partitioning process. Partitioning the problem into a small number of partitions can generate insufficient parallelism and an unbalanced load, while partitioning into a very large number of partitions results in poor performance due to high communication costs. In this context, the choice of the number of partitions is crucial for achieving high performance in a simulation application.
– Processor allocation for a simulation. A similar problem is taking into account the processors of the target machine to which the computed partitions will be assigned. Assuming each part is assigned to a processor, choosing the right number of processors allocated to a simulation is essential for good performance and efficiency. If a simulation is run in parallel on too many processors, the time spent communicating can become too long compared to the computation time. Using as many processors as possible is not always a good choice; it depends on the size of the problem, and as the size of the problem may vary during the simulation, the number of processors should vary accordingly. The resource-aware load balancing issue is not well studied; however, some researchers affirm that the number of processors is among the issues encountered in reaching maximum speedup. In [28], L. Yang et al. claim that pan-sharpening algorithms are data- and computation-intensive, and they therefore adopted a parallel strategy to solve the existing problems on a multi-core computer. They show that the choice of the number of processors is a crucial issue in achieving maximum speedup. Selecting a specific number of processors for an application is difficult, as the factors determining the maximum speedup are varied. They give empirical suggestions based on experimental results to define the number of processors in order to use the available resources efficiently.

4 Perspectives and Future Works

In order to address the different issues and challenges presented in this work, several avenues can be considered and many different directions remain to be explored as future work.

4.1 Computation-Data Decomposition

For efficient partitioning, the data-computation decomposition of the initial problem must include more tasks than there are processors available on the target machine in order to be as efficient as possible during parallel execution. Otherwise, processors would end up with no task to execute. But care must be taken not to create tasks that are too small under the pretext of keeping all the processors busy, because they would undoubtedly involve a large number of communications and parallelization would therefore be inefficient. There is, therefore, a compromise to be


found between the size of the tasks, their number, the quantity of communication generated, and the number of processors to use.

4.2 Communication Cost

Since the communication cost of an application depends on several factors, such as the topology of the graph, the number of messages sent and received, and the number of partitions, optimizing any single one of these factors is not an affordable solution. For this reason, a thorough performance analysis approach is required to identify bottlenecks and understand the impact of inter-process communication on performance. Once these bottlenecks are revealed and the features impacting performance are defined, it would be appropriate to adopt a learning system such as an artificial neural network (ANN) to predict, more or less precisely, the maximum speedup and the ideal number of partitions (processors) for an application.

4.3 Resource Allocation and Load Balancing

To address this challenge, it appears advantageous to develop an ANN predictive module to predict the appropriate processor for each workload in order to enhance energy efficiency and performance. Resource allocation and load balancing are very useful in code coupling, where several parallel codes run simultaneously and must regularly exchange data. This exchange phase is synchronizing; it is therefore important that all the codes concerned progress at the same speed to minimize the waiting time during this synchronization. The choice of the number of processors used by each code must be made by taking into account their relative loads. This balancing between several codes can be difficult, particularly if the load of these codes varies dynamically. To rebalance these codes, one solution would be to reallocate resources from one code to another. The ideal number of processors allocated to each code can therefore be approached experimentally by correcting, during the simulation, the imbalances that occur.

5 Conclusion

The computing power and frequency limitations observed on single-core machines have paved the way for multi-core systems, which will remain the industry trend going forward. However, full performance throughput can only be achieved when the challenges of running simulations on multi-core processors are fully resolved. A graph partitioning model that establishes how to distribute tasks and data for efficient parallel computation has been presented, and the existing algorithms in the literature have been detailed.


This paper also highlights the key issues and challenges of performing simulations on multi-core systems, especially computation-data partitioning, communication overhead, and resource-aware load balancing. In addition, we cite possible avenues for improvement in order to enhance execution performance on multi-core systems. Acknowledgements. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 777720.

References

1. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kalé, L., Schulten, K.: Scalable molecular dynamics with NAMD. J. Comput. Chem. 26(16), 1781–1802 (2005)
2. Genovese, L., Neelov, A., Goedecker, S., Deutsch, T., Ghasemi, S.A., Willand, A., Caliste, D., Zilberberg, O., Rayson, M., Bergman, A., Schneider, R.: Daubechies wavelets as a basis set for density functional pseudopotential calculations. J. Chem. Phys. 129, 014109 (2008)
3. Garey, M.R., Johnson, D.S.: Computers and Intractibility: A Guide to the Theory of NP-Completeness. Freeman W. H., New York City (1979)
4. Andreev, K., Racke, H.: Balanced graph partitioning. In: Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 120–124. ACM (2004)
5. Pellegrini, F.: Graph partitioning based methods and tools for scientific computing. Parallel Comput. 23, 153–164 (1997)
6. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
7. Massingill, B.L., Mattson, T.G., Sanders, B.A.: Reengineering for parallelism: an entry point into PLPP for legacy applications. Concurr. Comput.: Pract. Experience 19(4), 503–529 (2007)
8. Meade, A., Buckley, J., Collins, J.J.: Challenges of evolving sequential to parallel code: an exploratory review. In: Proceedings of the 12th International Workshop on Principles of Software Evolution and the 7th Annual ERCIM Workshop on Software Evolution. ACM (2011)
9. Vandierendonck, H., Mens, T.: Averting the next software crisis. IEEE Comput. 44(4), 88–90 (2011)
10. Nielsen, I.M., Janssen, C.L.: Multicore challenges and benefits for high performance scientific computing. Sci. Program. 16(4), 277–285 (2008)
11. Mccool, M.: Scalable programming models for massively multicore processors. Proc. IEEE 96(5), 816–831 (2008)
12. Meade, A., Deeptimahanti, D.K., Buckley, J., Collins, J.J.: An empirical study of data decomposition for software parallelization. J. Syst. Softw. 125, 401–416 (2017)
13. Dovolnov, E., Kalinov, A., Klimov, S.: Natural block data decomposition for heterogeneous clusters. In: Proceedings of the International Parallel and Distributed Processing Symposium (2003)
14. Massingill, B., Mattson, T., Sanders, B.: Reengineering for parallelism: an entry point into PLPP for legacy applications. Concurr. Comput.: Pract. Experience 19(4), 503–529 (2007)


15. Chokri, S., et al.: Impact of communication volume on the maximum speedup in parallel computing based on graph partitioning. In: 2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS), pp. 1–6 (2019)
16. Tintó Prims, O., et al.: Finding, analysing and solving MPI communication bottlenecks in earth system models. J. Comput. Sci. 36, 100864 (2019)
17. Fjällström, P.: Algorithms for Graph Partitioning: A Survey. Linköping University, Linköping, Sweden, Department of Computer and Information Science (1998)
18. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)
19. George, K.: METIS - Family of Multilevel Partitioning Algorithms. http://glaros.dtc.umn.edu/gkhome/views/metis
20. George, K.: PARMETIS. http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview
21. François, P.: Scotch and LibScotch 6.0 User's Guide (2012)
22. Zoltan: Parallel partitioning, load balancing and data-management services. http://www.cs.sandia.gov/Zoltan/Zoltan.html
23. Bohme, D.: Characterizing Load and Communication Imbalance in Parallel Applications, Volume 23 of IAS, Forschungszentrum Julich (2014). https://doi.org/10.1109/IPDPSW.2012.321
24. Campbell, P.M., Devine, K.D., Flaherty, J.E., Gervasio, L.G., Teresco, J.D.: Dynamic octree load balancing using spacefilling curves. Technical Report CS03-01, Williams College Department of Computer Science (2003)
25. Cybenko, G.: Dynamic load balancing for distributed memory multiprocessors. J. Parallel Distrib. Comput. 7(2), 279–301 (1989)
26. Pilkington, J.R., Baden, S.B.: Dynamic partitioning of non-uniform structured workloads with spacefilling curves. IEEE Trans. Parallel Distrib. Syst. 7(3), 288–300 (1996)
27. Niemoller, A., Schlottke-Lakemper, M., Meinke, M., Schroder, W.: Dynamic load balancing for direct-coupled multiphysics simulations. Comput. Fluids 199, 104437 (2020)
28. Yang, J., Zhang, J., Huang, G.: A parallel computing paradigm for pan-sharpening algorithms of remotely sensed images on a multi-core computer. Remote Sens. 6, 6039–6063 (2014)
29. Muresano, R., Meyer, H., Rexachs, D., Luque, E.: An approach for an efficient execution of SPMD applications on multi-core environments. Future Gener. Comput. Syst. 66, 11–26 (2017)
30. Wang, N., Wang, Z., Gu, Y., Bao, Y., Yu, G.: TSH: easy-to-be distributed partitioning for large-scale graphs. Future Gener. Comput. Syst. 101, 804–818 (2019)
31. Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of ICDM, pp. 229–238. IEEE (2009)
32. Belhaous, S., Baroud, S., Chokri, S., Hidila, Z., Naji, A., Mestari, M.: Parallel implementation of a search algorithm for road network. In: 3rd International Conference on Intelligent Computing in Data Sciences (ICDS 2019) (2019)

Semantic Web and Business Intelligence in Big-Data and Cloud Computing Era

Adedoyin A. Hussain1,3(B), Fadi Al-Turjman1,2, and Melike Sah3

1 Research Centre for AI and IoT, Near East University, via Mersin 10, Nicosia, North Cyprus, Turkey
[email protected]
2 Artificial Intelligence Engineering Department, Near East University, via Mersin 10, Nicosia, North Cyprus, Turkey
[email protected]
3 Computer Engineering Department, Near East University, via Mersin 10, Nicosia, North Cyprus, Turkey
[email protected]

Abstract. Business Intelligence (BI) involves strategies and technologies for the analysis of business information. In the big data era, business data can be collected using data mining from online resources, reports, operations, social media and many other sources. Thus, BI makes use of heterogeneous data for the analysis of the past, present and future of enterprises, and this is challenging: mining the data, storing it in an appropriate format, and managing/analysing/inferencing over the heterogeneous information is difficult. These problems can be alleviated to a great extent using the Semantic Web (SW). SW provides technologies to create rich machine-understandable data that can be automatically processed, analysed and consumed by intelligent applications. Therefore, BI can benefit from SW technologies. In this paper, we review BI methods that apply SW for intelligent data mining, processing and analysis of BI procedures. In particular, we summarize and compare different properties of SW and BI methods and how they handle big data. We also discuss how cloud computing technologies can be integrated with SW and BI, and finally conclude with future open research challenges. Keywords: Semantic web · Business intelligence · Big data · Cloud computing · Artificial intelligence

1 Introduction

Semantic Web (SW) can in general be considered as a way to develop semantic data using standard technologies. In SW, information is given well-defined meaning using ontologies, and this data can be automatically processed by intelligent applications using a query language (i.e. SPARQL) and inference rules [1]. SW technologies can be applied to any kind of content [2]: to web pages, text documents, slides, videos, speech files, etc. In the context of SW, data and metadata are separated. SW data about a resource (called metadata) can be generated by applying data mining techniques. Then,


the extracted information has to be converted to a machine-understandable format. For this purpose, there is first a need for a schema (such as RDF Schema [3]) or an ontology [4]. A schema/ontology is a shared, agreed vocabulary. It decides the exact semantics of the data, the meaning of the data and the connections between information according to a shared understanding. The Web Ontology Language (OWL) [5] is the recommended standard for an ontology, and RDFS is the schema of RDF. Based on the ontology/schema, the extracted information is then converted to a semantic data format, called Resource Description Framework (RDF) [6]. RDF is the data model of the SW, where metadata is separated from the original source content and stored as triples (subject, predicate, object). RDF data can be stored in a variety of syntaxes such as RDF/XML [7], N3, Turtle, JSON, and RDFa (embedded into HTML documents). Once metadata collected from heterogeneous sources is stored as RDF data, it can be automatically processed by intelligent applications. In particular, semantic rules can be applied to learn new knowledge, or SPARQL queries can be applied to query incomplete data. SW can thus assist in solving problems such as handling heterogeneous data, processing large amounts of information automatically, and inference/analysis of BI data using SW technologies.

The aim of this paper is to review and compare SW and BI advances in the big data era. It is one of the early surveys in this field. Given that there is considerable enthusiasm within BI about analysing web-published information, SW technologies appear to be an important way to approach the involved semantic integration challenges as well as new operational abilities, such as deductive reasoning over big data. In the following sections, we explain BI and SW approaches in different fields, which have made unprecedented achievements. The main discussions of the paper can be summarized as follows:

• Outline and compare the blend of SW and BI methods and how SW is integrated into Online Analytical Processing (OLAP).
• Outline and compare conceptualization approaches of BI using SW.
• Different from existing surveys, we also discuss the consolidation of cloud computing with SW and BI.
• Finally, we inspect open research challenges and issues.

The paper is organized as follows. Section 2 discusses the advantages of combining SW and BI. Section 3 discusses SW and Online Analytical Processing (OLAP) approaches and compares their properties. Section 4 explains the contextualization efforts in BI using SW. Section 5 discusses big data and its related aspects. In Sect. 6, we describe incorporating the cloud with SW and BI. Lastly, Sect. 7 brings our survey to an end with the conclusion.

1.1 Related Works

The aim of Business Intelligence (BI) is to provide various tools, strategies, and technologies to improve and facilitate decision-making. With regard to BI, the data warehouse is used to gather, organize, and store subject-oriented, integrated, time-variant, and non-volatile data.


The authors in [7] proposed another family of representations, all relying upon XML, to overcome the limitations of earlier metadata schemes. The fundamental idea behind these approaches is that any concept or instance used for describing an object on the web must be referred to by a unique resource identifier (URI). Hence, the common essential way to describe an item consists in making a link to the URI that represents the intended semantics. With RDF, the authors in [6] were able to build more complex metadata elements allowing the representation of relationships between descriptors (for example, triples). Moreover, the authors in [8] use the RDFS extension, which permits users to define a conformant schema for RDF descriptions. It is worth mentioning that the semantics of RDFS is very similar to frame-based and object-oriented formalisms. The authors in [9, 5] proposed more expressive semantic representations by adopting logic-based structures: the Web Ontology Language (OWL) and DAML + OIL. In contrast to RDFS, these models are based on description logics, which are a subset of FOL (first-order logic). Cloud computing will create a new wave of advancements and an expanded industrialization of IT. These developments will bring about changes in business models and permit new levels of advantage for organizations.
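To make the RDF/RDFS machinery discussed above concrete, the following small rdflib sketch stores BI-style metadata as triples and queries it with SPARQL; the ex: vocabulary is invented purely for illustration and is not from the cited works.

from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/bi#")      # hypothetical shared vocabulary
g = Graph()

# Metadata as (subject, predicate, object) triples, separate from the source content.
g.add((EX.Product42, RDF.type, EX.Product))
g.add((EX.Product42, RDFS.label, Literal("Smart thermostat")))
g.add((EX.Review7, EX.about, EX.Product42))
g.add((EX.Review7, EX.sentiment, Literal("positive")))

# SPARQL query over the machine-understandable metadata.
q = """
PREFIX ex: <http://example.org/bi#>
SELECT ?product ?sentiment WHERE {
    ?review ex:about ?product ;
            ex:sentiment ?sentiment .
}
"""
for product, sentiment in g.query(q):
    print(product, sentiment)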

2 Semantic Web and Business Intelligence

The aim of Business Intelligence (BI) is to provide various tools, strategies, and technologies for improving and facilitating decision-making [10]. With regard to BI, the data warehouse is used to gather, organize, and store subject-oriented, integrated, time-variant, and non-volatile data [11].


sentiments and business data [17]. Although BI and SW have been considered two distinct research directions over recent decades, ongoing research shows that the combination of these two areas is inevitable and beneficial for both [18].

Fig. 1. Classical business intelligence and semantic business intelligence compared [14]

3 SW and Online Analytical Processing (OLAP)

These days, many attempts are made to combine OLAP analysis with SW technologies, both at the data integration and the data processing levels. This research direction allows combining powerful tools and technologies from the two domains. It is nevertheless a challenging task, mainly because OLAP needs a specific data format to support complex assessment over aggregated measures at various granularity levels [18]. Performing OLAP analysis directly over SW data is difficult and inefficient due to the absence of a suitable data model bridging the gap between the SW and OLAP domains. After all, OLAP was originally conceived for analysis over homogeneous and stable data warehouse information. With the profusion of information used in BI processes, the resources become increasingly heterogeneous and unpredictable. This problem can be addressed to a great extent by SW, using standard, machine-processable data conforming to ontologies, so that conventional OLAP techniques can be exploited over semantic metadata. Looking at these issues, only a few research attempts have been made at joining BI OLAP with SW. These approaches can be divided into two categories, as depicted in Fig. 2: multidimensional modelling oriented and OLAP analysis oriented. The first methodology is OLAP-analysis oriented; it consists of extracting, transforming, and then storing multidimensional SW data in traditional OLAP data warehouses, so that it can be examined by present OLAP tools. The second methodology is the multidimensional modelling oriented one, which aims to perform OLAP analyses directly

1422

A. A. Hussain et al.

over RDF-like information displayed in a proper multi-dimensional arrangement. In this section, we discuss the related work in these two fields, and finally in Table 1, we summary different properties of these referenced works.

Fig. 2. Illustration of BI and SW combination.

3.1 Multidimensional Model Oriented

To overcome the drawbacks of the previous approaches, the other research direction consists of carrying out multidimensional analysis directly over SW data without ETL processes. Most of the current models rely on the RDF Data Cube vocabulary (QB), proposed by the W3C for publishing statistical and complex datasets in standard RDF. Performing OLAP analysis directly over QB data appears more efficient because no ETL process is needed. However, standard OLAP querying requires a complete schema of the data cubes, containing facts, dimensions, hierarchies, and levels [20]. QB does allow hierarchical relationships between dimension instances to be represented through the SKOS vocabulary's skos:narrower property, but no more specific relationships can be defined. Authors in [21] define an extension of the QB model to represent statistical data with respect to a multidimensional schema; they also show how to carry out OLAP analysis over data published in QB by means of the SPARQL query language. Authors in [22] present another multidimensional modelling language called Open Cube (OC), which supports multiple hierarchies in a dimension, and implementations of OLAP operators through SPARQL queries are also presented in this work. However, OC is its own modelling language, hence data already published in QB, which is standardized, cannot be reused by OC. To overcome this issue, the same authors [22] present QB4OLAP. QB4OLAP extends and remains compatible with QB to support multidimensional modelling of SW data. In [23], an extension of QB4OLAP is proposed: it enriches dimensions with multiple hierarchies and considers cardinalities between level members. Tools to transform an existing relational data warehouse into a QB4OLAP schema are also introduced in [23]. The compatibility between QB4OLAP and QB makes querying QB4OLAP with SPARQL possible; however, whether such data should be queried through OLAP operations defined over the QB4OLAP model rather than through plain querying is still open for discussion.
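
As a small illustration of the kind of analysis discussed above, the following sketch publishes two observations with the W3C Data Cube (QB) vocabulary and performs a roll-up-style aggregation with SPARQL, in the spirit of [21]. The ex: dimension and measure properties, the dataset, and the skos:narrower region hierarchy are made-up examples, not taken from the cited works.

```python
# A toy QB dataset plus a SPARQL aggregation that rolls country-level
# observations up to their parent region via the skos:narrower hierarchy.
from rdflib import Graph

ttl = """
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/cube/> .

ex:Europe skos:narrower ex:France, ex:Spain .

ex:o1 a qb:Observation ; qb:dataSet ex:sales2020 ;
      ex:refArea ex:France ; ex:year "2020" ; ex:sales 120 .
ex:o2 a qb:Observation ; qb:dataSet ex:sales2020 ;
      ex:refArea ex:Spain  ; ex:year "2020" ; ex:sales  80 .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

rollup = """
PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.org/cube/>
SELECT ?region (SUM(?sales) AS ?totalSales) WHERE {
    ?region skos:narrower ?country .
    ?obs a qb:Observation ;
         ex:refArea ?country ;
         ex:sales   ?sales .
}
GROUP BY ?region
"""
for region, total in g.query(rollup):
    print(region, total)
```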


The work in [24] introduces a conceptual multidimensional schema based on QB that supports facts, dimensions, and multi-hierarchies of various types. The authors also show how to implement OLAP operators using SPARQL queries over the proposed multidimensional model. As far as we could establish, the authors of [24] were the first to describe the implementation of OLAP operators with SPARQL queries over a complete multidimensional data schema. The multidimensional-modelling-oriented approach overcomes the issues related to the non-automatic nature of the ETL model and provides sound multidimensional modelling solutions for OLAP analyses over SW data. However, since large amounts of SW data are generally queried on the fly [24], the efficiency of OLAP analysis using the QB model remains limited.

3.2 Analysis Oriented Approach of OLAP

OLAP analyses are carried out through analysis operators such as roll-up, drill-down, pivot, and so on [26]. Analysis results are usually presented in a Multidimensional Table (MT), allowing several analyses around a subject to be visualized. Based on an MT, decision makers can further apply OLAP operators to continue their analyses. OLAP operators are only applicable to particular data structures [26, 27]; RDF representations, however, do not organize data in a way that can directly support OLAP analysis. For example, to perform drill-down and roll-up operations, we have to represent data according to different hierarchy levels within a dimension. Although the RDF triple representation can describe web resources and the relationships among them at the instance level, it does not allow hierarchical relationships within a dimension structure (schema level) to be expressed. Faced with this issue, the OLAP-analysis-oriented approach consists of transforming SW data into OLAP cubes using ETL processes. In this way, OLAP analysis can be carried out over the extracted SW data with existing analysis tools. In this sub-section, we discuss works following this approach. Authors in [28] propose a semi-automatic approach to derive an OLAP data warehouse from a single domain ontology. The resulting data warehouse can integrate composite web resources by following a standard OLAP data model, which allows OLAP analysis to be performed on SW resources. However, in a real application, relevant information can be found in several domain ontologies; since the method proposed in [28] relies on a single domain ontology, it does not address overlapping concepts across different domain ontologies. Authors in [29] propose a framework to derive a semi-structured data warehouse from multiple domain ontologies. This data warehouse, called Semantic Data Warehouse (SDW), uses ontology mappings to manage domain overlap. Coherent patterns from the different domain ontologies are derived and then assembled to semi-automatically generate a targeted OLAP cube. These works focus on extracting, transforming, and loading SW data into OLAP cubes so that decision makers can directly carry out OLAP analysis. The main advantage is the possibility of using current OLAP tools while analysing the transformed SW data in the OLAP cubes.


However, storing SW data in a largely static local data warehouse conflicts with the highly dynamic nature of web-published data. Moreover, the ETL process is not yet fully automatic, which is time-consuming [30]. From a user's point of view, for example when fresh data is required without continuous querying [31], a semi-automatically or manually built local SW data warehouse can hardly react to changes in the data sources in real time. Consequently, the staleness between warehouse data and data in the online sources is difficult to manage. In Table 1, we summarize the works following the multidimensional-model-oriented and the analysis-oriented OLAP approaches.

Table 1. Multidimensional model oriented and analysis oriented approach of OLAP comparison summary. The works [21], [22], [23], [24], [28], and [29] are compared on the following properties: OLAP analysis, multiple hierarchies, heterogeneous sources, querying standard, reuse of ontologies, multiple levels, and OLAP operators.
4 Contextualizing Business Intelligence Analysis

Besides being used as data sources for analysis, SW data can also be used as complementary information to explain the context of a business analysis. For example, web-published news reporting persistent high temperatures in a region could explain increasing sales of air-conditioning systems. Combining external SW data with factual data in an OLAP data warehouse provides decision makers with additional perspectives on their business activities. Identifying relevant SW data to contextualize business analysis is a promising way to build the decision support systems of the next generation; so far, however, the contextualization of OLAP analysis has been achieved mainly through content mining and information retrieval techniques [32]. Only a few works address contextualizing BI analysis, as discussed below. The contextualization of business analysis is achieved by retrieving relevant information stored in different systems. Authors in [33] present a model that allows relevant documents in a content management system to be associated with a predefined OLAP report in the OLAP system. This model shares the user's context in order to present separately stored but related information together. A formal approach that allows the users' analysis context to be communicated is introduced in [34], using a meta-data layer over the heterogeneous metadata; with this, factual and non-factual information can be presented together to explain the context of a business analysis. The model relies on metadata enriched with ontology concept mappings, which allow the same concept in heterogeneous data sources to be related to the same metadata. The approach proposed in [34] allows a component of an enterprise portal to share the current user's task with other components, so that all components of a portal can display different information related to a given analysis context. Nevertheless, if decision makers could freely express their analysis context, the contextualization process would be more flexible and more adaptable to users' needs. To this end, authors in [35] present a design of a data warehouse contextualized with documents. By integrating relevant document fragments into the OLAP cube, this contextualized data warehouse provides decision makers with information ranked by relevance to the current analysis context. While analysing, decision makers can visualize related document fragments alongside the factual data in the OLAP cube. The work of [35] differs from the approaches above mainly because it allows decision makers to express their analysis context. Another way of contextualizing business analysis is to retrieve similar information on the web. Authors in [36] present an approach to associate relevant unstructured data from the web with factual data in the data warehouse. Authors in [37] propose a mechanism to extract keywords from the structured query itself without executing it: rather than obtaining information from the query's results, they exploit the information embedded in the query. A query is transformed into a set of keywords by removing distracting and irrelevant parts, and the extracted keywords are then used for a keyword-based search in a web search engine to provide the analysis context. This is a more generic approach compared to the one above, because all kinds of structured queries (SQL queries, XML queries, and so on) are supported by the authors of [37]. Moreover, this work also discusses the benefits of combining keyword extraction with domain knowledge. However, this discussion remains rather inconclusive, and a concrete mechanism integrating keyword extraction with SW technologies is still missing from the authors' work. Additionally, all the previously mentioned works rely on traditional IR techniques.
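
The following toy sketch illustrates the general idea of deriving search keywords from a structured query without executing it; it is only a rough illustration under our own assumptions (a simple stop-word list and a regular expression), not the actual technique of [37].

```python
# Turn a structured query into a web-search string by keeping identifiers and
# literals and dropping SQL syntax words.
import re

SQL_NOISE = {
    "select", "from", "where", "group", "by", "order", "sum", "avg",
    "count", "and", "or", "as", "join", "on", "desc", "asc",
}

def query_to_keywords(sql: str) -> list[str]:
    """Keep table/column identifiers and quoted literals, drop SQL keywords."""
    tokens = re.findall(r"[A-Za-z_]\w*|'[^']*'", sql)
    keywords = []
    for tok in tokens:
        clean = tok.strip("'").replace("_", " ").lower()
        if clean not in SQL_NOISE and clean not in keywords:
            keywords.append(clean)
    return keywords

sql = """SELECT region, SUM(units_sold) FROM ac_sales
         WHERE product = 'air conditioner' GROUP BY region"""
print(" ".join(query_to_keywords(sql)))
# prints: region units sold ac sales product air conditioner
```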

Table 2. Contextualizing BI analysis with SW comparison summary. The works referenced as 36 to 42 are compared on the following properties: heterogeneous data sources, ontology based, OLAP analyses, updated data, and storage of retrieved context.


Authors in [38] and [39] propose a framework along with a prototype that allows external events in streaming data that could potentially affect business operations to be detected. Thanks to text mining techniques, this framework can extract and relate textual information from internal and external data sources. In this way, newly generated web information is continuously linked with related internal information, which gives decision makers an up-to-date context for their decisions. Table 2 summarizes the common properties of the discussed works.

5 Big Data

With big data, there is possibly conflicting information that is stored at very large scale, distributed across many locations, and generated rapidly. To cope with this, tools should be used that manage and operate on massive data in a way that allows the data to be captured, processed, and analysed. Big data management provides such tools and procedures to overcome these challenges. The data science literature provides a broad overview of the potential benefits, both for businesses and for clients, of turning big data into assets; as this overview shows, any organization, independent of its sector or specialty, can benefit from big data. Management should use proper tools and methodologies to make it possible to capture, process, and analyse data that is fast, huge, and heterogeneous. Authors in [40] use a distributed architecture, with the data held redundantly on several nodes. By adding more servers, the system can easily scale out, and at the same time a server failure can be tolerated. This kind of scheme is used for managing huge amounts of data, and such databases scale horizontally. By making redundant copies of the data and thereby making it resilient to partition failures, the high-availability properties of NoSQL databases are improved. ACID aspects can also be considered by NoSQL databases; under heavy load, however, NoSQL may not provide full consistency across distributed nodes. Many NoSQL databases can also be treated as schema-flexible: applications can evolve the structure of the stored resources without the tables being altered, which in turn gives greater flexibility for storing heterogeneously structured data.

5.1 Big Data Integration in SW

Authors in [41] investigate the problems of structural and semantic heterogeneity in the big data integration process. To address some of these integration issues, they put forward an ontology-based big data semantic method that incorporates Semantic Web technology: by building an ontology between the semantic schemas, the problem of data inconsistency is tackled, and to deal with heterogeneous data storage they built a key/value storage model. Another framework, named Karma, is put forward in [42] through a case study that attempts to tackle the variety challenge by using semantic technology to integrate data from different big data domains.


The framework uses semantic RDF technology to resolve integration challenges, for example identifying the same entities in different datasets at the schema level; it supports, however, only certain kinds of structured and semi-structured data sources. In [43], emotion and sentiment analysis of social networks in the financial domain is performed using a combination of social networks, the semantic financial ontology FIBO, and other resources that provide a uniform vocabulary to express emotions and sentiments in a proper format. Heterogeneous information such as emotions, opinions, and activities gathered from Twitter, together with FIBO and other data sources, gives a clearer understanding of different communities, which can be important in the financial domain. The work is carried out in a realistic and detailed manner, and the proposed framework is explained in depth. In the field of clinical science, [44] considers patients who need a personalized healthcare service framework. To achieve this, a medical management framework needs a new foundation that allows live delivery of patient data under the control of a physician. A huge amount of homogeneous and heterogeneous data about individual patients has to be handled as big data. Since this data is generally not well organized, semantic web technologies become one of the most significant factors and are used to describe the various concepts involved.
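
The following toy sketch illustrates the general idea of ontology-based integration discussed in this subsection: records from two heterogeneous sources (a tweet-like message and an internal table) are mapped onto one shared vocabulary so that a single query spans both. The ex: vocabulary and the records are hypothetical and only stand in for ontologies such as FIBO; this is not the design of [41], [42], or [43].

```python
# Heterogeneous records are lifted to RDF against one shared (made-up)
# vocabulary so that opinions and internal figures can be queried together.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/fin/")
g = Graph()
g.bind("ex", EX)

tweet = {"user": "trader42", "text": "BankX looks undervalued", "about": "BankX"}
db_row = {"company": "BankX", "quarterly_profit": 1.7}

company = EX[db_row["company"]]
g.add((company, RDF.type, EX.Company))
g.add((company, EX.quarterlyProfit, Literal(db_row["quarterly_profit"])))

msg = EX["tweet1"]
g.add((msg, RDF.type, EX.Opinion))
g.add((msg, EX.about, company))          # link the opinion to the same entity
g.add((msg, EX.text, Literal(tweet["text"])))

# One SPARQL query now spans both sources.
q = """
PREFIX ex: <http://example.org/fin/>
SELECT ?text ?profit WHERE {
    ?o a ex:Opinion ; ex:about ?c ; ex:text ?text .
    ?c ex:quarterlyProfit ?profit .
}
"""
for text, profit in g.query(q):
    print(text, profit)
```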

6 Incorporating the Cloud with SW and BI

Authors in [45] state that cloud services are now available across the full enterprise technology stack. Organizations can evolve their IT according to business needs rather than technology constraints. This new, flexible IT structure can improve the management of cost, scale, and agility. Cloud computing will create a new wave of innovations in the cloud and an increased industrialization of IT. These developments will bring about changes in business models and allow new levels of advantage for organizations exploiting its capacity [46, 47, 48]. Cloud adoption marks a game-changing era in IT, with BI finally affordable and accessible when compared to traditional BI. The OLAP system as a whole, including the data assessment layer and the dashboards, can be hosted as SaaS (Software as a Service). An OLAP application hosted in the Cloud may not integrate well with web services out of the box; to make the OLAP application work smoothly over web services, the SaaS provider may allow the creation of an intermediate layer holding a dependency graph that helps drop the attributes not required in the resulting XML data [49]. Nevertheless, several challenges have to be faced, such as making the BI implementation comply with web-services architecture standards as well as the standards defined by a SaaS or PaaS provider (such as the Google Apps guidelines), and provisioning a massively parallel data warehouse system with an evenly distributed query load and even response times from all database servers. The IaaS provider should effectively use the virtualized server pool, its administration, and optimization to deliver resources on demand, and the system architecture should be designed so that the query load can be evenly distributed among the servers.


If the server array uses storage area networking for storing the XML data files and the OLAP cubes, the fetching of data from the different storage devices should also be evenly distributed through suitable network connections.

6.1 Benefits of Cloud Incorporation

Nowadays, Cloud BI solutions are gradually gaining popularity among organizations, as many of them realize the benefits of data analysis. Organizations need quality insights driven by accurate data more than ever. The SaaS providers serve as the primary interface to the business user community [50]. Cloud BI embodies the idea of delivering BI capabilities as a service.

6.1.1 Scalability
Cloud BI solutions allow greater flexibility to adapt quickly, giving technical users access to new data sources and the ability to experiment with analytical models. With Cloud BI solutions, users can maintain better financial control over IT while having the flexibility to scale rapidly as needs change. Additionally, Cloud resources can scale in or out automatically and quickly, and large numbers of concurrent users can be supported. This means users can easily increase their software usage immediately, without the cost of procuring and installing additional hardware and software [50].

6.1.2 Cost-Effective
In the Cloud, organizations do not have to budget for large, up-front purchases of software packages or run time-consuming deployments on local servers to get a BI system up and running. They consume BI as a service, paying only for the computing resources they require, and avoid costly asset acquisition and maintenance, which lowers the entry barrier [50].

6.1.3 No Capital Expenditure
This is a key advantage of the cloud model: organizations pay for the service they use. Under this arrangement, cloud vendors give organizations better control over CAPEX (capital expenditure) and OPEX (operational expenditure). Thus, the benefits of BI can reach more users inside the organization faster [50].

6.1.4 Capability of Enhanced Data Sharing
Cloud applications allow information to be shared and accessed remotely and enable easy cross-site data distribution, as they are delivered over the Internet and outside an organization's firewall [50].


6.1.5 Reliability
Reliability is improved through the use of multiple redundant sites, which provide dependable and secure locations for data storage. Resources can also be spread over a large number of users, which makes Cloud computing suitable for disaster recovery and business continuity [50].

7 Conclusions and Future Works

In this paper, we summarize and compare different properties of Semantic Web (SW) and Business Intelligence (BI) approaches and discuss how they handle big data. We also explain the integration of cloud computing technologies into SW and BI approaches in the era of big data. In particular, BI requires the integration of huge data originating from distinct data sources. Traditionally, these data models were restricted to a corporate transactional data warehouse, relying mostly on the relational data model, and BI research has consequently been centred primarily on this data model until now. However, the increasing availability of significant data assets, open databases, and a great variety of data sources on the Internet demands new BI schemes and methodologies for managing structured and unstructured resources. Semantic Web (SW) technologies are well positioned to enable the exchange of web data by providing semantic metadata that follows agreed ontologies. To this end, we discuss the main methodologies in which SW technology has been used to confront the classical issues of BI and big data usage. Furthermore, we explore the integration of Cloud technologies in order to address some of the crucial problems of BI, such as cost-effectiveness and reliability. Although significant advances have been made so far, some future challenges still remain. For example, the integration of large data using ontologies and Semantic Web technologies for dealing with heterogeneous data is a significant research problem that needs future work, specifically in the healthcare, business, and financial domains. Another future challenge is the integration of SW, BI, big data, and cloud computing. We need intelligent data frameworks to accomplish a real combination of BI and SW in the era of big data and cloud computing, such as new user recommendation strategies, big data contextualization, large-scale SW data storage for analytical tasks, and performing BI in the cloud, possibly enhanced by the incorporation of artificial intelligence. Additionally, an architecture based on the Semantic Web and IoT to empower BI, together with user experiments, can be proposed and carried out.

References 1. Hatirnaz, E., Sah, M., Direkoglu, C.: A novel framework and concept-based semantic search Interface for abnormal crowd behavior analysis in surveillance videos. Multimed. Tools Appl. 79, 1–39 (2020) 2. Sah, M., Direkoglu, C.: Semantic annotation of surveillance videos for abnormal crowd behaviour search and analysis. In: 2017 14th IEEE International Conference on Advanced Video and Signal Based (2017)


3. Brickley, D., Guha, R.V.: RDF vocabulary description language 1.0: RDF schema (2004). https://www.w3.org/TR/rdf-schema/ 4. Zhihong, A.T., Shiwei, Z., Dongqing, Y.M., Jie, C.: Overview of ontology. Acta Scientiarum Naturalium Universitatis Peldnensis 38, 730–738 (2002) 5. Dean, M., Schreiber, G., Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, L., Stein, L.A.: OWL Web ontology language reference. W3C Recommendation, 10 February 2004 (2004). https://www.w3.org/TR/owl-ref/ 6. Klyne, G., Carroll, J., McBride, B.: Resource description framework (RDF) concepts and abstract syntax. W3C Recommendation, 10 February 2004 (2004). https://www.w3.org/TR/ rdf-concepts/, (Eds.) 7. Bray, T., Paoli, J., Sperberg-McQueen, C.M., Maler, E.: Extensible markup language (XML) 1.0 (2nd edn.). W3C Recommendation, 6 October 2000 (2000). https://www.w3.org/TR/RECxml/, (Eds.) 8. Brickley, D., Guha, R.V.: RDF vocabulary description language 1.0: RDF schema. https:// www.w3.org/TR/rdf-schema/, (Eds.) 9. Horrocks, L., van Harmelen, F., Patel-Schneider, P.: Reference description of the DAML+OIL ontology markup language (2001). https://www.daml.org/2000/12/reference.html 10. Trujillo, J. and Maté, A.: Business intelligence 2.0: a general overview. In: Dans: European Business Intelligence Summer School, pp. 98–116. Springer, Heidelberg (2012) 11. Inmon, W.H.: Building the Data Warehouse. 2nd edn. 401 p. Wiley, New York (1996). ISBN 0471141615 12. Kimball, R.: The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley, New York (1996) 13. Chen, H., Chiang, R.H., Storey, V.C.: Business intelligence and analytics: from big data to big impact. MIS Q. 36(4), 1165–1188 (2012) 14. Airinei, D., Berta, D.A.: Semantic business intelligence - a new generation of business intelligence. Informatic˘a Economic˘a 16, 72–80 (2012) 15. Ibragimov, D., Hose, K., Pedersen, T.B., Zimányi, E.: Towards exploratory OLAP over linked open data–a case study. In: Dans: BRITE. HangZhou, p. 18 (2014) 16. Fernández, M., Iván, C., Vanesa, L., David, V., Pablo, C., Enrico, M.: Semantically enhanced information retrieval: an ontology-based approach. J. Web Semant. 9(4), 434–452 (2011) 17. Berlanga, R., María José, A., Dolores, M.L., Lisette, G.-M.: Towards a semantic data infrastructure for social business intelligence. In: Dans: New Trends Databases Information System, pp. 319–327 (2014) 18. Thi, A., Nguyen, B.T.: A semantic approach towards CWM-based ETL processes. Proc. -Semant. 8, 58–66 (2008) 19. Codd, E.F., Codd, S.B., Salley, C.T.: Providing OLAP. On-line Analytical Processing) to User-analysts, An IT Mandate. Codd & Associates (1993) 20. Abelló, A., Romero, O., Pedersen, T.B., Berlanga, R., Nebot, V., Aramburu, M.J., Simitsis, A.: Using semantic web technologies for exploratory OLAP: a survey. IEEE Trans. Knowl. Data Eng. 27(2), 571–588 (2015) 21. Kämpgen, B., O’Riain, S., Harth, A.: Interacting with statistical linked data via OLAP operations. In: Dans: International Workshop Linked APIs Semantic Web LAPIS, 9th Extended Semantic Web Conference. Heraklion, pp. 36–49. Citeseer, Greece (2012) 22. Etcheverry, L., Vaisman, A.A.: QB4OLAP: a New Vocabulary for OLAP cubes on the Semantic Web (2012) 23. Etcheverry, L., Vaisman, A., Zimányi, E.: Modelling and querying data warehouses on the semantic web using QB4OLAP. In: Dans: International Conference on Data Warehousing and Knowledge Discovery, pp. 45–56. Springer, Cham (2014) 24. 
Saad, R., Teste, O., Trojahn, C.: OLAP manipulations on RDF data following a constellation model. In: International Semantic Web Conference ISWC2013, Sydney (2013)


25. Kämpgen, B., Harth, A.: No size fits all – running the star schema benchmark with SPARQL and RDF aggregate views. In: Dans: Semantic Web Semant. Big Data, pp. 290–304 (2013) 26. Ravat, F., Teste, O., Tournier, R., Zurfluh, G.: Algebraic and graphic languages for OLAP manipulations: Int. J. Data Warehous. Min. 4(1), 17–46 (2008) 27. Harinarayan, V., Rajaraman, A., Ullman, J.D.: Implementing data cubes efficiently. ACM SIGMOD Rec. 25(2), 205–216 (1996) 28. Romero, O., Abelló, A.: Automating multidimensional design from ontologies. In: Dans: Int. Workshop Data Warehouse. OLAP, pp. 1–8. ACM Press (2007) 29. Nebot, V., Berlanga, R., Pérez, J.M., Aramburu, M.J., Pedersen, T.B.: Multidimensional integrated ontologies: a framework for designing semantic data warehouses. Dans: J. Data Semant. XIII, 1–36 (2009) 30. Pardillo, J., Mazón, J.N.: Using ontologies for the design of data warehouses. Int. J. Datab. Manag. Syst. 3(2), 73–87 (2011) 31. Pedersen, T.B., Castellanos, M., Dayal, U.: Report on the Seventh International Workshop on Business Intelligence for the Real Time Enterprise (BIRTE 2013). ACM SIGMOD Rec 43(4), 55–58 (2015) 32. Perez, J.M., Berlanga, R., Aramburu, M.J., Pedersen, T.B.: Integrating data warehouses with web data: a survey. IEEE Trans. Knowl. Data Eng. 20(7), 940–955 (2008) 33. Priebe, T.: INWISS–integrative enterprise knowledge portal. In: Demonstration at the 3rd International Semantic Web Conference (ISWC 2004), ISWC 2004, Hiroshima, Japan, pp. 33– 36 (2004) 34. Priebe, T.: Building integrative enterprise knowledge portals with semantic web technologies. In: Dans: Intelligent Learning Infrastructure for Knowledge Intensive Organizations: A Semantic Web Perspective, pp. 146–188 (2005). ISBN 9781591405030, 9781591405054 35. Pérez-Martínez, J.M., Berlanga-Llavori, R., Aramburu-Cabo, M.J., Pedersen, T.B.: Contextualizing data warehouses with documents. Decis. Support Syst. 45(1), 77–94 (2008) 36. Roy, P., Mohania, M., Bamba, B., Raman, S.: Towards automatic association of relevant unstructured content with structured query results. In: Dans: International Conference on Information and knowledge management, pp. 405–412. ACM Press (2005). ISBN 1595931406 37. Liu, J., Dong, X., Halevy, A.Y.: Answering structured queries on unstructured data. In: Dans: Web and Databases (WebDB), pp. 20–25 (2006) 38. Castellanos, M., Wang, S., Dayal, U., Gupta, C.: SIE-OBI: a streaming information extraction platform for operational business intelligence. In: ACM SIGMOD, pp. 1105–1110 (2010). ISBN: 9781450300322 39. Castellanos, M., Gupta, C., Wang, S., Dayal, U., Durazo, M.: A platform for situational awareness in operational BI. Decis. Support Syst. 52(4), 869–883 (2012) 40. Quboa, Q., Mehandjiev, N.: Creating intelligent business systems by utilizing big data and semantics. In: IEEE 19th Conference on Business Informatics, vol. 2, pp. 39–46 (2017) 41. Kang, L., Yi, L., Dong, L.: Research on construction methods of big data semantic model. In: World Congress on Engineering 2014, WCE, vol. I (2014) 42. Knoblock, C.A., Szekely, P.: Exploiting semantics for big data integration. AI Mag. 36(1), 25–39 (2015) 43. Sánchez-Rada, J.F., Torres, M., Iglesias, C.A., Maestre, R., Peinado, E.: A linked data approach to sentiment and emotion analysis of twitter in the financial domain. In: Proceedings of the Second International Workshop on Finance and Economics on the Semantic Web (FEOSW 2014), pp. 51–62 (2014) 44. 
Panahiazar, M., Taslimitehrani, V., Jadhav, A., Pathak, J.: Empowering personalized medicine with big data and semantic web technology: promises, challenges, and use cases. In: IEEE International Conference on Big Data (2014). 978-1-4799-5666-1


45. Swaminathan, K.S., Daugherty, P., Tobolski, J.F.: What the Enterprise Needs to Know about Cloud Computing, Accenture Technology Labs (2009). Accessed 27 April 2009. www.acc enture.com/accenturetechlabs/ 46. Deebak, D., Al-Turjman, F., Aloqaily, M., Alfandi, O.: IoT-BSFCAN: a smart context-aware system in IoT-cloud using mobile-fogging. Elsevier Future Gener. Comput. Syst. 109, 368– 381 (2020) 47. Hussain, A.A., Bouachir, O., Al-Turjman, F., Aloqaily, M.: AI techniques for COVID-19. IEEE Access 8, 128776–128795 (2020) https://doi.org/10.1109/ACCESS.2020.3007939 48. Al-Turjman, F.: Intelligence and security in big 5G-oriented IoNT: an overview. Elsevier Future Gener. Comput. Syst. 102, 357–368 (2019) 49. Aboulnaga, A., Salem, K., Soror, A.A., Minhas, U.F., Kokoseilis, P., Kamath, S.S.: Deploying database appliances on the cloud. IEEE Data Eng. Bull. 32(1), 13–20 (2009) 50. Al-Aqrabi, H., Liu, L., Hill, R., Antonopoulos, N.: Taking the business intelligence to the clouds. In: Proceedings of 14th IEEE International Symposium on High Performance Computing and Communications, HPCC 2012, 25–27 June 2012. IEEE Computer Society Press, Liverpool (2012)

Video Big Data Analytics in the Cloud: Research Issues and Challenges Aftab Alam1 , Shah Khalid2 , Muhammad Numan Khan1 , Tariq Habib Afridi1 , Irfan Ullah1 , and Young-Koo Lee1(B)

1 Department of Computer Science and Engineering, Kyung Hee University (Global Campus), Yongin 1732, South Korea
{aftab,numan,afridi,irfan,yklee}@khu.ac.kr
2 School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
[email protected]

Abstract. With the rise of distributed computing technologies, video big data analytics in the cloud has attracted the attention of researchers and practitioners. The current technology and market trends demand an efficient framework for video big data analytics. However, the existing work is too limited to provide an architecture for video big data analytics in the cloud that covers the management and analysis of video big data together with the associated challenges and opportunities. This study proposes a service-oriented layered reference architecture for intelligent video big data analytics in the cloud. Finally, we identify and articulate several open research issues and challenges that are raised by the deployment of big data technologies in the cloud for video big data analytics. This paper surveys the research studies and technologies advancing video analytics in the era of big data and cloud computing. To the best of our knowledge, this is the first study that presents a generalized view of video big data analytics in the cloud. Keywords: Big data · Intelligent video analytics · Cloud-based video analytics system · Video analytics review · Deep learning · Distributed computing · Intermediate results orchestration · Cloud computing

1 Introduction

Videos are recorded and uploaded to the cloud on a regular basis. Many sources, including CCTV cameras, smartphones, drones, and satellites, are actively contributing to video generation, leading to the evolution of video analytics and management systems. Video management and service providers such as Facebook [14] and YouTube [39] are considered valuable sources of large-scale video data.

This work was supported by the Institute for Information and Communications Technology Promotion Grant through the Korea Government (MSIT) under Grant R712017-1007 (SIAT CCTV Cloud Platform).


Along with these, various leading industrial organizations have successfully deployed video management and analytics platforms; growing bandwidth and high-resolution cameras collecting video at scale have become one of the latest trends in the video surveillance industry. For example, more than 400 hours of video are uploaded every minute on YouTube [32], and more than one hundred and seventy million video surveillance cameras have been installed in China alone [30]. It has been reported that the data generated by various Internet of Things (IoT) devices will see a growth rate of 28.7% over the period 2018-2025, with surveillance video as the majority share [8]. Such enormous video data is considered "big data" because various sources generate a large volume of video data at high velocity, and this data holds high value. Even though 65% of the big data share held by surveillance videos is monitored, a significant portion of the video data still goes unnoticed [19]. That neglected data contains valuable information directly related to real-world situations. Video data provides information about interactions, behaviors, and patterns, whether traffic or human patterns. However, handling such a large amount of complex video data is not feasible using traditional data analytics approaches. Therefore, more comprehensive and sophisticated solutions are required to manage and analyze such unstructured video big data. Due to the data-intensive and resource-hungry nature of large-scale video data processing, extracting insights from video is a challenging task. Such considerable video data poses significant challenges for video management and mining systems, which require powerful machines to deal with video big data. Moreover, a flexible solution is necessary to store and mine this large volume of video data for decision making. However, large-scale video analytics has become a reality thanks to the popularity of big data and cloud computing technologies. Cloud computing is an infrastructure for providing convenient and ubiquitous remote access to a shared pool of configurable computing resources; these resources can be managed with minimal management effort or service-provider interaction. Big data technologies, such as the Hadoop and Spark ecosystems, are software platforms designed for distributed computing to process, analyze, and extract valuable insights from large datasets in a scalable and reliable way. The cloud is well suited to offer the big data computation power required to process these large datasets. Amazon Web Services [4] and Oracle Big Data Analytics [13] are some examples of big data analytics platforms. When video analytics solutions are provided in a cloud computing environment, they form a Cloud-based Video Analytics System (CVAS). Large-scale video analytics in the cloud is a multi-disciplinary area, and the next big thing in big data, which opens new research avenues for researchers and practitioners. This work aims to conduct a comprehensive study on the status of large-scale video analytics in the cloud-computing environment while deploying video analytics techniques. First, a lambda-style [27] service-oriented reference architecture called Lambda CVAS (L-CVAS) is proposed for video big data analytics in the cloud. Then open research issues and challenges are discussed, with a focus on the proposed architecture.

2 Proposed L-CVAS Architecture

Figure 1 presents the proposed L-CVAS reference architecture for distributed Intelligent Video Analytics (IVA) in the cloud. It is composed of five layers i.e., Video Big Data Curation Layer (VBDCL), Video Big Data Processing Layer (VBDPL), Video Big Data Mining Layer (VBDML), Knowledge Curation Layer (KCL), and Web Service Layer (WSL).

Fig. 1. A reference architecture for intelligent video analytics in the cloud.

2.1 Video Big Data Curation Layer

Effective data management is key to extracting insights from the data. The VBDCL is a peta-scale storage architecture that can be accessed in a highly efficient and convenient manner; we design it to manage video big data efficiently. The VBDCL consists of three main components: Real-time Video Stream Acquisition and Synchronization (RVSAS), Distributed Persistent Data Store (DPDS), and the VBDCL Business Logic.

Real-Time Video Stream Acquisition and Synchronization. To handle large-scale video stream acquisition in real time, and to manage the Intermediate Results (IR), anomalies, and the communication among Real-time IVA (RIVA) services, we design the RVSAS component while assuming a distributed messaging system. RVSAS provides client APIs on top of a distributed messaging system for the proposed framework. A distributed message broker, e.g., Apache Kafka [23], is an independent application responsible for buffering, queuing, routing, and delivering the messages received from the message producers to the consumers. The RVSAS component is responsible for handling and collecting real-time video streams from device-independent video data sources. Once a video stream is acquired, it is sent temporarily to the distributed broker server. The worker system on which an IVA service is configured, e.g., activity recognition, reads the data from the distributed broker and processes it. The RVSAS component is composed of five sub-modules, i.e., Distributed Message Broker Manager (DMBM), Video Stream Producer (VSP), Video Stream Consumer (VSC), Intermediate Results Manager (IRM), and Lifelong Video Stream Monitor (LVSM).

Fig. 2. Real-time video stream acquisition and synchronization

The DMBM is used to manage the queues in the distributed message broker cluster for the RIVA services. Three types of queues, RIVA ID, RIVA IR ID, and RIVA A ID, as shown in Fig. 2, are automatically generated by the DMBM module on the distributed message broker when a new real-time RIVA service is created. Here RIVA, ID, IR, and A stand for the RIVA service, the unique identifier of the service, intermediate results, and anomalies, respectively. These queues are used to hold the actual video stream being acquired by the VSP, the IR produced by an algorithm, and the anomalies detected by the video analytics services. The VSP module is used to provide interfaces to video stream data sources, acquire large-scale streams from device-independent video data sources, and send them to the broker server queue, i.e., RIVA ID, in the form of compressed mini-batches. The VSC assists the RIVA service in reading the mini-batches of the video stream from the respective queue for analytics, as shown in Fig. 2.
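
A minimal sketch of the VSP/VSC idea is shown below, assuming Apache Kafka accessed through the kafka-python client and OpenCV for frame capture; the broker address, topic name, and mini-batch size are placeholders rather than values prescribed by L-CVAS.

```python
# Video Stream Producer / Consumer sketch: JPEG-compressed frame mini-batches
# are shipped to a broker queue and read back by a RIVA worker.
import pickle
import cv2
from kafka import KafkaProducer, KafkaConsumer

BROKER, TOPIC, BATCH = "localhost:9092", "RIVA_42", 8   # placeholder values

def produce(video_source=0):
    """VSP: acquire frames and send compressed mini-batches to the broker."""
    producer = KafkaProducer(bootstrap_servers=BROKER)
    cap, batch = cv2.VideoCapture(video_source), []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        batch.append(cv2.imencode(".jpg", frame)[1].tobytes())
        if len(batch) == BATCH:
            producer.send(TOPIC, value=pickle.dumps(batch))  # one mini-batch per message
            batch = []
    producer.flush()

def consume():
    """VSC: a RIVA worker reads mini-batches from the queue for analytics."""
    consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                             auto_offset_reset="earliest")
    for message in consumer:
        frames = pickle.loads(message.value)     # list of JPEG-encoded frames
        print(f"received mini-batch of {len(frames)} frames")
```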


A RIVA service produces two types of results, i.e., intermediate results and anomalies. In this context, the RIVA service utilizes the VSC and LVSM to send the intermediate results and anomalies to the RIVA IR ID and RIVA A ID queues, respectively.

Immediate Structured Distributed Datastore. The Immediate Structured Distributed Data Store (ISDDS) is provided to manage large-scale structured data in the distributed environment over a Distributed Big Datastore (DBDS). Because of the data-intensive operations, and according to the requirements of the other layers, a distributed big data store such as Cassandra or HBase [37] can be deployed. The ISDDS hosts five types of data. Role-based user logs are maintained through the User Profile and Logs meta-store. The proposed framework manages two types of video data sources through the Data Source meta-store: video stream data sources, such as IP cameras, Kinect, and body-worn cameras, and batch video datasets. The former can be subscribed to RIVA services, while the latter are eligible for Batch IVA (BIVA) services. The meta-information of these sources, along with access rights, is managed through the Data Source meta-store. Administrator and developer roles can develop, create, and deploy video analytics algorithms through the L-CVAS. Similarly, different IVA algorithms can be pipelined into an IVA service. Video analytics algorithms and services are managed through the Video Analytics Algorithm and Service meta-stores, respectively. As stated, in an IVA pipelining environment the output of one IVA algorithm can be the input of another algorithm. In this context, we design a general container called the IR datastore to persist and index the output of IVA algorithms and services. Finally, the users are allowed to subscribe the data sources to the IVA services. The subscription information is maintained through the Subscription meta-store, and the anomalies are maintained through the Anomalies meta-store.

Unstructured Persistent Distributed Datastore. The Unstructured Persistent Distributed Data Store (UPDDS) component is built on top of a Distributed File System (DFS), such as the Hadoop Distributed File System (HDFS), and facilitates permanent and distributed big-data storage. Upon new registration, a formal User Space is created on top of the DFS. The User Space is managed through a proper hierarchical directory structure, and special read and write access is granted to the owner. All the directories are synchronized and mapped in the corresponding user profile logs. Under the User Space, three types of distributed directories are created, i.e., Raw Video Space, Model Space, and Project Space. Raw Video Space is used for the management of the video data. The Model Space is provided to facilitate the developers in managing the training and testing models according to the deployed IVA algorithms. The Project Space is provided to manage the source code of the respective developers and practitioners.

Active and Passive Data Readers and Writers. This module gives read-write access to the underlying data securely, according to the VBDCL Business Logic and the registered user's access rights.


This sub-module is composed of two types of readers and writers, i.e., the Active and Passive Data Readers and Writers, which are used to access the ISDDS and the UPDDS.

VBDCL Business Logic. The actual business logic is provided by the VBDCL Business Logic, which implements seven modules. The User Manager module encapsulates all user-related operations such as new user account creation, access role assignment, and session management. Through the Data Source Manager and Model Manager modules, the user can manage Video Stream Data Sources (VSDS), video data uploading, and models. The (R/B)IVA Algorithm and Service Managers are built to manage, develop, and deploy new IVA algorithms and services, respectively. The former is provided as-a-Service (aaS) to the developers, while the latter is provided aaS to the consumers. The developer role can create and publish a new video analytics algorithm; the algorithm is then made available aaS to other developers, who can use it. Once IVA services are created, the users are allowed to subscribe streaming video data sources and batch data to the provided RIVA and BIVA services, respectively, using the Service Discovery and Subscription Manager. The IR Manager provides a secure way of obtaining the IR and maps it to the ontology. Finally, to provide the proposed system's functionality over the web, the top-level functionality is exposed through simple, unified, role-based web services; the Web Service Layer is built on top of the VBDCL Business Logic.

2.2 Video Big Data Processing Layer

IVA requires video data pruning and strong feature extraction. With this intention, the VBDPL consists of three components, i.e., Video Pre-processing, Feature Extractor, and Dimensionality Reduction. The Video Pre-processing component is designed to clean and remove noise from videos. It is supposed to deploy several distributed video pre-processing operations, including frame extraction, frame resizing, frame conversion from RGB to grayscale, shot boundary detection, segmentation, transcoding, etc. In the first step, frames are extracted from a video for processing. The number of frames to be extracted depends on the user's objective and the task at hand. Candidate frames can be all frames, step frames (every second frame, every fifth frame, etc.), or keyframes. The spatial operations highly depend on the scenario and objective; they include frame resizing (for reducing computational complexity), corrections (brightness, contrast, histogram equalization, cropping, keyframes), mode (RGB, grayscale, etc.), and many other operations. Segmentation is used for various purposes, such as partitioning a video into semantically related chunks. The Feature Extractor extracts features from the raw videos that can be interpreted by a Machine Learning (ML) algorithm. In this context, several feature extraction algorithms have been introduced for video data.


These feature extraction approaches can be categorized into static features of keyframes, object features, dynamic/motion feature extraction, trajectory-based feature extraction, and deep learning-based feature extraction [36]. Feature selection and dimensionality reduction reduce the size of the feature set. Large feature sets are expensive in terms of the time needed for training and/or performing classification with the trained classifiers. For example, Principal Component Analysis (PCA) and its variants are used to reduce the size of the feature vectors. During feature selection, the most relevant features are selected by discarding irrelevant and weak features. Inappropriate or partially relevant features can negatively affect model performance. Therefore, only a limited set of features should be selected and used for training classifiers. This is precisely the purpose of this component, which deploys different algorithms in this context. Similarly, some feature reduction techniques are available that select a specific, limited set of features in real time.
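
As a hedged illustration of this component, the sketch below reduces placeholder frame-level feature vectors with PCA in Spark MLlib; the column names, the feature dimensionality, and the choice of k are illustrative assumptions, not part of the L-CVAS specification.

```python
# Distributed dimensionality reduction sketch: random stand-ins for per-frame
# feature vectors are reduced from 2048 to 64 dimensions with Spark's PCA.
import numpy as np
from pyspark.sql import SparkSession
from pyspark.ml.feature import PCA
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("feature-reduction").getOrCreate()

# Pretend each row is a 2048-d feature vector extracted from one keyframe.
rows = [(i, Vectors.dense(np.random.rand(2048))) for i in range(100)]
df = spark.createDataFrame(rows, ["frame_id", "features"])

pca = PCA(k=64, inputCol="features", outputCol="reduced")   # keep 64 components
model = pca.fit(df)
reduced = model.transform(df).select("frame_id", "reduced")
reduced.show(3, truncate=False)
```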

2.3 Video Big Data Mining Layer

The VBDML utilizes diverse types of machine learning algorithms, i.e., supervised, semi-supervised, and unsupervised algorithms, to find different types of information in the videos [38]. In this context, the VBDML layer hosts three types of components, i.e., Classification, Regression, and Clustering. The Classification component provides various ML algorithms, e.g., Support Vector Machine (SVM), Nearest Neighbors, Random Forest, Decision Tree, Naïve Bayes, etc., which identify the category, among predefined classes, to which a particular object in a video frame belongs. The Regression component includes algorithms, e.g., Linear Regression, Decision Tree Regression, Logistic Regression, and many more, which predict a continuous-valued attribute associated with objects rather than discrete values. The Clustering component encapsulates algorithms, e.g., K-Means, spectral clustering, etc., that produce data groups depending upon the similarity of data items.

Distributed Deep Learning for IVA. Recently, Convolutional Neural Network (CNN) based approaches have shown performance superiority in tasks like optical character recognition [5] and object detection [26]. The motive of deep learning is to scale the training in three dimensions, i.e., the size and complexity of the models [10], the proportionality of the accuracy to the amount of training data [18], and the scalability of the hardware infrastructure [44]. A CNN or ConvNet is a type of neural network that can recognize visual patterns directly from the pixels of images with little preprocessing. CNN-based video classification methods have been proposed in the literature to learn features from raw pixels of both short videos and still images [40]. In the proposed L-CVAS framework, both the VBDPL and the VBDML are capable of deploying deep learning approaches for distributed IVA. Since the dawn of deep learning, various open-source architectures have been developed; some of the well-known and state-of-the-art CNN architectures are AlexNet [24], GoogleNet [34], VGGNet [33], and ResNet [17]. There are two approaches for leveraging DL in distributed computing, i.e., model and data distribution [28].


In the former, the DL model is partitioned into logical fragments that are loaded onto different worker agents for training, as shown in Fig. 3a. The training data are input to the worker agent(s) that host the input layer. In the second approach, the deep learning model is replicated to all of the cluster's worker agents, as shown in Fig. 3b. The training dataset is partitioned into non-overlapping sub-datasets, and each sub-dataset is loaded onto a different worker agent of the cluster. Each worker agent executes the training on its sub-dataset of the training data, and the model parameters are synchronized among the cluster's worker agents to keep them up to date. The data-distribution approach naturally fits the distributed-computing MapReduce paradigm [12].

Fig. 3. (a) Scalable deep learning utilizing model distribution. (b) Scalable deep learning utilizing data distribution.
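
The following compact sketch illustrates the data-distribution approach of Fig. 3b using TensorFlow's MirroredStrategy, where the same small CNN is replicated on every available device and each replica trains on its own shard of the batches; the model, the random frame tensors, and the hyper-parameters are placeholders.

```python
# Data-parallel training sketch: one model replica per device, synchronized
# gradient updates across replicas.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()          # replicate the model
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Placeholder frame tensors; in L-CVAS these would come from the VBDCL stores.
frames = tf.random.uniform((256, 64, 64, 3))
labels = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((frames, labels)).batch(32)

model.fit(dataset, epochs=1)   # gradients are synchronized across replicas
```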

Big Data Engines, ML Libraries, and IVA. The VBDPL and VBDML are assumed to be built on top of distributed computing engines. Hadoop MapReduce [11] is a distributed programming model developed for data-intensive tasks. Apache Spark follows a programming model similar to MapReduce but extends it with Resilient Distributed Datasets (RDDs), a data-sharing abstraction [42]. Hadoop's MapReduce operations are heavily dependent on the hard disk, while Spark is based on in-memory computation, making Spark up to a hundred times faster than Hadoop [42]. Spark supports interactive operations, Directed Acyclic Graph (DAG) processing, and processing of streaming data in the form of mini-batches in near real time [41]. Apache Spark is batch centric and treats stream processing as a special case, lacking support for cyclic operations, memory management, and window operators. These issues of Spark have been elegantly addressed by Apache Flink [6], which treats batch processing as the special case and does not use micro-batching. Similarly, Apache Storm and Samza are other prominent solutions focused on working with large data flows in real time. To achieve scalability, big data techniques can be exploited by existing video analytics modules. The VBDPL is not provided by default and needs to be implemented on top of these big data engines. The ML approaches, however, can be categorized into two classes in the context of the VBDML. One class reimplements existing ML tasks by providing a middleware layer to run them on a big data platform.


This general type of middleware layer provides general primitives/operations that assist in various learning tasks; users who want to try different ML algorithms in the same framework can benefit from it. In the second class, individual video analytics and ML algorithms are executed on a big data platform and are directly built on top of a big data engine for better scalability. Mahout [31], Spark MLlib [29], and FlinkML [6] are some open-source ML packages built on top of Hadoop, Apache Spark, and Flink, respectively, that support many scalable learning algorithms. For deep learning, various open-source libraries have been developed, including TensorFlow [1], DeepLearning4J [35], Keras [7], BigDL [9], and PyTorch [20].

2.4 Knowledge Curation Layer

The KCL has been proposed under the CVAS architecture, on top of the VBDML, to map the IR (both online and offline) into the video ontology in order to allow domain-specific semantic video and complex event analysis. The KCL is composed of five components, i.e., Video Ontology Vocabulary, Video Ontology, Semantic Web Rules, FeatureOnto Mapper, and SPARQL queries. The Video Ontology Vocabulary standardizes the basic terminology that governs the video ontology, such as concepts, attributes, objects, relations, video temporal relations, video spatial relations, and events. The Video Ontology is a generic semantic model for the representation and organization of video resources that allows CVAS users to perform contextual complex event analysis, reasoning, search, and retrieval. Semantic Web Rules express domain-specific rules and logic for reasoning. When videos are classified and tagged by the VBDML, the respective IR are persisted to the VBDCL and also mapped to the Video Ontology using the FeatureOnto Mapper. Finally, semantically rich SPARQL queries are supported for knowledge-graph exploration, complex event reasoning, analysis, and retrieval.
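
The toy sketch below illustrates the KCL idea: one intermediate result is mapped into a small video ontology and then retrieved with a SPARQL query. The vo: terms and the example IR are hypothetical stand-ins for the paper's Video Ontology Vocabulary and FeatureOnto Mapper.

```python
# FeatureOnto-Mapper-style sketch: a classification result becomes ontology
# facts, and semantic retrieval is done with SPARQL.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

VO = Namespace("http://example.org/video-ontology/")
g = Graph()
g.bind("vo", VO)

# One intermediate result produced by the VBDML (placeholder values).
ir = {"video": "cam7_clip12", "label": "person_running", "second": 34}
event = VO[f"{ir['video']}_evt1"]
g.add((event, RDF.type, VO.Event))
g.add((event, VO.concept, Literal(ir["label"])))
g.add((event, VO.inVideo, VO[ir["video"]]))
g.add((event, VO.atSecond, Literal(ir["second"], datatype=XSD.integer)))

# Semantic retrieval: find videos and timestamps containing a given concept.
q = """
PREFIX vo: <http://example.org/video-ontology/>
SELECT ?video ?t WHERE {
    ?e a vo:Event ; vo:concept "person_running" ;
       vo:inVideo ?video ; vo:atSecond ?t .
}
"""
for video, t in g.query(q):
    print(video, t)
```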

3 Research Issues, Opportunities, and Future Directions

Intelligent video big data analytics in the cloud opens new research avenues, challenges, and opportunities. This section provides in-depth detail about these research challenges (summarized in Table 1).

IVA on Video Big Data: Big data analytics engines are general-purpose engines and are not specifically designed for video big data analytics. Consequently, video big data analytics over such engines is challenging and demands optimization. Almost all of these engines inherently lack support for elementary video data structures and processing operations. Further, such engines are not optimized for iterative IVA and for dependencies among processes. Furthermore, the focus of existing research on IVA has been on volume and velocity, while veracity and value have been overlooked. One promising direction in addressing video big data veracity is to research methods and techniques capable of assessing the credibility of video data sources so that untrustworthy video data can be filtered.


Table 1. Video big data analytics open research issues and challenges in the cloud.

Component | Aspect | Layer | Open research issues
Video big data | Volume | VBDCL | Orchestration and optimization of IVA pipeline
Video big data | Volume | VBDCL, VBDPL, VBDML | Big dimensionality reduction and indexing; cleaning and compressing video big data
Video big data | Velocity | VBDCL, VBDPL, VBDML | Real-time video streams and online learning
Video big data | Variety | | Big dimensionality reduction; data modality for a single IVA goal
Video big data | Veracity | | Video big data veracity assessment; learning with unreliable data
Video big data | Value | | Understandable IVA for decision support; semantic concept extraction in a distributed environment
User | Developer/Researcher | WSL | Declarative IVA; IVA and distributed computing technologies abstraction; comprehensive evaluation measures for IVA in the cloud environment; visualizing video big data
User | Developer/Researcher | VBDCL, WSL | IVA algorithm, model, and service statistics maintenance and ranking; effective price scheme for IVA algorithm deployment and subscription
User | Developer/Researcher | WSL, VBDCL | Model management and algorithm selection
User | Consumer | WSL | IVA services utilization; improving consumer experience based on feedback
User | Consumer | VBDCL, VBDPL, VBDML, VKCL | Effective price scheme for multiple IVA service subscription
Cloud system | Security and Privacy | - | Privacy-preserving distributed IVA, security, and trust
Cloud system | Analytics engine | VBDPL, VBDML | IVA on video big data (general big data middleware for IVA); parameter server optimization
Cloud system | Infrastructure | | IVA on video big data (general big data middleware for IVA)


Another way is to come up with novel ML models that can make inferences from insufficient video data. Likewise, users' assistance is required to comprehend IVA results and the reasons behind decisions in order to realize the value of video big data in decision support. Thus, understandable IVA can be a significant future research area.

IVA and Human-Machine Coordination: IVA on video big data grants a remarkable opportunity for learning with human-machine coordination, for numerous reasons. IVA on video big data in the cloud demands that researchers and practitioners master both IVA and distributed computing technologies, and bridging both worlds is challenging for most analysts, especially in an educational environment where the researcher focuses more on understanding, configuration, and tons of parameters rather than on innovation and research contribution. Thus, there is a growing need to design an L-CVAS that provides high-level abstractions to hide the underlying complexity. For IVA services to become commercially worthwhile and achieve pervasive recognition, consumers lacking technical IVA knowledge should be able to configure, subscribe to, and maintain IVA services with comfort. In traditional IVA, consumers are usually passive; further research is required to build more interactive IVA services that assist consumers in gaining insight into video big data.

Orchestration and Optimization of IVA Pipeline: The real-time and batch workflows are deeply dependent on the messaging middleware and distributed processing engines. The dynamic (R/B)IVA service creation and multi-subscription environment demands the optimization and orchestration of the IVA service pipeline [3] and guarantees opportunities for further research. In the MapReduce infrastructure, a slowdown predictor can be utilized to improve the agility and timeliness of scheduling decisions. Spark and Flink can accumulate a sequence of algorithms into a single pipeline, but research is needed to examine their behavior in a dynamic service creation and subscription environment. Further, concepts from the fields of query and queuing optimization can be utilized, while considering the messaging middleware and distributed processing engines, with the aim of orchestrating and optimizing the IVA service pipeline.

IVA and Big Dimensionality: The VSDS multi-modality can produce diverse types of data streams. Similarly, algorithms generate varied sorts of multidimensional features. The high-dimensionality factor poses many intrinsic challenges for data stream acquisition, transmission, learning, pattern recognition, indexing, and retrieval. In the literature, this has been referred to as the "Big Dimensionality" challenge [43]. A key challenge arising from VSDS variety is how to acquire and process heterogeneous data in an effective way. Most existing IVA approaches consider a specific input, but in many cases different kinds and formats of data can be considered for a single IVA goal. With growing feature dimensionality, current algorithms quickly become computationally intractable and, therefore, inapplicable in many real-time applications [16]. Dimension reduction approaches are still a hot research topic because of data diversity, increasing volume, and complexity. Efficient learning algorithms for first-order optimization, online learning, and parallel computing will be preferred.

Model Management: An algorithm might hold a list of parameters. The model selection process encompasses feature engineering, IVA algorithm selection, and hyperparameter tuning. Feature engineering is a strenuous activity and is influenced by many key factors, e.g., domain-specific regulations, time, accuracy, video data, and IVA properties, which consequently slow and hinder exploration. IVA algorithm selection is the process of choosing a model that fixes the hypothesis space of the prediction function explored for a given application [15]. This process is reliant on technical and non-technical aspects, which force the IVA developer to try manifold techniques at the cost of time and cloud resources. Hyperparameters are vital as they govern the trade-offs between accuracy and performance. IVA analysts usually do ad-hoc manual tuning by iteratively choosing a set of values or by using heuristics such as grid search [15]. From the IVA analysts' perspective, model selection is an expensive job in terms of time and resources, which slows down the video analytics lifecycle. Model selection is an iterative and investigative process that generally creates an endless search space, and it is challenging for IVA analysts to know a priori which combination will produce acceptable accuracy/insights. In this direction, theoretical design trade-offs are presented by Kumar et al. [25], but further research is required on how to shape a unified framework that acts as a foundation for a novel class of IVA analytics while making the procedure of model selection easier and quicker.

Statistics Maintenance and Ranking: A user can develop and deploy an IVA algorithm, model, or service that can be extended, utilized, or subscribed to by other users. The community members run such an architecture, and the number of IVA services can rapidly reach tons of domain-dependent or domain-independent services. This scenario creates a complex situation for users, i.e., which IVA service to choose (when several share parallel functionalities) in a specific situation, especially during service discovery. Against each IVA service, there is a list of Quality of Service (QoS) parameters. Some of these QoS parameters (not limited to) are user trust, satisfaction, domain relevance, security, usability, availability, reliability, documentation, latency, response time, resource utilization, accuracy, and precision. Selecting among IVA services against such QoS parameters leads to a 0–1 knapsack-style problem. In this direction, one possible solution is utilizing multi-criteria decision-making approaches. This gives further opportunities to the research community to investigate how to rank and recommend IVA algorithms, models, and services.

IVAaaS and Cost Model: The proposed L-CVAS is supposed to provide IVA-Algorithm-as-a-Service (IVAAaaS) and IVA-as-a-Service (IVAaaS) in the cloud while adopting the Customer-to-Customer (C2C) business model. Unfortunately, current Software-as-a-Service (SaaS) cost models might not be applicable because of the involvement of diverse types of parameters that drastically affect the cost model. Such parameters include the business model (Business-to-Business (B2B), Business-to-Customer (B2C), and C2C), unit of video, user type (developer, researcher, and consumer), services (IVAAaaS and IVAaaS), service subscription (algorithm, IVA service, single, multiple, dependent, or independent), cloud resource utilization, user satisfaction, QoS, location, service subscription duration, and cost model fairness. The addition of further parameters is subject to discussion, but those listed are the basic ones that govern the L-CVAS cost matrix. Additionally, the cost model demands further research and investigation to develop an effective price scheme for IVA services while considering the stated parameters.

Video Big Data Management: Although video big data pose high value, their management, indexing, retrieval, and mining are challenging because of their volume, velocity, and unstructuredness [21]. In the context of video big data management, the main issue is the extraction of semantic concepts from primitive features. A general domain-independent framework is required that can extract semantic features and analyze and model the multiple semantics of videos by using the primitive features. Further, semantic event detection is still an open research issue because of the semantic gap and the difficulty of modeling temporal and multi-modality features of video streams. Temporal information is significant in video big data mining, mainly in pattern recognition. Limited research is available on content-based video retrieval that exploits distributed computing. Further study is required to consider different types of features, ranging from local to global spatiotemporal features, while utilizing and optimizing deep learning and distributed computing engines. Semantic-based approaches have been utilized for video retrieval because of the semantic gap between low-level features and high-level human-understandable concepts [2]. An ontology adds extra concepts that can improve the retrieval results but may also lead to unexpected deterioration of search results. In this context, a hybrid approach can be fruitful, and different query planes need to be designed that can fulfill diverse queries in complex situations.

Privacy, Security and Trust: Video big data acquisition, storage, and subscriptions to shared IVA in the cloud become mandatory, which leads to privacy concerns. For the success of such platforms, privacy, security, and trust are always central. In the literature, the word 'trust' is commonly used as a general term for 'security' and 'privacy'. Trust is a social phenomenon where the user has expectations of the IVA service provider and is willing to take action (subscription) on the belief, based on evidence, that the expected behavior will occur [22]. In the cloud environment, security and privacy play an active role in trust-building. To ensure security, the L-CVAS should offer different levels of privacy control. The privacy and security phenomena apply across the VSDS, storage security, multi-level access controls, and privacy-aware IVA and analysis.
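Relating to the statistics maintenance and ranking discussion above, the toy sketch below ranks a few hypothetical IVA services against a handful of QoS parameters using simple additive weighting, one of many possible multi-criteria decision-making approaches; the services, criteria, and weights are invented for illustration and are not part of the proposed system.

```python
# Toy sketch: rank IVA services by weighted, normalized QoS scores (simple additive weighting).
services = {
    "face-detection-v2":  {"accuracy": 0.92, "latency_ms": 140, "availability": 0.999},
    "action-recognition": {"accuracy": 0.85, "latency_ms": 310, "availability": 0.995},
    "plate-reader":       {"accuracy": 0.88, "latency_ms": 90,  "availability": 0.990},
}
weights = {"accuracy": 0.5, "latency_ms": 0.2, "availability": 0.3}
benefit = {"accuracy": True, "latency_ms": False, "availability": True}  # lower latency is better

def normalise(values, higher_is_better):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) if higher_is_better else (hi - v) / (hi - lo) for v in values]

criteria = list(weights)
columns = {c: normalise([s[c] for s in services.values()], benefit[c]) for c in criteria}
scores = {name: sum(weights[c] * columns[c][i] for c in criteria)
          for i, name in enumerate(services)}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```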

4 Conclusion

In the recent past, the number of public surveillance cameras has increased significantly, and an enormous amount of visual data is produced at an alarming rate. Such large-scale video data exhibit the characteristics of big data. Video big data offer opportunities to the video surveillance industry and permit it to gain insights in almost real-time. The deployment of big data technologies such as Hadoop and Spark in the cloud under the aaS paradigm to acquire, persist, process, and analyze large amounts of data has been in service for the last few years. This approach has changed the context of information technology and has turned the on-demand service model's assurances into reality. This paper presents a comprehensive layered architecture for intelligent video big data analytics in the cloud under the aaS paradigm. Furthermore, the research issues, opportunities, and challenges raised by the uniqueness of the proposed CVAS, and by the triangular relation among video big data analytics, distributed computing technologies, and the cloud, have been reported.

References 1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: Tensorflow: a system for large-scale machine learning. OSDI 16, 265–283 (2016) 2. Alam, A., Khan, M.N., Khan, J., Lee, Y.K.: IntelliBVR-intelligent large-scale video retrieval for objects and events utilizing distributed deep-learning and semantic approaches. In: 2020 IEEE BigComp, pp. 28–35. IEEE (2020) 3. Alam, A., Lee, Y.K.: Tornado: intermediate results orchestration based serviceoriented data curation framework for intelligent video big data analytics in the cloud. Sensors 20(12), 3581 (2020) 4. Amazon, E.: Amazon web services, November 2012 (2015). http://aws.amazon. com/es/ec2/ 5. Borisyuk, F., Gordo, A., Sivakumar, V.: Rosetta: large scale system for text detection and recognition in images. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–79. ACM (2018) 6. Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S., Tzoumas, K.: Apache flink: stream and batch processing in a single engine. Bull. IEEE Comput. Soc. Tech. Committee Data Eng. 36(4) (2015) 7. Chollet, F., et al.: Keras (2015) 8. Corporation, I.D.: The growth in connected IoT devices (2019). https://www.idc. com/getdoc.jsp?containerId=prUS45213219. Accessed 12 July 2019 9. Dai, J., et al.: BigDL: a distributed deep learning framework for big data. arXiv preprint arXiv:1804.05839 (2018) 10. Dean, J., et al.: Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, pp. 1223–1231 (2012) 11. Dean, J., Ghemawat, S.: MapReduce: a flexible data processing tool. Commun. ACM 53(1), 72–77 (2010) 12. Deshpande, A., Kumar, M.: Artificial Intelligence for Big Data: Complete Guide to Automating Big Data Solutions Using Artificial Intelligence Techniques. Packt Publishing Ltd. (2018) 13. Dijcks, J.P.: Oracle: big data for the enterprise. Oracle white paper, p. 16 (2012) 14. Facebook.Com: Facebook (2020). https://www.facebook.com. Accessed 20 Dec 2019


15. Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning, vol. 1. Springer, New York (2001) 16. Gao, L., Song, J., Liu, X., Shao, J., Liu, J., Shao, J.: Learning in high-dimensional multimedia data: the state of the art. Multimedia Syst. 23(3), 303–313 (2017) 17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 18. Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., Patwary, M., Ali, M., Yang, Y., Zhou, Y.: Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409 (2017) 19. Huang, T.: Surveillance video: the biggest big data. Comput. Now 7(2), 82–91 (2014) 20. Ketkar, N.: Introduction to pytorch. In: Deep learning with python, pp. 195–208. Springer (2017) 21. Khan, M.N., Alam, A., Lee, Y.K.: FALKON: large-scale content-based video retrieval utilizing deep-features and distributed in-memory computing. In: 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 36–43. IEEE (2020) 22. Khusro, S., Alam, A., Khalid, S.: Social question and answer sites: the story so far. Program 51(2), 170–192 (2017) 23. Kreps, J., Narkhede, N., Rao, J., et al.: Kafka: a distributed messaging system for log processing. In: Proceedings of the NetDB, pp. 1–7 (2011) 24. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012) 25. Kumar, A., McCann, R., Naughton, J., Patel, J.M.: Model selection management systems: the next frontier of advanced analytics. ACM SIGMOD Rec. 44(4), 17–22 (2016) 26. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015) 27. Marz, N., Warren, J.: Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications Co., New York (2015) 28. Mayer, R., Jacobsen, H.A.: Scalable deep learning on distributed infrastructures: challenges, techniques and tools. arXiv preprint arXiv:1903.11314 (2019) 29. Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D., Amde, M., Owen, S., et al.: MLlib: machine learning in apache spark. J. Mach. Learn. Res. 17(1), 1235–1241 (2016) 30. Olatunji, I.E., Cheng, C.H.: Dynamic threshold for resource tracking in observed scenes. In: 2018 9th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1–6. IEEE (2018) 31. Owen, S., Owen, S.: Mahout in action (2012) 32. Pouyanfar, S., Yang, Y., Chen, S.C., Shyu, M.L., Iyengar, S.: Multimedia big data analytics: a survey. ACM Comput. Surv. (CSUR) 51(1), 10 (2018) 33. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 34. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Computer Vision and Pattern Recognition (CVPR) (2015). http://arxiv.org/abs/1409.4842 35. Team, D., et al.: Deeplearning4j: open-source distributed deep learning for the JVM. Apache Softw. Found. License 2 (2016) 36. Uddin, M.A., Lee, Y.K.: Feature fusion of deep spatial features and handcrafted spatiotemporal features for human action recognition. Sensors 19(7), 1599 (2019)


37. Vora, M.N.: Hadoop-HBase for large-scale data. In: 2011 International Conference on Computer Science and Network Technology (ICCSNT), vol. 1, pp. 601–605. IEEE (2011) 38. Xie, L., Sundaram, H., Campbell, M.: Event mining in multimedia streams. Proc. IEEE 96(4), 623–647 (2008) 39. youtube.com: Youtube statistics (2019). https://www.youtube.com/about/press/, accessed: 2019-12-20 40. Yue-Hei Ng, J., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G.: Beyond short snippets: deep networks for video classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4694–4702 (2015) 41. Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., Stoica, I.: Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of the TwentyFourth ACM Symposium on Operating Systems Principles, pp. 423–438. ACM (2013) 42. Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M.J., et al.: Apache spark: a unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016) 43. Zhai, Y., Ong, Y.S., Tsang, I.W.: The emerging “big dimensionality” (2014) 44. Zhang, H., Zheng, Z., Xu, S., Dai, W., Ho, Q., Liang, X., Hu, Z., Wei, J., Xie, P., Xing, E.P.: Poseidon: an efficient communication architecture for distributed deep learning on {GPU} clusters. In: 2017 {USENIX} Annual Technical Conference ({USENIX}{ATC} 17), pp. 181–193 (2017)

Smart Modeling Systems and Natural Language Processing

A Multimodal Memes Classification: A Survey and Open Research Issues

Tariq Habib Afridi, Aftab Alam, Muhammad Numan Khan, Jawad Khan, and Young-Koo Lee

Department of Computer Science and Engineering, Kyung Hee University (Global Campus), Yongin 1732, South Korea
{afridi,aftab,numan,jkhanbk1,yklee}@khu.ac.kr

Abstract. Memes are graphics and text overlapped so that together they present concepts that become dubious if one of them is absent. They are spread mostly on social media platforms, in the form of jokes, sarcasm, motivation, etc. After the success of BERT in Natural Language Processing (NLP), researchers have turned to Visual-Linguistic (VL) multimodal problems such as memes classification, image captioning, Visual Question Answering (VQA), and many more. Unfortunately, many memes that need automatic censoring to curb misinformation and hate get uploaded each day on social media platforms. Recently, this issue has attracted the attention of researchers and practitioners. State-of-the-art methods that perform well on other VL datasets tend to fail on memes classification. In this context, this work aims to conduct a comprehensive study on memes classification and, more generally, on VL multimodal problems and cutting-edge solutions. We propose a generalized framework for VL problems. We cover the early and next-generation works on VL problems. Finally, we identify and articulate several open research issues and challenges. To the best of our knowledge, this is the first study that presents a generalized view of advanced classification techniques concerning memes classification. We believe this study presents a clear roadmap for the Machine Learning (ML) research community to implement and enhance memes classification techniques. Keywords: Visual and linguistic · BERT · Multimodal language processing · Deep learning · Cross-modal · Natural language processing

1 Introduction

Over the past couple of years, there has been growing interest in the research community in multimodal problems such as VQA [3,18,58], image captioning [6,19,58], memes classification [26], and many more [40]. Many real-world problems are multimodal, just as humans perceive the world using multiple senses such as eyes, ears, and tongues. Likewise, data on the internet and in machines are also multimodal, appearing as text, images, video, sound, etc. Memes classification is a multimodal problem, as most memes have two modalities, namely the text and the graphics of the image. Due to the heavy use of social media platforms, there is a need to curb their negative impact automatically. One such problem is the automatic filtering of hateful memes to stop users from spreading hate across the internet. Big social media platforms like Twitter and Facebook are often instructed by different countries to stall the spread of online hatred. Facebook recently launched a memes classification challenge in 2020 [26], which includes hateful memes that are racist or sexist, and some that may incite violence. Recent VL multimodal techniques such as [33,37] have been found to be far from human accuracy, and multimodal memes classification is still in its infancy [26]. This is a challenging problem because, taken separately, a meme can have a pleasant caption and a normal picture, yet it may become offensive when the two are combined in a certain way. Consider a meme with a caption like "love the way you smell today". Combine that caption with the image of a skunk, and it becomes mean. Similarly, the caption "look how many people love you" seems kind, but combined with an image of barren land it also becomes mean. Likewise, consider an image of a woman's beaten face combined with texts such as "women ask for equal rights, so I give them equal lefts as well" and then "women ask for equal rights, and this is why". The challenge for the vision model will then be to identify the beaten face in order to classify these two memes as hateful/non-hateful. This brings up the need for multimodal models trained jointly on the text and images simultaneously. To classify memes accurately, we need VL multimodal models to understand the concepts in memes, which otherwise require human intelligence; this in turn demands more intelligent models for general VL multimodal problems [40]. While the vision [20,29,48,55] and language [11,30,36] tasks have independently seen a lot of progress in recent years, they are still lagging on multimodal problems like VQA, image captioning, hateful memes classification, etc. In the next few years, we may therefore see a surge of research-community interest in VL multimodal problems. Further, recent multimodal research has encountered similar kinds of problems, such as language carelessly enforcing strong priors that can end in an outwardly impressive performance while neglecting the model's core visual content [12]. Related issues can be found in VQA [3], where, without refined multimodality, simple baseline models performed unusually well [1,18,61]; such baselines are unlikely to work on memes classification as well. This work aims to conduct a thorough investigation of the status of advanced ML approaches to memes classification in social media. First, a generic framework is proposed for social media memes classification. Then we present a broad overview of up-to-date, relevant literature. Eventually, open research issues and challenges are addressed, with a focus on the proposed framework. The remaining paper is organized in the following order. Section 2 discusses a generic memes classification framework. In Sect. 3, the recent literature is reviewed. Section 4 presents several open research issues and challenges. Finally, we conclude this study in Sect. 5.


2 Memes Classification: A Generic Architecture

The memes classification task can be seen as a combined VL multimodal problem. It is different from some current VL problems like image captioning, where efforts are made to find the best possible explanation of the image in the form of a caption, whereas in memes we have to make the decision based on the text being semantically correlated with the visual content of the image. Therefore, only a cross-modal approach over vision and text is likely to perform well on memes classification. Traditional VL approaches were based on simple early or late fusion while learning the vision and language problems unimodally. However, a multimodally pre-trained model may perform better at classifying memes. Based on an extensive literature review, we propose a generic multimodal architecture for memes classification, shown in Fig. 1. The proposed architecture has two types of flows, i.e., a Linguistic Processing Flow (LPF) and a Visual Processing Flow (VPF). There is a middle phase, called Fusion and Pre-Training (FPT), which defines the fusion and pre-training strategies for merging the LPF and VPF [5,26].

Fig. 1. A generic architecture for memes classification. The architecture comprises a Linguistic Processing Flow (text preprocessing, representation, dimensionality reduction, and text classification), a Visual Processing Flow (image preprocessing, feature extraction, object detection, and text generation), and a Fusion and Pre-training stage (early, hybrid, and late fusion; unimodal and multimodal pre-training) that produces the hateful/not hateful decision.

Both NLP and vision have a long history of ML methods, which we categorize into a first and a second generation. After the success of AlexNet [29], the next generation of vision began, based solely on deep learning models, specifically convolutional neural networks. Similarly, after the success of BERT [11], the next generation of NLP also started. We therefore divide both the LPF and the VPF into two generations, the First Generation (1G) and the Second Generation (2G), and elaborate on each of them in the following subsections.

2.1 Linguistic Processing Flow

In 1G LPF, embedding’s can mostly capture the semantic meanings of words. However, such techniques are context unaware and fail to capture higher-level contextual concepts, such as polysemous disambiguation, syntactic structures, semantic roles, and anaphora. Formally LPF, considering 1G, is a four-step process. i.e., pre-processing, feature engineering, dimensionality reduction, and classification. Tasks like stop word removal, capitalization, tokenization, abbreviation handling, slang and idioms handling, spelling correction, noise removal, lemmatization, and stemming are performed in pre-processing [28]. Fortunately, we will not need many of these sub-tasks as text on memes is observed mostly to be clean. However, non-English memes may need some other kind of preprocessing for cleaning linguistic defects. After pre-processing, feature engineering steps are performed to extract useful features from the text. Feature engineering is a non-trivial task as they have to look for better representation of the extracted features. Some popular feature engineering techniques being used are word embeddings like Word2Vec, GloVe, syntactic word representation like N-Gram, weighted words such as Bag of words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and FastText [28]. The noteworthy flaws in these techniques are that they failed to capture the context and correct meaning of words such as words with multiple meanings in a different context. Dimensionality reduction techniques like PCA, ICA, LDA, etc., are also employed by research communities to terminate unwanted features. Once the quality features are extracted, the most critical phase in 1G text classification pipeline is picking the best classification model. To determine the most effective classification model for any NLP task, a conceptual understanding of each of these algorithms was a necessity. Therefore, researchers have employed typical text classifiers, such as SVM, kNN, Na¨ıve Bayes, ensemble classifiers such as Bagging, Adaboost, and Decision Tree, Random Forest, which are tree-based [28]. Recently, Deep Learning (DL) methods have attained superior results compared to the earlier ML algorithms on tasks such as object detection in image, face recognition, NLP, etc. The reason for the success of these approaches depends on the ability to model the complex and non-linear associations inside the data. In 2G, the focus of the research community in NLP has been shifted to neural network-based approaches such as RNN, CNN, and transformer-based attention models such as BERT [11], OpenAI GPT-2 [45], Roberta [36], and Albert [30]. Since the inception of BERT, a new era has been started in NLP as it attained state-of-the-art results on many NLP tasks. BERT is built on top of several past clear ideas, and it incorporates ideas from semi-supervised sequence learning [9], ElMo [42] (that solved the problem of Polysemy by using layers of complex Bi-directional Long-short Term Memory (LSTM) architecture), ULMFiT [22]

A Multimodal Memes Classification: A Survey and Open Research Issues

1455

(which trained language models that could be fine-tuned to provide excellent results even with fewer data thus cracking the code for transfer learning in NLP), and substituting LSTM by the transformer [57] (gives better parallel processing and shorter training time than that of LSTM). The transformer from Vaswani [57] has enhanced the NLP by capturing relationships and the sequence of words in sentences, which is vital for a machine to understand a natural language understanding. Unlike, 1G approaches, which heavily relied on feature engineering and choosing the best classifier, were a burdensome task. However, BERT has made the job easy as it is pre-trained on a huge corpus of data, and by consuming the transfer learning, it can be fine-tuned to any given task. Attention layers [4] from the transformer tends to align and extract information from a query vector using context vectors. Attention normalizes the calculated matching score between the query vector and each context vector among all vectors using softmax. Self-attention is an attention layer in which the input query vector is in the set of context vectors i.e it just replace the target sequence with the same input sequence. Explicitly, most researchers now tends to the multi-head attention [57]. The common transformer architecture is composed of encoders and decoders, which are a heap of several identical layers comprising of position-wise Feed Forward Network (FFN) layer and multi-head self-attention layer. The temporal aspect of sequential input has been explored by the position-wise FFN and is accounted for by the transformer in the encoder phase by generating content embedding and position encoding for each token of the input sequence. While, the self-attention within each sub-layer in the Multihead Self-Attention is used to align tokens and their positions among the same input sequence. Sequence models usually capture the local context of a word in sequential order such as LSTM, which is common in language processing and generation among researchers. However, transformer architecture attains substantial parallel processing, reduced training time, and sophisticated accuracy for translation without any recurrent component, unlike LSTM, which is a remarkable advantage. On the contrary, weakly incorporated position information from the position encoding may perform worse for problems that are sensitive to positional variation. 2.2
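For reference, the attention normalization just described corresponds to the standard scaled dot-product attention of [57], where $Q$, $K$, and $V$ denote the query, key, and value matrices and $d_k$ is the key dimension:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

In multi-head attention, several such attention functions are computed in parallel on learned projections of $Q$, $K$, and $V$, and their outputs are concatenated.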

2.2 Visual Processing Flow

After the success of AlexNet [29], the focus shifted away from the traditional 1G pipeline, which consists of old-fashioned steps such as pre-processing, feature engineering, and classification. In 1G, researchers had the laborious job of exploring and redesigning the feature engineering process for every particular, even slightly different, problem or domain. They also carried the additional load of choosing the best classification model for their generated features. Various feature extraction methods have been employed on images, like LBP, SIFT, HOG, SURF, BRIEF, and many more [27]. Similarly, many traditional classification methods have also been employed, with zero transfer learning capability. Moreover, the traditional visual processing cycle required similar work for finding the best methods in each sub-step, like pre-processing, feature extraction, feature selection, and classification. This issue has been overcome by the CNN, which acquires features on its own and, at the same time, can be fine-tuned to other related tasks via transfer learning.

Fig. 2. A generic CNN architecture: an input image passes through stacked convolution+ReLU and pooling layers, followed by fully connected layers and a softmax layer that outputs the labels.

The ImageNet database [10] changed the course of ML by shifting the focus to deep learning altogether, and thus we call this the 2G era of the VPF. DL-based models can incorporate a large and diverse set of images and videos, as they automatically learn features from training data and can generalize well to related problems. The general architecture of a Convolutional Neural Network (CNN) can be seen in Fig. 2: it is mostly a combination of convolution layers plus activation functions like ReLU and its variants, subsampling layers like max-pooling, a fully convolutional layer, a dense layer, and a softmax layer at the end. Since AlexNet's [29] astonishing results in the ImageNet challenge, computer vision research has shifted to enhancing the CNN architecture. VGG [48] and the inception module of GoogLeNet [55] showed the benefits of expanding the depth and width of the CNN architecture. ResNets [20] developed the residual learning block with a shortcut connection for identity mapping, enabling the neural network model to break through the obstruction of hundreds or even thousands of layers. DenseNet reformulates the connections between network layers to further boost the learning and representational properties of deep networks [23]. Moreover, extensive research has been done in object detection, and many CNN-based approaches have been proposed, such as R-CNN, Faster R-CNN, YOLO, etc. [35].
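As a minimal sketch of the transfer-learning idea just described (not one of the surveyed systems), the code below reuses an ImageNet-pre-trained ResNet-50 from torchvision as a visual backbone and replaces its classification head for a two-class hateful/not-hateful task; it assumes a recent torchvision and feeds a dummy batch.

```python
# Minimal sketch: train only a new 2-class head on top of a frozen, pre-trained ResNet-50.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in backbone.parameters():              # freeze the pre-trained convolutional layers
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 2)   # new, trainable classification head

x = torch.randn(4, 3, 224, 224)              # a dummy batch of meme images
logits = backbone(x)
print(logits.shape)                          # torch.Size([4, 2])
```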


2.3 Fusion and Pre-training: Towards Multimodality

Based on the literature [5], we categorize VL multimodal fusion into three categories: early fusion, late fusion, and hybrid fusion. Early fusion merges the features immediately after they are extracted. At the other end, late fusion integrates the decisions after each modality has made its own decision. Lastly, hybrid fusion fuses the outputs of individual unimodal predictors with early fusion. Pre-training can likewise be unimodal or multimodal: language and vision models that are pre-trained unimodally and then combined by some fusion type form a unimodally pre-trained multimodal, whereas language and vision models pre-trained together are called multimodally pre-trained [26].
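To make the distinction concrete, the toy PyTorch sketch below contrasts early fusion (concatenating unimodal features before a joint classifier) with late fusion (combining per-modality decisions); the feature dimensions are hypothetical.

```python
# Toy sketch: early vs. late fusion of unimodally extracted text and image features.
import torch
import torch.nn as nn

text_feat = torch.randn(8, 768)    # e.g., BERT [CLS] embeddings for 8 memes
img_feat = torch.randn(8, 2048)    # e.g., ResNet pooled features for the same memes

# Early fusion: concatenate the features, then classify jointly.
early_clf = nn.Sequential(nn.Linear(768 + 2048, 512), nn.ReLU(), nn.Linear(512, 2))
early_logits = early_clf(torch.cat([text_feat, img_feat], dim=1))

# Late fusion: one classifier per modality, then average the decisions.
text_clf = nn.Linear(768, 2)
img_clf = nn.Linear(2048, 2)
late_logits = (text_clf(text_feat) + img_clf(img_feat)) / 2

print(early_logits.shape, late_logits.shape)   # both torch.Size([8, 2])
```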

3 State-of-the-Art on Memes Classification

Since there isn’t much work done on multimodal memes classification, we consider other VL problems for state-of-the-art. A classic task in the VL multimodal exploration is to understand an alignment between multimodal feature spaces. Generally, in this context, a CNN and an RNN are trained together to learn a combined embedding space from aligned multimodal VL data and is a commonly followed architecture in image captioning [19,21]. Contrastively, VQA merges both VL modalities to decide the right answer instead of learning an alignment between two spaces. It requires the precise correlations modeling between the image and the question representations. In hateful memes, a similar kind of accurate correlation modeling between image and texts is required as we need to find suitable alignment to both VL modalities to comprehend the underlying correlation among modalities and finally make a decision. For memes, we take inspiration from the VQA literature for the state-of-the-art models [50]. At the beginning of the VQA, researchers employed early fusion by feature concatenation. Later methods learned multimodal features using bilinear pooling [14]. These methods have severe limitations as the multimodal features are fused in the latter stage of the model, so the alignment of VL was also weakly extracted. Also, the acquired visual features by demonstrating the output of CNN as a one-dimensional vector significantly losses the spatial information from the input image [17]. Recently, the focus has been shifted to cross-modality by multimodal pre-training approaches like Visualbert [33], UNITER [7], and Vilbert [37]. They have outclassed many recent approaches on multiple VL multimodal datasets such as VQA [3], Visual Commonsense Reasoning (VCR) [60], NLVR [52], Flicker30K [43], and many more. 3.1

3.1 Hateful Speech Classification

Considerable work has been carried out in recent years on detecting hate speech [13]. Many techniques have been proposed by researchers, varying in the feature engineering employed as well as in the choice of classifiers. Traditional feature engineering techniques like BoW, N-grams, POS, TF-IDF, CBOW, word2vec, and text features have been employed for hate speech detection. Similarly, various classification algorithms have also been employed, of which the most frequent are SVM, Random Forest, Decision Tree, Logistic Regression, Naïve Bayes, and many more [13]. Unlike other tasks in NLP, hate speech has cultural and regional implications; depending on the specific cultural background, a given expression may or may not be perceived as offensive. It also remains to be seen how well-known methods for hate speech detection in English carry over to other languages [47]. Very few 2G NLP methods have been applied to hate speech from social media sites [41,49]. One such method used BERT with a new fine-tuning approach based on transfer learning to capture hateful content within social media posts [41]. The fine-tuning variants were: initially with minimal changes, then inserting nonlinear layers, and finally inserting a Bi-LSTM layer or a CNN layer; the best result was achieved by fine-tuning with inserted CNN layers. Another method proposed a multi-channel BERT model employing three BERTs, one multilingual, one English, and one Chinese [49]. The authors explored translation capabilities by converting training and test sentences into the languages required by these three different BERT models. They evaluated their model on three non-English, non-Chinese language datasets and compared it with previous methods. Further, they used the Google Translation API to translate the source-language text into English and Chinese before feeding it into the corresponding English and Chinese BERTs. Lastly, they presented state-of-the-art performance using their model.
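As a hedged illustration of one fine-tuning variant mentioned above, a CNN head on top of BERT token representations for binary hate-speech classification (an illustrative reconstruction, not the authors' code), consider the following sketch; the example sentences and hyperparameters are invented.

```python
# Sketch: BERT encoder with a Conv1d + max-pooling head for binary hate-speech classification.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertCnnClassifier(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.conv = nn.Conv1d(in_channels=768, out_channels=128, kernel_size=3, padding=1)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        hidden = hidden.transpose(1, 2)                           # (batch, 768, seq_len) for Conv1d
        pooled = torch.relu(self.conv(hidden)).max(dim=2).values  # max-pool over tokens
        return self.fc(pooled)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["you people are the worst", "what a lovely day"],
                  padding=True, truncation=True, return_tensors="pt")
model = BertCnnClassifier()
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)                                               # torch.Size([2, 2])
```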

3.2 Multimodal Visual-Linguistic Classification

Substantial research has been conducted in the past decade on integrating the vision and language modalities. Most of these VL models have similar architectures: on the visual side, they generally use CNN models pre-trained for a variety of computer vision tasks, ranging from scene recognition to object detection and the relations among objects. Likewise, for the language representation, most of these models employed RNNs, with LSTM and GRU being the most popular choices in the recent past [25]. Such approaches employed traditional early, late, and hybrid fusion [5]; they were unimodally pre-trained in the case of early and late fusion, although some were also multimodally pre-trained using hybrid fusion between CNN-based visual features and an RNN or language model. Some have combined language models with visual information from images and videos at different levels of extracted language features, starting at the word level, then the sentence level, and similarly from the paragraph up to the document level. The bulk of the work has been primarily focused on joining word-level linguistic units with features from images or videos to create visual-semantic embeddings valuable for downstream applications. Additionally, numerous approaches have been proposed that build on n-grams, templates, and dependency parsing [40]. Furthermore, the encoder-decoder framework [8], an image-description generation model, became famous. It was further extended with attention mechanisms [4] to improve the harvesting of local image features, benefiting the generation of a word at each time step. Likewise, tasks such as VQA [3,18] often use approaches that consist of an image feature extractor, a text encoder, a fusion module (normally with attention), and a response classifier. Lately, the focus has shifted to BERT-based VL models [54], as BERT's success has two keys: effective pre-training tasks over big language datasets, and the use of the Transformer [57], a contextualized text representation for learning, instead of an LSTM, which has further pushed VL multimodal learning. Consequently, the focus has shifted toward multimodal pre-training, which has brought leaping advances in VL understanding tasks such as VQA and VCR, with great potential for extending to other VL problems like memes classification, visual captioning, visual dialog, vision-language navigation, as well as video-and-language representation learning. Previously, most methods were designed for specific tasks, whereas BERT-based VL cross-modal models such as VisualBERT [33], VilBERT [37], LXMERT [56], VL-BERT [51], B2T2 [2], Unicoder-VL [31], ImageBERT [44], Pixel-BERT [24], and UNITER [7] can be fine-tuned to other downstream tasks. These BERT-based VL multimodal models have attained state-of-the-art performance across diverse VL problems, such as VQA, VCR, image-text retrieval, and textual grounding. Existing methods can be divided into two groups based on their model architecture. A few works like VilBERT [37] and LXMERT [56] utilize a two-stream architecture based on the Transformer; the two-stream architectures process visual and language information separately and fuse them afterward with another Transformer layer. On the other hand, methods such as B2T2 [2], VisualBERT [33], Unicoder-VL [31], UNITER [7], Pixel-BERT [24], and VL-BERT [51] apply a single-stream architecture in which the two modalities are directly fused at an early stage and a single transformer is applied to both image and text. They use BERT to learn a bi-directional joint distribution over detection bounding-box features and sentence embedding features. Further differences among them lie in the training method, loss function, and datasets.

Two-Stream Architecture. VilBERT [37] and LXMERT [56] have both employed a two-stream architecture, in which visual and linguistic inputs are processed in separate streams. VilBERT proposed a co-attention mechanism, which is also a transformer-based architecture. It allows vision-attended language features to be integrated into visual representations, and vice versa, by swapping key-value pairs in multi-head attention. It also permits flexible network depth for each modality and thus facilitates cross-modal connections at various depths. VilBERT comprises two parallel BERT-based models operating over text segments and image regions, plus a third, cross-modal component in which a succession of transformer blocks and co-attentional transformer layers facilitates information exchange between modalities. It used Faster R-CNN [46] with a ResNet-101 [20] backbone, pre-trained on the Visual Genome dataset, for regional feature extraction. Likewise, LXMERT proposed a similar kind of transformer model comprising three encoders: a language encoder, an object relationship encoder, and a cross-modality encoder. Its cross-modality encoder is slightly different from VilBERT's, as it comprises two self-attention sub-layers, two feed-forward sub-layers, and one bi-directional cross-attention sub-layer. It also utilized Faster R-CNN [46] for object detection. Other differences between these two approaches lie in the pre-training datasets, pre-training tasks, and downstream tasks after fine-tuning. A two-stream architecture's main issue is having a greater number of parameters for performance similar to a single-stream architecture.

Single-Stream Architecture. Many other recent BERT-based approaches have opted for a single-stream architecture, as it provides the same performance with fewer parameters [7]. A single-stream model takes a mixed sequence of the two modalities as input. Recent BERT-based VL models that opted for single-stream architectures include VisualBERT [33], B2T2 [2], Unicoder-VL [31], VL-BERT [51], Pixel-BERT [24], ImageBERT [44], etc. VisualBERT comprises several stacked transformer layers that align features from the input text with regions extracted by Faster R-CNN from the corresponding input image using a self-attention mechanism. The authors propose two VL model pre-training tasks, sentence-image alignment and Masked Language Modeling (MLM), pre-train their model on the COCO captions dataset, and further evaluate it on downstream tasks such as VQA, VCR, and NLVR. B2T2 proposed a similar architecture with the same pre-training tasks, MLM and sentence-image alignment; however, it is evaluated on a single downstream task, VCR. UNITER designs its model with four pre-training tasks: image-text matching, MLM, masked region modeling, and word-region alignment. The authors propose an image embedder and a text embedder to extract the corresponding embeddings from image regions and sentence tokens; these embeddings are finally fed to the multi-layer transformer for cross-modality learning. More recently, many similar cross-modal techniques have been proposed with minor differences in architecture and fusion strategy, pre-training tasks, pre-training datasets, pre-training strategy, and downstream tasks, such as ImageBERT [44], Unicoder-VL [31], VL-BERT [51], Pixel-BERT [24], InterBERT [34], B2T2 [2], VD-BERT [59], and many more. Some of them apply a similar model in different domains, such as FashionBERT [16], which deploys text and image matching in cross-modal retrieval for the fashion industry. Some work in the video and text domains, like VideoBERT [54], learns joint embeddings of video frame tokens and linguistic tokens from video-text pairs; CBT [53] presented contrastive learning to handle real-valued video frame features, and others include UniViLM [39] and ActBERT [63]. VILLA [15] proposed large-scale adversarial training for vision-and-language representation learning. Pixel-BERT [24] suggests aligning image pixels with text, in contrast to conventional bottom-up features. HERO [32] proposed hierarchical Transformer architectures to leverage both global and local temporal visual-textual alignments. VLP [62] introduced pre-training tasks via the manipulation of attention masks for image captioning and VQA. Multi-task learning was recently used in [38] to boost performance further and enhance fine-tuning by using detected image tags.
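To illustrate the single-stream idea in miniature (a toy sketch, not a reimplementation of any model above), the code below projects image-region features and text-token embeddings to a common width, concatenates them into one sequence, and passes it through a standard Transformer encoder before classification; all dimensions are hypothetical and a recent PyTorch is assumed.

```python
# Toy single-stream sketch: one Transformer encoder over a mixed image-region / text-token sequence.
import torch
import torch.nn as nn

regions = torch.randn(2, 36, 2048)    # e.g., 36 detector region features per meme
tokens = torch.randn(2, 20, 768)      # e.g., token embeddings of each meme's caption

img_proj = nn.Linear(2048, 512)       # bring both modalities to the same width
txt_proj = nn.Linear(768, 512)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=4,
)
cls_head = nn.Linear(512, 2)

fused = torch.cat([img_proj(regions), txt_proj(tokens)], dim=1)   # one mixed sequence
encoded = encoder(fused)
logits = cls_head(encoded.mean(dim=1))   # pool over the joint sequence, then classify
print(logits.shape)                      # torch.Size([2, 2])
```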

4 Research Issues, Opportunities, and Future Directions

Memes Classification: Generally, memes can be classified into two categories, hateful and non-hateful. However, the issue arises in defining what counts as hate in memes; sometimes the margin between a hilarious meme and a hateful one is very narrow. Usually, hate can be defined as an attack on people's characteristics such as race, ethnicity, religion, sexual orientation, and many more [26], and endorsing hateful memes can also be categorized as hateful. Furthermore, memes can arguably be classified further into many subcategories targeting the relevant issues. The recent Black Lives Matter protests are a classic example of countering direct or indirect attacks on someone's race and color. Similarly, a new trend in social media statuses, where text is written on colored background images, can be tracked into many categories like hateful/non-hateful, rumor, fake news, extremist, etc. Therefore, classifying memes further into subcategories will also require the research community to focus on sub-task-specific datasets and approaches, such as fake, true-or-lie, and propaganda memes, especially during an election.

Memes Reasoning: A further research issue is understanding the semantics of memes, especially those based equally on text and images, and reasoning them into different categories, such as whether a meme is humorous or hateful. The meme categories described above can also be elaborated by providing the possible relationships among the different objects in the image and the text associated with it. This can also be seen as a problem in which classification, detection, and segmentation of images are aligned to form different elaborations of the image. However, identifying the most suitable alignment between detected objects and regions and the textual part of the meme can be very challenging, and solving it may enhance VL multimodal ability to such a level that it further increases the model's general ability on other VL tasks.

Memes Semantic Entailment: Another research issue is memes semantic entailment, i.e., predicting whether the image semantically entails the text of the meme or whether the two are independent of one another. This would be useful in cases like the example elaborated above, where statuses, with users' points of view or opinions shared on colored backgrounds, are frequently uploaded to social media. If the visual and text do not entail each other, an independent state-of-the-art text or image model can successfully classify the meme. Nevertheless, if semantic entailment from image to text is found, the meme can be further processed by a multimodal model, as in the above two tasks, for further categorization.


Multimodal Fusion and Co-learning: Another vital research issue relates to the multimodal fusion and co-learning of visual and linguistic models. Traditionally, researchers used many fusion strategies for a multimodal model: in some cases the modalities were translated into a uniform feature set and then fused into the ML model, while in other cases each modality's model was trained individually and their decisions were fused at the final stage [5]. However, recent advances in deep learning for vision and NLP have brought the concept of co-learning to the next level by introducing cross-modality training. Thus, a hybrid fusion technique in the middle of the multimodal model has become prominent, in which the modalities are simultaneously trained through a cross-modal module. In this way, alongside unimodally trained early and late fusion, new combinations have emerged, namely unimodally pre-trained models combined by fusion and multimodally pre-trained models with hybrid fusion.

5 Conclusion

With the rise of the Web, memes that need automatic censoring to hinder hate are uploaded regularly. Researchers from vision and language are turning to VL multimodal problems, of which memes classification is picking up pace. Recent ML methods perform well on other VL data but fail on memes classification. In this context, we presented an inclusive study on memes classification and, more generally, on VL multimodal problems and current solutions. We further proposed a generalized framework for VL problems and covered the early and next-generation works on VL problems. Furthermore, we articulated several open research issues and challenges, intending to guide the ML research community in further investigation.

Acknowledgement. This work was supported by the Institute for Information and Communications Technology Promotion Grant through the Korea Government (MSIT) under Grant R7120-17-1007 (SIAT CCTV Cloud Platform).

References 1. Agrawal, A., Batra, D., Parikh, D.: Analyzing the behavior of visual question answering models. arXiv preprint arXiv:1606.07356 (2016) 2. Alberti, C., Ling, J., Collins, M., Reitter, D.: Fusion of detected objects in text for visual question answering. arXiv preprint arXiv:1908.05054 (2019) 3. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Lawrence Zitnick, C., Parikh, D.: VQA: visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2425–2433 (2015) 4. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014) 5. Baltruˇsaitis, T., Ahuja, C., Morency, L.P.: Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41(2), 423–443 (2018) 6. Chen, X., Fang, H., Lin, T.Y., Vedantam, R., Gupta, S., Doll´ ar, P., Zitnick, C.L.: Microsoft coco captions: data collection and evaluation server. arXiv preprint arXiv:1504.00325 (2015)


7. Chen, Y.C., Li, L., Yu, L., Kholy, A.E., Ahmed, F., Gan, Z., Cheng, Y., Liu, J.: Uniter: learning universal image-text representations. arXiv preprint arXiv:1909.11740 (2019) 8. Cho, K., Van Merri¨enboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014) 9. Dai, A.M., Le, Q.V.: Semi-supervised sequence learning. In: Advances in Neural Information Processing Systems, pp. 3079–3087 (2015) 10. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009) 11. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018) 12. Devlin, J., Gupta, S., Girshick, R., Mitchell, M., Zitnick, C.L.: Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467 (2015) 13. Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Comput. Surv. (CSUR) 51(4), 1–30 (2018) 14. Fukui, A., Park, D.H., Yang, D., Rohrbach, A., Darrell, T., Rohrbach, M.: Multimodal compact bilinear pooling for visual question answering and visual grounding. arXiv preprint arXiv:1606.01847 (2016) 15. Gan, Z., Chen, Y.C., Li, L., Zhu, C., Cheng, Y., Liu, J.: Large-scale adversarial training for vision-and-language representation learning. arXiv preprint arXiv:2006.06195 (2020) 16. Gao, D., Jin, L., Chen, B., Qiu, M., Li, P., Wei, Y., Hu, Y., Wang, H.: FashionBERT: text and image matching with adaptive loss for cross-modal retrieval. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2251–2260 (2020) 17. Gomez, R., Gibert, J., Gomez, L., Karatzas, D.: Exploring hate speech detection in multimodal publications. In: The IEEE Winter Conference on Applications of Computer Vision, pp. 1470–1478 (2020) 18. Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., Parikh, D.: Making the V in VQA matter: elevating the role of image understanding in visual question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6904–6913 (2017) 19. Gurari, D., Zhao, Y., Zhang, M., Bhattacharya, N.: Captioning images taken by people who are blind. arXiv preprint arXiv:2002.08565 (2020) 20. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 21. Hossain, M.Z., Sohel, F., Shiratuddin, M.F., Laga, H.: A comprehensive survey of deep learning for image captioning. ACM Comput. Surv. (CSUR) 51(6), 1–36 (2019) 22. Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146 (2018) 23. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017) 24. Huang, Z., Zeng, Z., Liu, B., Fu, D., Fu, J.: Pixel-BERT: aligning image pixels with text by deep multi-modal transformers. arXiv preprint arXiv:2004.00849 (2020)

1464

T. H. Afridi et al.

25. Kafle, K., Shrestha, R., Kanan, C.: Challenges and prospects in vision and language research. arXiv preprint arXiv:1904.09317 (2019) 26. Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: detecting hate speech in multimodal memes. arXiv preprint arXiv:2005.04790 (2020) 27. Kortli, Y., Jridi, M., Falou, A.A., Atri, M.: A comparative study of CFs, LBP, HOG, SIFT, SURF, and BRIEF techniques for face recognition. In: Alam, M.S. (ed.) Pattern Recognition and Tracking XXIX. vol. 10649, pp. 184–190. International Society for Optics and Photonics, SPIE (2018). https://doi.org/10.1117/12. 2309454 28. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., Brown, D.: Text classification algorithms: a survey. Information 10(4), 150 (2019) 29. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012) 30. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019) 31. Li, G., Duan, N., Fang, Y., Gong, M., Jiang, D., Zhou, M.: Unicoder-VL: a universal encoder for vision and language by cross-modal pre-training. In: AAAI, pp. 11336– 11344 (2020) 32. Li, L., Chen, Y.C., Cheng, Y., Gan, Z., Yu, L., Liu, J.: Hero: hierarchical encoder for video+ language omni-representation pre-training. arXiv preprint arXiv:2005.00200 (2020) 33. Li, L.H., Yatskar, M., Yin, D., Hsieh, C.J., Chang, K.W.: VisualBERT: a simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557 (2019) 34. Lin, J., Yang, A., Zhang, Y., Liu, J., Zhou, J., Yang, H.: InterBERT: vision-andlanguage interaction for multi-modal pretraining. arXiv preprint arXiv:2003.13198 (2020) 35. Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., Pietik¨ ainen, M.: Deep learning for generic object detection: a survey. Int. J. Comput. Vis. 128(2), 261–318 (2020) 36. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019) 37. Lu, J., Batra, D., Parikh, D., Lee, S.: VilBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In: Advances in Neural Information Processing Systems, pp. 13–23 (2019) 38. Lu, J., Goswami, V., Rohrbach, M., Parikh, D., Lee, S.: 12-in-1: multi-task vision and language representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10437–10446 (2020) 39. Luo, H., Ji, L., Shi, B., Huang, H., Duan, N., Li, T., Chen, X., Zhou, M.: UniViLM: a unified video and language pre-training model for multimodal understanding and generation. arXiv preprint arXiv:2002.06353 (2020) 40. Mogadala, A., Kalimuthu, M., Klakow, D.: Trends in integration of vision and language research: a survey of tasks, datasets, and methods. arXiv preprint arXiv:1907.09358 (2019) 41. Mozafari, M., Farahbakhsh, R., Crespi, N.: A BERT-based transfer learning approach for hate speech detection in online social media. In: International Conference on Complex Networks and Their Applications, pp. 928–940. Springer (2019)

A Multimodal Memes Classification: A Survey and Open Research Issues

1465

42. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018) 43. Plummer, B.A., Wang, L., Cervantes, C.M., Caicedo, J.C., Hockenmaier, J., Lazebnik, S.: Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2641–2649 (2015) 44. Qi, D., Su, L., Song, J., Cui, E., Bharti, T., Sacheti, A.: ImageBERT: cross-modal pre-training with large-scale weak-supervised image-text data. arXiv preprint arXiv:2001.07966 (2020) 45. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019) 46. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015) 47. Schmidt, A., Wiegand, M.: A survey on hate speech detection using natural language processing. In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 1–10 (2017) 48. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 49. Sohn, H., Lee, H.: MC-BERT4HATE: hate speech detection using multi-channel BERT for different languages and translations. In: 2019 International Conference on Data Mining Workshops (ICDMW), pp. 551–559. IEEE (2019) 50. Srivastava, Y., Murali, V., Dubey, S.R., Mukherjee, S.: Visual question answering using deep learning: a survey and performance analysis. arXiv preprint arXiv:1909.01860 (2019) 51. Su, W., Zhu, X., Cao, Y., Li, B., Lu, L., Wei, F., Dai, J.: VL-BERT: pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530 (2019) 52. Suhr, A., Zhou, S., Zhang, A., Zhang, I., Bai, H., Artzi, Y.: A corpus for reasoning about natural language grounded in photographs. arXiv preprint arXiv:1811.00491 (2018) 53. Sun, C., Baradel, F., Murphy, K., Schmid, C.: Learning video representations using contrastive bidirectional transformer. arXiv preprint arXiv:1906.05743 (2019) 54. Sun, C., Myers, A., Vondrick, C., Murphy, K., Schmid, C.: VideoBERT: a joint model for video and language representation learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 7464–7473 (2019) 55. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015) 56. Tan, H., Bansal, M.: LXMERT: learning cross-modality encoder representations from transformers. arXiv preprint arXiv:1908.07490 (2019) 57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L  ., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017) 58. Wang, T., Huang, J., Zhang, H., Sun, Q.: Visual commonsense R-CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10760–10770 (2020) 59. Wang, Y., Joty, S., Lyu, M.R., King, I., Xiong, C., Hoi, S.C.: VD-BERT: a unified vision and dialog transformer with BERT. arXiv preprint arXiv:2004.13278 (2020)

1466

T. H. Afridi et al.

60. Zellers, R., Bisk, Y., Farhadi, A., Choi, Y.: From recognition to cognition: visual commonsense reasoning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6720–6731 (2019) 61. Zhou, B., Tian, Y., Sukhbaatar, S., Szlam, A., Fergus, R.: Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167 (2015) 62. Zhou, L., Palangi, H., Zhang, L., Hu, H., Corso, J.J., Gao, J.: Unified visionlanguage pre-training for image captioning and VQA. In: AAAI, pp. 13041–13049 (2020) 63. Zhu, L., Yang, Y.: ActBERT: learning global-local video-text representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8746–8755 (2020)

Fireworks Algorithm for Solving the Fixed-Spectrum Frequency Assignment

Mohamed El Bouti¹(B), Raouan El Ghazi¹, Lamia Benameur², and Alami Chentoufi Jihane¹

¹ Faculty of Science, IBN TOFAIL University, BP 133, Kenitra, Morocco
[email protected], [email protected], [email protected]
² Faculty of Science, Abdelmalek Essaadi University, Tetouan, Morocco
[email protected]

Abstract. In this paper, a new approach based on the fireworks algorithm (FWA) is proposed to solve the frequency assignment problem (FAP), a well-known NP-complete optimization problem whose objective is to minimize the interference generated by a solution. The fireworks algorithm was run on several instances of the FAP, and the results obtained show that it can be very useful for solving more complex benchmark problems studied in the literature. Keywords: Frequency assignment · Fireworks algorithm · Meta-heuristic · Swarm intelligence · Optimization problem · Frequency assignment problem · Hamming distance

1 Introduction

Wireless communication systems have grown rapidly in recent years through the development of wireless telephone networks, satellite communications, wireless LANs and military operations, and with this growth comes the frequency assignment problem. Frequency assignment problems (FAP) have already been investigated by many authors; they first appeared in the 1960s with Metzger [9]. Hale [6] then presented the frequency planning problems of that period, focusing in particular on modeling problems whose objective was to find an assignment that minimizes the total number of violations. Currently, the goal is to find acceptable solutions by minimizing the overall level of interference of the assigned frequencies. The FAP is closely related to a well-known generalization of graph coloring, and most research has focused on theories and algorithms for coloring graphs. The relation between the FAP and the graph coloring problem was introduced by Hale, yielding new results that were further studied by Roberts (1991). This line of work was pursued by Eisenblätter et al. [10] (2002), who give an overview of the evolution of frequency planning from graph coloring and its generalizations to the models used nowadays, with an emphasis on GSM practice. Many different methods have been designed to solve the different variants of the FAP, but their solutions are still far from satisfactory. This


is tied to the fact that frequency assignment belongs to the class of so-called NP-complete combinatorial optimization problems. NP-complete problems are generally assumed to require very time-consuming algorithms for their exact solution, and a great deal of research has focused on this area. The objectives of this article are, on the one hand, the presentation and formulation of the problem and, on the other hand, the design of a resolution approach based on the fireworks algorithm (FWA), a new swarm intelligence (SI) algorithm inspired by real fireworks exploding and illuminating the night sky, proposed by Y. Tan and Y. Zhu in 2010 as an optimization technique. The principle of the implementation is as follows. At initialization, FWA generates N fireworks randomly at N different locations and evaluates their quality (fitness) to determine the explosion amplitude and the number of sparks for each firework. Then, some sparks are produced by the Gaussian operator in the feasible region, and the mapping rule maps the newly generated sparks back into the feasible region. Finally, the sparks for the next generation are selected. The fireworks algorithm runs iteratively until it reaches the termination conditions.

2 Frequency Assignment Problem

The frequency assignment problem has already been investigated by many authors, but its solutions are still far from satisfactory. This is mainly due to the fact that frequency assignment belongs to the class of so-called NP-complete combinatorial optimization problems. NP-complete problems are generally assumed to require extremely time-consuming algorithms for their exact solution, and a routine application of such algorithms in the design of large systems seems to be out of the question. It is therefore necessary to use more time-efficient algorithms which, however, cannot guarantee optimal solutions. The assignments they generate may thus require more frequencies than an optimal assignment. To our present knowledge, it cannot even be excluded that some of the known approximate algorithms overestimate the number of frequencies by more than 100% in certain cases. An even worse feature of these approximate algorithms is that most of them do not supply any information about how far their results are from the optimum; even if, by some lucky coincidence, an optimal assignment has been found, this is usually not recognized.

The frequency assignment problem (FAP) consists of establishing a two-way link between two remote sites by assigning a frequency to each channel in a given frequency spectrum, while satisfying the network's duplex constraints, co-site constraints and blocking constraints. Let X = {x1, x2, …, xn} represent a system of n cells, and let the available channels be numbered 1, …, m. An n × n matrix, called the compatibility matrix, represents the set of constraints to be respected: C = (cij), i = 1, …, n and j = 1, …, n, where cii defines the minimal distance between frequencies assigned within the same cell, the Co-Site Constraint (CSC). The off-diagonal elements cij represent either the Adjacent Channel Constraint (ACC), which forbids assigning adjacent frequencies to adjacent cells simultaneously, or the Co-Channel Constraint (CCC), which forbids using the same frequency simultaneously in a given pair of cells.
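To fix notation, a tiny hypothetical instance (values invented for illustration, not taken from the paper) can be written down directly; the diagonal of C carries the co-site separations, while the off-diagonal entries encode adjacent-channel or co-channel separations.

```python
import numpy as np

# Hypothetical toy FAP instance (values chosen for illustration only).
n_cells = 3          # cells x1, x2, x3
n_channels = 11      # available channels numbered 1..m

# Demand vector R: number of channels required by each cell.
R = np.array([2, 3, 2])

# Compatibility matrix C:
#   c_ii >= 1 : co-site separation (CSC) within cell i
#   c_ij == 1 : co-channel constraint (CCC) between cells i and j
#   c_ij == 2 : adjacent-channel constraint (ACC) between cells i and j
C = np.array([
    [3, 2, 1],
    [2, 3, 2],
    [1, 2, 3],
])

# An assignment maps each cell to a set of channels, with |H_i| = r_i.
H = [{1, 5}, {2, 6, 10}, {4, 9}]
```

In this toy instance, assigning channels 1 and 5 to x1 respects c11 = 3, whereas assigning channels 1 and 2 to x1 would violate it.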


A triple (X, R, C) is called a frequency assignment problem, where X is a cell system, R = {r1, r2, …, rn} is a requirement (demand) vector describing the channel requirements of each cell, and C is a compatibility matrix. Let (X, R, C) be a frequency assignment problem, let N = {1, 2, …, m} be the set of available channels, and let Hi be the subset of N assigned to xi with |Hi| = ri for 1 ≤ i ≤ n. An assignment must satisfy

|h − h′| ≥ cij, for all h ∈ Hi, h′ ∈ Hj,   (1)

where 1 ≤ i ≤ n, 1 ≤ j ≤ n, i ≠ j,   (2)

|h − h′| ≥ cii, for all h, h′ ∈ Hi with h ≠ h′,   (3)

where |Hi| denotes the number of channels in the set Hi. We call such an assignment an admissible assignment. The objective of the FAP is to find an assignment that minimizes the total number of violations. Formally, we have:

Min Σ_{i=1}^{n} Σ_{a=1}^{m} Σ_{j=1}^{n} Σ_{b=1}^{m} p(i, a) ε(i, a, j, b) p(j, b)   (4)

where

ε(i, a, j, b) = 0 if |a − b| ≥ cij, and 1 otherwise,
p(i, a) = 1 if channel a is assigned to the i-th cell, and 0 otherwise.

ε(i, a, j, b) is thus set to 1 if the assignment of channel a to cell xi and channel b to cell xj violates the EMC constraints.
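As a minimal sketch (not the authors' implementation), the objective in Eq. (4) can be evaluated directly from a binary assignment matrix p and the compatibility matrix C; the pair (i, a) = (j, b) is skipped, mirroring the h ≠ h′ condition of Eq. (3), and each unordered violation is counted twice, as in the double sum of Eq. (4).

```python
import numpy as np

def fap_cost(p: np.ndarray, C: np.ndarray) -> int:
    """Total number of constraint violations of an assignment, as in Eq. (4).

    p : (n, m) binary matrix, p[i, a] = 1 if channel a is assigned to cell i.
    C : (n, n) compatibility matrix of minimum channel separations c_ij.
    """
    n, m = p.shape
    cost = 0
    for i in range(n):
        for a in range(m):
            if not p[i, a]:
                continue
            for j in range(n):
                for b in range(m):
                    if not p[j, b]:
                        continue
                    # Skip comparing an assigned channel with itself (h = h').
                    if i == j and a == b:
                        continue
                    # epsilon(i, a, j, b) = 1 when |a - b| < c_ij (violation).
                    if abs(a - b) < C[i, j]:
                        cost += 1
    return cost
```

With this convention, fap_cost(p, C) returns 0 exactly when the assignment encoded by p satisfies Eqs. (1)–(3).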

3 Fireworks Algorithm

The fireworks algorithm (FWA) is a new swarm intelligence (SI) algorithm inspired by real fireworks exploding and illuminating the night sky; it was proposed by Y. Tan and Y. Zhu in 2010 as an optimization technique. The principle of the implementation is as follows. At initialization, FWA generates N fireworks randomly at N different locations and evaluates their quality (fitness) to determine the explosion amplitude and the number of sparks for each firework. Then, some sparks are produced by the Gaussian operator in the feasible region, and the mapping rule maps the newly generated sparks back into the feasible region. Finally, the sparks for the next generation are selected. The fireworks algorithm runs iteratively until it reaches the termination conditions (Fig. 1).


Fig. 1. Framework of the fireworks algorithm: select n initial locations and set off n fireworks there, obtain the locations of the resulting sparks, evaluate the quality of these locations, and either stop if the optimal location has been found or select n locations for the next iteration.
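To make the loop of Fig. 1 concrete, the following is a simplified, illustrative Python sketch of a fireworks-style search for a minimization problem; it is not the authors' implementation. The objective function, bounds, parameter values, and the simple best-plus-random selection are assumptions, and the Gaussian-mutation sparks of the full FWA are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Placeholder objective to minimize (assumption, for illustration)."""
    return float(np.sum(x ** 2))

def fwa(f, dim=10, n=5, m=50, A=40.0, low=-100.0, high=100.0, iters=200):
    """Simplified fireworks-algorithm loop following Fig. 1."""
    X = rng.uniform(low, high, size=(n, dim))          # n initial fireworks
    for _ in range(iters):
        fit = np.array([f(x) for x in X])
        y_max, y_min = fit.max(), fit.min()
        eps = np.finfo(float).eps
        # Number of sparks per firework (cf. Eq. (5)): better (lower-fitness)
        # fireworks contribute more sparks.
        s = m * (y_max - fit + eps) / (np.sum(y_max - fit) + eps)
        s = np.clip(np.round(s), 1, m).astype(int)
        # Explosion amplitude: better fireworks explode within a smaller radius.
        amp = A * (fit - y_min + eps) / (np.sum(fit - y_min) + eps)
        sparks = [X]
        for i in range(n):
            offsets = rng.uniform(-amp[i], amp[i], size=(s[i], dim))
            sparks.append(np.clip(X[i] + offsets, low, high))  # mapping rule
        pool = np.vstack(sparks)
        pool_fit = np.array([f(x) for x in pool])
        # Selection: keep the best location, fill the rest randomly.
        best = pool[np.argmin(pool_fit)]
        rest = pool[rng.choice(len(pool), size=n - 1, replace=False)]
        X = np.vstack([best, rest])
    return best, f(best)

if __name__ == "__main__":
    x_best, y_best = fwa(sphere)
    print("best fitness:", y_best)
```

The spark-count rule above anticipates Eq. (5) below: the larger the gap between a firework's fitness and the worst fitness in the population, the more sparks it generates.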

3.1 Design of Fireworks Explosion

The number of sparks generated by each firework xi is defined as follows:

Si = m · (Ymax − f(xi) + ε) / (Σ_{i=1}^{n} (Ymax − f(xi)) + ε)   (5)

Here m is a parameter that controls the total number of sparks generated by the n fireworks (the size of the population), Ymax is the maximum (worst) value of the objective function among the n fireworks, and ε, the smallest constant in the computer, is used to avoid division by zero. From Eq. (5), if the gap between the quality of a solution and the worst one is large, the solution is good in terms of quality and many sparks will be generated from it. Equation (5) yields a real number, so Eq. (6) converts it to an integer and bounds the number of sparks:

Ŝi = round(a·m) if Si < a·m, round(b·m) if Si > b·m, a